I use this blog to store the notes I have collected throughout my time learning computer science. Feel free to browse and comment
ER Diagrams are for concepts, it is used for abstract grouping of entities and relationships Relational Diagram is for logical modeling, tends to be drawn when the ER Diagram is already fleshed out.
Weak entity are those that depends on a foreign entity to identiy itself. AKA child entity that depends on the parent entity
Strong entity, those that have their own primary key
Pertains to the ISA relation. Covering Constraint - Does every Employee have to be a contractor, or temp? Overlap Constraint - Can contractor or temp overlap? ie. Can the employee be both
This is a physical storage of data, we will be installing relational databases, which is a part of DBMS (database management system)
The purpose of a DB is to faciliate access/ storing data. key things to note is to protect data from concurrent users, crash recoveries, etc.
-a schema is a way to describe data
INSERT INTO tableName(a, b, c, d)
VALUES (1,2,3,4)
CREATE VIEW ViewName
(SELECT ...)
DROP VIEW
LIKE (case insensitive)
UNION - joins two select queries, note fields have to be the same
INTERSECT -
EXCEPT
Relational Algebra’s Set division can be translated to SQL like so
e.g. Find sailors who’ve reserved all boats
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS
((SELECT B.bid
FROM Boats B)
EXCEPT
(SELECT R.bid
FROM Reserves R
WHERE R.sid=S.sid))
Find Entity that have only 1 attribute e.g. Find the snames of suppliers who supply only red parts.
Select S.sid from Suppliers S
where ‘Red’ = ALL (select Parts.color
from Catalog, Parts
where Catalog.sid = S.sid and Parts.pid = Catalog.pid )
a = Any {a,3} True
a = All {a,3} False
a = Any {} False
a = All {} True
Operates on tuple sets, keywords include
the HAVING keyword acts as a filter to the aggregate clause.
can be interpreted as
Rule For Null:
TRUE AND UNKNOWN = UNKNOWN
FALSE AND UNKNOWN = FALSE
FALSE OR UNKNOWN = UNKNOWN
NOT UNKNOWN = UNKNOWN
SELECT *
FROM t1
WHERE col_1 > 5
OR col_1 < 5
OR col_1 = 5
Any record where col_1 has NULL will not be returned in the above query
SQL outer joins include NULL values
SELECT *
FROM a LEFT OUTER JOIN b
ON a.id = b.id
LEFT OUTER JOIN would show dangling tuples from left input table. (null is filled for all attribute of right input)
RIGHT OUTER JOIN does the same but from right table’s perspective
FULL OUTER JOIN shows dangling tuples from both input table
-- ex1
GRANT SELECT, INSERT, DELETE
ON Reserves
TO Yuppy
WITH GRANT OPTION
-- ex2
GRANT UPDATE(rating) ON Sailors TO
Leah
user yuppy can now select, insert, and delete, and also grant other people these privileges.
user leah can now update ratings, but can’t gran this privilege to other.
use view when you want more control on whether the user can read, write, execute depending on authorization level.
View is a relation that is stored for use later.
Types of IC include:
– can also name the check CREATE TABLE … CONSTRAINT noFoo CHECK(‘Foo’ <> ANY(SELECT …))
- Tuple-Based Check (defined in declaration of table, activate on insertion to corrosponding table or update of tuple)
~~~ sql
CREATE TABLE ...
... CHECK(colA >= colB)
CREATE ASSERTION notTooManyReservations
CHECK (10 > ALL (SELECT COUNT(*)
FROM Reserves
GROUP BY sid))
Checks and Assertions are not well supported in SQL.
Check is not in SQL SERVER
Assertion is not supported in postgresSQL
procedures that starts automatically if specificed change occurs in the DBMS.
has
CREATE TRIGGER youngSailorUpdate
AFTER INSERT ON SAILORS /* Event */
REFERENCING NEW TABLE NewSailors
FOR EACH STATEMENT
INSERT /* Action */
INTO YoungSailors(sid, name, age, rating)
SELECT sid, name, age, rating
FROM NewSailors N
WHERE N.age <= 18;
-- Referencing include
NEW TABLE
OLD TABLE
OLD ROW
NEW ROW
--- Use begin and end to include multiple SQL Statements
the above inserts young sailors into separate table.
Triggers can be activated in one SQL statement, but with arbitrary order, trigger can activate other triggers.
trigger is more general than constraints, it can be used to monitor integrity constraints, construct a log, gather database stats. etc.
we can connect our application to a database through librarys with database calls (API).
We would pass SQL string to the API, must include a statement to connect to the right database.
Connecting to Database Flow Chart
Since we don’t know before hand how many records we fetch (a priori) SQL uses a mechanism call cursor to handle row selection
SQL Standard, Stored Procedure PSM Term is Cursor Java Term is ResultSet Visual Studio Term is DataReader
written in general purpose programming language (Persistent stored modules PSM), and executed within the DBS. This extends SQL by basic concepts of a general purpose programming language.
Since SQL is a declarative language while C and other language are procedural. there is different approach as to how data is handle, leading to unnecessary effort. Problem is trying to fit object oriented program into relational database.
-- Sample create procedure
CREATE FUNCTION rateSailor (@sailorId INT)
RETURNS INT
AS
BEGIN
DECLARE @numRes INT
DECLARE @rating INT
SET @numRes = (SELECT COUNT(*)
FROM Reserves R
WHERE R.sid = @sailorId)
IF @numRes > 10
SET @rating = 1
ELSE
SET @rating = 0
RETURN @rating
END
GO;
SELECT dbo.rateSailor(22); go
-- Can call a function
EXEC SQL CALL functionName(@sid, @rating)
3 tier architecture , each tier can be independently maintained.
only does graphic user interface. the business logic, data management depends on the server
implements both graphic and business logic. while server implements data management
Downside -> server has to trust client, has scalability issue. hard to update all clients.
Client receives reply, makes new request
GET ~/index.html HTTP/1.1 – Request Line User-agent: Mozilla/4.0 – Type of client Accept: text/html, image/gif, image/jpeg
HTTP is stateless, every message is self contained. fires and forget. the advantage of this is don’t need memory management. good for static information
Disadvantage is, no shopping baskets, user logins, dynamic content, and security is harder to implement.
In a typical website:
<FORM ACTION=“page.jsp” METHOD=“GET”
NAME=“LoginForm”>
…
</FORM>
in the above, Action is the specific URI that handles the content Method -> HTTP GET or POST method NAME -> name of the form.
adds functionality to the presentation tier.
XML is designed for used by applications, and work across different applications. it uses a digraph to handle the database. the leaf nodes represent the attribute data of some atomic type, while interior nodes represent complex objects.
XML is ordered, S.S.D is not
XML can mix text and element, has attributes, entities, processing instruction, comments, SSD does not.
Labels indicate the relationship between two nodes.
the graph does not need to be a tree structure, but is usualy acyclic.
Object Exchange Model (OEM) -separates complex object by building them from atomic object
<BOOK genre= 'Science' format = 'Hardcover'>..data...</BOOK>
BOOK is the tag element , genre is the attribute, ‘science’ is attribute value
there can be zero or more attribute to every element.
Structure RD uses table, XML is tree/ hiearchal graph Schema RD has fixed schema, XML is flexible, self describing Queries RD has SQL, XML has XPath, which is more complicated. Ordering RD has no order, XML order is implied Implementation RD has native implementation, XML is add on.
is a query language for describing XML, uses tags to traverse through XML tree , and return series of qualifying items.
path expression starts with '/'
/tag1/tag2 return
this returns the set of tags that has /tag1/tag2
attributes - use @id to filter by tag id
<bib>
<book bookID = 'b100'></book>
</bib>
/bib/book/@bookID
returns sequence 'b100'...
/bib/*[publisher = 'McGraw']
returns tags that have values Mcgraw
use * to indicate wildcard
// is for descendants.
note, /bib/*/title and /bib//title is the same
/bib/*[publisher = 'McGraw']
/bib/book[2]/author[1] second paper, first author.
/bib/(paper | book) find title of each element that is a paper or a book.
Exam 3 Prep
DBAs need to know how to configure a database for faster queries. Two issues
Options include
Disks - can retrieve random pages at fixed cost, reading several consecutive pages is much cheaper than reading them in random order.
Tapes can read only in sequence
terminologies: B = # of data pages
R = number of records per page
D = average time to read and write (this time tends to dominate H, D average is 15 msec, H is 100 nano seconds.)
C = average time to process a record
H = time to apply a hash function.
Heap - suitable if typical access is file scan retrieving all records (unsorted)
Need to scan for most operations (costly), insert is fast, just insert at end of file.
Sorted Files - suitable for retrieve in some order, or only a range of records is needed (note can only be sorted by one field)
cost of binary search is
Index - DS organize record via trees or hashing.
Hash - constant cost for hash index Tree - log(height of the tree)
Unclustered index #pages = # matching records
Clustered Index - # matching pages.
Scan -> fetch all records in the file , the pages fetch from disk to the buffer pool
Search With Equality Selection -> ‘Find student records for students with ID ‘ = 23
Insert -> identify the page in which record must be inserted, fetch that page from disk, modify it to include new record, write back to modified page.
Delete -> must identify the page that has the record, fetch from disk, modify it, write it back.
Pro- Are optional structure associated with a table which may improve the performance of a query. Indices increase the speed of join (because foreign keys can be indexed too)
primary vs. secondary - search key contains primary key, then called primary index.
Con- Indices needs to be updated everytime there is an update.
the data and the index is together, think phone book, which is organized by ascending phone numbers, usually the primary key of a table is in clustered
we can only have one clusterd index, b/c the table cannot be sorted by multiple columns.
index + sorted file is the best. Called clustered index.
this means data is not where the record is. think textbook, with a index at the back. The two are separate, the index record points to the data entry, relevant section in the textbook.
we can have multiple non clustered indices, since they are separated objects.
data entries are much smaller than data records.so better than alternative 1.
While alternative 3 is more compact than 2, it is variable sized data entries (dont know how long it can get, if the search key is common occurence.)
(to build a clustered index, you would need to sort the heap file and also keep index sorted)
--Alternative 2 looks like
color, location
red, 1
red, 3
red, 2
blue,6
blue, 4
blue, 5
--Alternative 3 looks like
red 1,2,3
blue 4,5,6
Used for equality search. index is a collection of buckets, and buckets are primary page plus zero or more overflow page.
If alternative 1 is used, the buckets contain the data record.
with alternative 2, bucket contains <key,rid> pairs
with alternative 3, bucket contains <key, rid list> pairs
Assumes 80% page ocupancy
leaf pages contain the data enry and are chained that means they can visit previous and next.
non-leaf pages contain index entries, they direct searches.
Assumes 67% page occupancy
the conjuncts that the index matches as the primary conjuncts in the selection.
hash index matches a selection condition containing no disjunctions if there is a term of the form *attribute = value * for each attribute in the index search key
tree index matches a selection condition containing no disjunctions if there is a term of the form attribute op value for each attribute in a prefix of the index’s search key.
Is a type of query optimizer, works well for <10 joins.
key features,
SELECT s.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
AND R.bid = 100 AND S.rating > 5
the above can be broken in the form of RA expression tree. can be broken down in [insert link for _images/RA_ExpressionTree.png]
To obtain the evaluation plan, we need to implement each relational algebra operation. for example, a JOIN between Reserve and Sailor is a page oriented simple nested loop. The projection and selection commands are ‘on the fly’ generated.
In a RA expression tree, each internal node are relational algebra operator, and each leaf is a table.
Key Takeaway - Pushing Selections since JOIN operations are expensive, it is better to apply selections early, so that the size of the tables been joined is reduced.