Calculate the number of blocks needed to store a primary index - database

I have the following exam question that I cannot get to grips with, and my teacher is unable to give the answer.
The Book relation has three fields: ISBN: char(11), title:char(26) and author: char(22). The data of the Book relation consists of 2574883 records and is organised on an unspanned slotted page array which contains a header (11B) and slot references of 3B each. Furthermore, each slotted page has a capacity of 16KB.
Assume that a primary index on ISBN exists for the Book relation with block anchors of 9 Bytes each and the same underlying unspanned organisation of slotted pages. Insert in the field below the number of blocks that are needed to store the primary index.
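One way to approach the arithmetic is sketched below. It is not an official solution and rests on several assumptions that the question does not settle: that 16KB means 16384 bytes, that every record or index entry on a page costs its own size plus one 3-byte slot reference, that the primary index is sparse (one entry per data block), and that the index entry size is either the 9-byte block anchor alone or the 11-byte ISBN key plus the 9-byte anchor. The helper names are made up for illustration.

```python
import math

PAGE_BYTES = 16 * 1024        # assumption: 16KB = 16384 bytes
HEADER_BYTES = 11             # page header, from the question
SLOT_BYTES = 3                # one slot reference per stored entry
RECORD_BYTES = 11 + 26 + 22   # ISBN + title + author = 59 bytes
NUM_RECORDS = 2_574_883


def entries_per_page(entry_bytes):
    """Unspanned organisation: only whole entries fit, so round down."""
    return (PAGE_BYTES - HEADER_BYTES) // (entry_bytes + SLOT_BYTES)


def blocks_needed(num_entries, entry_bytes):
    """Number of pages required to hold num_entries entries, rounded up."""
    return math.ceil(num_entries / entries_per_page(entry_bytes))


# Step 1: blocks used by the Book data itself.
data_blocks = blocks_needed(NUM_RECORDS, RECORD_BYTES)

# Step 2: a sparse primary index holds one entry per data block.
# Interpretation A (assumption): an index entry is just the 9-byte anchor.
index_blocks_anchor_only = blocks_needed(data_blocks, 9)
# Interpretation B (assumption): an index entry is ISBN key (11 B) + anchor (9 B).
index_blocks_key_plus_anchor = blocks_needed(data_blocks, 11 + 9)

print(data_blocks, index_blocks_anchor_only, index_blocks_key_plus_anchor)
```

Under these assumptions a page holds 264 Book records, so the data needs 9754 blocks, and the index needs 8 blocks under interpretation A or 14 blocks under interpretation B. Which interpretation the exam intends depends on how the course defines a block anchor.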

Related

How does cardinality work in an ER diagram when considering the dimension of time?

I am going to explain my question using a fragment of a problem in which there are two entities:
🔸 Airplane
🔸 Location
And a relationship to link these entities:
🔸 Send
Logic 1:
An airplane sends a minimum of 1 location and at most many locations (at different moments); therefore, the cardinality is one to many (1,N).
Airplane ——— (1,N) ——— Send ——— (1,N) ——— Location
Logic 2:
An airplane sends a minimum of 1 location but it CAN’T send many locations at the same time; therefore, the minimum is 1 and the maximum is also 1, so the cardinality is one to one (1,1).
Airplane ——— (1,N) ——— Send ——— (1,1) ——— Location
Which of these logics is correct, not only in the ER diagram but also in a database?
Many-to-Many
Your business problem may not be clear, so let's look at the canonical example of books and authors. One book can have multiple authors, and each author can potentially contribute to multiple books. So we have a classic Many-to-Many.
A relational database does not handle a many-to-many relationship directly. To do so we add a third table bridging the original two. Naming this third table can be something of a puzzle as it often represents a not-so-concrete business relationship. In this case authorship is appropriate.
Books & Authors
[book]-1-----0-1-M-[authorship]-M-1-0------1-[author]
Tables:
book
pkey (primary key, the unique identifier of each book)
title
planned_publish_date
authorship
pkey (optional, as some folks use the other two columns in a combined key as the primary key for this table)
fkey_book (holds the primary key value of the book on which this author is contributing)
fkey_author (holds the primary key value of author who is contributing to this book)
author
pkey (primary key, unique identifier of each author)
name
phone_number
The cardinality here is:
A book can have any number of authorship rows related: cardinality of 0, 1, M (M meaning more than one).
A planned book can have zero rows in authorship because it is not yet associated with any author. Later, when an author is recruited, we add a row to authorship.
A book with a solo author has one authorship row, linking to the one author.
A book with a pair of authors will have two authorship rows, each with a foreign key linking to the author row.
Ditto for the author-authorship relationship: cardinality of 0, 1, M.
An author who has been recruited but not yet committed to any book will have a row in author table but no rows in authorship.
An author who has worked on only a single book will have one row in authorship.
A prolific author will have many rows in authorship, one row for each book to which they have contributed.
An authorship row must be assigned to a book AND assigned to an author: cardinality of 1 with book, and 1 with author.
We do not allow any authorship rows to be “orphaned”, to use the parent-child language some folks like me use in describing table relationships. In other words, on every authorship row, the fkey_book field must have a valid value, and the fkey_author field must have a valid value.
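The three tables and the no-orphans rule above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column types, the sample rows, and the helper `orphan_rejected` are assumptions, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE book (
    pkey INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    planned_publish_date TEXT
);
CREATE TABLE author (
    pkey INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    phone_number TEXT
);
CREATE TABLE authorship (
    pkey INTEGER PRIMARY KEY,
    fkey_book INTEGER NOT NULL REFERENCES book(pkey),     -- exactly one book
    fkey_author INTEGER NOT NULL REFERENCES author(pkey)  -- exactly one author
);
""")

# Sample data (made up): book 1 is planned with no author yet, book 2 has two authors.
conn.execute("INSERT INTO book (pkey, title) VALUES (1, 'Planned book'), (2, 'Co-written book')")
conn.execute("INSERT INTO author (pkey, name) VALUES (1, 'Author A'), (2, 'Author B')")
conn.execute("INSERT INTO authorship (fkey_book, fkey_author) VALUES (2, 1), (2, 2)")


def orphan_rejected():
    """Try to insert an authorship row pointing at a book that does not exist."""
    try:
        conn.execute("INSERT INTO authorship (fkey_book, fkey_author) VALUES (99, 1)")
    except sqlite3.IntegrityError:
        return True
    return False


# Number of authorship rows per book: 0, 1 or many.
counts = dict(conn.execute("""
    SELECT book.pkey, COUNT(authorship.pkey)
    FROM book LEFT JOIN authorship ON authorship.fkey_book = book.pkey
    GROUP BY book.pkey
"""))
print(counts, orphan_rejected())
```

The LEFT JOIN keeps the planned book in the result with a count of zero, which is the "0" in the 0-1-M cardinality.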
Time
Adding the dimension of time is something of a tricky problem.
One example… To track the period of time when each author's contract for a book starts and stops, we would add a pair of DATE columns on the authorship table, titled contract_start & contract_stop.
authorship
pkey (primary key for this table)
fkey_book (holds the primary key value of the book on which this author is contributing)
fkey_author (holds the primary key value of author who is contributing to this book)
contract_start (of type DATE)
contract_stop
A query for contracts currently in effect would compare today’s date as being greater-than-or-equal to contract_start AND less-than contract_stop. Then do a join to get title of book and name of author.
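The query described above might look like the following sketch, again using sqlite3. Dates are stored as ISO-8601 text (an assumption, since SQLite has no DATE type), the sample rows are invented, and the date is passed in as a parameter rather than read from the clock so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (pkey INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE author (pkey INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE authorship (
    pkey INTEGER PRIMARY KEY,
    fkey_book INTEGER NOT NULL REFERENCES book(pkey),
    fkey_author INTEGER NOT NULL REFERENCES author(pkey),
    contract_start TEXT NOT NULL,  -- ISO-8601 date text, which sorts chronologically
    contract_stop TEXT NOT NULL
);
INSERT INTO book VALUES (1, 'Old Book'), (2, 'Current Book');
INSERT INTO author VALUES (1, 'Author A');
INSERT INTO authorship VALUES
    (1, 1, 1, '2020-01-01', '2021-01-01'),
    (2, 2, 1, '2024-01-01', '2025-01-01');
""")


def contracts_in_effect(on_date):
    """Return (title, name) pairs whose contract covers on_date (start inclusive, stop exclusive)."""
    return conn.execute("""
        SELECT book.title, author.name
        FROM authorship
        JOIN book ON book.pkey = authorship.fkey_book
        JOIN author ON author.pkey = authorship.fkey_author
        WHERE ? >= authorship.contract_start
          AND ? < authorship.contract_stop
        ORDER BY book.title
    """, (on_date, on_date)).fetchall()


print(contracts_in_effect("2024-06-15"))
```

Treating the range as half-open (start inclusive, stop exclusive) means a contract that stops on the same day another starts never counts as overlapping it.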
Another example… If our publishing company has a business policy that an author must focus on a single book at a time, and wants the database to enforce that an author's contract cannot overlap… well, that is another problem. I'll not address it for a few reasons, one of which is I do not know if your Question has this issue or not.
Plane-Flight-Location
As for your airplane problem, I am guessing that by Send you mean a flight. If so, I would name the table flight for clarity. By location, I suppose you mean airport. Again, I would name the table airport for clarity.
If this is what you meant, then you have the very same cardinality as discussed above.
A plane joining the fleet may have never flown, may have flown once, or may have flown to many locations. So, 0-1-M.
An airport may become known to our system before we have flown any planes there, so zero flight rows. Later, one or more flight rows as we schedule one or more planes to that airport. So, 0-1-M.
The flight table has columns for date-time of departure and for duration.
flight
pkey (primary key for this table)
fkey_plane (holds the primary key value of the plane to be flown to this airport at this date-time)
fkey_airport (holds the primary key value of the airport to which this plane is departing at this date-time.)
departure (when this flight takes off, of type TIMESTAMP WITH TIME ZONE)
duration (the length of this flight, a number of minutes).
[plane]-1-----0-1-M-[flight]-M-1-0------1-[airport]
Your business rules may vary. If you plan a flight but have not yet assigned an airplane, then you can have a flight row without an assigned airplane. So the cardinality of 1 changes to 0 or 1, but not M, as one particular flight never involves multiple planes. Under this business rule, a flight row can be orphaned, lacking a plane parent until a particular plane is eventually assigned.
[plane]-1-0-----0-1-M-[flight]-M-1-0------1-[airport]
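The difference between the two business rules comes down to whether fkey_plane may be NULL. A rough sqlite3 sketch, in which the `tail_number` and `name` columns, the sample rows, and the helper function are all assumptions added for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE plane (pkey INTEGER PRIMARY KEY, tail_number TEXT);
CREATE TABLE airport (pkey INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE flight (
    pkey INTEGER PRIMARY KEY,
    fkey_plane INTEGER REFERENCES plane(pkey),               -- nullable: 0 or 1 plane
    fkey_airport INTEGER NOT NULL REFERENCES airport(pkey),  -- always exactly 1 airport
    departure TEXT NOT NULL,                                 -- timestamp stored as text
    duration INTEGER NOT NULL                                -- minutes
);
INSERT INTO plane VALUES (1, 'PLANE-1');
INSERT INTO airport VALUES (1, 'Airport X');
-- A planned flight with no plane assigned yet.
INSERT INTO flight VALUES (1, NULL, 1, '2024-06-15T09:00:00+00:00', 90);
""")


def unassigned_flights():
    """Primary keys of flights that do not yet have a plane."""
    rows = conn.execute("SELECT pkey FROM flight WHERE fkey_plane IS NULL ORDER BY pkey")
    return [row[0] for row in rows]


before = unassigned_flights()
conn.execute("UPDATE flight SET fkey_plane = 1 WHERE pkey = 1")  # the plane is assigned later
after = unassigned_flights()
print(before, after)
```

Under the stricter rule from the first diagram, fkey_plane would simply be declared NOT NULL as well.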

Fact Table With Non-Measure Data

In the model below, description is a free text field that describes why an employee was absent.
Can this description field be in the fact table and considered a degenerate dimension?
The value will mostly be used in listing reports or for dashboards where word clouds are used.
Your design is correct. There is nothing wrong with including free text as a degenerate dimension into a fact table.
Storing comments in a dimension makes sense only if the comments are structured (i.e., if they are standardized and effectively have 1:M relations with the fact records). If they are stored as free text, and thus have 1:1 relations with the facts, then converting them into a dimension is a big mistake: you will end up with a dimension as tall as the fact table. In proper designs, dimensions are wide and short, while fact tables are narrow and tall. Tall dimensions are a problem, because they are very expensive in terms of performance.
They are also hard to use. Let's say you are using a reporting tool such as Power BI. If you store your free text as a degenerate dimension in a fact table, it's easy and intuitive to use. You can write something like:
Reason for Absence = SELECTEDVALUE( Fact[Description])
and the comment will be properly displayed in a report. Done.
But if you store the same comments in a dimension, well, good luck figuring out how to add them to the report.
Page 65 of The Data Warehouse Toolkit 3rd edition says the following:
Text Comments Dimension: Rather than treating freeform comments as textual metrics in a fact table, they should be stored outside the fact table in a separate comments dimension (or as attributes in a dimension with one row per transaction if the comments’ cardinality matches the number of unique transactions) with a corresponding foreign key in the fact table.
Kimball, Ralph; Ross, Margy. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (p. 65). Wiley. Kindle Edition.
On page 47 there is this example of a degenerate dimension:
For example, when an invoice has multiple line items, the line item
fact rows inherit all the descriptive dimension foreign keys of the
invoice, and the invoice is left with no unique content. But the
invoice number remains a valid dimension key for fact tables at the
line item level.
Kimball, Ralph; Ross, Margy. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (p. 47). Wiley. Kindle Edition.
No, descriptive text columns should not be included in fact tables. Instead, this column should be included in a dimension.
If you are looking to report on tags (key words) I would create a dimension for these tags and parse the description to find the appropriate tag to associate with the fact. For example, I see 2 tags from the descriptions (funeral and sick). I would create a dimension DimAbsentReason to contain these tags.
If you need to keep the actual description, then you could create a dimension (DimAbsentReason) for the description and make the appropriate association to the fact table.
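The tag-parsing idea could be sketched roughly as follows. Only the tags "funeral" and "sick" come from the answer above; the fallback tag "other", the surrogate key values, the simple substring match, and the function name are assumptions made for the sake of the example.

```python
# Keyword found in the free text -> tag stored in DimAbsentReason.
TAG_KEYWORDS = {"funeral": "funeral", "sick": "sick"}
UNKNOWN_TAG = "other"  # assumption: fallback when no keyword matches

# Hypothetical DimAbsentReason contents: tag -> surrogate key.
dim_absent_reason = {tag: key for key, tag in enumerate([UNKNOWN_TAG, "funeral", "sick"])}


def absent_reason_key(description):
    """Return the DimAbsentReason key to store on the fact row for this description."""
    text = description.lower()
    for keyword, tag in TAG_KEYWORDS.items():
        if keyword in text:
            return dim_absent_reason[tag]
    return dim_absent_reason[UNKNOWN_TAG]


print(absent_reason_key("Attended a funeral"))
print(absent_reason_key("Sick with flu"))
print(absent_reason_key("Car broke down"))
```

With this approach the dimension stays short (one row per tag) no matter how many fact rows exist.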

ER diagram that implements a database for trainee

I edited and remade the ERD. I have a few more questions.
I included participation constraints (between trainee and tutor), cardinality constraints (M means many), weak entities (double line rectangles), weak relationships (double line diamonds), composed attributes, derived attributes (white space with lines circle), and primary keys.
Questions:
Apparently to reduce redundant attributes I should only keep primary keys and descriptive attributes and the other attributes I will remove for simplicity reasons. Which attributes would be redundant in this case? I am thinking start_date, end_date, phone number, and address but that depends on the entity set right? For example the attribute address would be removed from Trainee because we don't really need it?
For the part: "For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date."
Isn't "periods of employment: start date, end date" a composed attribute, because the dates are shown with the symbol ":"? Also, I believe I didn't make an attribute for "where they worked", which is location?
Also how is it possible to show previous companies (employers) when we already have an attribute employers and different start date? Because if you look at the Question Information it states start_date for employer twice and the second time it says start_date and end_date.
I labeled many attributes as primary keys but how am I able to distinguish from derived attribute, primary key, and which attribute would be redundant?
Is there a multivalued attribute in this ERD? Would salary and job held be multivalued attributes because an employer has many salaries and jobs?
I believe I did the participation constraints (there is one) and cardinality constraints correctly. But there are sentences where for example "An instructor teaches at least a course. Each course is taught by only one instructor"; how can I write the cardinality constraint for this when I don't have a relationship between course and instructor?
Do my relationship names make sense because all I see is "has" maybe I am not correctly naming the actions of the relationships? Also I believe schedules depend on the actual entity so they are weak entities.... so does that make course entity set also a weak entity (I did not label it as weak here)?
For the company address I put a composed attribute, street num, street address, city... would that be correct? Also would street num and street address be primary keys?
Also I added the final mark attribute to courses and course_schedule is this in the right entity set? The statement for this attribute is "Each trainee identified by: unique code, social security number, name, address, a unique telephone number, the courses attended and the final mark for each course."
For this part: "We store in the database all classrooms available on the site" do I make a composed attribute that contains site information?
Question Information:
A trainee may be self-employed or employee in a company
Each trainee identified by:
unique code, social security number, name, address, a unique telephone number, the courses attended and the final mark for each course.
If the trainee is an employee in a company: store the current company (employer), start date.
For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date.
If a trainee is self-employed: store the area of expertise, and title.
For a trainee that works for a company: we store the salary and job
For each company (employer): name (unique), the address, a unique telephone number.
We store in the database all known companies in the city.
We need also to represent the courses that each trainee is attending.
Each course has a unique code and a title.
For each course we have to store: the classrooms, dates, and times (start time, and duration in minutes) the course is held.
A classroom is characterized by a building name and a room number and the maximum places’ number.
A course is given in at least a classroom, and may be scheduled in many classrooms.
We store in the database all classrooms available on the site.
We store in the database all courses given at least once in the company.
For each instructor we will store: the social security number, name, and birth date.
An instructor teaches at least a course.
Each course is taught by only one instructor.
All the instructors’ telephone numbers must also be stored (each instructor has at least a telephone number).
A trainee can be a tutor for one or many trainees for a specific period of time (start date and end date).
For a trainee it is not mandatory to be a tutor, but it is mandatory to have a tutor.
The attribute ‘Code’ will be your PK because its only use seems to be that of a Unique Identifier.
The relationship ‘is’ will work but having a reference to two tables like that can get messy. Also you have the reference to "Employers" in the Trainee table which is not good practice. They should really be combined. See my helpful hints section to see how to clean that up.
Company looks like the complete table of Companies in the area as your details suggest. This would mean table is fairly static and used as a reference in your other tables. This means that the attribute ‘employer’ in Employed would simply be a Foreign Key reference to the PK of a specific company in Company. You should draw a relationship between those two.
It seems as though when an employee is ‘employed’ they are either an Employee of a company or self-employed.
Yes, the address field in Company will be a unique address within your current city, since the question states the table is a complete list of companies in the city. However, because this is a unique attribute, you must include specifics such as the street address; storing only the city name would give every company the same address, which is forbidden in a unique field.
Some other helpful hints:
Stay away from adding fields with plurals on them to your diagram. When you have a plural field it often means you need a separate table with a Foreign Key reference to that table. For example, in your table Trainee, you have ‘Employers’. That should be an Employer table with a foreign key reference to the Trainee Code attribute. In the Employer table you can combine the Self-employed and Employed tables so that there is a single reference from Trainee to Employer.
ERD Link http://www.imagesup.net/?di=1014217878605. Here's a quick ERD I created for you. Note the use of linker tables to prevent Many to Many relationships in the table. It's important to note there are several ways to solve this schema problem, but this is just how I saw your problem laid out. The design is intended to help with normalization of the db, that is, to prevent redundant data in the DB. Hope this helps. Let me know if you need more clarification on the design I provided. It should be fairly self-explanatory when comparing your design parameters to it.
Follow Up Questions:
If you are looking to reduce attributes that might be arbitrary perhaps phone_number and address may be ones to eliminate, but start and end dates are good for sorting and archival reasons when determining whether an entry is current or a past record.
Yes, periods_of_employment does not need to be stored, as you can derive that information from the start and end dates. I believe "where they worked" is just meant to say previous employers, so it is not a location; instead it means that you should be able to get a list of all the employers the trainee has had. You can get that with the current schema if you query the employer table for all records where the trainee code equals the requested trainee and sort by start date. The reason it states start_date twice is to let you know that for all ‘previous’ employers the record will have a start and an end date. Hence the "previous". However, for current employers the employment hasn't ended, which means there will be no end_date, so it will be null. That’s what the problem was stating, in my opinion.
To keep it simple PK’s are unique values used to reference a record within another table. Redundant values are values that you essentially don’t need in a table because the same value can be derived by querying another table. In this case most of your attributes are fine except for Final_Mark in the Course table. This is redundant because Course_Schedule will store the Final_Mark that was received. The Course table is meant to simply hold a list of all potential courses to be referenced by Course_Schedule.
There are no multivalued attributes in this design, because that is bad practice. Job and salary are singular, and if a job or salary changes you would add a new record to the employer table rather than add to that column. Multivalued attributes make querying a db difficult, and I would advise against them. That’s why I mentioned earlier to abstract all attributes with plurals into their own tables and use a foreign key reference.
You essentially do have that written here because Course_Schedule is a linker table meaning that it is meant to simplify relationships between tables so you don’t have many to many relationships.
All your relationships look right to me. Also since the schedules are linker tables and cannot exist without the supporting tables you could consider them weak entities. Course in this schema is a defined list of all courses available so can be independent of any other table. This by definition is not a weak entity. When creating this db you’d probably fill in the course table and it probably wouldn’t change after that, except rarely when adding or removing an available course option.
Yes, you can make address a composite attribute, and that would be right in your diagram. To be clear about your use of primary keys: just because an attribute is unique doesn’t make it a primary key. A table can have one and only one primary key, so you must pick a column that you are certain will not be repeated. In this example you may think street number might be unique, but what if one company leaves an address and another company moves into that spot? That would break that table's primary key. Typically a company name is licensed in a city or state, so it cannot be repeated. That would be a better choice for your primary key. You can also make composite primary keys, but that is a more advanced topic that I would recommend reading about at a later date.
Take final_mark out of courses. That table will contain rows of only courses, and those courses won’t be linked to any trainee except through the course_schedule table. The Final_Mark should only be in that table. If you add final_mark to the Course table then, if you have 10 trainees in a course, you’d have 10 duplicate rows in the course table differing only in final_mark. Instead, hold only the course_code and title; that way you can assign different instructors, trainees and classrooms using the linker tables.
No composite attribute is needed using this schema. You have a Classroom table that will hold all available classrooms and their relevant information. You then use the Classroom_Schedule linker table to assign a given Classroom to a Course_Schedule. No attributes of Classroom can be broken down to simpler attributes.
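The employment-history lookup mentioned in the follow-up answers (all employer records for one trainee, sorted by start date, with a null end date marking the current job) could look something like this sqlite3 sketch. The linked ERD is not reproduced here, so the table layout, column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employer (
    trainee_code INTEGER NOT NULL,  -- would reference the trainee table
    company_name TEXT NOT NULL,     -- would reference the company table
    start_date TEXT NOT NULL,
    end_date TEXT                   -- NULL while the employment is still current
);
INSERT INTO employer VALUES
    (1, 'Company B', '2022-03-01', NULL),
    (1, 'Company A', '2019-01-15', '2022-02-28'),
    (2, 'Company C', '2021-06-01', NULL);
""")


def employment_history(trainee_code):
    """All employers of one trainee, oldest first."""
    return conn.execute(
        "SELECT company_name, start_date, end_date FROM employer "
        "WHERE trainee_code = ? ORDER BY start_date",
        (trainee_code,),
    ).fetchall()


def current_employer(trainee_code):
    """Company name of the record with no end date, or None if there is none."""
    row = conn.execute(
        "SELECT company_name FROM employer "
        "WHERE trainee_code = ? AND end_date IS NULL",
        (trainee_code,),
    ).fetchone()
    return row[0] if row else None


print(employment_history(1), current_employer(1))
```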

Best approach to avoid Too many columns and complexity in database design

Inventory Items :
Paper Size
-----
A0
A1
A2
etc
Paper Weight
------------
80gsm
150gsm etc
Paper mode
----------
Colour
Bw
Paper type
-----------
glass
silk
normal
Tabdividers and tabdivider Type
--------
Binding and Binding Types
--
Laminate and laminate Types
--
All of these inventory items need to be stored in the invoice table.
How should they be stored in a database using proper RDBMS design?
In my opinion, each list should get a master table, with retrieval via JOINs. However, this may be a little complex, as it adds many tables to the database.
This normalisation causes a problem when storing all this information against an invoice: it leads to too many columns in the invoice table.
The other way is to put all of them into one table with more columns, so that each row is a combination of them (hacking algorithm: 4 lists with 4 items, over 24 records, each of which will have a reference ID).
Which one do you think is best, and why?
Your initial idea is correct. And anyone claiming that four tables is "a little bit complex" and/or "too many tables" shouldn't be doing database work. This is what RDBMS's are designed (and tuned) to do.
Each of these 4 items is an individual property of something so they can't simply be put, as is, into a table that merges them. As you had thought, you start with:
PaperSize
PaperWeight
PaperMode
PaperType
These are lookup tables and hence should have non-auto-incrementing ID fields.
These will be used as Foreign Key fields for the main paper-based entities.
Or if they can only exist in certain combinations, then there would need to be a relationship table to capture/manage what those valid combinations are. But those four paper "properties" would still be separate tables that Foreign Key to the relationship table. Some people would put a separate ID field on that relationship table to uniquely identify the combination via a single value. Personally, I wouldn't do that unless there was a technical requirement such as Replication (or some other process/feature) that required that each table had a single-field key. Instead, I would just make the PK out of the four ID fields that point to those paper "property" lookup tables. Then those four fields would still go into any paper-based entities. At that point the main paper entity tables would look about the same as they would if there wasn't the relationship table, the difference being that instead of having 4 FKs of a single ID field each, one to each of the paper "property" tables, there would be a single FK of 4 ID fields pointing back to the PK of the relationship table.
Why not jam everything into a single table? Because:
Flattening the data into a non-relational structure defeats the purpose of using a Relational Database Management System.
It is harder to grow that structure over time
It makes finding all paper entities of a particular property clunkier
It makes finding all paper entities of a particular property slower / less efficient
maybe other reasons?
EDIT:
Regarding the new info (e.g. Invoice Table, etc) that wasn't in the question when I was writing the above, that should be abstracted via a Product/Inventory table that would capture these combinations. That is what I was referring to as the main paper entities. The Invoice table would simply refer to a ProductID/InventoryID (just as an example) and the Product/Inventory table would have these paper property IDs. I don't see why these properties would be in an Invoice table.
EDIT2:
Regarding the IDs of the "property" lookup tables, one reason that they should not be auto-incrementing is that their values should be taken from Enums in the app layer. These lookup tables are just a means of providing a "data dictionary" so that the database layer can have insight into what these values mean.
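As a small illustration of taking the lookup IDs from an app-layer enum, the sketch below seeds a lookup table from a Python `IntEnum`. The enum members and their numeric values are assumptions; the point is only that the application, not the database, decides the IDs.

```python
import sqlite3
from enum import IntEnum


class PaperMode(IntEnum):
    """App-layer enum; the database lookup table mirrors these values."""
    COLOUR = 1
    BW = 2


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PaperMode (PaperModeID INT NOT NULL PRIMARY KEY, Name TEXT NOT NULL)"
)
# The IDs are supplied explicitly from the enum rather than generated by the database.
conn.executemany(
    "INSERT INTO PaperMode VALUES (?, ?)",
    [(mode.value, mode.name) for mode in PaperMode],
)

rows = conn.execute(
    "SELECT PaperModeID, Name FROM PaperMode ORDER BY PaperModeID"
).fetchall()
print(rows)
```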

Database Design Questions

I have 2 questions regarding a project. Would appreciate if I get clarifications on that.
I have decomposed Address into individual entities by breaking it down to the smallest unit. But addresses are repeated in a few tables; for example, Address fields are present in the Client table as well as the Employee table. Should we separate the Address into a separate table with just a linking field?
For Example
Create an ADDRESS Table with the following attributes :
Entity_ID (it could be an employee ID (home address) or a client ID (office address))
Unit
Building
Street
Locality
City
State
Country
Zipcode
Remove all the address fields from the Employee table and the Client table
We can obtain the address by getting the employee ID and referring the ADDRESS table for the address
Which approach is better: having the address fields in all tables, or separating them as shown above? Any thoughts on which design is better?
Yes, separating the address is definitely better, because people can have multiple addresses, and keeping the address fields in every table would increase data redundancy.
In my view, you can design the database for this problem in two ways.
A. Using one table
Table name --- ADDRESS
Column Names
Serial No. (unique id or primary key)
Client / Employee ID
Address.
B. Using Two tables
Table name --- CLIENT_ADDRESS
Column Names
Serial No. (unique id or primary key)
Client ID (foreign key to client table)
Address.
Table name --- EMPLOYEE_ADDRESS
Column Names
Serial No. (unique id or primary key)
Employee ID (foreign key to employee table)
Address.
You can certainly replace the single Address column with as many columns as you need, such as the Unit, Building, Street, etc. that you mentioned.
I also have one suggestion from my experience.
Please add these five columns to each and every table.
CREATED_BY (who created this row, i.e. a user of the application)
CREATED_ON (the date and time at which the row was created)
MODIFIED_ON (the date and time at which the row was last modified)
MODIFIED_BY (who modified this row, i.e. a user of the application)
DELETE_FLAG (0 -- deleted and 1 -- active)
The reason for this, from the point of view of most developers, is that your client can demand records from any time period at any time. If you have really deleted the rows, you will be in a serious situation. So whenever an application user deletes a record from the GUI, you set the flag to 0 instead of physically deleting the row. The default value is 1, which means the row is still active.
At time of retrieval you can select with where condition like this
select * from EMPLOYEE_TABLE where DELETE_FLAG = 1;
Note: This is a suggestion from my experience. I am not forcing you to adopt it, so please add these columns according to your requirements.
Also, tables that don't have any significant purpose don't need them.
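The soft-delete suggestion above can be sketched with sqlite3 as follows. The `EMPLOYEE_ID` and `NAME` columns, the sample rows and the helper names are assumptions; only the five audit columns and the meaning of DELETE_FLAG come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE EMPLOYEE_TABLE (
    EMPLOYEE_ID INTEGER PRIMARY KEY,
    NAME TEXT NOT NULL,
    CREATED_BY TEXT,
    CREATED_ON TEXT DEFAULT CURRENT_TIMESTAMP,
    MODIFIED_BY TEXT,
    MODIFIED_ON TEXT,
    DELETE_FLAG INTEGER NOT NULL DEFAULT 1   -- 1 = active, 0 = deleted
)
""")
conn.execute(
    "INSERT INTO EMPLOYEE_TABLE (EMPLOYEE_ID, NAME, CREATED_BY) "
    "VALUES (1, 'Employee A', 'admin'), (2, 'Employee B', 'admin')"
)


def soft_delete(employee_id, user):
    """Mark the row as deleted instead of removing it."""
    conn.execute(
        "UPDATE EMPLOYEE_TABLE "
        "SET DELETE_FLAG = 0, MODIFIED_BY = ?, MODIFIED_ON = CURRENT_TIMESTAMP "
        "WHERE EMPLOYEE_ID = ?",
        (user, employee_id),
    )


def active_employees():
    """Names of rows that have not been soft-deleted."""
    rows = conn.execute(
        "SELECT NAME FROM EMPLOYEE_TABLE WHERE DELETE_FLAG = 1 ORDER BY EMPLOYEE_ID"
    )
    return [row[0] for row in rows]


soft_delete(1, "admin")
total_rows = conn.execute("SELECT COUNT(*) FROM EMPLOYEE_TABLE").fetchone()[0]
print(active_employees(), total_rows)
```

The deleted row stays in the table, so it can still be produced if the client later asks for historical records.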
Separating address into a separate table is a better design decision, as it means any db-side validation logic etc. only needs to be maintained in one place.
