How to model a tables with similar dimensions in a data warehouse? - data-modeling

I have 3 tables for recruitment purposes
Table 1 : contain data for already Accepted candidate
Table 2 : contain data for under evaluation candidate
Table 3 : contain data for interviews
Business requirements :
1- Number of accepted candidates under the procedure "table 1"
2- Number of Candidates Rejected / Pending / Interviewed / Number of Applicants "table 2"
3- number of interviews "table 3"
There are a lot of common dimensions between the tables and I found an issue building the model
in the picture below is what I have tried.
I separate the common columns as a DIM_APPLICANT and try to link it to the tables.
question : is it a correct solution or no?
is there a better solution?
Note : I added the ID columns in the model because it will be added to the excel sheet later.

Related

Storing processed results of connection in RDBMS

A csv file contains following two column : admission_number, project_name.
The relationship between two entities are many to many relationships : a specific admission_number can work over multiple projects. A specific project may have multiple admission_number.
Data will be like as follows and initially there are '1000 milion' rows and data will keep on updating on daily basis in this table will go upto 1300 milion rows.
admission_number,project_name
1234567890,ABC1234567
1234567890,ABC1234568
1234567891,ABC1234569
1234567892,ABC1234569
1234567893,ABC1234570
1234567894,ABC1234567
1234567895,ABC1234567
For a specific admission number(lets say 1234567890), i want to know all the admission_number who are working on the same projects (ABC1234567,ABC1234568). The output of above query will be
1234567894,1234567895.
Explanation : Since for admission number '1234567890', projects name are 'ABC1234567' and 'ABC1234568'. On these two projects other 'admission_number' are working as '1234567894','1234567895'
I came up with two solutions, To store the data,RDBMS will be used.
Approach 1 : By using two retrieval query : First query shall return all the projcects_name for a specific 'admission_number' and the second query will retrun all the admission_number for 'project_name'.
select admission_number from table where project_name IN (select project_name from table where admission_number='ABC1234567'.
Approach 2 : In this approach, before going for loading i am preprocessing the results and directly results is storing in database. I am only storing all the connected 'admission_number'.
Eg. For project_name 'ABC1234567', these 3 admission_number '1234567890','1234567894', '1234567895' are working. I want to store all connected admission_number in table with two columns (number,connected_number) like ('1234567890','1234567894'),('1234567890','1234567895'), ('1234567894','1234567895'), and query will work on both columns (number and connected_number).
But in this approach there will be many rows means if a specifc project_name 'p', there are n 'admission_number' than total number of rows will be n(n-1)/2
How can i store all the connected admission_number in RDBMS? Loading of data can be slow, but retrieval should be fast.
Do not optimize the data structure. It would only cause problems.
Create a simple table with two columns for both ID and create index for both columns.
The RDBMS will build and maintain an index of the column values, which will enable fast lookup for a specific record.

Database ER Diagram

Please explain to me the type of relationship portrayed using this flowchart:
Also, explain the different notation for a different type of relationship i.e
One-to-one, One-to-many, Many-to-one, Many-to-many. And what about the participation i.e Total or partial
A record in table 1 maps to exactly 1 record in table 2. but a record in table 2 can map to 1 or many records in table 1.
for example if if table 1 is records of people, and table 2 is records of places of birth. each person in table 1 can only have one place of birth, but a place of birth can have many people

What datatype should be used to store an array in the SQL Server

I am trying to create a table in my database using Visual Studio.
I've got a table for my Products (like in online shop) and then I have a table for Orders, which should store all products that user has ordered. The problem is that I am not sure which datatype I should use when designing the database to store an array of products in my Orders table. This is what the Orders table should look like
You should create Products and Orders table with relationship between them.
Your Orders table should have Id column as well (which is PrimaryKey)
Then you should create Products table, that keeps all the information about products and additionaly OrderId which should be used as Foreign Key to Orders table.
Please look at that link:
https://msdn.microsoft.com/en-us/library/ms189049.aspx
It's also worth of checking:
One To One, One To Many, Many To Many relations in SQLServer to have better understanding and design your data store properly.
In your case you need ProductsOrders table, Many To Many relationship.
In Relational database, you can create a relationship between 2 tables.
The relationship can be
1 to 1 (1 Product - 1 Order)
1 to Many (1 Product - 'n' Order)
Many to Many (n product - 'n' Order)
Based on your scenario, You can choose any of the relationship listed above. While querying from the database, you can easily operate over each order/Product.

Best approach to avoid Too many columns and complexity in database design

Inventory Items :
Paper Size
-----
A0
A1
A2
etc
Paper Weight
------------
80gsm
150gsm etc
Paper mode
----------
Colour
Bw
Paper type
-----------
glass
silk
normal
Tabdividers and tabdivider Type
--------
Binding and Binding Types
--
Laminate and laminate Types
--
Such Inventory items and these all needs to be stored in invoice table
How do you store them in Database using proper RDBMS.
As per my opinion for each list a master table and retrieval with JOINS. However this may be a little bit complex adding too many tables into the database.
This normalisation is having bit of problem when storing all this information against a Invoice. This is causing too many columns in invoice table.
Other way putting all of them into a one table with more columns and then each row will be a combination of them.. (hacking algorithm 4 list with 4 items over 24 records which will have reference ID).
Which one do you think the best and why!!
Your initial idea is correct. And anyone claiming that four tables is "a little bit complex" and/or "too many tables" shouldn't be doing database work. This is what RDBMS's are designed (and tuned) to do.
Each of these 4 items is an individual property of something so they can't simply be put, as is, into a table that merges them. As you had thought, you start with:
PaperSize
PaperWeight
PaperMode
PaperType
These are lookup tables and hence should have non-auto-incrementing ID fields.
These will be used as Foreign Key fields for the main paper-based entities.
Or if they can only exist in certain combinations, then there would need to be a relationship table to capture/manage what those valid combinations are. But those four paper "properties" would still be separate tables that Foreign Key to the relationship table. Some people would put an separate ID field on that relationship table to uniquely identify the combination via a single value. Personally, I wouldn't do that unless there was a technical requirement such as Replication (or some other process/feature) that required that each table had a single-field key. Instead, I would just make the PK out of the four ID fields that point to those paper "property" lookup tables. Then those four fields would still go into any paper-based entities. At that point the main paper entity tables would look about the same as they would if there wasn't the relationship table, the difference being that instead of having 4 FKs of a single ID field each, one to each of the paper "property" tables, there would be a single FK of 4 ID fields pointing back to the PK of the relationship table.
Why not jam everything into a single table? Because:
It defeats the purpose of using a Relational Database Management System to flatten out the data into a non-relational structure.
It is harder to grow that structure over time
It makes finding all paper entities of a particular property clunkier
It makes finding all paper entities of a particular property slower / less efficient
maybe other reasons?
EDIT:
Regarding the new info (e.g. Invoice Table, etc) that wasn't in the question when I was writing the above, that should be abstracted via a Product/Inventory table that would capture these combinations. That is what I was referring to as the main paper entities. The Invoice table would simply refer to a ProductID/InventoryID (just as an example) and the Product/Inventory table would have these paper property IDs. I don't see why these properties would be in an Invoice table.
EDIT2:
Regarding the IDs of the "property" lookup tables, one reason that they should not be auto-incrementing is that their values should be taken from Enums in the app layer. These lookup tables are just a means of providing a "data dictionary" so that the database layer can have insight into what these values mean.

How to maintain the relation ship on more than two tables

Please help me on maintaining the relatoinships between more than two tables. I will explain the one scenario that i am facing now.
Table 1 has all the codes i am using in the application, example status codes means Active has the code as 10, InActive has the code as 20, in this way i am mainting all the codes,
Table 2 is having the 5 columns,
column 1 is auto generated key
column 2 - has the id of table 1
column 3 - also has the id of table 1
column 4 - also has the id of table 1
column 5 - has description.
So,I need to perform multiple joins while retreiving the data from these two tables.My question is this, is it right way to maitain the tables?
Moreover i am using Spring with Hibernate to fetch the data from DB. Any ideas on how to do that using Hibernate?
Please suggest me on this.

Resources