How can I create a view over a table that has multiple foreign keys referencing the same field of a single table? I have a product table and a reference table, and around 5 foreign keys in the product table reference the RefCodeKey field in the reference table. How can I create a view that shows the product reference codes by joining the product and reference code tables?
I have a product table as follows
PK  PTK  PC      PN      RCKey  PSKey  PCKey  PCAKey
1   1    500000  Prod A  5      12     14     98
2   1    500001  Prod B  5      12     14     98
3   1    500002  Prod C  5      11     13     145
4   4    500002  Prod C  10     11     13     76
5   3    500002  Prod C  10     11     13     95
6   1    500005  Prod D  5      12     14     137
I have Reference Code Table as follows
RefCodeKey  RefCodeType               Code      Label     Status
1           ParentTypeKey             assembly  assembly  Active
2           ParentTypeKey             WHL       WHL       Active
3           ParentTypeKey             TIRE      TIRE      Active
4           ParentTypeKey             TIRE      TIRE      Active
5           RegionCodeKey             1         COMP 1    Active
6           RegionCodeKey             2         COMP 2    Active
7           RegionCodeKey             3         COMP 3    Active
8           RegionCodeKey             4         COMP 4    Active
9           RegionCodeKey             9         COMP 5    Active
10          RegionCodeKey             0         COMP 6    Active
11          ProductStatusKey          CLOSED    CLOSED    Active
12          ProductStatusKey          ACTIVE    ACTIVE    Active
13          ProductClassificationKey  DropShip  DropShip  Active
14          ProductClassificationKey  INFO NA   INFO NA   Active
How can I create a view that displays the result shown below?
PC      PN      RCKey   PSKey   PCKey
500000  Prod A  COMP 1  ACTIVE  INFO NA
500001  Prod B  COMP 1  ACTIVE  INFO NA
500002  Prod C  COMP 1  CLOSED  DropShip
500002  Prod C  COMP 6  CLOSED  DropShip
500002  Prod C  COMP 6  CLOSED  DropShip
500005  Prod D  COMP 1  ACTIVE  INFO NA
This is a common reporting pattern wherever the database architect has employed the "one true lookup table" model. I'm not going to get bogged down in the merits of that design. People like Celko and Phil Factor are far more erudite than me at commenting on these things. All I'll say is that having reported off over sixty enterprise databases in the last 15 years, that design is pervasive. Rightly or wrongly, you're probably going to see it over and over again.
There is currently insufficient information to definitively answer your question. The answer below makes assumptions about what I think the most likely missing information is.
I'll assume your product table is named PRODUCT
I'll assume your all-powerful lookup table is called REFS
I'll assume RefCodeKey in REFS has a unique constraint on it, or it is the primary key
I'll assume the REFS table is relatively small (say < 100,000 rows). I'll come back to this point later.
I'll assume that the foreign keys in the PRODUCT table are nullable. This affects whether we INNER JOIN or LEFT JOIN.
SELECT prod.PC
,prod.PN
,reg_code.label as RCKey
,prod_stat.label as PSKey
,prod_clas.label as PCKey
FROM PRODUCT prod
LEFT JOIN REFS reg_code ON prod.RCKey = reg_code.RefCodeKey
LEFT JOIN REFS prod_stat ON prod.PSKey = prod_stat.RefCodeKey
LEFT JOIN REFS prod_clas ON prod.PCKey = prod_clas.RefCodeKey
;
The trick is that you can refer to the REFS table as many times as you like. You just need to give it a different alias and join it to the relevant FK each time. For example reg_code is an alias. Give your aliases meaningful names to keep your code readable.
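To see the aliasing trick end to end, here is a minimal sketch that wraps the same joins in a view and runs it against a cut-down copy of the sample data. It uses Python's built-in sqlite3 module purely so it can be run anywhere; the view name PRODUCT_LABELS is made up for illustration, and the table names PRODUCT and REFS are the same assumptions as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE REFS (
    RefCodeKey  INTEGER PRIMARY KEY,
    RefCodeType TEXT,
    Code        TEXT,
    Label       TEXT,
    Status      TEXT
);
CREATE TABLE PRODUCT (
    PK    INTEGER PRIMARY KEY,
    PC    INTEGER,
    PN    TEXT,
    RCKey INTEGER REFERENCES REFS (RefCodeKey),
    PSKey INTEGER REFERENCES REFS (RefCodeKey),
    PCKey INTEGER REFERENCES REFS (RefCodeKey)
);
-- A subset of the sample rows from the question.
INSERT INTO REFS VALUES
    (5,  'RegionCodeKey',            '1',        'COMP 1',   'Active'),
    (10, 'RegionCodeKey',            '0',        'COMP 6',   'Active'),
    (11, 'ProductStatusKey',         'CLOSED',   'CLOSED',   'Active'),
    (12, 'ProductStatusKey',         'ACTIVE',   'ACTIVE',   'Active'),
    (13, 'ProductClassificationKey', 'DropShip', 'DropShip', 'Active'),
    (14, 'ProductClassificationKey', 'INFO NA',  'INFO NA',  'Active');
INSERT INTO PRODUCT VALUES
    (1, 500000, 'Prod A', 5,  12, 14),
    (3, 500002, 'Prod C', 5,  11, 13),
    (4, 500002, 'Prod C', 10, 11, 13);

-- PRODUCT_LABELS is a hypothetical name; REFS is joined once per foreign key,
-- each time under its own alias.
CREATE VIEW PRODUCT_LABELS AS
SELECT prod.PC
      ,prod.PN
      ,reg_code.Label  AS RCKey
      ,prod_stat.Label AS PSKey
      ,prod_clas.Label AS PCKey
FROM PRODUCT prod
LEFT JOIN REFS reg_code  ON prod.RCKey = reg_code.RefCodeKey
LEFT JOIN REFS prod_stat ON prod.PSKey = prod_stat.RefCodeKey
LEFT JOIN REFS prod_clas ON prod.PCKey = prod_clas.RefCodeKey;
""")

rows = conn.execute(
    "SELECT PC, PN, RCKey, PSKey, PCKey FROM PRODUCT_LABELS ORDER BY PC, RCKey"
).fetchall()
for row in rows:
    print(row)
```

Each row comes back with labels in place of the three keys, which is the shape the question asked for.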
Note: Those RCKey/PSKey/PCKey names are really not good names for the output columns. They'll come back to bite you. They don't represent the key. They represent a description of the thing in question. If it's a region code, call it region_code.
The reason I'm assuming the REFS table is relatively small, is that if it's really large (I've seen one with 6 million lookup values across hundreds of codesets) and indexed to take RefCodeType into consideration, you might get better performance by adding a filter for RefCodeType to each of your LEFT JOINs. For example:
LEFT JOIN REFS prod_clas ON prod.PCKey = prod_clas.RefCodeKey
AND prod_clas.RefCodeType = 'ProductClassificationKey'
I'm having difficulty connecting a dimension table (recursive/hierarchical) to a fact table as there are concerns/issues to deal with:
The dimension table belongs to a parent-child relationship structure
From the original table, it keeps growing
id  item_name       parent_id
1   classification  null
2   category        null
3   group           null
4   modern          1
5   modified        1
6   tools           2
7   meters          2
8   metal           3
9   plastic         3
10  lead            8
11  alloy           8
Denormalizing this kind of table is not suitable because whenever a new entity type comes in, it would change the dimension structure.
What is the best approach to this type?
Kindly provide an example and what would be the query statement after connecting the fact and dimension.
I'm in an intro to database management course and we're learning about normalizing data (1NF, 2NF, 3NF, etc.) and I'm super confused on how to actually go about and do it. I've read up on this, consulted various sites and youtube videos and I still can't seem to get it to click. I am using Microsoft Access 2013 if that's of any help.
This is the data I'm working with.
Thanks.
Edit1: Alright, I think I have the tables set up correctly. But now I'm having trouble actually inputting data to go from one table to the next. Here's my relationship table.
On a very basic level, any repeating values in a table are candidates for normalization. Duplicated data is usually a bad idea. Say you needed to update a patient's surname - you now have to update all the occurrences in this table, and possibly many others throughout the rest of the database. Much better to store each patient's details in one place only.
This is where normalization comes in. Looking down the columns, you can see that there are repeating values for data about dentists, patients and surgeries, so we should normalize towards having tables for each of these entities, as well as the original table that contains appointments, giving you four tables in total.
Extract the entities out into their own tables, and give each row a primary (unique) key - just use an incrementing integer for now. (Edit: as suggested in the comment we could use the natural keys of PatientNo, StaffNo and SurgeryNo instead of creating surrogates.)
Then, instead of each patient's name and number appearing multiple times in the appointments table, we just reference the key of the master record in the Patient table. This is called a foreign key.
Then, do the same for Dentist and Surgery.
You will end up with tables looking something like this:
APPOINTMENT
AppointmentID DentistID PatientID AppointmentTime SurgeryID
----------------------------------------------------------------
1 1 1 12 Aug 03 10:00 1
2 1 2 ... 2
3 2 3 ... 1
4 2 3 ... 1
5 3 2 ... 2
6 3 4 ... 3
DENTIST
DentistID Name StaffNo
--------------------------------------
1 Tony Smith S1011
2 Helen Pearson S1024
3 Robin Plevin S1032
PATIENT
PatientID Name PatientNo
---------------------------------------
1 Gillian White P100
2 Jill Bell P105
3 Ian MackKay P108
4 John Walker P110
SURGERY
SurgeryID SurgeryNo
-------------------------
1 S10
2 S15
3 S13
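As a rough sketch of how those four tables might be declared, the snippet below builds them with Python's built-in sqlite3 module and the sample rows above. The column types, the UNIQUE constraints and the corrected surname are my own assumptions for illustration, and the appointment times that are elided above are left as NULL rather than guessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite only enforces foreign keys when this is on

CREATE TABLE DENTIST (DentistID INTEGER PRIMARY KEY, Name TEXT, StaffNo TEXT UNIQUE);
CREATE TABLE PATIENT (PatientID INTEGER PRIMARY KEY, Name TEXT, PatientNo TEXT UNIQUE);
CREATE TABLE SURGERY (SurgeryID INTEGER PRIMARY KEY, SurgeryNo TEXT UNIQUE);
CREATE TABLE APPOINTMENT (
    AppointmentID   INTEGER PRIMARY KEY,
    DentistID       INTEGER NOT NULL REFERENCES DENTIST (DentistID),
    PatientID       INTEGER NOT NULL REFERENCES PATIENT (PatientID),
    AppointmentTime TEXT,  -- NULL where the time is elided in the listing above
    SurgeryID       INTEGER NOT NULL REFERENCES SURGERY (SurgeryID)
);

INSERT INTO DENTIST VALUES (1, 'Tony Smith', 'S1011'), (2, 'Helen Pearson', 'S1024'),
                           (3, 'Robin Plevin', 'S1032');
INSERT INTO PATIENT VALUES (1, 'Gillian White', 'P100'), (2, 'Jill Bell', 'P105'),
                           (3, 'Ian MackKay', 'P108'), (4, 'John Walker', 'P110');
INSERT INTO SURGERY VALUES (1, 'S10'), (2, 'S15'), (3, 'S13');
INSERT INTO APPOINTMENT VALUES
    (1, 1, 1, '12 Aug 03 10:00', 1), (2, 1, 2, NULL, 2), (3, 2, 3, NULL, 1),
    (4, 2, 3, NULL, 1), (5, 3, 2, NULL, 2), (6, 3, 4, NULL, 3);
""")

# The surname now lives in exactly one row, so one UPDATE fixes every appointment.
# 'Ian MacKay' is a hypothetical corrected spelling.
conn.execute("UPDATE PATIENT SET Name = 'Ian MacKay' WHERE PatientID = 3")
appointments = conn.execute("""
    SELECT a.AppointmentID, p.Name
    FROM APPOINTMENT a JOIN PATIENT p ON p.PatientID = a.PatientID
    WHERE p.PatientID = 3
    ORDER BY a.AppointmentID
""").fetchall()
print(appointments)

# The foreign key rejects an appointment for a dentist that does not exist.
try:
    conn.execute("INSERT INTO APPOINTMENT VALUES (7, 99, 1, NULL, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

In Access you would set up the same relationships in the Relationships window rather than in SQL, but the idea is identical.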
The first step in data modelling and normalization is to understand your data. Study it and understand the domain "objects" or tables that exist within your model. That will give you an idea of how to start. Sometimes a single table or query sample is not enough to fully understand the database, but in your case, we can use the sample data and make some assumptions.
Secondly, look for repeated / redundant data. If you see copies of names, there is a good chance that is a candidate for a foreign key. Our assumption tells us that STAFF_NO is a primary key candidate for DENTIST because each unique STAFF_NO correlates to a unique DENTIST_NAME, so I see a good candidate DENTIST table (STAFF_NO, DENTIST_NAME)
Example in some table of SURGERY:
ID STAFF_NO DENTIST_NAME
1 1 Fred Sanford
2 1 Fred Sanford
3 3 Lamont Sanford
4 3 Lamont Sanford
Why store these over and over? What happens when Fred says "But my correct name is Fred G Sanford", so you have to update your database? In the current table, you have to update the name in many rows. If you had normalized it, you'd have a single location for the name, in the DENTIST table.
So I can take the unique dentists and store them in DENTIST
create table DENTIST(staff_no integer primary key, dentist_name varchar(100));
-- One possible way to populate our dentist table is to use a distinct query from surgery
insert into DENTIST
select distinct staff_no, dentist_name from surgery;
STAFF_NO DENTIST_NAME
1 Fred Sanford
3 Lamont Sanford
SURGERY table now points to DENTIST table
ID STAFF_NO
1 1
2 1
3 3
4 3
And you can now create a view, VIEW_SURGERY to join the DENTIST_NAME back in to satisfy the needs of typical queries.
create view view_surgery as
select s.id, d.staff_no, d.dentist_name
from surgery s join dentist d
on s.staff_no = d.staff_no; -- join here
So now a unique update to DENTIST, by the dentist primary key will update a single row.
update dentist set dentist_name = 'Fred G Sanford' where staff_no = 1;
And querying the view will show the updated name for N rows:
select * from view_surgery
ID STAFF_NO DENTIST_NAME
1 1 Fred G Sanford
2 1 Fred G Sanford
3 3 Lamont Sanford
4 3 Lamont Sanford
In short, you are removing redundancy.
This is just a sample, and one way to do it. Manual normalization like this is not as common when you have modelling tools, but the point is, we can look at data, spot redundancies and factor those redundancies into new tables, and relate those new tables by foreign keys and joins, then build views to represent the original data.
I have two database tables:
report(id, description) (key: id) and
registration(a, b, id_report) (key: (a, b));
id_report is a foreign key that references report id.
In the table registration there is the functional dependency a -> id_report.
So the table registration is 1nf but not 2nf.
Despite this i can not find insert/update/delete problems in the table registration. Is it possible?
Thanks
You said in a comment that you couldn't "find how problems could occur." (Emphasis added.) Here's how.
Let's say your table "registration" starts off with data like this.
a b id_report
--
1 10 13
1 11 13
1 12 13
2 27 14
2 33 14
The functional dependency a->id_report still holds. When we know the value for "a", we find one and only one value for "id_report".
But the dbms can't directly enforce that dependency. That means the dbms will permit this update statement to run without error.
update registration
set id_report = 15
where a = 1 and b = 10;
a b id_report
--
1 10 15
1 11 13
1 12 13
2 27 14
2 33 14
Now your data is broken. When we know the value for "a", we now find two values for "id_report". In the earlier table, knowing that "a" equaled 1 meant we knew that "id_report" equaled 13. We no longer know that; if "a" equals 1, id_report might be either 13 or 15.
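If you want to watch this happen, here is a small sketch using Python's built-in sqlite3 module. The report table and its foreign key are left out to keep it short, and the checking query is just one way I'd look for values of "a" that map to more than one "id_report".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE registration (
    a         INTEGER,
    b         INTEGER,
    id_report INTEGER,
    PRIMARY KEY (a, b)
);
INSERT INTO registration VALUES
    (1, 10, 13), (1, 11, 13), (1, 12, 13), (2, 27, 14), (2, 33, 14);
""")

# Lists every value of a that is paired with more than one id_report,
# i.e. every violation of the dependency a -> id_report.
CHECK = """
    SELECT a, COUNT(DISTINCT id_report)
    FROM registration
    GROUP BY a
    HAVING COUNT(DISTINCT id_report) > 1
"""

before = conn.execute(CHECK).fetchall()

# The dbms accepts this without complaint: the key (a, b) is still unique.
conn.execute("UPDATE registration SET id_report = 15 WHERE a = 1 AND b = 10")

after = conn.execute(CHECK).fetchall()
print(before, after)
```

The check returns nothing before the update and reports a = 1 with two distinct reports afterwards.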
A table can be denormalized and still not have any existing referential integrity issues.
The reason to normalize is to make it more difficult or impossible to create insert, update and delete anomalies. It is possible, but pretty hard, to manage all of the redundant data such that it remains consistent.
It's still a better idea to use a database in 3NF (or higher, if applicable) so that you aren't relying on programmers and users to keep you out of trouble. Sooner or later mistakes will happen.
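For comparison, here is a sketch of one possible decomposition, again using Python's built-in sqlite3 module. The table name a_report and the view name registration_full are made up for illustration. Because "a" is the key of the new table, the dbms itself now guarantees one id_report per value of "a".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;

-- Hypothetical table holding the dependency a -> id_report exactly once per a.
CREATE TABLE a_report (
    a         INTEGER PRIMARY KEY,
    id_report INTEGER NOT NULL
);
CREATE TABLE registration (
    a INTEGER NOT NULL REFERENCES a_report (a),
    b INTEGER NOT NULL,
    PRIMARY KEY (a, b)
);
INSERT INTO a_report VALUES (1, 13), (2, 14);
INSERT INTO registration VALUES (1, 10), (1, 11), (1, 12), (2, 27), (2, 33);

-- Rebuilds the original three-column shape for queries.
CREATE VIEW registration_full AS
SELECT r.a, r.b, ar.id_report
FROM registration r JOIN a_report ar ON ar.a = r.a;
""")

# Changing the report for a = 1 is a single-row update that every row sees.
conn.execute("UPDATE a_report SET id_report = 15 WHERE a = 1")
rows = conn.execute(
    "SELECT a, b, id_report FROM registration_full ORDER BY a, b"
).fetchall()
print(rows)

# A second id_report for a = 1 is rejected by the primary key.
try:
    conn.execute("INSERT INTO a_report VALUES (1, 13)")
    second_value_allowed = True
except sqlite3.IntegrityError:
    second_value_allowed = False
print(second_value_allowed)
```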
I have a database of test data that have been collected on behalf of agents. The test data are grouped together (after the fact) into result sets. As the tests come in, they are stored in the database with the ID of the corresponding agent:
TEST_ID TEST_OWNER TIMESTAMP RESULT_ID
1 1 0 null
2 1 15 null
3 2 30 null
4 2 32 null
5 1 34 null
The result sets are generated at a later time in such a way that groups tests that took place during a similar time frame. This judgment cannot be made as the tests come in.
RESULT_ID
1
2
3
All of the tests in a result set must belong to the same owner. I can ensure this (in code) as I assign the result IDs to the tests in my later operation, but some things would be easier if I had a TEST_OWNER field in my result set table.
Would adding this field be a violation of some normalization goal? The TEST_OWNER information will be duplicated, even though one instance of it is really implicit. I'm not a DBA, and I don't want to do things that are bad style.
Jim I am not completely sure if you are saying this is a table in your DB??
TEST_ID TEST_OWNER TIMESTAMP RESULT_ID
1 1 0 null
2 1 15 null
3 2 30 null
4 2 32 null
5 1 34 null
If so, the first thing I would do is pull the result attribute out of this table to achieve normalization. Or is this your Result table?
Regardless, are these results being derived from other data in the DB? If so, I don't see the need to duplicate things and store the (calculated) results as well. Just derive them as needed and keep the DB clean.
If you need further info I need a better understanding of what you are presenting.
This is a simple and common scenario at work, and I'd appreciate some input.
Say I am generating a report for the owners of a pet shop, and they want to know which of their customers have bought how many of each pet. In this scenario my only tools are SQL and something that outputs my query to a spreadsheet.
As the shop owner, I might expect reports in the form:
Customer Dog Cat Rabbit
1 2 3 0
2 0 1 1
3 1 2 0
4 0 0 1
And if one day I decided to stock Goldfish then the report should now come out as.
Customer Dog Cat Rabbit Goldfish
1 2 3 0 0
2 0 1 1 0
3 1 2 0 0
4 0 0 1 0
5 0 0 0 1
But as you probably know, to have a query which works this way would involve some form of dynamic code generation and would be harder to do.
The simplest query would work along the lines of:
Cross join Customers and Pets, Outer join Sales, Group, etc.
and generate:
Customer Pet Quantity
1 Dog 2
1 Cat 3
1 Rabbit 0
1 Goldfish 0
2 Dog 0
2 Cat 1
2 Rabbit 1
...etc
a) How would I explain to the shop owners that the report they want is 'harder' to generate? I'm not trying to say it's harder to read, but it is harder to write.
b) What is the name of the concept I am trying to explain to the customer (to aid with my Googling)?
The name of the concept is 'cross-tab' and can be accomplished in several ways.
MS Access has proprietary extensions to SQL to make this happen. SQL Server before 2005 needs a CASE trick, and SQL Server 2005 and later has PIVOT, but I think you still need to know what the columns will be.
Some databases indeed support some way of creating cross tables, but I think most need to know the columns in advance, so you'd have to modify the SQL (and get a database that supports such an extension).
Another alternative is to create a program that will postprocess the second "easy" table to get your clients the cross table as output. This is probably easier and more generic than having to modify SQL or dynamically generate it.
And about a way to explain the problem... you could show them in an Excel how many steps are needed to get the desired result:
Source data (your second listing).
Select values from the pets column
Place each pet type found on a new column
Count values per each type per client
Fill the values
and then say that SQL gives you only the source data, so it's of course more work.
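The steps above can be sketched as a small post-processing function. This is only an illustration in Python, assuming the "easy" listing arrives as (customer, pet, quantity) tuples; the function name crosstab is made up, and a missing customer/pet combination is treated as 0.

```python
def crosstab(rows):
    """Turn (customer, pet, quantity) rows into a header row plus one row per customer."""
    pets = []    # pet types in the order they are first seen
    table = {}   # customer -> {pet: quantity}
    for customer, pet, quantity in rows:
        if pet not in pets:
            pets.append(pet)              # place each pet type found in a new column
        table.setdefault(customer, {})[pet] = quantity

    header = ["Customer"] + pets
    body = [
        [customer] + [counts.get(pet, 0) for pet in pets]  # fill the values, 0 if absent
        for customer, counts in table.items()
    ]
    return [header] + body


# Source data: the rows from the second listing in the question.
long_rows = [
    (1, "Dog", 2), (1, "Cat", 3), (1, "Rabbit", 0), (1, "Goldfish", 0),
    (2, "Dog", 0), (2, "Cat", 1), (2, "Rabbit", 1),
]
report = crosstab(long_rows)
for line in report:
    print(line)
```

A new pet type needs no code change here: it simply shows up as one more column the next time the function runs.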
This concept is called pivoting.
SQL assumes that your data is represented in terms of relations with fixed structure.
Like, equality is a binary relation, "customer has this many pets of this type" is a ternary relation and so on.
When you see this resultset:
Customer Pet Quantity
1 Dog 2
1 Cat 3
1 Rabbit 0
1 Goldfish 0
2 Dog 0
2 Cat 1
2 Rabbit 1
, it's actually a relation defined by all possible combinations of domain values being in this relation.
Like, a customer 1 (domain customers id's) has exactly 2 (domain positive numbers) pets of genus dog (domain pets).
We don't see rows like these in the resultset:
Customer Pet Quantity
1 Dog 3
Pete Wife 0.67
, because the first row is false (customer 1 doesn't have 3 items of dog, but 2), and the second row values are out of their domain scopes.
The SQL paradigm implies that your relations are defined when you issue a query and each row returned defines the relation completely.
SQL Server 2005+ can map rows into columns (that is what you want), but you should know the number of columns when designing the query (not running).
As a rule, the reports you are trying to build are built with reporting software which knows how to translate relational SQL resultsets into nice looking human readable reports.
I have always called this pivoting, but that may not be the formal name.
Whatever it's called you can do almost all of this in plain SQL.
SELECT customer, count(*),
       sum(CASE WHEN pet='dog' THEN 1 ELSE 0 END) as dog,
       sum(CASE WHEN pet='cat' THEN 1 ELSE 0 END) as cat
FROM customers join pets -- join condition omitted; it depends on your schema
GROUP BY customer
Obviously what's missing is the dynamic columns. I don't know if this is possible in straight SQL, but it's certainly possible in a stored procedure to generate the query dynamically after first querying for a list of pets. The query is built into a string then that string is used to create a prepared statement.
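Here is a rough sketch of that idea, done from client code instead of a stored procedure so it can run against Python's built-in sqlite3 module. The single sales(customer, pet, quantity) table and the helper names quote_ident and pivot_query are assumptions made up for the example, and it sums quantities rather than counting rows.

```python
import sqlite3


def quote_ident(name):
    """Quote a value for use as a column alias (doubles any embedded double quotes)."""
    return '"' + name.replace('"', '""') + '"'


def pivot_query(conn):
    """Build a cross-tab query with one SUM(CASE ...) column per pet currently on sale."""
    pets = [row[0] for row in conn.execute("SELECT DISTINCT pet FROM sales ORDER BY pet")]
    select_list = ["customer"] + [
        # The pet value is bound as a parameter; only the alias is spliced into the text.
        f"SUM(CASE WHEN pet = ? THEN quantity ELSE 0 END) AS {quote_ident(pet)}"
        for pet in pets
    ]
    sql = f"SELECT {', '.join(select_list)} FROM sales GROUP BY customer ORDER BY customer"
    return sql, pets


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (customer INTEGER, pet TEXT, quantity INTEGER);
INSERT INTO sales VALUES
    (1, 'Dog', 2), (1, 'Cat', 3), (2, 'Cat', 1), (2, 'Rabbit', 1),
    (3, 'Dog', 1), (3, 'Cat', 2), (4, 'Rabbit', 1);
""")

sql, pets = pivot_query(conn)
cursor = conn.execute(sql, pets)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(header, rows)

# Stocking goldfish needs no change to the code: the next run picks up the new column.
conn.execute("INSERT INTO sales VALUES (5, 'Goldfish', 1)")
sql2, pets2 = pivot_query(conn)
cursor2 = conn.execute(sql2, pets2)
header2 = [column[0] for column in cursor2.description]
rows2 = cursor2.fetchall()
print(header2, rows2)
```

The columns come out in alphabetical order because of the ORDER BY in the first query; change that if the shop owners want a particular order.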