What normal form is pivoted/widened data in? - database

There are many questions to the effect of "What normal form are these data in?", and I admittedly have not combed through every one of them to see whether the data they refer to are pivoted. I'm asking because I think the answer would be useful to people who search using this terminology.
Let's say that I have a table with the columns:
personid*, email1 , email2, email3
1        , e#e.us , NULL  , NULL
2        , a#a.com, b#b.co, c#c.com
3        , j#j.com, l#l.uk, NULL
Where personid is the primary key and uniquely identifies each row. Each e-mail is functionally dependent on personid, but obviously this isn't in 3NF because that would involve having a person table and an e-mail table such as:
personid, email ,email_num
1 ,e#e.us ,1
2 ,a#a.com,1
2 ,b#b.co ,2
2 ,c#c.com,3
3 ,j#j.com,1
3 ,l#l.uk ,2
Where email_num takes the place of the n in the emailn column names of the previous table.
What normal form is the first (pivoted) table in?
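To make the relationship between the two layouts concrete, here is a minimal sketch that unpivots the wide table into the long one. It uses Python's built-in sqlite3 module purely as a stand-in database; the table name person_wide is an assumption made for the illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_wide (          -- hypothetical name for the pivoted table
    personid INTEGER PRIMARY KEY,
    email1 TEXT, email2 TEXT, email3 TEXT
);
INSERT INTO person_wide VALUES
    (1, 'e#e.us',  NULL,     NULL),
    (2, 'a#a.com', 'b#b.co', 'c#c.com'),
    (3, 'j#j.com', 'l#l.uk', NULL);
""")

# One SELECT per emailN column. NULL slots are dropped because the long
# form represents "no e-mail" by the absence of a row.
long_rows = conn.execute("""
    SELECT personid, email1 AS email, 1 AS email_num
    FROM person_wide WHERE email1 IS NOT NULL
    UNION ALL
    SELECT personid, email2, 2 FROM person_wide WHERE email2 IS NOT NULL
    UNION ALL
    SELECT personid, email3, 3 FROM person_wide WHERE email3 IS NOT NULL
    ORDER BY personid, email_num
""").fetchall()

for row in long_rows:
    print(row)
```

Adding a fourth e-mail to the wide layout would need a schema change plus another SELECT branch here, while the long layout only needs another row.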

Related

Database design for unilateral relationship between contractors and companies

Companies and contractors can have contacts, those contacts can be both companies and contractors, and that relationship is unilateral (no need for the other party to accept).
I was thinking of doing something like this:
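Contacts:
ContractorId | CompanyId | ContactContractorId | ContactCompanyId | Type
-----------------------------------------------------------------------
1            | NULL      | 15                  | NULL             | 3
NULL         | 22        | 12                  | NULL             | 1
NULL         | 44        | NULL                | 22               | 2
ContactTypes:
Id                     | Type
-----------------------------
CompanyToContractor    | 1
CompanyToCompany       | 2
ContractorToContractor | 3
ContractorToCompany    | 4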
Is this a good approach, or should I create four different tables, one for each type of relationship?
If your model doesn't require companies and contractors to be handled differently, use a generic table that covers both. Otherwise you could do first 3 columns of the above like this:
contact: contact_id
company: company_id, ...
company_contact: company_id, contact_id
contractor: contractor_id, ...
contractor_contact: contractor_id, contact_id
and for the last 2 columns:
contact_company: contact_id, company_id
contact_contractor: contact_id, contractor_id
which would enable referential integrity.
If you don't care about the foreign keys, use a key that is unique across both the company and contractor tables, for example a sequence or a UUID. This would allow a contact row to reference either table and eliminate the company_contact and contractor_contact tables. Similarly, you can condense contact_company and contact_contractor into one table.
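As a rough illustration of the shared-key idea, the sketch below uses one common variation: a supertype table that issues the shared key, so foreign keys can still be enforced. This is a swapped-in technique rather than exactly what the answer describes, and the names party, kind and party_contact are hypothetical. SQLite (via Python's sqlite3) stands in for whatever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Hypothetical supertype: every company and contractor gets one row here,
-- so a single id space covers both.
CREATE TABLE party (
    party_id INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL CHECK (kind IN ('company', 'contractor'))
);
-- One row per one-directional contact relationship.
CREATE TABLE party_contact (
    owner_id   INTEGER NOT NULL REFERENCES party(party_id),
    contact_id INTEGER NOT NULL REFERENCES party(party_id),
    PRIMARY KEY (owner_id, contact_id)
);
INSERT INTO party VALUES (1, 'contractor'), (15, 'contractor'),
                         (12, 'contractor'), (22, 'company'), (44, 'company');
INSERT INTO party_contact VALUES (1, 15), (22, 12), (44, 22);
""")

# The relationship type no longer needs its own column: it follows from
# the kinds of the two parties involved.
rows = conn.execute("""
    SELECT o.kind || '_to_' || c.kind, pc.owner_id, pc.contact_id
    FROM party_contact pc
    JOIN party o ON o.party_id = pc.owner_id
    JOIN party c ON c.party_id = pc.contact_id
    ORDER BY pc.owner_id
""").fetchall()

# A contact pointing at a non-existent party is rejected by the foreign key.
try:
    conn.execute("INSERT INTO party_contact VALUES (1, 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The trade-off is an extra join whenever company-specific or contractor-specific columns are needed.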

SQL JOIN all tables from one data

I am trying to get all the data from all tables in one DB.
I have looked around, but I haven't been able to find any solution that works for my current problem.
I made a C# program that creates a table for each day the program runs. The table name will be like this: tbl18_12_2015 for today's date (Danish date format).
Now, in order to make a yearly report, I would love to get ALL the data from all the tables in the DB that stores these reports. I have no way of knowing how many tables there will be or what they are called, other than the format (tblDD_MM_YYYY).
I'm thinking of something like this (which obviously doesn't work):
SELECT * FROM DB_NAME.*
All the tables have the same columns, and one of them is a primary key, that auto increments.
Here is a table named tbl17_12_2015
ID PERSONID NAME PAYMENT TYPE RESULT TYPE
3 92545 TOM 20,5 A NULL NULL
4 92545 TOM 20,5 A NULL NULL
6 117681 LISA NULL NULL 207 R
Here is a table named tbl18_12_2015
ID PERSONID NAME PAYMENT TYPE RESULT TYPE
3 117681 LISA 30 A NULL NULL
4 53694 DAVID 78 A NULL NULL
6 58461 MICHELLE NULL NULL 207 R
What I would like to get is something like this (from all tables in the DB):
PERSONID NAME PAYMENT TYPE RESULT TYPE
92545 TOM 20,5 A NULL NULL
92545 TOM 20,5 A NULL NULL
117681 LISA NULL NULL 207 R
117681 LISA 30 A NULL NULL
53694 DAVID 78 A NULL NULL
58461 MICHELLE NULL NULL 207 R
I have tried some different queries, but none of them returned this, just a lot of info about the tables.
Thanks in advance, and happy holidays
edit: corrected tbl18_12_2015 col 3 header to English rather than Danish
Thanks to all those who tried to help me solve this question, but I can't (due to my skill set, most likely) get the UNION to work, so that's why I decided to refactor my DB.
While you could store the table names in a database and use dynamic sql to union them together, this is NOT a good idea and you shouldn't even consider it - STOP NOW!!!!!
What you need to do is create a new table with the same fields - and add an ID (auto-incrementing identity column) and a DateTime field. Then, instead of creating a new table for each day, just write your data to this table with the DateTime. Then, you can use the DateTime field to filter your results, whether you want something from a day, week, month, year, decade, etc. - and you don't need dynamic sql - and you don't have 10,000 database tables.
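A minimal sketch of that single-table layout, using Python's sqlite3 module as a stand-in for the real database. The table name report and the column names created_at, payment_type and result_type are assumptions for the illustration; dates are stored as ISO 8601 text so that plain comparisons sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE report (                       -- hypothetical table name
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at   TEXT NOT NULL,             -- ISO 8601 date, e.g. '2015-12-17'
    personid     INTEGER NOT NULL,
    name         TEXT NOT NULL,
    payment      REAL,
    payment_type TEXT,
    result       INTEGER,
    result_type  TEXT
);
""")

# The rows from the two example tables, now tagged with their date.
rows = [
    ("2015-12-17", 92545, "TOM", 20.5, "A", None, None),
    ("2015-12-17", 92545, "TOM", 20.5, "A", None, None),
    ("2015-12-17", 117681, "LISA", None, None, 207, "R"),
    ("2015-12-18", 117681, "LISA", 30, "A", None, None),
    ("2015-12-18", 53694, "DAVID", 78, "A", None, None),
    ("2015-12-18", 58461, "MICHELLE", None, None, 207, "R"),
]
conn.executemany(
    "INSERT INTO report (created_at, personid, name, payment, payment_type,"
    " result, result_type) VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Yearly report: a half-open date range, no dynamic SQL needed.
yearly = conn.execute(
    "SELECT personid, name, payment, payment_type, result, result_type"
    " FROM report"
    " WHERE created_at >= '2015-01-01' AND created_at < '2016-01-01'"
    " ORDER BY id"
).fetchall()

# The same table answers a single-day question just as easily.
one_day = conn.execute(
    "SELECT COUNT(*) FROM report WHERE created_at = '2015-12-18'"
).fetchone()[0]
```

An index on created_at would be the natural next step once the table grows.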
I know some people posted comments expressing the same sentiments, but, really, this should be an answer.
If you had all the tables in the same database, you would be able to use the UNION operator to combine all your tables.
Maybe you can do something like this to select all the table names from a given database:
For SQL Server:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_CATALOG='dbName'
For MySQL:
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_SCHEMA='dbName'
Once you have the list of tables, you can move all the tables to one database and create your report using UNIONs.
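For illustration only, here is roughly what "list the tables, then union them" looks like end to end. The sketch uses Python's sqlite3 module, so sqlite_master plays the role that INFORMATION_SCHEMA.TABLES plays above; that substitution, and the cut-down two-column tables, are assumptions of the example.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# Two cut-down daily tables in the asker's naming scheme.
daily = {
    "tbl17_12_2015": [(92545, "TOM"), (117681, "LISA")],
    "tbl18_12_2015": [(117681, "LISA"), (53694, "DAVID")],
}
for table, rows in daily.items():
    conn.execute(
        f'CREATE TABLE "{table}" ('
        "ID INTEGER PRIMARY KEY AUTOINCREMENT, PERSONID INTEGER, NAME TEXT)"
    )
    conn.executemany(
        f'INSERT INTO "{table}" (PERSONID, NAME) VALUES (?, ?)', rows
    )

# Table names cannot be bound as parameters, so only names matching
# tblDD_MM_YYYY exactly are allowed into the generated SQL text.
pattern = re.compile(r"^tbl\d{2}_\d{2}_\d{4}$")
tables = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
    if pattern.match(name)
)

# UNION ALL keeps duplicates; the column list is explicit rather than *.
union_sql = " UNION ALL ".join(
    f'SELECT PERSONID, NAME FROM "{t}"' for t in tables
)
combined = conn.execute(union_sql).fetchall()
```

The generated statement grows by one branch per day, which is a large part of why the single-table design is usually preferred.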
You will need to use a UNION between each select query.
Do not use *, always list the name of the columns you are bringing up.
If you want duplicates, then UNION ALL is what you want.
If you want unique records based on PERSONID, but the rows are likely to differ, then an UPDATE_DATE column would help determine which one to use. But what if records with the same PERSONID have each been changed independently in different tables?
You'd need to determine business rules for which specific changes to keep and merge into the single resulting record, and there you'd be on your own.
What is "Skyttenavn"? Is it Danish? If it is the same as "NAME", you'd want to alias that column as 'NAME' in the select query, although it's the order of the columns as listed that counts when determining what to unite.
You'd need a new auto-incremented ID as a unique primary key, by the way, if you are likely to have conflicting IDs. If you instead want to copy existing ID values into an identity column explicitly, set IDENTITY_INSERT to ON for that table while inserting, then back to OFF to resume automatic incrementing.

How can I associate a single record with one or more PKs

If I had a single record that represented, say, a sellable item:
ItemID | Name
-------------
101 | Chips
102 | Candy bar
103 | Beer
I need to create a relationship between these items and one or more different types of PKs. For instance, a company might have an inventory that includes chips; a store might have an inventory that includes chips and a candy bar; and the night shift might carry chips, candy bars, and beer. The issue is that we have different kinds of IDs: CompanyID, StoreID, and ShiftID respectively.
My first thought was "Oh, just create link tables that link companies to inventory items, stores to inventory items, and shifts to inventory items", and that way if I needed to look up the inventory collection for any of those entities, I could query them explicitly. However, the UI shows that I should be able to compile a list arbitrarily (e.g. show me all inventory items for company A, all west valley stores, and Team BrewHa who is at an east valley store) and then display them grouped by their respective entity:
Company A
---------
- Chips
West Valley 1
-------------
- Chips
- Candy Bar
West Valley 2
-------------
- Chips
BrewHa (East Valley 6)
--------------------
- Chips
- Candy Bar
- Beer
So again, my first thought was to base the query on the provided information (what kinds of IDs did they give me) and then just union them together with some extra info for grouping (candidate keys like IDType+ID) so that the result looked kind of like this:
IDType | ID | InventoryItemID
------------------------------
1 |100 | 1
2 |200 | 1
2 |200 | 2
2 |201 | 1
3 |300 | 1
3 |300 | 2
3 |300 | 3
I guess this would work, but it seems incredibly inefficient and contrived to me; I'm not even sure how the parameters of that sproc would work... So my question to everyone is: is this even the right approach? Can anyone explain alternative or better approaches to solve the problem of creating and managing these relationships?
It's hard to ascertain what you want as I don't know the purpose/use of this data. I'm not well-versed in normalization, but perhaps a star schema might work for you. Please keep in mind, I'm using my best guess for the terminology. What I was thinking would look like this:
tbl_Current_Inventory(Fact Table) records current Inventory
InventoryID INT NOT NULL FOREIGN KEY REFERENCES tbl_Inventory(ID),
CompanyID INT NULL FOREIGN KEY REFERENCES tbl_Company(ID),
StoreID INT NULL FOREIGN KEY REFERENCES tbl_Store(ID),
ShiftID INT NULL FOREIGN KEY REFERENCES tbl_Shift(ID),
Shipped_Date DATE, -- not really sure, just an example
CONSTRAINT clustered_unique UNIQUE CLUSTERED (InventoryID, CompanyID, StoreID, ShiftID)
tbl_Inventory(Fact Table 2)
ID INT NOT NULL,
ProductID INT NOT NULL FOREIGN KEY REFERENCES tbl_Product(ID),
PRIMARY KEY(ID,ProductID)
tbl_Store(Fact Table 3)
ID INT PRIMARY KEY,
CompanyID INT FOREIGN KEY REFERENCES tbl_Company(ID),
RegionID INT FOREIGN KEY REFERENCES tbl_Region(ID)
tbl_Product(Dimension Table)
ID INT PRIMARY KEY,
Product_Name VARCHAR(25)
tbl_Company(Dimension Table)
ID INT PRIMARY KEY,
Company_Name VARCHAR(25)
tbl_Region(Dimension Table)
ID INT PRIMARY KEY,
Region_Name VARCHAR(25)
tbl_Shift(Dimension Table)
ID INT PRIMARY KEY,
Shift_Name VARCHAR(25),
Start_Time TIME,
End_Time TIME
So a little explanation. Each dimension table holds only distinct values. For example, tbl_Region lists each region's name once, along with an ID.
Now for tbl_Current_Inventory, that will hold all the columns. I have CompanyID and StoreID both in there for a reason: this table can hold company inventory information (NULL StoreID and NULL ShiftID) and it can also hold store inventory information.
Then as for querying this, I would create a view that joins each table, then simply query the view. Then of course there are indexes, but I don't think you asked for that. Also notice I only had about one column per dimension table. My guess is that you'll probably have more columns than just the name of something.
Overall, this helps eliminate a lot of duplicate data, and it strikes a good balance between performance and a data structure that isn't overly complicated. Really though, if you slap a view on it and query the view, it should perform quite well, especially if you add some good indexes.
This may not be a perfect solution or even the one you need, but hopefully it at least gives you some ideas or some direction.
If you need any more explanation or anything else, just let me know.
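To give a feel for the "join everything in a view, then query the view" step, here is a cut-down sketch using Python's sqlite3 module. It keeps the answer's tbl_ naming but makes several simplifying assumptions: the shift and region tables are left out, the inventory row points straight at tbl_Product, and Store_Name and the view name vw_Inventory are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Company (ID INTEGER PRIMARY KEY, Company_Name TEXT);
CREATE TABLE tbl_Store   (ID INTEGER PRIMARY KEY, Store_Name TEXT,  -- assumed column
                          CompanyID INTEGER REFERENCES tbl_Company(ID));
CREATE TABLE tbl_Product (ID INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE tbl_Current_Inventory (
    ProductID INTEGER NOT NULL REFERENCES tbl_Product(ID),
    CompanyID INTEGER REFERENCES tbl_Company(ID),
    StoreID   INTEGER REFERENCES tbl_Store(ID)   -- NULL = company-level inventory
);

INSERT INTO tbl_Company VALUES (1, 'Company A');
INSERT INTO tbl_Store VALUES (10, 'West Valley 1', 1), (11, 'West Valley 2', 1);
INSERT INTO tbl_Product VALUES (101, 'Chips'), (102, 'Candy bar');
INSERT INTO tbl_Current_Inventory VALUES
    (101, 1, NULL), (101, 1, 10), (102, 1, 10), (101, 1, 11);

-- LEFT JOINs keep rows whose StoreID is NULL.
CREATE VIEW vw_Inventory AS
SELECT c.Company_Name, s.Store_Name, p.Product_Name
FROM tbl_Current_Inventory ci
JOIN tbl_Product p      ON p.ID = ci.ProductID
LEFT JOIN tbl_Company c ON c.ID = ci.CompanyID
LEFT JOIN tbl_Store s   ON s.ID = ci.StoreID;
""")

store_items = conn.execute(
    "SELECT Store_Name, Product_Name FROM vw_Inventory"
    " WHERE Store_Name = 'West Valley 1' ORDER BY Product_Name"
).fetchall()

company_level = conn.execute(
    "SELECT Company_Name, Store_Name, Product_Name FROM vw_Inventory"
    " WHERE Store_Name IS NULL"
).fetchall()
```

Callers then filter the view by whichever entity they have an ID for, without repeating the joins.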
In a normalized database, you implement a many-to-many relationship by creating a table that defines the relationships between entities just as you thought initially. It might seem contrived, but it gives you the functionality you need. In your case I would create a table for the relationship called something like "Carries" with the primary key of (ProductId, StoreId, ShiftId). Sometimes you can break normalization rules for performance, but it comes with side effects.
I recommend picking up a good book on designing relational databases. Here's a starter on a few topics:
http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model
http://en.wikipedia.org/wiki/Database_normalization
You need to break it down: inventory belongs to a store and a shift.
Inventory does not belong to a company; a store belongs to a company.
If the company holds inventory directly, then I would create a store named warehouse.
A store belongs to a region.
Don't design for the UI; put the data in 3NF.
Tables:
Company ID, name
Store ID, name
Region ID, name
Product ID, name
Shift ID, name
CompanyToStore CompanyID, StoreID (composite PK)
RegionToStore RegionID, StoreID (composite PK)
Inventory StoreID, ShiftID, ProductID (composite PK)
The composite PKs are not just efficient; they prevent duplicates.
The join tables should not have their own ID as PK.
Let the relationships they are managing be the PK.
If you want to report by company across all shifts, you would have a query like this:
select distinct Store.Name, Product.Name
from Inventory
join Store
on Inventory.StoreID = Store.ID
join Product
on Inventory.ProductID = Product.ID
join CompanyToStore
on Store.ID = CompanyToStore.StoreID
and CompanyToStore.CompanyID = X
Store count in a region:
select Region.Name, count(*)
from RegionToStore
join Region
on Region.ID = RegionToStore.RegionID
group by Region.Name
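The design above can be tried out with a short sketch in Python's sqlite3 module. The table and column names follow the answer's list; the region tables are omitted for brevity, and the sample rows and the shift names 'Day' and 'Night' are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Store   (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Shift   (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE CompanyToStore (
    CompanyID INTEGER NOT NULL REFERENCES Company(ID),
    StoreID   INTEGER NOT NULL REFERENCES Store(ID),
    PRIMARY KEY (CompanyID, StoreID)
);
CREATE TABLE Inventory (
    StoreID   INTEGER NOT NULL REFERENCES Store(ID),
    ShiftID   INTEGER NOT NULL REFERENCES Shift(ID),
    ProductID INTEGER NOT NULL REFERENCES Product(ID),
    PRIMARY KEY (StoreID, ShiftID, ProductID)  -- the relationship is the key
);
INSERT INTO Company VALUES (1, 'Company A');
INSERT INTO Store VALUES (10, 'West Valley 1'), (11, 'West Valley 2');
INSERT INTO Product VALUES (101, 'Chips'), (102, 'Candy bar'), (103, 'Beer');
INSERT INTO Shift VALUES (1, 'Day'), (2, 'Night');
INSERT INTO CompanyToStore VALUES (1, 10), (1, 11);
INSERT INTO Inventory VALUES (10, 1, 101), (10, 2, 101), (10, 2, 102), (11, 1, 101);
""")

# The composite primary key refuses a second copy of the same relationship.
try:
    conn.execute("INSERT INTO Inventory VALUES (10, 1, 101)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Report by company across all shifts; DISTINCT collapses the per-shift rows.
report = conn.execute("""
    SELECT DISTINCT Store.Name, Product.Name
    FROM Inventory
    JOIN Store   ON Inventory.StoreID = Store.ID
    JOIN Product ON Inventory.ProductID = Product.ID
    JOIN CompanyToStore ON Store.ID = CompanyToStore.StoreID
                       AND CompanyToStore.CompanyID = ?
    ORDER BY Store.Name, Product.Name
""", (1,)).fetchall()
```

Chips is stocked on both shifts at West Valley 1 but appears once in the report, which is what the DISTINCT is for.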

Viable ways to have an AutoNumber per user (or per entity)?

Let's say you have a web application that manages books for book sellers, and it is built on a multi-tenant database with a single books table that contains books from several book sellers.
Now let's say that each book seller really wants each of their books to have a unique number associated with it so they can look books up by that number, but it's important to them that the number is roughly consecutive for them. (It's OK if there are small breaks in the sequence due to deleted books and other events that cause an AutoNumber to get consumed but not used).
Obviously each book already has a unique number (primary key) associated with it that is generated via AutoNumber and is unique across book sellers. That is not what I am discussing here.
Let's just assume SQL-Server from here on, but the discussion applies equally to Oracle (except that Oracle uses Sequences that are independent of tables, and the current version of SQL Server must use a table to accomplish the same thing).
We want a number that increments safely in the context of a book seller. We want to maintain the benefits of using AutoNumber, but we want there to be one sequence per book seller. It seems like there are two options, and neither is very good:
Create one single-column table per book seller. This scares me because I can't think of another example of dynamically changing the schema (adding a new table whenever a new book seller is added to the system via the web application) in a web request. It also seems really heavyweight to have one table per book seller. I know a future version of SQL-server will support Sequences, but even that would still be a schema change at run time.
Roll your own auto-numbering behavior. This seems really risky because databases' built-in AutoNumber features take care of a lot of stuff for you, and giving that up is a big deal. Attempts to re-implement it yourself are probably error-prone and may cause poorer concurrency than the built-in AutoNumber.
Hopefully there are additional options that I'm missing. Has anyone successfully dealt with a similar situation? Thanks.
Is there a reason you couldn't have a 2 field table with:
BookSeller_ID, BookID
You wouldn't need to change schema as you add sellers, and it would be trivial to track per seller:
SELECT MAX(BookID)
FROM ThisTable
WHERE BookSeller_ID = 123
For additional info you could also add a Universal_BookID field that linked to your unique ID referenced in the 3rd paragraph.
EDIT:
To clarify, if you have sellers 1 2 and 3 you could have a table like:
SellerID BookID BookUniversalID
1 1 123
2 1 456
3 1 999
1 2 1234
1 3 8798
1 4 999
1 5 10000001
3 2 123
3 3 456
You keep track of which seller has which IDs assigned and which actual book each links to, and to determine the next ID for a seller, just query the current maximum and add 1:
SELECT MAX(bookid) FROM ThisTable WHERE SellerID = 1
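Reading the maximum and inserting the next value are two separate steps, so they need to happen inside one transaction or two concurrent requests could pick the same number. The sketch below shows the idea with Python's sqlite3 module; the table name seller_book and the helper add_book are hypothetical, and the locking statement (BEGIN IMMEDIATE) is SQLite-specific. Other engines need their own locking or isolation settings.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so the
# transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE seller_book (           -- hypothetical table name
        SellerID        INTEGER NOT NULL,
        BookID          INTEGER NOT NULL,
        BookUniversalID INTEGER NOT NULL,
        PRIMARY KEY (SellerID, BookID)   -- safety net against duplicate numbers
    )
""")


def add_book(seller_id, universal_id):
    """Assign the next per-seller BookID inside one write transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX
    try:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(BookID), 0) + 1"
            " FROM seller_book WHERE SellerID = ?",
            (seller_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO seller_book VALUES (?, ?, ?)",
            (seller_id, next_id, universal_id),
        )
        conn.execute("COMMIT")
        return next_id
    except Exception:
        conn.execute("ROLLBACK")
        raise


assigned = [add_book(1, 123), add_book(2, 456), add_book(1, 1234), add_book(1, 8798)]
print(assigned)
```

Note that deleting a seller's highest-numbered book would let that number be reused, which may or may not be acceptable.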
DENSE_RANK, works in SQL Server and Oracle
Assuming your table looks vaguely thus
CREATE TABLE dbo.BOOKS
(
internal_book_id int identity(1,1) primary key
, seller_id int NOT NULL
, title varchar(50) NOT NULL
)
Whenever you present the identity value to the seller, use the dense_rank() function to generate the surrogate values.
CREATE VIEW dbo.BOOK_TO_SELLER_MAP
AS
SELECT
B.*
, DENSE_RANK() OVER (PARTITION BY B.seller_id ORDER BY B.internal_book_id ASC) AS unique_book_id_for_seller
FROM
dbo.BOOKS B
WHERE
B.seller_id = @sellerId
For the combination of seller_id and the generated id, you ought to always match back to the true id (assuming no physical deletes).
Demo code
;
WITH BOOKS (internal_book_id, seller_id, title)
AS
(
SELECT 1, 100, 'Secret of NIMH'
UNION ALL SELECT 2, 400, 'Once and Future King'
UNION ALL SELECT 7, 88, 'Microsoft SQL Server 2008'
UNION ALL SELECT 8, 100, 'Bonfire of the Vanities'
UNION ALL SELECT 9, 100, 'Canary Row'
UNION ALL SELECT 10, 400, '1916'
UNION ALL SELECT 11, 100, 'The Picture of Dorian Gray'
UNION ALL SELECT 12, 88, 'The Disasters of War'
)
, BOOK_TO_SELLER_MAP AS
(
SELECT
B.*
, DENSE_RANK() OVER (PARTITION BY B.seller_id ORDER BY B.internal_book_id ASC) AS unique_book_id_for_seller
FROM
BOOKS B
)
SELECT
*
FROM
BOOK_TO_SELLER_MAP V
ORDER BY
V.seller_id
, V.unique_book_id_for_seller
Results
internal_book_id seller_id title unique_book_id_for_seller
7 88 Microsoft SQL Server 2008 1
12 88 The Disasters of War 2
-------------------------------------------------------------------------------
1 100 Secret of NIMH 1
8 100 Bonfire of the Vanities 2
9 100 Canary Row 3
11 100 The Picture of Dorian Gray 4
-------------------------------------------------------------------------------
2 400 Once and Future King 1
10 400 1916 2
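The "assuming no physical deletes" caveat matters because the rank is recomputed on every query. The following sketch, using Python's sqlite3 module with seller 100's rows from the demo, shows the per-seller numbers shifting after a delete. It assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (
    internal_book_id INTEGER PRIMARY KEY,
    seller_id        INTEGER NOT NULL,
    title            TEXT NOT NULL
);
INSERT INTO books VALUES
    (1,  100, 'Secret of NIMH'),
    (8,  100, 'Bonfire of the Vanities'),
    (9,  100, 'Canary Row'),
    (11, 100, 'The Picture of Dorian Gray');
""")

RANK_SQL = """
    SELECT title,
           DENSE_RANK() OVER (
               PARTITION BY seller_id ORDER BY internal_book_id
           ) AS unique_book_id_for_seller
    FROM books
    WHERE seller_id = 100
    ORDER BY unique_book_id_for_seller
"""

# Map each title to its per-seller number before and after a physical delete.
before = dict(conn.execute(RANK_SQL).fetchall())
conn.execute("DELETE FROM books WHERE internal_book_id = 8")
after = dict(conn.execute(RANK_SQL).fetchall())

print(before)
print(after)
```

Every book after the deleted one moves down by one, so a number a seller wrote down earlier would now point at a different title. Soft deletes (a flag column instead of removing the row) avoid that.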
OMG Ponies is correct that sequences are the only correct way to achieve this. There isn't really another viable option.

MS Access Relationship help needed

I have 2 MS Access Tables.
Table 1
id
room-name
Table 2
wall
cupboard
ceiling
Now... table1.room-name has the room names and table2 contains the objects, so each room contains many objects.
My question is: how do I set up the relationships for this, please?
Nothing in table 2 tells you what room things are in, so you need to add a foreign key in table 2 that references the primary key of table 1. In this case either column of table1 could be its primary key; I would use room-name and drop the id.
So table2 needs altering so that room-name is in it, and then draw the connection from table1 to table2.
Something like:
[Room]
RoomId | RoomName
-----------------
1      | bedroom
2      | kitchen
[RoomItem]
RoomItemId | RoomId | ItemName
------------------------------
1          | 1      | wardrobe
2          | 1      | bed
3          | 2      | cooker
Where the RoomId links the Room and RoomItem tables.
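In Access the link itself is drawn in the Relationships window, but the same one-to-many structure can be sketched in SQL. The example below uses Python's sqlite3 module as a stand-in, with the Room and RoomItem names and sample rows from the layout above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Room (
    RoomId   INTEGER PRIMARY KEY,
    RoomName TEXT NOT NULL
);
CREATE TABLE RoomItem (
    RoomItemId INTEGER PRIMARY KEY,
    RoomId     INTEGER NOT NULL REFERENCES Room(RoomId),  -- the "many" side
    ItemName   TEXT NOT NULL
);
INSERT INTO Room VALUES (1, 'bedroom'), (2, 'kitchen');
INSERT INTO RoomItem VALUES (1, 1, 'wardrobe'), (2, 1, 'bed'), (3, 2, 'cooker');
""")

# Join the two tables on RoomId and group the items under each room name.
items_by_room = {}
for room, item in conn.execute("""
    SELECT Room.RoomName, RoomItem.ItemName
    FROM Room
    JOIN RoomItem ON RoomItem.RoomId = Room.RoomId
    ORDER BY RoomItem.RoomItemId
"""):
    items_by_room.setdefault(room, []).append(item)

print(items_by_room)
```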

Resources