Which of these definitions explain 1NF? - database

I find the definitions of 1NF that I get from Google vague.
Some sites, like this one, say a table is in first normal form when it doesn't have any repeating sets of columns.
Some others (most of them) say a column shouldn't hold multiple values from the same domain.
And some of them say all tables should have a primary key, while others don't talk about a primary key at all!
Can someone explain this to me?

A relation is in first normal form if it has the property that none of its domains has elements which are themselves sets.
From E. F. Codd (Oct 1972). "Further normalization of the database relational model"
This really gets down to what it is about, by the guy who invented the relational database model.
When something is in the first normal form, there are no columns which themselves contain sets of data.
The wikipedia article on first normal form demonstrates this with a denormalized table:
Example1:
Customer
Customer ID | First Name | Surname | Telephone Number
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659, 555-776-4100
789 | Maria | Fernandez | 555-808-9633
This table is denormalized because Jane's Telephone Number value is a set of two numbers. Rewriting the table as in Example2, with one row per number, is still in violation of 1NF.
Example2:
Customer
Customer ID | First Name | Surname | Telephone Number
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659
456 | Jane | Wright | 555-776-4100
789 | Maria | Fernandez | 555-808-9633
The proper way to normalize the table is to break it out into two tables.
Example3:
Customer
Customer ID | First Name | Surname
123 | Robert | Ingram
456 | Jane | Wright
789 | Maria | Fernandez
Phone
Customer ID | Telephone Number
123 | 555-861-2025
456 | 555-403-1659
456 | 555-776-4100
789 | 555-808-9633
Another way of looking at 1NF is as defined by Chris Date (from Wikipedia):
1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
Example2 lacks a unique key on Customer ID (456 appears twice, repeating Jane's name), which runs against rule 3. Example1 violates rule 4 in that the Telephone Number column contains multiple values in one row.
Only Example3 fulfills all those requirements.
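As a rough illustration of the step from Example1 to Example3, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, phone, customer_id, telephone_number) and the helper phones_of are made up for this example; only the data comes from the tables above.

```python
import sqlite3

# Example1 rows: the last field holds a comma-separated set of numbers.
example1 = [
    (123, "Robert", "Ingram", "555-861-2025"),
    (456, "Jane", "Wright", "555-403-1659, 555-776-4100"),
    (789, "Maria", "Fernandez", "555-808-9633"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    surname     TEXT NOT NULL
);
CREATE TABLE phone (
    customer_id      INTEGER NOT NULL REFERENCES customer(customer_id),
    telephone_number TEXT NOT NULL,
    PRIMARY KEY (customer_id, telephone_number)
);
""")

# Split each set-valued field into one phone row per number (Example3 layout).
for customer_id, first_name, surname, numbers in example1:
    conn.execute("INSERT INTO customer VALUES (?, ?, ?)",
                 (customer_id, first_name, surname))
    for number in numbers.split(","):
        conn.execute("INSERT INTO phone VALUES (?, ?)",
                     (customer_id, number.strip()))


def phones_of(customer_id):
    """Return all telephone numbers stored for one customer."""
    rows = conn.execute(
        "SELECT telephone_number FROM phone "
        "WHERE customer_id = ? ORDER BY telephone_number",
        (customer_id,),
    ).fetchall()
    return [number for (number,) in rows]
```

With this layout each cell holds a single value, and phones_of(456) returns Jane's two numbers as two separate rows.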
Further reading:
Simple Guide to Five Normal Forms in Relational Database Theory

The simplest explanation I have found is this modified definition copied from here:
1st Normal Form Definition
A database is in first normal form if it satisfies the following conditions:
1) Contains only atomic values
2) There are no repeating groups

Related

How can I traverse a hierarchy to find circular references?

I have a table that maps employees for profit sharing. It allows a given employee to assign a percentage of their take to another employee. This means there isn't necessarily a single entry for a given employee, there could be 0 or more. Alice could give 5% to Bob and 3% to Charlie, so Alice would have two records in the table.
The table has three fields, FromEmployeeId of type int, ToEmployeeId of type int, PctShare of type float.
When a user attempts to add a new mapping record I need to make sure that it won't result in a circular reference.
The problem is that there are multiple hierarchy levels and each employee can have multiple mappings. Initially I thought a recursive CTE might work, but I don't think it will cope with the multiple-mapping issue.
Here's an example:
Bob maps to Charlie and Yusuf
Charlie maps to David and William
Yusuf has no mappings
David maps to Alice and Vince
Vince has no mappings
Alice maps to Tucker
Tucker has no mappings
or in tabular form:
---------------------
| From    | To      |
---------------------
| Bob     | Charlie |
| Bob     | Yusuf   |
| Charlie | David   |
| Charlie | William |
| David   | Alice   |
| David   | Vince   |
| Alice   | Tucker  |
---------------------
If a user attempted to add a mapping record from Alice -> Bob, I want to recognize that that results in a circular reference because Bob -> Charlie -> David -> Alice.
Is my only option something like a cursor and looping through each of the result sets?
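For what it's worth, a recursive CTE can cope with several mappings per employee, so a cursor is not the only option. Below is a minimal sketch using Python's built-in sqlite3 module rather than any particular server. The table name mapping and the helper would_create_cycle are assumptions for illustration, and employee names are stored as text instead of int ids to mirror the example. The idea: before inserting From -> To, collect everyone reachable from To and check whether From is among them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mapping (
    FromEmployeeId TEXT NOT NULL,   -- names instead of int ids, for readability
    ToEmployeeId   TEXT NOT NULL,
    PctShare       REAL             -- left NULL here; not needed for the check
);
INSERT INTO mapping (FromEmployeeId, ToEmployeeId) VALUES
    ('Bob', 'Charlie'), ('Bob', 'Yusuf'),
    ('Charlie', 'David'), ('Charlie', 'William'),
    ('David', 'Alice'), ('David', 'Vince'),
    ('Alice', 'Tucker');
""")


def would_create_cycle(conn, from_emp, to_emp):
    """Return True if adding from_emp -> to_emp would close a loop."""
    if from_emp == to_emp:
        return True
    row = conn.execute(
        """
        WITH RECURSIVE reachable(emp) AS (
            SELECT ?                      -- start at the proposed target
            UNION                         -- UNION drops repeats, so the walk ends
            SELECT m.ToEmployeeId
            FROM mapping AS m
            JOIN reachable AS r ON m.FromEmployeeId = r.emp
        )
        SELECT 1 FROM reachable WHERE emp = ? LIMIT 1
        """,
        (to_emp, from_emp),
    ).fetchone()
    return row is not None
```

Calling would_create_cycle(conn, "Alice", "Bob") walks Bob -> Charlie -> David -> Alice and reports the loop, while a harmless mapping such as Yusuf -> Tucker passes. Using UNION rather than UNION ALL is a deliberate choice: it keeps the walk finite even if the table already contains a loop.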

Implementing a Model in a Relational Database

I have a super-class/subclass hierarchical relationship as follows:
Super-class: IT Specialist
Sub-classes: Databases, Java, UNIX, PHP
Given that each instance of a super-class may not be a member of a subclass and a super-class instance may be a member of two or more sub-classes, how would I go about implementing this system?
I haven't been given any attributes to assign to the entities so I find this very vague and I'm at a loss where to start.
To get started, you would have one table that contains all of your super-classes (in your example case, there would only be IT Specialist, but it could also contain things like Networking Specialist, or Digital Specialist). I've included these to give a bit more flavour:
ID | Name |
-----------------------------
1 | IT Specialist |
2 | Networking Specialist |
3 | Digital Specialist |
You also would have another table that contains all of your sub-classes:
ID | Name |
--------------------
1 | Databases |
2 | Java |
3 | UNIX |
4 | PHP |
For example, let's say that a Networking Specialist needs to know about Databases, and a Digital Specialist needs to know about both Java and PHP. An IT Specialist would need to know all four fields listed above.
There are two possible ways to go about this. One such way would be to set 'flags' in the sub-class table:
ID | Name | Is_IT | Is_Networking | Is_Digital
----------------------------------------------------
1 | Databases | 1 | 1 | 0
2 | Java | 1 | 0 | 1
3 | UNIX | 1 | 0 | 0
4 | PHP | 1 | 0 | 1
Keep in mind, this is only using a small number of skills. If you started to have a lot of super-classes, the columns in the sub-class table could get out of hand pretty quickly.
Fortunately, you can also use something known as a bridging table (also known as an associative entity). Essentially, a bridging table holds two foreign keys, each referencing the primary key of another table, which solves the problem of a many-to-many relationship.
You would set this up by having a new table that associates which sub-classes belong with which super-classes:
ID | Sub-class ID | Super-class ID |
-------------------------------------
1 | 1 | 1 |
2 | 1 | 2 |
3 | 2 | 1 |
4 | 2 | 3 |
5 | 3 | 1 |
6 | 4 | 1 |
7 | 4 | 3 |
Note that there are 'duplicates' in both the sub-class ID and super-class ID fields, yet no duplicates in the ID field. This is because the bridging table has unique IDs, which it uses to make independent associations. Sub-class 1 (Databases) needs to be associated to two different groups (IT Specialist and Networking Specialist). Thus, two different associations need to be formed.
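The bridging table above could be sketched like this with Python's built-in sqlite3 module. The table names (super_class, sub_class, class_link) and the helper sub_classes_of are invented for illustration; the rows are the ones from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE super_class (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sub_class   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Bridging table: one row per (sub-class, super-class) association.
CREATE TABLE class_link (
    id             INTEGER PRIMARY KEY,
    sub_class_id   INTEGER NOT NULL REFERENCES sub_class(id),
    super_class_id INTEGER NOT NULL REFERENCES super_class(id),
    UNIQUE (sub_class_id, super_class_id)
);

INSERT INTO super_class VALUES
    (1, 'IT Specialist'), (2, 'Networking Specialist'), (3, 'Digital Specialist');
INSERT INTO sub_class VALUES
    (1, 'Databases'), (2, 'Java'), (3, 'UNIX'), (4, 'PHP');
INSERT INTO class_link (id, sub_class_id, super_class_id) VALUES
    (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 3), (5, 3, 1), (6, 4, 1), (7, 4, 3);
""")


def sub_classes_of(super_name):
    """List the sub-class names linked to one super-class."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM sub_class AS s
        JOIN class_link AS l ON l.sub_class_id = s.id
        JOIN super_class AS p ON p.id = l.super_class_id
        WHERE p.name = ?
        ORDER BY s.id
        """,
        (super_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

The UNIQUE constraint is an extra that the tables above do not show; it stops the same association from being stored twice.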
Both approaches above give the same 'result'. The only real difference here is that a bridging table will give you more rows, while setting multiple flags will give you more columns. Obviously, the way in which you craft your query will be different as well.
Which of the two approaches you choose to go with really depends on how much data you're dealing with, and how much scope the database is going to have for expansion in the future :)
Hope this helps! :)

How to store country and state in sql server

I'm creating a video website that is similar to YouTube, except it targets the indie gaming community.
I'm working on the table design and have run into a bit of a stumbling block with the location column.
How do major sites design tables for storing location?
Profile table:
ID | username | country | state
0 | jack | US | New York
1 | ted | Canada | Alberta
OR
ID | username | countryID
0 | jack | 1
1 | ted | 2
Regions table:
ID | country | state
0 | United States | Texas
1 | United States | New York
2 | Canada | Alberta
Or is there some other design I missed?
And what about :
profile Table
ID | username | stateID
0 | jack | 1
1 | ted | 2
states table
ID | countryID | state
0 | 0 | Texas
1 | 0 | New York
2 | 1 | Alberta
countries table
ID | country
0 | United States
1 | Canada
I have no idea how "big" websites handle their data, but I think this is a matter of preference and business requirements. In the first case the table isn't properly normalized, as the state depends on the country; in the other case the model is [almost] properly normalized (the country could be moved to another table). The first option can be faster for lookups, but because it breaks the normalized relational model it can lead to anomalies when inserting or updating data, as well as additional storage. Personally I would choose the second option, and maybe denormalize it for analytics processing if needed; that very much depends on the amount of data you expect to handle.
A normalized model would look something like:
profile (**username**, state)
states (**state**, country)
countries (**country**)
The example above doesn't use surrogate keys and only illustrates the model. A database implementation of the model would often use surrogate keys such as UserID, StateID and CountryID, although in a properly normalized model they aren't strictly needed, because username, state and country are candidate keys and can serve as primary keys.
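A minimal sketch of the surrogate-key variant, using Python's built-in sqlite3 module (the question is about SQL Server, so treat the DDL as illustrative only). The column names and the helper location_of are assumptions; the rows follow the last layout in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE countries (
    id      INTEGER PRIMARY KEY,
    country TEXT NOT NULL UNIQUE
);
CREATE TABLE states (
    id         INTEGER PRIMARY KEY,
    country_id INTEGER NOT NULL REFERENCES countries(id),
    state      TEXT NOT NULL,
    UNIQUE (country_id, state)
);
CREATE TABLE profile (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    state_id INTEGER NOT NULL REFERENCES states(id)
);
INSERT INTO countries VALUES (0, 'United States'), (1, 'Canada');
INSERT INTO states VALUES (0, 0, 'Texas'), (1, 0, 'New York'), (2, 1, 'Alberta');
INSERT INTO profile VALUES (0, 'jack', 1), (1, 'ted', 2);
""")


def location_of(username):
    """Return (country, state) for a user, or None if the user is unknown."""
    return conn.execute(
        """
        SELECT c.country, s.state
        FROM profile AS p
        JOIN states AS s ON s.id = p.state_id
        JOIN countries AS c ON c.id = s.country_id
        WHERE p.username = ?
        """,
        (username,),
    ).fetchone()
```

The profile row stores only a state reference; the country is derived through the states table, so it cannot disagree with the state.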

Database Design - Drop Down Input Box Issue

I'm trying to create a friendship site. The issue I'm having is when a user joins a website they have to fill out a form. This form has many fixed drop down items the user must fill out. Here is an example of one of the drop downs.
Drop Down (Favorite Pets)
Items in Favorite Pets
1. Dog
2. Cat
3. Bird
4. Hamster
What is the best way to store this info in a database? Right now the profile table has a column for each fixed drop down. Is this correct database design? See example:
User ID | Age | Country | Favorite Pet | Favorite Season
--------------------------------------------------------------
1 | 29 | United States | Bird | Summer
Is this the correct database design? Right now I have probably 30+ columns. Most of the columns are fixed because they are drop downs and the user has to pick one of the options.
What's the correct approach to this problem?
P.S. I also thought about creating a table for each drop down, but this would really complicate the queries and lead to lots of tables.
Another approach
Profile table
ID | username | age
-------------------
1 | jason | 27
profileDropDown table:
ID | userID | dropdownID
------------------------
1 | 1 | 2
2 | 1 | 7
Drop Down table:
ID | dropdown | option
---------------------
1 | pet | Bird
2 | pet | Cat
3 | pet | Dog
4 | pet | Hamster
5 | season | Winter
6 | season | Summer
7 | season | Fall
8 | season | Spring
"Best way to approach" or "correct way" will open up a lot of discussion here, which risks this question being closed. I would recommend creating a drop down table that has a column called "TYPE" or "NAME". You would then put a unique identifier of the drop down in that column to identify that set. Then have another column called "VALUE" that holds the drop down value.
For example:
ID | TYPE | VALUE
1 | PET | BIRD
2 | PET | DOG
3 | PET | FISH
4 | SEASON | FALL
5 | SEASON | WINTER
6 | SEASON | SPRING
7 | SEASON | SUMMER
Then to get your PET drop down, you just select all from this table where type = 'PET'
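Here is a minimal sketch of that lookup with Python's built-in sqlite3 module. The table name dropdown_option and the helper options_for are made up for illustration; the rows are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dropdown_option (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL,
    value TEXT NOT NULL,
    UNIQUE (type, value)
);
INSERT INTO dropdown_option VALUES
    (1, 'PET', 'BIRD'), (2, 'PET', 'DOG'), (3, 'PET', 'FISH'),
    (4, 'SEASON', 'FALL'), (5, 'SEASON', 'WINTER'),
    (6, 'SEASON', 'SPRING'), (7, 'SEASON', 'SUMMER');
""")


def options_for(dropdown_type):
    """Return the values of one drop down, in id order."""
    rows = conn.execute(
        "SELECT value FROM dropdown_option WHERE type = ? ORDER BY id",
        (dropdown_type,),
    ).fetchall()
    return [value for (value,) in rows]
```

Adding a new drop down then means inserting rows with a new TYPE, with no schema change.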
Will the set of questions (dropdowns) to be asked every user ever be changed? Will you (or your successor) ever need to add or remove questions over time? If no, then a table for users with one column per question is fine, but if yes, it gets complex.
Database purists would require two tables for each question:
One table containing a list of all valid answers for that question
One table containing the many to many relation between user and answer to “this” question
If a new question is added, create new tables; if a question is removed, drop those tables (and, of course, adjust all your code. Ugh.) This would work, but it's hardly efficient.
If, as seems likely, all the questions and answer sets are similar, then a three-table model suggests itself:
A table with one row per question (QuestionId, QuestionText)
A table with one row for each answer for each Question (QuestionId, AnswerId, AnswerText)
A table with one row for each user-answered question (UserId, QuestionId, AnswerId)
Adding and removing questions is straightforward, as is identifying skipped or unanswered questions (such as, if you add a new question a month after going live).
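The three-table model could look roughly like this, sketched with Python's built-in sqlite3 module. The table names (question, answer, user_answer), the sample rows and the helper unanswered are assumptions for illustration; only the column lists come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (
    question_id   INTEGER PRIMARY KEY,
    question_text TEXT NOT NULL
);
CREATE TABLE answer (
    question_id INTEGER NOT NULL REFERENCES question(question_id),
    answer_id   INTEGER NOT NULL,
    answer_text TEXT NOT NULL,
    PRIMARY KEY (question_id, answer_id)
);
CREATE TABLE user_answer (
    user_id     INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    answer_id   INTEGER NOT NULL,
    PRIMARY KEY (user_id, question_id),       -- one answer per question
    FOREIGN KEY (question_id, answer_id)
        REFERENCES answer(question_id, answer_id)
);
-- Hypothetical sample data.
INSERT INTO question VALUES (1, 'Favorite Pet'), (2, 'Favorite Season');
INSERT INTO answer VALUES
    (1, 1, 'Dog'), (1, 2, 'Cat'), (1, 3, 'Bird'),
    (2, 1, 'Summer'), (2, 2, 'Winter');
INSERT INTO user_answer VALUES (1, 1, 3);     -- user 1 picked Bird
""")


def unanswered(user_id):
    """List the questions a user has not answered yet."""
    rows = conn.execute(
        """
        SELECT q.question_text
        FROM question AS q
        LEFT JOIN user_answer AS ua
               ON ua.question_id = q.question_id AND ua.user_id = ?
        WHERE ua.user_id IS NULL
        ORDER BY q.question_id
        """,
        (user_id,),
    ).fetchall()
    return [text for (text,) in rows]
```

The LEFT JOIN is what makes skipped questions easy to find: a question added after launch simply shows up as unanswered for every existing user.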
As with most everything, there’s a whole lot of “it depends” behind this, most of which depends on what you want your system to do.

Friendship Website Database Design

I'm trying to create a database for a friendship website I'm building. I want to store multiple attributes about the user, such as gender, education, pets, etc.
Solution #1 - User table:
id | age | birth day | City | Gender | Education | fav Pet | fav hobby ...
--------------------------------------------------------------------------
0 | 38 | 1985 | New York | Female | University | Dog | Ping Pong
The problem I'm having is the list of attributes goes on and on and right now my user table has 20 something columns.
I feel I could normalize this by creating another table for each attribute (see below). However, this would create many joins, and I'm still left with a lot of columns in the user table.
Solution #2 - User table:
id | age | birth day | City | Gender | Education | fav Pet | fav hobbies
--------------------------------------------------------------------------
0 | 38 | 1985 | New York | 0 | 0 | 0 | 0
Pets table:
id | Pet Type
---------------
0 | Dog
Does anyone have any ideas on how to approach this problem? It feels like both solutions are wrong. What is the proper table design for this database?
There is more to this than meets the eye. First of all, if you have tons of attributes, many of which will likely be null for any specific row, and a very dynamic selection of attributes (i.e. new attributes will appear quite frequently during the code's lifecycle), you might want to ask yourself whether an RDBMS is the best way to materialize this essentially schema-less data. Maybe a document store would be a better fit?
If you do want to stay in the RDBMS world, the canonical answer is to have either one property table or one per datatype, plus a table of property types:
Users.id | .name | .birthdate | .Gender | .someotherfixedattribute
----------------------------------------------------------
1743 | Me. | 01/01/1970 | M | indeed
Propertytypes.id | .name
------------------------
234 | pet
235 | hobby
Properties.uid | .pid | .content
-----------------------------
1743 | 234 | Husky dog
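A minimal sketch of these three tables with Python's built-in sqlite3 module. The snake_case table names and the helper properties_of are assumptions for illustration, the "someotherfixedattribute" column is left out, and the birthdate is stored as ISO text, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    birthdate TEXT,                 -- ISO date text; format is an assumption
    gender    TEXT
);
CREATE TABLE property_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE properties (
    uid     INTEGER NOT NULL REFERENCES users(id),
    pid     INTEGER NOT NULL REFERENCES property_types(id),
    content TEXT NOT NULL
);
INSERT INTO users VALUES (1743, 'Me.', '1970-01-01', 'M');
INSERT INTO property_types VALUES (234, 'pet'), (235, 'hobby');
INSERT INTO properties VALUES (1743, 234, 'Husky dog');
""")


def properties_of(user_id):
    """Return one user's dynamic properties as a {type name: content} dict."""
    rows = conn.execute(
        """
        SELECT t.name, p.content
        FROM properties AS p
        JOIN property_types AS t ON t.id = p.pid
        WHERE p.uid = ?
        """,
        (user_id,),
    ).fetchall()
    return dict(rows)
```

Fixed attributes stay as ordinary columns on users, while anything dynamic becomes a row in properties, so a new attribute only needs a new property_types row.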
You have a comment and an answer that recommend (or at least suggest) an Entity-Attribute-Value (EAV) model.
There is nothing wrong with using EAV if your attributes need to be dynamic, and your system needs to allow adding new attributes post-deployment.
That said, if your columns and relationships are all known up front, and they don't need to be dynamic, you are much better off creating an explicit model. It will (generally) perform better and will be much easier to maintain.
Instead of a wide table with a field per attribute, or many attribute tables, you could make a skinny table with many rows, something like:
Attributes (id,user_id,attribute_type,attribute_value)
Ultimately the best solution depends greatly on how the data will be used. People can only have one DOB, but maybe you want to allow for multiple addresses (billing/mailing/etc.), so addresses might deserve a separate table.

Resources