I have this table and I don't know how to normalize.
1st table name, address, landline no., mobile no., e-mail ad, amount, and registration number https://docs.google.com/file/d/0B99TeByt30n2dDVLVHV4dU1yRFE/edit
and in the second table will be their monthly pledges. https://docs.google.com/file/d/0B99TeByt30n2TVh4c1dmLTFYOWs/edit
PersonTable
PersonId int primary key
Name varchar(50)
Addresss varchar(50)
etc.
PledgeTable
PledgeId int primary key,
PersonId int foreign key references PersonTable(PersonId)
PledgeDate datetime
PledgeAmount decimal(10,2)
Ok.
You have to re engineer the whole idea.
I will give you a few concepts to understand what to do.
Keep in mind that we must NOT to have data duplicates.
So an array can be named Customer.
Customer can have a customer id column, named c_id for instance.
c_id must be defined as integer UNIQUE Auto_Increment NOT NULL
That means that it's an attribute that grants each customer its uniqueness.
A customer might have same name and same surname with another customer, but never same c_id.
Auto increment means that the first entry will be automatically numbered with c_id = 1, the next 2 etc etc.
Keep in mind that if you pass a record in the database using PHP, you enter null as value for c_id. Auto Incremet does the job then.
So you got a customer table and its first attribute that will be defined as the Primary key.
For example here is a small table:
CREATE TABLE customer
(c_id integer UNIQUE Auto_Increment NOT NULL,
c_name varchar(100),
c_surname varchar(100),
c_address varchar(100),
PRIMARY KEY (c_id)
);
You have to add all the attributes the customer has in the table.
That was just an example.
PRIMARY KEY (c_id) line in the end,
sets c_id as the primary key that distinguishes each record as unique.
Then you got another table.
A Pledge table.
CREATE TABLE pledge
(pl_id integer UNIQUE Auto_Increment NOT NULL,
pl_date date,
pl_price double(9,2),
pl_c_id integer,
PRIMARY KEY (pl_id),
FOREIGN KEY (pl_c_id) REFERENCES customer (c_id));
What is new here?
Line:
pl_c_id integer,
and line:
FOREIGN KEY (pl_c_id) REFERENCES customer (c_id)
What happens here, is that you create a column, that will contain an existing c_id!
That way, you make a reference to a customer, by using his/her UNIQUE c_id.
This is defined as integer, so it can fit the key.
pl_c_id MUST be the same type as the primary key of the other table. Ok?
Also SQL will know what key we refer to, by reading the line:
FOREIGN KEY (pl_c_id) REFERENCES customer (c_id)
In plain English that means:
pl_c_id can be filled with values already declared in a Primary Key of another Table, named "customer" that uses as primary key the column named "c_id".
Got it?
Now you got the flexibility to use ANY date.
That is more normalized than what you got.
Usually a 3NF (Third Normal Form) you will be ok.
Oh! Not to mention that you can Google "3NF" or "Third Normal Form" and get nice results for your reference.
;)
Cheers.
Edit: Made this reply more simple to understand.
Related
I am talking about the normalization of a primary key. So let's say my primary key column is of type nvarchar, which violates the rules of normalization. After removing the primary key constraint and the identity specification from the desired column. I need to create a new column which will be the new primary key of that table.
My question is, what should happen with the previous primary key?
I've got an answer that sounds like: "the column should became a semantic key", but i can't understand this answer.
It's not unusual when designing a database schema to use a SURROGATE primary key. The idea is to give each record a unique and permanent identifier so it can be easily referenced by applications and foreign keys. This key has no meaning. Knowing the surrogate key gives you no information about the content of the record. The user of your application would never see this value.
On the other hand, your record may have a SEMANTIC primary key. This is a unique value that identifies this data to that makes sense to the user.
For example, let's say you have a table of Employees. The employer assigns each employee a unique Employee ID Number. Let's say you store this value as a string. To the user that value serves as the unique identifier that refers to that employee. Meanwhile, your table may have a numeric column that serves as the unique identifier for that record.
create table Employee ( EmployeeRecordID int identity(1,1) primary key,
EmployerAssignedID nvarchar(12),
EmployeeName nvarchar(60),
Salary money )
insert into Employee ( EmployerAssignedID, EmployeeName, Salary ) values
( '#ABC100', 'Fred', 25000.12 ),
( '#AZZ314', 'Mary', 37700.00 ),
( '#MAA719', 'Fran', 34444.04 ),
( '#MZA977', 'Mary', 36000.00 )
As each record is added, SQL Server generates a unique EmployeeRecordID for each record, starting with 1. This is the SURROGATE key. Within your database and within your application, you would use this value to reference the record.
But when your application is communicating with the users, you would use the EmployerAssignedID. This is the SEMANTIC primary key. It makes sense to your users to use this value to search for a particular employee.
A primary key is no more than a unique index which can't have NULL value as a key. Like any of indexes it can be clustered or nonclustered.
Deleting a clustered index makes table become a heap with changes in structure and behaviour. Deleting a nonclustered index is just deallocation its space and does not affect that table and other indexes on the table as well.
So after deleting you just have a column(s) with unique values and you are able to consider them as a semantic key until some duplicate values are inserted.
Suppose there is a table called Accounts:
CREATE TABLE Accounts
(
[Id] int not null primary key identity(1,1)
[Username] varchar(20) not null unique,
[Password] varchar(20) not null
)
Then, there is another tabled called Characters. Each account can have N characters. So I can use a foreign key to link these characters.
CREATE TABLE Characters
(
[AccountId] int not null foreign key references Accounts([Id]),
[Id] int not null primary key identity(1,1),
[Nickname] varchar(20) not null unique,
[Level] int not null default 0,
)
Each character can have multiple equipments (inventory), so there is a Equipments table.
Since each equipment is linked to a character, I should use foreign key again, and there comes the problem.
Me and my coworker were arguing about which foreign key to use.
Since each character has a unique Id, I told him that we could use foreign key to that Id and that would be enough. As follows:
CREATE TABLE Equipments
(
[CharId] int not null foreign key references Characters([Id]),
[ItemId] int not null
)
He told me that we must use a foreign key to the character id AND the account id, as follows:
CREATE TABLE Equipments
(
[AccountId] int not null foreign key references Accounts([Id]), /*is this necessary?*/
[CharId] int not null foreign key references Characters([Id]),
[ItemId] int not null
)
I'm not expert in Sql Server and in my opinion, the foreign key to the account id is completely unecessary but he keeps telling me that we must use it and it will help performance because the more foreign key you use, it will be better.
So, should I use foreign key to account id and character id or character id is good enough?
As you said, there is a one-to-many relationship between Account and Character (and hence, a character cannot belong to more than one account).
Similarly, as you described, each record in Equipments only corresponse to a unique record in Characters. The relation from Account to Equipments hence can be inferred, and so, there is no need to create an extra column in the Equipments table. Also, the data integrity is preserved just by the two foreign keys already created, so that should not be a problem when you go without the AccountId column in the Equipments table.
Regarding the performance argument, this is a case-by-case situation, and it depends on a lot of other things (number of records, business logic,...). Having unnecessary foreign key can even hurt performance since the database/server will need to maintain that foreign key while operate. Also, I found that if you do not have the key and when you find out that you need it, it is easier to add one in than to remove an existing one, especially when you have to create a whole new column for this one (this last piece is a mere personal opinion).
You should use it only if you plan to interrogate equipment directly for an account which is faster than joining with account via char. Otherwise, no, you shouldn't use it.
You are correct, but for a more important reason.
If you include Accountid in the Equipments table, then you have a second relationship to the Accounts table. Perhaps this is allowed, but in all likelihood, you intend to have the Characters.AccountId be the account id for a row in Equipments.
You would then get the appropriate account id by using a join to the Equipments table.
I'm having problems with adding another primary key to my table.
I have 3 columns:
Account ID (Identity)
EmailID
Data field
When I made the table I had this to make the Account ID and the Email ID unique.
PRIMARY KEY (AccountID, EmailID)
I thought that would make my emailid unique, but then after I tried inserting another row with the same emailid it went through.
So I thought I missed something out.
Now for my question:
IF, I had to use alter, how do I alter the table/PK Constraint to modify the EmailID field and make it Unique?
IF I decided to drop the table and made a new one, how do I make those two primary keys unique?
You may ALTER the table and add a new UNIQUE CONSTRAINT on the EmailID column.
-- This will create a constraint which enforces that the field EmailID
-- have unique values
ALTER TABLE Your_Table_Name
ADD CONSTRAINT unique_constraint_name UNIQUE (EmailID)
It's worth noting though, that altering the table to add this new unique constraint doesn't mean that you have to drop the other PRIMARY KEY constraint that you have added for the (AccountID, EmailID) pair. That is, of course, unless your business logic dictates it.
When you make the grouping of (AcountID, EmailID) the PRIMARY KEY it specifies that both the AcountID and EmailID participate in uniquely identifying each individual record in that table. So, that means that you could have the following records in the table:
AccountID | EmailID | Other Fields
----------------------------------------------------------
100 | user#company.com | ....
101 | user2#othermail.com | ....
100 | user_alternate#mail.com | ....
In the previous example it is possible to have two records with the same AccountID, and that is valid because the PRIMARY KEY specifies that only the (AccountID, EmailID) pair has to be unique - which it is. It makes no stipulation about AccountID being unique independently.
In conclusion, you probably want to add yet another UNIQUE constraint on AccountID. Or simply make the AccountID alone the PRIMARY KEY and then add a UNIQUE constraint on EmailID.
If both AccountID and EmailID are candidate keys then only one can be the PK the other one will need a unique constraint.
From the POV of SQL Server it doesn't matter which one you choose as the PK. Foreign Key's can reference either the PK or a unique constraint but given that the PK is the clustered index by default it probably makes sense to choose AccountID as this is presumably narrower and more stable.
It sounds like an incorrect Primary key. It's more likely that emailID is intended to be your natural key but for some reason (maybe a development standard in your organization?) you want to use a surrogate ID, AccountID but you still intend for both email ID and surrogate ID to both be unique and have a one to one relationship. If this is true then your primary key should be AccountID and you should place a unique constraint on EmailID.
If you were to recreate the table, it could look like this. I assumed EmailID was referencing an email table instead of being an email address.
CREATE TABLE dbo.AccountEmails
(
AccountID int not null identity(1,1),
EmailID int not null,
Data varchar(max) null,
constraint PK_AccountEmails PRIMARY KEY //this is a unique single column primary key
(
AccountID
),
constraint FK_AccountEmails_EmailID FOREIGN KEY dbo.Email(EmailID) ON //this makes sure EmailID exists in the Email table
(
EmailID
),
constraint UQ_AccountEmails_EmailID UNIQUE //unique single column unique constraint
(
EmailID
),
constraint UQ_AccountEmails_AccountID_EmailID UNIQUE //the combination of AccountID and EmailID is also unique
(
AccountID,
EmailID
)
)
Given the fact that AccountID and EmailID are both seperately unique, I'm not sure UQ_AccountEmails_AccountID_EmailID is really necessary.
My teach said I should combine two foreign keys into a single primary key. But my thought process is that that would allow for only one combination of each foreign key.
Imagine I have a Product, Purchase, PurchaseDetail.
In PurchaseDetail I have two foreign keys, one for product and one for purchase. My teacher said that I should combine these two foreign keys into a single one. But can't a product be in many different purchases? And many purchases have many products?
I'm confused.
Thanks!
Edit: This is the SQL my teacher saw and then gave feedback upon. Thanks for the guidance guys. (I changed the essential to English)
create table Purchase
(
ID int primary key identity(1,1),
IDCliente int foreign key references Cliente(ID),
IDEmpleado int foreign key references Empleado(ID),
Fecha datetime not null,
Hora datetime not null,
Amount float not null,
)
create table PurchaseDetail
(
ID int primary key identity(1,1),
IDPurchase int foreign key references Purchase(ID),
IDProductOffering int foreign key references ProductOffering(ID),
Quantity int not null
)
create table Product
(
ID int primary key identity(1,1),
IDProveedor int foreign key references Proveedor(ID),
Nombre nvarchar(256) not null,
IDSubcategoria int foreign key references Subcategoria(ID),
IDMarca int foreign key references Marca(ID),
Fotografia nvarchar(1024) not null
)
create table ProductOffering
(
ID int primary key identity(1,1),
IDProduct int foreign key references Product(ID),
Price float not null,
OfferDate datetime not null,
)
Maybe I'm confused about good database schema design. Thanks again!
I imagine he's suggesting:
Product - one primary key (product id), which implies a unique product id
Purchase - one primary key (purchase id), which implies a unique purchase id
PurchaseDetail - two foreign keys (product id),(purchase id), plus one unique constraint on (product id + purchase id)
Plus some people argue that all tables should have their own primary key that doesn't depend on anything else (purchase detail id). Some DBMS make this mandatory.
This means that you can't have two rows in PurchaseDetail that have the same product and purchase. That makes sense, assuming there is also a quantity column on PurchaseDetail, so that one purchase can have more than one of each product.
Note that there is a difference between a unique constraint and a foreign key. A foreign key merely says that there should be an item with that id in the parent table - it will let you create as many references to that item as you want in the child table. You need to specify that the column or combination of columns are unique if you want to avoid duplicates. A primary key on the other hand implies a unique constraint.
Exact syntax for defining all of this varies by language, but those are the principles.
I don't agree with the single key, but they could be a compound key (which I tend to dislike). They can be two different fields each restricted to the ID in the corresponding tables.
Not sure why the same product iD would need to be listed more than once for a single purchase? Isn't that why you indicate quantity? Maybe the need to do a separate line item for a purchase and a discount?
I believe thelem has answered correctly. But there is another option. You could add a new primary key column to the details table, so it looks like this:
detail_id int (PK)
product_id int (FK)
purchsae_id int (FK)
This is not really necessary, but it could be useful if you need to ever need to reference the details table as a foreign key - having a single primary key field makes for smaller indexes and foreign key reference (and they are a little easier to type).
That depends on what data you need to represent.
If you use the two foreign keys as the primary key for the purchase detail, a product may only occur once in each purchase. A purchase may however still contain many products, and a product may still occur in many purchases.
If the purchase detail contains more information, you may need to be able to use a product more than once in a purchase. For example if the purchase detail contains size and color, and you want to by a red T-shirt size XL and a blue T-shirt size S.
Perhaps he is suggesting a many-to-many table where it's Primary Key is comprised of the Foreign Keys to the mapped tables:
PurchaseDetail:
ProductId int (FK)
PurchaseId int (FK)
PK(ProductId, PurchaseId)
This can also be modelled as
PurchaseDetail:
PurchaseDetailId int (PK, Identity)
ProductId int (FK)
PurchaseId int (FK)
The second form is useful if you want to refer to Purchase details elsewhere in your model, and also in some RDBMS's it is beneficial to have a PK on a montonically increasing integer.
I'm thinking of designing a database schema similar to the following:
Person (
PersonID int primary key,
PrimaryAddressID int not null,
...
)
Address (
AddressID int primary key,
PersonID int not null,
...
)
Person.PrimaryAddressID and Address.PersonID would be foreign keys to the corresponding tables.
The obvious problem is that it's impossible to insert anything into either table. Is there any way to design a working schema that enforces every Person having a primary address?
"I believe this is impossible. You cannot create an Address Record until you know the ID of the person and you cannot insert the person record until you know an AddressId for the PrimaryAddressId field."
On the face of it, that claim seems SO appealing. However, it is quite propostrous.
This is a very common kind of problem that the SQL DBMS vendors have been trying to attack for perhaps decades already.
The key is that all constraint checking must be "deferred" until both inserts are done. That can be achieved under different forms. Database transactions may offer the possibility to do something like "SET deferred constraint checking ON", and you're done (were it not for the fact that in this particular example, you'd likely have to mess very hard with your design in order to be able to just DEFINE the two FK constraints, because one of them simply ISN'T a 'true' FK in the SQL sense !).
Trigger-based solutions as described here achieve essentially the same effect, but those are exposed to all the maintenance problems that exist with application-enforced integrity.
In their work, Chris Date & Hugh Darwen describe what is imo the true solution to the problem : multiple assignment. That is, essentially, the possibility to compose several distinct update statements and have the DBMS act upon it as if that were one single statement. Implementations of that concept do exist, but you won't find any that talks SQL.
This is a perfect example of many-to-many relationship. To resolve that you should have intermediate PERSON_ADDRESS table. In other words;
PERSON table
person_id (PK)
ADDRESS table
address_id (PK)
PERSON_ADDRESS
person_id (FK) <= PERSON
address_id (FK) <= ADDRESS
is_primary (BOOLEAN - Y/N)
This way you can assign multiple addresses to a PERSON and also reuse ADDRESS records in multiple PERSONs (for family members, employees of the same company etc.). Using is_primary field in PERSON_ADDRESS table, you can identify if that person_addrees combination is a primary address for a person.
We mark the primary address in our address table and then have triggers that enforces only record per person can have it (but one record must have it). If you change the primary address, it will update the old primary address as well as the new one. If you delete a primary address and other addresses exist, it will promote one of them (basesd ona series of rules) to the primary address. If the address is inserted and is the first address inserted, it will mark that one automatically as the primary address.
The second FK (PersonId from Address to Person) is too restrictive, IMHO. Are you suggesting that one address can only have a single person?
From your design, it seems that an address can apply to only one person, so just use the PersonID as the key to the address table, and drop the AddressID key field.
I know I'll probably be crucified or whatever, but here goes...
I've done it like this for my "particular very own unique and non-standard" business need ( =( God I'm starting to sound like SQL DDL even when I speak).
Here's an exaxmple:
CREATE TABLE IF NOT EXISTS PERSON(
ID INT,
CONSTRAINT PRIMARY KEY (ID),
ADDRESS_ID INT NOT NULL DEFAULT 1,
DESCRIPTION VARCHAR(255),
CONSTRAINT PERSON_UQ UNIQUE KEY (ADDRESS_ID, ...));
INSERT INTO PERSON(ID, DESCRIPTION)
VALUES (1, 'GOVERNMENT');
CREATE TABLE IF NOT EXISTS ADDRESS(
ID INT,
CONSTRAINT PRIMARY KEY (ID),
PERSON_ID INT NOT NULL DEFAULT 1,
DESCRIPTION VARCHAR(255),
CONSTRAINT ADDRESS_UQ UNIQUE KEY (PERSON_ID, ...),
CONSTRAINT ADDRESS_PERSON_FK FOREIGN KEY (PERSON_ID) REFERENCES PERSON(ID));
INSERT INTO ADDRESS(ID, DESCRIPTION)
VALUES (1, 'ABANDONED HOUSE AT THIS ADDRESS');
ALTER TABLE PERSON ADD CONSTRAINT PERSON_ADDRESS_FK FOREIGN KEY (ADDRESS_ID) REFERENCES ADDRESS(ID);
<...life goes on... whether you provide and address or not to the person and vice versa>
I defined one table, then the other table referencing the first and then altered the first to reflect the reference to the second (which didn't exist at the time of the first table's creation). It's not meant for a particular database; if I need it I just try it and if it works then I use it, if not then I try to avoid having that need in the design (I can't always control that, sometimes the design is handed to me as-is). if you have an address without a person then it belongs to the "government" person. If you have a "homeless person" then it gets the "abandoned house" address. I run a process to determine which houses have no users