My boss has assigned a SQL task to me and as I am new to SQL, I am struggling to know where to start.
Task: Create a Customer table to hold the data written in the #Customer temporary table in the PopulateCustomers stored procedure. This table will also need to have a unique id to ensure multiple instances of the populate functionality can be run concurrently.
I know how to create a table in SQL, and I am guessing I can look in the PopulateCustomers stored procedure to see what data is written to the #Customer temp table, so I know which columns the Customer table needs.
But what I am really struggling with is the concept of a unique ID for a database table. I immediately thought of a primary key for each row in the table, to which my boss responded no. I didn't want to push for more so as not to come across as a newbie.
I have tried to google this myself, and all I keep finding is pages about identifiers vs primary keys. Nothing tells me about a table having its own unique ID, except in reference to the rows within the table each having an identifier or primary key. This leads me to think that I am not searching for the right keyword for this functionality.
The closest thing I found was here. http://sqlservercodebook.blogspot.com/2008/03/check-if-temporary-table-exists.html
This query looks to me like it's creating a temp table with an id.
CREATE TABLE #temp(id INT)
I have not pasted any of my work queries because I really want to research myself and figure this out. I just want to make sure I am looking in the right direction with what term I need to search for to find out how to create a table that has a unique ID. Or maybe I have misinterpreted the task and there is no such thing.
What I got from your story is that you need a table with a unique, automatically generated id, and that this id should be the primary key.
This table can be created like:
create table example
(
id int identity(1,1) primary key clustered,
other_data varchar(200)
)
The key terms here are:
identity - makes the id column auto-incremented
primary key - makes SQL Server ensure this column is unique
clustered - makes SQL Server physically organize the table's data by this column (which makes searches on it faster)
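To see the auto-generated id in action without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite's AUTOINCREMENT as a rough stand-in for IDENTITY(1,1); SQLite has no clustered keyword, so that part is not shown.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (an assumption for the demo);
# AUTOINCREMENT plays the role of IDENTITY(1,1).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " other_data VARCHAR(200))"
)

# The id column is never supplied; the database assigns it.
conn.execute("INSERT INTO example (other_data) VALUES ('first')")
conn.execute("INSERT INTO example (other_data) VALUES ('second')")
rows = conn.execute("SELECT id, other_data FROM example ORDER BY id").fetchall()

# The primary key constraint rejects a second row with an existing id.
try:
    conn.execute("INSERT INTO example (id, other_data) VALUES (1, 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```

The ids come out as 1 and 2 without the insert statements mentioning them, and the duplicate id is refused.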
Imagine a social network with a table that stores the like (favorite) action; an unlike deletes the row from this table:
CREATE TABLE IF NOT EXISTS post_likes(
post_id timeuuid,
liker_id uuid, //liker user_id
like_time timestamp,
PRIMARY KEY ((post_id) ,liker_id, like_time)
) WITH CLUSTERING ORDER BY (like_time DESC);
The above table has a problem in Cassandra: because liker_id is the first clustering key, we can't sort by the second clustering key, like_time.
We need to sort the table's data by like_time: when a user wants to see who liked a post, we show the list of people who liked it, sorted by time (like_time DESC).
We also need to delete (unlike), and for that we need post_id and liker_id.
What is your suggestion? How can we sort this table by like_time?
After more research, I found this solution:
Picking the right data model is the hardest part of using Cassandra, and here is the solution we found for likes tables. First of all, Cassandra's read and write path is amazingly fast, so you don't need to worry about writing to several tables. You need to model around your queries, and remember that data duplication is okay: many of your tables may repeat the same data. Do not forget to spread data evenly around the cluster and to minimize the number of partitions read.
Since Cassandra is NoSQL, one of the rules is denormalization: we denormalize the data and think only about the queries we want to run. For the likes data model we will have two tables, each designed around a query we need:
CREATE TABLE IF NOT EXISTS post_likes(
post_id timeuuid,
liker_id uuid, //liker user_id
like_time timestamp,
PRIMARY KEY ((post_id) ,liker_id)
);
CREATE TABLE IF NOT EXISTS post_likes_by_time(
post_id timeuuid,
liker_id uuid, //liker user_id
like_time timestamp,
PRIMARY KEY ((post_id), like_time, liker_id)
) WITH CLUSTERING ORDER BY (like_time DESC);
When a user likes a post, we just insert into both of the above tables.
Why do we have the post_likes_by_time table?
In a social network you show the list of users who liked a post, commonly sorted by like_time DESC. To sort likes by time, like_time has to be a clustering key.
Then why do we have the post_likes table too?
In post_likes_by_time the first clustering key is like_time, but we also need to remove a single like, and we can't do that by liker_id alone on a table clustered by like_time. That is the reason we also have the post_likes table.
Why can't you have only one table and do both actions, sorting and removing, on it?
To delete one like from the post_likes table we need to provide liker_id (the user_id) and post_id together. In post_likes_by_time we need to sort by like_time, so like_time must be the first clustering key, and liker_id can be the second. Here is the point: because like_time is the first clustering key, selecting or deleting by liker_id also requires like_time, and most of the time you do not have like_time.
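One way to read the two-table design is sketched below in plain Python. The dict and set are hypothetical in-memory stand-ins for the two Cassandra tables (this is not driver code), and the function names are made up for illustration. The point it tries to show: unlike first reads like_time from post_likes, which is exactly the value needed to address the row in post_likes_by_time.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-ins for the two Cassandra tables:
# post_likes:         (post_id, liker_id) -> like_time
# post_likes_by_time: set of (post_id, like_time, liker_id) rows
post_likes = {}
post_likes_by_time = set()

def like(post_id, liker_id, like_time):
    """Write the like to both tables."""
    if (post_id, liker_id) in post_likes:
        return  # already liked; keep the two tables in step
    post_likes[(post_id, liker_id)] = like_time
    post_likes_by_time.add((post_id, like_time, liker_id))

def unlike(post_id, liker_id):
    """Look up like_time in post_likes, then delete from both tables."""
    like_time = post_likes.pop((post_id, liker_id), None)
    if like_time is not None:
        post_likes_by_time.discard((post_id, like_time, liker_id))

def likers_newest_first(post_id):
    """Rough equivalent of reading post_likes_by_time with like_time DESC."""
    entries = [(t, u) for (p, t, u) in post_likes_by_time if p == post_id]
    return [u for (t, u) in sorted(entries, reverse=True)]

like("post-1", "alice", datetime(2020, 1, 1, tzinfo=timezone.utc))
like("post-1", "bob", datetime(2020, 1, 2, tzinfo=timezone.utc))
like("post-1", "carol", datetime(2020, 1, 3, tzinfo=timezone.utc))
unlike("post-1", "bob")
result = likers_newest_first("post-1")
print(result)
```

In a real cluster the two writes are separate statements, so keeping the tables consistent (for example with a logged batch) is a design decision this sketch leaves out.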
I am making a Django web app and need help designing a table within the DB.
I need to insert employees into the table, each with a specific employee ID. Let's say there are three employees with the IDs 15039, 98443 and 29234. Would the employee ID be the primary key, or do I have to make some arbitrary column starting from 1 the primary key, with the employee ID as a standalone column?
In a sense, what I am asking is: if employees 15039, 98443 and 29234 were inserted into the table with the employee ID as the primary key, in which order would the DBMS store them?
You did not specify which database you will use, but most likely the primary key will be the clustered index, in which case the database will order the rows by that id.
Many argue you should always create an auto-increment artificial primary key, and that usually saves you a lot of pain in the long run.
However, if you know the value will always be unique and you won't ever need to change the value, you can opt to use it as the PK for the table.
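A small sketch of using the employee ID directly as the primary key, written with Python's sqlite3 module rather than Django models (an assumption made to keep it self-contained; the name column is hypothetical). Whatever the physical storage order is, SQL only guarantees the order of a result when you ask for it with ORDER BY, so the query does that explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# employee_id is the natural primary key; the name column is hypothetical.
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT)")
for emp_id in (15039, 98443, 29234):
    conn.execute(
        "INSERT INTO employee (employee_id, name) VALUES (?, ?)", (emp_id, "n/a")
    )

# Ask for the order explicitly instead of relying on storage order.
ordered_ids = [
    row[0]
    for row in conn.execute("SELECT employee_id FROM employee ORDER BY employee_id")
]

# The primary key keeps employee IDs unique.
try:
    conn.execute("INSERT INTO employee (employee_id, name) VALUES (15039, 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(ordered_ids, duplicate_rejected)
```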
I am trying to create an indexed view with a unique clustered index on it. My problem is how to generate a primary key within a select clause. E.g.
Create view ssrs.vMyView
with schemabinding
as
select firstname, lastname, other columns --example columns
from mytable
How can I generate a primary key for each row on the fly?
Update
The problem is that the view does NOT have a unique column or combination of columns, so I need to generate a unique id on the fly. Firstname and lastname are just examples. The base table does have a primary key.
Thanks in advance!
Once you've created this view, if you obeyed all the rules and requirements for an indexed view, you should be able to just create the clustered index (the first index on a view has to be unique and clustered) like this:
CREATE UNIQUE CLUSTERED INDEX cix_vMyView ON ssrs.vMyView(....)
You need to choose a good, valid clustering key - preferably according to the NUSE principle:
narrow
unique
static
ever-increasing
An INT IDENTITY would be perfect - or something like a BIGINT or a combination of INT and DATETIME.
Update: seeing that your base table doesn't even have a primary key (THAT's a much bigger problem you'll need to fix ASAP!! If it doesn't have a primary key, it's not a table), you could use something like ROW_NUMBER() in your view definition:
CREATE VIEW ssrs.vMyView
WITH SCHEMABINDING
AS
SELECT firstname, lastname,
ROW_NUMBER() OVER(ORDER BY Lastname, FirstName) AS 'ID'
FROM dbo.mytable
to give you an "artificial" unique, ever-increasing primary key.
(Update 2014-Apr-25: unfortunately, contrary to my belief at the time of posting this, this won't work, since you cannot create a clustered index on a view that contains a ranking function like ROW_NUMBER. Thanks to #jspaey for pointing that out. This makes it even more important to have a primary key on the base tables and include that in your view definition!)
But again: if your base table doesn't have a primary key - fix that first !!
Update #2: ok, so your base table(s) do have a primary key after all - then why isn't that part of your view definition? I would always include all the primary keys from all base tables in my views - only those PKs enable you to clearly identify rows from the base table, and they allow you to make your views updateable.
Pingpong, Marc is right that you need something that is unique to add a primary key. Remember that this does not need to be a single column, so if you have two columns that are unique together that would work perfectly well.
If no combination of columns is unique, you probably wish to rethink your view or even add columns so that there is something unique.
As a related note, remember that Enterprise Edition will take advantage of indexed views automatically. But outside of Enterprise Edition, you may need to explicitly tell the optimizer to use the index through the NOEXPAND hint. I wrote about that previously in On Indexes and Views.
Although I'm guilty of this crime, it seems to me there can't be any good reason for a table to not have an identity field primary key.
Pros:
- whether you want to or not, you can now uniquely identify every row in your table which previously you could not do
- you can't do sql replication without a primary key on your table
Cons:
- an extra 32 bits for each row of your table
Consider for example the case where you need to store user settings in a table in your database. You have a column for the setting name and a column for the setting value. No primary key is necessary, but having an integer identity column and using it as your primary key seems like a best practice for any table you ever create.
Are there other reasons besides size that every table shouldn't just have an integer identity field?
Sure, an example in a single-database solution is if you have a table of countries, it probably makes more sense to use the ISO 3166-1-alpha-2 country code as the primary key as this is an international standard, and makes queries much more readable (e.g. CountryCode = 'GB' as opposed to CountryCode = 28). A similar argument could be applied to ISO 4217 currency codes.
In a SQL Server database solution using replication, a UNIQUEIDENTIFIER key would make more sense as GUIDs are required for some types of replication (and also make it much easier to avoid key conflicts if there are multiple source databases!).
The most clear example of a table that doesn't need a surrogate key is a many-to-many relation:
CREATE TABLE Authorship (
author_id INT NOT NULL,
book_id INT NOT NULL,
PRIMARY KEY (author_id, book_id),
FOREIGN KEY (author_id) REFERENCES Authors (author_id),
FOREIGN KEY (book_id) REFERENCES Books (book_id)
);
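To check that the composite key really does the job of a surrogate here, the Authorship table can be tried out in SQLite through Python. This is only a sketch: the Authors and Books tables are reduced to their key columns, which is an assumption since the answer does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Authors (author_id INTEGER PRIMARY KEY);  -- reduced to the key
CREATE TABLE Books (book_id INTEGER PRIMARY KEY);      -- reduced to the key
CREATE TABLE Authorship (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors (author_id),
  FOREIGN KEY (book_id) REFERENCES Books (book_id)
);
INSERT INTO Authors VALUES (1), (2);
INSERT INTO Books VALUES (10);
INSERT INTO Authorship VALUES (1, 10), (2, 10);
""")

# The same (author, book) pair a second time violates the composite key.
try:
    conn.execute("INSERT INTO Authorship VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pair_count = conn.execute("SELECT COUNT(*) FROM Authorship").fetchone()[0]
print(duplicate_rejected, pair_count)
```

Two authors of the same book are fine; the same author-book pair twice is not, and no extra id column was needed to get that.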
I also prefer a natural key when I design a tagging system:
CREATE TABLE Tags (
tag VARCHAR(20) PRIMARY KEY
);
CREATE TABLE ArticlesTagged (
article_id INT NOT NULL,
tag VARCHAR(20) NOT NULL,
PRIMARY KEY (article_id, tag),
FOREIGN KEY (article_id) REFERENCES Articles (article_id),
FOREIGN KEY (tag) REFERENCES Tags (tag)
);
This has some advantages over using a surrogate "tag_id" key:
You can ensure tags are unique, without adding a superfluous UNIQUE constraint.
You prevent two distinct tags from having the exact same spelling.
Dependent tables that reference the tag already have the tag text; they don't need to join to Tags to get the text.
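The last point, reading tag text without a join, can be sketched like this in Python with SQLite. The Articles table is reduced to its key column and the sample tags are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Articles (article_id INTEGER PRIMARY KEY);  -- reduced to the key
CREATE TABLE Tags (tag VARCHAR(20) PRIMARY KEY);
CREATE TABLE ArticlesTagged (
  article_id INT NOT NULL,
  tag VARCHAR(20) NOT NULL,
  PRIMARY KEY (article_id, tag),
  FOREIGN KEY (article_id) REFERENCES Articles (article_id),
  FOREIGN KEY (tag) REFERENCES Tags (tag)
);
INSERT INTO Articles VALUES (1);
INSERT INTO Tags VALUES ('sql'), ('cassandra');
INSERT INTO ArticlesTagged VALUES (1, 'sql'), (1, 'cassandra');
""")

# The tag text is already in ArticlesTagged, so no join to Tags is needed.
tags_for_article = [
    row[0]
    for row in conn.execute(
        "SELECT tag FROM ArticlesTagged WHERE article_id = 1 ORDER BY tag"
    )
]

# The natural primary key alone keeps tag spellings unique.
try:
    conn.execute("INSERT INTO Tags VALUES ('sql')")
    duplicate_tag_rejected = False
except sqlite3.IntegrityError:
    duplicate_tag_rejected = True

print(tags_for_article, duplicate_tag_rejected)
```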
Every table should have a primary key. It doesn't matter if it's an integer, GUID, or the "setting name" column. The type depends on the requirements of the application. Ideally, if you are going to join the table to another, it would be best to use a GUID or integer as your primary key.
Yes, there are good reasons. You can have semantically meaningful true keys, rather than artificial identity keys. Also, it is not a good idea to have a separate autoincrementing primary key for a many-to-many table. There are some reasons you might want to choose a GUID.
That being said, I typically use autoincrementing 64-bit integers for primary keys.
Every table should have a primary key. But it doesn't need to be a single field identifier. Take for example in a finance system, you may have the primary key on a journal table being the Journal ID and Line No. This will produce a unique combination for each row (and the Journal ID will be a primary key in its own table)
Your primary key should be defined based on how you are going to link the table to other tables.
I don't think every table needs a primary key. Sometimes you only want to "connect" the contents of two tables - via their primary key.
So you have a table like users and one table like groups (each with primary keys) and you have a third table called users_groups with only two columns (user and group) where users and groups are connected with each other.
For example a row with user = 3 and group = 6 would link the user with primary key 3 to the group with primary key 6.
One reason not to have primary key defined as identity is having primary key defined as GUIDs or populated with externally generated values.
In general, every table that is semantically meaningful by itself should have primary key and such key should have no semantic meaning. A join table that realizes many-to-many relationship is not meaningful by itself and so it doesn't need such primary key (it already has one via its values).
To be a properly normalised table, each row should only have a single identifiable key. Many tables will already have natural keys, such as a unique invoice number. I agree, especially with storage being so cheap, there is little overhead in having an autonumber/identity key on all tables, but in that case, which is the real key?
Another area where I personally don't use this approach is reference data, where typically we have a Code and a Description:
Code, Description
'L', 'Live'
'O', 'Old'
'P', 'Pending'
In this situation making code a primary key ensures no duplicates, and is more human readable.
The key difference (sorry) between a natural primary key and a surrogate primary key is that the value of the natural key contains information whereas the value of a surrogate key doesn't.
Why is this important? Well a natural primary key is by definition guaranteed to be unique, but its value is not usually guaranteed to stay the same. When it changes, you have to update it in multiple places.
A surrogate key's value has no real meaning and simply serves to identify that row, so it never needs to be changed. It is a feature of the model rather than the domain itself.
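The difference shows up as soon as a key value changes. Below is a sketch in Python with SQLite, using hypothetical customer and order tables (none of these names come from the answer) and an email address as the natural key. With the natural key, the change has to reach every referencing row, here via ON UPDATE CASCADE; with the surrogate key, only the one customer row changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Natural-key design: orders carry the customer's email as the foreign key.
CREATE TABLE customer_nat (email TEXT PRIMARY KEY);
CREATE TABLE order_nat (
  order_id INTEGER PRIMARY KEY,
  email TEXT NOT NULL REFERENCES customer_nat (email) ON UPDATE CASCADE
);
-- Surrogate-key design: orders carry a meaningless integer id.
CREATE TABLE customer_sur (customer_id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE order_sur (
  order_id INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customer_sur (customer_id)
);
INSERT INTO customer_nat VALUES ('a@example.com');
INSERT INTO order_nat (email) VALUES ('a@example.com'), ('a@example.com');
INSERT INTO customer_sur (email) VALUES ('a@example.com');
INSERT INTO order_sur (customer_id) VALUES (1), (1);
""")

# The customer changes email address in both designs.
conn.execute("UPDATE customer_nat SET email = 'b@example.com'")
conn.execute("UPDATE customer_sur SET email = 'b@example.com'")

# Natural key: every order row was rewritten by the cascade.
nat_order_emails = [r[0] for r in conn.execute("SELECT email FROM order_nat")]
# Surrogate key: the order rows are untouched.
sur_order_keys = [r[0] for r in conn.execute("SELECT customer_id FROM order_sur")]
print(nat_order_emails, sur_order_keys)
```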
So the only place I would say a surrogate key isn't appropriate is in an association table which only contains columns referring to rows in other tables (most many-to-many relations). The only information this table carries is the association between two (or more) rows, and it already consists solely of surrogate key values. In this case I would choose a composite primary key.
If such a table had bag semantics, or carried additional information about the association, I would add a surrogate key.
A primary key is ALWAYS a good idea. It allows for very fast and easy joining of tables. It aids external tools that can read system tables to make joins, allowing less skilled people to create their own queries by drag-and-drop. It also makes the implementation of referential integrity a breeze, and that is a good idea from the get-go.
I know for sure that some very smart people working for web giants do this. While I don't know their reasons, I know 2 cases where PK-less tables make sense:
Importing data. The table is temporary. Insertions and whole table scans need to be as fast as possible. Also, we need to accept duplicate records. Later we will clean the data, but the import process needs to work.
Analytics in a DBMS. Identifying a row is not useful - if we need to do it, it is not analytics. We just need a non-relational, redundant, horrible blob that looks like a table. We will build summary tables or materialized views by writing proper SQL queries.
Note that these cases have good reasons to be non-relational. But normally your tables should be relational, so... yes, they need a primary key.
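The import case could look roughly like this, sketched in Python with SQLite. The table and column names are hypothetical: a staging table without any key accepts duplicate records as they arrive, and the cleaning step later copies distinct rows into a proper table with a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Staging table: no primary key, so duplicates are accepted during import.
CREATE TABLE staging_customer (name TEXT, city TEXT);
-- Clean, relational table with a key (composite here just for the demo).
CREATE TABLE customer (name TEXT, city TEXT, PRIMARY KEY (name, city));
""")

raw_rows = [("Ann", "Oslo"), ("Bob", "Rome"), ("Ann", "Oslo")]
conn.executemany("INSERT INTO staging_customer VALUES (?, ?)", raw_rows)
staged = conn.execute("SELECT COUNT(*) FROM staging_customer").fetchone()[0]

# Cleaning step: copy each distinct record once into the keyed table.
conn.execute("INSERT INTO customer SELECT DISTINCT name, city FROM staging_customer")
cleaned = conn.execute("SELECT name, city FROM customer ORDER BY name").fetchall()
print(staged, cleaned)
```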