I have a database that has 50 tables, and every table has its primary key on a field named ID. For example, Employee.ID, Customer.ID, Order.ID: every single table has ID as its primary key. Should it not be Employee.Employee_ID, Customer.Customer_ID and so on?
Is there any drawback of using ID as name of every ID field in each table? If so please explain or give link to explanation.
I'd use Employee.Id and Customer.Id with the qualifying table names. Employee.EmployeeId and Customer.CustomerId seem a bit redundant to me.
There are different schools of thought on this. Personally, I prefer to use identical column names only for columns that have primary key / foreign key relationships -- it helps make it easier to write complex joins.
Also, the apparent redundancy disappears when you use table aliases. For example:
SELECT o.OrderDate, c.CustomerName
FROM Customers c
INNER JOIN Orders o ON o.CustomerID = c.CustomerID
It's largely a matter of personal style.
However, if your queries always use appropriate aliases for your tables, and you always use aliases when referencing columns, then it's not too bad.
Please don't fall into the habit (as many do) of saying "There's only one Customer_EmployeeID field in my query, so I'll leave off the alias". It's really poor practice, and drives SQL people crazy when they look at code that's done this way (you end up wanting to query sys.columns to see which table contains a column called latest_status).
So if you're writing your queries nicely, it shouldn't really matter how you name your columns, so long as you're consistent. Using a plain old "ID" for any integer identity field is just fine, so long as that's what you always do.
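A minimal sketch of why the alias habit matters when every key is called ID. It uses Python's built-in sqlite3 module as a stand-in engine; the Customer/Orders schema and the sample rows are made up for illustration, and other engines word the error differently.

```python
import sqlite3

# Hypothetical schema where every primary key is simply named ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (ID INTEGER PRIMARY KEY,
                         CustomerID INTEGER REFERENCES Customer(ID));
    INSERT INTO Customer VALUES (1, 'Ada');
    INSERT INTO Orders VALUES (10, 1);
""")

# Leaving the alias off is rejected, because ID exists in both tables.
try:
    conn.execute(
        "SELECT ID FROM Customer c JOIN Orders o ON o.CustomerID = c.ID")
    unqualified_error = None
except sqlite3.OperationalError as exc:
    unqualified_error = str(exc)

# With an alias on every column the same join is unambiguous.
qualified_rows = conn.execute(
    "SELECT c.ID, o.ID FROM Customer c JOIN Orders o ON o.CustomerID = c.ID"
).fetchall()

print(unqualified_error)
print(qualified_rows)
```

With a plain ID everywhere the engine forces you to qualify the column here, which is the consistent habit the answer asks for anyway.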
I typically follow the same naming conventions (Entity.Id rather than Entity.EntityId).
If you think of your tables as entities, adding the table name to the ID field is redundant, forces you to write more code, and makes your queries harder to read.
First of all a little bit of context:
TableA as ta
TableB as tb
One 'ta' has many 'tb', but one 'tb' can only be owned by one 'ta'.
I'd often just add a FK to 'tb' pointing to 'ta' and be done (approach a). Now I'd like to model it differently (to improve its readability): I want to use a join table, named 'ta_tb', and set a PK on 'tb_id' to enforce the one-to-many rule (approach b).
Are there any performance issues when using approach b instead of approach a?
If this is a clear 1:n relation (and always will be!), there is no need for (and no advantage in) a new table in between.
You would use such a joining table to build an m:n relation.
There could be one single reason to use a joining table with a 1:n-relation: If you want to manage additional data specifying details of this relation.
HTH
Whenever you normalize your database, there is always a performance hit. If you do a join table (or sometimes referred to as a cross reference) the dbms will need to do work to join the right records.
DBMS's these days do pretty well with creating indexes and reducing these performance hits. It just depends on your situation.
Is it more important to have readability and normalization? Then use a join/xref table.
Is this for a small application that you want to perform well? Just make Table B have a FK to its parent.
If you index correctly, there should be very little performance impact although there will be a very slight extra cost that is likely not noticeable unless your database is maxed out already.
However, if you do this and want to maintain the idea that each id from column b must have one and only 1 a, then you need to make sure to put a unique index on the id for table b in the join table. Later if you need to change that to a many to many relationship, you can remove the index which is certainly easier than changing the database structure. Otherwise this design puts your data integrity at risk.
So the join table might be the structure I would recommend if I knew that eventually a many to many relationship was possible. Otherwise, I would probably stick with the simpler structure of the FK in table b.
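The unique-index idea above can be sketched as follows. This assumes SQLite through Python's sqlite3 module; the table names follow the question, while the index name ux_ta_tb_tb_id and the sample ids are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ta (id INTEGER PRIMARY KEY);
    CREATE TABLE tb (id INTEGER PRIMARY KEY);
    CREATE TABLE ta_tb (
        ta_id INTEGER NOT NULL REFERENCES ta(id),
        tb_id INTEGER NOT NULL REFERENCES tb(id)
    );
    -- One row per tb_id: each 'tb' can be owned by only one 'ta'.
    CREATE UNIQUE INDEX ux_ta_tb_tb_id ON ta_tb (tb_id);
    INSERT INTO ta VALUES (1), (2);
    INSERT INTO tb VALUES (10), (11);
    INSERT INTO ta_tb VALUES (1, 10), (1, 11);
""")

# A second owner for tb 10 violates the unique index.
try:
    conn.execute("INSERT INTO ta_tb VALUES (2, 10)")
    second_owner_rejected = False
except sqlite3.IntegrityError:
    second_owner_rejected = True

# Dropping the index later relaxes the relation to many-to-many
# without touching the table structure.
conn.execute("DROP INDEX ux_ta_tb_tb_id")
conn.execute("INSERT INTO ta_tb VALUES (2, 10)")
owners_of_10 = [row[0] for row in conn.execute(
    "SELECT ta_id FROM ta_tb WHERE tb_id = 10 ORDER BY ta_id")]

print(second_owner_rejected, owners_of_10)
```

Without the unique index, nothing in the join table stops a 'tb' from gaining a second owner, which is the integrity risk described above.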
Yes: at the very least you will need one more join to access TableB fields, and this will impact performance. (There is a related question on this: "When and why are database joins expensive?")
Also, a relation that uses a join table is normally read as many-to-many. In your case the PK on the middle table "makes" the relation one-to-many, and this is less readable than approach A.
Keep it as simple as possible: approach A is perfect for one-to-many relationships.
I'm just curious here. If I have two tables, let's say Clients and Orders.
Clients have a unique and primary key ID_Client. Orders have an ID_Client field also and a relation to maintain integrity to Client's table by ID_Client field.
So when I want to join both tables I do:
SELECT
Orders.*, Clients.Name
FROM
Orders
INNER JOIN
Clients ON Clients.ID_Client = Orders.ID_Client
So if I took the job to create the primary key, and the relation between the tables,
Is there a reason why I need to explicitly include the joined columns in on clause?
Why can't I do something like:
SELECT
Orders.*, Clients.Name
FROM
Orders
INNER JOIN
Clients
So SQL should know which columns relate both tables...
I had this same question once and I found a great explanation for it on Database Administrators Stack Exchange. The answer below was the one that I found to be the best, but you can refer to the link for additional explanations as well.
A foreign key is meant to constrain the data, i.e. enforce referential integrity. That's it. Nothing else.
1. You can have multiple foreign keys to the same table. Consider the following, where a shipment has a starting point and an ending point.
table: USA_States
    StateID
    StateName
table: Shipment
    ShipmentID
    PickupStateID Foreign key
    DeliveryStateID Foreign key
You may want to join based on the pickup state. Maybe you want to join on the delivery state. Maybe you want to perform 2 joins for both! The sql engine has no way of knowing what you want.
2. You'll often cross join scalar values. Although scalars are usually the result of intermediate calculations, sometimes you'll have a special purpose table with exactly 1 record. If the engine tried to detect a foreign key for the join... it wouldn't make sense, because cross joins never match up a column.
3. In some special cases you'll join on columns where neither is unique. Therefore the presence of a PK/FK on those columns is impossible.
4. You may think points 2 and 3 above are not relevant since your question is about when there IS a single PK/FK relationship between tables. However, the presence of a single PK/FK between the tables does not mean you can't have other fields to join on in addition to the PK/FK. The sql engine would not know which fields you want to join on.
5. Let's say you have a table "USA_States", and 5 other tables with a FK to the states. The "five" tables also have a few foreign keys to each other. Should the sql engine automatically join the "five" tables with "USA_States"? Or should it join the "five" to each other? Both? You could set up the relationships so that the sql engine enters an infinite loop trying to join stuff together. In this situation it's impossible for the sql engine to guess what you want.
In summary: PK/FK has nothing to do with table joins. They are separate unrelated things. It's just an accident of nature that you often join on the PK/FK columns.
Would you want the sql engine to guess if it's a full, left, right, or inner join? I don't think so. Although that would arguably be a lesser sin than guessing the columns to join on.
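The shipment case from the quoted answer can be sketched as a runnable example. It assumes SQLite via Python's sqlite3 module; the two state rows and the single shipment are invented sample data, and the aliases p and d are just illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USA_States (StateID INTEGER PRIMARY KEY, StateName TEXT);
    CREATE TABLE Shipment (
        ShipmentID INTEGER PRIMARY KEY,
        PickupStateID INTEGER REFERENCES USA_States(StateID),
        DeliveryStateID INTEGER REFERENCES USA_States(StateID)
    );
    INSERT INTO USA_States VALUES (1, 'Ohio'), (2, 'Texas');
    INSERT INTO Shipment VALUES (100, 1, 2);
""")

# Two foreign keys point at the same table, so the query has to say
# which one each join uses; here USA_States is joined twice.
routes = conn.execute("""
    SELECT s.ShipmentID, p.StateName, d.StateName
    FROM Shipment s
    INNER JOIN USA_States p ON p.StateID = s.PickupStateID
    INNER JOIN USA_States d ON d.StateID = s.DeliveryStateID
""").fetchall()

print(routes)
```

An engine that picked the join column automatically could not produce both state names here, since each join needs a different foreign key.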
If you don't explicitly give the field names in the query, SQL doesn't know which fields to use. You won't always have fields that are named the same and you won't always be joining on the primary key. For example, a relationship could be between two foreign key fields named "Client_Address" and "Delivery_Address". In that case, you can easily see how you would need to give the field name.
As an example:
SELECT o.*, c.Name
FROM Clients c
INNER JOIN Orders o
ON o.Delivery_Address = c.Client_Address
Is there a reason why I need to explicitly include the joined fields in the ON clause?
Yes, because you still need to tell the database server what you want. "Do what I mean" is not within the capabilities of any software system so far.
Foreign keys are tools for enforcing data integrity. They do not dictate how you can join tables. You can join on any condition that is expressible through an SQL expression.
In other words, a join clause relates two tables to each other by a freely definable condition that needs to evaluate to true given the two rows from left hand side and the right hand side of the join. It does not have to be the foreign key, it can be any condition.
Want to find people that have last names equal to products you sell?
SELECT
Products.Name,
Clients.LastName
FROM
Products
INNER JOIN Clients ON Products.Name = Clients.LastName
There isn't even a foreign key between Products and Clients, still the whole thing works.
It's like that. :)
The SQL standard says that you have to state which columns to join on. The constraints are just for referential integrity. MySQL also supports "join table using (column1, column2)", but then those columns have to be present in both tables.
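A small sketch of the USING form, run against SQLite through Python's sqlite3 module rather than MySQL (SQLite accepts the same clause). The Clients/Orders columns follow the question; the ID_Order column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (ID_Client INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (ID_Order INTEGER PRIMARY KEY,
                         ID_Client INTEGER REFERENCES Clients(ID_Client));
    INSERT INTO Clients VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
""")

# USING names the shared column once; it must exist in both tables.
using_rows = conn.execute("""
    SELECT Orders.ID_Order, Clients.Name
    FROM Orders
    INNER JOIN Clients USING (ID_Client)
    ORDER BY Orders.ID_Order
""").fetchall()

# The equivalent explicit ON condition.
on_rows = conn.execute("""
    SELECT Orders.ID_Order, Clients.Name
    FROM Orders
    INNER JOIN Clients ON Clients.ID_Client = Orders.ID_Client
    ORDER BY Orders.ID_Order
""").fetchall()

print(using_rows == on_rows)
```

Even with USING you still name the column yourself; the foreign key is never consulted.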
Reasons why this behaviour is not default
Because one Table can have multiple columns referencing back to one column in another table.
In a lot of legacy databases there are no Foreign key constraints but yet the columns are “Supposed to be” referencing some column in some other table.
The join conditions are not always as simple as A.Column = B.Column, and the list goes on.
Microsoft developers were intelligent enough to let us make this decision rather than them guessing that it will always be A.Column = B.Column
There are 2 tables, A and B. Each has, say, 10 columns.
Table A has 8 columns as FKs to other tables. Table B uses enums and standard columns without any FK.
So which table is faster / better to use?
If I do any action with table A, I assume I only have to touch the columns the action relates to, and do not have to join all the 10 FK tables even if I only need 1 FK column?
If I do need to perform any action on a FK, like writing, updating or deleting a value, do I need to join to the parent table?
If I understand correctly, the EAV model is better than an expanded column table, because if I need to display two texts from the table then I need an inner join for the column table, but for an EAV table I can use a regular select with no join?
For only a few values and if the amount of values doesn't change, ENUM can be faster and takes up less space. However, to later add possible values, you'll need to alter the entire table, which is not good design. Table A is in most cases the better option.
Of course you only join table A with the tables you need.
No, you can just modify the table containing the value, unless you change the PK. You should however design your tables in such way that changing the PK is not often needed - use artificial PK's (autoincrements are perfect). Even countries cease to exist or change names...
No, for your EAV you'll need the join. However, joining on keys is extremely fast... this is what relational databases are all about, it's their strong point.
Why is the primary key in database tables most of the time named tablename_id and not id?
Example: user_id and not just id.
Thanks
A sound principle of database naming schemes is that a name should uniquely identify the thing being named within the domain of discourse (see ISO 11179 for example). It is also a convention in relational database design that a foreign key should have the same name as the candidate key that it references.
"id" is certainly a poor name for anything because it tells us nothing about the attribute being named and it is presumably unlikely to be unique within the database.
I use a similar convention: TableNameID because:
foreign keys can have the same column name, which helps when you have lots of tables
selecting columns from multiple tables with the same name forces you to use an alias, so naming the column more uniquely helps.
My tables are not mapped to objects in the application, so don't need to consider that.
What makes you think they are always named something_id? Sometimes I use something_KEY like if I am building dimension tables. I don't think I have ever made a key/id with tablename_id. In short, I think it depends on the situation, the architect, and your standards (i.e., all caps, underscores, etc).
This is totally a local convention, and will vary dramatically depending on your company's practice (or the project you're working with).
As for me, I pretty much always choose "ID" because I'm very comfortable with the table aliasing idiom in SQL queries. For me, "select p.id from products p" is more comfortable than "select product_id from products". But I am not everybody else. I value consistency and I prefer to have as close as possible a mapping between my object model and my database; from my perspective, using "id" is more predictable across the domain. Some people prefer to have unique-as-possible names so that they don't accidentally refer to the wrong "id".
I wish those who label PK columns as ID would offer a list of benefits.
You get responses like this:
it is very easy to just prefix the column name with the table name using dot notation
User_ID or User.ID
That's not an advantage of just ID as a name. What does it do for you that USER_ID won't do?
Or
I pretty much always choose "ID" because I'm very comfortable with the table aliasing
That's awesome this guy is comfortable, what's the benefit?
Here's my argument:
Would you have any other n columns all named exactly the same, all meaning completely different things? Would you allow the column "COST" to mean 5 different things in five different tables? Wouldn't you want to call them, "TOTAL_COST", "DIRECT_COST", "AVERAGE_COST", etc? SO why would you do that with ID? If you wouldn't do that for any other column, be consistent.
I use the dictionary tables a lot. I like to query the database's own tables for discovery. Sometimes you'll find missing referential integrity constraints. If the columns are named the same, you can find most missing constraints with no other information: you look for PKs, get the column name USER_ID, look for other tables with USER_ID, and check whether each has a constraint. If you name your PK ID, you'll also have to know what the convention is. Is the FK column named %Table%_ID or %Table%ID or %Table%_FK or %Table%FK? A tautological column naming convention is much simpler than some function.
Naming the primary key the same as its foreign keys is necessary if you want to use the NATURAL JOIN syntax. I think it's easier to understand as well, but there are people who feel strongly about it either way. In any case, it's really only a matter of style and local convention -- whatever you decide to do, at least be consistent!
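For illustration, here is a small NATURAL JOIN sketch using SQLite via Python's sqlite3 module. The customers/orders tables and rows are invented; the only column name the two tables share is customer_id, which is what the join matches on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id),
                         order_date TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, '2024-01-05'), (11, 2, '2024-01-06');
""")

# NATURAL JOIN matches on every column name the tables have in common,
# which here is only customer_id.
natural_rows = conn.execute("""
    SELECT order_id, customer_name
    FROM orders NATURAL JOIN customers
    ORDER BY order_id
""").fetchall()

print(natural_rows)
```

If both tables had a primary key called id, NATURAL JOIN would match on id as well, which is why this syntax only works with the tablename_id style.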
I've seen a few cases of database schemas where all columns have been prefixed with either the table name or an abbreviation of the table name. The explanation I got when questioning this was that it was practical not having to worry about column name collisions when joining tables, because each column then got a name that was unique in the schema.
I guess that this applies to primary keys as well, especially as they are frequently used when doing joins.
In my opinion, this doesn't make sense, because it is very easy to just prefix the column name with the table name using dot notation in the sql queries, and thereby removing the ambiguity... if at all you have to write the SQL yourself, that is.
This syntax can make joins easier by avoiding the 'as' keyword. For example:
SELECT `tbl_users`.`user_id`, `tbl_emails`.`email_id` FROM `tbl_emails` LEFT JOIN `tbl_users` ON `tbl_emails`.`user_id` = `tbl_users`.`user_id`
rather than:
SELECT `tbl_users`.`id` AS `user_id`, `tbl_emails`.`id` AS `email_id` FROM `tbl_emails` LEFT JOIN `tbl_users` ON `tbl_emails`.`user_id` = `tbl_users`.`id`
However, this is only a preference that some coders feel. Coding guidelines and naming standards are different everywhere. I personally use only a simple 'id' for the primary key, then use the 'as' keywords. There's not necessarily a right or wrong, as long as what you do makes sense and you do it consistently.
When you have a JOIN, ON A.customer_id = B.customer_id looks better than ON A.customer_id = B.id. If you don't use JOINs, I see no real reason for tablename_id. See also the Wikipedia article on JOIN.
I am no good at SQL.
I am looking for a way to speed up a simple join like this:
SELECT
E.expressionID,
A.attributeName,
A.attributeValue
FROM
attributes A
JOIN
expressions E
ON
E.attributeId = A.attributeId
I am doing this tens of thousands of times and it's taking longer and longer as the table gets bigger.
I am thinking indexes - If I was to speed up selects on the single tables I'd probably put nonclustered indexes on expressionID for the expressions table and another on (attributeName, attributeValue) for the attributes table - but I don't know how this could apply to the join.
EDIT: I already have a clustered index on expressionId (PK), attributeId (PK, FK) on the expressions table and another clustered index on attributeId (PK) on the attributes table
I've seen this question but I am asking for something more general and probably far simpler.
Any help appreciated!
You definitely want to have indexes on attributeID on both the attributes and expressions table. If you don't currently have those indexes in place, I think you'll see a big speedup.
In fact, because there are so few columns being returned, I would consider a covering index for this query, i.e. an index that includes all the fields in the query.
Some things you need to care about are indexes, the query plan and statistics.
Put indexes on attributeId. Or, make sure indexes exist where attributeId is the first column in the key (SQL Server can still use indexes if it's not the 1st column, but it's not as fast).
Highlight the query in Query Analyzer and hit ^L to see the plan. You can see how tables are joined together. Almost always, using indexes is better than not (there are fringe cases where if a table is small enough, indexes can slow you down -- but for now, just be aware that 99% of the time indexes are good).
Pay attention to the order in which tables are joined. SQL Server maintains statistics on table sizes and will determine which one is better to join first. Do some investigation on internal SQL Server procedures to update statistics -- it's been too long so I don't have that info handy.
That should get you started. Really, an entire chapter can be written on how a database can optimize even such a simple query.
I bet your problem is the huge number of rows that are being inserted into that temp table. Is there any way you can add a WHERE clause before you SELECT every row in the database?
Another thing to do is add some indexes like this:
attributes.{attributeId, attributeName, attributeValue}
expressions.{attributeId, expressionID}
This is hacky! But useful if it's a last resort.
What this does is create a query plan that can be "entirely answered" by indexes. Usually, an index actually causes a double-I/O in your above query: one to hit the index (i.e. probe into the table), another to fetch the actual row referred to by the index (to pull attributeName, etc).
This is especially helpful if "attributes" or "expressions" is a wide table. That is, a table that's expensive to fetch the rows from.
Finally, the best way to speed your query is to add a WHERE clause!
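A rough sketch of the "entirely answered by indexes" idea, using SQLite through Python's sqlite3 module as a stand-in for SQL Server (its plan output looks different, but the concept is the same). The notes column and the index name ix_expressions_attr are assumptions added for the example, and the lookup uses a WHERE clause as suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (
        attributeId INTEGER PRIMARY KEY,
        attributeName TEXT,
        attributeValue TEXT
    );
    CREATE TABLE expressions (
        expressionID INTEGER NOT NULL,
        attributeId INTEGER NOT NULL REFERENCES attributes(attributeId),
        notes TEXT,  -- hypothetical extra column that makes the table wider
        PRIMARY KEY (expressionID, attributeId)
    );
    -- Index led by the join column and also containing expressionID.
    CREATE INDEX ix_expressions_attr
        ON expressions (attributeId, expressionID);
""")

def plan(sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)

# Every column this lookup needs lives in the index, so the table rows
# themselves do not have to be fetched.
lookup_plan = plan(
    "SELECT expressionID FROM expressions WHERE attributeId = ?", (1,))

# The join from the question, for comparison; its plan depends on the
# optimizer's choices, so no particular output is claimed here.
join_plan = plan("""
    SELECT E.expressionID, A.attributeName, A.attributeValue
    FROM attributes A
    JOIN expressions E ON E.attributeId = A.attributeId
""")

print(lookup_plan)
print(join_plan)
```

In SQL Server the equivalent check is to look at the execution plan for index seeks without extra lookups back to the table.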
If I'm understanding your schema correctly, you're stating that your tables kinda look like this:
Expressions: PK - ExpressionID, AttributeID
Attributes: PK - AttributeID
Assuming that each PK is a clustered index, that still means that an Index Scan is required on the Expressions table. You might want to consider creating an Index on the Expressions table such as: AttributeID, ExpressionID. This would help to stop the Index Scanning that currently occurs.
Tips:
If you want to speed up a query that uses a join:
For "inner join/join",
put the filter in the "ON" condition instead of a "WHERE" condition.
E.g., instead of:
select a.id, a.name from table1 a
join table2 b on a.name = b.name
where a.id = '123'
try:
select a.id, a.name from table1 a
join table2 b on a.name = b.name and a.id = '123'
For "left/right join",
don't put the filter in the "ON" condition: an outer join still returns every row of the preserved table, so a filter there does not remove any of them. Use a "WHERE" condition instead.
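To see what moving a filter between ON and WHERE does to the result, here is a small sketch with SQLite via Python's sqlite3 module. The table names follow the tip; the rows are invented sample data, and it only compares results, not speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (name TEXT);
    INSERT INTO table1 VALUES ('123', 'a'), ('456', 'b');
    INSERT INTO table2 VALUES ('a'), ('b');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Inner join: the filter returns the same rows in WHERE or in ON.
inner_where = rows("""
    SELECT a.id, a.name FROM table1 a
    JOIN table2 b ON a.name = b.name
    WHERE a.id = '123'
""")
inner_on = rows("""
    SELECT a.id, a.name FROM table1 a
    JOIN table2 b ON a.name = b.name AND a.id = '123'
""")

# Left join: a filter in ON keeps every table1 row and only controls
# which rows find a match; a filter in WHERE removes rows.
left_on = rows("""
    SELECT a.id, b.name FROM table1 a
    LEFT JOIN table2 b ON a.name = b.name AND a.id = '123'
    ORDER BY a.id
""")
left_where = rows("""
    SELECT a.id, b.name FROM table1 a
    LEFT JOIN table2 b ON a.name = b.name
    WHERE a.id = '123'
""")

print(inner_where, inner_on)
print(left_on, left_where)
```

For the inner join the two forms are interchangeable in their results, so whether one is faster depends on the engine's optimizer. For the outer join the choice changes which rows come back, so it is a correctness question before it is a performance one.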