Simple database design question about foreign keys - database

I have a simple question about database design...
Let's say we have Table Customer with some fields:
(PK) Id,
Firstname,
Lastname,
Address,
City,
(FK) Sex_Id...
So...
Would it be a good idea to have an additional table Sex where the Sex values ('M', 'W') would be stored?
Sex_Id,
Value
or should Sex values ('M' or 'W') be saved directly into table Customer? What about query speed etc.?
Thanks in advance,
best Regards.

Or, one could use an existing standard. ISO 5218 covers four codes:
0 = Not Known
1 = Male
2 = Female
9 = Not applicable (lawful person such as corporation, organization etc)
ISO 5218 is a legal encoding and does not cover medical or biological aspects.
Obviously, a reference table containing those codes should use the natural key (as per the list above), and not a synthetic key.
Joe Celko's Data Measurements And Standards in SQL is a great (albeit boring) read.
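A minimal sketch of such a reference table, using Python's sqlite3 module so it can be run as-is. The table and column names (Sex, SexCode, Customer) are assumptions made for illustration; the four codes are the ones listed above, used directly as the natural key.

```python
import sqlite3

# Sketch only: table and column names are assumptions, not from the question.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Sex (
    SexCode     INTEGER PRIMARY KEY,   -- the ISO 5218 code itself is the key
    Description TEXT NOT NULL
);
INSERT INTO Sex VALUES
    (0, 'Not known'), (1, 'Male'), (2, 'Female'), (9, 'Not applicable');

CREATE TABLE Customer (
    Id        INTEGER PRIMARY KEY,
    Firstname TEXT,
    Lastname  TEXT,
    SexCode   INTEGER NOT NULL DEFAULT 0 REFERENCES Sex (SexCode)
);
""")


def insert_rejected(code):
    """Return True if the foreign key rejects a customer with this code."""
    try:
        conn.execute(
            "INSERT INTO Customer (Firstname, Lastname, SexCode) VALUES ('X', 'Y', ?)",
            (code,),
        )
        return False
    except sqlite3.IntegrityError:
        return True
```

With the natural key, a query on Customer can filter on SexCode = 2 without joining to the reference table at all.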

You could try a multivalued attribute, but I prefer to do this: If there are only 2 values, you could consider using a BOOL type for that attribute in your DB and making 0 = Male and 1 = Female (commenting, of course, to avoid confusion). When data is entered in the external program (given there is one), you could just do a quick mapping where if they check "male", the attribute is 0 in the DB, and if they check "female", the attribute value is 1 in the DB.

How many different values are you planning on having for Sex? If you aren't going to be adding more possible values for that column, it doesn't make sense to use a foreign key.

You can use a character for the column, storing "M" or "W", and also use a foreign key into a table (primary key of a character) if you need to store any more details about that thing; You get the benefit of easy to write/read queries (no join required) for basic stuff, but still have the possibility of adding more data later on.
That said, unless you actually do have more columns in your Sex table, you could probably not create it at all now and add it later when you actually do have a need for it.

In your example, the extra table does not buy you anything.
@marc_s has the right idea here: add a good CHECK constraint to make sure the local values are in the proper subset.
Now, if your example contained additional attributes on the related object, like a 'name' or 'description', or further links to other objects like 'alias' or some kind of date range, then absolutely yes, create another table.
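As a rough illustration of the CHECK constraint approach, here is a small sketch using Python's sqlite3 module. The column name Sex and the helper try_insert are assumptions; the allowed values 'M' and 'W' come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer (
    Id        INTEGER PRIMARY KEY,
    Firstname TEXT,
    Lastname  TEXT,
    -- no lookup table: the allowed values live in the constraint
    Sex       TEXT NOT NULL CHECK (Sex IN ('M', 'W'))
)
""")


def try_insert(sex):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO Customer (Firstname, Lastname, Sex) VALUES ('A', 'B', ?)",
            (sex,),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

If more attributes turn up later, the constraint can be swapped for a foreign key to a new table without changing the stored values.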

Related

Trigger before insert to create a primary key from name and surname

I need to generate a char(3) primary key for my database table "People" from name and surname. I inherited an old database and I have to replicate this ID. For example, from 'John' 'Smith' I would like to generate the ID 'JOS'; then, if there are two 'John Smith', the second should get 'JSM', and so on.
So I thought I could probably do this as a BEFORE INSERT trigger. Is that really possible? If yes, is it the best way to do it? How do I set up the trigger?
CREATE OR REPLACE TRIGGER cacciatore_bef_ins
BEFORE INSERT ON eddo.CACCIATORI
FOR EACH ROW
DECLARE
  temp CHAR(3);
  pos  NUMBER := 1;
BEGIN
  -- genera_cod and esiste are my own helper functions
  temp := genera_cod(:new.nome, :new.cognome, pos);
  WHILE esiste(temp) LOOP
    -- advance pos so each pass tries a different code
    pos := pos + 1;
    temp := genera_cod(:new.nome, :new.cognome, pos);
  END LOOP;
  :new.codcacciatore := temp;
END cacciatore_bef_ins;
/
"I inherited an old database and I have to replicate this ID."
Says who? The fact that you inherited something bad doesn't mean that you have to make it worse. Just being curious: what would be the 3rd John Smith's primary key value? What about Joan Smadden? Yet another JOS as well as JSM.
I'd suggest that you:
keep the "old" values
keep the column datatype
It is difficult to change a primary key's datatype and values because of foreign key constraints. If there are none, consider switching to something easier to maintain.
CHAR(3) suggests that you don't expect many rows in this table. There are (in the English alphabet) 26^3 = 17,576 possible combinations. That's probably enough for what you store, but you might run out of combinations for those "John Smith" variations. How do you find an available primary key? That depends on the algorithm you use (I'm not sure which one it is, based on what you described and my first question).
Finally, see whether you can use something "simpler", e.g.
all digits (from 000 to 999), then
a letter + 2 digits (e.g. A00, A01, ..., A99, B00, B01, ...)
and so on
A bad idea is still a bad idea even if the data volume is small. The fact that this is a conversion from a manual system is all the more reason to change; everything else will be different anyway. Processes which work reasonably well manually fail miserably when automated. This is a case in point.
With a manual system, reducing to 3 letters is a good, effective, efficient process. It gets me to the correct section of the file cabinet very quickly. From there I have only a few files to scan in order to locate the exact one needed.
That is manually.
In an automated system it is the exact opposite: ineffective, inefficient, and limited. I can guess with a high level of confidence that your 3-letter code derives from the first letters of the individual's first, middle and last names. (It could of course be derived from something else, but let's go with it anyway.) This is both natural and intuitive to a person, and if there is more than one John O. Smith (JOS) or Joan Ophelia Saunders (JOS), then so what: there are other means (data points) to identify the exact file. Not so on an automated system. I have a hard time imagining that anyone would think of OOS as an indication for Joan Ophelia Saunders. But once automated, it is a very simple algorithmic change.
If you think the 3-character code lookup really needs to be maintained, and that is not a bad assumption, then add it as a column and create a non-unique index. When the user enters 'JOS', provide a list of qualifying names and maybe another data point (like address) to choose from, just like a manual file cabinet. Then you do not need to try making them unique; indeed you do not want them unique.
For your primary key, however, just use a generated integer (sequence). BTW, hint of the day: make the 3-letter code column varchar2(3), not char(3).
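The layout suggested in this answer could look roughly like the following sketch. It uses Python's sqlite3 module rather than Oracle so that it runs anywhere; the column names (PersonId, Code, Name, Address), the index name and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (
    PersonId INTEGER PRIMARY KEY,   -- generated surrogate key
    Code     TEXT NOT NULL,         -- legacy 3-letter lookup code, NOT unique
    Name     TEXT NOT NULL,
    Address  TEXT
);
CREATE INDEX People_Code_idx ON People (Code);  -- plain, non-unique index
""")
conn.executemany(
    "INSERT INTO People (Code, Name, Address) VALUES (?, ?, ?)",
    [
        ("JOS", "John O. Smith", "1 Elm St"),
        ("JOS", "Joan Ophelia Saunders", "9 Oak Ave"),
        ("MAB", "Mary A. Brown", "4 Pine Rd"),
    ],
)


def candidates(code):
    """Return every person filed under a code, for the user to choose from."""
    return conn.execute(
        "SELECT PersonId, Name, Address FROM People WHERE Code = ? ORDER BY PersonId",
        (code,),
    ).fetchall()
```

Here candidates("JOS") returns both people filed under JOS, each with a distinct PersonId, which mirrors flipping through the right drawer of the file cabinet.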

Table column name design

Let's say I have the following tables: Customer and Staff.
Customer (CusID, CusName, CusAddres, CusGender)
Staff (StaID, StaName, StaAddress, StaGender)
vs
Customer(ID, Name, Address, Gender)
Staff(ID, Name, Address, Gender)
Which design is preferred and why?
SQL employs a concept known as domain name integrity which means that the names of objects have a scope given by their container.
Column names have to be unique, but only within the context of the table that contains the columns. Table names have to be unique, but only within the context of the schema that contains the tables, etc.
When you query columns you need to reference the schema, table and column that you are interested in, unless one or more of these can be inferred. Unless your query is so simple that it only references one table, you're going to need to reference the table name directly or by using an alias, e.g. Customer.ID or C.ID from Customer C, etc.
The first option is a throw-back to technical requirements for uniqueness of all column names, which applied to old ISAM databases and to languages like COBOL in the 1960s and 70s. This got dragged along for no good reason into dBase in the 1980s and has stuck as a convention well into the relational and object DBMS eras. Resist this outdated convention.
The second approach is much simpler and more readable. Simpler and more readable code is easier to write and much easier to maintain.
Keys are "propagated" down foreign keys, so it's useful if they keep their names constant in all the resulting "copies". It just makes the database schema clearer: you don't have to look into the FK definition [1] to see where a particular field came from; you can tell by just glancing at its name.
On the other hand, non-key fields are not propagated and there is no particular reason to keep their names unique outside their respective tables.
So, I'd recommend a hybrid approach:
Prefix key field names to keep them unique among all tables and avoid any need for renaming propagated fields.
Don't prefix non-key field names to keep them shorter, even though this might lead to some name repetition in different tables.
For example:
Customer (CustomerID, Name, Address, Gender)
Staff (StaffID, Name, Address, Gender)
And if you happen to have a junction table between the two [2], it'll look like this...
CustomerStaff(CustomerID PK FK1, StaffID PK FK2)
...so no need for renames (that would be necessary if both parent keys were named ID) and it's immediately clear where CustomerID and StaffID came from. Also, I personally don't like shortening table names in the prefix (hence CustomerID and not CusID), as this makes naming even more "mechanical" and predictable.
[1] Potentially multiple levels of them!
[2] Just an example. Whether it makes sense is another matter.
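One practical consequence of keeping key names constant is that joins can name the shared column once. The sketch below, written with Python's sqlite3 module, builds the three tables from this answer (trimmed to the Name column, with made-up sample rows) and joins them with USING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Staff    (StaffID    INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CustomerStaff (
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
    StaffID    INTEGER NOT NULL REFERENCES Staff (StaffID),
    PRIMARY KEY (CustomerID, StaffID)
);
INSERT INTO Customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO Staff VALUES (10, 'Carol');
INSERT INTO CustomerStaff VALUES (1, 10), (2, 10);
""")

# The key columns carry the same name in parent and junction table,
# so USING works; the non-key Name columns still need a table qualifier.
rows = conn.execute("""
SELECT Customer.Name, Staff.Name
FROM CustomerStaff
JOIN Customer USING (CustomerID)
JOIN Staff USING (StaffID)
ORDER BY Customer.Name
""").fetchall()
```

Had both parent keys been called ID, the junction table would need renamed columns and the joins would need explicit ON clauses.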
ID is a SQL antipattern (http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1343835938&sr=1-1&keywords=sql+antipatterns), never name a column ID.
I would use:
Customer(CustomerID, Name, Address, Gender)
Staff(StaffID, Name, Address, Gender)
I would say that the second option is better. If you decide to change the table name, you won't have to change the names of all the columns. Also, when you write your SELECT statements, you usually use an alias anyway ("select sta.name, sta.address from staff sta...")
In general, when in doubt, always pick a simpler solution.
Every industry has its own standard, and both design styles are in common use. I recommend the first one:
Customer(CusID, CusName, CusAddres, CusGender)
Staff(StaID, StaName, StaAddress, StaGender)
because each column name carries an abbreviation of its table name (Cus for Customer, Sta for Staff), which makes the project easier to maintain.
Some organizations also prefix the data type, e.g. vch for VARCHAR and int for INT,
like:
Customer(intCusID, vchCusName, vchCusAddres, vchCusGender)
Staff(intStaID, vchStaName, vchStaAddress, vchStaGender)

How to generate serial STRING primary key in Database

My colleagues don't like the auto-generated int serial number from the database and want to use a string primary key like:
"camera0001"
"camera0002"
As cameras may be deleted, I cannot use "total number of cameras + 1" as the ID of a new camera.
If you were me, how would you generate this kind of key in your program?
PS: I think an auto-generated serial number as primary key is OK, I just don't like arguing with my colleagues.
Don't do it like "camera0001"! Argue it out; that is a horrible design mistake.
Try one of these:
http://en.wikipedia.org/wiki/Database_normalization
http://www.datamodel.org/NormalizationRules.html
just google: database normalization
Each column in a database should contain only one piece of information. Keep the ID and the type in different columns. You can display them together if you wish, but do not store them together! Otherwise you will constantly have to split them, which makes simple queries difficult. The string will also take a lot of space on disk and in cache memory, and if it is used as an FK it will waste space there too.
Have a purely numeric auto-generated ID column and a type column that is a foreign key to a table that contains a description, like:
Table1
  YourID    int auto id  PK
  YourType  char(1)      FK
TypeTable
  YourType     char(1)       PK
  Description  varchar(100)
Table1
  YourID  YourType  YourData....
  1       C         xyz
  2       C         abc
  3       R         dffd
  4       C         fg
TypeTable
  YourType  Description
  C         Camera
  R         Radio
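To make the "store separately, display together" idea concrete, here is a small sketch using Python's sqlite3 module. The table layout and sample rows follow the example above; the display_ids helper and the four-digit zero padding are assumptions chosen to mimic the "camera0001" style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TypeTable (
    YourType    TEXT PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE Table1 (
    YourID   INTEGER PRIMARY KEY,   -- generated by the database
    YourType TEXT NOT NULL REFERENCES TypeTable (YourType),
    YourData TEXT
);
INSERT INTO TypeTable VALUES ('C', 'Camera'), ('R', 'Radio');
INSERT INTO Table1 (YourType, YourData) VALUES
    ('C', 'xyz'), ('C', 'abc'), ('R', 'dffd'), ('C', 'fg');
""")


def display_ids():
    """Build the combined label only at display time; it is never stored."""
    rows = conn.execute("""
        SELECT t.Description, i.YourID
        FROM Table1 i
        JOIN TypeTable t ON t.YourType = i.YourType
        ORDER BY i.YourID
    """).fetchall()
    return [f"{description.lower()}{item_id:04d}" for description, item_id in rows]
```

The numeric key stays small and easy to join on, while the label format can change later without touching any stored data.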
I don't agree that a sequence number is always the best key. When there is a natural primary key available, I prefer it to a sequence number. If, say, your cameras are identified by some reasonably short model name or code, like you identify your "Super Duper Professional Camera Model 3" as "SDPC3" in the catalog and all, that "SDPC3" would, in my opinion, be an excellent choice for a primary key.
But that doesn't sound like what your colleagues want to do here. They want to take a product category, "camera", that of course no one expects to be unique, and then make it unique by tacking on a sequence number. This gives you the worst of both worlds: It's hard to generate, a long string which makes it slower to process, and it's still meaningless: no one is going to remember that "camera0002904" is the 3 megapixel camera with the blue case while "camera0002905" is the 4 megapixel camera with the red case. No one is going to consistently remember that sort of thing, anyway. So you're not going to use these values as useful display values to the user.
If you are absolutely forced to do something like this, I'd say make two fields: One for the category, and one for the sequence number. If they want them concatenated together for some display, fine. Preferably make the sequence number unique across categories so it can be the primary key by itself, but if necessary you can assign sequence numbers within the category. MySQL can do this automatically (with a MyISAM table and AUTO_INCREMENT on the second column of a composite key); most databases will require you to write some code to do it. (Post again if you want discussion on how.) Oh, and I wouldn't have anyone type in "camera" for the category. This should be a look-up table of legal values, and then post the primary key of this look-up table into the product record. Otherwise you're going to have "camera" and "Camera" and "camrea" and dozens of other typos and variations.
Have a table with your serial number counters, increment it and insert your record.
OR
Set the Id to 'camera' + PAD((RIGHT(MAX(ID), 4) + 1), '0', 4)
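The counter-table variant might be sketched as follows, using Python's sqlite3 module. The table names Counter and Camera, the Model column and the add_camera helper are all hypothetical; the point is only that the counter never goes backwards, so deleted IDs are not reused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Counter (Name TEXT PRIMARY KEY, LastValue INTEGER NOT NULL);
CREATE TABLE Camera  (Id TEXT PRIMARY KEY, Model TEXT);
INSERT INTO Counter VALUES ('camera', 0);
""")


def add_camera(model):
    """Bump the counter and insert the new row in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "UPDATE Counter SET LastValue = LastValue + 1 WHERE Name = 'camera'"
        )
        (n,) = conn.execute(
            "SELECT LastValue FROM Counter WHERE Name = 'camera'"
        ).fetchone()
        new_id = f"camera{n:04d}"
        conn.execute("INSERT INTO Camera (Id, Model) VALUES (?, ?)", (new_id, model))
    return new_id
```

Unlike the MAX(ID) + 1 approach, deleting the newest camera does not cause its ID to be handed out again. On a multi-user database the counter row would also need to be locked for the duration of the transaction.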

How do you manage "pick lists" in a database

I have an application with multiple "pick list" entities, such as those used to populate the choices of dropdown selection boxes. These entities need to be stored in the database. How does one persist these entities in the database?
Should I create a new table for each pick list? Is there a better solution?
In the past I've created a table that has the name of the list and the acceptable values, then queried it to display the list. I also include an underlying value, so you can return a display value for the list and a bound value that may be much uglier (a small int for normalized data, for instance).
CREATE TABLE PickList(
ListName varchar(15),
Value varchar(15),
Display varchar(15),
Primary Key (ListName, Display)
)
You could also add a sortOrder field if you want to manually define the order to display them in.
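A runnable sketch of this single-table layout, using Python's sqlite3 module. The SortOrder column is the optional field mentioned above; the sample lists ('Status', 'Colour') and the options helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PickList (
    ListName  TEXT NOT NULL,
    Value     TEXT NOT NULL,              -- bound value stored by the application
    Display   TEXT NOT NULL,              -- text shown in the dropdown
    SortOrder INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (ListName, Display)
);
INSERT INTO PickList VALUES
    ('Status', '1', 'Open',   1),
    ('Status', '2', 'Closed', 2),
    ('Colour', 'R', 'Red',    1);
""")


def options(list_name):
    """Return (value, display) pairs for one dropdown, in display order."""
    return conn.execute(
        "SELECT Value, Display FROM PickList "
        "WHERE ListName = ? ORDER BY SortOrder, Display",
        (list_name,),
    ).fetchall()
```

Each dropdown is then populated with one call such as options("Status").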
It depends on various things:
if they are immutable and non relational (think "names of US States") an argument could be made that they should not be in the database at all: after all they are simply formatting of something simpler (like the two character code assigned). This has the added advantage that you don't need a round trip to the db to fetch something that never changes in order to populate the combo box.
You can then use an Enum in code and a constraint in the DB. If the display is localized, so that you need different text for each culture, you can use XML files or other resources to store the literals.
if they are relational (think "states - capitals") I am not very convinced either way... but lately I've been using XML files, database constraints and javascript to populate. It works quite well and it's easy on the DB.
if they are not read-only but rarely change (i.e. typically cannot be changed by the end user but only by some editor or daily batch), then I would still consider the opportunity of not storing them in the DB... it would depend on the particular case.
in other cases, storing in the DB is the way (think of the tags of StackOverflow... they are "lookup" but can also be changed by the end user) -- possibly with some caching if needed. It requires some careful locking, but it would work well enough.
Well, you could do something like this:
PickListContent
IdList IdPick Text
1 1 Apples
1 2 Oranges
1 3 Pears
2 1 Dogs
2 2 Cats
and optionally..
PickList
Id Description
1 Fruit
2 Pets
I've found that creating individual tables is the best idea.
I've been down the road of trying to create one master table of all pick lists and then filtering based on type. While it works, it has invariably created headaches down the line. For example, you may find that something you presumed to be a simple pick list is not so simple and requires an extra field. Do you now split this data into an additional table or extend your master list?
From a database perspective, having individual tables makes it much easier to manage your relational integrity and it makes it easier to interpret the data in the database when you're not using the application
We have followed the pattern of a new table for each pick list. For example:
Table FRUIT has columns ID, NAME, and DESCRIPTION.
Values might include:
15000, Apple, Red fruit
15001, Banana, yellow and yummy
...
If you have a need to reference FRUIT in another table, you would call the column FRUIT_ID and reference the ID value of the row in the FRUIT table.
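The integrity benefit of one table per pick list can be sketched like this, with Python's sqlite3 module. FRUIT and its rows come from the example above; the referencing table BASKET and the add_to_basket helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE FRUIT (
    ID          INTEGER PRIMARY KEY,
    NAME        TEXT NOT NULL,
    DESCRIPTION TEXT
);
INSERT INTO FRUIT VALUES
    (15000, 'Apple', 'Red fruit'),
    (15001, 'Banana', 'yellow and yummy');

CREATE TABLE BASKET (
    ID       INTEGER PRIMARY KEY,
    FRUIT_ID INTEGER NOT NULL REFERENCES FRUIT (ID)
);
""")


def add_to_basket(fruit_id):
    """Return True if the row is stored, False if the FK rejects the fruit."""
    try:
        conn.execute("INSERT INTO BASKET (FRUIT_ID) VALUES (?)", (fruit_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Because FRUIT_ID can only point at FRUIT, the database itself guarantees that a value from some other pick list never ends up in this column.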
Create one table for lists and one table for list_options.
# Put in the name of the list
insert into lists (id, name) values (1, "Country in North America");
# Put in the values of the list
insert into list_options (id, list_id, value_text) values
(1, 1, "Canada"),
(2, 1, "United States of America"),
(3, 1, "Mexico");
To answer the second question first: yes, I would create a separate table for each pick list in most cases. Especially if they are for completely different types of values (e.g. states and cities). The general table format I use is as follows:
id - identity or UUID field (I actually call the field xxx_id where xxx is the name of the table).
name - display name of the item
display_order - small int of order to display. Default this value to something greater than 1
If you want you could add a separate 'value' field but I just usually use the id field as the select box value.
I generally use a select that orders first by display order, then by name, so you can order something alphabetically while still adding your own exceptions. For example, let's say you have a list of countries that you want in alpha order but have the US first and Canada second you could say "SELECT id, name FROM theTable ORDER BY display_order, name" and set the display_order value for the US as 1, Canada as 2 and all other countries as 9.
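The ordering trick described above can be tried out with this sketch, written with Python's sqlite3 module. The table name country and the extra sample countries are assumptions; the display_order values 1, 2 and 9 follow the example in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (
    country_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    display_order INTEGER NOT NULL DEFAULT 9   -- everything else sorts after the exceptions
);
INSERT INTO country (name, display_order) VALUES ('United States', 1), ('Canada', 2);
INSERT INTO country (name) VALUES ('Mexico'), ('Brazil'), ('France');
""")

# Pinned entries come first, then the rest alphabetically.
names = [
    row[0]
    for row in conn.execute("SELECT name FROM country ORDER BY display_order, name")
]
```

The resulting list starts with United States and Canada, followed by the remaining countries in alphabetical order.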
You can get fancier, such as having an 'active' flag so you can activate or deactivate options, or setting a 'x_type' field so you can group options, description column for use in tooltips, etc. But the basic table works well for most circumstances.
Two tables. If you try to cram everything into one table then you break normalization (if you care about that). Here are examples:
LIST
---------------
LIST_ID (PK)
NAME
DESCR
LIST_OPTION
----------------------------
LIST_OPTION_ID (PK)
LIST_ID (FK)
OPTION_NAME
OPTION_VALUE
MANUAL_SORT
The list table simply describes a pick list. The list_option table describes each option in a given list. So your queries will always start with knowing which pick list you'd like to populate (either by name or ID), which you join to the list_option table to pull all the options. The manual_sort column is there just in case you want to enforce a particular order other than by name or value.
The query would look something like:
select
  b.option_name,
  b.option_value
from
  list a
  join list_option b on a.list_id = b.list_id
where
  a.name = 'States'
order by
  b.manual_sort asc
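You'll also want to create an index on list.name if you think you'll ever use it in a where clause. The PK column is indexed automatically; whether FK columns are indexed automatically depends on the database (MySQL's InnoDB does it, while Oracle, SQL Server and PostgreSQL do not), so check yours.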
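Here is a runnable sketch of the two-table layout and query, using Python's sqlite3 module. Table and column names follow the answer (in lower case); the index names, sample rows and the options_for helper are assumptions. SQLite does not index foreign-key columns on its own, so the sketch creates that index explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list (
    list_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    descr   TEXT
);
CREATE INDEX list_name_idx ON list (name);

CREATE TABLE list_option (
    list_option_id INTEGER PRIMARY KEY,
    list_id        INTEGER NOT NULL REFERENCES list (list_id),
    option_name    TEXT NOT NULL,
    option_value   TEXT NOT NULL,
    manual_sort    INTEGER
);
CREATE INDEX list_option_list_id_idx ON list_option (list_id);

INSERT INTO list VALUES (1, 'States', 'US states'), (2, 'Pets', NULL);
INSERT INTO list_option (list_id, option_name, option_value, manual_sort) VALUES
    (1, 'Texas', 'TX', 2),
    (1, 'Ohio',  'OH', 1),
    (2, 'Dog',   'D',  1);
""")


def options_for(list_name):
    """Return (option_name, option_value) pairs for one pick list."""
    return conn.execute("""
        SELECT b.option_name, b.option_value
        FROM list a
        JOIN list_option b ON a.list_id = b.list_id
        WHERE a.name = ?
        ORDER BY b.manual_sort ASC
    """, (list_name,)).fetchall()
```

Passing the list name as a bound parameter keeps the same query reusable for every pick list.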
And please don't create a new table for each pick list unless you're putting in "relationally relevant" data that will be used elsewhere by the app. You'd be circumventing exactly the relational functionality that a database provides. You'd be better off statically defining pick lists as constants somewhere in a base class or a properties file (your choice on how to model the name-value pair).
Depending on your needs, you can just have an options table that has a list identifier and a list value as the primary key.
select optionDesc from Options where 'MyList' = optionList
You can then extend it with an order column, etc. If you have an ID field, that is how you can reference your answers back... or, if it changes often, you can just copy the answer value to the answer table.
If you don't mind using strings for the actual values, you can simply give each list a different list_id value and populate a single table with:
item_id: int
list_id: int
text: varchar(50)
Seems easiest unless you need multiple things per list item
We actually created entities to handle simple pick lists. We created a Lookup table, that holds all the available pick lists, and a LookupValue table that contains all the name/value records for the Lookup.
Works great for us when we need it to be simple.
I've done this in two different ways:
1) unique tables per list
2) a master table for the list, with views to give specific ones
I tend to prefer the initial option as it makes updating lists easier (at least in my opinion).
Try turning the question around. Why do you need to pull it from the database? Isn't the data part of your model but you really want to persist it in the database? You could use an OR mapper like linq2sql or nhibernate (assuming you're in the .net world) or depending on the data you could store it manually in a table each - there are situations where it would make good sense to put it all in the same table but do consider this only if you feel it makes really good sense. Normally putting different data in different tables makes it a lot easier to (later) understand what is going on.
There are several approaches here.
1) Create one table per pick list. Each of the tables would have the ID and Name columns; the value that was picked by the user would be stored based on the ID of the item that was selected.
2) Create a single table with all pick lists. Columns: ID; list ID (or list type); Name. When you need to populate a list, do a query "select all items where list ID = ...". Advantage of this approach: really easy to add pick lists; disadvantage: a little more difficult to write group-by style queries (for example, "give me the number of records that picked value X").
I personally prefer option 1, it seems "cleaner" to me.
You can use either a separate table for each (my preferred), or a common picklist table that has a type column you can use to filter on from your application. I'm not sure that one has a great benefit over the other generally speaking.
If you have more than 25 or so, organizationally it might be easier to use the single table solution so you don't have several picklist tables cluttering up your database.
Performance might be a hair better using separate tables for each if your lists are very long, but this is probably negligible provided your indexes and such are set up properly.
I like using separate tables so that if something changes in a picklist (it needs an additional attribute, for instance) you can change just that picklist table with little effect on the rest of your schema. In the single table solution, you will either have to denormalize your picklist data, pull that picklist out into a separate table, etc. Constraints are also easier to enforce in the separate table solution.
This has served us well:
SQL> desc aux_values;
 Name                                      Type
 ----------------------------------------- ------------
 VARIABLE_ID                               VARCHAR2(20)
 VALUE_SEQ                                 NUMBER
 DESCRIPTION                               VARCHAR2(80)
 INTEGER_VALUE                             NUMBER
 CHAR_VALUE                                VARCHAR2(40)
 FLOAT_VALUE                               FLOAT(126)
 ACTIVE_FLAG                               VARCHAR2(1)
The "Variable ID" indicates the kind of data, like "Customer Status" or "Defect Code" or whatever you need. Then you have several entries, each one with the appropriate data type column filled in. So for a status, you'd have several entries with the "CHAR_VALUE" filled in.
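A rough sketch of how such a table might be used, written with Python's sqlite3 module instead of Oracle. The column names follow the desc output; the composite primary key, the 'Y'/'N' convention for ACTIVE_FLAG, the sample rows and the pick_list helper are assumptions, since the answer does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aux_values (
    variable_id   TEXT    NOT NULL,
    value_seq     INTEGER NOT NULL,
    description   TEXT,
    integer_value INTEGER,
    char_value    TEXT,
    float_value   REAL,
    active_flag   TEXT    NOT NULL DEFAULT 'Y',   -- assumed 'Y'/'N' convention
    PRIMARY KEY (variable_id, value_seq)          -- assumed key
);
INSERT INTO aux_values
    (variable_id, value_seq, description, char_value, active_flag) VALUES
    ('Customer Status', 1, 'Active customer', 'ACT', 'Y'),
    ('Customer Status', 2, 'On credit hold',  'HLD', 'Y'),
    ('Customer Status', 3, 'Legacy status',   'OLD', 'N');
INSERT INTO aux_values (variable_id, value_seq, description, integer_value) VALUES
    ('Defect Code', 1, 'Scratch', 100);
""")


def pick_list(variable_id):
    """Return the active CHAR_VALUE entries for one kind of data."""
    return conn.execute(
        "SELECT char_value, description FROM aux_values "
        "WHERE variable_id = ? AND active_flag = 'Y' ORDER BY value_seq",
        (variable_id,),
    ).fetchall()
```

A list whose values are numeric would read integer_value or float_value instead, which is the price of keeping every pick list in one generic table.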

Database, Table and Column Naming Conventions? [closed]

Whenever I design a database, I always wonder if there is a best way of naming an item in my database. Quite often I ask myself the following questions:
Should table names be plural?
Should column names be singular?
Should I prefix tables or columns?
Should I use any case in naming items?
Are there any recommended guidelines out there for naming items in a database?
I recommend checking out Microsoft's SQL Server sample databases:
https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks
The AdventureWorks sample uses a very clear and consistent naming convention that uses schema names for the organization of database objects.
Singular names for tables
Singular names for columns
Schema name as table prefix (e.g. SchemaName.TableName)
Pascal casing (a.k.a. upper camel case)
Late answer here, but in short:
Plural table names: My preference is plural
Singular column names: Yes
Prefix tables or columns:
Tables: *Usually* no prefixes is best.
Columns: No.
Use any case in naming items: PascalCase for both tables and columns.
Elaboration:
(1) What you must do. There are very few things that you must do a certain way, every time, but there are a few.
Name your primary keys using "[singularOfTableName]ID" format. That is, whether your table name is Customer or Customers, the primary key should be CustomerID.
Further, foreign keys must be named consistently in different tables. It should be legal to beat up someone who does not do this. I would submit that while defined foreign key constraints are often important, consistent foreign key naming is always important
Your database must have internal conventions. Even though in later sections you'll see me being very flexible, within a database naming must be very consistent. Whether your table for customers is called Customers or Customer is less important than that you do it the same way throughout the same database. And you can flip a coin to determine how to use underscores, but then you must keep using them the same way. If you don't do this, you are a bad person who should have low self-esteem.
(2) What you should probably do.
Fields representing the same kind of data on different tables should be named the same. Don't have Zip on one table and ZipCode on another.
To separate words in your table or column names, use PascalCasing. Using camelCasing would not be intrinsically problematic, but that's not the convention and it would look funny. I'll address underscores in a moment. (You may not use ALLCAPS as in the olden days. OBNOXIOUSTABLE.ANNOYING_COLUMN was okay in DB2 20 years ago, but not now.)
Don't artificially shorten or abbreviate words. It is better for a name to be long and clear than short and confusing. Ultra-short names are a holdover from darker, more savage times. Cus_AddRef. What on earth is that? Custodial Addressee Reference? Customer Additional Refund? Custom Address Referral?
(3) What you should consider.
I really think you should have plural names for tables; some think singular. Read the arguments elsewhere. Column names should be singular however. Even if you use plural table names, tables that represent combinations of other tables might be in the singular. For example, if you have a Promotions and an Items table, a table representing an item being a part of a promotion could be Promotions_Items, but it could also legitimately be Promotion_Items I think (reflecting the one-to-many relationship).
Use underscores consistently and for a particular purpose. General table names should be clear enough with PascalCasing; you don't need underscores to separate words. Save underscores either (a) to indicate an associative table or (b) for prefixing, which I'll address in the next bullet.
Prefixing is neither good nor bad. It usually is not best. In your first db or two, I would not suggest using prefixes for general thematic grouping of tables. Tables end up not fitting your categories easily, and it can actually make it harder to find tables. With experience, you can plan and apply a prefixing scheme that does more good than harm. I worked in a db once where data tables began with tbl, config tables with ctbl, views with vew, procs with sp, UDFs with fn, and a few others; it was meticulously, consistently applied so it worked out okay. The only time you NEED prefixes is when you have really separate solutions that for some reason reside in the same db; prefixing them can be very helpful in grouping the tables. Prefixing is also okay for special situations, like for temporary tables that you want to stand out.
Very seldom (if ever) would you want to prefix columns.
Ok, since we're weighing in with opinion:
I believe that table names should be plural. Tables are a collection (a table) of entities. Each row represents a single entity, and the table represents the collection. So I would call a table of Person entities People (or Persons, whatever takes your fancy).
For those who like to see singular "entity names" in queries, that's what I would use table aliases for:
SELECT person.Name
FROM People person
A bit like LINQ's "from person in people select person.Name".
As for 2, 3 and 4, I agree with @Lars.
I work in a database support team with three DBAs and our considered opinions are:
Any naming standard is better than no standard.
There is no "one true" standard, we all have our preferences
If there is standard already in place, use it. Don't create another standard or muddy the existing standards.
We use singular names for tables. Tables tend to be prefixed with the name of the system (or its acronym). This is useful if the system is complex, as you can change the prefix to group the tables together logically (i.e. reg_customer, reg_booking and regadmin_limits).
For fields, we'd expect field names to include the prefix/acronym of the table (i.e. cust_address1) and we also prefer the use of a standard set of suffixes (_id for the PK, _cd for "code", _nm for "name", _nb for "number", _dt for "date").
The name of the foreign key field should be the same as the primary key field.
i.e.
SELECT cust_nm, cust_add1, booking_dt
FROM reg_customer
INNER JOIN reg_booking
ON reg_customer.cust_id = reg_booking.cust_id
When developing a new project, I'd recommend you write out all the preferred entity names, prefixes and acronyms and give this document to your developers. Then, when they decide to create a new table, they can refer to the document rather than "guess" what the table and fields should be called.
1. No. A table should be named after the entity it represents. Person, not persons, is how you would refer to whoever one of the records represents.
2. Again, same thing. The column FirstName really should not be called FirstNames. It all depends on what you want to represent with the column.
3. No.
4. Yes. Case it for clarity. If you need to have columns like "FirstName", casing will make it easier to read.
OK. That's my $0.02.
I hear the argument all the time that whether or not a table is pluralized is all a matter of personal taste and there is no best practice. I don't believe that is true, especially as a programmer as opposed to a DBA. As far as I am aware, there are no legitimate reasons to pluralize a table name other than "It just makes sense to me because it's a collection of objects," while there are legitimate gains in code by having singular table names. For example:
It avoids bugs and mistakes caused by plural ambiguities. Programmers aren't exactly known for their spelling expertise, and pluralizing some words is confusing. For example, does the plural word end in 'es' or just 's'? Is it persons or people? When you work on a project with large teams, this can become an issue: for example, a team member pluralizes the name of a table he creates incorrectly. By the time I interact with this table, it is used all over in code I don't have access to or that would take too long to fix. The result is that I have to remember to spell the table name wrong every time I use it. Something very similar to this happened to me. The easier you can make it for every member of the team to consistently and easily use the exact, correct table names without errors or having to look up table names all the time, the better. The singular version is much easier to handle in a team environment.
If you use the singular version of a table name AND prefix the primary key with the table name, you now have the advantage of easily determining a table name from a primary key or vice versa via code alone. You can be given a variable with a table name in it, concatenate "Id" to the end, and you now have the primary key of the table via code, without having to do an additional query. Or you can cut off "Id" from the end of a primary key to determine a table name via code. If you use "id" without a table name for the primary key, then you cannot via code determine the table name from the primary key. In addition, most people who pluralize table names and prefix PK columns with the table name use the singular version of the table name in the PK (for example statuses and status_id), making it impossible to do this at all.
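As a minimal sketch of the idea above, assuming singular table names and a primary key formed by appending "Id" to the table name (the function names here are hypothetical, chosen only for illustration):

```python
PK_SUFFIX = "Id"  # assumed convention: primary key = table name + "Id"


def primary_key_for(table_name):
    """Derive the primary key column from a singular table name."""
    return table_name + PK_SUFFIX


def table_for(primary_key):
    """Recover the table name by cutting the suffix off a primary key."""
    if not primary_key.endswith(PK_SUFFIX) or primary_key == PK_SUFFIX:
        raise ValueError(f"{primary_key!r} does not follow the TableId convention")
    return primary_key[: -len(PK_SUFFIX)]


print(primary_key_for("Status"))  # StatusId
print(table_for("PersonId"))      # Person
```

With a bare "id" column, table_for has nothing to work with, and with a plural table such as statuses paired with status_id, some extra pluralization lookup would be needed.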
If you make table names singular, you can have them match the class names they represent. Once again, this can simplify code and allow you to do really neat things, like instantiating a class by having nothing but the table name. It also just makes your code more consistent, which leads to...
If you make the table name singular, it makes your naming scheme consistent, organized, and easy to maintain in every location. You know that in every instance in your code, whether it's in a column name, as a class name, or as the table name, it's the same exact name. This allows you to do global searches to see everywhere that data is used. When you pluralize a table name, there will be cases where you will use the singular version of that table name (the class it turns into, in the primary key). It just makes sense to not have some instances where your data is referred to as plural and some instances singular.
To sum it up, if you pluralize your table names you are losing all sorts of advantages in making your code smarter and easier to handle. There may even be cases where you have to have lookup tables/arrays to convert your table names to object or local code names you could have avoided. Singular table names, though perhaps feeling a little weird at first, offer significant advantages over pluralized names and I believe are best practice.
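The point about matching table names to class names could look something like this. It is only a sketch: Person and Status are hypothetical classes, and the registry is one possible way to go from a table name to an object, not the only one.

```python
class Person:
    """Hypothetical model class named exactly like its table."""


class Status:
    """Another hypothetical model class."""


# Registry keyed by class name, which by convention is also the table name.
MODELS = {cls.__name__: cls for cls in (Person, Status)}


def instantiate(table_name):
    """Create a model instance given nothing but a table name."""
    return MODELS[table_name]()


record = instantiate("Person")
print(type(record).__name__)  # Person
```

If the tables were called People and Statuses, the registry would need a hand-maintained mapping from plural table names to class names instead.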
I'm also in favour of an ISO/IEC 11179 style naming convention, noting they are guidelines rather than being prescriptive.
See Data element name on Wikipedia:
"Tables are Collections of Entities, and follow Collection naming guidelines. Ideally, a collective name is used: eg., Personnel. Plural is also correct: Employees. Incorrect names include: Employee, tblEmployee, and EmployeeTable."
As always, there are exceptions to rules, e.g. a table which always has exactly one row may be better with a singular name, such as a config table. And consistency is of utmost importance: check whether your shop has a convention and, if so, follow it; if you don't like it, then make a business case to have it changed rather than being the lone ranger.
Our preference:
Should table names be plural?
Never. The arguments for it being a collection make sense, but you never know what the table is going to contain (0,1 or many items). Plural rules make the naming unnecessarily complicated. 1 House, 2 houses, mouse vs mice, person vs people, and we haven't even looked at any other languages.
Update person set property = 'value' acts on each person in the table.
Select * from person where person.name = 'Greg' returns a collection/rowset of person rows.
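To illustrate why plural rules complicate naming, here is a small sketch. The naive "append s" rule is deliberately simplistic and only stands in for whatever rule a developer might apply by habit:

```python
def naive_plural(noun):
    """Naive rule: append "s". Deliberately simplistic."""
    return noun + "s"


# Singular nouns with their correct English plurals.
correct_plurals = {
    "house": "houses",
    "mouse": "mice",
    "person": "people",
    "status": "statuses",
}

# Collect the nouns where the naive rule produces the wrong table name.
mismatches = {
    noun: naive_plural(noun)
    for noun, plural in correct_plurals.items()
    if naive_plural(noun) != plural
}
print(mismatches)  # {'mouse': 'mouses', 'person': 'persons', 'status': 'statuss'}
```

With singular names, none of this guessing is needed.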
Should column names be singular?
Usually, yes, except where you are breaking normalisation rules.
Should I prefix tables or columns?
Mostly a platform preference. We prefer to prefix columns with the table name. We don't prefix tables, but we do prefix views (v_) and stored procedures (sp_, or f_ for functions). That helps people who try to update v_person.age, which is actually a calculated field in a view (and can't be UPDATEd anyway).
It is also a great way to avoid keyword collision (delivery.from breaks, but delivery_from does not).
It does make the code more verbose, but often aids in readability.
bob = new person()
bob.person_name = 'Bob'
bob.person_dob = '1958-12-21'
... is very readable and explicit. This can get out of hand though:
customer.customer_customer_type_id
indicates a relationship between customer and the customer_type table, indicates the primary key on the customer_type table (customer_type_id) and if you ever see 'customer_customer_type_id' whilst debugging a query, you know instantly where it is from (customer table).
or where you have a M-M relationship between customer_type and customer_category (only certain types are available to certain categories)
customer_category_customer_type_id
... is a little (!) on the long side.
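The keyword-collision point above can be checked with a quick experiment. This sketch uses Python's built-in sqlite3 module and SQLite's parser as a stand-in; other databases may report the problem differently. The table delivery_plain and the helper try_sql are hypothetical names for the demonstration.

```python
import sqlite3


def try_sql(conn, statement):
    """Return None if the statement runs, otherwise the error message."""
    try:
        conn.execute(statement)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")

# Unprefixed column: "from" is a reserved word, so it has to be quoted
# at creation time and again in every query that mentions it.
conn.execute('CREATE TABLE delivery_plain ("from" TEXT)')
plain_error = try_sql(conn, "SELECT delivery_plain.from FROM delivery_plain")

# Prefixed column: no quoting needed anywhere.
conn.execute("CREATE TABLE delivery (delivery_from TEXT)")
prefixed_error = try_sql(conn, "SELECT delivery.delivery_from FROM delivery")

print("unprefixed query failed:", plain_error is not None)
print("prefixed query failed:", prefixed_error is not None)
```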
Should I use any case in naming items?
Yes - lower case :), with underscores. These are very readable and cross platform. Together with 3 above it also makes sense.
Most of these are preferences though. As long as you are consistent, it should be predictable for anyone who has to read it.
Take a look at ISO 11179-5: Naming and identification principles
You can get it here: http://metadata-standards.org/11179/#11179-5
I blogged about it a while back here: ISO-11179 Naming Conventions
I know this is late to the game, and the question has been answered very well already, but I want to offer my opinion on #3 regarding the prefixing of column names.
All columns should be named with a prefix that is unique to the table they are defined in.
E.g. Given tables "customer" and "address", let's go with prefixes of "cust" and "addr", respectively. "customer" would have "cust_id", "cust_name", etc. in it. "address" would have "addr_id", "addr_cust_id" (FK back to customer), "addr_street", etc. in it.
When I was first presented with this standard, I was dead-set against it; I hated the idea. I couldn't stand the idea of all that extra typing and redundancy. Now I've had enough experience with it that I'd never go back.
The result of doing this is that all of the columns in your database schema are unique. There is one major benefit to this, which trumps all arguments against it (in my opinion, of course):
You can search your entire code base and reliably find every line of code that touches a particular column.
The benefit from #1 is incredibly huge. I can deprecate a column and know exactly what files need to be updated before the column can safely be removed from the schema. I can change the meaning of a column and know exactly what code needs to be refactored. Or I can simply tell if data from a column is even being used in a particular portion of the system. I can't count the number of times this has turned a potentially huge project into a simple one, nor the amount of hours we've saved in development work.
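A rough sketch of that searchability benefit follows. The snippets stand in for lines of a real code base, and all the names (prod_name, lines_touching and so on) are made up for the illustration:

```python
import re

# Hypothetical lines from a code base that uses unprefixed column names.
unprefixed_code = [
    "SELECT name FROM customer",
    "SELECT name FROM product",
]

# The same queries with table-specific column prefixes.
prefixed_code = [
    "SELECT cust_name FROM customer",
    "SELECT prod_name FROM product",
]


def lines_touching(column, lines):
    """Return the lines that mention the column as a whole word."""
    pattern = re.compile(r"\b" + re.escape(column) + r"\b")
    return [line for line in lines if pattern.search(line)]


# Searching for "name" hits both tables; "cust_name" hits only customer.
print(len(lines_touching("name", unprefixed_code)))     # 2
print(len(lines_touching("cust_name", prefixed_code)))  # 1
```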
Another, relatively minor benefit to it is that you only have to use table-aliases when you do a self join:
SELECT cust_id, cust_name, addr_street, addr_city, addr_state
FROM customer
INNER JOIN address ON addr_cust_id = cust_id
WHERE cust_name LIKE 'J%';
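To try the query above, here is a small runnable sketch using Python's built-in sqlite3 module. The column types and the sample rows are assumptions made up for the demonstration; only the table names, column names and the query itself come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_id   INTEGER PRIMARY KEY,
    cust_name TEXT NOT NULL
);
CREATE TABLE address (
    addr_id      INTEGER PRIMARY KEY,
    addr_cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
    addr_street  TEXT,
    addr_city    TEXT,
    addr_state   TEXT
);
INSERT INTO customer VALUES (1, 'Jane'), (2, 'Bob');
INSERT INTO address VALUES
    (10, 1, '1 Main St', 'Springfield', 'IL'),
    (11, 2, '2 Oak Ave', 'Portland', 'OR');
""")

# No table aliases or qualifiers: every column name is unique in the schema.
rows = conn.execute("""
SELECT cust_id, cust_name, addr_street, addr_city, addr_state
FROM customer
INNER JOIN address ON addr_cust_id = cust_id
WHERE cust_name LIKE 'J%'
""").fetchall()
print(rows)  # [(1, 'Jane', '1 Main St', 'Springfield', 'IL')]
```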
My opinions on these are:
1) No, table names should be singular.
While it appears to make sense for the simple selection (select * from Orders) it makes less sense for the OO equivalent (Orders x = new Orders).
A table in a DB is really the set of that entity; it makes more sense once you're using set logic:
select Orders.*
from Orders inner join Products
on Orders.Key = Products.Key
That last line, the actual logic of the join, looks confusing with plural table names.
I'm not sure that always using an alias (as Matt suggests) clears that up.
2) They should be singular as they only hold 1 property
3) Never, if the column name is ambiguous (as above where they both have a column called [Key]) the name of the table (or its alias) can distinguish them well enough. You want queries to be quick to type and simple - prefixes add unnecessary complexity.
4) Whatever you want, I'd suggest CapitalCase
I don't think there's one set of absolute guidelines on any of these.
As long as whatever you pick is consistent across the application or DB I don't think it really matters.
In my opinion:
Table names should be plural.
Column names should be singular.
No.
Either CamelCase (my preferred) or underscore_separated for both table names and column names.
However, like it has been mentioned, any convention is better than no convention. No matter how you choose to do it, document it so that future modifications follow the same conventions.
I think the best answer to each of those questions would be given by you and your team. It's far more important to have a naming convention than exactly what the naming convention is.
As there's no right answer to that, you should take some time (but not too much) and choose your own conventions and - here's the important part - stick to it.
Of course it's good to seek some information about standards on that, which is what you're asking, but don't get anxious or worried about the number of different answers you might get: choose the one that seems better for you.
Just in case, here are my answers:
Yes. A table is a group of records, teachers or actors, so... plural.
Yes.
I don't use them.
The database I use more often - Firebird - keeps everything in upper case, so it doesn't matter. Anyway, when I'm programming I write the names in a way that it's easier to read, like releaseYear.
Definitely keep table names singular, person not people
Same here
No. I've seen some terrible prefixes, going so far as to state that what we're dealing with is a table (tbl_) or a user stored procedure (usp_). This followed by the database name... Don't do it!
Yes. I tend to PascalCase all my table names
Naming conventions allow the development team to design discoverability and maintainability into the heart of the project.
A good naming convention takes time to evolve but once it’s in place it allows the team to move forward with a common language. A good naming convention grows organically with the project. A good naming convention easily copes with changes during the longest and most important phase of the software lifecycle - service management in production.
Here are my answers:
Yes, table names should be plural when they refer to a set of trades, securities, or counterparties for example.
Yes.
Yes. SQL tables are prefixed with tb_, views are prefixed vw_, stored procedures are prefixed usp_ and triggers are prefixed tg_ followed by the database name.
Column name should be lower case separated by underscore.
Naming is hard, but in every organisation there is someone who can name things, and in every software team there should be someone who takes responsibility for naming standards and ensures that naming issues like sec_id, sec_value and security_id get resolved early, before they get baked into the project.
So what are the basic tenets of a good naming convention and standards?
Use the language of your client and your solution domain
Be descriptive
Be consistent
Disambiguate, reflect and refactor
Don't use abbreviations unless they are clear to everyone
Don't use SQL reserved keywords as column names
Here's a link that offers a few choices. I was searching for a simple spec I could follow rather than having to rely on a partially defined one.
http://justinsomnia.org/writings/naming_conventions.html
SELECT
UserID, FirstName, MiddleInitial, LastName
FROM Users
ORDER BY LastName
Table names should always be singular, because they represent a set of objects, just as you say herd to designate a group of sheep or flock to designate a group of birds. No need for plural. When a table name is a composition of two names and the naming convention is plural, it becomes hard to know whether the plural should go on the first word, the second word, or both.
It’s the logic – Object.instance, not objects.instance. Or TableName.column, not TableNames.column(s).
Microsoft SQL is not case sensitive, so it's easier to read table and column names composed of two or more words if upper case letters are used to separate the words.
Table Name: It should be singular, as it is a single entity representing a real-world object, not objects.
Column Name: It should be singular; only then does it convey that it will hold an atomic value and conform to normalization theory. If, however, there are n properties of the same type, then they should be suffixed with 1, 2, ..., n.
Prefixing Tables / Columns: It is a huge topic, will discuss later.
Casing: It should be Camel case
My friend, Patrick Karcher, I request that you please not write anything which may be offensive to somebody, as when you wrote, "Further, foreign keys must be named consistently in different tables. It should be legal to beat up someone who does not do this." I have never made this mistake, my friend Patrick, but I am speaking generally. What if they together plan to beat you for this? :)
Very late to the party but I still wanted to add my two cents about column prefixes
There seem to be two main arguments for using the table_column (or tableColumn) naming standard for columns, both based on the fact that the column name itself will be unique across your whole database:
1) You do not have to specify table names and/or column aliases in your queries all the time
2) You can easily search your whole code for the column name
I think both arguments are flawed. The solution for both problems without using prefixes is easy. Here's my proposal:
Always use the table name in your SQL. E.g., always use table.column instead of column.
It obviously solves 2) as you can now just search for table.column instead of table_column.
But I can hear you scream, how does it solve 1)? It was exactly about avoiding this. Yes, it was, but the solution was horribly flawed. Why? Well, the prefix solution boils down to:
To avoid having to specify table.column when there's ambiguity, you name all your columns table_column!
But this means you will from now on ALWAYS have to write the table name every time you specify a column. But if you have to do that anyway, what's the benefit over always explicitly writing table.column? Exactly, there is no benefit; it's the exact same number of characters to type.
edit: yes, I am aware that naming the columns with the prefix enforces the correct usage whereas my approach relies on the programmers
Essential Database Naming Conventions (and Style)
table names
choose short, unambiguous names, using no more than one or two words
distinguish tables easily
facilitates the naming of unique field names as well as lookup and linking tables
give tables singular names, never plural (update: I still agree with the reasons given for this convention, but most people really like plural table names, so I've softened my stance)
Table names singular. Let's say you were modelling a relationship between someone and their address.
For example, if you are reading a datamodel would you prefer
'each person may live at 0,1 or many address.' or
'each people may live at 0,1 or many addresses.'
I think it's easier to pluralise address, rather than have to rephrase people as person. Plus, collective nouns are quite often dissimilar to the singular version.
-- Example SQL
CREATE TABLE D001_Students
(
    StudentID INTEGER CONSTRAINT nnD001_STID NOT NULL,
    ChristianName NVARCHAR(255) CONSTRAINT nnD001_CHNA NOT NULL,
    Surname NVARCHAR(255) CONSTRAINT nnD001_SURN NOT NULL,
    CONSTRAINT pkD001 PRIMARY KEY(StudentID)
);
-- An index needs an explicit column list.
CREATE INDEX idxD001_STID ON D001_Students(StudentID);
CREATE TABLE D002_Classes
(
    ClassID INTEGER CONSTRAINT nnD002_CLID NOT NULL,
    StudentID INTEGER CONSTRAINT nnD002_STID NOT NULL,
    ClassName NVARCHAR(255) CONSTRAINT nnD002_CLNA NOT NULL,
    -- Constraint names carry this table's prefix (D002) so they stay unique.
    CONSTRAINT pkD002 PRIMARY KEY(ClassID, StudentID),
    CONSTRAINT fkD002_STID FOREIGN KEY(StudentID)
        REFERENCES D001_Students(StudentID)
);
CREATE INDEX idxD002_CLID ON D002_Classes(ClassID);
-- A view is defined with AS followed by the query.
CREATE VIEW V001_StudentClasses
AS
SELECT
    D001.ChristianName,
    D001.Surname,
    D002.ClassName
FROM
    D001_Students D001
INNER JOIN
    D002_Classes D002
ON
    D001.StudentID = D002.StudentID;
These are the conventions I was taught, but you should adapt to whatever your development house uses.
Plural. It is a collection of entities.
Yes. The attribute is a representation of singular property of an entity.
Yes. Prefixing the table name allows easily trackable naming of all constraints, indexes and table aliases.
Pascal Case for table and column names, prefix + ALL caps for indexes and constraints.

Resources