Best Practice For Names Of People

I am creating an Employee table in SQL Server. Existing databases at this company tend to put a person's entire name in one field. To me this seems to be limiting and error prone and I am inclined to use first name, middle name, last name. Is this the best way to handle names, or is having the entire name on one field sometimes preferred, and what are the reasons for this?

If you need to be able to sort or filter e.g. by last name, then it would be better to have separate columns for these fields.
If you're looking for reasons why a single "name" field might be preferred, here's a blog post I ran across a while back, Falsehoods Programmers Believe About Names, that might be enlightening. If your user base is culturally diverse, some of your assumptions about what makes a name may be wrong.
Some of the systems that I work with on a daily basis tend to have two fields. First, a "login name" or "username" (or email address, in the case of Bugzilla) that is permanent and used to uniquely identify a user for login. Second, a "display name" or "real name" (in the case of MediaWiki) that can be changed by the user.
The systems I work with that have "first name", "middle name", and "last name" seem to be systems that are geared toward storing contact information (CRM systems or billing systems). My guess is that it's a holdover from paper forms (Rolodex, mail-in forms, invoices, etc.), but I have no references to substantiate that.

It depends on whether you need the name separated, for example to filter by last name. The right design is determined by how you will use the field. So far I have never needed that, so I have always used a single field for the person's name.

I suggest using two columns, first name and last name.
If there is a middle name, save it in the first name column.
If you want the first name and the last name together, you can concatenate them in SQL, C#, or another language. If you save everything in one column, the order is fixed as saved. (If you saved last name + first name, you can't show first name + last name without splitting the string, which is a bad solution.)
Two columns are more accurate and clearer.

Naturally this will depend a bit on the expected scope of your database (i.e. global, national, internal, etc.)
The most flexible option that I've seen some medical/legal systems use is a setup similar to this.
First, Middle, Last, Prefix, Suffix, LegalName, FullName, Alias
That may be overkill in this case, so perhaps this would work for you.
First, Middle, Last, LegalName, Alias
LegalName should be a fairly wide VARCHAR, to handle people with 3+ names, and the Alias column allows you to gracefully deal with marriage/divorce (from a data standpoint, not an HR one).
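A minimal sketch of the shorter column list, using Python's built-in sqlite3 module so it runs without a SQL Server instance. The table name, column types, and sample row are assumptions made for illustration; in SQL Server you would pick NVARCHAR widths to suit your data.

```python
import sqlite3

# Hypothetical schema: column names follow the list above, types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        first      TEXT,
        middle     TEXT,           -- nullable: many people have no middle name
        last       TEXT,
        legal_name TEXT NOT NULL,  -- wide free-text column for the full legal name
        alias      TEXT            -- previous or alternative name
    )
""")

# Made-up sample row: a person whose previous name is kept in alias.
conn.execute(
    "INSERT INTO person (first, middle, last, legal_name, alias) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Jane", None, "Smith", "Jane Smith", "Jane Doe"),
)

row = conn.execute(
    "SELECT first, last, legal_name FROM person WHERE last = ?", ("Smith",)
).fetchone()
print(row)
```

Only legal_name is required here, so a person with a single name can still be stored.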

I mostly use Prefix, First Name (contains First Name and Middle Name) and Last Name (contains Last Name and Suffix). It is useful for searching instead of storing Full Name in a single column.

Well, in most of my projects that needed to store customer information, customer names were stored in this same format. That way, if the full name is needed, we can combine the 3 parts to get it.
Otherwise, storing the entire name means we have to store it in a proper format (proper spacing, etc.); also, we don't always need to fetch the full name.
Many people don't have a middle name (some don't even have a last name).
first name, middle name, last name

Related

First name, middle name, last name. Why not Full Name?

I am trying to find a better approach for storing people's name in the table.
What are the benefits of 3 fields over 1 field for storing a person's name?
UPDATE
Here is an interesting discussion and resources about storing names and user experience
Merging firstname/last name into one field

You can always construct a full name from its components, but you can't always deconstruct a full name into its components.
Say you want to write an email starting with "Dear Richie" - you can do that trivially if you have a given_name field, but figuring out what someone's given name is from their full name isn't trivial.
You can also trivially search or sort by given_name, or family_name, or whatever.
(Note I'm using given_name, family_name, etc. rather than first_name, last_name, because different cultures put their names in different orders.)
Solving this problem in the general case is hard - here's an article that gives a flavour of how hard it is: Representing People's Names in Dublin Core.
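The asymmetry described above (composing is easy, decomposing is not) can be sketched in a few lines of Python. The class name, the family_name_first flag, and the sample surname are all assumptions for illustration, not part of any real schema.

```python
from dataclasses import dataclass


@dataclass
class PersonName:
    given_name: str
    family_name: str = ""            # optional: some people have only one name
    family_name_first: bool = False  # hypothetical flag for display order

    def full_name(self) -> str:
        # Composing from parts is trivial; the reverse is not.
        if self.family_name_first:
            parts = [self.family_name, self.given_name]
        else:
            parts = [self.given_name, self.family_name]
        return " ".join(p for p in parts if p)

    def greeting(self) -> str:
        return f"Dear {self.given_name}"


richie = PersonName(given_name="Richie", family_name="Smith")  # surname is made up
print(richie.greeting())
print(richie.full_name())
```

Going the other way, from a stored "Richie Smith" back to a given name, would require guessing where the split belongs.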

Keep your data as clean as you can!
How?
Ask your user only as few things as you absolutely need at the time you ask.
How you store the name does not matter. What does matter is that
the user experience is as good as can be
you don't have false data in your system
If you annoy the users with mandatory fields to fill in and re-question them several times, they can get upset and not buy into your application right there and then. You want to avoid bad user experiences at all times.
No user cares how easy it is for you to search your database for his middle name. He wants to have an easy, feel-good experience, that's it.
What do users do if they are forced to input data like their postal address, or even email address when they only want a "read-only" account with no notifications needed? They put garbage data into your system. This will render your super search and sort algorithms useless anyway.
Thus, my advice would be in any app to gather just as little information from your user as you really need in order to serve them, no more.
If, for example, you run an online shop for pet food, don't ask your users at sign-up what kind of pets they own. Make it an option for them to fill in once they are logged in and all happy (new customers). Don't ask them for their postal address until they order stuff that is actually carried to their house, stuff they pay for and thus care that YOU have their exact coordinates.
This will lead to a lot better data quality and this is what you should care about, not technical details the user has no benefit from....
In your example I would just ask for the full name (not sure though) and once the user willingly subscribes to your newsletter, let the user decide how he/she wants to be addressed...

As others have said, how do you decompose a full name into its component parts?
Colin Angus Mackay
Jean Michel Jarre
Vincent van Gogh
Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso
How do you reliably decompose that lot?
To learn more, see falsehoods programmers believe about names.

I was looking up the Spanish Civil War the other day, and found this exception to most rules:
Francisco Paulino Hermenegildo Teódulo Franco y Bahamonde, Salgado y Pardo de Andrade
Father: Nicolás Franco y Salgado-Araújo
Mother: María del Pilar Bahamonde y Pardo de Andrade
Next time I'm working on a system that has to store names, I'm going to try something radical: designing from the requirements.
What are we going to use the names for?
Name on an address label for the postal service
Greeting on the website
Informal name
Based on what the names will be used for, we'd determine how much information to store. Maybe we allow the user to enter all three of those, including line breaks in the first case (Generalissimo Franco might want his full titles and appointments listed, if he weren't still dead). Maybe we provide First, Middle, Last, Generation as an option, and fill in the rest as defaults. Maybe we offer other common options like Surname, Given Name.
This is in contrast to the old-style First, Middle, Last we've used since before I started programming in COBOL back in 1975, and have "made fit" ever since.

Unfortunately this is kind of like asking what is the best way to store a number in the database. It depends on what you are going to do with it: sometimes you want an int, other times a byte, and sometimes a float. With names it depends on things like what cultures you expect your users to come from, what you plan on doing with the names (will you be using these names to connect with another system that stores names as "last name, first name"?), and how much you can afford to annoy your users. If this is an internal HR application, you can probably afford to annoy the users a lot, and have a very structured, formal breakdown of name components (there are way more than 3: don't forget Mr/Mrs, Jr, III, multiple middle names, hyphenated last names, and who knows what else if you are trying to handle names from all cultures). If you have a webapp that users might or might not care about, you can't ask them to care too much.

You may want to search on the 3 separate fields, for one, and it's inexpensive to concatenate them for the full name.
e.g. if you want to search for all the Mr. Nolans, your query would be
SELECT Title + ' ' + FirstName + ' ' + Surname AS FullName
FROM table WHERE Title = 'Mr' AND Surname = 'Nolan'
To do this with just the full names would be a pain.
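For a runnable version of the same idea, here is a sketch using Python's sqlite3 module, filtering on the title and surname columns. The table name people and the sample rows are made up; note that SQLite concatenates strings with || where SQL Server uses +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (Title TEXT, FirstName TEXT, Surname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Mr", "Pat", "Nolan"), ("Ms", "Ann", "Nolan"), ("Mr", "Sam", "Byrne")],
)

# Filter on the separate columns, then build the display name on the way out.
rows = conn.execute(
    "SELECT Title || ' ' || FirstName || ' ' || Surname AS FullName "
    "FROM people WHERE Title = ? AND Surname = ?",
    ("Mr", "Nolan"),
).fetchall()
full_names = [r[0] for r in rows]
print(full_names)
```

With a single full-name column, the same filter would need pattern matching on both ends of the string.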

I'm English and only have one name. I normally put it in the 'surname' field for least aggravation. I am usually forced to put something in the 'first name' field too, which by definition is wrong.
Any attempt to impose anything more than 'Name' is doomed to be wrong at least some of the time, and sometimes be very frustrating to users. Single names are common in Southern India, Indonesia, and Pakistan (which is hundreds of millions of people) as well as the occasional weirdo in the UK like me.
The 'first, middle, last' thing is very U.S.-centric. Few other countries think of names that way. Please stop doing it.

Keeping the fields separate allows you to support different output formats and cultures where the family name is written first

Things like ORDER BY firstname or ORDER BY lastname are possible when you break the name up into multiple fields.
Not as easy to do when you mash all names into one field.

About the only thing I can think of is for searching purposes. It's a bit better to search a field using [=] rather than say [like].
If you have no need to display the name as separate words then go with a single field.
But if you need to do something like [Dear Mr. Achu] then perhaps a 3 field approach would be better.

Most of the time it's there to support writing form letters like, "Mr. so-and-so", or to search/sort by last name which is very common.
Given that first/middle/last may not apply to all cultures, there could be a better approach. It might be better expressed as "informal name" / "formal name" / "legal name" or something like that.
Still, at this point first/middle/last is very common, and from a data entry standpoint it is what everyone expects.

Here's the thing, not even humans can get this right all the time, there's just too much data, and too many special cases. I could change my name right now to be 20 parts, with the middle 13 as my "first" name. Parts of names can contain any number of words, and there can be any number of parts of names. Some people only have 1 name (no surname). Some people have lots of middle names. Some people have first or surnames composed of several words. Some people list their surname first. Some people go by their middle name. Some people go by nicknames that aren't obviously related to their given name.
If you try to guess these conventions in software YOU WILL FAIL. Period. Maybe you'll get it right some of the time, maybe even most of the time, but is even that worth it? In my opinion you should store names as one field and stop trying to be cute by using first names to refer to a person. If you need additional information about a name (e.g. a nickname), ask the user!

Each of the individual names is an atomic piece of data. When they are stored separately then it is easier to print them out in different formats such as Firstname Lastname and Lastname, Firstname.

There is no benefit if you never need to sort or search by first, middle, or last name.

Flexibility.
e.g.
If someone had a double barreled last name and no middle name.

I voted up some of these answers, but if you are looking to avoid repetitive or redundant or messy concatenation in your code, you can always use a computed column in the database or a method in a class which exposes the name consistently reconstructed. If these concatenations are expensive (because you are printing a million statements), you can use a persisted column.
Often you will allow users to specify names like nicknames or friendly names, so that you aren't referring to them by the name in their records or always as Mr. Smith.
It all depends on your requirements. There is no single good answer without the environment it is expected to satisfy.
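The "method in a class" option could look something like the sketch below. The Employee class, its field names, and the sample values are assumptions for illustration; a computed or persisted column would do the equivalent work inside the database.

```python
from typing import Optional


class Employee:
    def __init__(
        self,
        first_name: str,
        last_name: str,
        middle_name: Optional[str] = None,
        preferred_name: Optional[str] = None,
    ) -> None:
        self.first_name = first_name
        self.middle_name = middle_name
        self.last_name = last_name
        self.preferred_name = preferred_name

    @property
    def full_name(self) -> str:
        # One place that defines how the parts are joined, so callers
        # never repeat the concatenation logic.
        parts = [self.first_name, self.middle_name, self.last_name]
        return " ".join(p for p in parts if p)

    @property
    def display_name(self) -> str:
        # Fall back to the reconstructed name when no nickname is given.
        return self.preferred_name or self.full_name


e = Employee("Henry", "Jones", middle_name="W", preferred_name="Indiana")
print(e.full_name)
print(e.display_name)
```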

Not sure how practical it would be, but maybe if cultural sensitivity is important in the context of the application being developed, perhaps a name should be a collection with each element of the collection carrying a value indicating if the name is the addressable "first name" or the addressable "surname" and so on for "title" or anything else that needs to be identified. A name ID could be used to identify the order of the elements for re-composing the full name.
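One way to read that idea, sketched in Python: each element carries its role and its position. The NamePart type and the role labels are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NamePart:
    position: int  # order of this part when re-composing the full name
    role: str      # e.g. "title", "given", "surname" (labels are illustrative)
    value: str


def compose(parts: List[NamePart]) -> str:
    """Rebuild the full name from its parts in stored order."""
    ordered = sorted(parts, key=lambda p: p.position)
    return " ".join(p.value for p in ordered)


def values_for(parts: List[NamePart], role: str) -> List[str]:
    """Return every part that carries the given role, in stored order."""
    ordered = sorted(parts, key=lambda p: p.position)
    return [p.value for p in ordered if p.role == role]


van_gogh = [
    NamePart(2, "surname", "van Gogh"),
    NamePart(1, "given", "Vincent"),
]
print(compose(van_gogh))
print(values_for(van_gogh, "surname"))
```

A multi-word surname such as "van Gogh" stays a single element, so nothing has to guess where the surname starts.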

Just have two fields, 'Full Name' and 'Preferred Name'. Easy. This supports every name in existence (as long as the language has lexical symbols, so yes, that excludes languages that do not have a written form).
Just make sure that they are handled in some unicode format, and that application code properly handles unicode conversion.
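One concrete part of "properly handling Unicode" is that the same visible name can arrive as different code point sequences. A small sketch, assuming you choose to normalize to NFC before storing or comparing (the function name is made up):

```python
import unicodedata


def normalize_name(raw: str) -> str:
    # NFC combines a base letter and its combining marks into a single
    # code point where one exists, so equal-looking names compare equal.
    return unicodedata.normalize("NFC", raw.strip())


composed = "Jos\u00e9"     # 'é' as the single code point U+00E9
decomposed = "Jose\u0301"  # 'e' followed by combining acute accent U+0301

print(composed == decomposed)
print(normalize_name(composed) == normalize_name(decomposed))
```

Without normalization, an equality search on the name column can miss a row that looks identical on screen.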

To me it is simply better to store 3 names so that no explicit parsing is necessary later on if the individual components are needed.

You can't always separate the surname from a full name cleanly and reliably, so there's good reason to store it separately, because you often need the surname. After you do that, there are two common approaches:
(1) first_name and middle_name; or
(2) given_names.
(2) is arguably preferable because people sometimes have more than two given names, and (1) is inflexible in this regard.
Also, another common field is preferred_name (in addition to the above).

The i18n issue can be a bugger either way. Certain cultures use the surname first and the given name last, which strikes out the idea of first and last names, so we move to fields for surnames and given names. Wait, some cultures don't have a surname, or the surname is modified by the gender of the named.
We can get into tribal cultures where the person is renamed on adulthood. Sitting Bull's childhood name was "Jumping Badger".
This is somewhat of a ramble, but what I am showing is that the more fields you have, the more accurate the design is. There should be at least a not null 'given name' field and an optional 'surname' field tied to a PK that is an integer. If the aforementioned requirements are observed, fields can be added without breaking queries.

Some of the issues can be solved by also storing an additional column like PreferredName. We do that in our DB and also store prefix column and a suffix column.
e.g
'Prof Henry W Jones Jnr' with preferred name as 'Indiana Jones'.

Should I search content in a database by id or name?

I have a CMS that has two methods to query contents. One that queries by id and another one queries by the name of the content.
ContentManager.Select(12);
or
ContentManager.Select("Content Name");
The way I see the first one would be faster, because the id is an index and doesn't involve string comparison. While the second one is much easier to work with.
I have worked, for maintenance reasons, with the second one. But if I change the content name, the Select obviously is not going to work anymore. And the id is supposed to exist only at the database level, not be visible from the CMS forms.
Edit: Also, if a content item were deleted and reinserted, the string select would still work and the id select wouldn't.
I can't come to a common point between these two approaches.

Selecting by the primary key gives the best possible performance, but that's not always your only motivation. You might be able to add an index to the content name column, depending on its width and your read/write ratio (and depending on how much control over the database you have, I suppose).
Verdict, if you have the id, select by the id, if you don't and it's not ruining your performance, don't sweat using the content name.
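A small sketch of both lookups with an index on the name column, using Python's sqlite3 module. The content table, its columns, and the unique index are assumptions for illustration; a real CMS schema will differ, and a unique index only makes sense if content names really are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, name TEXT NOT NULL, body TEXT)"
)
# Assumption: names are unique, so the index can also enforce that.
conn.execute("CREATE UNIQUE INDEX idx_content_name ON content (name)")
conn.execute(
    "INSERT INTO content (id, name, body) VALUES (?, ?, ?)",
    (12, "Content Name", "Hello"),
)


def select_by_id(content_id: int):
    # Primary key lookup.
    return conn.execute(
        "SELECT body FROM content WHERE id = ?", (content_id,)
    ).fetchone()


def select_by_name(name: str):
    # Served by idx_content_name instead of a full table scan.
    return conn.execute(
        "SELECT body FROM content WHERE name = ?", (name,)
    ).fetchone()


print(select_by_id(12))
print(select_by_name("Content Name"))
```

Whether the extra index is worth its write cost depends on how often content is edited compared with how often it is read.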

Depends on which one is indexed... So yes you are right, in this case use the ID... If there is a need to also search by name, add another index using the name..

IDs usually work best in databases. However, you are at the mercy of the CMS; it might be storing both of those in an array and using the exact same select statement. Who knows? View the source code and see what is going on.
Whatever you do, stick to one style in all of your code.

Personal names in a global application: What to store [closed]

Storing personal names in a structured way seems quite difficult when it comes to an application which is used by users from lots of different countries. The application I'm working on could theoretically be used by anyone from any place in the world.
Most often a given name (first name / forename) and surname seems to be used. In which case those two could simply be stored in the user database table.
Is storing "given name" and "surname" in the user table enough for a globally used application? Please give your opinion with a motivation.
Do you have any other suggestions?
Are there any good guides on how to solve this?
Some important facts:
Communication between users (who reside at the same or different companies and may be in different countries).
It's important that searching for users by name feels natural to users and that all important parts of a person name are searchable.
It would be nice if, when a user sends a message to someone in another country, the system could help by suggesting a proper greeting. That will probably be hard for Arabic names, which, from what I've read, seem to have a complex structure.

There isn't really a universal structured way to do this. I'd have a big field for "Full Name" and another field for "Display Name". Both unicode.
For example, in Spanish-speaking countries, IIRC, people usually have FOUR names. Two given names, and two surnames (one from the father, one from the mother). Arabs essentially have a linked list of names as far back as they choose to go (So-and-so, son of so-and-so, son of so-and-so, ...). East Asian countries tend to put the given names last, whilst Europeans put the given names first.

If you are really interested in going global across all cultures, take a look at the HR-XML specification of Person. Given name & surname just doesn't cut it when you move outside of the West. Eastern standards of FamilyName GivenName will trip you up on defining FullName. Besides, you have all the potential complexities of alternate scripts (not everybody uses the Latin alphabet, y'know), prefixes and suffixes (NN Sr, van der Waals).
It's a standard intended for transmission and integration rather than storage, but don't let the XML schema syntax scare you. I wouldn't implement every aspect of it, but it's an excellent provider of corner cases which aren't immediately obvious, and which you can then ignore consciously. In particular, check out the examples at the end!
And for goodness' sake, don't create an American application which assumes a middle initial for everyone!

In general, a name is a human-readable identifier of a person. For a given person, you will need to store one name for each use case of such an identifier. You will need a name to display as part of the postal address; you may need one to use as a "screen name"; you may need one to use in the opening of a letter to the person; you may sometimes need to include titles and honors of the person, sometimes not.
In any of these cases, what you want to display is a single string. You may or may not be able to avoid duplication of effort by storing components of these strings and putting them back together later. But in general, both in terms of national culture and professional culture, you may be better off just storing the full strings.
The same, BTW, largely occurs for postal addresses, except that there are international standards that permit the postal authority of one country to send mail to persons in another country.

One note: don't require both a "first name" and a "last name".
Some people, like me, only have one name.
(Proof: http://saizai.com/dl_redacted_small.png)

Is normalizing a person's name going too far?

You usually normalize a database to avoid data redundancy. It's easy to see in a table full of names that there is plenty of redundancy. If your goal is to create a catalog of the names of every person on the planet (good luck), I can see how normalizing names could be beneficial. But in the context of the average business database is it overkill?
(Of course I know you could take anything to an extreme... say if you normalized down to syllables... or even adjacent character pairs. I can't see a benefit in going that far)
Update:
One possible justification for this is a random name generator. That's all I could come up with off the top of my head.

Yes, it's overkill.
People don't change their names from Bill to Joe all at once.

Database normalization usually refers to normalizing the field, not its content. In other words, you would normalize that there only be one first name field in the database. That is generally worthwhile. However the data content should not be normalized, since it is individual to that person - you are not picking from a list, and you are not changing a list in one place to affect everybody - that would be a bug, not a feature.

How do you normalize a name? Not all names have the same structure. Not all countries or cultures use the same rules for names. A first name is not necessarily just a first name. People have variable numbers of names. Some countries don't have the simple pair of firstname/lastname. What if my first name just so happens to be your last name, should they be considered the same in your database? If not, then you get into the problem that last name might mean different things in different countries. In most countries I know of, it is a family name. Your last name is the same as at least one of your parents' last names. In Iceland, it is your father's first name, followed by "son" or "daughter". So the same last name will mean completely different things depending on whether you encounter it in Iceland or the US.
In some cultures it is common when getting married, for the woman to take her husband's last name. In other cultures, that's completely optional, or might even work the opposite way.
How can you normalize this? What information would it gain you? If you find someone in your database who has "Smith" as the last word making up their name, what does that tell you? It might not be their family name. It might only be part of the family name. It might be an honorary in some language, but which according to their culture, should be considered part of the name.
You can only normalize data if it follows a common structure.

If you had a need to perform queries based on diminutive names I could see a need for normalizing the names. e.g. a search for "Betty" may need to return results for "Betty", "Beth", and "Elizabeth"
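A diminutive-aware search could be sketched like this: every known variant maps to one canonical form, and the comparison happens on that form. The mapping below holds only the three names from the example; a real list would be far larger and is assumed here, not supplied.

```python
from typing import Dict, List

# Hypothetical lookup: variant (lowercased) -> canonical form.
CANONICAL: Dict[str, str] = {
    "betty": "elizabeth",
    "beth": "elizabeth",
    "elizabeth": "elizabeth",
}


def canonical(first_name: str) -> str:
    key = first_name.strip().lower()
    # Unknown names are their own canonical form.
    return CANONICAL.get(key, key)


def search(people: List[str], query: str) -> List[str]:
    target = canonical(query)
    return [p for p in people if canonical(p) == target]


people = ["Betty", "Beth", "Elizabeth", "Joe"]
matches = search(people, "Betty")
print(matches)
```

In a database this would be a lookup table joined to the person table, which is the normalization the answer has in mind.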

Yes, definitely overkill. What's a few dozen bytes between friends?

Maybe if you work in the Census office it might make sense. Otherwise, see every other answer :)

I would say yes, it is going too far in 95%+ of the cases.

Yes. I cannot think of an instance where the benefits outweigh the problems and query complications.

No, but you might want to normalise to a canonical record for a customer (so you don't get 5 different entries for 'Bloggs & Co.' in your database). This is a data cleansing issue that often bites on MIS projects.

You often don't go over fourth form normalization in a database. Therefore seventh form normalization is quite a bit overboard. The only place this might even be a remotely plausible idea is in some kind of massive data warehouse.

Generally yes. Normalizing to that level would be going too far. Depending on the queries (such as phone books where searches by last name are common) it might be worthwhile. I expect that to be rare.

I generally haven't seen a need to normalize the name, mainly because that adds a performance hit on the join that will always be called, and doesn't give any benefit.
If you have so many similar names, and have a storage problem then it may be worth it, but there will be a performance hit that would need to be considered.

I would say it is absolutely overkill. In most applications, you display folks' names so often, every query involved with that is going to look that much more complex and harder to read.

Yes, it is. It is commonly recognized that just applying all of the Rules of Normalization can cause you to go way too far and end up with an overnormalized database. For example, it would be possible to normalize every instance of every character to a reference to a character enumeration table. It's easy to see that that's ridiculous.
Normalization needs to be performed at a level that is appropriate for your problem domain. Overnormalization is as much a problem as undernormalization (although, of course, for different reasons).

There might be a case where being able to link married/maiden names would be useful.
I recently had a case where I had to rename thousands of emails in Exchange because somebody got divorced and didn't want any emails listing her as married_name#company.com

No need to normalize to that level unless the names make up a composite primary key and you have data that is dependent on one of the names (e.g. anyone with the surname Plummer knows nothing about databases). In which case, by not normalizing, you would violate second normal form.

I agree with the general response, you wouldn't do that.
One thing comes to mind though, compression. If you had a billion people and you found that 60% of first names were pulled from 5 very common names, you could use some tricky bit manipulation to reduce the size very significantly. It would also require very customized database software.
But this isn't for the purpose of normalization, just compression.

You should normalize it out if you need to avoid the delete anomaly that comes with not breaking it out. That is, if you ever need to answer the question, has my database ever had a person named "Joejimbobjake" in it, you need to avoid the anomaly. Soft deletes is probably a much better way than having a comprehensive first name table (for example), but you get my point.

In addition to all the points everyone else has made, consider that if you were implementing a data entry operation (for example), and were to insert a new contact, you would have to search your first name and last name tables to locate the correct IDs and then use those values. But then this is further complicated by the occasion when the name is not in the FN and/or LN tables; then you have to insert the new first/last name and use the new ID(s).
And if you think that you have a comprehensive list of names, think again. I work with a list of over 200k unique first names and I'd guess it represents 99.9% of the US population. But that .1% = a lot of people. And don't forget the foreign names and misspellings...
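To make the extra work concrete, here is a sketch of that insert path with Python's sqlite3 module. The three tables and the helper names are hypothetical; the point is only how many steps a single new contact takes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_name (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
    CREATE TABLE last_name  (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
    CREATE TABLE contact (
        id            INTEGER PRIMARY KEY,
        first_name_id INTEGER NOT NULL REFERENCES first_name(id),
        last_name_id  INTEGER NOT NULL REFERENCES last_name(id)
    );
""")


def get_or_create(table: str, value: str) -> int:
    # The table name is interpolated into SQL, so restrict it to known tables.
    if table not in ("first_name", "last_name"):
        raise ValueError(f"unexpected table: {table}")
    row = conn.execute(
        f"SELECT id FROM {table} WHERE value = ?", (value,)
    ).fetchone()
    if row is not None:
        return row[0]
    # Name not seen before: insert it and use the new id.
    return conn.execute(
        f"INSERT INTO {table} (value) VALUES (?)", (value,)
    ).lastrowid


def insert_contact(first: str, last: str) -> int:
    first_id = get_or_create("first_name", first)
    last_id = get_or_create("last_name", last)
    return conn.execute(
        "INSERT INTO contact (first_name_id, last_name_id) VALUES (?, ?)",
        (first_id, last_id),
    ).lastrowid


insert_contact("Joe", "Bloggs")
insert_contact("Joe", "Smith")
first_name_count = conn.execute("SELECT COUNT(*) FROM first_name").fetchone()[0]
last_name_count = conn.execute("SELECT COUNT(*) FROM last_name").fetchone()[0]
print(first_name_count, last_name_count)
```

Each new contact costs up to two lookups and three inserts, and every read needs two joins to get the name back.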

What is the "best" way to store international addresses in a database?

What is the "best" way to store international addresses in a database? Answer in the form of a schema and an explanation of the reasons why you chose to normalize (or not) the way you did. Also explain why you chose the type and length of each field.
Note: You decide what fields you think are necessary.

Plain freeform text.
Validating all the world's post/zip codes is too hard; a fixed list of countries is too politically sensitive; mandatory state/region/other administrative subdivision is just plain inappropriate (all too often I'm asked which county I live in--when I don't, because Greater London is not a county at all).
More to the point, it's simply unnecessary. Your application is highly unlikely to be modelling addresses in any serious way. If you want a postal address, ask for the postal address. Most people aren't so stupid as to put in something other than a postal address, and if they do, they can kiss their newly purchased item bye-bye.
The exception to this is if you're doing something that's naturally constrained to one country anyway. In this situation, you should ask for, say, the { postcode, house number } pair, which is enough to identify a postal address. I imagine you could achieve similar things with the extended zip code in the US.

In the past I've modeled forms that needed to be international after the ups/fedex shipping address forms on their websites (I figured if they don't know how to handle an international order we are all hosed). The fields they use can be used as reference for setting up your schema.

In general, you need to understand why you want an address. Is it for shipping/mailing? Then there is really only one requirement, have the country separate. The other lines are freeform, to be filled in by the user. The reason for this is the common forwarding strategy for mail : any incoming mail for a foreign country is forwarded without looking at the other address lines. Hence, the detailed information is parsed only by the mail sorter located in the country itself. Like the receiver, they'll be familiar with national conventions.
(UPS may bunch together some small European countries, e.g. all the Low Countries are probably served from Belgium, but the idea still holds.)

I think adding country/city and address text will be fine. Country and city should be separate for reporting. Managers always ask for the kind of reports you do not expect, and I would rather not run a LIKE query through a large database.

Not to give Facebook undue respect. However, the overall structure of the database seems to be overlooked in many web applications launching every day. Obviously I don't think there is a perfect solution that covers all the potential variables with address structure without some hard work. That said, combined with autocomplete Facebook manages to take location input data and eliminate a majority of their redundant entries. They do this by organizing their database well enough to provide autocomplete information in a low cost, low error way to the client in real time allowing them to more or less choose the correct location from an existing list.
I think the best solution is to access a third-party database which contains your desired geographic scope and use it to initially seed your user location information. This will allow you to avoid doing the groundwork of creating your own. With any luck you can reduce the load on your server by allowing your new users to receive the correct autocomplete information directly from your third-party supplier. Eventually you will be able to fill most autocomplete for location information such as city, country, etc. from information contained in your own database from user input data.

You need to provide a bit more details about how you are planning to use the data. For example, fields like City, State, Country can either be text in the single table, or be codes which are linked to a separate table with a Foreign Key.
Simplest would be
Address_Line_01 (Required, Non blank)
Address_Line_02
Address_Line_03
Landmark
City (Required)
Pin (Required)
Province_District
State (Required)
Country (Required)
All the above can be Text/Unicode with appropriate field lengths.
Phone Numbers as applicable.
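The field list above could be turned into a table roughly as follows, sketched with Python's sqlite3 module. The table name, the lowercase column names, and the sample address are assumptions; TEXT stands in for whatever Unicode type and length you choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address (
        id                INTEGER PRIMARY KEY,
        address_line_01   TEXT NOT NULL CHECK (trim(address_line_01) <> ''),
        address_line_02   TEXT,
        address_line_03   TEXT,
        landmark          TEXT,
        city              TEXT NOT NULL,
        pin               TEXT NOT NULL,
        province_district TEXT,
        state             TEXT NOT NULL,
        country           TEXT NOT NULL
    )
""")

insert_sql = (
    "INSERT INTO address (address_line_01, city, pin, state, country) "
    "VALUES (?, ?, ?, ?, ?)"
)

# Made-up sample address with only the required fields filled in.
conn.execute(
    insert_sql, ("1 Example Street", "Exampleville", "000000", "Example State", "Exampleland")
)

# A blank first line violates the CHECK constraint and is rejected.
try:
    conn.execute(
        insert_sql, ("   ", "Exampleville", "000000", "Example State", "Exampleland")
    )
    blank_rejected = False
except sqlite3.IntegrityError:
    blank_rejected = True

print(blank_rejected)
```

Keep in mind the objections raised in the other answers: a required state or postal code does not fit every country.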
