When designing a database, what decisions do you consider when deciding how big your nvarchar should be.
If i was to make an address table my gut reaction would be for address line 1 to be nvarchar(255) like an old access database.
I have found using this has got me in bother with the old 'The string would be truncated'. I know that this can be prevented by limiting the input box but if a user really has a address line one that is over 255 this should be allowed.
How big should I make my nvarchar(????)
My recommendation: make them just as big as you REALLY need them.
E.g. for a zip code column, 10-20 chars are definitely enough. Ditto for a phone number. E-Mails might be longer, 50-100 chars. Names - well, I usually get by with 50 chars, ditto for first names. You can always and easily extend fields if you really need to - that's no a big undertaking at all.
There's really no point in making all varchar/nvarchar fields as big as they can be. After all, a SQL Server page is fixed and limited to 8060 bytes per row. Having 10 fields of NVARCHAR(4000) is just asking for trouble.... (since if you actually try to fill them with too much data, SQL Server will barf at you).
If you REALLY need a really big field, use NVARCHAR/VARCHAR(MAX) - those are stored in your page, as long as they fit, and will be sent to "overflow" storage if they get too big.
NVARCHAR vs. VARCHAR: this really boils down to do you really need "exotic" characters, such as Japanese, Chinese, or other non-ASCII style characters? In Europe, even some of the eastern European characters cannot be represented by VARCHAR fields anymore (they will be stripped of their hachek (? spelling ?). Western European languages (English, German, French, etc.) are all very well served by VARCHAR.
BUT: NVARCHAR does use twice as much space - on disk and in your SQL Server memory - at all times. If you really need it, you need it - but do you REALLY ? :-) It's up to you.
Marc
I don't use nvarchar personally :-) I always use varchar.
However, I tend to use 100 for name and 1000 for comments. Trapping and dealing with longer strings is something the client can do, say via regex, so SQL only gets the data it expects.
You can avoid truncation errors be parameterising the calls, for example via stored procs.
If the parameter is defined as varchar(200), say, then truncation happens silently if you send > 200. The truncation error is thrown only for an INSERT or UPDATE statement: with parameters it won't happen.
The 255 "limit" for SQL Server goes back to 6.5 because vachar was limited to 255. SQL Server 7.0 + changed to 8000 and added support for unicode
Edit:
Why I don't use nvarchar: Double memory footprint, double index size, double disk size, simply don't need it. I work for a big Swiss company with offices globally so I'm not being parochial.
Also discussed here: varchar vs nvarchar performance
On further reflection, I'd suggest unicode appeals to client developers but as a developer DBA I focus on performance and efficiency...
It depends on what the field represents. If I'm doing a quick prototype I leave the defaults of 255. For anything like comments etc I'd probably put it to 1000.
The only way I'd make it smaller really is on things I definately know the siez of, zip codes or NI numbers etc.
For columns that you need to have certain constraints on - like names, emails, addresses, etc - you should put a reasonably high max length. For instance a first name of more than 50 characters seems a bit suspicious and an input above that size will probably contain more that just a first name. But for the initial design of a database, take that reasonable size and double it. So for first names, set it to 100 (or 200 if 100 is your 'reasonable size'). Then put the app in production, let the users play around for a sufficiently long time to gather data and then check the actual max(len(FirstName)). Are there any suspicious values there? Anything above 50 chars? Find out what's in there and see if it's actually a first name or not. If it's not, the input form probably needs better explanations/validations.
Do the same for comments; Set them to nvharchar(max) initially. Then come back when your database has grown enough for you to start optimizing performance. Take the max length of the comments, double it and you have a good max length for your column.
Related
I am having a problem choosing the variable types for database table.
Can someone please give me some general guidelines as to how to choose the types? The following are some of the questions I have --
What should an userid be? An INT seems small as the design should take large number of users into account. So if not INT what else? BIGINT? VARCHAR? Am I not thinking straight?
When should I choose varchar and text and tinytext, binary?
I searched online and did not find any helpful links. Can someone point me in the right direction? Maybe I need to read book.
Update for 2013: The internet now has roughly the same number of users as a signed 32-bit integer. Facebook has about a billion users. So if you're planning on being the next Google or Facebook, and you definitely won't have time to re-engineer between now and then, go with something bigger. Otherwise stick with what's easy.
As of 2010:
An int seems small? Are you anticipating having 2,147,483,648 users? (For the record, that's like 1 in 3 people in the world, or 400 million more than the total number of people who use the internet; or 5 and a half times more users than Facebook).
Answer: use an int. Don't overthink it. If you get in a situation where our old friend the 32-bit integer just isn't cutting it for you, you will probably also have a staff of genius college graduates and $5b capital infusion to help solve the problem then.
I would actually make a user ID a character type since I'd rather log in as "pax" than 14860. However, if it has to be numeric (such as if you want the name to be changed easily or you want to minimise storage for foreign keys to the user table), an integer should be fine. Just how many users are you expecting to have? :-)
I would always use varchar for character data since I know that's supported on all SQL databases (or at least the ones that matter). I prefer to stay with the standards as much as possible so moving between databases is easier.
Your username field should probably be a varchar (you pick the length). If you have an id_user on the user table, I agree with the above answers that int (2 billion) will more than adequately cover your needs.
As far as you alphanumeric data types, it depends on how long you want it to be. For example, I think a varchar(255), or probably even smaller (personally I'd go with 50), would work for a username and password field. This chart might help you.
Binary would be for storing non-character data such as images. bit can be used for boolean values.
A couple other useful types are uniqueidentifier for UUID/GUID and xml for XML. Although you can also use varchar or text types for those too.
There's also nvarchar for Unicode in SQL Server and several date types (I'm glad they came out with date in SQL Server 2008 which doesn't include a time such as smalldatetime).
I have a database with a field that holds permit numbers associated with requests. The permit numbers are 13 digits, but a permit may not be issued.
With that said, I currently have the field defined as a char(13) that allows NULLs. I have been asked to change it to varchar(13) because char's, if NULL, still use the full length.
Is this advisable? Other than space usage, are there any other advantages or disadvantages to this?
I know in an ideal relational system, the permit numbers would be stored in another related table to avoid the use of NULLs, but it is what it is.
Well, if you don't have to use as much space, then you can fit more pages in memory. If you can do that, then your system will run faster. This may seem trivial, but I just recently tweaked the data types on a a table at a client that reduced the amount of reads by 25% and the CPU by about 20%.
As for which is easier to work with, the benefits David Stratton mentioned are noteworthy. I hate having to use trim functions in string building.
If the field should always be exactly 13 characters, then I'd probably leave it as CHAR(13).
Also, an interesting note from BOL:
If SET ANSI_PADDING is OFF when either
CREATE TABLE or ALTER TABLE is
executed, a char column that is
defined as NULL is handled as varchar.
Edit: How frequently would you expect the field to be NULL? If it will be populated 95% of the time, it's hardly worth it to make this change.
The biggest advantage (in general, not necessarily your specific case) I know of is that in code, if you use varchar, you don't have to use a Trim function every time you want it displayed. I run into this a lot when taking FirstName fields and LastName fields and combining them into a FullName. It's just annoying and makes the code less readable.
if your are using sql server 2008, you should look at Row Compression and perhaps sparse fields if the column is more ~60% nulls.
I would keep the datatype a char(13) if all of the populated fields use that amount.
Row Compression Information:
http://msdn.microsoft.com/en-us/library/cc280449.aspx
Sparse columns:
http://msdn.microsoft.com/en-us/library/cc280604.aspx
Is it good practice to set all text fields to nvarchar(MAX)? if im not sure what the size of the field will be
will it take up extra space
varchar(max) won't take extra space.
One dangerous thing about varchar(max) is that someone can potentially but a huge amount of data in it. This can hurt if a malicious user. or faulty program, enters a 1MB address line. No client or web application will expect that, and it might cause them to crash.
So, as a best practice, create varchar fields with a specified maximum length.
If you're not sure, and its for something like a comments field that may be quite large, a NVARCHAR(MAX) type is the way to go. If it is for something with a reasonable maximum, like a postal address, then something like NVARCHAR(100) might be better.
The space used is up to the maximum length of the value that is put into it.
What I mean by that is if you insert an 80 character string, then update it with a 40 character replacement, the allocated space may remain as 80 characters until the database is compacted, table is rebuilt, etc.
I'm not sure what you're storing in the columns but you may want to consider varchar instead of nvarchar. nvarchar stores unicode data which is used for multilingual data and takes up more space than varchar. If the field will only store English (for example) then varchar might be a better choice.
I'm using SQL Express 2008 edition. I've planned my database tables/relationships on a piece of paper and was ready to begin.
Unfortunately, when I clicked on data type for my pkID_Officer I was offered more than I expected, and therefore doubt has set in.
Can anyone point me to a place where these dates types are explained and examples are given as to what fields work better with which data types.
Examples:
is int for an ID (primarykey) still the obvious choice or does the uniqueidentifier takes it's crown?
Telephone numbers where the digits are separated by '.' (01.02.03.04.05)
Email
items that will be hyper-links
nChar and vChar?
Any help is always appreciated.
Thanks
Mike.
The MSDN site has a good overview of the SQL 2008 datatypes.
http://msdn.microsoft.com/en-us/library/ms187752.aspx
For the ID field use a guid if it needs to be unique across separate systems or tables or you want to be able to generate the ID outside of the database. Otherwise an int/identity value works just fine.
I store telephone numbers a character data since I won't ever be doing calculations on it. I would think email would be stored the same way.
As for hyperlinks you can basically store the hyperlink by itself as a varchar and render the link on the client or store the markup itself in the database. Really depends on the circumstances.
Use nvarchar if you ever think you'll need to support double byte languages now or in the future.
For primary key I always prefer (as starting point) to use an auto increment int. It makes everything more usable and you don't have any "natural" relationship with the real data. Of course there can be exception to this...
is int for an ID (primarykey) still the obvious choice or does the uniqueidentifier takes it's crown?
I personally prefer INT IDENTITY over GUIDs - especially for your clustered index. GUIDs are random in nature and thus lead to a lot of index fragmentation and therefore poor performance when used as a clustered index on SQL Server. INT doesn't have this trouble, plus it's only 4 bytes vs. 16 bytes, so if you have lots of rows, and lots of non-clustered indexes (the clustered key gets added to each and every entry in each and every non-clustered index), using a GUID will unnecessarily bloat your space requirements (on disk and also in your machine's RAM)
Telephone numbers where the digits are separated by '.' (01.02.03.04.05)
Email
items that will be hyper-links
I'd use string fields for all of these.
VARCHAR is fine, as long as you don't need any "foreign" language support, e.g. it's okay for English and Western European languages, but fails on Eastern European and Asian languages (Cyrillic, Chinese etc.).
NVARCHAR will handle all those extra pesky languages at a price - each character is stored in 2 bytes, e.g. a string of 100 chars will use 200 bytes of storage - ALWAYS.
Hope this helps a bit !
Marc
uniqueidentifier are GUIDs:
http://de.wikipedia.org/wiki/Globally_Unique_Identifier
Guids
- are worldwide unique
- have in DotNet e.g. her own objects (System.Guid)
- are 16 digits long
If you want such things, then use it. If you are fine with int-ids, then it's ok.
Telephone-Numbers / emails / Hyperlings are normal strings.
NCHAR/NVARCHR are Unicode counterpoints of CHAR/VARCHAR datatypes. I almost always use them in my apps - unless I have compelling reasons not to use them.
I need to store phone numbers in a table. Please suggest which datatype should I use?
Wait. Please read on before you hit reply..
This field needs to be indexed heavily as Sales Reps can use this field for searching (including wild character search).
As of now, we are expecting phone numbers to come in a number of formats (from an XML file). Do I have to write a parser to convert to a uniform format? There could be millions of data (with duplicates) and I dont want to tie up the server resources (in activities like preprocessing too much) every time some source data comes through..
Any suggestions are welcome..
Update: I have no control over source data. Just that the structure of xml file is standard. Would like to keep the xml parsing to a minimum.
Once it is in database, retrieval should be quick. One crazy suggestion going on around here is that it should even work with Ajax AutoComplete feature (so Sales Reps can see the matching ones immediately). OMG!!
Does this include:
International numbers?
Extensions?
Other information besides the actual number (like "ask for bobby")?
If all of these are no, I would use a 10 char field and strip out all non-numeric data. If the first is a yes and the other two are no, I'd use two varchar(50) fields, one for the original input and one with all non-numeric data striped and used for indexing. If 2 or 3 are yes, I think I'd do two fields and some kind of crazy parser to determine what is extension or other data and deal with it appropriately. Of course you could avoid the 2nd column by doing something with the index where it strips out the extra characters when creating the index, but I'd just make a second column and probably do the stripping of characters with a trigger.
Update: to address the AJAX issue, it may not be as bad as you think. If this is realistically the main way anything is done to the table, store only the digits in a secondary column as I said, and then make the index for that column the clustered one.
We use varchar(15) and certainly index on that field.
The reason being is that International standards can support up to 15 digits
Wikipedia - Telephone Number Formats
If you do support International numbers, I recommend the separate storage of a World Zone Code or Country Code to better filter queries by so that you do not find yourself parsing and checking the length of your phone number fields to limit the returned calls to USA for example
Use CHAR(10) if you are storing US Phone numbers only. Remove everything but the digits.
I'm probably missing the obvious here, but wouldn't a varchar just long enough for your longest expected phone number work well?
If I am missing something obvious, I'd love it if someone would point it out...
I would use a varchar(22). Big enough to hold a north american phone number with extension. You would want to strip out all the nasty '(', ')', '-' characters, or just parse them all into one uniform format.
Alex
nvarchar with preprocessing to standardize them as much as possible. You'll probably want to extract extensions and store them in another field.
SQL Server 2005 is pretty well optimized for substring queries for text in indexed varchar fields. For 2005 they introduced new statistics to the string summary for index fields. This helps significantly with full text searching.
using varchar is pretty inefficient. use the money type and create a user declared type "phonenumber" out of it, and create a rule to only allow positive numbers.
if you declare it as (19,4) you can even store a 4 digit extension and be big enough for international numbers, and only takes 9 bytes of storage. Also, indexes are speedy.
Normalise the data then store as a varchar. Normalising could be tricky.
That should be a one-time hit. Then as a new record comes in, you're comparing it to normalised data. Should be very fast.
Since you need to accommodate many different phone number formats (and probably include things like extensions etc.) it may make the most sense to just treat it as you would any other varchar. If you could control the input, you could take a number of approaches to make the data more useful, but it doesn't sound that way.
Once you decide to simply treat it as any other string, you can focus on overcoming the inevitable issues regarding bad data, mysterious phone number formating and whatever else will pop up. The challenge will be in building a good search strategy for the data and not how you store it in my opinion. It's always a difficult task having to deal with a large pile of data which you had no control over collecting.
Use SSIS to extract and process the information. That way you will have the processing of the XML files separated from SQL Server. You can also do the SSIS transformations on a separate server if needed. Store the phone numbers in a standard format using VARCHAR. NVARCHAR would be unnecessary since we are talking about numbers and maybe a couple of other chars, like '+', ' ', '(', ')' and '-'.
Use a varchar field with a length restriction.
It is fairly common to use an "x" or "ext" to indicate extensions, so allow 15 characters (for full international support) plus 3 (for "ext") plus 4 (for the extension itself) giving a total of 22 characters. That should keep you safe.
Alternatively, normalise on input so any "ext" gets translated to "x", giving a maximum of 20.
It is always better to have separate tables for multi valued attributes like phone number.
As you have no control on source data so, you can parse the data from XML file and convert it into the proper format so that there will not be any issue with formats of a particular country and store it in a separate table so that indexing and retrieval both will be efficient.
Thank you.
I realize this thread is old, but it's worth mentioning an advantage of storing as a numeric type for formatting purposes, specifically in .NET framework.
IE
.DefaultCellStyle.Format = "(###)###-####" // Will not work on a string
Use data type long instead.. dont use int because it only allows whole numbers between -32,768 and 32,767 but if you use long data type you can insert numbers between -2,147,483,648 and 2,147,483,647.
For most cases, it will be done with bigint
Just save unformatted phone numbers like: 19876543210, 02125551212, etc.
Check the topic about bigint vs varchar