SQL Server - Storing multiple decimal values in a column? - sql-server

I know that storing multiple values in a column is not a good idea.
It violates first normal form, which states NO multi-valued attributes. Normalize, period...
I am using SQL Server 2005.
I have a table that needs to store a lower limit and an upper limit for a measurement; think of it as a minimum and maximum speed limit. The only problem is that I need the upper limit for only about 2% of the rows. For the rest I will only have data for the lower limit.
I was thinking of storing both values in one column (sparse columns were introduced in 2008, so they are not an option for me).
Is there a way...? Not sure about XML...

You'd have to be storing an insane amount of rows for this to even matter. The price of a 1 terabyte disk is now 60 dollars!
Two 4-byte floats (real) use up 8 bytes; an XML string will use a multiple of that just to store one float. So even though XML would store only one value instead of two columns, it would still consume more space.
Just use a nullable column.
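To put rough numbers on that comparison, here is a small Python sketch. It assumes 4-byte values for the two limits and a made-up XML layout (the limits and low element names are hypothetical), and it counts 2 bytes per character. SQL Server's real XML storage is a binary format, so treat this as an order-of-magnitude illustration only.

```python
import struct

# Two 4-byte floats, packed as raw bytes (stand-in for two numeric columns).
two_columns = struct.pack("<ff", 55.0, 70.0)

# Hypothetical XML layout holding only the lower limit.
xml_value = "<limits><low>55.0</low></limits>"
xml_bytes = len(xml_value.encode("utf-16-le"))  # 2 bytes per character

print(len(two_columns))  # bytes for both numeric columns
print(xml_bytes)         # bytes for the XML text with a single value
```

Even with only one value inside, the markup alone outweighs both numeric columns several times over.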

To answer your question, you could store it as a string with a particular format that you know how to parse (e.g. "low:high").
But ... this is really not a good idea.
Dealing with 98% of the rows having NULL value for upper limit is totally fine IMHO. Keep it clean, and you won't regret it later.
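For what it's worth, here is roughly what the "low:high" parsing would look like on the application side. This is only a sketch in Python; the function name parse_limits and the rule that an empty or missing high part means "no upper limit" are assumptions made for illustration.

```python
from typing import Optional, Tuple


def parse_limits(packed: str) -> Tuple[float, Optional[float]]:
    """Split a 'low:high' string; the high part may be empty or missing."""
    low_text, _, high_text = packed.partition(":")
    low = float(low_text)
    high = float(high_text) if high_text.strip() else None
    return low, high


print(parse_limits("55:70"))  # both limits present
print(parse_limits("55:"))    # no upper limit
print(parse_limits("55"))     # separator missing entirely
```

Every query that filters or sorts on the upper limit would need this kind of parsing for each row, which is exactly the work a separate nullable column avoids.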

Even so, I agree with Andomar. Use two columns, low limit and high limit. If either value could be unknown, make those columns nullable.
Alternatively, designate arbitrary default minimum and maximum values, and use those values instead of NULLs. (Doing this means you never have to deal with three-valued logic, e.g. having to wrap everything in ISNULL or COALESCE.)
Once you define your schema, there are tricks you can use to reduce storage space (such as compression and sparse columns).
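The difference between the nullable and the default-value approach can be sketched outside the database. The Python below is only an analogy: None stands in for NULL, and NO_UPPER_LIMIT is a hypothetical sentinel you would have to pick to suit your own data.

```python
from typing import Optional

NO_UPPER_LIMIT = 999999.0  # hypothetical sentinel meaning "no upper limit"


def within_nullable(value: float, low: float, high: Optional[float]) -> bool:
    # A missing upper limit needs its own branch, much like wrapping
    # the column in ISNULL or COALESCE in every query.
    return value >= low and (high is None or value <= high)


def within_sentinel(value: float, low: float, high: float) -> bool:
    # With a sentinel stored in place of NULL, a plain range check is enough.
    return low <= value <= high


print(within_nullable(80.0, 55.0, None))
print(within_sentinel(80.0, 55.0, NO_UPPER_LIMIT))
```

The trade-off is that the sentinel has to be a value that can never occur as a real limit, and everyone reading the table has to know what it means.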

Related

Is there a benefit to decreasing the size of my NVARCHAR columns

I have a SQL Server 2008 database that stores millions of rows. There are several NVARCHAR columns that will never exceed the current max length of the column, nor get close to it due to application constraints.
i.e.
The Address NVARCHAR field has a length of 50 characters, but it'll never exceed 32 characters.
Is there a performance or space-saving benefit to reducing the size of the NVARCHAR column to what its actual max length will be (i.e. in the case of the Address field, 32 characters)? Or will it not make a difference since it's a variable-length field?
Setting the number of characters in NVARCHAR is mainly for validation purposes. If there is some reason why you don't want the data to exceed 50 characters then the database will enforce that rule for you by not allowing extra data.
If the total row size exceeds a threshold then it can affect performance, so by restricting the length you could benefit by not allowing your row size to exceed that threshold. But in your case, that does not seem to matter.
The reason for this is that SQL Server can fit more rows onto a Page, which results in less disk I/O and more rows can be stored in memory.
Also, the maximum row size in SQL Server is 8KB, as that is the size of a page and rows cannot cross page boundaries. If you insert a row that exceeds 8KB, the extra data will be stored in a row overflow page, which will likely have a negative effect on performance.
There is no expected performance or space saving benefit for reducing your n/var/char column definitions to their maximum length. However, there may be other benefits.
The column won't accidentally have a longer value inserted without generating an error (desirable for the "fail fast" characteristic of well-designed systems).
The column communicates to the next developer examining the table something about the data, that aids in understanding. No developer will be confused about the purpose of the data and have to expend wasted time determining if the code's field validation rules are wrong or if the column definition is wrong (as they logically should match).
If your column does need to be extended in length, you can do so with potential consequences ascertained in advance. A professional who is well-versed in databases can use the opportunity to see if upcoming values that will need the new column length will have a negative impact on existing rows or on query performance—as the amount of data per row affects the number of reads required to satisfy queries.
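To make the "declared length does not change storage" point concrete, here is a small Python model. It assumes NVARCHAR data takes 2 bytes per UTF-16 code unit and ignores per-row overhead; the function names and the sample address are made up for illustration.

```python
def nvarchar_data_bytes(value: str) -> int:
    """Bytes of character data for the value: 2 per UTF-16 code unit."""
    return len(value.encode("utf-16-le"))


def fits(value: str, declared_length: int) -> bool:
    """Whether the value passes the declared NVARCHAR(n) length check."""
    return nvarchar_data_bytes(value) // 2 <= declared_length


address = "221B Baker Street, London"

# The data size depends only on the value, not on whether the column
# is declared as NVARCHAR(32) or NVARCHAR(50).
print(nvarchar_data_bytes(address))
print(fits(address, 32), fits(address, 50))
```

The declared length only shows up in the fits check, which is the validation role described above.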

How are varchar values stored in a SQL Server database?

My fellow programmer has a strange requirement from his team leader; he insisted on creating varchar columns with a length of 16*2^n.
What is the point of such restriction?
I can suppose that short strings (less than 128 chars, for example) are stored directly in the record of the table, and from this point of view the restriction will help to align fields in the record, while larger strings are stored in the database "heap" and only the reference to the string is saved in the table record.
Is it so?
Does this requirement have a reasonable background?
BTW, the DBMS is SQL Server 2008.
Completely pointless restriction as far as I can see. Assuming the standard FixedVar format (as opposed to the formats used with row/page compression or sparse columns) and assuming you are talking about varchar(1-8000) columns:
All varchar data is stored at the end of the row in a variable-length section (or in off-row pages if it can't fit in the row). The amount of space it consumes in that section (and whether or not it ends up off row) is entirely dependent upon the length of the actual data, not the column declaration.
SQL Server will use the length declared in the column declaration when allocating memory (e.g. for sort operations). The assumption it makes in that instance is that varchar columns will be filled to 50% of their declared size on average so this might be a better thing to look at when choosing a size.
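As a rough illustration of that 50% assumption, the Python sketch below estimates how much memory a sort might plan for. It is a deliberately simplified model of the rule stated above: it ignores row overhead and every other factor in a real memory grant, and the function names are hypothetical.

```python
from typing import Iterable


def estimated_varchar_bytes(declared_length: int) -> int:
    """Assumed average size: half of the declared length."""
    return declared_length // 2


def estimated_sort_bytes(declared_lengths: Iterable[int], row_count: int) -> int:
    """Planned bytes for sorting row_count rows over the given varchar columns."""
    per_row = sum(estimated_varchar_bytes(n) for n in declared_lengths)
    return per_row * row_count


# Same data, two different declarations for a single column.
print(estimated_sort_bytes([32], 1_000_000))
print(estimated_sort_bytes([512], 1_000_000))
```

Under this model an oversized declaration inflates the estimate even though the stored data is identical, which is a more practical reason to size columns sensibly than any multiple-of-16 rule.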
I have heard of this practice before, but after researching this question a bit I don't think there is a practical reason for having varchar values in multiples of 16. I think this requirement probably comes from trying to optimize the space used on each page. In SQL Server, pages are set at 8 KB per page. Rows are stored in pages, so perhaps the thinking is that you could conserve space on the pages if the size of each row divided evenly into 8 KB (a more detailed description of how SQL Server stores data can be found here). However, since the amount of space used by a varchar field is determined by its actual content, I don't see how using lengths in multiples of 16 or any other scheme could help you optimize the amount of space used by each row on the page. The length of the varchar fields should just be set to whatever the business requirements dictate.
Additionally, this question covers similar ground and the conclusion also seems to be the same:
Database column sizes for character based data
You should always store data in a column whose size matches the data being stored. It is part of how the database can maintain integrity. For instance, suppose you are storing email addresses. If your column size is the size of the maximum allowable email address, then you will not be able to store bad data that is larger than that. That is a good thing. Some people want to make everything nvarchar(max) or varchar(max). However, this only causes problems with indexing.
Personally, I would have gone back to the person who made this requirement and asked for a reason. Then I would have presented my reasons as to why it might not be a good idea. I would never just blindly implement something like this. In pushing back on a requirement like this, I would first do some research into how SQL Server organizes data on the disk, so I could show the impact the requirement is likely to have on performance. I might even be surprised to find out the requirement made sense, but I doubt it at this point.

SQL Better performance: char(10) and trim or varchar(10)

I have a database that uses codes. Each code can be anywhere from two characters to ten characters long.
In MS SQL Server, is it better for performance to use char(10) for these codes and RTRIM them as they come in, or should I use varchar(10) and not have to worry about trimming the extra whitespace? I need to get rid of the whitespace because the codes will then be used in application logic for comparisons and what not.
As for the average code length, hard to tell exactly. Assume all codes are a random length between one and ten. Edit: A rough estimation is about 4.7 characters for the average length of a code.
I'd vote for varchar.
I say varchar to avoid the TRIM, which would invalidate index usage (unless you use a computed column etc., which defeats the purpose, no?).
Otherwise, at length 10, it would be 50/50, but TRIM tips the balance towards varchar and outweighs the fixed-length benefit.
As a general rule, always favor smaller storage over extra CPU. The driving factor of database performance is always IO: smaller data records mean more records per page, which in turn means fewer IO requests. The extra CPU involved in handling the variable length is not going to be a factor. Historically, in the dark ages of the '80s and even in the '90s, it may have been a measurable factor, but today it is just noise, because CPU and memory access speeds have increased tremendously while IO speed has stayed pretty much constant. That's why 'old books' advice does not apply today. Unless you have a constant-length field like char(2) or similar, just use varchar, always.
I'm confident that you wouldn't be able to tell a speed difference between the two.
Your requirements are a textbook definition of someone who needs to use varchar.
If you want to worry about performance, worry about DB design and writing good SQL. Char vs VarChar internals are well-optimized by the DB vendors.
In one old book I read that in general char is a better choice when for the most of the records the real string length is at least 60% of maximum; in your example - if more than half of all records have length 6 or greater. Otherwise, use varchar.
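For the numbers in this question, a back-of-the-envelope comparison looks like the Python sketch below. It assumes single-byte characters and a 2-byte per-value overhead for a variable-length column; that overhead figure and the function names are assumptions of this sketch, not something stated in the answers.

```python
VARCHAR_OVERHEAD = 2  # assumed bytes of per-value bookkeeping for varchar


def char_bytes(declared_length: int) -> int:
    """char(n) always occupies the full declared length."""
    return declared_length


def varchar_bytes(actual_length: float) -> float:
    """varchar(n) occupies the actual data plus the assumed overhead."""
    return actual_length + VARCHAR_OVERHEAD


average_code_length = 4.7  # estimate given in the question

print(char_bytes(10))
print(varchar_bytes(average_code_length))
```

With an average of 4.7 characters the varchar column comes out smaller per value, and it also spares the application the RTRIM.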

Should Data types be sizes of powers of 2 in SQL Server?

What are good sizes for data types in SQL Server? When defining columns, I see data types with sizes of 50 as one of the default sizes (e.g. nvarchar(50), binary(50)). What is the significance of 50? I'm tempted to use sizes of powers of 2; is that better or just useless?
Update 1
Alright thanks for your input guys. I just wanted to know the best way of defining the size of a datatype for a column.
There is no reason to use powers of 2 for performance etc. Data length should be determined by the size of the stored data.
Why not the traditional powers of 2, minus 1 such as 255...
Seriously, the length should match what you need and is suitable for your data.
Nothing else: how the client uses it, aligns to 32 bit word boundary, powers of 2, birthdays, Scorpio rising in Uranus, roll of dice...
The reason so many fields have a length of 50 is that SQL Server defaults to 50 as the length for most data types where length is an issue.
As has been said, the length of a field should be appropriate to the data that is being stored there, not least because there is a limit to the length of a single record in SQL Server (it's ~8000 bytes). It is possible to blow past that limit.
Also, the length of your fields can be considered part of your documentation. I don't know how many times I've met lazy programmers who claim that they don't need to document because the code is self documenting and then they don't bother doing the things that would make the code self documenting.
You won't gain anything from using powers of 2. Make the fields as long as your business needs really require them to be - let SQL Server handle the rest.
Also, since the SQL Server page size is limited to 8K (of which 8060 bytes are available to user data), making your variable length strings as small as possible (but as long as needed, from a requirements perspective) is a plus.
That 8K limit is a fixed SQL Server system setting which cannot be changed.
Of course, SQL Server these days can handle more than 8K of data in a row, using so called "overflow" pages - but it's less efficient, so trying to stay within 8K is generally a good idea.
Marc
The size of a field should be appropriate for the data you are planning to store there, global defaults are not a good idea.
It's a good idea that the whole row fits into page several times without leaving too much free space.
A row cannot span two pages, and a page has 8096 bytes of free space, so two rows that take 4049 bytes each will occupy two pages.
See docs on how to calculate the space occupied by one row.
Also note that VAR in VARCHAR and VARBINARY stands for "varying", so if you put a 1-byte value into a 50-byte column, it will take but 1 byte.
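The page arithmetic from this answer (8096 free bytes, rows of 4049 bytes) can be checked with a few lines of Python. This sketch takes the 8096 figure as given and ignores any per-row bookkeeping, so it only models the simplified rule "a row cannot span two pages"; the function names are made up.

```python
PAGE_FREE_BYTES = 8096  # figure quoted in the answer above


def rows_per_page(row_bytes: int) -> int:
    """How many whole rows of the given size fit on one page."""
    if row_bytes <= 0 or row_bytes > PAGE_FREE_BYTES:
        raise ValueError("row size must be between 1 and the page free space")
    return PAGE_FREE_BYTES // row_bytes


def pages_needed(row_bytes: int, row_count: int) -> int:
    """Pages required to hold row_count rows of the given size."""
    per_page = rows_per_page(row_bytes)
    return -(-row_count // per_page)  # ceiling division


print(pages_needed(4049, 2))  # one byte too wide to share a page
print(pages_needed(4048, 2))  # both rows fit on a single page
```

A single byte of row width decides whether half of every page is wasted, which is why it pays to look at the total row size rather than at round numbers for individual columns.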
This totally depends on what you are storing.
If you need x chars use x not some arbitrarily predefined amount.

Strategy for storing a string of unspecified length in SQL Server?

So a column will hold some text, and I won't know beforehand how long this string can be. Realistically, 95% of the time it will probably be between 100 and 500 chars, but there can be that one case where it will be 10000 chars long. I have no control over the size of this string, and neither does the user. Besides varchar(max), what other strategies have you guys found useful? Also, what are some cons of varchar(max)?
Varchar(max) in sqlserver 2005 is what I use.
SqlServer handles large string fields weirdly, in that if you specify "text" or a large varchar, but not max, it stores part of the bits in the record and the rest outside.
To my knowledge, with varchar(max) it goes ahead and stores the entire contents out of the record, which makes it less efficient than a small text input. But it's more efficient than a "text" field, since it does not have to look up that information twice by getting part of it inline and the rest from a pointer.
One inelegant but effective approach would be to have two columns in your table, one a varchar big enough to cover your majority of cases, and another of a CLOB/TEXT type to store the freakishly large ones. When inserting/updating, you can get the size of your string, and store it in the appropriate column.
Like I say, not pretty, but it would give you the performance of varchar for the majority case, without breaking when you have larger values.
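The routing logic for this two-column idea is small. Here is a hedged Python sketch of the application side; the column names short_text and long_text and the 500-character cutoff are hypothetical choices, not anything prescribed above.

```python
from typing import Dict, Optional

SHORT_LIMIT = 500  # hypothetical declared size of the varchar column

Row = Dict[str, Optional[str]]


def route_text(value: str) -> Row:
    """Pick the column to fill on insert/update; the other one stays NULL."""
    if len(value) <= SHORT_LIMIT:
        return {"short_text": value, "long_text": None}
    return {"short_text": None, "long_text": value}


def read_text(row: Row) -> Optional[str]:
    """Read the value back from whichever column holds it."""
    if row["short_text"] is not None:
        return row["short_text"]
    return row["long_text"]


typical = route_text("a" * 100)
oversized = route_text("b" * 10000)
print(typical["long_text"] is None, oversized["short_text"] is None)
```

The cost of the approach is visible here: every reader and writer of the table has to go through helpers like these, or the two columns will drift out of sync.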
Have you considered using the BLOB type?
Also, out of curiosity, if you don't control the size of the string, and neither does the user, who does?
nvarchar(max) is definitely your best bet. As I'm sure you know, it will only allocate the space required for the data you are actually storing per row, not the actual max of the datatype per row.
The only con I can see would be if you are constantly updating a row and it often switches between less than 8000 bytes and more than 8000 bytes, in which case SQL Server will change the storage to a LOB and store a pointer to the data whenever you go over 8000 bytes. Changing back and forth would be expensive, but you don't really have any other options here that I can see, so it's kind of a moot point.
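To see when that back-and-forth cost would actually bite, you can count how often a value crosses the threshold over its update history. The Python below is a simplified model that takes the 8000-byte figure from this answer at face value; the function names and the sample history are invented for illustration.

```python
from typing import Iterable

IN_ROW_LIMIT = 8000  # bytes, threshold mentioned in the answer above


def storage_location(byte_length: int) -> str:
    """Where the value would live under this simplified model."""
    return "in-row" if byte_length <= IN_ROW_LIMIT else "LOB"


def count_moves(byte_lengths: Iterable[int]) -> int:
    """Number of times successive updates switch the storage location."""
    locations = [storage_location(n) for n in byte_lengths]
    return sum(1 for prev, cur in zip(locations, locations[1:]) if prev != cur)


# Hypothetical update history for one row, in bytes.
history = [500, 9000, 400, 10000]
print(count_moves(history))
```

If your real update patterns rarely cross the threshold, the concern mostly disappears.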
