SQL Server varchar memory use - sql-server

There is a column field1 varchar(32).
If I just store 'ABC' in this field, it takes 3 bytes. When SQL Server accesses this column and caches it in memory, how much memory does it take: 3 bytes or 32 bytes? And how can you prove your answer?
Thanks in advance

SQL Server will not cache just the column. It will cache the entire page that happens to contain the value, so it will always cache 8192 bytes. The location of the page is subject to things like in-row vs. row-overflow storage and whether the column is sparse or not.
Now a better question would be: how much does such a value occupy in the page? The answer is again not straightforward, because the value could be stored uncompressed, row-compressed, page-compressed or column-compressed. It is true that row compression has no effect on varchar fields, but page compression does.
Now, for a straightforward way of answering how much storage a varchar(32) value occupies in-row in a non-compressed table, the best resource is Inside the Storage Engine: Anatomy of a record. After you read Paul Randal's article you will be able to answer the question, and also prove the answer.
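Here is a minimal sketch of that proof. The database and table names are illustrative, and DBCC IND / DBCC PAGE are undocumented (but long-stable) commands:
CREATE TABLE dbo.VarcharDemo (field1 varchar(32) NOT NULL);
INSERT INTO dbo.VarcharDemo VALUES ('ABC');
DBCC IND('MyDb', 'VarcharDemo', -1);   -- lists the pages allocated to the table
DBCC TRACEON(3604);                    -- route DBCC output to the client
DBCC PAGE('MyDb', 1, 312, 3);          -- file 1, page 312: substitute the page id reported by DBCC IND
-- The record dump shows the variable-length section holding exactly the
-- 3 bytes of 'ABC' (plus a 2-byte column offset entry), not 32 bytes.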
In addition, you must consider any secondary index that has this column as a key or includes this column.
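And to see the page-level caching claim directly, you can count the table's pages currently in the buffer pool. A minimal sketch, assuming the dbo.VarcharDemo table above (the join covers the in-row and row-overflow allocation units):
SELECT COUNT(*) AS cached_pages, COUNT(*) * 8 AS cached_kb
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions AS p ON p.hobt_id = au.container_id
WHERE bd.database_id = DB_ID()
  AND p.object_id = OBJECT_ID('dbo.VarcharDemo');
-- Every cached page is a full 8 KB, regardless of how few bytes of it your value uses.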

50% of their declared size on average, which is what SQL Server assumes when allocating memory for them. There is a lot of documentation on this; here is where I pulled this from: How are varchar values stored in a SQL Server database?

Related

How can I store more than 8000 bytes of data inline in a SQL Server 2012 row?

I am trying to write a small blog engine. I would love to find a sample SQL Server schema to give me some ideas but have yet to find one.
I would like to have a blog table that allows me to store more than 8000 bytes of data. Can anyone tell me if a good way to do this would be with two fields like this:
CREATE TABLE [Blog](
[BlogId] [int] IDENTITY(1,1) NOT NULL,
[BlogText1] [nvarchar](8000) NOT NULL,
[BlogText2] [nvarchar](8000),
..
What I was thinking was to store the text in two fields and have my application append the contents of the two fields when displaying the data; when storing data, the first xxx characters would go in BlogText1 and any remainder in BlogText2.
Is this a reasonable thing to do, or should I just use nvarchar(max)?
If I use nvarchar(8000), how many characters can I fit into that?
What I am concerned about is the time it will take to retrieve a row. Am I correct in assuming that if I use nvarchar(max) it will take much longer to retrieve the row?
The short version: use NVARCHAR(MAX) until you identify that there is a definite performance problem to solve. (Note that nvarchar(8000) is not even a valid declaration; nvarchar is capped at 4000 characters, i.e. 8000 bytes.) Attempting to manually split up large blog entries so that they are saved "inline" is almost certainly going to result in worse performance than leaving it up to SQL Server.
The long version: SQL Server stores rows in 8 KB pages, and a single row normally cannot exceed about 8,060 bytes. However, certain large-value types (e.g. text) are handled specially: their value is replaced with a 24-byte pointer to the actual data, which is stored elsewhere (in the ROW_OVERFLOW_DATA allocation unit).
The NVARCHAR(MAX) data type actually provides a hybrid approach: when the data is small enough, the value is stored in the data pages as it would be normally; when the data is too large, it is seamlessly converted to a large-value type for you. This generally means you get the best of both worlds.
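A minimal sketch of the single-column approach (table and column names are illustrative; sp_tableoption is a documented procedure):
CREATE TABLE [Blog](
[BlogId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[BlogText] [nvarchar](max) NOT NULL
);
-- Optionally push large values off-row from the start, so scans that touch
-- only the other columns never wade through the blog text:
EXEC sp_tableoption 'Blog', 'large value types out of row', 1;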

Can the size of a column in the database slow a query?

I have a table with a column that contains HTML content and is relatively larger than the other columns.
Can having a large column slow down the queries on this table?
Do I need to put this big field in another table?
PostgreSQL's TOAST technique should handle this for you: above a given size the value is transparently stored in an associated _toast table, and some internal machinery keeps it from slowing down your queries (check the given link).
But of course, if you always retrieve the whole content, you'll lose time in the retrieval. And it's also clear that queries on this table that don't use this column won't suffer from its size.
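A minimal sketch in PostgreSQL (the dialect of this question) to see the TOAST table that transparently backs a given table; 'my_table' is a placeholder name:
SELECT c.relname AS main_table, t.relname AS toast_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'my_table';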
The more data a query has to touch, the slower it runs. Always.
It's likely that if you have a large column, there is going to be more disk I/O, since caching the column itself takes more space. However, putting it in a different table won't by itself alleviate the issue (other than the ordering issue below). When you don't explicitly need the actual HTML data, be sure not to SELECT it.
Sometimes the ordering of the columns can matter because of the way rows are stored; if you're really worried about it, store it as the last column so it doesn't get paged in when selecting other columns.
You would have to look at how Postgres stores things internally to see if you need to split this out, but a very large field can cause the row's on-disk representation to be broken up, which adds to the time it takes to access it.
Further, returning 100 bytes of data versus 10,000 bytes for one record is clearly going to be slower, and the more records, the slower it gets. If you are doing SELECT * this is clearly a problem, especially if you usually do not need the HTML.
Another consideration could be putting the HTML information in a NoSQL database. This kind of document information is what they excel at, and there is no reason you can't use a relational database for some info and a NoSQL database for other info.

How are varchar values stored in a SQL Server database?

My fellow programmer has a strange requirement from his team leader: he insists on creating varchar columns with a length of 16*2^n.
What is the point of such restriction?
I can suppose that short strings (less than 128 chars, for example) are stored directly in the record of the table, and from this point of view the restriction would help to align fields in the record, while larger strings are stored in the database "heap" and only a reference to the string is saved in the table record.
Is it so?
Does this requirement have a reasonable background?
BTW, the DBMS is SQL Server 2008.
Completely pointless restriction as far as I can see, assuming the standard FixedVar format (as opposed to the formats used with row/page compression or sparse columns) and assuming you are talking about varchar(1-8000) columns.
All varchar data is stored at the end of the row in a variable-length section (or in off-row pages if it can't fit in row). The amount of space it consumes in that section (and whether or not it ends up off row) is entirely dependent upon the length of the actual data, not the column declaration.
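A quick check of that claim (a sketch; DATALENGTH reports the bytes a value actually consumes):
DECLARE @v varchar(8000) = 'ABC';
SELECT DATALENGTH(@v) AS bytes_stored;   -- returns 3, not 8000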
SQL Server will, however, use the length declared in the column declaration when allocating memory (e.g. for sort operations). The assumption it makes in that instance is that varchar columns will be filled to 50% of their declared size on average, so this might be a better thing to look at when choosing a size.
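A sketch of how you might observe this: while a query that sorts on a wide varchar column runs in another session, this documented DMV shows the memory it requested versus what it was granted:
SELECT session_id, requested_memory_kb, granted_memory_kb, used_memory_kb
FROM sys.dm_exec_query_memory_grants;
-- Declaring the same data as varchar(8000) instead of varchar(100) inflates
-- requested_memory_kb, because the estimate assumes 50% fill of the declared size.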
I have heard of this practice before, but after researching this question a bit I don't think there is a practical reason for having varchar values in multiples of 16. I think this requirement probably comes from trying to optimize the space used on each page. In SQL Server, pages are set at 8 KB per page. Rows are stored in pages, so perhaps the thinking is that you could conserve space on the pages if the size of each row divided evenly into 8 KB (a more detailed description of how SQL Server stores data can be found here). However, since the amount of space used by a varchar field is determined by its actual content, I don't see how using lengths in multiples of 16 or any other scheme could help you optimize the amount of space used by each row on the page. The length of the varchar fields should just be set to whatever the business requirements dictate.
Additionally, this question covers similar ground and the conclusion also seems to be the same:
Database column sizes for character based data
You should always store the data in a data size that matches the data being stored. It is part of how the database can maintain integrity. For instance, suppose you are storing email addresses: if your data size is the size of the maximum allowable email address, then you will not be able to store bad data that is larger than that. That is a good thing. Some people want to make everything nvarchar(max) or varchar(max); however, that causes its own problems, not least that such columns cannot be used as index keys.
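A minimal sketch of right-sizing (the table is illustrative; 254 characters is the commonly cited practical maximum length of an email address under the SMTP RFCs):
CREATE TABLE [Subscribers](
[SubscriberId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[Email] [varchar](254) NOT NULL
);
-- An INSERT with a 300-character string now fails with a truncation error
-- instead of silently storing junk.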
Personally, I would have gone back to the person who made this requirement and asked for a reason, and then presented my reasons as to why it might not be a good idea. I would never just blindly implement something like this. In pushing back on a requirement like this, I would first do some research into how SQL Server organizes data on the disk, so I could show the impact the requirement is likely to have on performance. I might even be surprised to find out the requirement made sense, but I doubt it at this point.

Performance implications of nvarchar(4000)?

I have a column that is declared as nvarchar(4000), and 4000 is SQL Server's limit on the declared length of nvarchar. It will not be used as a key or to sort rows.
Are there any implications I should be aware of before setting the length of the nvarchar field to 4000?
Update
I'm storing XML (an XML-serialized object, to be exact). I know this isn't favorable, but we will most likely never need to perform queries on it, and this implementation dramatically decreases development time for certain features that we plan on extending. I expect the XML data to be about 1500 characters long on average, but there can be exceptions where it is longer than 4000. Could it be longer than 4000 characters? It could be, on very rare occasions, if it ever happens. Is this application mission-critical? Nope, not at all.
SQL Server has three types of storage: in-row, LOB and row-overflow; see Table and Index Organization. In-row storage is the fastest to access. LOB and row-overflow are similar to each other, and both are slightly slower than in-row.
If you have a column of NVARCHAR(4000), it will be stored in-row if possible; if not, it will be stored in the row-overflow storage. Having such a column does not necessarily indicate future performance problems, but it begs the question: why nvarchar(4000)? Is your data likely to always be near 4000 characters long? Can it be 4001, and how will your application handle that case? Why not nvarchar(max)? Have you measured performance and found that nvarchar(max) is too slow for you?
My recommendation would be to either use a small nvarchar length, appropriate for the real data, or nvarchar(max) if it is expected to be large. nvarchar(4000) smells like unjustified and untested premature optimisation.
Update
For XML, use the XML data type. It has many advantages over varchar or nvarchar: it supports XML indexes and XML methods, and it can actually validate the XML for compliance with a specific schema, or at least for well-formedness.
XML will be stored in the LOB storage, outside the row.
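A minimal sketch (illustrative names): an XML column rejects malformed documents at insert time, which nvarchar cannot do:
CREATE TABLE [SerializedObjects](
[ObjectId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[Payload] [xml] NOT NULL
);
INSERT INTO [SerializedObjects] (Payload)
VALUES (N'<obj><field>1</field></obj>');   -- succeeds
-- VALUES (N'<obj><field>1</obj>');        -- would fail: not well-formed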
Even if the data is not XML, I would still recommend LOB storage (nvarchar(max)) for something with a length around 1500. There is a cost associated with retrieving LOB-stored data, but that cost is more than compensated for by making the table narrower. The width of a table row is a primary factor in performance, because wider tables fit fewer rows per page, so any operation that has to scan a range of rows or the entire table needs to fetch more pages into memory, and this shows up in the query cost (it is actually the driving factor of the overall cost). A LOB-stored column only expands the size of the row by the width of a page pointer, which is 8 bytes if I remember correctly, so you can get much better density of rows per page, and hence faster queries.
Are you sure that you'll actually need as many as 4000 characters? If 1000 is the practical upper limit, why not set it to that? Conversely, if you're likely to get more than 4000 characters, you'll want to look at nvarchar(max).
I like to "encourage" users not use storage space too freely. The more space required to store a given row, the less space you can store per page, which potentially results in more disk I/O when the table is read or written to. Even though only as many bytes of data are stored as are necessary (i.e. not the full 4000 per row), whenever you get a bit more than 2000 characters of nvarchar data, you'll only have one row per page, and performance can really suffer.
This of course assumes you need to store Unicode (double-byte) data, and that you only have one such column per row. If you don't, drop down to varchar.
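A sketch of the one-row-per-page effect (illustrative names; two ~4,200-byte rows cannot share a page given the ~8,060-byte in-row limit):
CREATE TABLE [WideRows]([txt] [nvarchar](4000) NOT NULL);
INSERT INTO [WideRows] VALUES (REPLICATE(N'x', 2100));   -- about 4,200 bytes
INSERT INTO [WideRows] VALUES (REPLICATE(N'x', 2100));
SELECT page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('WideRows'), NULL, NULL, 'DETAILED');
-- page_count should come back as 2: one page per row, each roughly half empty.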
Do you definitely need nvarchar, or can you go with varchar? The limitation matters mainly on SQL Server 2000; are you using 2005/2008?

Should data type sizes be powers of 2 in SQL Server?

What are good sizes for data types in SQL Server? When defining columns, I see data types with sizes of 50 as one of the default sizes (e.g. nvarchar(50), binary(50)). What is the significance of 50? I'm tempted to use sizes that are powers of 2; is that better, or just useless?
Update 1
Alright, thanks for your input, guys. I just wanted to know the best way of defining the size of a data type for a column.
There is no reason to use powers of 2 for performance etc. The length should be determined by the size of the data being stored.
Why not the traditional powers of 2, minus 1 such as 255...
Seriously, the length should match what you need and is suitable for your data.
Nothing else matters: not how the client uses it, not alignment to a 32-bit word boundary, not powers of 2, birthdays, Scorpio rising in Uranus, or a roll of the dice...
The reason so many fields have a length of 50 is that the table designer in SQL Server Management Studio defaults to 50 as the length for most data types where length is an issue.
As has been said, the length of a field should be appropriate to the data that is being stored there, not least because there is a limit to the length of a single record in SQL Server (about 8,060 bytes), although row-overflow storage does make it possible to blow past that limit.
Also, the length of your fields can be considered part of your documentation. I don't know how many times I've met lazy programmers who claim they don't need to document because the code is self-documenting, and who then don't bother doing the things that would make the code self-documenting.
You won't gain anything from using powers of 2. Make the fields as long as your business needs really require them to be - let SQL Server handle the rest.
Also, since the SQL Server page size is limited to 8K (of which 8060 bytes are available to user data), making your variable length strings as small as possible (but as long as needed, from a requirements perspective) is a plus.
That 8K limit is a fixed SQL Server system setting which cannot be changed.
Of course, SQL Server these days can handle more than 8K of data in a row, using so-called "overflow" pages, but it's less efficient, so trying to stay within 8K is generally a good idea.
Marc
The size of a field should be appropriate for the data you are planning to store there, global defaults are not a good idea.
It's a good idea for the whole row to fit into a page several times over without leaving too much free space.
A row cannot span two pages, and a page has 8096 bytes of space for data, so two rows that take 4049 bytes each will occupy two pages.
See docs on how to calculate the space occupied by one row.
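Alternatively, rather than calculating by hand, you can measure the average record size and page fill of an existing table. A sketch with a placeholder table name ('DETAILED' mode is needed to populate the size columns):
SELECT avg_record_size_in_bytes, avg_page_space_used_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED');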
Also note that the VAR in VARCHAR and VARBINARY stands for "varying", so if you put a 1-byte value into a 50-byte column it will take only 1 byte (plus a 2-byte entry in the row's variable-length column offset array).
This totally depends on what you are storing.
If you need x characters, use x, not some arbitrary predefined amount.
