I understand there are multiple questions about this on SO, but I have yet to find a definitive answer of "yes, here's how..."
So here it is again: What are the possible ways to store an unsigned integer value (32-bit value or 32-bit bitmap) into a 4-byte field in SQL Server?
Here are ideas I have seen:
1) Use a -1*2^31 offset for all values
Disadvantages: need to perform math on the values before reading/writing/aggregating.
2) Use 4 tinyint fields
Disadvantages: need to concatenate values to perform any operations
3) Use binary(4)
Disadvantages: actually uses 4 + 2 bytes of space (Edit: varbinary(4) uses 4+2, binary(4) only uses 4)
Need to work in SqlBinary or cast to/from other types
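To make idea 1 concrete, here's a rough sketch of what I mean (table and column names are made up):

-- Idea 1 sketch: shift unsigned values by 2^31 so they fit in a signed int.
CREATE TABLE dbo.Bitmap32
(
    FlagsOffset int NOT NULL   -- stores (unsigned value - 2147483648)
);

-- Write: subtract the offset before storing.
DECLARE @unsigned bigint = 4000000000;
INSERT INTO dbo.Bitmap32 (FlagsOffset) VALUES (@unsigned - 2147483648);

-- Read: widen to bigint first, then add the offset back.
SELECT CAST(FlagsOffset AS bigint) + 2147483648 AS UnsignedValue
FROM dbo.Bitmap32;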
IMO, you have the correct answers for storing 2^32 positive values in 4 bytes: either a standard int where you do the math, or a binary(4), which, contrary to what you have said, will only consume 4 bytes of space. (Only varbinary will incur an extra 2 bytes of storage.) A series of tinyint or smallint columns would be unjustifiably cumbersome IMO.
Of course there is another solution for storing 2^32 positive values, but it takes eight bytes: a bigint with a check constraint. Given how cheap storage and memory are today, IMO this is the simplest and cheapest solution given the programmatic hoops you would have to jump through with the other solutions; however, you clearly have a reason for wanting to save the extra 4 bytes on each row.
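For illustration, here's a minimal sketch of both options (all names are my own; verify the binary(4) cast behaviour on your version, as I believe the casts keep the low-order bytes):

DECLARE @v bigint = 4000000000;

CREATE TABLE dbo.UnsignedExample
(
    -- Option A: bigint constrained to the unsigned 32-bit range (8 bytes, no casting needed)
    AsBigint bigint NOT NULL
        CONSTRAINT CK_UnsignedExample_Range CHECK (AsBigint BETWEEN 0 AND 4294967295),
    -- Option B: raw 4-byte storage; cast to/from bigint at the boundaries
    AsBinary binary(4) NOT NULL
);

INSERT INTO dbo.UnsignedExample (AsBigint, AsBinary)
VALUES (@v, CAST(@v AS binary(4)));

SELECT AsBigint, CAST(AsBinary AS bigint) AS RoundTripped   -- both come back as 4000000000
FROM dbo.UnsignedExample;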
Related
I have a table in SQL Server with a large amount of data - around 40 million rows. The base structure is like this:
Title        | type           | length | Null distribution
-------------|----------------|--------|------------------
Customer-Id  | number         | 8      | 60%
Card-Serial  | number         | 5      | 70%
...          | ...            | ...    | ...
Note         | string-unicode | 2000   | 40%
Both numeric columns contain numbers of a fixed length.
I have no idea which data type to choose to keep the database as small as possible while still getting good performance from an index on the Customer-Id column. According to this post, if I choose CHAR(8), the database consumes 8 bytes per row even for null data.
I decided to use INT to reduce the database size and get a good index, but null data will still use 4 bytes per row. If I want to reduce that, I could use VARCHAR(8), but I don't know whether the system performs well with an index on that type. The main question is: is reducing the database size more important, or having a good index on a numeric type?
Thanks.
If it is a number, then by all means choose a numeric datatype!! Don't store your numbers as char(n) or varchar(n)!! That'll just cause you immeasurable grief and headaches later on.
The choice is pretty clear:
if you have whole numbers - use TINYINT, SMALLINT, INT or BIGINT - depending on the number range you need
if you need fractional numbers - use DECIMAL(p,s) for the best and most robust behaviour (no rounding errors like FLOAT or REAL)
Picking the most appropriate datatype is much more important than any micro-optimization for storage. Even with 40 million rows - that's still not a big issue, whether you use 4 or 8 bytes. Whether you use a numeric type vs. a string type - that makes a huge difference in usability and handling of your database!
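As a sketch of what that could look like for the table described above (all names are assumptions on my part):

CREATE TABLE dbo.Customer
(
    CustomerId int NULL,         -- 8-digit values fit in int (max 2,147,483,647)
    CardSerial int NULL,         -- 5-digit values need int as well; smallint tops out at 32,767
    Note nvarchar(2000) NULL     -- the unicode string column
);

-- Indexing the numeric column directly:
CREATE INDEX IX_Customer_CustomerId ON dbo.Customer (CustomerId);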
So as I understand it an int in SQL Server is automatically set to a length of 4 digits. A bigint has a default length of 8. It seems these cannot be made any other length, so what do you do when you want a column that will only contain digits and you need it to be a length of 10?
I already tried float and while it will store the 10 digits it does so in scientific notation.
int takes 4 bytes (-2^31 to 2^31 - 1), and bigint takes 8 bytes (-2^63 to 2^63 - 1). They're 32-bit and 64-bit signed integers, respectively.
Please refer to the data type documentation.
Additionally, you should avoid float and real unless you really need them, as they're approximate number types. decimal or numeric are preferred for decimal values.
If you want the equivalent of an "INT(10)", then you should use decimal(10), which will support -9999999999 to 9999999999. Bear in mind that this will use more disk space than a bigint (9 bytes instead of 8) and may perform differently at very large scales.
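A quick scratch example to see that range in action:

DECLARE @d decimal(10, 0) = 9999999999;   -- the largest 10-digit value
SELECT @d;
-- SET @d = 10000000000;   -- would fail with an arithmetic overflow error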
You are mixing up the concept of a human-readable number (the decimal digits) with its digital representation (bits).
An INT, which takes 4 bytes (32 bits), does not stop at "9999"... An int can represent 4,294,967,296 different values...
From other comments I take it that you want to store phone numbers...
Take this as a general rule: store in numeric fields only values you want to use in mathematical computations.
Would a phone number plus 2, or a phone number divided by 4, ever make sense?
Anyway: very often phone numbers are stored with some kind of delimiters.
Put this all together and you come to the conclusion: no DECIMAL(10), no INT, no BIGINT, but VARCHAR(50) :-)
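For example (table and column names invented):

CREATE TABLE dbo.Contact
(
    PhoneNumber varchar(50) NULL   -- preserves "+", spaces, and other delimiters as entered
);

INSERT INTO dbo.Contact (PhoneNumber) VALUES ('+49 (30) 1234-5678');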
Which version of SQL Server are you using? I am using SQL Server 2014, which has a decimal data type that does what you want. If it is available in your SQL Server, try it.
I'm currently developing an application that needs to store a 10- to 20-digit value in the database.
My question is: what data type should I be using? This value is used as a primary key, so the performance of the DB is important for my application. In Java I handle this value as a BigDecimal.
Quote from the manual:
numeric: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
http://www.postgresql.org/docs/current/static/datatype-numeric.html
131072 digits should cover your needs as far as I can tell.
Edit:
To answer the question about efficiency:
The first and most important question is: what kind of data is stored in that column and how do you use it?
If it's a number then use numeric.
If it's not a number use a varchar.
Never, ever store (real) numbers in character columns!
If you need to sort by that column, you won't be satisfied with what you get if you use a character datatype (e.g. '2' will be sorted after '10').
Coming back to the efficiency question: I assume it is mostly space efficiency you are concerned about. You can calculate the space requirements for your values yourself.
The storage requirement for the numeric data type is documented as well:
The actual storage requirement is two bytes for each group of four decimal digits, plus five to eight bytes overhead
So for 20 digits this would be a maximum of 10 bytes plus the five to eight bytes overhead. So max. 18 bytes.
To store 20 digits in a varchar column you need 21 bytes.
So from a space "efficiency" point of view numeric is slightly better. But that should never influence your decision, because the choice of datatypes should be driven by the requirements of the column's content.
From a performance point of view I don't think there will be a big difference either.
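If it helps, here's a minimal sketch in Postgres (table and column names invented):

-- numeric(20, 0) gives exact storage for 10- to 20-digit keys
CREATE TABLE account (
    account_no numeric(20, 0) PRIMARY KEY
);

INSERT INTO account (account_no) VALUES (12345678901234567890);

SELECT account_no FROM account ORDER BY account_no;   -- numeric ordering, unlike a varchar column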
Try BIGINT instead of NUMERIC. It should work, as long as your values stay within bigint's range (up to 9,223,372,036,854,775,807, i.e. 19 digits).
http://www.postgresql.org/docs/current/static/datatype-numeric.html
Consider what MSDN states regarding SQL Server 2008 R2 storage of NUMERIC/DECIMAL precision:
Precision of 1 to 9 uses 5 bytes
Precision of 10 to 19 uses 9 bytes
So if my business case logically requires a data type with 2 decimal places and a precision of 5 digits it makes no actual performance or storage difference if I define it as NUMERIC(5, 2) or NUMERIC(9, 2).
One consideration I'm intentionally ignoring is the implied check constraint, as I'd most likely put an actual check constraint on the column limiting the allowed range.
Does this make a difference when it comes to indexes, query performance or any other aspect of the system?
Numeric(5, 2) allows numbers up to and including 999.99. If you try to insert 1000.0 into that, you'll get an arithmetic overflow.
Numeric(9,2) allows numbers up to and including 9,999,999.99.
Bear in mind that if you plan to ever sum this value, allow extra space, otherwise you'll get an overflow or you'll need to do an explicit cast.
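A quick illustration with scratch variables:

DECLARE @small numeric(5, 2) = 999.99;       -- largest value numeric(5,2) can hold
DECLARE @large numeric(9, 2) = 9999999.99;   -- largest value numeric(9,2) can hold

-- SET @small = 1000.00;   -- would raise an arithmetic overflow error

SELECT @small AS MaxNumeric5_2, @large AS MaxNumeric9_2;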
They take up the same number of bytes. Since they're the same storage size, they're the same in the data page, in memory, in indexes, in network transmission, etc.
It's for this reason that I usually work out what size number I need to store (if using numeric), then increase precision (and maybe scale) so that I'm just below the point where the storage size increases.
So if I need to store up to 99 million with 4 decimal places, that would be numeric(12,4). For the same storage, I can have a numeric(19,6) and leave some safe space for when the business user announces that they really do need to store a couple billion in there.
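In other words, both of these columns cost 9 bytes per value, since both precisions fall in the 10-19 bracket (names invented):

CREATE TABLE dbo.Amounts
(
    Tight    numeric(12, 4) NULL,   -- holds up to 99,999,999.9999
    Headroom numeric(19, 6) NULL    -- holds up to 9,999,999,999,999.999999 in the same 9 bytes
);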
I love things that are a power of 2. I celebrated my 32nd birthday knowing it was the last time in 32 years I'd be able to claim that my age was a power of 2. I'm obsessed. It's like being some Z-list Batman villain, except without the colourful adventures and a face full of batarangs.
I ensure that all my enum values are powers of 2, if only for future bitwise operations, and I'm reasonably assured that there is some purpose (even if latent) for doing it.
Where I'm less sure, is in how I define the lengths of database fields. Again, I can't help it. Everything ends up being a power of 2.
CREATE TABLE Person
(
PersonID int IDENTITY PRIMARY KEY
,Firstname varchar(64)
,Surname varchar(128)
)
Can any SQL super-boffins who know about the internals of how stuff is stored and retrieved tell me whether there is any benefit to my inexplicable obsession? Is it more efficient to size character fields this way? Can anyone pop in with an "actually, what you're doing works because ....."?
I suspect I'm just getting crazier in my older age, but it'd be nice to know that there is some method to my madness.
Well, if I'm your coworker and I'm reading your code, I don't have to use SVN blame to find out who wrote it. That's kind of cool. :)
The only relevant powers of two are 512 and 4096, the default disk block size and memory page size respectively. If your total row length crosses these boundaries, you might notice disproportionate jumps in performance if you look very closely. For example, if your row is 513 bytes long, you need to read twice as many blocks for a single row as for a row that is 512 bytes long.
The problem is calculating the proper row size, as the internal storage format is not very well documented.
Also, I do not know whether the SQL Server actually keeps the rows block aligned, so you might be out of luck there anyways.
With varchar, you only store the number of characters plus 2 bytes for the length.
Generally, the maximum row size is 8,060 bytes:
CREATE TABLE dbo.bob (c1 char(3000), c2 char(3000), c3 char(3000))
Msg 1701, Level 16, State 1, Line 1
Creating or altering table 'bob' failed because the minimum row size would be 9007, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.
The power of 2 stuff is frankly irrational and that isn't good in a programmer...