Consider what MSDN states regarding SQL Server 2008 R2 storage of NUMERIC/DECIMAL precision:
Precision of 1 to 9 is 5 bytes
Precision of 10 to 19 is 9 bytes
So if my business case logically requires a data type with 2 decimal places and a precision of 5 digits, it makes no actual performance or storage difference whether I define it as NUMERIC(5, 2) or NUMERIC(9, 2).
One consideration I'm intentionally ignoring is the implied check constraint, as I'd most likely put an actual check constraint on the column limiting the allowed range.
Does this make a difference when it comes to indexes, query performance or any other aspect of the system?
Numeric(5, 2) allows numbers up to and including 999.99. If you try to insert 1000.0 into that, you'll get an arithmetic overflow.
Numeric(9, 2) allows numbers up to and including 9,999,999.99.
Bear in mind that if you plan to ever sum this value, allow extra space, otherwise you'll get an overflow or you'll need to do an explicit cast.
They take up the same number of bytes. Since they're the same storage size, they're the same in the data page, in memory, in indexes, in network transmission, etc.
It's for this reason that I usually work out what size number I need to store (if using numeric), then increase precision (and maybe scale) so that I'm just below the point where the storage size increases.
So if I need to store up to 99 million with 4 decimal places, that would be numeric (12,4). For the same storage, I can have a numeric (19,6) and give some safe space for when the business user announces that they really do need to store a couple billion in there.
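The storage brackets above can be turned into a small lookup, which makes it easy to see how far precision can be raised before the byte size changes. This is only a sketch in Python: the helper names (`decimal_storage_bytes`, `widest_precision_same_storage`, `max_value`) are made up for illustration, and the 13-byte and 17-byte tiers for precision 20 to 28 and 29 to 38 come from the SQL Server documentation rather than from the question.

```python
from decimal import Decimal

# SQL Server DECIMAL/NUMERIC storage tiers: (highest precision in tier, bytes).
STORAGE_TIERS = [(9, 5), (19, 9), (28, 13), (38, 17)]


def decimal_storage_bytes(precision: int) -> int:
    """Return the storage size in bytes for a precision between 1 and 38."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    for max_precision, size in STORAGE_TIERS:
        if precision <= max_precision:
            return size
    raise AssertionError("unreachable")


def widest_precision_same_storage(precision: int) -> int:
    """Return the largest precision that costs the same number of bytes."""
    size = decimal_storage_bytes(precision)
    return max(p for p, s in STORAGE_TIERS if s == size)


def max_value(precision: int, scale: int) -> Decimal:
    """Return the largest value a NUMERIC(precision, scale) column can hold."""
    integer_part = "9" * (precision - scale) or "0"
    if scale == 0:
        return Decimal(integer_part)
    return Decimal(integer_part + "." + "9" * scale)


print(decimal_storage_bytes(5), decimal_storage_bytes(9))   # same tier
print(widest_precision_same_storage(12))                    # room to grow
print(max_value(5, 2), max_value(9, 2))
```

With these helpers, NUMERIC(12, 4) and NUMERIC(19, 6) land in the same 9-byte tier, which is the headroom argument made above.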
I have a table in SQL Server with large amount of data - around 40 million rows. The base structure is like this:
Title       | type           | length | Null distribution
------------+----------------+--------+------------------
Customer-Id | number         | 8      | 60%
Card-Serial | number         | 5      | 70%
-           | -              | -      | -
-           | -              | -      | -
Note        | string-unicode | 2000   | 40%
Both numeric columns are filled with numbers of a specific length.
I have no idea which data type to choose to get the smallest database size while keeping good performance when indexing the Customer-Id column. According to this post, if I choose CHAR(8), the database consumes 8 bytes per row even for NULL data.
I decided to use INT to reduce the database size and get a good index, but NULL data will still use 4 bytes per row. If I want to reduce this size, I can use VARCHAR(8), but I don't know whether an index on this type performs well. The main question is which matters more: reducing the database size or having a good index on a numeric type.
Thanks.
If it is a number - then by all means choose a numeric datatype!! Don't store your numbers as char(n) or varchar(n) !! That'll just cause you immeasurable grief and headaches later on.
The choice is pretty clear:
if you have whole numbers - use TINYINT, SMALLINT, INT or BIGINT - depending on the number range you need
if you need fractional numbers - use DECIMAL(p,s) for the best and most robust behaviour (no rounding errors like FLOAT or REAL)
Picking the most appropriate datatype is much more important than any micro-optimization for storage. Even with 40 million rows - that's still not a big issue, whether you use 4 or 8 bytes. Whether you use a numeric type vs. a string type - that makes a huge difference in usability and handling of your database!
I am creating a database in SQL Server hosted on AWS RDS (I want to use the 2016 or 2017 version).
My question is about the creation of IDs for the tables that I suspect will have many rows (admittedly, my database will probably not have a lot of rows, but I want to learn how to do my work correctly). I do not know which data type is the better choice.
The structure of my id is going to be:
[Two items for schema][Three items for tables][Five items for rows]
Example that I have been considering:
Data type and length | Example    | Storage (bytes)
---------------------+------------+----------------
VARCHAR(10)          | S1TA100001 | 10 bytes
NUMERIC(10,0)        | 0100100001 | 9 bytes
BIGINT               | 100100001  | 8 bytes
If I use VARCHAR, I have a wider range per character (0-9 and A-Z, 36 symbols), so maybe I can shorten the ID to [Schema: one][Table: one][Row: three or four]. That is 5 or 6 bytes and allows 46,656 or 1,679,616 rows. But I guess that translates into computing cost.
If I use NUMERIC, I only have the range 0-9 per digit, which allows 100,000 rows.
If I use BIGINT, which has a range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807), I only use a small part of that range, but I do not know if the computing cost is relevant.
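The row counts for each ID layout follow from alphabet size raised to the number of row positions. A minimal Python sketch, assuming the row part uses only 0-9 or only 0-9 plus A-Z (the function name `id_capacity` is invented for this example):

```python
import string


def id_capacity(alphabet_size: int, positions: int) -> int:
    """Number of distinct IDs with `positions` characters from the alphabet."""
    return alphabet_size ** positions


DIGITS = len(string.digits)                                  # 0-9
ALPHANUMERIC = len(string.digits + string.ascii_uppercase)   # 0-9 and A-Z

print(id_capacity(DIGITS, 5))         # five numeric row positions
print(id_capacity(ALPHANUMERIC, 3))   # three alphanumeric row positions
print(id_capacity(ALPHANUMERIC, 4))   # four alphanumeric row positions
```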
Thank you very much!
My recommendation would be clearly using a numerical datatype - preferably INT or BIGINT - for a database ID.
These types are small, fast, nimble - and they don't have any trouble with lower-/UPPER-case, regional/language settings, Unicode or non-Unicode and many more things that a string-based ID would have. Spare yourself this trouble, if you can!
Whether you need INT or BIGINT depends on how many rows you expect - 2 billion vs. 9 quintillion :-)
With a type INT, starting at 1, you get over 2 billion possible rows - that should be sufficient for the majority of cases.
If you use an INT (as an IDENTITY in SQL Server) starting at 1, and you insert a row every second, you need about 68 years before you hit the 2 billion limit ...
If you use a BIGINT starting at 1, and you insert one thousand rows per second, you need a mind-boggling 292 million years before you hit the 9.22 quintillion limit ...
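These figures are easy to check with a rough calculation. The Python sketch below assumes a constant insert rate and a 365-day year (leap years ignored); `years_until_exhausted` is just an illustrative name. It gives roughly 68 years for INT at one row per second and roughly 292 million years for BIGINT at one thousand rows per second.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # simplification: no leap years

INT_MAX = 2**31 - 1      # largest value of a 4-byte signed integer
BIGINT_MAX = 2**63 - 1   # largest value of an 8-byte signed integer


def years_until_exhausted(max_value: int, rows_per_second: float) -> float:
    """Years until an identity starting at 1 reaches max_value."""
    return max_value / rows_per_second / SECONDS_PER_YEAR


print(round(years_until_exhausted(INT_MAX, 1), 1))
print(round(years_until_exhausted(BIGINT_MAX, 1000) / 1e6), "million years")
```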
If you don't have a natural key, then you should only consider any of the integer data types for your key.
So for large tables, you have to ask yourself if you will exceed 2 billion (or 4 billion if you start at -2147483648). If so, then you should go for bigint, otherwise int will be enough.
From a performance point of view, varchar is much worse due to the more expensive comparison (due to case and accent insensitivity).
Decimal is just marginally slower than bigint. Since decimal requires more space, you never need it for a generated key.
So as I understand it an int in SQL Server is automatically set to a length of 4 digits. A bigint has a default length of 8. It seems these cannot be made any other length, so what do you do when you want a column that will only contain digits and you need it to be a length of 10?
I already tried float and while it will store the 10 digits it does so in scientific notation.
int takes 4 bytes (-2^31 to 2^31 - 1), and bigint takes 8 bytes (-2^63 to 2^63 - 1). They're 32-bit and 64-bit signed integers, respectively.
Please refer to the data type documentation.
Additionally, you should avoid float and real unless you really need them, as they're approximate number types. decimal or numeric are preferred for decimal values.
If you want the equivalent of an "INT(10)", then you should use decimal(10), which will support -9999999999 to 9999999999. Bear in mind that this will use more disk space (9 bytes) than a bigint (8 bytes), and may perform differently at very large scales.
You are mixing the concept of a human readable number (the common digits) with its digital representation (bits).
INT, which takes 4 bytes (32 bits), is not at its end at "9999"... There are 4,294,967,296 different values possible with an int...
From other comments I gather that you want to store phone numbers...
Take this as a general rule: store in numeric fields only values that you want to use in mathematical computations.
Would you ever think that a phone number + 2 or a phone number divided by 4 makes any sense?
Anyway: very often phone numbers are stored with some kind of delimiters.
Put this all together and you come to the conclusion: no DECIMAL(10), no INT, no BIGINT but VARCHAR(50) :-)
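The difference between byte width and digit count can be made concrete with a short Python sketch. The helper names here (`signed_range`, `distinct_values`, `fits`) are invented for illustration and only model two's-complement signed integers of a given size.

```python
def signed_range(num_bytes: int) -> tuple[int, int]:
    """Return (min, max) of a two's-complement signed integer of that size."""
    bits = num_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def distinct_values(num_bytes: int) -> int:
    """Return how many different bit patterns fit in that many bytes."""
    return 2 ** (num_bytes * 8)


def fits(value: int, num_bytes: int) -> bool:
    """Check whether a value fits in a signed integer of that size."""
    low, high = signed_range(num_bytes)
    return low <= value <= high


print(signed_range(4))          # int
print(signed_range(8))          # bigint
print(distinct_values(4))       # number of values in 4 bytes
print(fits(9999999999, 4))      # largest 10-digit number in an int?
print(fits(9999999999, 8))      # ...and in a bigint?
```

Under these assumptions the largest 10-digit number does not fit in 4 bytes but fits comfortably in 8.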
Which version of SQL Server are you using? I am using SQL Server 2014. It has a decimal data type, which does what you want. If it is available in your SQL Server, try it.
I'm currently developing an application that needs to store a 10 to 20 digit value in the database.
My question is: what data type should I be using? This value is used as a primary key, and therefore the performance of the DB is important for my application. In Java I use this value as a BigDecimal.
Quote from the manual:
numeric: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
http://www.postgresql.org/docs/current/static/datatype-numeric.html
131072 digits should cover your needs as far as I can tell.
Edit:
To answer the question about efficiency:
The first and most important question is: what kind of data is stored in that column and how do you use it?
If it's a number then use numeric.
If it's not a number use a varchar.
Never, ever store (real) numbers in character columns!
If you need to sort by that column, you won't be satisfied with what you get if you use a character datatype (e.g. 2 will be sorted after 10).
Coming back to the efficiency question: I assume it is mostly space efficiency you are concerned about. You can calculate the space requirements for your values yourself.
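The sorting problem is not specific to the database; any lexicographic string comparison shows it. A small Python illustration with made-up sample values:

```python
values = ["1", "2", "10", "20", "100"]

# Character comparison goes left to right, so "10" sorts before "2".
as_text = sorted(values)

# Comparing the same values as numbers gives the expected order.
as_numbers = sorted(values, key=int)

print(as_text)
print(as_numbers)
```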
The storage requirement for the numeric data type is documented as well:
The actual storage requirement is two bytes for each group of four decimal digits, plus five to eight bytes overhead
So for 20 digits this would be a maximum of 10 bytes plus the five to eight bytes overhead. So max. 18 bytes.
To store 20 digits in a varchar column you need 21 bytes.
So from a space "efficiency" point of view numeric is slightly better. But that should never influence your decision, because the choice of datatypes should be driven by the requirements of the column's content.
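The estimate above can be written as a small calculation. This Python sketch simply applies the quoted rule (two bytes per group of four decimal digits, plus five to eight bytes of overhead) and the 1-byte header assumed for the varchar case; the function name `numeric_storage_bounds` is hypothetical, and real on-disk sizes may differ slightly because of how digit groups are aligned.

```python
import math


def numeric_storage_bounds(digits: int) -> tuple[int, int]:
    """Return (min, max) estimated bytes for a numeric with that many digits."""
    payload = 2 * math.ceil(digits / 4)   # two bytes per group of four digits
    return payload + 5, payload + 8       # five to eight bytes of overhead


def varchar_storage(digits: int) -> int:
    """Estimated bytes for the same digits as text: one byte each plus header."""
    return digits + 1


print(numeric_storage_bounds(20))
print(varchar_storage(20))
```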
From a performance point of view I don't think there will be a big difference either.
Try BIGINT instead of NUMERIC. It should work for values up to 9223372036854775807 (19 digits); larger 20-digit values will not fit.
http://www.postgresql.org/docs/current/static/datatype-numeric.html
I understand there are multiple questions about this on SO, but I have yet to find a definitive answer of "yes, here's how..."
So here it is again: What are the possible ways to store an unsigned integer value (32-bit value or 32-bit bitmap) into a 4-byte field in SQL Server?
Here are ideas I have seen:
1) Use a -1*2^31 offset for all values
Disadvantages: need to perform math on the values before reading/writing/aggregating.
2) Use 4 tinyint fields
Disadvantages: need to concatenate values to perform any operations
3) Use binary(4)
Disadvantages: actually uses 4 + 2 bytes of space (Edit: varbinary(4) uses 4+2, binary(4) only uses 4)
Need to work in SqlBinary or cast to/from other types
IMO, you have the correct answers to storing 2^32 positive values in 4 bytes: either a standard int and you do the math or a binary(4) which, contrary to what you have said, will only consume 4 bytes of space. (Only varbinary will incur an extra 2 bytes of storage). A series of tinyint or smallint columns would be unjustifiably cumbersome IMO.
Of course there is another solution for storing 2^32 positive values but it takes eight bytes: a bigint with a check constraint. Given how cheap storage and memory is today, IMO, this is the simplest and cheapest solution given the programmatic hoops you will have to jump through with the other solutions, however clearly you have a reason for wanting to save the extra 4 bytes on each row.
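The two 4-byte options (the offset approach and binary(4)) can be sketched on the application side as follows. This is illustrative Python only: the function names are invented, and the big-endian byte order chosen for the binary(4) variant is an assumption, not something the question specifies.

```python
import struct

OFFSET = 2 ** 31  # shifts 0..2^32-1 onto the signed int range


def to_signed(unsigned_value: int) -> int:
    """Map an unsigned 32-bit value to the value stored in an int column."""
    if not 0 <= unsigned_value < 2 ** 32:
        raise ValueError("value does not fit in 32 unsigned bits")
    return unsigned_value - OFFSET


def to_unsigned(stored_value: int) -> int:
    """Recover the original unsigned value from the stored int."""
    return stored_value + OFFSET


def to_binary4(unsigned_value: int) -> bytes:
    """Encode an unsigned 32-bit value as 4 big-endian bytes for binary(4)."""
    return struct.pack(">I", unsigned_value)


def from_binary4(raw: bytes) -> int:
    """Decode 4 big-endian bytes back to the unsigned value."""
    return struct.unpack(">I", raw)[0]


print(to_signed(0), to_signed(2 ** 32 - 1))
print(to_unsigned(to_signed(3000000000)))
print(to_binary4(1), from_binary4(to_binary4(4294967295)))
```

Because the offset is a constant shift, the stored values keep the same relative order as the unsigned originals, so range comparisons still work after applying the same shift to the bounds.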