I'm really confused about something called "accuracy" in SQL Server.
I found some information about the date and time data types in SQL Server, and I noticed something called accuracy.
Can someone please explain, in a simple way, what accuracy really means?
Massive thanks in advance.
It comes down to how many bits you have and perhaps how many you decide to use. One bit has 2 possible values, 2 bits have 4, and 3 bits have 8. In general, given n bits of storage, there are 2^n possible distinct values. A byte with 8 bits can have 256 values.
For a positive integer stored in a byte, the range is 0 to 255 because zero counts as one possible value. Including negative numbers does not change the number of values. For example, a signed byte has a range of -128 to 127. (Two's complement is convenient for hardware.)
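If you want to see the counting for yourself, here's a small C++ sketch (mine, purely illustrative):
#include <cstdio>
#include <cstdint>
#include <limits>

int main() {
    // n bits of storage give 2^n distinct values
    for (int n = 1; n <= 8; ++n)
        std::printf("%d bits -> %llu values\n", n, 1ULL << n);

    // one byte: the same 256 values, whether interpreted as unsigned or signed
    std::printf("uint8_t range: %d to %d\n",
                (int)std::numeric_limits<uint8_t>::min(),
                (int)std::numeric_limits<uint8_t>::max());
    std::printf("int8_t range: %d to %d\n",
                (int)std::numeric_limits<int8_t>::min(),
                (int)std::numeric_limits<int8_t>::max());
}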
Things are "exact" for integers, with the restriction on range. For something like time or distance, we use real numbers. A property of the real numbers is that there are infinitely many of them between any two real numbers. So if we pick a range, no finite number of bytes can represent every real number within it. We approximate by assigning some bits to an exponent and some to a mantissa.
So long story short, we are assigning a fixed number of bits to represent an infinite number of time values over a given range.
For a smalldatetime, the first 2 bytes are for the day and the last 2 bytes are for the time. For a datetime, each part is 4 bytes. The smalldatetime day part of 2 bytes allows 2^16 = 65,536 values, or 65,536 days. 65,536 days * (1 year / 365 days) = 179 years, which does match the year range of smalldatetime (1900 to 2079). For the datetime day part, 2^32 / 365 ≈ 11.7 million years, far more than the 1753 to 9999 range actually used. The storage is there, but engineers don't have to use it.
Now for the time part. The datetime can be converted to a float: the integer part will be the day, and the fractional part, the fraction of the day. (This does not work for datetime2.)
If we used every bit of the time part, then 2 bytes allows for 65,536 values and 4 bytes allows for 4,294,967,296 values for a single day. These are the best possible precisions for each.
24 hr * (3600 sec / 1 hr) * (1000 ms / 1 sec) / 4,294,967,296 ≈ 0.02 milliseconds
24 hr * (3600 sec / 1 hr) / 65,536 ≈ 1.3 seconds
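Those two lines of arithmetic as a quick C++ check (my own sketch, not anything Microsoft publishes):
#include <cstdio>

int main() {
    double values16 = 65536.0;        // 2^16, the 2-byte part
    double values32 = 4294967296.0;   // 2^32, the 4-byte part

    std::printf("smalldatetime day range: ~%.0f years\n", values16 / 365.0);          // ~179
    std::printf("datetime day range: ~%.1f million years\n", values32 / 365.0 / 1e6); // ~11.8

    double ms_per_day = 24.0 * 3600.0 * 1000.0;  // 86,400,000 ms in a day
    std::printf("best 4-byte time step: %.4f ms\n", ms_per_day / values32);           // ~0.02
    std::printf("best 2-byte time step: %.2f s\n", 24.0 * 3600.0 / values16);         // ~1.3
}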
These are not the precisions that Microsoft engineers decided to use; however, they are the best that can be stored in this number of bytes. The choice of a 1/300-second (about 3 ms) precision was likely related to hardware and API restrictions at the time. (A slow hardware clock might have been the only standard.) The 1-minute precision was likely a round-up: 2 bytes can't quite resolve a whole second, so round to minutes.
The end result is that if you measure the time 1 million times between now and a millisecond from now and store it in a datetime, you will see at most 1 or 2 distinct values stored. Stored as a smalldatetime, it's likewise at most 1 or 2 distinct values. If you see 2 values, it's because you crossed into the interval of the next value.
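You can simulate that quantization yourself. A C++ sketch, assuming datetime's 1/300-second step and smalldatetime's whole minutes:
#include <cstdio>
#include <cmath>
#include <set>

int main() {
    std::set<long> dt_buckets, sdt_buckets;
    double start = 12 * 3600.0 + 34 * 60.0 + 56.789;  // some time of day, in seconds

    for (int i = 0; i < 1000000; ++i) {
        double t = start + i * 1e-9;  // one million samples spread over 1 ms
        dt_buckets.insert((long)std::floor(t * 300.0));  // datetime: 1/300-s ticks
        sdt_buckets.insert((long)std::floor(t / 60.0));  // smalldatetime: minutes
    }
    std::printf("distinct datetime values: %zu\n", dt_buckets.size());       // 1 or 2
    std::printf("distinct smalldatetime values: %zu\n", sdt_buckets.size()); // 1 or 2
}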
Clear as mud?
RFC 4122: what does the phrase "uniqueness across space and time" mean? Please explain.
It basically means that, for all realistic purposes, the ID is statistically guaranteed to be unique.
While it is not technically impossible for a UUID to be duplicated, the formula for calculating the probability that r randomly generated UUIDs are all distinct is the birthday-problem formula, p = n! / (n^r * (n - r)!), where n is the number of possible UUID values (16^32 for a 32-hex-digit UUID) and r is the number of UUIDs you generate. Try plugging this into a calculator.
Your calculator will probably choke, because the factorials involved are astronomically large. For 32-character UUIDs, the value stays very close to 1 for any r less than about 10^16. That is 10 quadrillion.
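If you want a number without the factorials, the usual birthday-problem approximation works fine in doubles. A quick C++ sketch (figures are illustrative):
#include <cstdio>
#include <cmath>

int main() {
    double n = std::pow(16.0, 32);  // possible 32-hex-digit UUIDs, ~3.4e38
    double r = 1e16;                // 10 quadrillion generated UUIDs

    // P(at least one collision) ~ 1 - exp(-r(r-1) / 2n)
    double p = 1.0 - std::exp(-r * (r - 1.0) / (2.0 * n));
    std::printf("P(collision) ~ %g\n", p);  // ~1.5e-7, i.e. no-collision stays ~1
}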
You are not going to run out of UUIDs if you are Facebook. You are not going to run out of IDs if you are the US government. Both store a massive amount of data (space) and have been generating data for a long time (time).
"across space and time" describes how unlikely it is for two UUIDs to be the same.
128 bits of entropy is quite large; a collision would be like flipping a coin and getting 128 heads in a row, twice, or rolling a 6 on a six-sided die 50 times in a row, twice. To convert the bits of entropy into a count of events, divide by the log_2 of the number of outcomes per event:
128 UUID bits / log_2(2) => 128 / 1 => 128 coin flips
128 UUID bits / log_2(6) => 128 / 2.6 ~=> 50 dice rolls
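Or, the same conversion in two lines of C++ (just the arithmetic above):
#include <cstdio>
#include <cmath>

int main() {
    std::printf("coin flips: %.0f\n", 128.0 / std::log2(2.0));  // 128
    std::printf("dice rolls: %.1f\n", 128.0 / std::log2(6.0));  // ~49.5, call it 50
}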
This duplication is called a UUID collision, and it is possible; however, the chance is extremely small and not worth worrying about.
In my table I would like to define an amount field. This field is usually a number with only 2 decimal places.
e.g.
Amount can be up to 999,999,999.99 or 999,999,999
This field is multiplied by a rate.
The rate field can be 1.11 or 0.1234 (2 or 4 decimal places).
Then another field will hold the total amount, which rounds the result to 2 decimal places. If the total amount is 2.516, save the result as 2.51; if the total amount is 2.556, save it as 2.56.
What data types should I use for the following fields?
1. Amount
2. Rate
3. Total Amount
I have seen examples defined as FLOAT, DECIMAL, and MONEY.
I would always choose DECIMAL when given those options, especially when dealing with currency. Reasons:
DECIMAL can be more flexible than MONEY; the latter is fixed at 4 decimal places. Usually you want fewer, but sometimes more (penny trading, converting euros to yen, etc.).
FLOAT has a much wider range of approximation, which can lead to funny results. (I don't get that result for that specific calculation on current versions, BTW, but there are other examples, I'm sure, probably in this thread.)
MONEY can lose important precision. Try this:
SELECT [money] = $0.46 / $345.70,
[decimal] = 0.46 / 345.70;
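-- the money result comes back as 0.0013 (only 4 decimal places survive),
-- while the decimal result keeps far more scale: 0.00133063...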
Also see this thread (look beyond the accepted answer).
More information in the following posts:
Performance / Storage Comparisons : MONEY vs. DECIMAL (sorry for missing images)
Bad habits to kick : choosing the wrong data type
Most systems I have worked with, both existing and new, use DECIMAL with 4 decimal places.
I am working on a simulation of poker and now I have to rank hands effectively:
Every hand is a combination of 5 cards and is represented as an uint64_t.
Every bit from 0 (Ace of Spades) and 1 (Ace of Hearts) up to 51 (Two of Clubs) indicates whether the corresponding card is part of the hand (bit == 1) or not (bit == 0). The bits from 52 to 63 are always zero and don't hold any information.
I already know how I could theoretically generate a table so that every valid hand can be mapped to a rank (represented as a uint16_t) between 1 (2,3,4,5,7 - not all the same suit) and 7462 (Royal Flush), and all the others to rank zero.
So a naive lookup table (with the integer value of the card as index) would have an enormous size of
2 bytes * 2^52 >= 9.007 PB.
Most of this memory would be filled with zeros, because almost all uint64_t values from 0 to 2^52-1 are invalid hands and therefore have a rank equal to zero.
The valuable data occupies only
2 bytes * 52!/(47!*5!) = 5.198 MB.
What method can I use for the mapping so that I only have to save the ranks of the valid hands plus some overhead (max. 100 MB) and still don't have to do an expensive search...
It should be as fast as possible!
If you have any other ideas, you're welcome! ;)
You need only a table of size 13^5 * 2, with the extra bit of information indicating whether all the cards are of the same suit. If for some reason 'hearts' outranks 'diamonds', you still need at most a table of size 13^6, with the last piece of information encoded as '0 = no flush, 1 = all spades, 2 = all hearts, etc.'.
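A sketch of how such a table key could be computed in C++ (the function and names are mine, not from any standard evaluator):
#include <cstdint>
#include <algorithm>

// ranks: five values in 0..12; flush: true when all five cards share a suit.
// Returns an index into a 13^5 * 2 = 742,586-entry rank table.
uint32_t table_key(uint8_t ranks[5], bool flush) {
    std::sort(ranks, ranks + 5);   // canonical order, so reordered hands map alike
    uint32_t key = 0;
    for (int i = 0; i < 5; ++i)
        key = key * 13 + ranks[i]; // base-13 digits, key < 13^5 = 371,293
    return key * 2 + (flush ? 1 : 0);
}
With uint16_t entries, that table comes in under 1.5 MB.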
A hash table is probably also a good and fast approach -- Creating a table from nCk(52,5) combinations doesn't take much time (compared to all possible hands). One would, however, need to store 65 bits of information for each entry to store both the key (52 bits) and the rank (13 bits).
To speed up evaluation of the hand, one first rules out illegal combinations from the mask:
if (popcount(mask) != 5) the mask is not a valid hand. Afterwards one can use enough bits from e.g. crc32(mask), which has instruction-level support on the i7 architecture at least.
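Roughly, that flow could look like this in C++ (I'm using std::unordered_map as a stand-in for a crc32-addressed table, and __builtin_popcountll is GCC/Clang-specific):
#include <cstdint>
#include <unordered_map>

// Returns 0 for invalid masks, otherwise the hand's rank (1..7462).
// The table is built once from all C(52,5) = 2,598,960 valid masks.
uint16_t rank_of(uint64_t mask,
                 const std::unordered_map<uint64_t, uint16_t>& table) {
    if (__builtin_popcountll(mask) != 5)  // not exactly 5 cards set
        return 0;
    auto it = table.find(mask);
    return it == table.end() ? 0 : it->second;
}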
If I understand your scheme correctly, you only need to know that the Hamming weight of a particular hand is exactly 5 for it to be a valid hand. See Calculating Hamming Weight in O(1) for information on how to calculate it.
From there, it seems you could probably work out the rest on your own. Personally, I'd want to store the result in some persistent memory (if it's available on your platform of choice) so that subsequent runs are quicker since they don't need to generate the index table.
This is a good source
Cactus Kev's
For a hand you can take advantage of the fact that there are only 4 suits and 13 ranks:
4 bits for the rank (0-12) and 2 bits for the suit
6 bits * 5 cards is just 30 bits
Call it 4 bytes
There are only 2,598,960 hands
Total size: a little under 10 MB
A simple implementation that comes to mind would be to change your scheme to a 5-digit number in base 52. The resulting table to hold all of these values would still be larger than necessary, but very simple to implement and it would easily fit into RAM on modern computers.
edit: You could also cut down even more by only storing the rank of each card and an additional flag (e.g., lowest bit) to specify if all cards are of the same suit (i.e., flush is possible). This would then be in base 13 + one bit for the ranking representation. You would presumably then need to store the specific suits of the cards separately to reconstruct the exact hand for display and such.
I would represent your hand in a different way:
There are only 4 suits = 2 bits and only 13 ranks = 4 bits, for a total of 6 bits * 5 = 30 - so we fit into a 32-bit int. We can also force this to always be sorted as per your ordering:
[suit 0][suit 1][suit 2][suit 3][suit 4][value 0][value 1][value 2][value 3][value 4]
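As a sketch, the packing could look like this in C++ (field order as above; the names are mine):
#include <cstdint>

// suit[i] in 0..3, value[i] in 0..12; cards assumed already sorted per your ordering.
uint32_t pack_hand(const uint8_t suit[5], const uint8_t value[5]) {
    uint32_t h = 0;
    for (int i = 0; i < 5; ++i) h = (h << 2) | (suit[i] & 0x3);  // 10 bits of suits
    for (int i = 0; i < 5; ++i) h = (h << 4) | (value[i] & 0xF); // 20 bits of values
    return h;                                                    // 30 bits used
}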
Then I would use a separate hash for:
consecutive values (very small) [mask off the suits]
1 or more multiples (pair, 2 pair, full house) [mask off the suits]
suits that are all the same (very small) [mask off the values]
Then use the 3 hashes to calculate your rankings
At 5 MB you will likely have enough cache misses that a bit of math and three small lookups will be faster.
I was browsing msdn in regards to data type sizes in T-SQL and noticed something I'm a bit confused on.
According to http://msdn.microsoft.com/en-us/library/ms186724.aspx, datetime uses 8 bytes and stores dates from years 1753-9999 with a time precision of hh:mm:ss[.nnn]. Now if you look at date and time separately, time uses 3-5 bytes to store hh:mm:ss[.nnnnnnn] and date uses 3 bytes to store years 0001-9999.
What confuses me is that using date and time separately gives you a wider range of years and time with four more digits of precision than datetime, yet they both use 8 bytes? Why does datetime have a smaller range and less precision yet uses the same size to store itself?
The datetime data type predates the separate date and time data types. Datetime uses 8 bytes, stored as two 4-byte integers. The first integer stores the date as a day offset from 01/01/1900, which is stored as 0: any day before 1900 is stored as a negative number of days before it, and any date after as a positive number of days after it.
The reason the date only starts at 1753 is that this is the first full year after the British adoption of the Gregorian calendar; for any date before that, you need to know the country to interpret it. This was a decision made by the original Sybase developers, from whom SQL Server is descended.
The second integer stores the time as the number of ticks since midnight. A tick is 1/300 of a second.
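You can see how the two integers decode with a little C++ (the two values here are made up, just to show the arithmetic):
#include <cstdio>
#include <cstdint>

int main() {
    int32_t days  = 45000;      // first integer: days relative to 01/01/1900
    int32_t ticks = 10800000;   // second integer: 1/300-s ticks since midnight

    double seconds = ticks / 300.0;
    int h = (int)(seconds / 3600);
    int m = (int)(seconds / 60) % 60;
    double s = seconds - h * 3600 - m * 60;
    std::printf("%d days after 1900-01-01, time %02d:%02d:%06.3f\n", days, h, m, s);
}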
Good examples and info can be found here:
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/how-are-dates-stored-in-sql-server
and http://karaszi.com/the-ultimate-guide-to-the-datetime-datatypes
I'm currently developing an application that needs to store a 10- to 20-digit value in the database.
My question is: what data type should I be using? This value is used as a primary key, and therefore the performance of the DB is important for my application. In Java I handle this value as a BigDecimal.
Quote from the manual:
numeric: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
http://www.postgresql.org/docs/current/static/datatype-numeric.html
131072 digits should cover your needs as far as I can tell.
Edit:
To answer the question about efficiency:
The first and most important question is: what kind of data is stored in that column and how do you use it?
If it's a number then use numeric.
If it's not a number use a varchar.
Never, ever store (real) numbers in character columns!
If you need to sort by that column, you won't be satisfied with what you get if you use a character data type (e.g. '2' will be sorted after '10').
Coming back to the efficiency question: I assume it is mostly space efficiency you are concerned about. You can calculate the space requirements for your values yourself.
The storage requirement for the numeric data type is documented as well:
The actual storage requirement is two bytes for each group of four decimal digits, plus five to eight bytes overhead
So for 20 digits this would be a maximum of 10 bytes plus the five to eight bytes overhead. So max. 18 bytes.
To store 20 digits in a varchar column you need 21 bytes.
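The same space math, as a trivial C++ check (digit counts only, overhead as documented above):
#include <cstdio>

int main() {
    int digits = 20;
    int numeric_bytes = ((digits + 3) / 4) * 2;  // 2 bytes per group of 4 digits -> 10
    std::printf("numeric: %d bytes + 5..8 overhead (max 18)\n", numeric_bytes);
    std::printf("varchar: %d bytes + 1 overhead = %d\n", digits, digits + 1);
}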
So from a space "efficiency" point of view numeric is slightly better. But that should never influence your decision, because the choice of datatypes should be driven by the requirements of the column's content.
From a performance point of view I don't think there will be a big difference either.
Try BIGINT instead of NUMERIC. It should work, but note that bigint tops out at 9,223,372,036,854,775,807 (19 digits), so a full 20-digit value will not fit.
http://www.postgresql.org/docs/current/static/datatype-numeric.html