SQL Server - Integers being used as Dates - sql-server

I have been using SQL for a little while now - having to use Oracle and SQL Server for my work - however, I have just come across something I have not seen before.
After seeing this and doing a bit of research it is said that, in SQL server, the number 0 can be used as the base date, which is:
1900-01-01
so for example:
select DATEDIFF(yy, 0, '2017-12-31') would return 117, as 1900-01-01 substitutes the 0.
My first question is, why is this, considering this is not the minimum date value in SQL (which is the year 1753 I believe)?
My second question is that I came across another piece of SQL which uses the number -1 instead of 0.
After some testing, I can assume that it is referring to 1899-12-31. but i cannot be sure as I cannot find anything on this number being used as a date anywhere online. Am I correct?
Thank you for your time.

First, there is little reason to use 0 as a date. The primary reason (originally) was to truncate the time component:
select dateadd(day, datediff(day, 0, <datetime>), 0)
Note that this would work with any date value, but 0 tended to be used.
This is now done using select cast(<datetime> as date).
But to answer your question. SQL Server does support negative numbers as dates. The reason for using "-1" is for compatibility with Excel. In Excel the "0" date is "1900-01-00" which is effectively "1899-12-31".
I'm not sure why Microsoft software has two different zero values for dates. (Well, I do . . . Microsoft historically acquired software rather than write it itself.) It is a little confusing, but that is probably why "-1" is being used.

Correct, in SQL Server, the date 0 represents 01 January 1900.
0 = 19000101
1 = 19000102
2 = 19000103
...
43080 = 20171212
Unsurprisingly, therefore, numbers below zero work the same way:
-1 = 18991231
-2 = 18991230
...
-53690 = 17530101
As for why 0 is 1900, this likely dates back to prior to SQL Server, when it was called SyBase. Most likely, back then, 0 represented 19000101, and as SyBase become SQL Server, it was prudent to keep the the same process (as otherwise it could well break people's existing code).

Related

DT_DBTIMESTAMP2 having only 3 digits

I'm having (DT_DBTIMESTAMP2,7)GETDATE() in SSIS Derived Column Transformation and Table column with datetime2(7).
Even though I set 7 Digit Second Scale in both, but seems it comes only 3 digit.
For example, I expected like '2018-05-02 16:45:15.6192346' but it comes '2018-05-02 16:45:15.6190000'.
The reason why I need the millseconds, I'd like to sort out the latest record from any duplications using timestamp. I realized only 3 digit second scale is not enough for this pourpose.
Except for Derived Column Transformation and Table Columns, is there any requrired setting in SSIS package? Any advices would be appreciated.
GETDATE() returns a datetime, you should use SYSDATETIME() instead. See documentation.
edit
As noted by Larnu, you are probably using SSIS expression GETDATE, rather that the sql expression GETDATE as I assumed. The point is more-or-less the same though. GETDATE returns a DT_DBTIMESTAMP, where "The fractional seconds have a maximum scale of 3 digits." (Source).
Although this is almost the same as what HoneyBadger has said, I'm expanding a little, as the OP isn't using the GETDATE() expression in SQL Server. The value 2018-05-02 16:45:15.619 could never be returned by GETDATE() (Transact-SQL) as it's only accurate to 1/300th of a second (thus the final digit can only every be 0,3, and 7 (technically 0, 333333333~ and 666666666~, which is why the final digit is a 7, as it's rounded up)).
In SSIS the GETDATE() expression returns a datatype of DB_TIMESTAMP. According to the Documentation:
A timestamp structure that consists of year, month, day, hour, minute,
second, and fractional seconds. The fractional seconds have a maximum
scale of 3 digits.
Thus, the last 4 characters are lost. Unfortunately, I don't believe there is a function in SSIS that returns the the current date and time to the accuracy you require. Thus, if you need this high level, you'll likely need to use an expression in SQL Server that does, such as SYSDATETIME() that HoneyBadger recommended.

Difference in performance in SQL

I have a date column on SQL Server table called dd.
dd
---------------------------
10-01-2015 00:00:00.000
22-05-2015 10:22:32.521
27-05-2015 12:30:48.310
24-12-2014 09:51:11.728
27-05-2015 02:05:40.775
....
I need to retrieve all rows where dd value is from the last 24 hours.
I found 3 options for filtering to get the result needed:
1. `dd >= getdate() - 1`
2. `dd >= dateadd(day, -1, getdate())
3. `dateadd(day, 1, dd) >= getdate()
My questions are:
Are all the 3 options will retrieve all rows I need?
If so what is the difference between them?
dd >= getdate() - 1
This is something like a hack, but it works, but sometimes it can lead to errors(http://www.devx.com/dbzone/Article/34594/0/page/2).
dd >= dateadd(day, -1, getdate())
This is standard way of doing things.
dateadd(day, 1, dd) >= getdate()
This will also work but there is one NO. It will not use index if any index is created on that column. Because it is not a Search Argument(What makes a SQL statement sargable?). When you apply an expression to some column it becomes non SARG and will not use any index.
All 3 version will produce same result, but first is hack and in some cases will lead to bug. Third will not use index. So it is obvious that one should stick on option 2.
First two are exactly like Giorgi said, but on the third one your Index Seek will become Index Scan. SQL Server will still use that index but it is no longer able to jump to specific record but instead it has to scan it to find what it need.
For the purpose of demonstration I selected the table that had DATETIME column indexed and only selected that column to avoid any key lookups, and to keep plan simple.
Also take a look at reads on the table and estimated vs returned row count. As soon as you wrapped the column in a function it is not able to estimate correct number of rows which will cause large performance issues when queries become more complex.

SQL date values converted to integers

Ok, I can't understand this thing.
A customer of mine has a legacy Windows application (to produce invoices) which stores date values as integers.
The problem is that what is represented like '01.01.2002' (value type: date) is indeed stored in SQL Server 2000 as 731217 (column type: integer).
Is it an already known methodology to convert date values into integers (for example - I don't know - in order to make date difference calculations easier?)
By the way, I have to migrate those data into a new application, but for as much I googled about it I can't figure out the algorithm used to apply such conversion.
Can anybody bring some light?
It looks like the number of days since Jan 1st 0000 (although that year doesn't really exists).
Anyway, take a date as a reference like Jan 1st 2000 and look what integer you have for that date (something like 730121).
You then take the difference between the integer you have for a particular date and the one for your reference date and you that number of days to your reference date with the DATEADD function.
DATEADD(day, *difference (eg 731217 - 730121)*, *reference date in proper SQLServer format*)
You can adjust if you're off by a day a two.

Are these two pieces of SQL code the same?

I'm working with a bit of an older code base where conciseness is not found in many places.
One piece of code we are constantly using in the database is to determine if two dates are within the same program year. For instance, Program Year 2011 begins on July 1st, 2011 and ends on July 1st, 2012(or technically the day before)
The usual way I see that this problem is solved is by using this kind of code:
if Month(#EnrollmentDate)>=7 begin
set #StartDate='07/01/'+LTRIM(RTRIM(Year(#EnrollmentDate)))
set #EndDate='07/01/'+LTRIM(RTRIM(Year(#EnrollmentDate)+1))
end else begin
set #StartDate='07/01/'+LTRIM(RTRIM(Year(#EnrollmentDate)-1))
set #EndDate='07/01/'+LTRIM(RTRIM(Year(#EnrollmentDate)))
end
...
where (ENROLLMENTDATE >= #StartDate and ENROLLMENTDATE < #EndDate)
I recently happened to have to solve this problem and the instant thing that popped in my head was something much more concise:
where year(dateadd(mm,-6,ENROLLMENTDATE)) = year(dateadd(mm,-6,#EnrollmentDate))'
Before I go introducing new bugs into a system that "just works", I'd like to ask SO about this. Are these two pieces of code exactly the same? Will they always give the same output(assuming valid dates)?
the problem I see is that, depending on the optimizer, your solution (that looks better) may not use an index defined on ENROLLMENTDATE since you're doing operations on it, while the original solution would. If there are no indexes on that field, then I don't see an issue
An even easier way to handle this situation is to create a calendar table with a column for the program year. Then there is no logic at all, you just query the values and compare them:
if
(select ProgramYear from dbo.Calendar where BaseDate = #StartDate) =
(select ProgramYear from dbo.Calendar where BaseDate = #EndDate)
begin
-- do something
end
There are many posts on this site about creating calendar tables and using them for many different purposes. In my experience, using a table in this way is always clearer and more maintainable than creating formulas in code.
As it happens, December has 31 days, so no matter how many months you need to subtract, you will always be able to align the resulting date range with a whole year, and therefore your expression will always be true whenever the original one was, even in the general case for an enrolment year starting on other dates.
I once used a dialect of SQL with a range of date manipulation functions that made this slightly easier than string twiddling, something like this:
WHERE enrolmentdate >= #YearBeg(:enrolmentdate + 6 MONTHS) - 6 MONTHS
AND enrolmentdate < #YearBeg(:enrolmentdate + 6 MONTHS) + 6 MONTHS

SQL DataType - How to store a year?

I need to insert a year(eg:1988 ,1990 etc) in a database. When I used Date or Datetime
data type, it is showing errors. Which datatype should I use.
regular 4 byte INT is way too big, is a waste of space!
You don't say what database you're using, so I can't recommend a specific datatype. Everyone is saying "use integer", but most databases store integers as 4 bytes, which is way more than you need. You should use a two byte integer (smallint on SQL Server), which will conserve space.
If you need to store a year in the database, you would either want to use an Integer datatype (if you are dead set on only storing the year) or a DateTime datatype (which would involve storing a date that basically is 1/1/1990 00:00:00 in format).
Hey,you can Use year() datatype in MySQL
It is available in two-digit or four-digit format.
Note: Values allowed in four-digit format: 1901 to 2155. Values allowed in two-digit format: 70 to 69, representing years from 1970 to 2069
Storing a "Year" in MSSQL would ideally depend on what you are doing with it and what the meaning of that "year" would be to your application and database. That being said there are a few things to state here. There is no "DataType" for Year as of 2012 in MSSQL. I would lean toward using SMALLINT as it is only 2 bytes (saving you 2 of the 4 bytes that INT demands). Your limitation is that you can not have a year older than 32767 (as of SQL Server 2008R2). I really do not think SQL will be the database of choice ten thousand years from now let alone 32767. You may consider INT as the Year() function in MSSQL does convert the data type "DATE" to an INT. Like I said, it depends on where you are getting the data and where it is going, but SMALLINT should be just fine. INT would be overkill ... unless you have other reasons like the one I mentioned above or if the code requirements need it in INT form (e.g. integrating with existing application). Most likely SMALLINT should be just fine.
Just a year, nothing else ?
Why not use a simple integer ?
Use integer if all you need to store is the year. You can also use datetime if you think there will be date based calculations while querying this column
Storage may be only part of the issue. How will this value be used in a query?
Is it going to be compared with another date-time data types, or will all the associated rows also have numeric values?
How would you deal with a change to the requirements? How easily could you react to a request to replace the year with a smaller time slice? i.e. Now they want it broken down by quarters?
A numeric type can be easily used in a date time query by having a look-up table to join with containing things like the start and stop dates (1/1/X to 12/31/x), etc..
I don't think using an integer or any subtype of integer is a good choice. Sooner or later you will have to do other date like operations on it. Also in 2019 let's not worry too much about space. See what those saved 2 bytes costed us in 2000.
I suggest use a date of year + 0101 converted to a true date. Similarly if you need to store a month of a year store year + month + 01 as a true date.
If you have done that you will be able to properly do "date stuff" on it later on

Resources