Database: Storing Dates as Numeric Values


I'm considering storing some date values as integers, e.g. 201003150900.
Apart from losing any time zone information, is there anything else I should be concerned about with this solution?
Any queries using this column would be simple 'before or after' lookups,
e.g. WHERE datefield < 201103000000 (before March next year).
The app currently uses SQL Server 2005.
Any pointers to pitfalls appreciated.

Using a proper datetime datatype will give you more efficient storage (smalldatetime consumes 4 bytes, whereas a 12-digit value such as 201003150900 exceeds the range of a 32-bit INT and would need an 8-byte BIGINT) and more efficient indexing, and it will give you date semantics that are easier to develop against. You'd need a compelling argument not to use them.
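To illustrate those pitfalls, here is a minimal Python sketch (not from the original thread; `to_packed` and `from_packed` are hypothetical helper names). It shows that a yyyymmddhhmm value does not fit in a 32-bit INT, that simple before/after comparisons still work, and that arithmetic on the packed integer can produce times that do not exist.

```python
from datetime import datetime, timedelta

INT32_MAX = 2**31 - 1  # upper bound of a SQL Server INT

def to_packed(dt):
    """Hypothetical helper: encode a datetime as a yyyymmddhhmm integer."""
    return int(dt.strftime("%Y%m%d%H%M"))

def from_packed(value):
    """Hypothetical helper: decode yyyymmddhhmm; raises ValueError if impossible."""
    return datetime.strptime(str(value), "%Y%m%d%H%M")

packed = to_packed(datetime(2010, 3, 15, 9, 0))
print(packed)                  # 201003150900
print(packed > INT32_MAX)      # True: the column would have to be BIGINT
print(packed < 201103000000)   # True: before/after comparisons still work

late = to_packed(datetime(2010, 3, 15, 23, 0))
naive = late + 100             # "add one hour" with integer arithmetic
print(naive)                   # 201003152400, an hour that does not exist
correct = to_packed(from_packed(late) + timedelta(hours=1))
print(correct)                 # 201003160000

try:
    from_packed(naive)
    naive_is_invalid = False
except ValueError:
    naive_is_invalid = True
print(naive_is_invalid)        # True
```

With a real datetime column, the database does this calendar arithmetic for you.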

Why wouldn't you use proper UNIX timestamps? They're just ints too, but they're not nearly as wide as 201103000000.

Just use the DATETIME or SMALLDATETIME datatypes; they are more flexible.

The only reason to do it the way you suggest is so that you have a time dimension member name for a business intelligence tool. If that is what you intend to use it for, then it makes sense.
Otherwise, use the built-in time types as others have pointed out.

Related

Storing non-Gregorian datetimes in a database for performance

I want to store non-Gregorian datetime values in my database (PostgreSQL or SQL Server).
I have two ways to do this:
1- Store a standard datetime in the database and then convert it to my non-Gregorian date system in my application.
2- Store the datetime as varchar in two different fields (a date field and a time field), in YYYY-MM-DD and HH:MM:SS format in my non-Gregorian date system.
Which way is better for performance, given that tables may contain thousands or millions of rows and I sometimes need to order them?

Storing dates as strings will generally be very inefficient, both in storage and in processing. In Postgres, you have the possibility of defining your own type, and overloading the existing date functions and operators, but this is likely to be a lot of work (unless you find that someone did it already).
A quick search turned up this old mailing list thread, where one suggestion is to build input and output functions around the existing date types. This would let you make use of some existing functions (for instance, I'm guessing that intervals such as '1 day' and '1 year' have the same meaning; forgive my ignorance if not).
Another option would be to use integers or floats for storage. For example, a Unix timestamp is a number of seconds since a fixed point in time, so it has no built-in assumption about calendars. Unlike a string representation, it can be stored and indexed efficiently, and it has useful operations defined, such as sorting and addition. Internally, all dates are stored using some variant of this approach; a custom type would simply keep these details more conveniently hidden.
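A minimal sketch of the integer-storage option, assuming SQLite (through Python's sqlite3) as a stand-in for PostgreSQL or SQL Server. Ordering becomes a plain integer sort that an index can serve; the `display` function is a hypothetical hook where an application would convert to its own calendar, and it only produces an ISO string here.

```python
import sqlite3
from datetime import datetime, timezone

def to_unix(dt):
    """Seconds since 1970-01-01 UTC; carries no calendar assumptions."""
    return int(dt.timestamp())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, happened_at INTEGER NOT NULL)")
conn.execute("CREATE INDEX ix_events_happened_at ON events (happened_at)")
conn.executemany(
    "INSERT INTO events (id, happened_at) VALUES (?, ?)",
    [
        (1, to_unix(datetime(2011, 3, 1, 9, 0, tzinfo=timezone.utc))),
        (2, to_unix(datetime(2010, 3, 15, 9, 0, tzinfo=timezone.utc))),
        (3, to_unix(datetime(2010, 12, 31, 23, 59, tzinfo=timezone.utc))),
    ],
)

# Ordering is an integer sort on the indexed column.
ordered_ids = [row[0] for row in conn.execute("SELECT id FROM events ORDER BY happened_at")]
print(ordered_ids)  # [2, 3, 1]

def display(ts):
    # Hypothetical hook: a real application would render its own
    # non-Gregorian calendar here; ISO 8601 is only a placeholder.
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
```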

T-SQL implicit vs. explicit casting for DATE datatype

When converting a DATETIME value to a DATE value in T-SQL, should the CAST be performed explicitly or not? For example:
DECLARE @Today_NoCast DATE = GETDATE();
DECLARE @Today_Cast DATE = CAST(GETDATE() AS DATE);
SELECT [GETDATE()] = GETDATE();
SELECT
    [@Today_NoCast] = @Today_NoCast,
    [@Today_Cast] = @Today_Cast;
Both @Today_NoCast and @Today_Cast give the desired result. But what is considered "best practice"?

As far as SQL Server is concerned there is no difference in this case - both the implicit (no CAST) and explicit conversion will produce the same result.
However, from a code maintenance point of view there is a slight difference. The implicit conversion is just another line of code, probably (based on my experience, no reflection on you!) without any comments or documentation or anything else. The explicit conversion in the same place also says "the person who wrote this code knows that GETDATE returns a DATETIME but they don't want the TIME part so are deliberately just saving it into a DATE".
It's not making one line more maintainable that counts, it's taking the approach to all of your work that makes a difference in the long run.
Rhys

When the target datatype is DATE, you don't have to do an explicit conversion,
so avoid using one.

The example you gave is what is called an "implicit conversion". In other words, SQL Server performs the conversion for you according to its built-in rules; the allowed conversions are shown in a data type conversion chart on MSDN. As SQL Server will do it implicitly, there is no need for you to do it explicitly. You are just adding noise.
Personally, I like to do the minimum amount of date manipulation possible. Otherwise what you tend to find is people casting dates all over the place, which becomes a code smell.
If you need the time portion and are consequently storing the time, you can format away the time in the UI if required.
If you find you need queries whose WHERE clauses filter on a date that carries a time, consider separating the date and the time into separate fields so that the predicates stay SARGable (able to use an index seek). Otherwise developers end up wrapping functions around your indexed columns in the WHERE clause, which prevents the optimizer from using the index and therefore reduces performance.
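The SARGability point can be sketched with SQLite's EXPLAIN QUERY PLAN via Python's sqlite3. This is a swapped-in engine, not SQL Server, whose plans use different wording (Index Scan versus Index Seek); the table and index names are hypothetical. Wrapping the indexed column in `date()` forces a scan, while a half-open range on the bare column allows an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
conn.execute("CREATE INDEX ix_orders_created_at ON orders (created_at)")

def plan(sql):
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A function wrapped around the indexed column: not SARGable, so a scan.
wrapped = plan("SELECT id FROM orders WHERE date(created_at) = '2011-03-01'")

# A half-open range on the bare column: SARGable, so an index search.
ranged = plan(
    "SELECT id FROM orders "
    "WHERE created_at >= '2011-03-01' AND created_at < '2011-03-02'"
)

print(wrapped)  # a SCAN plan; exact wording varies by SQLite version
print(ranged)   # a SEARCH plan using ix_orders_created_at
```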

How can I store data in a vertical table without losing precision?

I would like to have a vertical table (I know, I know... it's pretty much unavoidable) that is able to store an identifying field, the value, and the original type. Obviously the value field needs to be something generic or I won't be able to store various different types of data in it (varchar, text, int, decimal, bit, etc).
What suggestions do you have for this type of setup that would allow me to not lose precision on number data types while offering flexibility and ease of use?

Your best bet is definitely a VARCHAR column, given that there are pretty standard mechanisms to convert most types to and from a string.
Please bear in mind that every time someone makes a table like this, a kitten dies. Seriously, find a better way.
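A minimal sketch of the VARCHAR-plus-type approach, assuming SQLite through Python's sqlite3 and hypothetical table, column, and type names. Because each value is stored as its exact text form alongside its original type, a high-precision decimal survives the round trip digit for digit, which a float conversion would not.

```python
import sqlite3
from decimal import Decimal

# Hypothetical decoders, keyed by the stored type name.
DECODERS = {"int": int, "decimal": Decimal, "bit": lambda s: s == "1", "varchar": str}

def encode(value):
    """Return (type_name, text) for storage in a text value column."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bit", "1" if value else "0"
    if isinstance(value, int):
        return "int", str(value)
    if isinstance(value, Decimal):
        return "decimal", str(value)
    if isinstance(value, str):
        return "varchar", value
    raise TypeError(f"unsupported type: {type(value).__name__}")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute_values ("
    " entity_id INTEGER, name TEXT, value_type TEXT NOT NULL, value TEXT,"
    " PRIMARY KEY (entity_id, name))"
)
original = {
    "quantity": 42,
    "price": Decimal("12345678901234567890.123456789"),
    "active": True,
    "label": "widget",
}
for name, value in original.items():
    value_type, text = encode(value)
    conn.execute("INSERT INTO attribute_values VALUES (1, ?, ?, ?)", (name, value_type, text))

restored = {
    name: DECODERS[value_type](text)
    for name, value_type, text in conn.execute(
        "SELECT name, value_type, value FROM attribute_values WHERE entity_id = 1")
}
print(restored == original)                                      # True
print(Decimal(float(original["price"])) == original["price"])  # False: float drops digits
```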

You're going to be storing the data vertically, but for the most part, aren't you going to end up taking certain types of values and breaking them out horizontally anyway? I've seen systems with one generic column per type: varchar, int, float, date.
It will take some more coding to determine which column to use for which field; otherwise, you'll just have to deal with converting to and from another format.

How to Handle Unknown Data Type in one Table

I have a situation where I need to store a general piece of data (could be an int, float, or string) in my database, but I don't know ahead of time which it will be. I need a table (or less preferably tables) to store this unknown typed data.
What I think I am going to do is have a column for each data type, only use one for each record and leave the others NULL. This requires some logic above the database, but this is not too much of a problem because I will be representing these records in models anyway.
Basically, is there a best practice way to do something like this? I have not come up with anything that is less of a hack than this, but it seems like this is a somewhat common problem. Thanks in advance.
EDIT: Also, is this considered 3NF?

You could easily do that if you used SQLite as a database backend:
Any column in a version 3 database, except an INTEGER PRIMARY KEY column, may be used to store any type of value.
For other RDBMS systems, I would go with Philip's solution.
Note that in my line of software (business applications), I cannot think of any situation where this kind of requirement would be needed (a value with an unknown datatype). Unless the domain model was flawed, of course... I can imagine that other lines of software may incur different practices, but I suggest that you consider rethinking your overall design.
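SQLite's dynamic typing can be seen with Python's built-in sqlite3 module. In this small sketch (hypothetical table and values), one column with no declared type holds an integer, a real, and a text value, and `typeof()` reports each value's own storage class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The value column has no declared type, so stored values keep their own type.
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value)")
conn.executemany(
    "INSERT INTO settings (name, value) VALUES (?, ?)",
    [("retries", 3), ("ratio", 0.75), ("greeting", "hello")],
)
stored = conn.execute(
    "SELECT name, value, typeof(value) FROM settings ORDER BY name"
).fetchall()
for row in stored:
    print(row)
# ('greeting', 'hello', 'text')
# ('ratio', 0.75, 'real')
# ('retries', 3, 'integer')
```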

If your application can reliably convert datatypes, you might consider a single column solution based on a variable-length binary column, with a second column to track original data type. (I did a very small routine based on this once before, and it worked well enough.) Testing would show if conversion is more efficiently handled on the application or database side.
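A minimal sketch of the binary-column idea, assuming SQLite's BLOB (through Python's sqlite3) in place of a variable-length binary column, and a hypothetical encoding: 8-byte big-endian integers and doubles via `struct`, UTF-8 for strings. The conversion happens on the application side, with a second column recording the original type.

```python
import sqlite3
import struct

# Hypothetical encoding choices for each type tag.
FORMATS = {"int": ">q", "float": ">d"}

def to_blob(value):
    """Return (type_tag, bytes) for a single binary value column."""
    if isinstance(value, int):
        return "int", struct.pack(FORMATS["int"], value)
    if isinstance(value, float):
        return "float", struct.pack(FORMATS["float"], value)
    if isinstance(value, str):
        return "string", value.encode("utf-8")
    raise TypeError(f"unsupported type: {type(value).__name__}")

def from_blob(type_tag, data):
    """Decode bytes back into a Python value using the stored type tag."""
    if type_tag == "string":
        return data.decode("utf-8")
    return struct.unpack(FORMATS[type_tag], data)[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value_type TEXT NOT NULL, value BLOB)")
for value in (7, 2.5, "seven"):
    conn.execute("INSERT INTO items (value_type, value) VALUES (?, ?)", to_blob(value))

decoded = [from_blob(t, d) for t, d in conn.execute("SELECT value_type, value FROM items ORDER BY id")]
print(decoded)  # [7, 2.5, 'seven']
```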

If I were to do this I would choose either your method, or I would cast everything to string and use only one column. Of course there would be another column with the type (which would probably be useful for the first method too).
For faster code I would probably go with your method.
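The one-column-per-type method from the question could look like the following sketch, assuming SQLite via Python's sqlite3 and hypothetical table and column names. A CHECK constraint enforces that exactly one column is populated, and COALESCE reads back whichever one is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE data_values (
        id INTEGER PRIMARY KEY,
        int_value INTEGER,
        float_value REAL,
        string_value TEXT,
        -- exactly one of the three columns must be populated
        CHECK ((int_value IS NOT NULL) + (float_value IS NOT NULL)
               + (string_value IS NOT NULL) = 1)
    )
""")

def insert(value):
    # The column name comes from this fixed mapping, never from user input.
    column = {int: "int_value", float: "float_value", str: "string_value"}[type(value)]
    conn.execute(f"INSERT INTO data_values ({column}) VALUES (?)", (value,))

for v in (10, 3.14, "ten"):
    insert(v)

# COALESCE returns whichever typed column is non-NULL.
values = [row[0] for row in conn.execute(
    "SELECT COALESCE(int_value, float_value, string_value) FROM data_values ORDER BY id")]
print(values)  # [10, 3.14, 'ten']

try:
    conn.execute("INSERT INTO data_values (int_value, string_value) VALUES (1, 'one')")
    rejected = False
except sqlite3.IntegrityError:  # the CHECK constraint fails
    rejected = True
```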

What datatype should be used for storing phone numbers in SQL Server 2005?

I need to store phone numbers in a table. Which datatype should I use?
Wait. Please read on before you hit reply.
This field needs to be indexed heavily, as Sales Reps can use it for searching (including wildcard searches).
As of now, we are expecting phone numbers to come in a number of formats (from an XML file). Do I have to write a parser to convert them to a uniform format? There could be millions of rows (with duplicates), and I don't want to tie up server resources (in activities like excessive preprocessing) every time some source data comes through.
Any suggestions are welcome.
Update: I have no control over the source data; only the structure of the XML file is standard. I would like to keep the XML parsing to a minimum.
Once the data is in the database, retrieval should be quick. One crazy suggestion going around here is that it should even work with an Ajax AutoComplete feature (so Sales Reps can see the matching ones immediately). OMG!!

Does this include:
International numbers?
Extensions?
Other information besides the actual number (like "ask for bobby")?
If all of these are no, I would use a 10-char field and strip out all non-numeric data. If the first is yes and the other two are no, I'd use two varchar(50) fields: one for the original input, and one with all non-numeric data stripped, used for indexing. If 2 or 3 are yes, I think I'd use two fields and some kind of crazy parser to determine what is extension or other data and deal with it appropriately. Of course, you could avoid the second column by doing something with the index where it strips out the extra characters when creating the index, but I'd just make a second column and probably do the stripping of characters with a trigger.
Update: to address the AJAX issue, it may not be as bad as you think. If this is realistically the main way anything is done to the table, store only the digits in a secondary column as I said, and then make the index for that column the clustered one.
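A minimal sketch of the two-column layout (original input plus digits only), assuming SQLite via Python's sqlite3 in place of SQL Server, with stripping done in application code rather than the trigger suggested above. Table, index, and function names are hypothetical, and the phone numbers are made-up samples. The `autocomplete` function shows the kind of prefix lookup an AJAX AutoComplete call might issue.

```python
import re
import sqlite3

def digits_only(raw):
    """Strip every non-digit character, e.g. '(212) 555-1212' -> '2125551212'."""
    return re.sub(r"\D", "", raw)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        phone_original VARCHAR(50) NOT NULL,  -- exactly as received
        phone_digits VARCHAR(50) NOT NULL     -- stripped, used for searching
    )
""")
conn.execute("CREATE INDEX ix_contacts_phone_digits ON contacts (phone_digits)")

for raw in ("(212) 555-1212", "212.555.9876", "+1 415 555 0000"):
    conn.execute(
        "INSERT INTO contacts (phone_original, phone_digits) VALUES (?, ?)",
        (raw, digits_only(raw)),
    )

def autocomplete(prefix):
    """Prefix search on the digits column; returns the original formatting."""
    pattern = digits_only(prefix) + "%"
    return [row[0] for row in conn.execute(
        "SELECT phone_original FROM contacts WHERE phone_digits LIKE ? ORDER BY phone_digits",
        (pattern,))]

print(autocomplete("212-555"))  # ['(212) 555-1212', '212.555.9876']
```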

We use varchar(15) and certainly index on that field.
The reason is that the international standard (ITU-T E.164) allows up to 15 digits:
Wikipedia - Telephone Number Formats
If you do support international numbers, I recommend storing the World Zone Code or Country Code separately so that you can filter queries by it. That way you do not find yourself parsing and checking the length of your phone number fields to limit the returned calls to the USA, for example.
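A minimal sketch of storing the country code separately, assuming SQLite via Python's sqlite3 and hypothetical table and function names; the numbers are made-up samples. Filtering by country becomes an indexed equality test, and the 15-digit E.164 limit is checked across both parts.

```python
import sqlite3

E164_MAX_DIGITS = 15  # country code plus national number

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE phone_numbers (
        id INTEGER PRIMARY KEY,
        country_code VARCHAR(3) NOT NULL,
        national_number VARCHAR(15) NOT NULL
    )
""")
conn.execute("CREATE INDEX ix_phone_numbers_country ON phone_numbers (country_code)")

def fits_e164(country_code, national_number):
    """True if the combined number stays within the E.164 digit limit."""
    return len(country_code) + len(national_number) <= E164_MAX_DIGITS

def add(country_code, national_number):
    if not fits_e164(country_code, national_number):
        raise ValueError("longer than 15 digits")
    conn.execute(
        "INSERT INTO phone_numbers (country_code, national_number) VALUES (?, ?)",
        (country_code, national_number),
    )

# Made-up sample values.
add("1", "2125551212")
add("44", "2079460000")

# Limiting results to the USA is a simple filter, with no length parsing.
us_numbers = [row[0] for row in conn.execute(
    "SELECT national_number FROM phone_numbers WHERE country_code = ?", ("1",))]
print(us_numbers)  # ['2125551212']
```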

Use CHAR(10) if you are storing US phone numbers only. Remove everything but the digits.

I'm probably missing the obvious here, but wouldn't a varchar just long enough for your longest expected phone number work well?
If I am missing something obvious, I'd love it if someone would point it out...

I would use a varchar(22), which is big enough to hold a North American phone number with an extension. You would want to strip out all the nasty '(', ')' and '-' characters, or just parse them all into one uniform format.
Alex

nvarchar with preprocessing to standardize them as much as possible. You'll probably want to extract extensions and store them in another field.
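Extracting an extension into its own field might be sketched as below. This is a hypothetical regular expression that assumes extensions appear at the end of the string after "x", "ext", "ext." or "extension"; real source data will likely need more cases.

```python
import re

# Hypothetical pattern: a trailing extension introduced by x / ext / ext. / extension.
EXTENSION = re.compile(r"\s*(?:x|ext\.?|extension)\s*(\d+)\s*$", re.IGNORECASE)

def split_extension(raw):
    """Return (number_digits, extension_or_None) for a raw phone string."""
    match = EXTENSION.search(raw)
    extension = match.group(1) if match else None
    number = raw[:match.start()] if match else raw
    return re.sub(r"\D", "", number), extension

print(split_extension("(212) 555-1212 ext. 345"))  # ('2125551212', '345')
print(split_extension("212-555-1212x45"))          # ('2125551212', '45')
print(split_extension("212 555 1212"))             # ('2125551212', None)
```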

SQL Server 2005 is pretty well optimized for substring queries on text in indexed varchar fields. SQL Server 2005 introduced string summary statistics for indexed string columns, which give the optimizer much better estimates for LIKE (substring) searches.

Using varchar is pretty inefficient. Use the money type, create a user-defined type "phonenumber" from it, and create a rule to only allow positive numbers.
Money has four decimal places, so you can even store a 4-digit extension, and it is big enough for international numbers while taking only 8 bytes of storage (if you prefer to declare the precision explicitly, DECIMAL(19,4) gives the same scale in 9 bytes). Also, indexes on it are speedy.

Normalise the data then store as a varchar. Normalising could be tricky.
That should be a one-time hit. Then as a new record comes in, you're comparing it to normalised data. Should be very fast.

Since you need to accommodate many different phone number formats (and probably include things like extensions etc.) it may make the most sense to just treat it as you would any other varchar. If you could control the input, you could take a number of approaches to make the data more useful, but it doesn't sound that way.
Once you decide to simply treat it as any other string, you can focus on overcoming the inevitable issues regarding bad data, mysterious phone number formatting and whatever else will pop up. In my opinion, the challenge will be in building a good search strategy for the data, not in how you store it. It's always a difficult task having to deal with a large pile of data that you had no control over collecting.

Use SSIS to extract and process the information. That way you will have the processing of the XML files separated from SQL Server. You can also do the SSIS transformations on a separate server if needed. Store the phone numbers in a standard format using VARCHAR. NVARCHAR would be unnecessary since we are talking about numbers and maybe a couple of other chars, like '+', ' ', '(', ')' and '-'.

Use a varchar field with a length restriction.

It is fairly common to use an "x" or "ext" to indicate extensions, so allow 15 characters (for full international support) plus 3 (for "ext") plus 4 (for the extension itself) giving a total of 22 characters. That should keep you safe.
Alternatively, normalise on input so any "ext" gets translated to "x", giving a maximum of 20.

It is always better to have a separate table for multi-valued attributes such as phone numbers.
As you have no control over the source data, you can parse the data from the XML file, convert it into a proper format so that there are no issues with the formats of a particular country, and store it in a separate table so that both indexing and retrieval are efficient.
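A minimal sketch of the separate-table design, assuming SQLite via Python's sqlite3 and hypothetical table names and sample data. Each contact can have any number of phone rows, and the digits column is indexed for lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE contact_phones (
        contact_id INTEGER NOT NULL REFERENCES contacts (id),
        phone_type TEXT NOT NULL,   -- e.g. 'work', 'mobile'
        phone_digits TEXT NOT NULL,
        PRIMARY KEY (contact_id, phone_digits)
    );
    CREATE INDEX ix_contact_phones_digits ON contact_phones (phone_digits);
""")
conn.execute("INSERT INTO contacts (id, name) VALUES (1, 'Sample Contact')")
conn.executemany(
    "INSERT INTO contact_phones VALUES (1, ?, ?)",
    [("work", "2125551212"), ("mobile", "2125559876")],
)

# Reverse lookup: which contact owns a given number?
owner = conn.execute(
    "SELECT c.name FROM contacts AS c JOIN contact_phones AS p ON p.contact_id = c.id "
    "WHERE p.phone_digits = ?", ("2125559876",)).fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM contact_phones WHERE contact_id = 1").fetchone()[0]
print(owner, count)  # Sample Contact 2
```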
Thank you.

I realize this thread is old, but it's worth mentioning an advantage of storing phone numbers as a numeric type for formatting purposes, specifically in the .NET Framework.
For example:
.DefaultCellStyle.Format = "(###)###-####" // Will not work on a string

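Use a long data type instead. Don't use a 16-bit int, because it only allows whole numbers between -32,768 and 32,767. A 32-bit long allows numbers between -2,147,483,648 and 2,147,483,647, but that is still not enough for a 10-digit phone number such as 9876543210; in SQL Server, INT is already 32-bit, so you would need BIGINT.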

In most cases, it can be done with bigint.
Just save unformatted phone numbers, like 19876543210, 02125551212, etc.
Check the topic about bigint vs varchar.
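A quick Python check of the numeric suggestions in this thread. This is a sketch with SQL Server's INT and BIGINT limits written out as constants rather than queried from a server. A 10-digit number overflows a 32-bit integer and fits in a BIGINT, but converting to any numeric type silently drops a leading zero such as the one in 02125551212.

```python
INT32_MAX = 2**31 - 1   # SQL Server INT
INT64_MAX = 2**63 - 1   # SQL Server BIGINT

us_number = "9876543210"
print(int(us_number) > INT32_MAX)        # True: a 10-digit number overflows INT
print(int("19876543210") <= INT64_MAX)   # True: it fits in BIGINT

with_leading_zero = "02125551212"
stored = int(with_leading_zero)          # what a numeric column would keep
print(stored)                            # 2125551212: the leading zero is gone
print(str(stored) == with_leading_zero)  # False
```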