I have a form that records a student ID number. Some of those numbers contain a leading zero. When the number gets recorded into the database it drops the leading 0.
The field is set up to only accept numbers. The length of the student ID varies.
I need the field to be recorded and displayed with the leading zero.
If you are always going to have a number of a certain length (say, it will always be 10 characters), then you can just get the length of the number in the database (after it is converted to a string) and then add the appropriate 0's.
However, if this is an arbitrary amount of leading zeros, then you will have to store the content as a string in the database so you can capture the leading zeros.
It sounds like this should be stored as string data. It sounds like the leading zeros are part of the data itself, not just part of it's formatting.
You could reformat the data for display with the leading zeros in it, however I believe you should store the correct form of the ID number, it will lead to less bugs down the road (ex: you forgot to format it in one place but not in another).
There are a few ways of doing this - depending on the answers to my comments in your question:
Store the extra data in the database by converting the datatype from numeric to varchar/string.
Advantages: Very simple in its implementation; You can treat all the values in the same way.
Disadvantage: If you've got very large amounts of data, storage sizes will escalate; indexing and sorting on strings doesn't perform so well.
Use if: Each number may have an arbitrary length (and hence number of zeros).
Don't use if: You're going to be spending a lot of time sorting data, sorting numeric strings is a pain in the ass - look up natural sorting to see some of the pitfalls;
Continue to store the data in the database as numeric but pad the numeric back to a set length (i.e. 10 as I have suggested in my example below):
Advantages: Data will index better, search better, not require such large amounts of storage if you've got large amounts of data.
Disadvantage: Every query or display of data will require every data instance to be padded to the correct length causing a slight performance hit.
Use if: All the output numbers will be the same length (i.e. including zeros they're all [for example] 10 digits); Large amounts of sorting will be necessary.
Add a field to your table to store the original length of the numeric, continue to store the value as numeric (to leverage sorting/indexing performance gains of numeric vs. string) in your new field store the length as it would include the significant zeros:
Advantages: Reduction in required storage space; maximum use of indexing; sorting of numerics is far easier than sorting text numerics; You still get the ability to pad numerics to arbitrary lengths like you have with option 1.
Disadvantages: An extra field is required in your database, so all your queries will have to pull that extra field thus potentially requiring a slight increase in resources at query/display time.
Use if: Storage space/indexing/sorting performance is any sort of concern.
Don't use if: You don't have the luxury of changing the table structure to include the extra value; This will overcomplicate already complex queries.
If I were you and I had access to modify the db structure slightly, I'd go with option 3, sure you need to pull out an extra field to get the length. The slightly increased complexity pays huge dividends in the advantages versus the disadvantages. The performance hit of padding the string back out the correct length will be far superceded by the performance increase of the indexing and storage space required.
I worked with a database with a similar problem. They were storing zip codes as a number. The consequence was that people in New Jersey couldn't use our app.
You're using data that is logically a text string and not a number. It just happens to look like a number, but you really need to treat it as text. Use a text-oriented data type, or at least create a database view that enables you to pull back a properly formatted value for this.
See here: Pad or remove leading zeroes from numbers
declare #recordNumber integer;
set #recordNumber = 93088;
declare #padZeroes integer;
set #padZeroes = 8;
select
right( replicate('0',#padZeroes)
+ convert(varchar,#recordNumber), #padZeroes);
Unless you intend on doing calculations on that ID, its probably best to store them as text/string.
Another option is since the field is an id, i would recommend creating a secondary field for display number (nvarchar) that you can use for reports, etc...
Then in your application when the student id is entered you can insert that into the database as the number, as well as the display number.
An Oracle solution
Store the ID as a number and convert it into a character for display. For instance, to display 42 as a zero-padded, three-character string:
SQL> select to_char(42, '099') from dual;
042
Change the format string to fit your needs.
(I don't know if this is transferable to other SQL flavors, however.)
You could just concatenate '1' to the beginning of the ID when storing it in the database. When retrieving it, treat it as a string and remove the first char.
MySQL Example:
SET #student_id = '123456789';
INSERT INTO student_table (id,name) VALUES(CONCAT('1',#student_id),'John Smith');
...
SELECT SUBSTRING(id,1) FROM student_table;
Mathematically:
Initially I thought too much and did it mathematically by adding an integer to the student ID, depending on its length (like 1,000,000,000 if it's 9 digits), before storing it.
SET #new_student_id = ABS(#student_id) + POW(10, CHAR_LENGTH(#student_id));
INSERT INTO student_table (id,name) VALUES(#new_student_id,'John Smith');
Related
I have some data which I will be putting in the database. Say I make a field like "coupondetail text(10000)" which will store the coupon detail, now consider that not all coupondetail will be 10,000 chars long. I m curious to know how much space will the column take in the database when the coupondetail text is lesser than 10,000 say 1000 chars?
sqlite does not care much how you declare your column types and ignores any maximum length specified. The declared type is just a hint; any non-INTEGER PRIMARY KEY column can contain any type.
The size taken up in the database file depends on the values you put in. In the record format, strings are stored as length followed by string data. No empty space is necessarily left there.
What is the best way to store the following value in SQL Server ?
1234-56789 or
4567-12892
The value will always have 4 digits followed by a hyphen and 5 digits
char(10) is a possibility that I was thinking of using or removing the hyphen and storing as int
If it is a business requirement to have "The value will always have 4 digits followed by a hypen and 5 digits" Then CHAR(10) but if you think Users should be able to add values even if isnt in the expected format then VARCHAR(10) or VARCHAR(15) whatever suits you better.
You should store those kind of values as int only if really represents a number as opposed to a series of digits. Number means something that you can make calculations on, compare are numbers, etc.
Otherwise store it as char. Make it length of 10 if the format is set and won't change.
Another option would be to create a CHAR(4) column and a CHAR(5) column. This would be useful (only) if you envision ever having to query against one or the other part independently.
Very easy to concatenate these back together using a view, computed column, or inline - so you don't have to waste storage space on a dash that will always be there, and so that you can keep these two pieces of data separate if, in fact, they are independent.
Since you didn't provide much detail about what these "numbers" represent or how they will be used / queried, you're going to get a whole bunch of opinions, some of which might not be very relevant to your data model.
Well, if it's guaranteed to always be like that, a char(10) datatype seems appropriate.
But you should also add a check constraint:
column LIKE '[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9]'
Here is a SO answer that should help you sort out what you need -
nchar and nvarchar can store Unicode characters.
char and varcharcannot store Unicode characters.
char and nchar are fixed-length which will reserve storage space for number of characters you specify even if you don't use up all that space.
varchar and nvarchar are variable-length which will only use up spaces for the characters you store. It will not reserve storage like char or nchar.
I have a set of data which has a name, some sub values and then a associative numeric value. For example:
James Value1 Value2 "1.232323/1.232334"
Jim Value1 Value2 "1.245454/1.232999"
Dave Value1 Value2 "1.267623/1.277777"
There will be around 100,000 entries like this stored in either a file or database. I would like to know, what is the quickest way of being able to return the results which match a search, along with their associated numeric value.
For example, a query of "J" would return both James and Jim results which the numeric values in the last column.
I've heard people mention binary tree searching, dictionary searching, indexed searching. I have no idea which is a good route to peruse.
This is a poorly characterized problem. As with many optimization problems, there are trade-offs in resources. If you truly want the fastest response possible, then a likely approach is to compile all possible searches into a table of prepared results, so that, given a search key, you can look the search key up in the table and return the result.
Assuming your character set is limited to A-Z and a-z, a table with an entry for each search key from 0 to 4 characters will use a modest amount of memory by today’s standards. Each table entry merely needs to have two values in it: The start and end positions in a list of the numeric values. (Compile the list in this way: Sort the records by the name field. Extract just the numeric values from the records, maintaining the order, putting them in a list. Any search key must return a sublist of contiguous records from that list. This is because the search is for a prefix string of the name field, so any records that match the search key are adjacent, when sorted by the name field.)
Thus, to create a table to look up any key of 0 to 4 characters, you need fewer than 534 entries in a table of pairs, where each member of the pair contains a record number (32 bits or fewer). So 8•534 = 60.2 MiB suffices. (53 is because you have 52 characters plus one sentinel character to mark the end of the key. Alternate encodings could reduce this some.)
To support keys of more than 4 characters, you need to extend this. With typical data, 4 characters will have narrowed down the search greatly, so you can take the set of records indicated by the first 4 characters and prune it to get the final results. If the data has pathological cases where 4 characters does not reduce the search much, you can embellish this technique.
So, is that really what you want to do, make the speed as fast as possible, regardless of other resources (including engineering time) consumed? If not, what are your actual goals?
I want to load data with 4 columns and 80 millon rows in MySQL on Redis, so that I can reduce fetching delay.
However, when I try to load all the data, it becomes 5 times larger.
The original data was 3gb (when exported to csv format), but when I load them on Redis, it takes 15GB... it's too large for our system.
I also tried different datatypes -
1) 'table_name:row_number:column_name' -> string
2) 'table_name:row_number' -> hash
but all of them takes too much.
am I missing something?
added)
my data have 4 col - (user id(pk), count, created time, and a date)
The most memory efficient way is storing values as a json array, and splitting your keys such that you can store them using a ziplist encoded hash.
Encode your data using say json array, so you have key=value pairs like user:1234567 -> [21,'25-05-2012','14-06-2010'].
Split your keys into two parts, such that the second part has about 100 possibilities. For example, user:12345 and 67
Store this combined key in a hash like this hset user:12345 67 <json>
To retrieve user details for user id 9876523, simply do hget user:98765 23 and parse the json array
Make sure to adjust the settings hash-max-ziplist-entries and hash-max-ziplist-value
Instagram wrote a great blog post explaining this technique, so I will skip explaining why this is memory efficient.
Instead, I can tell you the disadvantages of this technique.
You cannot access or update a single attribute on a user; you have to rewrite the entire record.
You'd have to fetch the entire json object always even if you only care about some fields.
Finally, you have to write this logic on splitting keys, which is added maintenance.
As always, this is a trade-off. Identify your access patterns and see if such a structure makes sense. If not, you'd have to buy more memory.
+1 idea that may free some memory in this case - key zipping based on crumbs dictionary and base62 encoding for storing integers,
it shrinks user:12345 60 to 'u:3d7' 'Y', which take two times less memory for storing key.
And with custom compression of data, not to array but to a loooong int (it's possible to convert [21,'25-05-2012','14-06-2010'] to such int: 212505201214062010, two last part has fixed length then it's obvious how to pack/repack such value )
So whole bunch of keys/values size is now 1.75 times less.
If your codebase is ruby-based I may suggest me-redis gem which is seamlessly implement all ideas from Sripathi answer + given ones.
In sql server does it make a difference if I define a varchar column to be of length 32 or 128?
A varchar is a variable character field. This means it can hold text data to a certain length. A varchar(32) can only hold 32 characters, whereas a varchar(128) can hold 128 characters. If I tried to input "12345" into a varchar(3) field; this is the data that will be stored:
"123"
The "45" will be "truncated" (lost).
They are very useful in instances where you know that a certain field will only be (or only should be) a certain length at maximum. For example: a zip code or state abbreviation. In fact, they are generally used for almost all types of text data (such as names/addresses/et cetera) - but in these instances you must be careful that the number you supply is a sane maximum for the type of data that will fill that column.
However, you must also be careful when using them to only allow the user to input the maximum amount of characters that the field will support. Otherwise it may lend to confusion when it truncates the user's input.
There should be no noticeable difference as the backend will only store the amount of data you insert into that column. It's not padded out to the full size of the field like it is with a char column.
Edit: For more info, this link should be useful.
It should not. It just defines the maximum length it can accommodate, the actual used length depends on the data inserted.