SQLite giving column more size than needed - database

I have some data which I will be putting in the database. Say I make a field like "coupondetail text(10000)" which will store the coupon detail, but not every coupondetail will be 10,000 chars long. I'm curious to know how much space the column will take in the database when the coupondetail text is shorter than 10,000 chars, say 1,000 chars?

SQLite does not care much how you declare your column types and ignores any maximum length you specify. The declared type is just a hint; any column other than an INTEGER PRIMARY KEY can contain a value of any type.
The size taken up in the database file depends on the values you actually store. In the record format, a string is stored as its length followed by the string data; no empty space is reserved for the unused part of the declared length.
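A minimal sketch (hypothetical table and column names) showing that SQLite accepts the declaration but only stores the bytes you actually insert:
CREATE TABLE coupons (coupondetail TEXT(10000));
INSERT INTO coupons (coupondetail) VALUES ('only 15 chars!!');
-- length() reports the stored string; the row does not reserve 10,000 characters
SELECT coupondetail, length(coupondetail) FROM coupons;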

Related

SQL Server data types to store strings and take less space

I have a question regarding the data types available in SQL for storing data in the database itself. Since I'm dealing with a database that is quite large and tends to grow beyond 150 GB of data, I need to pay close attention and save every bit of space on the server's hard drive so that the database doesn't take up all the precious space. So my questions are as follows:
Which data type is the best to store 80-200 character long string in database?
I'm aware of, for example, varchar(200) and nvarchar(200), where nvarchar supports Unicode characters. Which one of these would take up less space in the database? Or is there a third data type I'm not aware of that I could use to store the data (given that I know for a fact the string I would store is just a combination of numbers and letters, without any special characters)?
Are there other techniques I could use to save space in the database so that it doesn't expand rapidly?
Can someone help me out with this ?
P.S. Guys, I have a 4th question as well:
If for example I have nvarchar(max) data type which is in a table, and the entered record takes up only 100 characters, how much data is reserved for that kind of record?
Let's say that I have an ID of the following form: 191697193441 ... Would it make more sense to store this number as varchar(200) or bigint?
The size needed for nvarchar is 2 bytes per character, as it represents unicode data. varchar needs 1 byte per character. The storage size is the actual number of characters entered + 2 bytes overhead. This is also true for varchar(max).
From https://learn.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql:
varchar [ ( n | max ) ] Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes.
So for your 4th question, nvarchar would need 100 * 2 + 2 = 202 bytes, varchar would need 100 * 1 + 2 = 102 bytes.
There's no performance or data size difference as they're variable length data types, so they'll only use the space they need.
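As a quick sketch (hypothetical variables), DATALENGTH shows the data bytes for the same 100-character value in each type (the extra 2 bytes of overhead are only added when the value is stored in a row):
DECLARE @v varchar(200)  = REPLICATE('a', 100);
DECLARE @n nvarchar(200) = REPLICATE(N'a', 100);
-- varchar: 1 byte per character; nvarchar: 2 bytes per character
SELECT DATALENGTH(@v) AS varchar_bytes,   -- 100
       DATALENGTH(@n) AS nvarchar_bytes;  -- 200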
Think of the size parameter more as a useful constraint. For example, if you have a surname field, you can reasonably expect 50 characters to be a sensible maximum size, and you have a better chance of a mistake (misuse of the field, incorrect data capture, etc.) throwing an error rather than adding nonsense to the database and needing future data cleansing.
So, my general rule of thumb is make them as large as the business requirements demand, but no larger. It's trivial to amend variable data sizes to a larger length in the future if requirements change.
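For example, widening a variable-length column later is a one-line change (sketch with a hypothetical table; SQL Server syntax):
-- widen surname from varchar(50) to varchar(100); existing data is unaffected
ALTER TABLE person ALTER COLUMN surname varchar(100);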

Memory Usage in SQL Server?

I have been thinking about database design lately and I have the following question:
When a type, say a varchar(max), is set for a column is 2GB of space set aside every time a row is inserted?
Or is the space allocated on the server equal to the amount of data in the column?
Thanks!
The varchar data type in SQL Server and elsewhere means roughly variable-length character data. The max or any other constant value represents its upper bound and not its absolute size. So your latter observation is correct:
the space allocated on the server equal to the amount of data in the column
Now, if you define something like char(200) (notice the lack of var in front of char there) then yes, 200 characters are allocated regardless of how much data (up to 200 chars) you store in that field. The maximum upper bound for the char data type is 8000, by the way.
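A small sketch (hypothetical variable names) contrasting the fixed-length and variable-length types:
DECLARE @fixed    char(200)    = 'abc';
DECLARE @variable varchar(200) = 'abc';
-- char pads to its declared length; varchar stores only what you put in
SELECT DATALENGTH(@fixed)    AS char_bytes,     -- 200
       DATALENGTH(@variable) AS varchar_bytes;  -- 3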
Of course not :)
Varchar(max) is a bastardised hybrid data type (if you do not mind me being a bit frank):
It is stored with the rest of the columns in the row if the total row length does not exceed the 8 KB page limit.
Otherwise it is stored as a blob. A blob (like TEXT or IMAGE) takes only as much space as its length.
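A sketch (hypothetical table name) showing that a varchar(max) column only consumes space for the data actually stored, regardless of the 2 GB ceiling:
CREATE TABLE demo (notes varchar(max));
INSERT INTO demo (notes) VALUES (REPLICATE('x', 100));
-- 100 bytes, stored in-row because the row fits within the 8 KB page
SELECT DATALENGTH(notes) AS bytes_used FROM demo;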

PostgreSQL character varying length limit

I am using the character varying data type in PostgreSQL.
I was not able to find this information in the PostgreSQL manual.
What is the maximum number of characters allowed in the character varying data type?
Referring to the documentation, there is no explicit limit given for the varchar(n) type definition. But:
...
In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be very useful to change this because with multibyte character encodings the number of characters and bytes can be quite different anyway. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
Also note this:
Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
From documentation:
In any case, the longest possible character string that can be stored is about 1 GB.
Character types in PostgreSQL:
character varying(n), varchar(n) = variable-length with limit
character(n), char(n) = fixed-length, blank padded
text = variable unlimited length
Based on your problem, I suggest you use the text type, which does not require a character length.
In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.
Source: https://www.postgresql.org/docs/9.6/static/datatype-character.html
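A minimal sketch (hypothetical table) of a limited varchar alongside an unlimited text column:
CREATE TABLE notes (
    code varchar(10),  -- at most 10 characters; longer values raise an error
    body text          -- no declared limit (up to roughly 1 GB per value)
);
INSERT INTO notes (code, body) VALUES ('abc', repeat('x', 100000));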
The maximum string size is about 1 GB. Per the postgres docs:
Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that.)
Note that the max n you can specify for varchar is less than the max storage size. While this limit may vary, a quick check reveals that the limit in Postgres 11.2 is 10,485,760 characters:
psql (11.2)
=> create table varchar_test (name varchar(1073741824));
ERROR: length for type varchar cannot exceed 10485760
Practically speaking, when you do not have a well rationalized length limit, it's suggested that you simply use varchar without specifying one. Per the official docs,
If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.

Dropping Leading Zeros

I have a form that records a student ID number. Some of those numbers contain a leading zero. When the number gets recorded into the database it drops the leading 0.
The field is set up to only accept numbers. The length of the student ID varies.
I need the field to be recorded and displayed with the leading zero.
If you are always going to have a number of a certain length (say it will always be 10 characters), then you can just get the length of the number in the database (after converting it to a string) and add the appropriate number of 0's.
However, if the number of leading zeros is arbitrary, then you will have to store the value as a string in the database so you can capture the leading zeros.
It sounds like this should be stored as string data: the leading zeros are part of the data itself, not just part of its formatting.
You could reformat the data for display with the leading zeros in it; however, I believe you should store the correct form of the ID number, as it will lead to fewer bugs down the road (e.g. you format it in one place but forget to in another).
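A quick sketch (hypothetical table and value) of the string-based approach, which keeps the zero exactly as entered:
CREATE TABLE student (student_id varchar(20) NOT NULL);
INSERT INTO student (student_id) VALUES ('0123456');
SELECT student_id FROM student;  -- '0123456', leading zero intact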
There are a few ways of doing this - depending on the answers to my comments in your question:
Store the extra data in the database by converting the datatype from numeric to varchar/string.
Advantages: Very simple in its implementation; You can treat all the values in the same way.
Disadvantage: If you've got very large amounts of data, storage sizes will escalate; indexing and sorting on strings doesn't perform so well.
Use if: Each number may have an arbitrary length (and hence number of zeros).
Don't use if: You're going to be spending a lot of time sorting the data; sorting numeric strings is a pain in the ass - look up natural sorting to see some of the pitfalls.
Continue to store the data in the database as numeric, but pad the numeric back out to a set length (e.g. 10, as I have suggested in my example below):
Advantages: Data will index better, search better, not require such large amounts of storage if you've got large amounts of data.
Disadvantage: Every query or display of data will require every data instance to be padded to the correct length causing a slight performance hit.
Use if: All the output numbers will be the same length (i.e. including zeros they're all [for example] 10 digits); Large amounts of sorting will be necessary.
Add a field to your table to store the original length of the numeric. Continue to store the value as numeric (to leverage the sorting/indexing performance gains of numeric vs. string), and in the new field store the length the number would have including the significant zeros:
Advantages: Reduction in required storage space; maximum use of indexing; sorting of numerics is far easier than sorting text numerics; You still get the ability to pad numerics to arbitrary lengths like you have with option 1.
Disadvantages: An extra field is required in your database, so all your queries will have to pull that extra field thus potentially requiring a slight increase in resources at query/display time.
Use if: Storage space/indexing/sorting performance is any sort of concern.
Don't use if: You don't have the luxury of changing the table structure to include the extra value; This will overcomplicate already complex queries.
If I were you and I had access to modify the db structure slightly, I'd go with option 3; sure, you need to pull out an extra field to get the length, but the slightly increased complexity pays huge dividends in the advantages versus the disadvantages. The performance hit of padding the string back out to the correct length will be far outweighed by the indexing performance and storage space gains (see the sketch below).
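A sketch of option 3 with hypothetical table and column names (T-SQL syntax): store the numeric value plus its original length, and pad back out on output.
CREATE TABLE student_ids (
    id_value  bigint  NOT NULL,  -- numeric value: indexes and sorts efficiently
    id_length tinyint NOT NULL   -- original length, including leading zeros
);
INSERT INTO student_ids (id_value, id_length) VALUES (93088, 8);
SELECT RIGHT(REPLICATE('0', id_length)
             + CONVERT(varchar(20), id_value), id_length) AS student_id
FROM student_ids;  -- '00093088'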
I worked with a database with a similar problem. They were storing zip codes as a number. The consequence was that people in New Jersey couldn't use our app.
You're using data that is logically a text string and not a number. It just happens to look like a number, but you really need to treat it as text. Use a text-oriented data type, or at least create a database view that enables you to pull back a properly formatted value for this.
See here: Pad or remove leading zeroes from numbers
declare @recordNumber integer;
set @recordNumber = 93088;
declare @padZeroes integer;
set @padZeroes = 8;
select right(replicate('0', @padZeroes) + convert(varchar, @recordNumber), @padZeroes);
Unless you intend to do calculations on that ID, it's probably best to store it as text/string.
Another option: since the field is an ID, I would recommend creating a secondary field for a display number (nvarchar) that you can use for reports, etc.
Then in your application, when the student ID is entered, you can insert it into the database as the number as well as the display number.
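A brief sketch of that two-column approach (the display_id column is a hypothetical name):
-- numeric key for storage/sorting plus a formatted display value for reports
INSERT INTO student_table (id, display_id, name)
VALUES (123456, N'0123456', N'John Smith');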
An Oracle solution
Store the ID as a number and convert it into a character for display. For instance, to display 42 as a zero-padded, three-character string:
SQL> select to_char(42, '099') from dual;
042
Change the format string to fit your needs.
(I don't know if this is transferable to other SQL flavors, however.)
You could just concatenate '1' to the beginning of the ID when storing it in the database. When retrieving it, treat it as a string and remove the first char.
MySQL Example:
SET @student_id = '123456789';
INSERT INTO student_table (id, name) VALUES (CONCAT('1', @student_id), 'John Smith');
...
SELECT SUBSTRING(id, 2) FROM student_table;  -- drop the leading '1' again
Mathematically:
Initially I thought too much and did it mathematically by adding an integer to the student ID, depending on its length (like 1,000,000,000 if it's 9 digits), before storing it.
SET @new_student_id = ABS(@student_id) + POW(10, CHAR_LENGTH(@student_id));
INSERT INTO student_table (id, name) VALUES (@new_student_id, 'John Smith');

Size of varchar columns

In SQL Server, does it make a difference if I define a varchar column to be of length 32 or 128?
A varchar is a variable-length character field. This means it can hold text data up to a certain length. A varchar(32) can only hold 32 characters, whereas a varchar(128) can hold 128 characters. If I tried to put "12345" into a varchar(3) field, this is the data that would be kept:
"123"
The "45" is "truncated" (lost). (Note that by default SQL Server raises an error rather than silently truncating when you insert an over-long value into a table column; silent truncation applies when assigning to a variable or parameter.)
They are very useful in instances where you know that a certain field will only be (or only should be) a certain length at maximum. For example: a zip code or state abbreviation. In fact, they are generally used for almost all types of text data (such as names/addresses/et cetera) - but in these instances you must be careful that the number you supply is a sane maximum for the type of data that will fill that column.
However, you must also be careful, when using them, to only allow the user to input at most the number of characters that the field will support. Otherwise it may lead to confusion when the user's input is truncated.
There should be no noticeable difference as the backend will only store the amount of data you insert into that column. It's not padded out to the full size of the field like it is with a char column.
It should not. It just defines the maximum length the column can accommodate; the actual space used depends on the data inserted.
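As a quick sketch (hypothetical table), two columns of different declared lengths store the same short value in the same number of bytes:
CREATE TABLE widths (narrow varchar(32), wide varchar(128));
INSERT INTO widths (narrow, wide) VALUES ('hello', 'hello');
SELECT DATALENGTH(narrow) AS narrow_bytes,  -- 5
       DATALENGTH(wide)   AS wide_bytes     -- 5
FROM widths;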
