Which data type do I use in T-SQL if the field can be very small or very large? - sql-server

Suppose I have a column in a table which may hold information as short as a single char 'a' or as big as a huge binary chunk which translates to a jpg, png, mp3, whatever.
Which data type should I use for this?
I thought varbinary(max) or varchar(max), but will it occupy unused space if I just store a single char or a short string?
How is data stored when a field has a data-type that may have variable lengths?
According to this Q&A, https://dba.stackexchange.com/questions/1767/how-do-too-long-fields-varchar-nvarchar-impact-performance-and-disk-usage-ms, it shouldn't matter, except for this:
Memory
If the client application allocates memory using the maximum size, the application would allocate significantly more memory than is necessary. Special considerations would have to be done to avoid this.
How do I know this? Sorry if I'm being dumb but it seems too vague.

What I would do if I were you is use a FILESTREAM-enabled database.
You can learn more about it here:
http://technet.microsoft.com/en-us/library/bb933993(v=sql.105).aspx
There is also a lot of information on the net, so you should face no issues using it.
Generally, you will have a table with as many columns as you need, plus one column that stores the information in binary format as a BLOB (Binary Large Object). The good thing is that the amount of information it can hold is limited only by your hard disk space, because the data is stored on your drive. In the other columns you can have a type field - for example mp3/jpeg/avi/etc. - that helps your application convert the BLOB back to its original type.
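The pattern described above - ordinary columns plus one binary column and a type field - can be sketched as follows. This is an illustration using Python's stdlib sqlite3 module as a stand-in; in SQL Server the payload column would be varbinary(max), optionally FILESTREAM. The table and column names are invented for the example.

```python
import sqlite3

# One table: ordinary columns, a "type" column telling the application
# how to interpret the payload, and a BLOB column for the raw bytes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE media (
        id        INTEGER PRIMARY KEY,
        mime_type TEXT NOT NULL,   -- e.g. 'text/plain', 'image/png'
        payload   BLOB NOT NULL    -- the raw bytes, however small or large
    )
""")

# A one-byte payload and a much larger one live in the same column,
# each occupying only the space it actually needs:
conn.execute("INSERT INTO media (mime_type, payload) VALUES (?, ?)",
             ("text/plain", b"a"))
conn.execute("INSERT INTO media (mime_type, payload) VALUES (?, ?)",
             ("image/png", bytes(100_000)))

for mime, size in conn.execute("SELECT mime_type, length(payload) FROM media"):
    print(mime, size)
```

The application reads `mime_type` first, then interprets `payload` accordingly.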

Related

Why each data type has two choices in pgAdmin 4?

Using pgAdmin 4 to define the data type for a table column, I discovered that each type has two choices. What's the difference?
for example char has two choices.
Because Postgres does not work like in-memory programming data structures. It keeps data on disk, not in memory, so each data structure is stored as a single, fixed-length array value. See this answer for more information: https://stackoverflow.com/a/42484838/13638824
Edit for your question (It was too long to be comment):
Writing data structures to memory and to disk is totally different. Memory is dynamic and writing to it is very cheap, so it can handle changes in data very fast. Writing to disk is not cheap, so when database systems write data they try to be as efficient as possible, usually using fixed-size allocations behind the scenes to improve performance. When you define a TEXT column, the DBMS will not create a string; it writes the value to disk as an expandable data structure to balance flexibility and performance. If you use char[] instead, the length is predefined, so it can be more performant.
TL;DR;
For memory data structures:
char[] == string.
But for DBMS char[] == fixed size byte array and text == flexible size byte array.
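The TL;DR above can be simulated in plain Python. This is only an illustration of the semantics the answer describes: Postgres' char(n) blank-pads every value to a fixed width (and truncates longer input), while text/varchar store only the characters actually supplied.

```python
# Mimic the two storage behaviors the answer contrasts. These helpers
# are hypothetical, written only to demonstrate the padding difference.
def store_as_char(value: str, n: int) -> str:
    """Mimic char(n): truncate to n chars, then blank-pad to exactly n."""
    return value[:n].ljust(n)

def store_as_text(value: str) -> str:
    """Mimic text: keep the value as-is, no padding."""
    return value

print(len(store_as_char("a", 10)))  # 10 -- fixed size, 9 padding blanks
print(len(store_as_text("a")))      # 1  -- only what was supplied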

Is there a reason why not to store encrypted data as binary in a database?

I have to store AES-GCM encrypted data in a database. Currently we use MariaDB but with the option to later change to PostgreSQL. (however other databases should be considered as well)
Since the algorithm does not actually encrypt strings, but bytes and the output of an encryption algorithm is also a byte[], why not store the encrypted data directly in a binary column?
For MariaDB/MySql that would be as a BLOB. I understand PostgreSQL even has a preferred special data type for encrypted data called bytea.
However most programmers seem to encode the encrypted bytes as Base64 instead and store the resulting string in a VARCHAR.
Encoding to and decoding from Base64 seems counter-intuitive to me. It makes the data about 33% longer and is an extra step each time. It also forces the database to apply a character encoding when storing and retrieving the data - an extra step that surely costs time and resources, while all we really need to store are some bytes. The encrypted data makes no sense in any character encoding anyway.
Question:
Is there any good reason for or against storing encrypted data as binary in a database? Is there a security, data integrity or performance reason why I may not want to store the encrypted data directly as binary?
(I assume this question will shortly be closed as "opinion based" - but nevertheless)
Is there any good reason for or against storing encrypted data as binary in a database
No. I don't see any reason against using a proper "blob" type (BLOB, bytea, varbinary(max), ....)
The general rule of thumb is: use the data type that matches the data. So BLOB (or the equivalent type) is the right choice.
Using Base64-encoded strings might be justified because not all libraries (or obfuscation layers like ORMs) can deal with "blobs" correctly, so people choose something that is universally applicable (ignoring the overhead in storage and processing).
Note that Postgres' bytea is not "a special type for encrypted data". It's a general-purpose data type for binary data (images, documents, music, ...).
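The storage overhead mentioned above is easy to quantify: Base64 turns every 3 input bytes into 4 output characters, so the encoded form is one third larger (plus padding). A small sketch using Python's stdlib, with random bytes standing in for AES-GCM output:

```python
import base64
import os

ciphertext = os.urandom(1200)          # stand-in for AES-GCM output

# Base64 maps 3 bytes -> 4 characters, so 1200 bytes become 1600 chars.
encoded = base64.b64encode(ciphertext)

print(len(ciphertext))                 # 1200
print(len(encoded))                    # 1600 -> 4/3 of the raw size

# Storing the raw bytes in a BLOB/bytea/varbinary column avoids both
# the size overhead and the encode/decode round trip:
assert base64.b64decode(encoded) == ciphertext
```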

The most suitable way to store encrypted data in Postgres

I've got a byte array (about 100 bytes maximum) of data encrypted with AES/GCM. I wonder if you could help me choose the best data type for it. On my past project we were just using varchar2 in Oracle but now I'm doing the choice myself.
Should I use byte array or varchar or something else?
Which type is the most suitable in terms of security and performance?
Thank you!
If that is binary data, the bytea data type would be the best match.
Using a string data type would only cause encoding problems, and you wouldn't be able to store zero bytes.
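Both problems the answer mentions can be demonstrated with a short sketch. SQLite's BLOB type is used here as a stand-in for Postgres' bytea; the table name is invented for the example.

```python
import sqlite3

# Binary columns round-trip arbitrary bytes, including zero bytes,
# which a text type cannot reliably hold (Postgres, for instance,
# rejects NUL characters in text/varchar values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE secrets (id INTEGER PRIMARY KEY, data BLOB)")

payload = b"\x00\xff\x00GCM ciphertext bytes\x00"
conn.execute("INSERT INTO secrets (data) VALUES (?)", (payload,))
(round_tripped,) = conn.execute("SELECT data FROM secrets").fetchone()
print(round_tripped == payload)        # True -- bytes survive intact

# The same bytes are not valid text, illustrating the encoding problem:
try:
    payload.decode("utf-8")
except UnicodeDecodeError:
    print("not decodable as UTF-8")
```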

Format for storing EXIF metadata in database

I'm working on an application for which I need to be able to store EXIF metadata in a relational database. In the future I'd like to also support XMP and IPTC metadata, but at the moment my focus is on EXIF.
There are a few questions on Stack Overflow about what the table structure should look like when storing EXIF metadata. However, none of them really address my concern. The problem I have is that different EXIF tags have values in different formats, and there isn't really one column type which conveniently stores them all.
The most common type is a "rational" which is an array of two four-byte integers representing a fraction. But there are also non-fractional short and long integers, ASCII strings, byte arrays, and "undefined" (an 8-bit type which must be interpreted according to a priori knowledge of the specific tag.) I'd like to support all of these types, and I want to do so in a convenient, efficient, lossless (i.e. without converting the rationals to floats), extensible and searchable manner.
Here's what I've considered so far:
My current solution is to store everything as a string. This makes it pretty easy to store all of the different types, and is also convenient for searching and debugging. However, it's kind of clunky and inefficient because when I want to actually use the data, I have to do a bunch of string manipulation to convert the rational values into their fractional equivalents, e.g. fraction = float(value.split('/')[0]) / float(value.split('/')[1]). (It's not actually a messy one-liner like that in my real code, but this demonstrates the problem.)
I could grab the raw EXIF bytes for each value from the file and store them in a blob column, but then I'd have to reinterpret the raw bytes every time. This could be marginally more CPU-efficient than the string solution, but it's much, much worse in every other way - on the whole, not worth it.
I could have a different table for each different EXIF datatype. Using this pattern I can maintain my foreign key relationships while storing my values in several different tables. However, this will make my most common query, which is to select all EXIF metadata for a given photo, kind of nasty. It will also become unwieldy very quickly when I add support for other metadata formats.
I'm not a database expert by any means, so is there some pattern or magic union-style column type I'm missing that can make this problem go away? Or am I stuck picking my poison from among the three options above?
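One possible refinement of option 1 above (this is a suggestion, not something from the original post): if rationals are stored as "numerator/denominator" strings, Python's stdlib fractions module parses them exactly, avoiding the lossy float conversion and the manual split/divide shown earlier.

```python
from fractions import Fraction

# A stored EXIF rational as a string, e.g. an ApertureValue of 4/3.
stored = "4/3"

# Fraction parses "num/den" directly and keeps it exact -- no float
# rounding until you explicitly ask for it.
value = Fraction(stored)

print(value.numerator, value.denominator)   # 4 3
print(value == Fraction(4, 3))              # True
float_when_needed = float(value)            # convert only at the edges
```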
This is probably a very cheap solution, but I would personally just store the JSON (or something like that) within the database.
There is a cool way to extract EXIF data and parse it to JSON.
Here is the link: Img2JSON
I hope this kind of helps you!
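A minimal sketch of the JSON approach suggested above, assuming rationals are kept as [numerator, denominator] pairs so nothing is lost to float conversion. The tag names and values here are ordinary EXIF tags chosen for illustration.

```python
import json
import sqlite3

# Each tag value serializes losslessly: rationals as [num, den] pairs,
# integers and strings as themselves.
exif = {
    "FNumber": [4, 3],          # a rational, stored exactly
    "ISOSpeedRatings": 200,     # a plain integer
    "Make": "ExampleCam",       # an ASCII string
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, exif TEXT)")
conn.execute("INSERT INTO photos (exif) VALUES (?)", (json.dumps(exif),))

# The most common query -- all metadata for one photo -- is one SELECT:
(raw,) = conn.execute("SELECT exif FROM photos").fetchone()
print(json.loads(raw) == exif)   # True -- round-trips without loss
```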

How to store videos in a PostgreSQL database?

I am storing image files (like jpg, png) in a PostgreSQL database. I found information on how to do that here.
Likewise, I want to store videos in a PostgreSQL database. I searched the net - some say one should use a data type such as bytea to store binary data.
Can you tell me how to use a bytea column to store videos?
I would generally not recommend storing huge blobs (binary large objects) inside PostgreSQL if referential integrity is not your paramount requirement. Storing huge files in the filesystem is much more efficient:
Much faster, less disk space used, easier backups.
I have written a more comprehensive assessment of the options you've got in a previous answer to a similar question. (With deep links to the manual.)
We did some tests on the practical limits of the bytea data type. The theoretical limit is 1GB, but the practical limit is about 20MB. Processing larger bytea values eats too much RAM, and encoding and decoding take time as well. Personally I don't think storing videos is a good idea, but if you need it, then use large objects (blobs).
Without knowing what programming language you are using, I can only give a general approach:
Create a table with a column of type 'bytea'.
Get the contents of the video file into a variable.
Insert a row into that table with that variable as the data for the bytea column.
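The three steps above can be sketched concretely. This uses Python's stdlib sqlite3 module as a stand-in for a Postgres driver such as psycopg2 (where the column type would be bytea instead of BLOB); the fake bytes keep the example self-contained instead of reading a real file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Create a table with a binary column:
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, data BLOB)")

# 2. Get the contents of the video file into a variable. A real
#    application would use open("clip.mp4", "rb").read():
video_bytes = b"\x00\x00\x00\x18ftypmp42" + bytes(1024)

# 3. Insert a row with that variable as the data for the binary column,
#    always via a bound parameter, never string concatenation:
conn.execute("INSERT INTO videos (data) VALUES (?)", (video_bytes,))

(stored,) = conn.execute("SELECT data FROM videos").fetchone()
print(stored == video_bytes)   # True
```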
