Storing sparse matrices in SQL for quick retrieval and deserialization - arrays

I'm interested in storing a sparse vector (technically the sparse array is a dictionary of key/value pairs, but let's just say that the values are an array for simplicity) and retrieving it from Postgres efficiently. The table schema would be:
id
sparse_array
A few options being considered:
Store the array using the ARRAY type in Postgres
Store it as JSON
Encrypt/decrypt the array using some sort of two-way encryption scheme - JWT concept?
Convert it into a delimiter-separated string (comma, for example) and serialize/deserialize using C
Is there an industry practice or a good way of doing this? Big tech companies can deliver images super quickly to users; how is that made possible, and how does the storage/retrieval work?
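To make the serialization options above concrete, here is a minimal sketch that round-trips a sparse vector held as an index-to-value dictionary, once as JSON text and once as a packed binary blob. The function names and the binary layout (a count followed by index/value pairs) are assumptions made for illustration, not an established format. The JSON text could go into a JSON/JSONB column and the blob into a BYTEA column.

```python
import json
import struct
from typing import Dict


def to_json(sparse: Dict[int, float]) -> str:
    # JSON object keys must be strings, so the indices are stringified.
    return json.dumps({str(i): v for i, v in sorted(sparse.items())})


def from_json(text: str) -> Dict[int, float]:
    return {int(i): v for i, v in json.loads(text).items()}


def to_bytes(sparse: Dict[int, float]) -> bytes:
    # Assumed layout (little-endian, no padding):
    #   uint32 count, then count * (uint32 index, float64 value)
    items = sorted(sparse.items())
    parts = [struct.pack("<I", len(items))]
    for index, value in items:
        parts.append(struct.pack("<Id", index, value))
    return b"".join(parts)


def from_bytes(blob: bytes) -> Dict[int, float]:
    (count,) = struct.unpack_from("<I", blob, 0)
    result = {}
    offset = 4  # skip the uint32 count
    for _ in range(count):
        index, value = struct.unpack_from("<Id", blob, offset)
        result[index] = value
        offset += 12  # 4 bytes of index + 8 bytes of value
    return result
```

The binary form costs a fixed 12 bytes per non-zero entry and needs no text parsing on read, while the JSON form stays readable and queryable inside the database. Which one is faster in practice depends on vector size and driver, so it is worth measuring both.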

Related

Storing complex data as key in NoSQL database?

I have data that has multiple dimensions, each of which is a string. For example, a Person is described by position, id, email, etc.
I want to use one piece of multi-dimensional data as a key into my NoSQL database. I don't need to do any complex querying, just periodic full table scans (the table will be small). What are some ways / best practices to format this data as a key?
I have considered colon delimiting (i.e. position:id:email), but it is hard to read and inflexible. I've also considered hashing this colon-delimited string. Is there a good hash function for this type of thing? Or any completely different suggestions?
Thanks in advance!
Storing multi-dimensional data under a one-dimensional key is a challenging task in key-value stores / NoSQL databases. Projects like MD-HBase or GeoMesa do exist; they place the multi-dimensional data into an n-dimensional space and use a space-filling curve to encode the location of the data into a one-dimensional key. However, most projects are limited to 2-dimensional spatial data, and string attributes cannot be handled.
Shameless Plug: I have started a new open-source project called BBoxDB. BBoxDB is a distributed storage manager that is capable of handling multi-dimensional data. In BBoxDB a bounding box is used to describe the location of multi-dimensional data in the n-dimensional space. You could map the string attributes of your Person entity to a point in the n-dimensional space and use this point as the bounding box for your data. Then BBoxDB can run queries on your data (e.g., full table scans or scans that are restricted to some dimensions). The project is at an early stage, but maybe it is interesting for you.
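For readers unfamiliar with space-filling curves, the following is a minimal sketch of one of them, the Z-order (Morton) curve, for two non-negative integer coordinates. It is not taken from MD-HBase, GeoMesa, or BBoxDB; the function names and the 16-bit default are assumptions for illustration only, and string attributes would first need some mapping to integers.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order code: bit i of x goes to position 2*i, bit i of y to 2*i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def deinterleave_bits(code: int, bits: int = 16):
    """Inverse of interleave_bits: recover (x, y) from a Z-order code."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


key = interleave_bits(3, 5)  # 39
```

Points that are close in both dimensions tend to receive nearby codes, which is what lets a one-dimensional sorted key space answer range queries over several dimensions reasonably well.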

Store any hash in GDBM and can I search in it?

Reading about GDBM in this book, they only give simple examples of the data structures that can be stored. E.g.
$dbm{'key'} = "value";
Background
I would like to save many small text files in the database for local use only, and use nested hashes and arrays to represent the file paths. It doesn't have to be GDBM, but it seems to be the only key/value database library for Perl.
Question
Can I store any hash in GDBM, no matter how many nested hashes and arrays it contains?
Does GDBM offer any search features, or am I left to implement my own in Perl?
DBM databases don't support arrays at all. They are essentially the same as a Perl hash, except that the value of an item can only be a simple string and may not be a number or a reference. The keys and values for each data item in a DBM database are simple byte sequences. That is, the API represents them by a char pointer and an int size.
Within that constraint you can use the database however you like, but remember that, unlike SQL databases, every key must be unique.
You could emulate nested hashes by using the data fetched by one access as a key for the next access but, bearing in mind the requirement for unique keys, that's far from ideal.
Alternatively, the value fetched could be the name of another DBM database which you could go on to query further.
A final option is to concatenate all the keys into a single value, so that
$dbm{aa}{bb}{cc}
would actually be implemented as something like
$dbm{aa_bb_cc}
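The key-concatenation idea can be sketched as follows. This is written in Python rather than Perl, uses a plain dictionary as a stand-in for the DBM file, and the `flatten` helper and its `_` separator are hypothetical choices for illustration. The scheme only works if the separator never occurs inside a real key.

```python
def flatten(nested: dict, sep: str = "_", prefix: str = "") -> dict:
    """Turn nested dictionaries into one flat mapping of joined keys to strings."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path built so far as the prefix.
            flat.update(flatten(value, sep, full_key))
        else:
            # DBM values are plain byte strings, so store a string form.
            flat[full_key] = str(value)
    return flat


flat = flatten({"aa": {"bb": {"cc": "hello"}}, "x": 1})
# flat == {"aa_bb_cc": "hello", "x": "1"}
```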
Actually, you can store hashes of hashes, or lists of lists, in Perl. Use the MLDBM module from CPAN, along with the DBM of your choice.
Check out this online PDF book and go to chapter 13:
https://www.perl.org/books/beginning-perl/
The complex part is figuring out how to access the various levels of references. To search you would have to run through the keys and parse the values.
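The general idea of serializing a nested structure into a single string value, then searching by scanning, can be sketched like this. It is not MLDBM itself: it is a Python analogy that assumes JSON as the serializer and a plain dictionary as a stand-in for the DBM file, and the `put`, `get`, and `search` names are hypothetical.

```python
import json

store = {}  # stand-in for a DBM file: string key -> bytes value


def put(key: str, value) -> None:
    # The whole nested structure is flattened into one byte string.
    store[key] = json.dumps(value).encode("utf-8")


def get(key: str):
    return json.loads(store[key].decode("utf-8"))


def search(predicate) -> list:
    """Linear scan: decode every value and keep the keys whose value matches."""
    return sorted(k for k in store if predicate(get(k)))


put("docs", {"readme": {"lines": ["a", "b"]}})
put("src", {"main": {"lines": ["x"]}})
```

Every read decodes the full value and every search touches every key, which is acceptable for a small local store but grows linearly with the data.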

How is it possible to build database index on top of key/value store?

I was reading about LevelDB and found out that:
Upcoming versions of the Chrome browser include an implementation of the IndexedDB HTML5 API that is built on top of LevelDB
IndexedDB is also a simple key/value store that has the ability to index data.
My question is: how is it possible to build an index on top of a key/value store? I know that an index is, at its lowest level, an n-ary tree, and I understand the way that data is indexed in a database. But how can a key/value store like LevelDB be used for creating a database index?
The vital feature is not that it supports custom comparators but that it supports ordered iteration through keys and thus searches on partial keys. You can emulate fields in keys just using conventions for separating string values. The many scripting layers that sit on top of leveldb use that approach.
The dictionary view of a Key-Value store is that you can only tell if a key is present or not by exact match. It is not really feasible to use just such a KV store as a basis for a database index.
As soon as you can iterate through keys starting from a partial match, you have enough to provide the searching and sorting operations for an index.
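A minimal sketch of this idea follows, using a tiny in-memory class as a stand-in for an ordered key-value store (it does not use the LevelDB API). The `OrderedKV` class and the `idx:<field>:<value>:<primary key>` key convention are assumptions for illustration.

```python
import bisect


class OrderedKV:
    """Tiny in-memory stand-in for a store that keeps keys in sorted order."""

    def __init__(self):
        self._keys = []    # always kept sorted
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            yield key, self._values[key]


def index_key(field: str, value: str, primary_key: int) -> bytes:
    # Hypothetical convention: fields inside the key are separated by ":".
    return f"idx:{field}:{value}:{primary_key}".encode()


def find_by(db: OrderedKV, field: str, value: str) -> list:
    """Return primary keys of records whose field equals value."""
    prefix = f"idx:{field}:{value}:".encode()
    return [key.rsplit(b":", 1)[1].decode() for key, _ in db.scan_prefix(prefix)]


db = OrderedKV()
for pk, (name, city) in {1: ("alice", "berlin"), 2: ("bob", "paris"), 3: ("carol", "berlin")}.items():
    db.put(f"user:{pk}".encode(), f"{name},{city}".encode())
    db.put(index_key("city", city, pk), b"")  # index entry carries no value
```

Here `find_by(db, "city", "berlin")` walks only the index entries that share the prefix and stops at the first key that does not, which is the "iterate from a partial match" behaviour described above.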
Just a couple of things: LevelDB supports sorting of data using a custom comparator. From the page you linked to:
According to the project site the key features are:
Keys and values are arbitrary byte arrays.
Data is stored sorted by key.
Callers can provide a custom comparison function to override the sort order.
....
So LevelDB can contain data that can be sorted/indexed based on one sort order.
If you needed several indexable fields, you could just add your own B-tree that works on top of LevelDB. I would imagine that this is the type of approach that the Chrome browser takes, but I'm just guessing.
You can always look through the Chrome source.

Store array of numbers in database field

Context: SQL Server 2008, C#
I have an array of integers (0-10 elements). Data doesn't change often, but is retrieved often.
I could create a separate table to store the numbers, but for some reason it feels like that wouldn't be optimal.
Question #1: Should I store my array in a separate table? Please give reasons for one way or the other.
Question #2: (regardless of what the answer to Q#1 is), what's the "best" way to store int[] in database field? XML? JSON? CSV?
EDIT:
Some background: numbers being stored are just some coefficients that don't participate in any relationship, and are always used as an array (i.e. never a value is being retrieved or used in isolation).
Separate table, normalized
Not as XML or JSON, but separate numbers in separate rows
No matter what you think, it's the best way. You can thank me later
The "best" way to store data in a database is the way that is most conducive to the operations that will be performed on it and the one which makes maintenance easiest. It is this latter requirement which should lead you to a normalized solution, which means storing the integers in a table with a relationship. Beyond being easier to update, it is easier for the next developer who comes after you to understand what information is stored and how.
Store it as a JSON array but know that all accesses will now be for the entire array - no individual read/writes to specific coefficients.
In our case, we're storing them as a JSON array. Like your case, there is no relationship between individual array numbers - the array only makes sense as a unit, and as a unit it DOES have a relationship with other columns in the table. By the way, everything else IS normalized. I liken it to this: if you were going to store a 10-byte chunk, you'd save it packed in a single column of VARBINARY(10). You wouldn't split it into 10 bytes, store each in a column of VARBINARY(1), and then stitch them together with a foreign key. I mean you could - but it wouldn't make any sense.
YOU as the developer will need to understand how 'monolithic' that array of ints really is.
A separate table would be the most "normalized" way to do this. And it is better in the long run, probably, since you won't have to parse the value of the column to extract each integer.
If you want you could use an XML column to store the data, too.
Sparse columns may be another option for you, too.
If you want to keep it really simple you could just delimit the values: 10;2;44;1
I think that, since you are talking about SQL Server, your app may be a data-driven application. If that is the case, I would definitely keep the array in the database as a separate table with a record for each value. It will be normalized and optimized for retrieval. Even if you only have a few values in the array, you may need to combine that data with other retrieved data that needs to be joined with your array values, which is what SQL is optimized for through indexes, foreign keys, etc. (normalized).
That being said, you can always hard code the 10 values in your code and save the round trip to the DB if you don't need to change the values. It depends on how your application works and what this array is going to be used for.
I agree with all the others that a separate normalized table is best. But if you insist on having it all in the same table, don't place the array in a single column. Instead, create 10 columns and store each array value in a different column. That will save you the parsing and update problems.
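To compare the two main options from the answers above (a normalized child table and a single JSON column), here is a minimal sketch. It uses Python with an in-memory SQLite database purely as a stand-in for SQL Server and C#, and all table and column names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Option A: the whole array lives in one text column as JSON.
conn.execute("CREATE TABLE item_json (id INTEGER PRIMARY KEY, coeffs TEXT NOT NULL)")

# Option B: normalized, one row per array element, ordered by position.
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE item_coeff ("
    " item_id INTEGER NOT NULL REFERENCES item(id),"
    " pos INTEGER NOT NULL,"
    " val INTEGER NOT NULL,"
    " PRIMARY KEY (item_id, pos))"
)

coeffs = [10, 2, 44, 1]

conn.execute("INSERT INTO item_json (id, coeffs) VALUES (?, ?)", (1, json.dumps(coeffs)))
conn.execute("INSERT INTO item (id) VALUES (1)")
conn.executemany(
    "INSERT INTO item_coeff (item_id, pos, val) VALUES (?, ?, ?)",
    [(1, pos, val) for pos, val in enumerate(coeffs)],
)


def load_json(item_id: int) -> list:
    """Read the array back with one row fetch and one JSON parse."""
    (text,) = conn.execute(
        "SELECT coeffs FROM item_json WHERE id = ?", (item_id,)
    ).fetchone()
    return json.loads(text)


def load_rows(item_id: int) -> list:
    """Read the array back as ordered rows from the child table."""
    rows = conn.execute(
        "SELECT val FROM item_coeff WHERE item_id = ? ORDER BY pos", (item_id,)
    )
    return [val for (val,) in rows]
```

Both functions return the same list. The difference shows up elsewhere: the child table lets SQL filter, join, and update individual values, while the JSON column always reads and writes the array as a unit.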

Array/list vs Dictionary (why we have them in the first place)

To me they are both the same, and that is why I am wondering why we have a dictionary data structure when we can do everything with arrays/lists. What is so fancy about dictionaries?
Arraylists just store a set of objects (that can be accessed randomly). Dictionaries store pairs of objects. This makes array/lists more suitable when you have a group of objects in a set (prime numbers, colors, students, etc.). Dictionaries are better suited for showing relationships between a pair of objects.
Why do we need dictionaries? Let's say you have some data you need to convert from one form to another, like Roman numeral characters to their values. Without dictionaries, you'd have to hack this association together with two arrays, where you first find the position of the key in the first list and then access that position in the second. This is terribly error-prone and inefficient, and dictionaries provide a more direct approach.
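The Roman numeral case can be sketched both ways. The names below are illustrative; the first lookup searches one list to find a position in another, while the second asks the dictionary directly.

```python
# Two parallel lists: the association exists only by matching positions.
SYMBOLS = ["I", "V", "X", "L", "C", "D", "M"]
VALUES = [1, 5, 10, 50, 100, 500, 1000]


def lookup_with_arrays(symbol: str) -> int:
    # Linear search in SYMBOLS, then index into VALUES at the same position.
    return VALUES[SYMBOLS.index(symbol)]


# One dictionary: each key is stored together with its value.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller symbol before a larger one is subtracted.
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total
```

With the parallel lists, inserting a symbol into one list but not the other silently breaks every lookup after it. The dictionary cannot get out of step in that way.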
Arrays provide random access of a sequential set of data. Dictionaries (or associative arrays) provide a map from a set of keys to a set of values.
I believe you are comparing apples and oranges - they serve two completely different purposes and are both useful data structures.
Most of the time a dictionary-like type is built as a hash table - this type is very useful as it provides very fast lookups on average (depending on the quality of the hashing algorithm).
The confusion lies in the different naming conventions in different languages. In my understanding, what is called a "Dictionary" in Python is the same as "Associative Array" in PHP.
To build on what Andrew said, in some languages such as PHP and JavaScript, the array can also function as a dictionary (known as an associative array). It also comes down to loose vs. strict typing in the language.
You could in theory do everything with dictionaries.
But do not forget that at some point the program runs on a real machine which has limitations due to the hardware: processor, memory, nature of the storage (disc/SSD) ...
Behind the scenes the dictionaries are often using a Hash table
In some languages you can choose between many different types of lists/arrays and hash tables, as there are many different implementations of those structures, each with advantages and disadvantages.
Use an array when you work with a sequence of elements or need to randomly access an element at a given index (0, 1, 2, ...)
Use a dictionary when you have key/value format and need fast retrieval via key
If you want to understand more about these I recommend you learn more about data structures as they are fundamental
NOTE: depending on the language the name of those structures may vary and is a source of confusion.

Resources