I watched a lecture explaining that when the data to be sorted doesn't fit in memory, the database engine uses a divide-and-conquer sorting algorithm (external merge sort): it splits the data set into separate runs and sorts each run on its own.
In this algorithm the pages are sorted and written to disk, then the runs are merged by comparing values, and each full output page is written back to disk; eventually this merges two big runs that together contain all the tuples. My question is: if the tuples are very large (more than the memory size), how is that last step handled when the two runs being merged are themselves larger than memory?
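For reference, here is a minimal Python sketch of the merge phase being described (assuming integer keys, line-based run files, and a made-up page size): only one page per run plus one output page needs to be in memory at any moment, so the runs being merged never have to fit in memory as a whole.

```python
import heapq

PAGE_SIZE = 4  # tuples per in-memory page; tiny value for illustration

def read_run(path):
    """Stream one sorted run from disk, one page at a time."""
    with open(path) as f:
        while True:
            page = [line for _, line in zip(range(PAGE_SIZE), f)]
            if not page:
                break
            for line in page:
                yield int(line)

def merge_runs(run_paths, out_path):
    """K-way merge: only one page per run (plus one output page) is in memory."""
    out_page = []
    with open(out_path, "w") as out:
        for value in heapq.merge(*(read_run(p) for p in run_paths)):
            out_page.append(value)
            if len(out_page) == PAGE_SIZE:
                # Output page is full: flush it to disk and start a new one.
                out.write("".join(f"{v}\n" for v in out_page))
                out_page.clear()
        # Flush the final, possibly partial, output page.
        out.write("".join(f"{v}\n" for v in out_page))
```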
I have a constant number of columns: they correspond to the real-time coordinates of a few, or maybe even a few hundred, points in space (a constant id plus the x, y coordinates of a pose detected in an OpenCV image; the frames are analyzed grid by grid, so a lot of data arrives at once).
I read that Redis runs in RAM and that you can set an expiration time after which data is deleted.
Cassandra stores data in columns next to each other, so it should be suitable for fixed coordinates.
It would be nice to be able to perform operations on the values, such as subtraction or multiplication.
I'm looking for a database that can write and read this data quickly and at the same time isn't performance-intensive.
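If Redis turns out to fit, a minimal sketch with the redis-py client might look like this; the key naming, the packed-float layout, and the 60-second expiry are assumptions for illustration, and any subtraction or multiplication on the coordinates would happen client-side after reading the values back.

```python
import struct
import redis  # redis-py client; assumes a Redis server on localhost

r = redis.Redis(host="localhost", port=6379)

def store_frame(frame_id, points, ttl_seconds=60):
    """Store one frame of (point_id, x, y) tuples, expiring after ttl_seconds."""
    # Pack the coordinates as raw floats; this layout is an assumption, not a standard.
    payload = b"".join(struct.pack("iff", pid, x, y) for pid, x, y in points)
    r.set(f"frame:{frame_id}", payload, ex=ttl_seconds)

def load_frame(frame_id):
    """Read a frame back and unpack it; returns [] if the key has already expired."""
    payload = r.get(f"frame:{frame_id}")
    if payload is None:
        return []
    size = struct.calcsize("iff")
    return [struct.unpack_from("iff", payload, off)
            for off in range(0, len(payload), size)]
```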
thanks
As far as I know, in a relational database records are stored with the same size. However, the array datatype in PostgreSQL is flexible in size. So how does PostgreSQL actually store the array datatype? Does it store a pointer to the array in the record and keep the value somewhere else?
In PostgreSQL (and other databases I am aware of) rows do not have a fixed size.
Arrays are stored like all other values of a type with variable size: if the row threatens to exceed 2000 bytes, the TOAST machinery will first compress such values, and if that is not enough, store them out of line in a TOAST table.
See the documentation for details.
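If you want to see this for yourself, a small experiment along these lines (psycopg2 and pg_column_size() are real, but the temp table and connection string are assumptions) shows the stored size of an array value, which can be smaller than the raw data once TOAST compresses it:

```python
import psycopg2  # assumes a reachable PostgreSQL instance and credentials

conn = psycopg2.connect("dbname=test")  # connection string is a placeholder
cur = conn.cursor()

cur.execute("CREATE TEMP TABLE t (id int, arr int[])")
# A small array stays inline in the row; a large one gets compressed and/or
# moved out of line by TOAST once the row would exceed roughly 2000 bytes.
cur.execute("INSERT INTO t VALUES (1, ARRAY(SELECT generate_series(1, 10)))")
cur.execute("INSERT INTO t VALUES (2, ARRAY(SELECT generate_series(1, 100000)))")

# pg_column_size() reports the stored (possibly compressed) size of the value.
cur.execute("SELECT id, pg_column_size(arr) FROM t ORDER BY id")
for row_id, stored_bytes in cur.fetchall():
    print(row_id, stored_bytes)
```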
Say I have a large database table whose entries are vectors. I wish to search and sort by distance to a given vector. The naive way is to compute, on every query, the distance between my vector and each vector in the database, then sort by that distance.
Is there any other known algorithm for doing this, perhaps involving some type of indexing in advance?
Alternatively, are there known implementations of such algorithms, for say SQL or Elasticsearch?
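For reference, the naive approach described above is only a few lines with NumPy once the vectors fit in memory (loading the whole table into one array is an assumption that only holds for moderately sized tables):

```python
import numpy as np

def nearest(query, vectors, k=10):
    """Brute-force k-nearest-neighbour search by Euclidean distance."""
    # vectors: shape (n, d); query: shape (d,)
    dists = np.linalg.norm(vectors - query, axis=1)  # distance to every row
    order = np.argsort(dists)[:k]                    # indices of the k closest
    return order, dists[order]

# Example with random data standing in for the database table.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100_000, 64))
query = rng.normal(size=64)
idx, d = nearest(query, vectors)
```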
I am looking to store 2D arrays of 900x100 elements in a database. Efficient recall and comparison of the arrays is important. I could use a table with a schema like [A, x, y, A(x,y)], such that a single array would comprise 90,000 records. This seems like an ok table design for storing the array: it would provide efficient recall of single elements, but inefficient recall of a whole array, and would make array comparisons very inefficient.
Should I leave the table design this way and build and compare my arrays in code? Or is there a better way to structure the table such that I can get efficient array comparisons using database only operations?
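For concreteness, here is a minimal sqlite3 sketch of the one-row-per-element schema described above (table and column names are placeholders): single-element recall is one indexed lookup, but recalling a whole array means pulling back all 90,000 rows and reassembling them in code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (array_id INTEGER, x INTEGER, y INTEGER,"
             " value REAL, PRIMARY KEY (array_id, x, y))")

# Insert one 900x100 array as 90,000 rows.
rows = [(1, x, y, float(x * 100 + y)) for x in range(900) for y in range(100)]
conn.executemany("INSERT INTO elements VALUES (?, ?, ?, ?)", rows)

# Single-element recall: fast, uses the primary-key index.
one = conn.execute("SELECT value FROM elements WHERE array_id=? AND x=? AND y=?",
                   (1, 450, 50)).fetchone()

# Whole-array recall: fetches all 90,000 rows.
whole = conn.execute("SELECT x, y, value FROM elements WHERE array_id=? ORDER BY x, y",
                     (1,)).fetchall()
```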
thanks
If the type of data allows, store it in a concatenated format and compare in memory after it has been de-concatenated. The database operation will be much faster and the in-memory operations will be faster than database retrievals as well.
Who knows, you may even be able to compare it without de-concatenating.
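As a rough sketch of that idea, assuming the elements are numeric and using NumPy plus sqlite3 (the table layout is made up): each array is stored as one concatenated blob and the comparison happens in memory after decoding.

```python
import sqlite3
import numpy as np

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrays (id INTEGER PRIMARY KEY, data BLOB)")

def save(array_id, arr):
    """Store the whole 900x100 array as a single concatenated blob."""
    conn.execute("INSERT INTO arrays VALUES (?, ?)",
                 (array_id, arr.astype(np.float64).tobytes()))

def load(array_id):
    """Read the blob back and 'de-concatenate' it into a 2D array."""
    (blob,) = conn.execute("SELECT data FROM arrays WHERE id=?", (array_id,)).fetchone()
    return np.frombuffer(blob, dtype=np.float64).reshape(900, 100)

save(1, np.random.rand(900, 100))
save(2, np.random.rand(900, 100))
# Comparison happens in memory, not in the database.
difference = np.abs(load(1) - load(2)).max()
```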
900 x 100 elements is actually very small (even if the elements are massive 1K things that'd only be 90 MB). Can't you just compare in memory when needed and store on disk in some serialized format?
It doesn't make sense to store 2D arrays in the database, especially if it is immutable data.
When I worked in the seismic industry we used to just dump our arrays (typically 1D, a few thousand elements) to a binary file. The database was only used for what was essentially metadata (location, indexing, etc.). This was considerably quicker, and it also allowed the data to be decoupled if necessary. In production this was usual: a few thousand elements doesn't sound like much, but a typical dataset could easily be hundreds of GB, and since this was the 1990s we had to decouple to tape.
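A minimal sketch of that pattern, using NumPy and sqlite3 as stand-ins (file naming and metadata columns are placeholders): the array is dumped to a binary file and the database only keeps the metadata needed to find it again.

```python
import sqlite3
import numpy as np

conn = sqlite3.connect("catalog.db")  # metadata catalog; file name is a placeholder
conn.execute("CREATE TABLE IF NOT EXISTS traces "
             "(id INTEGER PRIMARY KEY, location TEXT, path TEXT)")

def store(trace_id, location, arr):
    """Dump the array to a binary file; keep only metadata in the database."""
    path = f"trace_{trace_id}.npy"
    np.save(path, arr)  # raw array lives outside the database
    conn.execute("INSERT INTO traces VALUES (?, ?, ?)", (trace_id, location, path))
    conn.commit()

def fetch(trace_id):
    """Look up the metadata, then load the decoupled binary file."""
    (path,) = conn.execute("SELECT path FROM traces WHERE id=?", (trace_id,)).fetchone()
    return np.load(path)

store(1, "line 42, shot 17", np.random.rand(4096))
trace = fetch(1)
```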