I need to store many values in a database, and the quantity of values can vary. I am using PostgreSQL, which gives me the option of using arrays to store data. My question is:
What are the advantages and disadvantages of using arrays versus rows to store a dynamic quantity of data?
Interested in storing a sparse vector (technically the sparse array is a key value pair dictionary but let’s just say that the values are an array for simplicity) and retrieving it from Postgres efficiently. The table schema would be:
id
sparse_array
A few options being considered:
store the array using the ARRAY type in Postgres
store it as JSON
encode/decode the array using some sort of two-way scheme (along the lines of JWT encoding?)
convert it into a delimiter-separated (comma, for example) string and serialize/deserialize it in C
Are there industry practices/good ways of doing this? Big tech companies can deliver images super quickly to users; how is that made possible, and how does the storage/retrieval work?
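Of the options listed, JSON serialization is the easiest to sketch. A minimal Python example of the round trip (the data here is hypothetical; the serialized string would go into a Postgres json/jsonb or text column):

```python
import json

# A sparse vector represented as {index: value}; only non-zero entries are stored.
sparse = {3: 0.5, 17: -1.2, 240: 7.0}

# Serialize for storage. JSON object keys must be strings,
# so the integer indices are converted on the way in.
payload = json.dumps({str(k): v for k, v in sparse.items()})

# Deserialize after retrieval, restoring the integer keys.
restored = {int(k): v for k, v in json.loads(payload).items()}

assert restored == sparse
```

The key-conversion step is the only non-obvious part: forgetting it silently turns integer indices into strings after a round trip.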
I am looking for a good way to represent a 2-D array in a database, with a few catches. For instance, the array can grow in both columns and rows (which could be inserted at any point). It is also important that users can manipulate specific cells in the array (ideally without having to update the entire array). There may also be multiple arrays to store.
I have thought about using JSON, but that would require writing the entire JSON document back to the database whenever a specific cell is updated, which would not be ideal, especially when multiple people could manipulate the array at once.
The typical data structure would be three columns:
row
column
value
Of course, you might have other columns if you have a separate array for each user (say a userid or arrayid column). Or, you might have multiple values for each cell.
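A minimal sketch of that schema, using SQLite standing in for a production database (table and column names are illustrative; `row_idx`/`col_idx` avoid any keyword clashes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cells (
        array_id INTEGER NOT NULL,   -- which array this cell belongs to
        row_idx  INTEGER NOT NULL,
        col_idx  INTEGER NOT NULL,
        value    REAL,
        PRIMARY KEY (array_id, row_idx, col_idx)
    )
""")

# Populate a few cells of array 1; rows and columns can be sparse.
conn.executemany(
    "INSERT INTO cells (array_id, row_idx, col_idx, value) VALUES (?, ?, ?, ?)",
    [(1, 0, 0, 1.5), (1, 0, 1, 2.5), (1, 4, 7, 9.0)],
)

# Updating one cell touches a single database row -- no need to
# rewrite the whole array, and concurrent users edit different rows.
conn.execute(
    "UPDATE cells SET value = ? WHERE array_id = ? AND row_idx = ? AND col_idx = ?",
    (3.0, 1, 0, 1),
)

print(conn.execute(
    "SELECT value FROM cells WHERE array_id = 1 AND row_idx = 0 AND col_idx = 1"
).fetchone()[0])  # 3.0
```

The composite primary key doubles as the lookup index for single-cell reads and writes.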
For data assurance, my task is to compare two datasets from different databases. Currently I am performing a cell-by-cell value comparison, which is a brute-force method and consumes a lot of time.
I would like to know if there are methods that would save time and memory and simply report "Tables are identical" or "Tables are not identical".
Thank you for your assistance.
How about creating a checksum for each table and comparing them?
Something like:
SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM TableX
This might need an ORDER BY to be more precise.
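The same idea can be applied outside the database. A sketch in Python, assuming both tables fit in memory as lists of row tuples: hash each table's rows in a deterministic order and compare the digests.

```python
import hashlib

def table_checksum(rows):
    """Hash all rows after sorting their string forms,
    so row order doesn't affect the result."""
    h = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        h.update(row.encode("utf-8"))
    return h.hexdigest()

table_a = [(1, "x"), (2, "y")]
table_b = [(2, "y"), (1, "x")]  # same rows, different order

print(table_checksum(table_a) == table_checksum(table_b))  # True
```

Matching checksums strongly suggest identical tables (up to hash collision); differing checksums prove the tables differ without a cell-by-cell pass.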
If they come from different sources, there is no way around comparing them cell by cell, as far as I know. However, I can suggest something that will probably speed up the comparison many-fold. If your DataTables have identical structures (which they presumably do, since you are already comparing them cell by cell), compare the ItemArray of each pair of rows instead of accessing cells by column index or column name (or by row properties, if you are using strongly typed DataSets). This should give you much better results.
If you're using .NET 3.5 or above, this line should do it:
Enumerable.SequenceEqual(row1.ItemArray, row2.ItemArray);
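For readers outside .NET, the same whole-row idea sketched in Python, assuming each table comes back as a list of row tuples:

```python
def tables_identical(rows1, rows2):
    # Compare whole rows at once; tuple equality checks every
    # element in a single call, like SequenceEqual over ItemArray.
    return len(rows1) == len(rows2) and all(
        r1 == r2 for r1, r2 in zip(rows1, rows2)
    )

print(tables_identical([(1, "a"), (2, "b")], [(1, "a"), (2, "b")]))  # True
print(tables_identical([(1, "a")], [(1, "z")]))                      # False
```

`all(...)` short-circuits on the first mismatched row, so non-identical tables are usually rejected early.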
I have two databases in Excel, extracted outside of MATLAB. How can I compare them within MATLAB? Is storing them in a structure the right way? How can I find the similarities and differences?
Each database has 4 columns and around 400 rows of data.
If you are interested in the structure, you could create a struct for each database where each field name equals a column name.
In that case you can use visdiff.
However, if you are going to compare a lot of numbers, this is not practical.
To confirm that they are equal, use something like isequal.
To see how they differ, plot them, or plot their difference.
To see whether the data behaves differently, calculate some basic statistics such as max, min, mean, and std; you may also be interested in the correlation between columns of the two datasets.
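A sketch of that statistics-based comparison in Python (the data is illustrative; the MATLAB equivalents are max, min, mean, and std applied per column):

```python
import statistics

def column_stats(column):
    """Summarize one column with the same basic statistics."""
    return {
        "min": min(column),
        "max": max(column),
        "mean": statistics.mean(column),
        "std": statistics.stdev(column),
    }

col_a = [1.0, 2.0, 3.0, 4.0]
col_b = [1.0, 2.0, 3.0, 5.0]

stats_a = column_stats(col_a)
stats_b = column_stats(col_b)

# Exact-equality check first (like isequal), then compare summaries
# to see where the two columns diverge.
print(col_a == col_b)                    # False
print(stats_a["mean"], stats_b["mean"])  # 2.5 2.75
```

Matching summary statistics don't prove the columns are identical, but differing ones localize a discrepancy quickly.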
Context: SQL Server 2008, C#
I have an array of integers (0-10 elements). Data doesn't change often, but is retrieved often.
I could create a separate table to store the numbers, but for some reason it feels like that wouldn't be optimal.
Question #1: Should I store my array in a separate table? Please give reasons for one way or the other.
Question #2: (regardless of what the answer to Q#1 is), what's the "best" way to store int[] in database field? XML? JSON? CSV?
EDIT:
Some background: the numbers being stored are just coefficients that don't participate in any relationship and are always used as an array (i.e. a value is never retrieved or used in isolation).
Separate table, normalized.
Not as XML or JSON, but as separate numbers in separate rows.
No matter what you think, it's the best way. You can thank me later.
The "best" way to store data in a database is the way that is most conducive to the operations that will be performed on it and the one that makes maintenance easiest. It is this latter requirement that should lead you to a normalized solution, which means storing the integers in a table with a relationship. Beyond being easier to update, it is easier for the next developer who comes after you to understand what information is stored and how.
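A minimal sketch of that normalized layout, with SQLite standing in for SQL Server and illustrative names: one row per coefficient, keyed to the owning record, with an ordinal column preserving array order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_coefficient (
        item_id INTEGER NOT NULL REFERENCES item(id),
        ordinal INTEGER NOT NULL,   -- position in the original array
        value   INTEGER NOT NULL,
        PRIMARY KEY (item_id, ordinal)
    );
""")
conn.execute("INSERT INTO item VALUES (1, 'widget')")
conn.executemany(
    "INSERT INTO item_coefficient VALUES (1, ?, ?)",
    list(enumerate([10, 2, 44, 1])),
)

# Reassemble the array, in order, with a simple query.
coeffs = [v for (v,) in conn.execute(
    "SELECT value FROM item_coefficient WHERE item_id = 1 ORDER BY ordinal"
)]
print(coeffs)  # [10, 2, 44, 1]
```

The ordinal column is what makes this safe for ordered arrays: without it, the database gives no guarantee about the order rows come back in.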
Store it as a JSON array, but know that all accesses will then be for the entire array: no individual reads or writes to specific coefficients.
In our case, we store them as a JSON array. Like your case, there is no relationship between the individual array numbers; the array only makes sense as a unit, and as a unit it DOES have a relationship with other columns in the table. By the way, everything else IS normalized. I liken it to this: if you were going to store a 10-byte chunk, you'd save it packed in a single VARBINARY(10) column. You wouldn't shard it into 10 bytes, store each in a VARBINARY(1) column, and then stitch them together with a foreign key. I mean, you could, but it wouldn't make any sense.
YOU as the developer will need to understand how 'monolithic' that array of ints really is.
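A sketch of that JSON-column approach in Python (the stored string would live in a single text or JSON column): the array goes in and comes out as one unit.

```python
import json

coefficients = [10, 2, 44, 1]

# Write path: serialize once, store the string in a single column.
stored = json.dumps(coefficients)
print(stored)  # [10, 2, 44, 1]

# Read path: the array only makes sense as a unit, so it is always
# deserialized whole -- there is no per-element read or write.
loaded = json.loads(stored)

assert loaded == coefficients
```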
A separate table would be the most "normalized" way to do this. And it is better in the long run, probably, since you won't have to parse the value of the column to extract each integer.
If you want you could use an XML column to store the data, too.
Sparse columns may be another option for you, too.
If you want to keep it really simple, you could just delimit the values: 10;2;44;1
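That delimited format round-trips with a one-line split/join; a Python sketch:

```python
values = [10, 2, 44, 1]

# Serialize to the delimited form shown above.
encoded = ";".join(str(v) for v in values)
print(encoded)  # 10;2;44;1

# Parse it back into integers on the way out.
decoded = [int(s) for s in encoded.split(";")]
print(decoded)  # [10, 2, 44, 1]
```

The usual caveat applies: the delimiter must never appear inside a value, which is safe here since the values are plain integers.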
Since you are talking about SQL Server, your app is probably data-driven. If that is the case, I would definitely keep the array in the database as a separate table, with a record for each value. It will be normalized and optimized for retrieval. Even if you only have a few values in the array, you may need to combine that data with other retrieved data that needs to be joined with your array values, which is exactly what SQL is optimized for via indexes, foreign keys, etc. (normalization).
That being said, you can always hard code the 10 values in your code and save the round trip to the DB if you don't need to change the values. It depends on how your application works and what this array is going to be used for.
I agree with the others that a separate normalized table is best. But if you insist on keeping it all in the same table, don't put the whole array in a single column. Instead, create 10 columns and store each array value in a different column. That will spare you the parsing and update problems.