I am looking for a good solution to represent a 2-D array in a database, with a few catches of course. For instance, the array could grow in terms of columns and rows (which could be inserted at any point). It is also important that users can manipulate specific cells in this array (hopefully without having to update the entire array). Finally, there may be multiple arrays to store.
I have thought about using JSON, but that would require writing the entire JSON document back to the database whenever a specific cell is updated, which would not be ideal, especially when multiple people could be manipulating the array at once.
The typical data structure would be three columns:
row
column
value
Of course, you might have other columns if you have a separate array for each user (say a userid or arrayid column). Or, you might have multiple values for each cell.
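A minimal sketch of that layout in T-SQL (table and column names are placeholders; pick a value type that matches what the cells hold):

CREATE TABLE CellValues (
    ArrayId   INT NOT NULL,        -- which array the cell belongs to
    RowNum    INT NOT NULL,
    ColNum    INT NOT NULL,
    CellValue NVARCHAR(255) NULL,
    PRIMARY KEY (ArrayId, RowNum, ColNum)
);

-- Updating one cell touches exactly one row, so users editing
-- different cells never rewrite the whole array:
UPDATE CellValues
SET CellValue = N'42'
WHERE ArrayId = 1 AND RowNum = 3 AND ColNum = 7;

Since only populated cells get a row, adding a new row or column to the array is just a matter of inserting new cell records.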
Related
I'm trying to make my spreadsheet as 'dynamic' as possible, since I need to create a number of tables with different variables that influence the data in them... the variable columns will be the same for every table created, but the data within the columns may vary.
I noticed that wildcards in the cells themselves are helpful to a point. Where there's a SUMIF or COUNTIF, these are awesome for solving my problem... but if it's an embedded IF statement, the logical test can't equal the cell with the wildcard in it without causing errors.
Ultimately I will have four data tabs and an abundance of different tables based on the variables. I would love to do this in pivot tables, which would absolve me of this issue; however, I can't figure out how to do percentiles in the pivots :)
For the Level column to calculate properly, I need to change the <> to = manually... which I am hoping to avoid if possible, given the number of these I need to create.
I have created a spreadsheet userform that allows users to input information, do surface-level data verification, and other things. The problem I run into is that part of it involves adding all items involved in a submission (the input this is used for could relate to one or more items, sometimes into the high 30s), so I had to create an array and have the array written to the new sheet. This is terribly slow.
I've been learning about classes and thinking that they may provide a sufficient alternative to an array, since each item has exactly the same information needed (QTY, UOM, Type, etc.). I was thinking of doing two classes: a member class for the item, and a collection class that composes the member objects.
My question is whether the performance of doing this would be better than using an array. Are arrays generally the better way of handling collections of data that need to be input and output to a sheet?
For data assurance, my task is to compare two datasets from different databases. Currently I am performing a cell-by-cell value comparison, which is a brute-force method and consumes a lot of time.
I would like to know if there are any methods that would save time and memory and that can provide a simple result: "Tables are identical" or "Tables are not identical".
Thank you for your assistance.
How about creating a checksum for each table and comparing them?
Something like:
SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM TableX
This might need an ORDER BY to be more precise.
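For example, to get a single identical/not-identical answer (TableA and TableB are placeholder names):

SELECT CASE
    WHEN (SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM TableA)
       = (SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM TableB)
    THEN 'Tables are identical'
    ELSE 'Tables are not identical'
END AS Result

Keep in mind that CHECKSUM can collide, so matching checksums strongly suggest, but do not prove, that the tables are identical; a mismatch, on the other hand, is definitive.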
If they are from different sources, there is no other way than comparing them cell by cell, AFAIK. However, I can suggest something that will probably increase comparison speed many-fold. If your DataTables have identical structures, which they hopefully do since you're already comparing them cell by cell, try comparing the ItemArray of each pair of rows instead of accessing the values by column index or column name (or by row properties if you're using strongly-typed DataSets). This will hopefully give you much better results.
If you're using .NET 3.5 or above, this line should do it:
Enumerable.SequenceEqual(row1.ItemArray, row2.ItemArray);
I have two databases in Excel, extracted outside of MATLAB. How can I compare them within MATLAB? Is storing them in a structure the right way? How can I find the similarities and differences?
Each database has 4 columns and around 400 rows of data.
If you are interested in the structure, you could create a struct for each database where each field name is equal to a column name.
In this case one can use visdiff.
However, if you are going to compare a lot of numbers, this is not practical.
To confirm that they are equal, one can use something like isequal.
To see how they are different, plot them or plot the difference of them.
To see whether the data behaves differently, calculate some basic statistics like max, min, mean, and std; you may also be interested in the correlation between columns of the two datasets.
Context: SQL Server 2008, C#
I have an array of integers (0-10 elements). Data doesn't change often, but is retrieved often.
I could create a separate table to store the numbers, but for some reason it feels like that wouldn't be optimal.
Question #1: Should I store my array in a separate table? Please give reasons for one way or the other.
Question #2: (regardless of what the answer to Q#1 is), what's the "best" way to store int[] in database field? XML? JSON? CSV?
EDIT:
Some background: numbers being stored are just some coefficients that don't participate in any relationship, and are always used as an array (i.e. never a value is being retrieved or used in isolation).
Separate table, normalized
Not as XML or JSON, but as separate numbers in separate rows.
No matter what you think, it's the best way. You can thank me later.
The "best" way to store data in a database is the way that is most conducive to the operations that will be performed on it and the one which makes maintenance easiest. It is this later requirement which should lead you to a normalized solution which means storing the integers in a table with a relationship. Beyond being easier to update, it is easier for the next developer that comes after you to understand what and how the information is stored.
Store it as a JSON array, but know that all accesses will now be for the entire array - no individual reads or writes to specific coefficients.
In our case, we're storing them as a JSON array. Like your case, there is no relationship between the individual array numbers - the array only makes sense as a unit, and as a unit it DOES have a relationship with other columns in the table. By the way, everything else IS normalized. I liken it to this: if you were going to store a 10-byte chunk, you'd save it packed in a single column of VARBINARY(10). You wouldn't shard it into 10 bytes, store each in a column of VARBINARY(1), and then stitch them together with a foreign key. I mean you could - but it wouldn't make any sense.
YOU as the developer will need to understand how 'monolithic' that array of ints really is.
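As an illustrative sketch (SQL Server 2008 has no native JSON support, so the array is just text in a plain column; the table and column names are invented):

CREATE TABLE CoefficientSets (
    SetId        INT IDENTITY PRIMARY KEY,
    Coefficients NVARCHAR(200) NOT NULL   -- e.g. N'[10, 2, 44, 1]'
);

-- The array is always read and written as a single unit:
UPDATE CoefficientSets
SET Coefficients = N'[10, 2, 44, 1]'
WHERE SetId = 1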
A separate table would be the most "normalized" way to do this. And it is better in the long run, probably, since you won't have to parse the value of the column to extract each integer.
If you want you could use an XML column to store the data, too.
Sparse columns may be another option for you, too.
If you want to keep it really simple you could just delimit the values: 10;2;44;1
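If you went with the XML column idea above, a minimal sketch might look like this (names invented):

CREATE TABLE CoefficientXml (
    SetId   INT IDENTITY PRIMARY KEY,
    Numbers XML NULL
);

INSERT INTO CoefficientXml (Numbers)
VALUES (N'<ints><i>10</i><i>2</i><i>44</i><i>1</i></ints>');

-- Shred the XML back into rows when needed:
SELECT n.value('.', 'int') AS Value
FROM CoefficientXml
CROSS APPLY Numbers.nodes('/ints/i') AS t(n)
WHERE SetId = 1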
Since you are talking about SQL Server, your app may well be a data-driven application. If that is the case, I would definitely keep the array in the database as a separate table with a record for each value. It will be normalized and optimized for retrieval. Even if you only have a few values in the array, you may need to combine that data with other retrieved data that has to be joined with your array values - exactly the case SQL is optimized for through indexes, foreign keys, etc. (normalization).
That being said, you can always hard code the 10 values in your code and save the round trip to the DB if you don't need to change the values. It depends on how your application works and what this array is going to be used for.
I agree with all the others that a separate normalized table is best. But if you insist on having it all in the same table, don't place the array in a single column. Instead, create 10 columns and store each array value in a different column. It will save you the parsing and update problems.
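For illustration, that could look like this (column names invented):

CREATE TABLE CoefficientColumns (
    Id     INT IDENTITY PRIMARY KEY,
    Coef1  INT NULL,
    Coef2  INT NULL,
    Coef3  INT NULL,
    -- ...and so on through...
    Coef10 INT NULL
);

-- Updating one value touches a single column, with no parsing:
UPDATE CoefficientColumns SET Coef3 = 44 WHERE Id = 1

Since the question says the array has 0-10 elements, NULLable columns cover the shorter arrays; the obvious downside is that the schema is fixed at a maximum of 10.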