Storing 10,000 lines of data in arrays in VB.NET

I am a complete noob to VB.Net, and have had my share of growing pains. I'm starting to get a handle on what I need to do, though.
The program I am writing needs to take about 500 .csv files, pull the info out of them line by line, store the data in about four different arrays, then export the data into one long index.
Each line in the files starts with a code word and contains between 5 and 20 fields of data. The code word determines how many fields there are and how the data needs to be stored. If it's Code A, it needs to go into Array A. If it's Code B, it needs to go into Array B and set some variables for Arrays A, B, C, and D. Code C means it goes into Array C. And so on.
My problem is that I will not know how many lines of data there will be, so I can't just use a number of standard fixed-size arrays. I've got the code figured out so that each line of data is channeled into the correct sub. But I am unsure how to STORE the data.
I will need to manipulate/sort the data in Array C, but will be able to just dump data into and pull it out of Arrays A, B, and D.
Should I use 2D arrays for all the indexes? Would collections work better? If so, which kind of collection would work better?
//Array A= 4 columns per row, unknown number (500) of rows
//Array B= 18 columns per row, unknown number (10,000+) of rows
//Array C= 3 columns per row, unknown number (2000) of rows, must be able to sort and alter
//Array D= 3 columns per row, unknown number (1000) of rows
Thank you

In a nutshell:
Should I use 2D arrays for all the indexes?
No.
Would collections work better?
Yes, much better.
If so, which kind of collection would work better?
Generic lists (List(Of T)), where you define objects (classes) with fields that match the columns for each type of record in your csv data, and use those classes as the types for your lists.
For ArrayB, beware of the Large Object Heap: very large allocations can end up there and cause problems with OutOfMemoryExceptions. You may need to keep that data mostly on disk.
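The idea of one typed list per record kind can be sketched roughly as follows. This is a Python analogue rather than VB.NET (a list of dataclass instances standing in for List(Of T)), and the record classes, field names, and sample data are all hypothetical, since the question does not give the real column layout.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RecordA:
    # Hypothetical layout: four text fields per "A" line.
    key: str
    f1: str
    f2: str
    f3: str


@dataclass
class RecordC:
    # Hypothetical layout: three fields, one of them numeric for sorting.
    name: str
    rank: int
    note: str


def load(stream, list_a, list_c):
    """Read CSV lines and append each one to the list chosen by its code word."""
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        code, fields = row[0], row[1:]
        if code == "A":
            list_a.append(RecordA(*fields[:4]))
        elif code == "C":
            list_c.append(RecordC(fields[0], int(fields[1]), fields[2]))
        # Other code words would be dispatched the same way.


# Made-up sample input standing in for one .csv file.
sample = io.StringIO("A,k1,x,y,z\nC,beta,2,n1\nC,alpha,1,n2\n")
list_a, list_c = [], []
load(sample, list_a, list_c)

# The lists grow as needed, and the "C" records can be sorted in place.
list_c.sort(key=lambda rec: rec.rank)
```

The lists never need a size up front, which is the point of preferring a collection over a pre-dimensioned 2D array here.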

Related

How to create a table in C that consists of arrays of fixed length

I am trying to implement a search table that basically consists of arrays of a fixed size. In my case each array would consist of 4 elements (say, for example, the letters W, X, Y and Z). I need each element in the array to have a fixed index in the table by which it can be found and accessed by the user. The table would look something like this (the symbol | is used below to show the end of each array):
WXYZ|XYWZ|WXZY|....|.... and so on
Could somebody tell me the best way to implement this? I have heard of linked lists and hash tables, but I am not sure whether either is the best method for this.
I don't quite follow: why not just use an ordinary two-dimensional array like array[100][4]? The user can then access any element through an ordered pair of two indexes.
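The two-index access suggested above can be illustrated with a small sketch. It is written in Python rather than C, and the names ROW_LEN, get, and get_flat are made up for illustration; the flat variant shows the row * 4 + col arithmetic that a C array[100][4] performs implicitly.

```python
ROW_LEN = 4  # every entry in the table holds exactly four elements

# Table of fixed-length rows, addressed as table[row][col].
table = [
    ("W", "X", "Y", "Z"),
    ("X", "Y", "W", "Z"),
    ("W", "X", "Z", "Y"),
]


def get(table, row, col):
    """Look up one element by its (row, col) pair."""
    return table[row][col]


# Equivalent flat layout: one long sequence, indexed by row * ROW_LEN + col.
flat = [ch for row in table for ch in row]


def get_flat(flat, row, col):
    """Same lookup against the flat layout."""
    return flat[row * ROW_LEN + col]
```

Either way each element keeps a fixed, computable position, so no linked list or hash table is needed for plain indexed access.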

Saving memory, huge array alternative c programming

I'm using two arrays (unsigned int), each with dimensions 20000x20000.
There is a lot of empty space inside the arrays: many zeros or nulls.
Is there something I can do to save memory? I'm running out of it.
I tried reading from a list in a file, but it's extremely slow.
I have heard that in other languages they have vectors.
You are looking for a sparse matrix, which basically works by storing entries as a list of (index1, index2, value), and only has entries for nonzero elements.
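A minimal sketch of that idea, in Python rather than C and using a dictionary keyed by (row, col) instead of a literal list of triples. The class and method names are hypothetical; a production C program would more likely use an existing sparse-matrix library or a compressed row format.

```python
class SparseMatrix:
    """Stores only nonzero cells; every other cell reads as 0."""

    def __init__(self, n_rows, n_cols):
        self.shape = (n_rows, n_cols)
        self._cells = {}  # maps (row, col) -> nonzero value

    def _check(self, row, col):
        n_rows, n_cols = self.shape
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise IndexError("cell index out of range")

    def get(self, row, col):
        self._check(row, col)
        return self._cells.get((row, col), 0)

    def set(self, row, col, value):
        self._check(row, col)
        if value == 0:
            # Writing a zero removes the entry so storage stays minimal.
            self._cells.pop((row, col), None)
        else:
            self._cells[(row, col)] = value

    def stored(self):
        """Number of cells that actually occupy memory."""
        return len(self._cells)


m = SparseMatrix(20000, 20000)
m.set(5, 7, 42)
m.set(19999, 0, 3)
m.set(5, 7, 0)  # clearing a cell frees its entry
```

Memory use then scales with the number of nonzero entries rather than with the full 20000x20000 grid.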

Array multiplication in Excel

In my Excel document I have two sheets. The first is a data set and the second is a matrix of the relationship between two of the variables in my data set. Each possible value of the variable is a column in my matrix. I'm trying to get the sum of the products of the elements in two different arrays. Right now I'm using the formula {=SUM(N3:N20 * F3:F20)} and manually changing the columns each time. But my data set is over 800 items...
Ideally I'd like to know how to write a program that reads the value of the variable in my data set, looks up the correct columns in the matrix, multiplies them together, sums the products, and puts the result in the correct place in my data set. However, just knowing the result for all the possible combinations of columns would also save me a lot of time. It's an 18x18 matrix. Thanks for any feedback!
Your question is a little ambiguous, but as far as I understand it, you want to multiply different sets of two columns in the same sheet and put the results into the next sheet. Is that right? If so, please post images of your work (all sheets). This is possible in Excel alone, without any VBA code.
You can also use =SUMPRODUCT(N3:N20,F3:F20) for your formula instead of {=SUM(N3:N20 * F3:F20)}.
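For the "all possible combinations of columns" part, the calculation can be sketched outside Excel. The Python below is only an illustration: the column letters and values are made up, and sumproduct is a hand-written helper that mirrors what Excel's SUMPRODUCT does for two equal-length ranges.

```python
def sumproduct(xs, ys):
    """Sum of element-wise products of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    return sum(x * y for x, y in zip(xs, ys))


def all_pairs(columns):
    """Compute the sum of products for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): sumproduct(columns[a], columns[b])
            for a in names for b in names}


# Made-up stand-in for a few columns of the 18x18 matrix.
columns = {
    "F": [1, 2, 3],
    "N": [4, 5, 6],
    "Q": [0, 1, 0],
}
results = all_pairs(columns)
```

With the results keyed by column pair, each data-set row only needs a lookup such as results[("N", "F")] instead of a manually edited formula.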

Multi dimensional array with varying size

I want to make a 2D array "data" with the following dimensions: data(T,N)
T is a constant, and I don't know anything about N to begin with. Is it possible to do something like this in Fortran?
do i = 1, T
check a few flags
if (all flags ok)
c = c+ 1
data(i,c) = some value
end if
end do
Basically I have no idea about the second dimension. Depending on some flags, if those flags are fine, I want to keep adding more elements to the array.
How can I do this?
There are several possible solutions. You could make data an allocatable array and guess the maximum value for N. As long as you don't exceed N, you keep adding data items. If a new item would exceed the array size, you create a temporary array, copy data to the temporary array, deallocate data, and reallocate it with a larger dimension.
Another design choice would be to use a linked list. This is more flexible in that the length is indefinite. You lose "random access" because the list is chained rather than indexed. You create a user-defined type that contains various data (e.g., scalars, arrays, whatever) and also a pointer. When you add a list item, the pointer points to that next item. This is possible in Fortran 90 and later, since pointers are supported.
I suggest searching the web or reading a book about these data structures.
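The grow-and-copy strategy from the first suggestion can be sketched as follows. This is Python rather than Fortran, the class name and doubling factor are assumptions made for illustration, and the even-index test merely stands in for the unspecified flag checks.

```python
class GrowableTable:
    """Fixed number of rows, with a column count that grows on demand."""

    def __init__(self, n_rows, initial_cols=4):
        # initial_cols is the guessed capacity and must be at least 1.
        self.n_rows = n_rows
        self.capacity = initial_cols  # allocated columns
        self.n_cols = 0               # columns actually in use
        self._data = [[0.0] * initial_cols for _ in range(n_rows)]

    def _grow(self):
        # Allocate a larger block, copy the used part, drop the old block.
        new_capacity = self.capacity * 2
        new_data = [[0.0] * new_capacity for _ in range(self.n_rows)]
        for r in range(self.n_rows):
            new_data[r][:self.n_cols] = self._data[r][:self.n_cols]
        self._data = new_data
        self.capacity = new_capacity

    def append_column(self, row, value):
        """Open a new column, store value at (row, new column), return its index."""
        if self.n_cols == self.capacity:
            self._grow()
        self._data[row][self.n_cols] = value
        self.n_cols += 1
        return self.n_cols - 1

    def get(self, row, col):
        if not (0 <= col < self.n_cols):
            raise IndexError("column not in use")
        return self._data[row][col]


T = 10
table = GrowableTable(n_rows=T, initial_cols=2)
for i in range(T):
    if i % 2 == 0:  # placeholder for "all flags ok"
        table.append_column(i, float(i))
```

Doubling the capacity keeps the number of reallocations small compared with growing by one column at a time.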
Assuming what you wrote is more-or-less how your code really goes, then you assuredly do know one thing: N cannot be greater than T. You would not have to change your do-loop, but you will definitely need to initialize data before the loop.

Matrices and databases

I went through the topic and found this link quite useful and simple at the same time.
Storing matrices in a relational database
But can you please let me know if the way mentioned as
A B C D
E F G H
I J K L
[A B C D E F G H I J K L]
is the best, simplest, or even a reliable way of storing the matrix elements in the database. Moreover, I need to multiply two matrices and make the operation dynamic. Will storing the data this way create any problems for that task?
In PostgreSQL you can actually have multidimensional arrays, define your own types, and define your own functions on those types. For instance, one could simply do:
CREATE TABLE tictactoe (
squares integer[3][3]
);
See the PostgreSQL manual for info on how to create your own types.
I think it pretty much depends on how you want to use the matrices in your application.
Is the DB only for persistence within a single application, is speed important, and can sizes not be known in advance? Make your own serialization scheme and save the binary blob.
Is the DB for sharing between applications, with the size not known in advance? Use the comma-delimited list.
Are you concerned with data integrity, type safety, and would like to query individual cells? Then use the (row, col, cell value) schema.
Do you know that your matrices are of fixed size and relatively small, for example 4x4 transformation matrices, and will have a 1-to-1 relationship to whatever element you have in the DB? Then you could actually have 16 columns in your table, laid out in line.
Think about your use cases, and experiment!
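As one way to experiment with the (row, col, cell value) option, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table name, column names, and matrix ids are hypothetical, and the join-based multiplication is just one possible query, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE matrix_cell (
        matrix_id TEXT    NOT NULL,
        row_idx   INTEGER NOT NULL,
        col_idx   INTEGER NOT NULL,
        val       REAL    NOT NULL,
        PRIMARY KEY (matrix_id, row_idx, col_idx)
    )
""")


def store(conn, matrix_id, rows):
    """Insert a nested-list matrix as one table row per cell."""
    conn.executemany(
        "INSERT INTO matrix_cell VALUES (?, ?, ?, ?)",
        [(matrix_id, r, c, v)
         for r, row in enumerate(rows)
         for c, v in enumerate(row)],
    )


store(conn, "A", [[1, 2], [3, 4]])
store(conn, "B", [[5, 6], [7, 8]])

# Matrix product A x B: pair each A cell with the B cells whose row index
# equals its column index, then sum the products per output cell.
product = conn.execute("""
    SELECT a.row_idx, b.col_idx, SUM(a.val * b.val)
    FROM matrix_cell AS a
    JOIN matrix_cell AS b ON a.col_idx = b.row_idx
    WHERE a.matrix_id = ? AND b.matrix_id = ?
    GROUP BY a.row_idx, b.col_idx
    ORDER BY a.row_idx, b.col_idx
""", ("A", "B")).fetchall()
```

Because every cell carries its own indexes, the multiplication needs no knowledge of the matrix dimensions, at the cost of one table row per cell.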
I'll start by saying both approaches are valid, but the second one is not sufficient as you have written it. To store a matrix as a 1D array you need some other information, like the length of the rows or the (row, col) indexes of each element. This is commonly done for sparse matrices, where there are lots of zeros surrounding values clustered on either side of the diagonal.
Persisting the matrix in a database and operating on it in memory are two separate things.
Tasks like multiplying require (row, col) indexes. Storing the matrix as a 2D array means that you'll have them, so no other info is needed. The 1D array needs this info too, so you'll have to supply it.
The advantage swings to the 1D array for sparse matrices. You don't have to store zero values outside the bandwidth in that case, but operations like addition and multiplication become more complex to code.
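To make the point about the missing information concrete, here is a minimal sketch of multiplying two dense matrices stored as flat 1D lists in row-major order. It is plain Python with a hypothetical function name, and it only works because the row and column counts are passed alongside the flat data.

```python
def multiply_flat(a, a_rows, a_cols, b, b_cols):
    """Multiply row-major flat matrices: a is a_rows x a_cols, b is a_cols x b_cols."""
    if len(a) != a_rows * a_cols:
        raise ValueError("a does not match its stated dimensions")
    if len(b) != a_cols * b_cols:
        raise ValueError("b does not match the inner dimension")
    out = [0] * (a_rows * b_cols)
    for i in range(a_rows):
        for k in range(a_cols):
            a_ik = a[i * a_cols + k]  # element (i, k) of a
            for j in range(b_cols):
                # element (k, j) of b lives at index k * b_cols + j
                out[i * b_cols + j] += a_ik * b[k * b_cols + j]
    return out


# [A B C D E F G H I J K L] from the question is a 3x4 matrix once the
# column count 4 is known; small numeric stand-ins are used here.
square = multiply_flat([1, 2, 3, 4], 2, 2, [5, 6, 7, 8], 2)
dot = multiply_flat([1, 2, 3], 1, 3, [1, 2, 3], 1)
```

Without a_cols and b_cols the flat lists are ambiguous: twelve values could be 3x4, 4x3, 2x6, and so on.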
