Sparse Multidimensional Array taking huge space - HashTable better? - c

Is there a better approach than using multidimensional arrays to compute values to be displayed in a table? Note that each dimension of the array is huge, but the data is sparse. Can something like a hash table be considered?
The output table after the computation looks like this:

This answer is outdated, because the OP added the information that the data is a sparse matrix.
Not really. Maybe a one-dimensional array (it would save the pointers to the dimensions - but that's, err... pointless).
An array is the data structure with the least metadata (because there is no metadata at all). So your approach can't be optimized much if you really need to store all that data in memory.
Any other data structure (tree, linked lists, etc.) would contain extra metadata and would therefore consume more memory.
The only way for you to use less memory is to actually use less memory (by only loading the data you really need into memory and leaving the rest on your hard drive or wherever).
You want to display a table, so maybe you can limit the rows you keep in memory to an area slightly bigger than the viewport of your table (so you can still scroll through the table fluently). Then you can dynamically compute and overwrite rows according to the scroll position of your table.
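For illustration only, the windowing idea might look roughly like this in C; compute_cell, MARGIN and refresh_window are hypothetical names standing in for your own computation and scrolling state:

extern double compute_cell(int row, int col);   /* placeholder for your computation */

#define MARGIN 20   /* hypothetical: extra rows kept above/below the viewport */

/* Recompute only the rows near the visible area; everything else stays on
   disk (or is recomputed on demand) as the user scrolls. */
void refresh_window(double *window, int cols, int total_rows,
                    int first_visible, int visible_rows)
{
    int first = first_visible - MARGIN;
    int rows = visible_rows + 2 * MARGIN;

    if (first < 0) first = 0;
    if (first + rows > total_rows) rows = total_rows - first;

    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            window[r * cols + c] = compute_cell(first + r, c);
}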

There are a number of different ways to manage memory for a sparse matrix. I would start by defining a struct to hold an individual entry in your matrix:
struct sparse_matrix_data {
    int i;
    int j;
    int /* or double or whatever */ value;
};
so that you would store the two indices and the value for each non-zero entry. From there, you need to decide what data structure works best for the computations you need to do: hash table on one or both indices, array of these structs, linked list, ...
Note that this will only decrease the memory required if the additional memory required to store the indices is less than the memory you used to store the zeros in your original multidimensional array.
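If you do decide to try the hash table route, a minimal sketch in C might look like the following; the chained table, TABLE_SIZE, the hash function and the sparse_set/sparse_get names are all arbitrary illustrative choices, not a recommendation tied to your data:

#include <stdlib.h>

#define TABLE_SIZE 4096                /* arbitrary bucket count, tune to your data */

struct entry {
    int i, j;                          /* indices of a non-zero cell */
    double value;
    struct entry *next;                /* chaining for collisions */
};

static struct entry *table[TABLE_SIZE];

static unsigned hash_ij(int i, int j)
{
    return ((unsigned)i * 31u + (unsigned)j) % TABLE_SIZE;
}

void sparse_set(int i, int j, double value)
{
    unsigned h = hash_ij(i, j);
    struct entry *e;
    for (e = table[h]; e != NULL; e = e->next)
        if (e->i == i && e->j == j) { e->value = value; return; }
    e = malloc(sizeof *e);
    if (e == NULL) return;             /* allocation failure handling omitted */
    e->i = i; e->j = j; e->value = value;
    e->next = table[h];
    table[h] = e;
}

double sparse_get(int i, int j)        /* cells never stored are implicitly 0 */
{
    struct entry *e;
    for (e = table[hash_ij(i, j)]; e != NULL; e = e->next)
        if (e->i == i && e->j == j) return e->value;
    return 0.0;
}

As the note above says, each stored entry now costs two indices plus a chaining pointer, so this only pays off when the matrix really is sparse.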

Related

Array VS single linked list VS double link list

I am learning about arrays, singly linked lists and doubly linked lists these days, and this question came up:
" What is the best option between these three data structures when it comes to fast searching, less memory, easily insertion and updating of things "
As far as I know, an array cannot be the answer because it has a fixed size; if we want to insert a new item, it won't always be possible. A doubly linked list can do the task, but two pointers are needed for each node, so there is a memory problem. So I think a singly linked list fulfills all the given requirements. Am I right? Please correct me if I am missing any point. There is also one more question: instead of choosing one of them, can I combine two or more of the data structures given here to meet all the requirements?
"What is the best option between these three data structures when it comes to fast searching, less memory, easily insertion and updating of things".
As far as I can tell, arrays serve the purpose.
Fast search: you can do a binary search if the array is sorted (see the sketch below). You don't get that option with a linked list.
Less memory: arrays take the least memory (but require contiguous memory).
Insertion: inserting into an array is a matter of a[i] = "value". If the array size is exceeded, simply copy the data into a new, larger array; that is exactly how HashMaps / ArrayLists work under the covers.
Updating: only arrays give you random access; a[i] = "new value" is updated in O(1) time if you know the index.
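To make the "fast search if sorted" point concrete, here is the textbook binary search over a sorted int array in C (a generic sketch, not tied to any particular data):

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent. */
int binary_search(const int *a, int n, int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key)
            return mid;
        else if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;   /* O(log n) thanks to random access; a linked list can't do this */
}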
Each of those has its own benefits and downsides.
For search speed, I'd say arrays are better suited due to the quick lookup times.
Since an array is a sequence of same-size elements, retrieving the value at an index is just memoryLocation + index * elementSize. For a linked list, the whole list needs traversing.
Arrays also win in the "less memory" category, since there's no need to store extra pointers.
For insertions, arrays are slow. You'll need to traverse the array, copy the contents to a new array, assign the new array, delete the old one...
Insertions go much quicker in singly or doubly linked lists, because it's just a matter of changing one or two pointers.
In the end, it all just depends on the use case. Are you inserting a lot? Then you probably want to consider a non-array structure.
Do you need many quick lookups? Consider those arrays again. Etc.
See also this question.
A linked list is usually the best choice when we don't know in advance the number of elements we will have to store, or when the number can change dynamically.
Arrays have slow insertion and deletion times. To insert an element at the front or middle of the array, the first step is to ensure that there is space in the array for the new element; otherwise, the array needs to be resized, which is an expensive operation. The next step is to open space for the new element by shifting every element after the desired index. Likewise, for deletion, shifting is required after removing an element. This implies that the insertion time for arrays is O(n), as n elements must be shifted.
Using static arrays, we can save some extra memory in comparison to linked lists because we do not need to store pointers to the next node.
A doubly-linked list supports fast insertion/removal at its ends. This is used in an LRU cache, where you need to insert the new item at the front and remove the oldest item from the end.
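A rough sketch of the list half of such an LRU cache in C (the hash-map lookup that usually accompanies it is omitted, and node, push_front and pop_back are just illustrative names):

struct node {
    int key;
    struct node *prev, *next;
};

static struct node *head = NULL, *tail = NULL;

/* O(1): the new (most recently used) item goes to the front. */
void push_front(struct node *n)
{
    n->prev = NULL;
    n->next = head;
    if (head) head->prev = n; else tail = n;
    head = n;
}

/* O(1): evict the least recently used item from the end. */
struct node *pop_back(void)
{
    struct node *n = tail;
    if (!n) return NULL;
    tail = n->prev;
    if (tail) tail->next = NULL; else head = NULL;
    return n;
}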

Multi dimensional array with varying size

I want to make a 2D array "data" with the following dimensions: data(T,N)
T is a constant, and I don't know anything about N to begin with. Is it possible to do something like this in Fortran?
do i = 1, T
    ! check a few flags
    if (all flags ok) then
        c = c + 1
        data(i, c) = some value
    end if
end do
Basically I have no idea about the second dimension. Depending on some flags, if those flags are fine, I want to keep adding more elements to the array.
How can I do this?
There are several possible solutions. You could make data an allocatable array and guess a maximum value for N. As long as you don't exceed N, you keep adding data items. If a new item would exceed the array size, you create a temporary array, copy the data to the temporary array, deallocate data, and reallocate it with a larger dimension.
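The grow-and-copy pattern itself is language-agnostic; as a rough sketch, the same idea in C looks like this (the doubling heuristic is just one common choice; in Fortran you would do the equivalent with an allocatable array and a temporary, or MOVE_ALLOC):

#include <stdlib.h>

/* Append a value, growing the buffer when it is full.
   Returns the (possibly moved) buffer, or NULL on allocation failure. */
double *append(double *data, int *len, int *cap, double value)
{
    if (*len == *cap) {
        int new_cap = (*cap > 0) ? *cap * 2 : 16;
        double *tmp = realloc(data, new_cap * sizeof *tmp);
        if (tmp == NULL) return NULL;   /* caller still owns the old buffer */
        data = tmp;
        *cap = new_cap;
    }
    data[(*len)++] = value;
    return data;
}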
Another design choice would be to use a linked list. This is more flexible in that the length is indefinite. You lose "random access" in that the list is chained rather than indexed. You create a user-defined type that contains various data, e.g., scalars, arrays, whatever, and also a pointer. When you add a list item, the pointer points to that next item. This is possible in Fortran >= 90, since pointers are supported.
I suggest searching the web or reading a book about these data structures.
Assuming what you wrote is more-or-less how your code really goes, then you assuredly do know one thing: N cannot be greater than T. You would not have to change your do-loop, but you will definitely need to initialize data before the loop.

Why are linked lists faster than arrays?

I am very puzzled about this. Everywhere it is written that "linked lists are faster than arrays", but no one makes the effort to say WHY. Using plain logic, I can't understand how a linked list can be faster. In an array, all cells are next to each other, so as long as you know the size of each cell it's easy to reach any cell instantly. For example, if there is a list of 10 integers and I want the value in the fourth cell, I just go directly to the start of the array + 24 bytes and read 8 bytes from there.
On the other hand, when you have a linked list and you want to get the element in the fourth place, you have to start from the beginning or end of the list (depending on whether it's a singly or doubly linked list) and go from one node to the next until you find what you're looking for.
So how the heck can going step by step be faster than going directly to an element?
This question title is misleading.
It asserts that linked lists are faster than arrays without limiting the scope well. There are a number of times when arrays can be significantly faster and there are a number of times when a linked list can be significantly faster: the particular case of linked lists "being faster" does not appear to be supported.
There are two things to consider:
The theoretical bounds of linked-lists vs. arrays in a particular operation; and
the real-world implementation and usage pattern including cache-locality and allocations.
As far as access to an indexed element goes: the operation is O(1) in an array and, as pointed out, is very fast (just an offset). The operation is O(k) in a linked list (where k is the index and may always be << n, depending), but if the linked list is already being traversed then this is O(1) per step, which is "the same" as an array. Whether an array traversal (for (i = 0; i < len; i++)) is faster (or slower) depends upon the particular implementation/language/run-time.
However, if there is a specific case where the array is not faster for either of the above operations (seek or traversal), it would be interesting to see it dissected in more detail. (I am sure it is possible to find a language with a very degenerate implementation of arrays over lists, cough Haskell cough.)
Happy coding.
My simple usage summary: Arrays are good for indexed access and operations which involve swapping elements. The non-amortized re-size operation and extra slack (if required), however, may be rather costly. Linked lists amortize the re-sizing (and trade slack for a "pointer" per-cell) and can often excel at operations like "chopping out or inserting a bunch of elements". In the end they are different data-structures and should be treated as such.
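To make the O(1)-offset versus O(k)-walk contrast concrete, a minimal C comparison (array_nth and list_nth are just illustrative helpers):

struct list_node { int value; struct list_node *next; };

/* Array: one multiplication and one addition, regardless of k. */
int array_nth(const int *a, int k)
{
    return a[k];
}

/* Linked list: k pointer dereferences to reach the k-th element. */
int list_nth(const struct list_node *head, int k)
{
    while (k-- > 0)
        head = head->next;   /* assumes the list holds at least k+1 nodes */
    return head->value;
}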
Like most problems in programming, context is everything. You need to think about the expected access patterns of your data, and then design your storage system appropriately. If you insert something once, and then access it 1,000,000 times, then who cares what the insert cost is? On the other hand, if you insert/delete as often as you read, then those costs drive the decision.
Depends on which operation you are referring to. Adding or removing elements is a lot faster in a linked list than in an array.
Iterating sequentially over the list one by one is more or less the same speed in a linked list and an array.
Getting one specific element in the middle is a lot faster in an array.
And the array might waste space, because very often when expanding the array, more elements are allocated than needed at that point in time (think ArrayList in Java).
So you need to choose your data structure depending on what you want to do:
many insertions and iterating sequentially --> use a LinkedList
random access and ideally a predefined size --> use an array
Because no memory is moved when an insertion is made in the middle of the list.
For the case you presented, it's true - arrays are faster; you need only arithmetic to go from one element to another. Linked lists require indirection and fragment memory.
The key is to know what structure to use and when.
Linked lists are preferable over arrays when:
a) you need constant-time insertions/deletions from the list (such as in real-time computing where time predictability is absolutely critical)
b) you don't know how many items will be in the list. With arrays, you may need to re-declare and copy memory if the array grows too big
c) you don't need random access to any elements
d) you want to be able to insert items in the middle of the list (such as a priority queue); see the sketch after these lists
Arrays are preferable when:
a) you need indexed/random access to elements
b) you know the number of elements in the array ahead of time so that you can allocate the correct amount of memory for the array
c) you need speed when iterating through all the elements in sequence. You can use pointer math on the array to access each element, whereas in a linked list you need to look up each node via a pointer, which may cause page faults and therefore performance hits.
d) memory is a concern. Filled arrays take up less memory than linked lists. Each element in the array is just the data. Each linked list node requires the data as well as one (or more) pointers to the other elements in the linked list.
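A small C sketch of that insertion contrast: inserting after a known list node touches two pointers, while inserting into an array shifts everything behind the insertion point (the array version assumes the buffer already has room for one more element):

#include <string.h>

struct node { int value; struct node *next; };

/* Linked list: two pointer writes, O(1) once you hold the predecessor. */
void list_insert_after(struct node *pos, struct node *n)
{
    n->next = pos->next;
    pos->next = n;
}

/* Array: shift the tail right by one slot, O(n) in the number of elements moved. */
void array_insert(int *a, int *len, int pos, int value)
{
    memmove(&a[pos + 1], &a[pos], (*len - pos) * sizeof a[0]);
    a[pos] = value;
    (*len)++;
}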
Array Lists (like those in .Net) give you the benefits of arrays, but dynamically allocate resources for you so that you don't need to worry too much about list size and you can delete items at any index without any effort or re-shuffling elements around. Performance-wise, arraylists are slower than raw arrays.
Reference: Lamar's answer, https://stackoverflow.com/a/393578/6249148
A LinkedList is node-based, meaning that the data can sit anywhere in memory and is linked together by nodes (objects that point to one another, rather than sitting next to one another).
An array is a set of same-type data objects stored in sequential memory locations.
The advantage of a linked list is that the data doesn't have to be sequential in memory. When you add/remove an element, you simply change the pointer of a node to point to a different node; you don't actually move elements around. If you don't have to add elements towards the end of the list, then accessing data is faster, due to iterating over fewer elements. There are also variations of the LinkedList, such as a DoublyLinkedList, whose nodes point to both the previous and next nodes.
The advantage of an array is that yes you can access any element O(1) time if you know the index, but if you don’t know the index, then you will have to iterate over the data.
The downside of an array is the fact that its data is stored sequentially in memory. If you want to insert an element at index 1, then you have to move every single element to the right. Also, the array has to keep resizing itself as it grows, basically copying itself in order to make a new array with a larger capacity. If you want to remove an element at the beginning, then you will have to move all the elements to the left.
Arrays are good when you know the index, but are costly as they grow.
The reason why people talk highly about linked lists is because the most useful and efficient data structures are node based.

Sparse Array in C! How accomplish it? Can I alloc only parts of an array?

The first question is: "How do I do a simple sparse array in C (with one dimension only)?" {With my own hands, without libraries.}
And the last one: "Can I allocate only parts of an array?" Something like declaring *array, then using malloc to allocate some memory for it, and then freeing the indices we don't want. Can I do that?
Thanks so much!
No, you can't do it.
What you can do is to allocate blocks, but you need to design it carefully.
Probably the best optimization is to use ranges of cells. So you can use a linked list (or a map) of available ranges:
struct SparseBlock
{
    void *blockData;                 /* the cells in this contiguous range */
    int beginIndex;
    int endIndex;
    struct SparseBlock *next;
};
obviously if endIndex - beginIndex = 0 you have a single cell (that is isolated inside the array), otherwise you have got a block of cells, allowing you to allocate the right amount of memory for it.
This approach is simple for immutable sparse vectors; otherwise you have to either take care of restructuring the blocks whenever a hole is filled or created, or just store single cells.
In addition, you have to decide how to index these blocks: you can keep them ordered in a linked list, or you can use a map to retrieve the n-th block in constant O(1) time (of course, you would have to insert many equal keys for the same block if it covers a range, or reduce the index to the nearest lower index available).
Solutions are many, just express your creativity! :)
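As a rough illustration of a lookup over such a list of ranges, using the SparseBlock struct above (a sketch only; it assumes the blocks store int data and returns NULL for indices that fall in a hole):

#include <stddef.h>   /* for NULL */

/* Returns a pointer to the value stored for index n, or NULL if index n
   is not covered by any allocated block. */
int *sparse_lookup(struct SparseBlock *head, int n)
{
    struct SparseBlock *b;
    for (b = head; b != NULL; b = b->next)
        if (n >= b->beginIndex && n <= b->endIndex)
            return (int *)b->blockData + (n - b->beginIndex);
    return NULL;
}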
It is not uncommon to implement these in linked structures of one kind or another. In one dimension you can simply generate a linked list of occupied regions, and I've discussed a two-dimensional implementation in another context before.
You do lose O(1) access time this way, but the win on space can be considerable if the structure really is sparse.

Trees: Linked Lists vs Arrays (Efficiency)

This is an assignment question that I am having trouble wording an answer to.
"Suppose a tree may have up to k children per node. Let v be the average number of children per node. For what value(s) of v is it more efficient (in terms of space used) to store the child nodes in a linked list versus storage in an array? Why?"
I believe I can answer the "why?" more or less in plain English -- it will be more efficient to use the linked list because, rather than having a bunch of empty nodes (i.e. empty indices in the array if your average is lower than the max) taking up memory, you only allocate space for a node in a linked list when you're actually filling in a value.
So if you've got an average of 6 children when your maximum is 200, the array will reserve space for all 200 children of each node when the tree is created, but the linked list will only allocate space for nodes as needed. So, with the linked list, the space used will be approximately(?) the average; with the array, the space used will be the max.
...I don't see when it would ever be more efficient to use the array. Is this a trick question? Do I have to take into account the fact that the array needs to have a limit on total number of nodes when it's created?
For many commonly used languages, the array will require allocating storage for k memory addresses (of the data). A singly-linked list will require 2 addresses per node (data & next). A doubly-linked list would require 3 addresses per node.
Let n be the actual number of children of a particular node A:
The array uses k memory addresses
The singly-linked list uses 2n addresses
The doubly-linked list uses 3n addresses
The value k allows you to determine if 2n or 3n addresses will average to a gain or loss compared to simply storing the addresses in an array.
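For a concrete feel, using the numbers from the question above: with k = 200 and v = 6, the array costs 200 addresses per node while a singly-linked list costs about 2 · 6 = 12, so the list wins easily; if v were, say, 120, the singly-linked list would need 240 addresses and the array would be the smaller choice. Roughly, the list saves space as long as 2v < k (or 3v < k for a doubly-linked list).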
...I don't see when it would ever be more efficient to use the array. Is this a trick question?
It’s not a trick question. Think of the memory overhead that a linked list has. How is a linked list implemented (vs. an array)?
Also (though this is beyond the scope of the question!), space consumption isn’t the only deciding factor in practice. Caching plays an important role in modern CPUs and storing the individual child nodes in an array instead of a linked list can improve the cache locality (and consequently the tree’s performance) drastically.
Arrays must pre-allocate space, but we can use them to access any entry very fast.
Lists allocate memory whenever they create a new node, and that isn't ideal because memory allocation costs CPU time.
Tip: you can allocate the whole array at once if you want to, but usually we allocate, let's say, 4 entries and resize it by doubling the size when we need more space.
I can imagine it could be a very good idea and very efficient in many scenarios to use a LinkedList if the data item one uses has intrinsic logic for a previous and next item anyway, for example a TimeTableItem or anything that is somehow time-related. These should implement an interface, so the LinkedList implementation can leverage that and doesn't have to wrap the items into its own node objects. Inserting and removing would be much more efficient here than using a List implementation which internally juggles arrays around.
You're assuming the array can't be dynamically re-allocated. If it can, the array wins, because it need be no bigger than k items (plus a constant overhead), and because it doesn't need per-item storage for pointers.
