Is it faster to iterate column-wise than row-wise over an n-by-m array? - arrays

I have this LabVIEW program in which I have to iterate over large arrays (not queues), so I'm interested in speeding this up as much as possible.
I think I've heard that for OpenCV, when an element is read, the page that element is fetched from also contains the following column elements. That means if I iterated line by line, I'd have to load a new page for every element, which obviously slows down the whole process.
Does this apply to LabVIEW programs too?
Thanks for the support and kind regards

I benchmarked this.
I have a 100000x5 2D array. Iterating rows first takes about 9 ms on my i7 processor to complete; iterating columns first takes about 35 ms.

LabVIEW is row-major. If you take a 2D array and wire it to the border of a For Loop for auto indexing, the 1D arrays that you get out are the rows. Wire that into a nested For Loop to process the individual elements.
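LabVIEW block diagrams can't be pasted as text, so here is a hedged NumPy sketch of the same row-major/cache-locality effect in a textual, row-major environment (the shape mirrors the benchmark above; exact timings will differ, and interpreter overhead hides part of the effect):

# Sketch: in a row-major layout, row-wise traversal touches memory
# contiguously, while column-wise traversal makes strided jumps.
import time
import numpy as np

data = np.arange(100000 * 5, dtype=np.float64).reshape(100000, 5)  # C (row-major) order

t0 = time.perf_counter()
total = 0.0
for row in data:          # outer loop over rows, inner over row elements
    for x in row:
        total += x
t1 = time.perf_counter()

total = 0.0
for col in data.T:        # outer loop over columns: strided access into data
    for x in col:
        total += x
t2 = time.perf_counter()

print("rows first:    %.3f s" % (t1 - t0))
print("columns first: %.3f s" % (t2 - t1))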

In addition to row-then-column iteration, there are two techniques you can apply to maximize your array processing:
Pipelining - which helps maximize core utilization for sequential tasks
Parallel For loops - which provide data parallelism (a rough sketch of the idea follows below)
After that, there are other more complex designs like structured grids. There is an NI white paper that describes multi-core programming in LabVIEW, including these and other approaches, in more detail.
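As a rough illustration of the data-parallelism idea behind parallel For loops (not LabVIEW code; the chunk count and the per-row work are placeholder assumptions), a minimal Python sketch:

# Sketch: split the rows of a large 2D array into chunks and process
# the chunks on separate worker processes (data parallelism).
from multiprocessing import Pool

import numpy as np

def process_chunk(chunk):
    # Placeholder per-row work; substitute the real computation here.
    return chunk.sum(axis=1)

if __name__ == "__main__":
    data = np.random.rand(100000, 5)
    chunks = np.array_split(data, 8)          # one chunk per worker
    with Pool(processes=8) as pool:
        partial = pool.map(process_chunk, chunks)
    result = np.concatenate(partial)
    print(result.shape)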

Related

Compare an array of dictionaries with itself for similarities

I use Core Data to store a JSON file. The Core Data store is an array of dictionaries, [TestMO], and one of the fields in these dictionaries is an array of keywords (not a fixed count - maybe 3, 5, 7, etc.). What I am trying to do is compare the entire database with itself to find the objects, TestMOs, which have similar (or > 50% matching) keywords. I tried a loop inside a loop, but that is just too time-consuming and a terrible user experience. Any ideas how I can achieve this efficiently? Thank you.
Use your knowledge to reduce the complexity
If your array has n elements and you want to compare each element with every other element, you end up with n*(n-1)/2 comparisons. For n=10 you get 45 comparisons, for n=100 you get 4950, for n=1000 half a million, for n=1000000 half a trillion. Your complexity grows quadratically, with O(n²).
You will need to use your statistical knowledge on your array and how your analysis is used to beat this complexity. For instance, if your n is relatively small and you need to run your analysis only once, don't bother optimizing and just let it run for a night.
If you want to run your analysis every time a user adds another element, you may want to compare just this new element to all the other n elements, at a complexity of only O(n).
To further optimize, you may want to establish an index, say a dictionary that associates a set of elements with each keyword. Establishing the index will still be time-consuming and memory-intensive, on the order of O(n*m) if every keyword occurs on average m times across the existing elements. Depending on what you want to analyse, comparing a new element with k keywords might then cost on the order of O(m*k). If m and k are much smaller than n, that can reduce your waiting time significantly.
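A minimal sketch of such a keyword index in Python terms (the question itself is Swift/Core Data; the element ids, keywords, and the 50% threshold below are made-up placeholders):

# Sketch: keyword -> set of element ids, used to find candidates that
# share at least half of a new element's keywords.
from collections import defaultdict

elements = {
    1: {"red", "fast", "cheap"},
    2: {"red", "slow"},
    3: {"fast", "cheap", "light"},
}

# Build the index once; cost is proportional to the total number of
# keyword occurrences across all elements.
index = defaultdict(set)
for elem_id, keywords in elements.items():
    for kw in keywords:
        index[kw].add(elem_id)

def similar_to(new_keywords, threshold=0.5):
    # Only elements sharing at least one keyword are ever touched,
    # so the work does not grow with the full database size n.
    shared = defaultdict(int)
    for kw in new_keywords:
        for elem_id in index.get(kw, ()):
            shared[elem_id] += 1
    return [e for e, c in shared.items() if c / len(new_keywords) >= threshold]

print(similar_to({"red", "cheap", "fast"}))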
This question has nothing to do with swift or core-data, but with computational complexity, and time-complexity in particular. Please add the latter tag to your question.

Large matrices with a certain structure: how can one avoid allocating memory where it is not needed?

Is there a way to create a 3D array for which only certain elements are defined, while the rest does not take up memory?
Context: I am running Monte-Carlo simulations in which I want to solve 10^5 matrices. All of these matrices have a majority of elements that are zero, for which I wouldn't need to use 8 bytes of memory per element. These elements are the same for all matrices. For simplicity, I have combined all of these matrices into a 3D array, but if my matrices start to become too large, I encounter memory issues (since at matrix dimensions of 100*100*100000, the array already takes up 8 GB of memory).
One workaround would be to store every matrix element, with its 10^6 iterations, in a vector; that way, no additional information needs to be stored. The inconvenience is that I would then need to work with more than 50 different vectors, and I prefer working with arrays.
Is there any way to tell R that some matrix elements don't need information?
I have been thinking that defining a new class could help for this, but since I have just discovered classes, I am not sure what all the options are. Do you think this could be a good approach? Are there specific things I should keep in mind?
I also know that there are packages made to deal with memory problems, but that did not seem like the quickest solution in terms of human and computation effort for this specific problem.
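The question is about R, but as a hedged, language-agnostic sketch of the underlying idea (store only the defined elements as coordinate/value records instead of a dense 3D array), here it is in Python; the names and indices are placeholders:

# Sketch: store only the non-zero entries of a stack of matrices as
# (matrix, row, col) -> value records instead of a dense 3D array.
nonzero = {}                         # keys: (k, i, j), values: float

def set_entry(k, i, j, value):
    if value != 0.0:
        nonzero[(k, i, j)] = value
    else:
        nonzero.pop((k, i, j), None)

def get_entry(k, i, j):
    return nonzero.get((k, i, j), 0.0)   # absent entries cost no memory

set_entry(0, 3, 7, 2.5)
print(get_entry(0, 3, 7), get_entry(0, 0, 0))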

What are the advantages and disadvantages of a 3D array in Mathematica?

Edited...
Thanks to everyone trying to help me!
I am trying to do a finite element analysis in Mathematica. I can obtain all the local stiffness matrices, which have dimensions 8x8; there are 2000 of them, similar but not identical. Every local stiffness matrix is represented by a function named KK; for example, KK[1] is the local stiffness matrix of the first element.
I am trying to assemble all the local matrices into the global stiffness matrix. To make it easy:
Do[K[e][i][j] = KK[[e]][[i]][[j]], {e, 2000}, {i, 8}, {j, 8}] (edited)
Here is my question: does this kind of assignment affect the analysis time? If yes, what can I do to improve it?
In MATLAB this would be called a 3D array, but I don't know what it is called in Mathematica.
What are the advantages and disadvantages of this kind of representation in Mathematica? Is it faster, or just an easier way?
Thanks for your help...
It is difficult to understand what your question is, so you might want to reformulate it.
As others have mentioned, there is no advantage to be expected from switching from a 3D array to DownValues or SubValues. In fact you would then move from accessing data structures to pattern matching, which is powerful and the real strength of Mathematica but not very efficient for what you plan to do, so I would strongly suggest staying in the realm of ordinary arrays.
There is another thing that might not be clear to someone more familiar with MATLAB than with Mathematica: in Mathematica, the "default" arrays behave a lot like cell arrays in MATLAB: each entry can contain arbitrary content and they don't need to be rectangular (as High Performance Mark has mentioned, they are just expressions with head List and can roughly be compared to MATLAB cell arrays).
But if such a nested list is rectangular and every element is of the same type, it can be converted to a so-called PackedArray. PackedArrays are much more memory efficient and also speed up many calculations; they behave in many respects like regular ("non-cell") arrays in MATLAB. This conversion is often done implicitly by functions like Table, which will often return a packed array automatically. But if you are interested in efficiency, it is a good idea to check with Developer`PackedArrayQ and convert explicitly with Developer`ToPackedArray if necessary.
When you work with PackedArrays, the speed and memory efficiency of many operations are much better and usually comparable to vectorized operations on normal MATLAB arrays. Unfortunately, packed arrays can get "unpacked" by some operations, so if calculations become slow it is usually a good idea to check whether that has happened.
Neither "normal" arrays nor PackedArrays are restricted in the rank (ArrayDepth in Mathematica) they can have, so you can of course create and use "3D arrays" just as you can in MATLAB. I have never experienced, nor know of, any efficiency penalty in doing so.
It is probably of interest that newer versions of Mathematica (>= 10) include the finite element method as one of the solver methods for NDSolve, so if you are not doing this as an exercise you might want to have a look at what is already available; there is quite extensive documentation about it.
A final remark: instead of kk[[e]][[i]][[j]] you can use the much more readable form kk[[e,i,j]], which is also easier and less error-prone to type.
An extended comment, I guess, but
KK[e][[i]][[j]]
is not the (e,i,j) element of a "3D array". Note the single brackets around the e. When you use single brackets you are not denoting an array or list element but a DownValue, which is quite different from a list element.
If you do for example,
f[1]=0
f[2]=2
...
the resulting f appears similar to an array, but is actually more akin to an overloaded function in some other language. It is convenient because the indices need not be contiguous or even integers, but there is a significant performance drawback if you ever want to operate on the structure as a list.
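A rough analogy in Python terms (not Mathematica, and only meant to illustrate the distinction): DownValues behave more like a dictionary keyed by arbitrary values, while a list is a contiguous structure that list operations can exploit.

# f[1] = 0; f[2] = 2 defined via DownValues is closer to a dictionary:
f = {1: 0, 2: 2, "anything": 3.5}   # keys need not be contiguous or even integers

# whereas a list element lives in a contiguous, index-addressed structure
# that bulk/vectorized operations can work on directly:
xs = [0, 2, 4, 6]
print(sum(xs), f[1])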
Your 'do' loop example would almost certainly be better written as:
kk = Table[ k[e][i][j] ,{e,2000},{i,8},{j,8} ]
( Your loop won't even work as-is unless you previously "initialized" each of the kk[e] as an 8x8 array. )
Note that now the list elements are all double-bracketed, i.e. kk[[e]][[i]][[j]] or kk[[e,i,j]].

How to implement a huge matrix in C

I'm writing a program for a numerical simulation in C. Part of the simulation are spatially fixed nodes that each have some float value with respect to every other node. It is like a directed graph. However, if two nodes are too far apart (farther than some cut-off length a), this value is 0.
To represent all these "correlations" or float values, I tried to use a 2D array, but since I have 100,000 or more nodes, that would correspond to about 40 GB of memory.
Now I am trying to think of different solutions to this problem. I don't want to save all these values to the hard disk, and I also don't want to calculate them on the fly. One idea was some sort of sparse matrix, like the one you can use in MATLAB.
Do you have any other ideas, how to store these values?
I am new to C, so please don't expect too much experience.
Thanks and best regards,
Jan Oliver
The number of nodes, on average, within the cutoff distance of a given node determines your memory requirement and tells you whether you need to page to disk. The solution taking the least memory is probably a hash table that maps a pair of nodes to a distance. Since the distance is the same each way, you only need to enter it into the hash table once per pair -- put the two node numbers in numerical order and then combine them to form a hash key. You could use the POSIX hsearch/hcreate/hdestroy functions for the hash table, although they are less than ideal.
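A hedged sketch of the pair-keyed idea in Python (the answer suggests POSIX hsearch in C; a dictionary keyed by the ordered node pair expresses the same structure, with made-up node numbers):

# Sketch: store each node-pair distance once, keyed by the ordered pair.
distances = {}

def pair_key(a, b):
    return (a, b) if a < b else (b, a)   # same key regardless of argument order

def set_distance(a, b, d):
    distances[pair_key(a, b)] = d

def get_distance(a, b, default=0.0):
    # Pairs beyond the cut-off are simply absent and cost no memory.
    return distances.get(pair_key(a, b), default)

set_distance(42, 7, 1.25)
print(get_distance(7, 42))   # 1.25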
A sparse matrix approach sounds ideal for this. The Wikipedia article on sparse matrices discusses several approaches to implementation.
A sparse adjacency matrix is one idea, or you could use an adjacency list, allowing you to store only the edges that are closer than your cutoff value.
You could also hold a list for each node, which contains the other nodes this node is related to. You would then have an overall number of list entries of 2*k, where k is the number of non-zero values in the virtual matrix.
Implementing the whole system as a combination of hashes/sets/maps is still expected to be acceptable with regard to speed/performance compared to a "real" matrix allowing random access.
edit: This solution is one possible form of an implementation of a sparse matrix. (See also Jim Balter's note below. Thank you, Jim.)
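A minimal sketch of the per-node list described above, in Python (names and values are placeholders; in C this would typically be an array of dynamically grown neighbour lists):

# Sketch: each node keeps only its in-range neighbours and the values.
from collections import defaultdict

adjacency = defaultdict(list)     # node -> list of (neighbour, value)

def add_edge(a, b, value):
    # Store the edge from both endpoints: 2*k entries for k non-zero values.
    adjacency[a].append((b, value))
    adjacency[b].append((a, value))

add_edge(0, 3, 0.7)
add_edge(0, 9, 0.1)
print(adjacency[0])   # [(3, 0.7), (9, 0.1)]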
You should indeed use sparse matrices if possible. In scipy, we have support for sparse matrices, so you can experiment in Python, although to be honest the sparse support still has some rough edges.
If you have access to matlab, it will definitely be better ATM.
Without using a sparse matrix, you could think about using memmap-based arrays so that you don't need 40 GB of RAM, but it will still be slow, and it only really makes sense if you have a low degree of sparsity (say, if 10-20% of your 100000x100000 matrix has items in it, then full arrays will actually be faster and maybe even take less space than sparse matrices).
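Since this answer already points at SciPy, here is a minimal sketch of both options (matrix size, file name, and the stored values are placeholder assumptions):

# Sketch: build a sparse matrix in LIL format, convert to CSR for math.
import numpy as np
from scipy import sparse

n = 100_000
m = sparse.lil_matrix((n, n), dtype=np.float32)   # only non-zero entries are stored
m[0, 12345] = 0.7
m[12345, 0] = 0.7
csr = m.tocsr()                                   # CSR is efficient for products and slicing
print(csr.nnz, csr.data.nbytes)

# Alternative when the matrix is not very sparse: a disk-backed dense array.
# (Shape kept small here; at 100000x100000 float32 the backing file is ~40 GB
# on disk but does not need to fit in RAM.)
mm = np.memmap("correlations.dat", dtype=np.float32, mode="w+", shape=(1000, 1000))
mm[0, 123] = 0.7
mm.flush()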

Vector.<> vs array

What are the pros and cons of using a Vector.<> instead of an Array?
From the adobe documentation page:
As a result of its restrictions, a Vector has two primary benefits over an Array instance whose elements are all instances of a single class:
Performance: array element access and iteration are much faster when using a Vector instance than when using an Array.
Type safety: in strict mode the compiler can identify data type errors such as assigning a value of the incorrect data type to a Vector or expecting the wrong data type when reading a value from a Vector. Note, however, that when using the push() method or unshift() method to add values to a Vector, the arguments' data types are not checked at compile time but are checked at run time.
Pro: Vector is faster than Array - e.g. see this: Faster JPEG Encoding with Flash Player 10
Contra: Vector requires FP10, and according to http://riastats.com/ some 20% of users are still using FP9
Vectors are faster, although for sequential iteration the fastest thing seems to be linked lists.
Vectors can also be useful for bitmap operations (check out BitmapData.setVector, also BitmapData.lock and unlock).
The linked-list example mentioned earlier in the comments is incorrectly written, though: it skips odd nodes and because of that only iterates over half of the data. No wonder it gets such great results; it might be faster with correct code as well, but not by the same percentage. The loop advances current = current.next one time too many per iteration (both in the loop body and in the loop condition), which causes that behaviour.
According to the Flash Player penetration website it is a little higher, around 85%.
This is the source
