My idea is to create a two-stage pipeline driven by a global clock (100 cycles). The two stages are represented as two functions. The first generates a random 2D matrix; once generated, this data is moved to stage two, and stage one immediately starts generating new data. Meanwhile, stage two sums the 2D matrix and provides an output. This movement/computation is repeated for the 100 cycles.
Do I use a local loop of 100 cycles inside each function? I don't want to use a fork/pipe option.
I have used the following, but it runs the two stages sequentially:
for (i = 0; i < 100; i++) {
    stage_one();
    stage_two();
}
The other option is to run the loop locally in each stage and use a data queue to move data between the functions.
Can someone point me to a resource where I can read about and test how to do this? Thank you very much for your help.
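The queue-based option described above can be sketched as follows. This is a minimal illustration in Python rather than the asker's language, using two threads and a bounded queue; the names `run_pipeline`, the matrix size, and the `None` sentinel are assumptions made for the example, not part of the original question.

```python
import queue
import random
import threading

def stage_one(out_q, cycles, rows, cols, seed):
    """Producer: build one random 2D matrix per cycle and hand it to stage two."""
    rng = random.Random(seed)
    for _ in range(cycles):
        matrix = [[rng.randint(0, 9) for _c in range(cols)] for _r in range(rows)]
        out_q.put(matrix)  # blocks only while stage two is still busy with the last one
    out_q.put(None)        # sentinel (assumption): signals that no more data follows

def stage_two(in_q, results):
    """Consumer: sum each matrix as it arrives."""
    while True:
        matrix = in_q.get()
        if matrix is None:
            break
        results.append(sum(sum(row) for row in matrix))

def run_pipeline(cycles=100, rows=4, cols=4, seed=0):
    """Run both stages concurrently; each stage owns its own local loop."""
    q = queue.Queue(maxsize=1)  # one slot: stage one works at most one matrix ahead
    results = []
    t1 = threading.Thread(target=stage_one, args=(q, cycles, rows, cols, seed))
    t2 = threading.Thread(target=stage_two, args=(q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results

if __name__ == "__main__":
    sums = run_pipeline()
    print(len(sums), "matrices summed")
```

Each stage keeps its own loop and the queue provides the hand-off, so no global cycle counter is needed. In standard CPython the global interpreter lock limits how much two pure-Python stages really overlap; the same structure (two threads plus a bounded buffer) carries over to other languages where the stages can run truly in parallel.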
I have a row vector q with 200 elements, and another row vector, dij, which is the output of the pdist function and currently has 48211290 elements, though I'd like to be able to go higher. The operation I want to do is essentially:
t=sum(q'*dij,2);
However, since this tries to allocate a 200x48211290 array, it complains that this would require 70GB of memory. Therefore I do it this way:
t = zeros(numel(q),1);
for i=1:numel(q)
qi = q(i);
factor = qi*dij;
t(i)=sum(factor);
end
However, this takes too much time: about 36 s, which is orders of magnitude longer than the time required by the pdist function. Is there a way I can speed up this operation without explicitly allocating so much memory? I'm assuming here that, if the first way could allocate the memory, it would be faster (being a vectorized operation).
Just use the distributive property of multiplication with respect to addition:
t = q'*sum(dij);
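To see why this works: row i of q'*dij is q(i)*dij, so its sum is q(i)*sum(dij). A small sketch in plain Python (the function names are made up for this illustration) compares the row-by-row version with the factored one:

```python
def weighted_sums_naive(q, dij):
    """Sum each row of the outer product q' * dij, one row at a time."""
    return [sum(qi * d for d in dij) for qi in q]

def weighted_sums_factored(q, dij):
    """Same result via distributivity: sum_j(q_i * d_j) == q_i * sum_j(d_j)."""
    total = sum(dij)  # computed once, in a single pass over dij
    return [qi * total for qi in q]

if __name__ == "__main__":
    q = [1, 2, 3]
    dij = [4, 5, 6]
    print(weighted_sums_naive(q, dij))
    print(weighted_sums_factored(q, dij))
```

The factored version touches dij once instead of once per element of q, and never needs the 200-row intermediate array. With floating-point data the two results can differ in the last few bits because the additions happen in a different order.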
To test what Cris said in the first comment on the post, I created three ".m" files as follows:
vec.m:
res=sum(sin(d.*q')./(d.*q'));
forloop.m:
for i=1:200
res(i)=sum(sin(d.*q(i))./(d.*q(i)));
end
and test.m:
clc
clear all
d=rand(4e6,1);
q=rand(200,1);
res=zeros(1,200);
forloop;
vec;
forloop;
vec;
forloop;
vec;
Then I ran it with the MATLAB Run and Time profiler.
The results were very surprising:
3 calls to forloop: ~10.5 s
3 calls to vec: 15.5 s
Additionally, when I converted the data to single, the results were:
... forloop: 7.5 s
... vec: 8.5 s
I don't know precisely why the for loop is faster in these scenarios, but as for your problem, you could speed things up by creating fewer variables in the loop and using column vectors (I think), and finally by converting your data to single:
q=single(rand(200,1));
...
I am trying to generate permutations of strings of length up to 50. This means 50! strings (50 factorial, approximately 3.041 * 10^64). I have the following parallel algorithm in mind, considering it's not possible to use a single virtual machine or process. I plan to hive off computations to a number of virtual machines:
Say the input string is "abcdef".
1) Have a function that removes a request from a common global pool and processes it. A request object consists of two variables: ancestorSequence, remainingCollection
2) Once function is invoked, do the following:
2a) if remainingCollection has two characters then print this list:
[ancestorSequence+remainingCollection[0]+remainingCollection[1], ancestorSequence+remainingCollection[1]+remainingCollection[0]]
otherwise do the following
2b) Create the following requests and add them to the common global pool:
2b1) (ancestorSequence+remainingCollection[0],remainingCollection without remainingCollection[0])
2b2) (ancestorSequence+remainingCollection[1],remainingCollection without remainingCollection[1])
2b3) (ancestorSequence+remainingCollection[2],remainingCollection without remainingCollection[2])
and so on until
2bn) (ancestorSequence+remainingCollection[n-1],remainingCollection without remainingCollection[n-1])
We can always optimize by having each function accept multiple such requests. Please let me know your thoughts on this.
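The steps above can be sketched in a single process as follows. This is only an illustration: a `collections.deque` stands in for the common global pool, results are collected in a list instead of being printed, and the handling of inputs shorter than two characters is an added assumption that the original steps do not cover.

```python
from collections import deque

def permutations_via_pool(text):
    """Generate all permutations of text by processing requests from a shared pool."""
    # Each request is (ancestor_sequence, remaining_collection), as in step 1.
    pool = deque([("", text)])
    results = []
    while pool:
        ancestor, remaining = pool.popleft()
        if len(remaining) <= 2:
            # Step 2a: emit both orderings of the last two characters.
            # Assumption: shorter inputs simply yield themselves.
            results.append(ancestor + remaining)
            if len(remaining) == 2:
                results.append(ancestor + remaining[1] + remaining[0])
        else:
            # Step 2b: one new request per character that could come next.
            for i, ch in enumerate(remaining):
                pool.append((ancestor + ch, remaining[:i] + remaining[i + 1:]))
    return results

if __name__ == "__main__":
    perms = permutations_via_pool("abcd")
    print(len(perms), perms[:4])
```

In a distributed setting the deque would be replaced by a shared work queue that several workers pull from. The sketch only works for tiny inputs: the number of outputs is n!, so at length 50 the total count itself (about 3.041 * 10^64) rules out enumerating everything, however the work is split.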
I have a LabVIEW program in which I have to iterate over large arrays (not queues), so I'm interested in speeding this up as much as possible.
I think I've heard that in OpenCV, when an element is read, the page it is fetched from also contains the following column elements. That means that if I iterated by lines, I'd have to load a new page for every element, which obviously slows down the whole process.
Does this apply to LabVIEW programs too?
Thanks for the support and kind regards
I benchmarked this.
I have a 100000x5 2D array. Iterating rows first takes about 9 ms on my i7 processor. Iterating columns first takes about 35 ms.
LabVIEW is row-major. If you take a 2D array and wire it to the border of a For Loop for auto indexing, the 1D arrays that you get out are the rows. Wire that into a nested For Loop to process the individual elements.
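As a rough illustration of why the iteration order matters in a row-major layout, the following Python sketch computes the flat memory offset each element would have. The helper names are hypothetical and this models only the addressing, not LabVIEW itself:

```python
def row_major_offset(row, col, n_cols):
    """Flat position of element (row, col) when rows are stored back to back."""
    return row * n_cols + col

def traversal_offsets(n_rows, n_cols, rows_first=True):
    """Offsets visited when the outer loop runs over rows (or over columns)."""
    if rows_first:
        return [row_major_offset(r, c, n_cols)
                for r in range(n_rows) for c in range(n_cols)]
    return [row_major_offset(r, c, n_cols)
            for c in range(n_cols) for r in range(n_rows)]

if __name__ == "__main__":
    print(traversal_offsets(2, 3, rows_first=True))   # consecutive offsets
    print(traversal_offsets(2, 3, rows_first=False))  # jumps of n_cols between reads
```

Iterating rows first walks through memory in consecutive steps, while iterating columns first jumps by a whole row length between reads, which is less friendly to caches.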
In addition to row-then-column iteration, there are two techniques you can apply to maximize your array processing:
Pipelining - which helps maximize core utilization for sequential tasks
Parallel For loops - which provide data-parallelism
After that, there are other more complex designs like structured grids. There is an NI white paper that describes multi-core programming in LabVIEW, including these and other approaches, in more detail.
I have two dimensions - Invoice_In and Invoice_Out. I need to create a new dimension Invoice which combines both of these. Is there any easy way to do this with a TI process (or any other way using TI or Performance Modeler)? Thanks.
Have you consulted the Reference Guide (TM1 TurboIntegrator Functions chapter) about this?
You could use the All subsets of the two dimensions as data sources and iterate through both on the Metadata tab using two processes (or a master process which calls the same process and passes it parameters). However, it would be just as easy, and more importantly would let you keep everything in one process, to do this on the Prolog tab with a data source of None:
Use DimensionExists as an argument to an If() block to determine whether the dimension Invoice exists;
If it doesn't, use DimensionCreate to create it. Add any consolidations that you want to add using DimensionElementInsert statements.
Use the DimSiz Rules function to get the number of elements in Invoice_In and Invoice_Out and store both in variables;
Your first loop iterates through Invoice_In using a While block to count from 1 to the DimSiz value.
In your loop you would obtain the existing element using DimNm(). (You will also need to use ElLev or DType if you want to obtain only the N level elements.) You insert each element into Invoice through DimensionElementInsert. You may also need to use DimensionElementComponentAdd to add it to any top level consolidation.
Your second loop would do exactly the same but for Invoice_Out.
Where you may run into issues is if you have the same element names in both dimensions. DimensionElementInsert won't spit the dummy over that but it will ignore the insertion when it's encountered the second time.
Do NOT call any other processes which are intended to refer to this new dimension in the Prolog. You need to cross the Metadata boundary to ensure that the new dimension is registered with the server.
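The logic of the two loops, including the duplicate-name behaviour, can be sketched outside TM1. The following is plain Python, not TurboIntegrator code; the function name and the sample element names are made up for the illustration:

```python
def merge_dimensions(*dimensions):
    """Combine element lists in order, skipping names that were already inserted."""
    merged = []
    seen = set()
    for elements in dimensions:      # one loop per source dimension
        for name in elements:        # analogous to counting from 1 to the dimension size
            if name in seen:
                continue             # a second insertion of the same name is ignored
            seen.add(name)
            merged.append(name)
    return merged

if __name__ == "__main__":
    invoice_in = ["INV001", "INV002", "INV003"]   # hypothetical elements
    invoice_out = ["INV003", "INV004"]            # hypothetical elements
    print(merge_dimensions(invoice_in, invoice_out))
```

An element that appears in both sources ends up in the combined list only once, in the position where it was first seen.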
Export the elements of both dimensions, then copy and paste both lists into one sheet.
Use the sheet as a data source, then use one line of code, DimensionElementInsert, in your TI process:
DimensionElementInsert(DimName, InsertionPoint, ElName, ElType);
Alternatively, use the existing dimensions as a source. Then you don't need to construct a file.
You can set the datasourcename and cycle through any number of dimensions.
(Note: the new dimension needs to exist, or you can create it within your TI process, depending on how much you want to code. I gave you the solution with the least coding.)
Link to the vi: see xy_plot_problem_withcase
In the attached vi (xy_plot_problem_updated.vi) I am able to get 3 individual values x, y and z in an array, element 0 being x, element 1 being y and element 2 being z.
These three values are produced on every iteration of the outer while loop. I would like to store all generated x values in one array, and the same for y and z, so that I can use the final arrays to generate one final graph.
The outer while loop runs 30 times and I would like to record the 30 different values generated at index 0 in a separate array. I tried using a shift register, Build Array, etc., but it just replaces element 1 (of the new array) with the newest element generated (the values are not accumulated).
I have encountered this problem while designing for a system which records 3 different readings for every 5 degree increase in temperature. I want to be able to plot the acquired values against the current temperature. Hence, the outer while loop is actually a case statement which gets triggered every time the temperature goes up by 5 degrees.
I have also attached the main VI (final.vi).
Any help appreciated!!
Thanks in advance!!!
In your final.vi you have a while loop; you should move everything in the case into the while loop. My advice would be to look at the LabVIEW fundamentals on data flow and on shift registers.
In your code you were resetting the shift register in the while loop on every iteration.
Try to clean up your code and use the Highlight Execution function (the light bulb).
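As a text-based analogy for the shift register problem (plain Python, not LabVIEW; `read_xyz` is a hypothetical stand-in for the measurement), the difference between initialising the accumulator once and resetting it on every iteration looks like this:

```python
def read_xyz(i):
    """Hypothetical stand-in for one measurement; returns [x, y, z]."""
    return [i, 2 * i, 3 * i]

def collect(iterations=30):
    """Accumulators initialised once, outside the loop (like an initialised shift register)."""
    xs, ys, zs = [], [], []
    for i in range(iterations):
        x, y, z = read_xyz(i)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs

def collect_with_reset(iterations=30):
    """Same loop, but the accumulator is re-created on every iteration."""
    xs = []
    for i in range(iterations):
        xs = []                      # reset: earlier values are thrown away
        xs.append(read_xyz(i)[0])
    return xs

if __name__ == "__main__":
    xs, ys, zs = collect()
    print(len(xs), len(ys), len(zs))
    print(len(collect_with_reset()))
```

The second version shows the symptom from the question: only the newest value survives, because the array that should carry the history is replaced each time round the loop.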