Get average of two consecutive values in a vector depending on logical vector

I am reading data from a file and trying to do some manipulation on the vector containing the data. Basically, I want to check whether the values come from consecutive lines; if so, I want to average each pair and put the result in an output vector.
Here is part of the data and line numbers:
lines=[153 152 153 154 233 233 234 235 280 279 280 281];
Sail=[3 4 3 1.5 3 3 1 2 2.5 5 2.5 2];
Here is what I am doing:
Sail=S(lines);
Y=diff(lines)==1;
for ii=1:length(Y)
if Y(ii)
output(ceil(ii/2))=(Sail(ii)+Sail(ii+1))/2;
end
end
Is this correct? Also, is there a way to do this without a for loop?
Thanks

My suggestion:
y = find(diff(lines)==1);
output = mean([Sail(y);Sail(y+1)]);
This assumes that when you have, say, [233 234 235], you want one value averaging the values from lines [233 234] and one value averaging those from [234 235]. If you wanted to do something more complex when longer runs of consecutive lines exist in your data, then the problem becomes more complex.
Incidentally, it's a bad idea to do something like output(ceil(ii/2)): you can't guarantee a unique index for each matching value of ii. If you did want an output the same size as Sail (it will have zeros in the non-matching positions), then you can do something like this:
output2 = zeros(size(Sail));
output2(y)=output;


How can I plot the last 20 points from a file in gnuplot?

I have a big data file and I want to plot it as a GIF in gnuplot. My file represents the trajectory of 20 particles. I have tried: do for [a=0:70000:10000] {plot 'posicion.dat' i 0:a u 2:3}. This shows the complete trajectory, but I only want to show the last point of the trajectory of each particle.
How can I plot the last 20 points from a file in gnuplot?
Thank you!
To my knowledge there is no direct command to plot the last N lines.
If your data doesn't contain double empty lines you could do it with every (check help every).
You could also make a system call (e.g. under Linux using tail) to pass only the last N lines to gnuplot.
However, if you want a platform-independent gnuplot-only solution and if your data consists of lines which are all separated by two blank lines you could do the following:
determine the number of blocks via stats stored in the variable STATS_blocks
plot the last M blocks in a loop (keep in mind: numbering starts from 0)
Check help stats, help for and help index.
However, mind the difference: what is called "blocks" in connection with every is not the same as what is called "blocks" in connection with stats.
The following example will plot the last 2 lines (blocks).
I hope you can adapt it to your data.
Script:
### plot the last N blocks
reset session
$Data <<EOD
1 10 11


2 20 21


3 30 31


4 40 41


5 50 51


6 60 61
EOD
stats $Data u 0 nooutput
N = STATS_blocks
M = 2 # M last values
set offset 10,10,10,10 # just to get some space to the border
plot for [i=1:M] $Data index N-M+i-1 u 2:3 w lp pt 7 lc i ti sprintf("Particle %d",i)
### end of script
Result: a plot showing only the last two data points, labelled "Particle 1" and "Particle 2" (image omitted).

Alternate for Array Sum Formula

Table copied as Text
Column1  Column2  Column3  Column4  Column5  Column6
A        AA       AAA      100      95       92
A        AA       AAA      85       83       81
A        AA       BBB      200      199      160
A        BB       AAA      65       55       49
B        AA       AAA      89       88       83
B        AA       BBB      150      149      145
B        BB       AAA      140      135
B        BB       BBB      190      185
B        AA       AAA      510

Expected result (for the criterion AA in Column2):

         AAA      BBB
A        173      160
B        593      145
And some more explanation:
Basically, I want the sum of Column 6 for the given criteria, but the data in Column 6 can only be entered after some delay w.r.t. Columns 1, 2, 3 & 4.
Until the Column 6 data is entered, I want Excel to use the number available in Column 5, which is also entered after some delay w.r.t. Columns 1, 2, 3 & 4, but before Column 6.
And until the Column 5 data is entered, I want Excel to use the number available in Column 4.
Now, I am familiar with the two SUM/IF arrangements included below in this post.
The first is an array SUM/IF arrangement, which is convenient to write but results in a terribly long calculation time: 1.5 seconds for just one column, and I have over 100 columns in one sheet and about 9 sheets.
The second uses SUMIFS, which requires extensive time to write but gives a relatively better calculation time of 0.5 seconds per column, which is still quite high.
Now I need to do away with the array arrangement, but doing so will take quite some time, and I want to know if there is any better/other arrangement.
Just let me know of another arrangement that can get the required result and I will check it for calculation timing. If the other arrangement is also convenient to write, that is a plus.
This is my table:
And I want to add up the rightmost columns which are not empty, i.e. have a number in them, but with the criteria for the first three columns, in cell D15.
I only found the option to add an image. Please let me know how to upload an Excel file.
Can somebody please suggest an alternative to this array formula so it can calculate much faster?
{=SUM(
IF(
($B$2:$B$10=$C15)*
($C$2:$C$10=$C$13)*
($D$2:$D$10=D$14)>0,
IF(
$G$2:$G$10<>"",
$G$2:$G$10,
IF(
$F$2:$F$10<>"",
$F$2:$F$10,
$E$2:$E$10))))}
I have tried the formula below, which reduces the calculation time to a third, but it is too much typing for the large data I am dealing with:
=SUMIFS(
$G$2:$G$10,
$B$2:$B$10,$C15,
$C$2:$C$10,$C$13,
$D$2:$D$10,H$14,
$G$2:$G$10,"<>"&"")
+SUMIFS(
$F$2:$F$10,
$B$2:$B$10,$C15,
$C$2:$C$10,$C$13,
$D$2:$D$10,H$14,
$G$2:$G$10,"="&"",
$F$2:$F$10,"<>"&"")
+SUMIFS(
$E$2:$E$10,
$B$2:$B$10,$C15,
$C$2:$C$10,$C$13,
$D$2:$D$10,H$14,
$G$2:$G$10,"="&"",
$F$2:$F$10,"="&"")
If you're OK with using a helper column (which you should be), you can use this formula in a helper cell and drag it down. (In my example at the bottom, this formula is in cell H2, dragged down.)
= INDEX(E2:G2,MATCH(-1E+300,E2:G2,-1))
This gets the rightmost available value from Column 4, 5 or 6 into a single column.
Then you can use a simpler SUMIFS formula in cell D15:
= SUMIFS($H$2:$H$10, // Sum range (helper column)
$B$2:$B$10,$C15, // Criteria 1 (A or B)
$C$2:$C$10,$C$13, // Criteria 2 (AA or BB)
$D$2:$D$10,D$14) // Criteria 3 (AAA or BBB)
A working example is shown in the screenshot below (image omitted).
DISCLAIMER
This answer will simplify your formulas, but I'm not sure it will help with the performance problems you are experiencing. I don't see SUMIFS itself as the likely cause of long calculation times. You are probably experiencing them because other parts of your spreadsheet use inefficient formulas and/or formulas involving volatile cells, but that is just a guess, because I have no idea what the rest of your spreadsheet looks like.

Saving a hashtable using C so that random access is faster

I am writing a C program (call it "database generation") that processes an input file and generates a number in the range [1, 10^8], along with a sequence of float values whose length is fixed but unknown, followed by 3 integers. All values are separated by spaces.
Example:
19432 23.45 32.12 45.76 ...(156 such float values) 4 6 106
This will be one line of the database, where the first number is the hash index (1 to 10^8) and the last 3 integers denote the x, y coordinates and the document ID, respectively.
Our database is saved in a file xyz with the following content:
2341 34.67 43.13 ... (234 such float values) 5 8 123
2352 46.92 41.89 ... (51 such float values) 1 9 145
2352 46.92 41.89 ... (98 such float values) 2 7 12
2359 12.71 72.90 ... (141 such float values) 8 12 13
The starting number (hash index value) will always be in non-decreasing order in the database as we proceed from one line to the next.
I have another C program (call it "retrieval") which takes a hash index value as input and should output all lines starting with that value.
I have 2 questions:
How can I make sure that retrieval jumps directly to the line containing the requested hash index value, skipping the earlier lines of the database, so that its response is fast?
When I get another input file for the database and its hash index value is 2352, how do I add another line starting with 2352 at its proper position in the database?
I am considering the following approach, which is not ideal, as the database won't be organised in the required non-decreasing order of hash index values. Also, the database is split into 2 components: one contains a byte-offset entry for each hash index, and the other is the database file presented above.
It involves:
(1) byte-offset.txt of the form:
2341 byte-pos-1
2352 byte-pos-2
2359 byte-pos-3
2352 byte-pos-4
(2) database.txt of the form:
2341 34.67 43.13 ... (234 such float values) 5 8 123
2352 46.92 41.89 ... (51 such float values) 1 9 145
2359 12.71 72.90 ... (141 such float values) 8 12 13
2352 46.92 41.89 ... (98 such float values) 2 7 12
The only good thing about it is that new entries can be appended to the end of each file as the database grows when we get more data.
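To illustrate question 1 under this two-file scheme, here is a minimal C sketch of the retrieval side. It assumes the index stores numeric byte offsets as "hash offset" pairs; the file names and the buffer size are illustrative, not fixed:

/* Retrieval sketch for the byte-offset scheme proposed above.
 * Assumes byte-offset.txt holds "hash offset" pairs (numeric offsets)
 * and database.txt holds the records shown in the question. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s hash-index\n", argv[0]);
        return 1;
    }
    long wanted = atol(argv[1]);

    FILE *idx = fopen("byte-offset.txt", "r");
    FILE *db  = fopen("database.txt", "r");
    if (!idx || !db) { perror("fopen"); return 1; }

    long hash, offset;
    char line[8192];                      /* assumed large enough for one record */
    while (fscanf(idx, "%ld %ld", &hash, &offset) == 2) {
        if (hash != wanted)
            continue;                     /* index may be unsorted, so scan it all */
        /* jump straight to the record without reading the earlier lines */
        if (fseek(db, offset, SEEK_SET) == 0 && fgets(line, sizeof line, db))
            fputs(line, stdout);
    }
    fclose(idx);
    fclose(db);
    return 0;
}

Because the index is scanned linearly, appending a new (hash, offset) pair at the end (question 2) keeps working without reordering database.txt. If you instead keep the index sorted by hash index, a binary search over it makes each lookup O(log n); since index entries are tiny compared with the long float records, the whole index can usually be held in memory.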

How is an array sliced?

I have some sample code where the array is sliced as follows:
A = X(:,2:300)
What does this slicing of the array mean?
: stands for 'all' when used by itself, and 2:300 gives an array of integers from 2 to 300 with a spacing of 1 (the 1 is implicit) in MATLAB. 2:300 is the same as 2:1:300, and you can use any spacing you wish, for example 2:37:300 (result: [2 39 76 113 150 187 224 261 298]) to generate equally spaced numbers.
Your statement says: select every row of the matrix X and columns 2 to 300, and assign the result to A.

Find all possible row-wise sums in a 2D array

Ideally I'm looking for a C# solution, but any help with the algorithm will do.
I have a 2-dimensional array (x, y). The maximum number of columns (max x) varies between 2 and 10 but can be determined before the array is actually populated. The maximum number of rows (y) is fixed at 5, but each column can have a varying number of values, something like:
     1   2   3   4   5   6   7 ... 10
A    1   1   7   9   1   1
B    2   2   5   2   2
C    3   3
D    4
E    5
I need to come up with all possible row-wise sums for the purpose of looking for a specific total. That is, a row-wise total could be the cells A1 + B2 + A3 + B5 + D6 + A7 (any combination of one value from each column).
This process will be repeated several hundred times with different cell values each time, so I'm looking for a somewhat elegant solution (better than what I've been able to come up with). Thanks for your help.
The Problem Size
Let's first consider the worst case:
You have 10 columns and 5 (full) rows per column. It should be clear that (with an appropriate number population for each place) you will be able to get up to 5^10 ≈ 10^7 different results (the solution space).
For example, the following matrix will give you the worst case for 3 columns:
| 1 10 100 |
| 2 20 200 |
| 3 30 300 |
| 4 40 400 |
| 5 50 500 |
resulting in 5^3 = 125 different results. Each result corresponds to a choice {a1 a2 a3} with ai ∈ {1, ..., 5}.
It's quite easy to show that such a matrix will always exist for any number n of columns.
Now, to get each numerical result you need to do n-1 additions, adding up to a problem size of O(n 5^n). So that's the worst case, and I think nothing can be done about it, because to know the possible results you NEED to actually perform the sums.
More benign incarnations:
The problem complexity may be cut off in two ways:
Less numbers (i.e. not all columns are full)
Repeated results (i.e. several partial sums give the same result, and you can join them into one thread). Much more on this later.
Let's see a simplified example of the latter, with three rows and three columns:
| 7 6 100 |
| 3 4 200 |
| 1 2 200 |
At first sight you will need to do 2·3^3 sums. But that's not the real case: as you add up the first two columns you don't get the expected 9 different results, but only 6 ({13, 11, 9, 7, 5, 3}).
So you don't have to carry nine results on to the third column, but only six.
Of course, that comes at the expense of deleting the repeated numbers from the list. "Removal of Repeated Integer Elements" was discussed before on SO and I'll not repeat the discussion here; I'll just note that a mergesort, O(m log m) in the list size m, will remove the duplicates. If you want something easier, a double loop, O(m^2), will do.
Anyway, I'll not try to calculate the size of the (mean) problem this way, for several reasons. One of them is that the m in the merge sort is not the size of the problem, but the size of the vector of results after adding up any two columns, and that operation is repeated (n-1) times ... and I really don't want to do the math :(.
The other reason is that, as I implemented the algorithm, we will be able to use some experimental results and save ourselves from my surely leaky theoretical considerations.
The Algorithm
With what we said before, it is clear that we should optimize for the benign cases, as the worst case is a lost cause.
To do so, we need to use lists (or variable-dimension vectors, or whatever can emulate them) for the columns and do a merge after adding in each column.
The merge may be replaced by several other algorithms (such as insertion into a B-tree) without changing the results.
So the algorithm (procedural pseudocode) is something like:
Set result_vector to Column 1
For column i in (2 to n)
    Remove repeated integers in the result_vector
    Add every element of result_vector to every element of column i,
        giving a new result_vector
Next column
Remove repeated integers in the result_vector
Or, as you asked for it, a recursive version may work as follows:
function genResVector(a: list, b: list): returns list
local c: list
{
    Set c = CartesianProduct(a x b)
    Set c = Sum up each pair {a[i], b[j]} of c
    Drop repeated elements of c
    Return(c)
}
function RecursiveAdd(a: matrix, i: integer): returns list
{
    Return genResVector(Column i of a, RecursiveAdd(a, i - 1))
}
function RecursiveAdd(a: matrix, i == 0): returns list = {0}
Algorithm Implementation (Recursive)
I chose a functional language (Mathematica); I guess it's no big deal to translate it into any procedural one.
Our program has two functions:
genResVector, which sums two lists giving all possible results with repeated elements removed, and
recursiveAdd, which recurses on the matrix columns adding up all of them.
The code is:
genResVector[x_, y_] :=       (* a function that takes two lists as input *)
  Union[                      (* remove duplicates from the resulting list *)
    Apply[Plus,               (* "Plus" is the function to be distributed *)
      Tuples[{x, y}], {1}]];  (* over all pair combinations of the two lists *)

recursiveAdd[t_, i_] := genResVector[t[[i]], recursiveAdd[t, i - 1]];
                              (* recursive add function *)
recursiveAdd[t_, 0] := {0};   (* with its stopping case *)
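Since the translation to a procedural language really is no big deal, here is a rough C sketch of the same iterative algorithm, under some assumptions: the columns are hard-coded to the question's example, the buffers are sized to the worst-case product of the column lengths, and the helper names (dedupe, cmp_int) are mine, not from the original answer.

/* All possible one-per-column sums, with duplicate pruning after each column. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort and remove duplicates in place; returns the new length.
 * This is the O(m log m) "repeated integer removal" step. */
static size_t dedupe(int *v, size_t n)
{
    if (n == 0) return 0;
    qsort(v, n, sizeof *v, cmp_int);
    size_t w = 1;
    for (size_t r = 1; r < n; r++)
        if (v[r] != v[w - 1])
            v[w++] = v[r];
    return w;
}

int main(void)
{
    /* the question's example: one array per column, ragged lengths */
    int c1[] = {1, 2, 3, 4, 5}, c2[] = {1, 2, 3}, c3[] = {7, 5},
        c4[] = {9, 2}, c5[] = {1, 2}, c6[] = {1};
    int *cols[]   = {c1, c2, c3, c4, c5, c6};
    size_t lens[] = {5, 3, 2, 2, 2, 1};
    size_t ncols  = 6;

    size_t cap = 1;                       /* worst case: product of column lengths */
    for (size_t i = 0; i < ncols; i++) cap *= lens[i];
    int *res = malloc(cap * sizeof *res);
    int *tmp = malloc(cap * sizeof *tmp);
    if (!res || !tmp) return 1;

    size_t nres = 1;
    res[0] = 0;                           /* start from the empty sum */
    for (size_t i = 0; i < ncols; i++) {  /* fold in one column at a time */
        nres = dedupe(res, nres);         /* prune before expanding */
        size_t k = 0;
        for (size_t r = 0; r < nres; r++)
            for (size_t j = 0; j < lens[i]; j++)
                tmp[k++] = res[r] + cols[i][j];
        int *swap = res; res = tmp; tmp = swap;
        nres = k;
    }
    nres = dedupe(res, nres);

    for (size_t r = 0; r < nres; r++)     /* prints 11 12 ... 27 */
        printf("%d ", res[r]);
    printf("\n");
    free(res); free(tmp);
    return 0;
}

The dedupe() call inside the loop is exactly the repeated-integer removal step discussed above; dropping it gives the brute-force variant measured in Experiment 2 below.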
Test
If we take your example matrix
| 1  1  7  9  1  1 |
| 2  2  5  2  2    |
| 3  3             |
| 4                |
| 5                |
and run the program, the result is:
{11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
The maximum and minimum are very easy to verify since they correspond to taking the Min or Max from each column.
Some interesting results
Let's consider what happens when the numbers in each position of the matrix are bounded. For that, we will take a full (10 x 5) matrix and populate it with random integers.
In the extreme case where the integers are only zeros or ones, we may expect two things:
A very small result set
Fast execution, since there will be a lot of duplicate intermediate results
If we increase the range of our random integers, we may expect larger result sets and longer execution times.
Experiment 1: a 5x10 matrix populated with random integers of varying range (plot omitted).
It's clear enough that as the result set approaches the maximum result set size (5^10 ≈ 10^7), the calculation time and the number of distinct results approach an asymptote. The fact that we still see increasing functions just denotes that we are far from that point.
Moral: the smaller your elements are, the better your chances of getting it fast. This is because you are likely to have a lot of repetitions!
Note that our maximum calculation time is near 20 secs for the worst case tested.
Experiment 2: Optimizations that aren't
Having a lot of memory available, we can calculate by brute force, not removing the repeated results.
The result is interesting ... 10.6 secs! ... Wait! What happened? Our little "remove repeated integers" trick is eating up a lot of time; when there are not many results to remove, there is no gain, only the loss from trying to get rid of the repetitions.
But we may get a lot of benefit from the optimization when the maximum numbers in the matrix are well under 5·10^5. Remember that I'm doing these tests with the 5x10 matrix fully loaded.
The moral of this experiment is: the repeated-integer removal algorithm is critical.
HTH!
PS: I have a few more experiments to post, if I get the time to edit them.
