I have a C program that writes three files with the first column being the X values (clock cycles). The other columns are a set of metrics like % of memory usage, memory "holes", etc. Like I said before, there are three files like this (one per algorithm: first fit, best fit and worst fit).
Example - Headers: Clock Cycle, % Memory Usage and Number of "holes":
File 1 (First Fit):
1 20% 5
2 30% 9
3 70% 12
4 90% 3
File 2 (Best Fit):
1 15% 3
2 20% 5
3 80% 7
4 40% 3
5 60% 9
File 3 (Worst Fit):
1 15% 3
2 20% 5
3 80% 7
I would like to know if there is a way with gnuplot to generate one graph per metric comparing the three algorithms in those metrics.
By the way, sorry about my english, hope you guys understand.
set term pngcairo size 600,400
set output 'memory.png'
plot 'file1' using 1:2 w lp, \
'file2' using 1:2 w lp, \
'file3' using 1:2 w lp
set output 'holes.png'
plot 'file1' using 1:3 w lp, \
'file2' using 1:3 w lp, \
'file3' using 1:3 w lp
.......
(it would be better if you get rid of '%' in second column)
Related
I have a big file in gnuplot and I want to plot them as a gif. My file represents the trajectory of 20 particles. I have tried: do for [a=0:70000:10000] {plot 'posicion.dat' i 0:a u 2:3}. This one sohws the completed trajectory but I only want to show the last point of the trajectory of each particle.
How can I plot the last 20 points from a file in gnuplot?
Thank you!
To my knowledge there is no direct command to plot the last N lines.
If your data doesn't contain double empty lines you could do it with every (check help every).
You could also make a system call (e.g. under Linux using tail) to pass only the last N lines to gnuplot.
However, if you want a platform-independent gnuplot-only solution and if your data consists of lines which are all separated by two blank lines you could do the following:
determine the number of blocks via stats stored in the variable STATS_blocks
plot the last M blocks in a loop (keep in mind: numbering starts from 0)
Check help stats, help for and help index.
However, mind the difference: what is called "blocks" together with every is not the same what is called "blocks" together with stats.
The following example will plot the last 2 lines (blocks).
I hope you can adapt it to your data.
Script:
### plot the last N blocks
reset session
$Data <<EOD
1 10 11
2 20 21
3 30 31
4 40 41
5 50 51
6 60 61
EOD
stats $Data u 0 nooutput
N = STATS_blocks
M = 2 # M last values
set offset 10,10,10,10 # just to get some space to the border
plot for [i=1:M] $Data index N-M+i-1 u 2:3 w lp pt 7 lc i ti sprintf("Particle %d",i)
### end of script
Result:
What is the preferred approach in J for selectively summing multiple axes of an array?
For instance, suppose that a is the following rank 3 array:
]a =: i. 2 3 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
My goal is to define a dyad "sumAxes" to sum over multiple axes of my choosing:
0 1 sumAxes a NB. 0+4+8+12+16+20 ...
60 66 72 78
0 2 sumAxes a NB. 0+1+2+3+12+13+14+15 ...
60 92 124
1 2 sumAxes a NB. 0+1+2+3+4+5+6+7+8+9+10+11 ...
66 210
The way that I am currently trying to implement this verb is to use the dyad |: to first permute the axes of a, and then ravel the items of the necessary rank using ,"n (where n is the number axes I want to sum over) before summing the resulting items:
sumAxes =: dyad : '(+/ # ,"(#x)) x |: y'
This appears to work as I want, but as a beginner in J I am unsure if I am overlooking some aspect of rank or particular verbs that would enable a cleaner definition. More generally I wonder whether permuting axes, ravelling and summing is idiomatic or efficient in this language.
For context, most of my previous experience with array programming is with Python's NumPy library.
NumPy does not have J's concept of rank and instead expects the user to explicitly label the axes of an array to reduce over:
>>> import numpy
>>> a = numpy.arange(2*3*4).reshape(2, 3, 4) # a =: i. 2 3 4
>>> a.sum(axis=(0, 2)) # sum over specified axes
array([ 60, 92, 124])
As a footnote, my current implementation of sumAxes has the disadvantage of working "incorrectly" compared to NumPy when just a single axis is specified (as rank is not interchangeable with "axis").
Motivation
J has incredible facilities for handling arbitrarily-ranked arrays. But there's one facet of the language which is simultaneously almost universally useful as well as justified, but also somewhat antithetical to this dimensionality-agnostic nature.
The major axis (in fact, leading axes in general) are implicitly privileged. This is the concept that underlies, e.g. # being the count of items (i.e. the dimension of the first axis), the understated elegance and generality of +/ without further modification, and a host of other beautiful parts of the language.
But it's also what accounts for the obstacles you're meeting in trying to solve this problem.
Standard approach
So the general approach to solving the problem is just as you have it: transpose or otherwise rearrange the data so the axes that interest you become leading axes. Your approach is classic and unimpeachable. You can use it in good conscience.
Alternative approaches
But, like you, it niggles me a bit that we are forced to jump through such hoops in similar circumstances. One clue that we're kind of working against the grain of the language is the dynamic argument to the conjunction "(#x); usually arguments to conjunctions are fixed, and calculating them at runtime often forces us to use either explicit code (as in your example) or dramatically more complicated code. When the language makes something hard to do, it's usually a sign you're cutting against the grain.
Another is that ravel (,). It's not just that we want to transpose some axes; it's that we want to focus on one specific axis, and then run all the elements trailing it into a flat vector. Though I actually think this reflects more a constraint imposed by how we're framing the problem, rather than one in the notation. More on in the final section of this post.
With that, we might feel justified in our desire to address a non-leading axis directly. And, here and there, J provides primitives that allow us to do exactly that, which might be a hint that the language's designers also felt the need to include certain exceptions to the primacy of leading axes.
Introductory examples
For example, dyadic |. (rotate) has ranks 1 _, i.e. it takes a vector on the left.
This is sometimes surprising to people who have been using it for years, never having passed more than a scalar on the left. That, along with the unbound right rank, is another subtle consequence of J's leading-axis bias: we think of the right argument as a vector of items, and the left argument as a simple, scalar rotation value of that vector.
Thus:
3 |. 1 2 3 4 5 6
4 5 6 1 2 3
and
1 |. 1 2 , 3 4 ,: 5 6
3 4
5 6
1 2
But in this latter case, what if we didn't want to treat the table as a vector of rows, but as a vector of columns?
Of course, the classic approach is to use rank, to explicitly denote the the axis we're interested in (because leaving it implicit always selects the leading axis):
1 |."1 ] 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Now, this is perfectly idiomatic, standard, and ubiquitous in J code: J encourages us to think in terms of rank. No one would blink an eye on reading this code.
But, as described at the outset, in another sense it can feel like a cop-out, or manual adjustment. Especially when we want to dynamically choose the rank at runtime. Notationally, we are now no longer addressing the array as a whole, but addressing each row.
And this is where the left rank of |. comes in: it's one of those few primitives which can address non-leading axes directly.
0 1 |. 1 2 , 3 4 ,: 5 6
2 1
4 3
6 5
Look ma, no rank! Of course, we now have to specify a rotation value for each axis independently, but that's not only ok, it's useful, because now that left argument smells much more like something which can be calculated from the input, in true J spirit.
Summing non-leading axes directly
So, now that we know J lets us address non-leading axes in certain cases, we simply have to survey those cases and identify one which seems fit for our purpose here.
The primitive I've found most generally useful for non-leading-axis work is ;. with a boxed left-hand argument. So my instinct is to reach for that first.
Let's start with your examples, slightly modified to see what we're summing.
]a =: i. 2 3 4
sumAxes =: dyad : '(< # ,"(#x)) x |: y'
0 1 sumAxes a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
0 2 sumAxes a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
1 2 sumAxes a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
The relevant part of the definition of for dyads derived from ;.1 and friends is:
The frets in the dyadic cases 1, _1, 2 , and _2 are determined by the 1s in boolean vector x; an empty vector x and non-zero #y indicates the entire of y. If x is the atom 0 or 1 it is treated as (#y)#x. In general, boolean vector >j{x specifies how axis j is to be cut, with an atom treated as (j{$y)#>j{x.
What this means is: if we're just trying to slice an array along its dimensions with no internal segmentation, we can simply use dyad cut with a left argument consisting solely of 1s and a:s. The number of 1s in the vector (ie. the sum) determines the rank of the resulting array.
Thus, to reproduce the examples above:
('';'';1) <#:,;.1 a
+--------------+--------------+---------------+---------------+
|0 4 8 12 16 20|1 5 9 13 17 21|2 6 10 14 18 22|3 7 11 15 19 23|
+--------------+--------------+---------------+---------------+
('';1;'') <#:,;.1 a
+-------------------+-------------------+---------------------+
|0 1 2 3 12 13 14 15|4 5 6 7 16 17 18 19|8 9 10 11 20 21 22 23|
+-------------------+-------------------+---------------------+
(1;'';'') <#:,;.1 a
+-------------------------+-----------------------------------+
|0 1 2 3 4 5 6 7 8 9 10 11|12 13 14 15 16 17 18 19 20 21 22 23|
+-------------------------+-----------------------------------+
Et voila. Also, notice the pattern in the left hand argument? The two aces are exactly at the indices of your original calls to sumAxe. See what I mean by the fact that providing a value for each dimension smelling like a good thing, in the J spirit?
So, to use this approach to provide an analog to sumAxe with the same interface:
sax =: dyad : 'y +/#:,;.1~ (1;a:#~r-1) |.~ - {. x -.~ i. r=.#$y' NB. Explicit
sax =: ] +/#:,;.1~ ( (] (-#{.#] |. 1 ; a: #~ <:#[) (-.~ i.) ) ##$) NB. Tacit
Results elided for brevity, but they're identical to your sumAxe.
Final considerations
There's one more thing I'd like to point out. The interface to your sumAxe call, calqued from Python, names the two axes you'd like "run together". That's definitely one way of looking at it.
Another way of looking at it, which draws upon the J philosophies I've touched on here, is to name the axis you want to sum along. The fact that this is our actual focus is confirmed by the fact that we ravel each "slice", because we do not care about its shape, only its values.
This change in perspective to talk about the thing you're interested in, has the advantage that it is always a single thing, and this singularity permits certain simplifications in our code (again, especially in J, where we usually talk about the [new, i.e. post-transpose] leading axis)¹.
Let's look again at our ones-and-aces vector arguments to ;., to illustrate what I mean:
('';'';1) <#:,;.1 a
('';1;'') <#:,;.1 a
(1;'';'') <#:,;.1 a
Now consider the three parenthesized arguments as a single matrix of three rows. What stands out to you? To me, it's the ones along the anti-diagonal. They are less numerous, and have values; by contrast the aces form the "background" of the matrix (the zeros). The ones are the true content.
Which is in contrast to how our sumAxe interface stands now: it asks us to specify the aces (zeros). How about instead we specify the 1, i.e. the axis that actually interests us?
If we do that, we can rewrite our functions thus:
xas =: dyad : 'y +/#:,;.1~ (-x) |. 1 ; a: #~ _1 + #$y' NB. Explicit
xas =: ] +/#:,;.1~ -#[ |. 1 ; a: #~ <:###$#] NB. Tacit
And instead of calling 0 1 sax a, you'd call 2 xas a, instead of 0 2 sax a, you'd call 1 xas a, etc.
The relative simplicity of these two verbs suggests J agrees with this inversion of focus.
¹ In this code I'm assuming you always want to collapse all axes except 1. This assumption is encoded in the approach I use to generate the ones-and-aces vector, using |..
However, your footnote sumAxes has the disadvantage of working "incorrectly" compared to NumPy when just a single axis is specified suggests sometimes you want to only collapse one axis.
That's perfectly possible and the ;. approach can take arbitrary (orthotopic) slices; we'd only need to alter the method by which we instruct it (generate the 1s-and-aces vector). If you provide a couple examples of generalizations you'd like, I'll update the post here. Probably just a matter of using (<1) x} a: #~ #$y or ((1;'') {~ (e.~ i.###$)) instead of (-x) |. 1 ; a:#~<:#$y.
I want to create a matrix from a vector by concatenating the vector onto itself n times. So if my vector is mx1, then my matrix will be mxn and each column of the matrix will be equal to the vector.
Which of the following is the best/correct way, or maybe there is a better way I do not know?
matrix = repmat(vector, 1, n);
matrix = vector * ones(1, n);
Thanks
Here is some benchmarking using timeit with different vector sizes and repetition factors. The results to be shown are for Matlab R2015b on Windows.
First define a function for each of the considered approaches:
%// repmat approach
function matrix = f_repmat(vector, n)
matrix = repmat(vector, 1, n);
%// multiply approach
function matrix = f_multiply(vector, n)
matrix = vector * ones(1, n);
%// indexing approach
function matrix = f_indexing(vector,n)
matrix = vector(:,ones(1,n));
Then generate vectors of different size, and use different repetition factors:
M = round(logspace(2,4,15)); %// vector sizes
N = round(logspace(2,3,15)); %// repetition factors
time_repmat = NaN(numel(M), numel(N)); %// preallocate results
time_multiply = NaN(numel(M), numel(N));
time_indexing = NaN(numel(M), numel(N));
for ind_m = 1:numel(M);
for ind_n = 1:numel(N);
vector = (1:M(ind_m)).';
n = N(ind_n);
time_repmat(ind_m, ind_n) = timeit(#() f_repmat(vector, n)); %// measure time
time_multiply(ind_m, ind_n) = timeit(#() f_multiply(vector, n));
time_indexing(ind_m, ind_n) = timeit(#() f_indexing(vector, n));
end
end
The results are plotted in the following two figures, using repmat as reference:
figure
imagesc(time_multiply./time_repmat)
set(gca, 'xtick',1:2:numel(N), 'xticklabels',N(1:2:end))
set(gca, 'ytick',1:2:numel(M), 'yticklabels',M(1:2:end))
title('Time of multiply / time of repmat')
axis image
colorbar
figure
imagesc(time_indexing./time_repmat)
set(gca, 'xtick',1:2:numel(N), 'xticklabels',N(1:2:end))
set(gca, 'ytick',1:2:numel(M), 'yticklabels',M(1:2:end))
title('Time of indexing / time of repmat')
axis image
colorbar
Perhaps a better comparison is to indicate, for each tested vector size and repetition factor, which of the three approaches is the fastest:
figure
times = cat(3, time_repmat, time_multiply, time_indexing);
[~, fastest] = min(times, [], 3);
imagesc(fastest)
set(gca, 'xtick',1:2:numel(N), 'xticklabels',N(1:2:end))
set(gca, 'ytick',1:2:numel(M), 'yticklabels',M(1:2:end))
title('1: repmat is fastest; 2: multiply is; 3: indexing is')
axis image
colorbar
Some conclusions can be drawn from the figures:
The multiply-based approach is always slower than repmat
The indexing-based approach is similar to repmat. It tends to be faster for large values of vector size or repetition factor, and slower for small values.
Either method is correct if they provide you with the desired output.
However, depending on how you declare your vector you may get incorrect results with repmat that will be spotted if you use ones. For instance take this example
>> v = 1:10;
>> m = v * ones(1, n)
Error using *
Inner matrix dimensions must agree.
>> m = repmat(v, 1, n)
m =
Columns 1 through 22
1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2
Columns 23 through 44
3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 1 2 3 4
Columns 45 through 50
5 6 7 8 9 10
ones provides an error to let you know you aren't doing the right thing but repmat doesn't. Whilst this example works correctly with both repmat and ones
>> v = (1:10).';
>> m = v * ones(1, n)
m =
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
>> m = repmat(v, 1, n)
m =
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
You can also do this -
vector(:,ones(1,n))
But, if I have to choose, repmat would be the go-to approach for me, as it is made exactly for this purpose. Also, depending on how you are going to use this replicated array, you can just avoid creating it altogether with bsxfun that does on-the-fly replication on its input arrays and some operation to be applied on the inputs. Here's a comparison on that - Comparing BSXFUN and REPMAT that shows bsxfun to be better than repmat in most cases.
Benchmarking
For the sake of performance, let's test out these. Here's a benchmarking code to do so -
%// Inputs
vector = rand(1000,1);
n = 1000;
%// Warm up tic/toc.
for iter = 1:50000
tic(); elapsed = toc();
end
disp(' ------- With REPMAT -------')
tic,
for iter = 1:200
A = repmat(vector, 1, n);
end
toc, clear A
disp(' ------- With vector(:,ones(1,n)) -------')
tic,
for iter = 1:200
A = vector(:,ones(1,n));
end
toc, clear A
disp(' ------- With vector * ones(1, n) -------')
tic,
for iter = 1:200
A = vector * ones(1, n);
end
toc
Runtime results -
------- With REPMAT -------
Elapsed time is 1.241546 seconds.
------- With vector(:,ones(1,n)) -------
Elapsed time is 1.212566 seconds.
------- With vector * ones(1, n) -------
Elapsed time is 3.023552 seconds.
Both are correct, but repmat is a more general solution for multi-dimensional matrix copying and is thus bound to be slower than an other solution. The specific 'homemade' solution of multiplying two vectors is possibly faster. It is probably even faster to do selecting instead of multiplying, i.e. vector(:,ones(n,1)) instead of vector*ones(1,n).
EDIT:
Type open repmat in your Command Window. As you can see, it is not a built-in function. You can see that it also makes use of ones (selecting) to copy matrices. However, since it is a more general solution (for scalars and multi-dimensional matrices and copies in multiple directions), you will find unnecessary if statements and other unnecessary code, effectively slowing things down.
EDIT:
Multiplying vectors with ones becomes slower for very large vectors. The unequivocal winner is using ones with selection, i.e. vector(:,ones(n,1)) (which should always be faster than repmat since it uses the same strategy).
I would like to plot data points included inside a script file.
This should be done multiple times (plotting to different files).
Therefore, I am using a do-for-loop.
This loop let's Gnuplot freeze on excution.
Could you please hint me to the cause?
This is my MWE:
reset
set autoscale
do for [index=1:1] {
plot "-" with lines ls 2 notitle
0.500 5
1.000 6
1.500 7
e
}
Yes, seems like the combination of do for with inline data isn't supported. It also wouldn't be very convenient, since this would require a separate data block for every iteration like in
set style data linespoints
plot '-' using 1:2, '-' using 1:3
1 2 3
4 5 6
e
1 2 3
4 5 6
e
With version 5.0 inline data blocks were introduced which allow reusing inline data:
$data <<EOD
1 2 3
4 5 6
EOD
do for [i=2:3] {
plot $data using 1:i w l
pause -1
}
Have sales and a time indicator as such:
time sales
1 6
2 7
1 5
3 4
2 4
5 7
4 3
3 2
5 1
5 4
3 1
4 9
1 8
I want the mean, stdev, and N of the above saved in a t (each time period has a row) X 4 (time period, mean, stdev, N) matrix.
For time = 5 the matrix would be:
time mean stdev N
... ... ... ...
5 4 3 3
... ... ... ...
Just for the mean I tried:
mat t1=J(5,1,0)
forval i = 1/5 {
summ sales if time == `i'
mat t1[`i']=r(mean)
}
However, I kept getting an error. Even if it worked I was unsure how to get the other (stdev and N) variables of interest.
You were probably aiming for something like
matrix t1 = J(5, 1, .)
forvalues i = 1/5 {
summarize sales if time == `i'
matrix t1[`i', 1] = r(mean)
}
matrix list t1
U[14.9] Subscripting specifies you need matname[r,c]. You were leaving out the second subscript. In Mata you are allowed to subscript vectors in this way but you never enter Mata.
An alternative is
forval i = 1/5 {
summarize sales if time == `i'
matrix t1 = (nullmat(t1) \ r(mean))
}
With the latter, you have no need of declaring the matrix beforehand. See help nullmat().
But it's probably easiest to use collapse and get all information in one step:
clear all
set more off
input ///
time sales
1 6
2 7
1 5
3 4
2 4
5 7
4 3
3 2
5 1
5 4
3 1
4 9
1 8
end
collapse (mean) msales=sales (sd) sdsales=sales ///
(count) csales=sales, by(time)
list
Note that count counts nonmissing observations only.
If you want a matrix then convert the variables using mkmat, after the collapse:
mkmat time msales sdsales csales, matrix(summatrix)
matrix list summatrix