I have an array of data frames, all with the same columns. I would like to automatically plot a separate line of a specific column for each of these data frames. Something like:
plot(array[]$time, array[]$data)
Is something like that possible, or do I have to loop over the data frames and add a lines() call for each one?
Edit
I beg your pardon, in fact what I created is a list.
Basically I have two tables: connections, which lists information about the different TCP connections:
src | src_port | dst | dst_port
and probes that contains timeseries data regarding packets and data transmitted:
timestamp | src | src_port | dst | dst_port | packets | bytes
So, to plot the time series of all the different connections, I created a list of data frame subsets, like this:
connection <- vector(mode='list', length = nrow(connections))
for (row in 1:nrow(connections)){
connection[[row]] <- subset(probes, src == connections[row, 'src'] & src_port == connections[row, 'src_port'] & dst == connections[row, 'dst'] & dst_port == connections[row, 'dst_port'])
}
What I want to obtain is a plot of all these subsets, with the timestamp on the x axis and the bytes on the y axis, drawing a separate time series for each connection.
I hope this clarifies the problem.
Here's a reproducible example of plotting multiple data frames stored in a three-dimensional array. Note the need to use "[[" to index the elements, and that the default plot type is points rather than lines; you could change that with type="l":
dfarray <- array( list(), c(2,3,4))
dfarray[[1,1,1]] <- data.frame(a=2:4, b=6:8) # need to use "[["; the column must be named b for the formula below
dfarray[[1,1,2]] <- data.frame(a=2:4, b=8:10)
dfarray[[1,1,3]] <- data.frame(a=2:4, b=10:12)
# Set ylim wide enough to hold all the lines
png(); plot(b ~a, data=dfarray[[1,1,1]], ylim=c(5,12) )
for( x in 2:3) {lines( b~a, data=dfarray[[1,1,x]], col=x)}
dev.off()
This was quite interesting. I think we can generalise the for loop like this:
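# skip the empty (NULL) cells of the array; ylim only matters in the initial plot() call
invisible(lapply(X = Filter(Negate(is.null), c(dfarray)), FUN = function(x) lines(x = x$a, y = x$b)))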
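The loop in the question that builds `connection` calls subset() once per connection, rescanning probes every time. The same grouping can be done in a single pass by keying each row on its (src, src_port, dst, dst_port) tuple; in R, split() with drop = TRUE does this kind of grouping. A minimal Python sketch of the idea (not R), assuming each probe row is a dict with the question's column names; `group_by_connection` and the sample rows are made up for illustration:

```python
from collections import defaultdict


def group_by_connection(probes):
    """Group probe rows into one (timestamp, bytes) series per TCP connection.

    Each row is assumed to be a dict with the keys timestamp, src, src_port,
    dst, dst_port and bytes (column names taken from the question).
    """
    series = defaultdict(list)
    for row in probes:
        key = (row["src"], row["src_port"], row["dst"], row["dst_port"])
        series[key].append((row["timestamp"], row["bytes"]))
    # Sort each series by timestamp so it can be drawn as a line.
    return {key: sorted(points) for key, points in series.items()}


# Made-up sample rows with the same columns as the probes table.
probes = [
    {"timestamp": 2, "src": "10.0.0.1", "src_port": 5000, "dst": "10.0.0.2",
     "dst_port": 80, "packets": 3, "bytes": 900},
    {"timestamp": 1, "src": "10.0.0.1", "src_port": 5000, "dst": "10.0.0.2",
     "dst_port": 80, "packets": 1, "bytes": 300},
    {"timestamp": 1, "src": "10.0.0.3", "src_port": 6000, "dst": "10.0.0.2",
     "dst_port": 443, "packets": 2, "bytes": 500},
]

for key, points in group_by_connection(probes).items():
    print(key, points)
```

Each resulting series corresponds to one element of the `connection` list and would become one lines() call.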
My small challenge is writing a loop over a data frame that is split by group, so that I can compute a correlation for each group.
An example of what I need to achieve for each species (Spp):
rbt<-subset(Trjan,Trjan$Spp=="Redbilled Teal")
cotest<-cor.test(rbt$year,rbt$abundance)
vals<-c(cotest$estimate,cotest$p.value)
vals # in the end I need a data frame with species, slope & p value, e.g. "Redbilled Teal" "its slope" "p value"
But because I have many species, I can't do this by hand for all of them. After following some examples I got the code below, but I am not referencing my variables correctly.
uniq <- unique(unlist(Trjan$Spp))
for (i in 1:length(uniq)){
data_1 <- subset(Trjan, Spp == uniq[i])
cor.test(year,abundance)
vals<-c(estimate,p.value)
}
# error: "abundance" not found
Any help is appreciated. I thought my small problem would not need sample data; if the need arises, I can edit.
I finally got help from a friend: I realised that I needed to create a new empty data frame to store the cor.test results for each species.
final.tab<-data.frame(Species=character(),cor_est=numeric(),cor_pval=numeric(),stringsAsFactors = FALSE)
uniq <- unique(unlist(Trjan$Spp))
for (i in seq_along(uniq)){
data_1 <- subset(Trjan, Spp == uniq[i])
#I had to create an object to store your cor.test results and add the object name (i.e. "data_1$" before your column name)
cor.test.temp<-cor.test(data_1$year,data_1$abundance)
# use list() rather than c(): c() would turn the numbers into character strings
vals<-list(as.character(uniq[i]),round(as.numeric(cor.test.temp$estimate),3),round(as.numeric(cor.test.temp$p.value),3))
#progressively filling in my data.frame with cor.test results
final.tab[i,]<-vals
}
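The pattern in the working answer is "split by group, run one test per group, collect one row per group". A minimal Python sketch of that pattern, assuming (species, year, abundance) records; it computes only the Pearson estimate that cor.test reports, because the p-value needs the t distribution, which the standard library does not provide. The function names and sample records are made up for illustration:

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient (the 'estimate' reported by cor.test)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(records):
    """One (species, r) row per species from (species, year, abundance) records."""
    groups = defaultdict(lambda: ([], []))  # species -> (years, abundances)
    for species, year, abundance in records:
        years, counts = groups[species]
        years.append(year)
        counts.append(abundance)
    return [(sp, round(pearson_r(ys, cs), 3)) for sp, (ys, cs) in groups.items()]


# Made-up records in the shape of the Trjan data.
records = [
    ("Redbilled Teal", 2000, 10), ("Redbilled Teal", 2001, 20), ("Redbilled Teal", 2002, 30),
    ("Egyptian Goose", 2000, 30), ("Egyptian Goose", 2001, 20), ("Egyptian Goose", 2002, 10),
]
print(correlation_table(records))
```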
I want to know whether it is possible, and if so how, to convert an array of 8 columns and 300+ rows into a sorted, concatenated, one-dimensional array where each row is the concatenation of the contents of the 8 columns. I would also like to achieve this using a single formula.
Example:
leg | dog | tom | jon | bar | | | |
foo | bin | git | hub | bet | far | day | bin |
...
would convert into:
bar dog jon leg tom
bet bin bin day far foo git hub
...
I can achieve this for a single row using this:
=arrayformula(CONCATENATE(transpose(sort(transpose(F2:M2),1,1))&" "))
as long as the 8 columns are from F to M
I can then copy this formula down 300+ times which is easy to do but I would like a single formula that populates n number of rows.
Can this be achieved or do I have to copy the formula down?
If I understood correctly, you should be able to do that with a formula like this
=ArrayFormula(transpose(query(transpose(A2:H8),,50000)))
Change the range to suit.
See also below picture.
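EDIT: An alternative may be to create a custom function (sorting included). Add this to the script editor:
function concatenateAndSort(range) {
  return range.map(function (r) {
    // drop blank cells so they do not turn into leading spaces, then sort and join
    return [r.filter(function (v) { return v !== ""; }).sort().join(" ")];
  });
}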
Then in the spreadsheet (where you want the output to appear) enter
=concatenateAndSort(A3:H8)
(Change range to suit).
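Per row, the custom function simply sorts the non-blank cells and joins them with spaces. A minimal Python sketch of the same per-row logic, assuming rows arrive as lists of strings with blanks as empty strings (`concatenate_and_sort` mirrors the Apps Script name but is only an illustration); like JavaScript's default sort, Python's sorted() compares by code point, so uppercase words sort before lowercase ones:

```python
def concatenate_and_sort(rows):
    """Return one space-joined string per row, with the row's non-blank cells sorted."""
    return [" ".join(sorted(cell for cell in row if cell != "")) for row in rows]


rows = [
    ["leg", "dog", "tom", "jon", "bar", "", "", ""],
    ["foo", "bin", "git", "hub", "bet", "far", "day", "bin"],
]
print(concatenate_and_sort(rows))
```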
Something like this should do that:
=transpose(split(" "&join(" ",index(sort(arrayformula({row(A1:A300),len(A1:A300)*0-9E+99
;row(A1:A300),A1:A300
;row(A1:A300),B1:B300
;row(A1:A300),C1:C300
;row(A1:A300),D1:D300
;row(A1:A300),E1:E300
;row(A1:A300),F1:F300
;row(A1:A300),G1:G300
;row(A1:A300),H1:H300
})),0,2))," "&-9E+99&" ",false))
First it creates a two-dimensional array with original row number in the first column and value in the second for each cell (adding a new value -9e99 for each row), then the array is sorted, first column is discarded, all values are joined using a space, then split (by the added value surrounded by spaces), and finally transposed.
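The same tag, sort, join and split steps can be traced outside a spreadsheet. A minimal Python sketch, assuming blank cells are skipped and every row has at least one value; in Sheets the numeric marker -9E+99 sorts before text on its own, whereas here the sort key puts the marker first explicitly, and the string `SENTINEL` merely stands in for it:

```python
SENTINEL = "-9E+99"  # stands in for the marker value the formula adds to each row


def concat_rows_with_sentinel(rows):
    """Emulate the single-formula approach: tag, sort, join, split."""
    pairs = []
    for r, row in enumerate(rows, start=1):
        pairs.append((r, SENTINEL))  # one marker per row
        for cell in row:
            if cell != "":
                pairs.append((r, cell))  # (original row number, value)
    # Sort by row number; within a row, the marker first, then the values.
    pairs.sort(key=lambda p: (p[0], p[1] != SENTINEL, p[1]))
    # Discard the row numbers and join everything with spaces.
    joined = " " + " ".join(value for _, value in pairs)
    # Split on the space-padded marker; drop the empty piece before the first one.
    return [part for part in joined.split(" " + SENTINEL + " ") if part]


rows = [
    ["leg", "dog", "tom", "jon", "bar", "", "", ""],
    ["foo", "bin", "git", "hub", "bet", "far", "day", "bin"],
]
print(concat_rows_with_sentinel(rows))
```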
=A2&" "&B2&" "&C2&" "&D2&" "&E2&" "&F2&" "&G2&" "&H2
=JOIN(" "; A2:H2)
Sorted row: =JOIN(" "; SORT(TRANSPOSE(A2:H2); 1; TRUE)), and then: Ctrl+Shift+⇩ (Down arrow) ... Ctrl+Enter
I am trying to write my own function for scaling up an input image using the nearest-neighbor interpolation algorithm. The problem is that I can see how it works but cannot find the algorithm itself. I will be grateful for any help.
Here's what I tried for scaling up the input image by a factor of 2:
function output = nearest(input)
[x,y]=size(input);
output = repmat(uint8(0),x*2,y*2);
[newwidth,newheight]=size(output);
for i=1:y
for j=1:x
xloc = round ((j * (newwidth+1)) / (x+1));
yloc = round ((i * (newheight+1)) / (y+1));
output(xloc,yloc) = input(j,i);
end
end
Here is the output after Mark's suggestion
This answer is more explanatory than trying to be concise and efficient. I think gnovice's solution is best in that regard. In case you are trying to understand how it works, keep reading...
Now, the problem with your code is that you are mapping locations from the input image to the output image, which is why you are getting the spotty output. Consider an example where the input image is all white and the output is initialized to black; we get the following:
What you should be doing is the opposite (from output to input). To illustrate, consider the following notation:
1           c               1                      scaleC*c
+-----------+ 1             +----------------------+ 1
|    |      |               |        |             |
|----o      |     <===      |        |             |
| (ii,jj)   |               |--------o             |
+-----------+ r             |      (i,j)           |
  inputImage                |                      |
                            |                      |
                            +----------------------+ scaleR*r
                                 outputImage
Note: I am using matrix notation (row/col), so:
i ranges on [1,scaleR*r] , and j on [1,scaleC*c]
and ii on [1,r], jj on [1,c]
The idea is that for each location (i,j) in the output image, we want to map it to the "nearest" location in the input image coordinates. Since this is a simple mapping we use the formula that maps a given x to y (given all the other params):
$$\frac{x - \mathrm{minX}}{\mathrm{maxX} - \mathrm{minX}} = \frac{y - \mathrm{minY}}{\mathrm{maxY} - \mathrm{minY}}$$
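In our case, x is the i/j coordinate and y is the ii/jj coordinate; both ranges start at 1, and the maxima are scaleR*r and scaleC*c for the output image and r and c for the input image. Substituting for each gives us: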
jj = (j-1)*(c-1)/(scaleC*c-1) + 1
ii = (i-1)*(r-1)/(scaleR*r-1) + 1
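To check the mapping numerically, here is a minimal Python port of the two formulas (not MATLAB), assuming a grayscale image stored as a list of rows and integer scale factors. Python's built-in round() rounds halves to even, so `matlab_round` re-implements MATLAB's round-half-away-from-zero for non-negative values; all function names here are made up for illustration:

```python
import math


def matlab_round(x):
    """Round half away from zero, like MATLAB's round, for non-negative x."""
    return math.floor(x + 0.5)


def nearest_index(i, n_in, n_out):
    """Map 1-based output index i to the nearest 1-based input index.

    Linear map sending 1 -> 1 and n_out -> n_in:
    ii = (i-1)*(n_in-1)/(n_out-1) + 1, then rounded.
    """
    if n_out == 1:  # avoid dividing by zero for a one-pixel output
        return 1
    return matlab_round((i - 1) * (n_in - 1) / (n_out - 1) + 1)


def resize_nearest(image, scale_r, scale_c):
    """Backward mapping: for every output pixel, copy the nearest input pixel."""
    r, c = len(image), len(image[0])
    out_r, out_c = scale_r * r, scale_c * c
    return [[image[nearest_index(i, r, out_r) - 1][nearest_index(j, c, out_c) - 1]
             for j in range(1, out_c + 1)]
            for i in range(1, out_r + 1)]


print([nearest_index(i, 3, 6) for i in range(1, 7)])
print(resize_nearest([[1, 2], [3, 4]], 2, 2))
```

Because every output pixel looks up an input pixel, no output location is left unassigned, which is what removes the spotty pattern.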
Putting pieces together, we get the following code:
% read a sample image
inputI = imread('coins.png');
[r,c] = size(inputI);
scale = [2 2]; % you could scale each dimension differently
outputI = zeros(scale(1)*r,scale(2)*c, class(inputI));
for i=1:scale(1)*r
for j=1:scale(2)*c
% map from output image location to input image location
ii = round( (i-1)*(r-1)/(scale(1)*r-1)+1 );
jj = round( (j-1)*(c-1)/(scale(2)*c-1)+1 );
% assign value
outputI(i,j) = inputI(ii,jj);
end
end
figure(1), imshow(inputI)
figure(2), imshow(outputI)
A while back I went through the code of the imresize function in the MATLAB Image Processing Toolbox to create a simplified version for just nearest neighbor interpolation of images. Here's how it would be applied to your problem:
%# Initializations:
scale = [2 2]; %# The resolution scale factors: [rows columns]
oldSize = size(inputImage); %# Get the size of your image
newSize = max(floor(scale.*oldSize(1:2)),1); %# Compute the new image size
%# Compute an upsampled set of indices:
rowIndex = min(round(((1:newSize(1))-0.5)./scale(1)+0.5),oldSize(1));
colIndex = min(round(((1:newSize(2))-0.5)./scale(2)+0.5),oldSize(2));
%# Index old image to get new image:
outputImage = inputImage(rowIndex,colIndex,:);
Another option would be to use the built-in interp2 function, although you mentioned not wanting to use built-in functions in one of your comments.
EDIT: EXPLANATION
In case anyone is interested, I thought I'd explain how the solution above works...
newSize = max(floor(scale.*oldSize(1:2)),1);
First, to get the new row and column sizes the old row and column sizes are multiplied by the scale factor. This result is rounded down to the nearest integer with floor. If the scale factor is less than 1 you could end up with a weird case of one of the size values being 0, which is why the call to max is there to replace anything less than 1 with 1.
rowIndex = min(round(((1:newSize(1))-0.5)./scale(1)+0.5),oldSize(1));
colIndex = min(round(((1:newSize(2))-0.5)./scale(2)+0.5),oldSize(2));
Next, a new set of indices is computed for both the rows and columns. First, a set of indices for the upsampled image is computed: 1:newSize(...). Each image pixel is considered as having a given width, such that pixel 1 spans from 0 to 1, pixel 2 spans from 1 to 2, etc. The "coordinate" of the pixel is thus treated as the center, which is why 0.5 is subtracted from the indices. These coordinates are then divided by the scale factor to give a set of pixel-center coordinates for the original image, which then have 0.5 added to them and are rounded off to get a set of integer indices for the original image. The call to min ensures that none of these indices are larger than the original image size oldSize(...).
outputImage = inputImage(rowIndex,colIndex,:);
Finally, the new upsampled image is created by simply indexing into the original image.
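The three steps can be traced for a single dimension with a small Python port (not MATLAB), assuming 1-based indices as in MATLAB and a list-of-rows grayscale image; `math.floor(x + 0.5)` stands in for MATLAB's round on these non-negative values, and the function names are made up for illustration:

```python
import math


def upsampled_indices(old_size, scale):
    """1-based source indices for one dimension, following the steps above."""
    new_size = max(math.floor(scale * old_size), 1)  # newSize = max(floor(scale*oldSize),1)
    indices = []
    for k in range(1, new_size + 1):
        # k - 0.5 is the centre of output pixel k; dividing by scale maps it
        # into input coordinates, and adding 0.5 before rounding picks the
        # input pixel whose span contains it.
        idx = math.floor((k - 0.5) / scale + 0.5 + 0.5)
        indices.append(min(idx, old_size))  # never exceed the original size
    return indices


def resize_by_indexing(image, scale_r, scale_c):
    """Equivalent of inputImage(rowIndex, colIndex): plain indexing."""
    rows = upsampled_indices(len(image), scale_r)
    cols = upsampled_indices(len(image[0]), scale_c)
    return [[image[r - 1][c - 1] for c in cols] for r in rows]


print(upsampled_indices(3, 2))
print(upsampled_indices(4, 0.5))
print(resize_by_indexing([[1, 2], [3, 4]], 2, 2))
```

With a scale factor below 1 the same code downsamples, and the max and min guards handle the edge cases described above.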
MATLAB has already done it for you. Use imresize:
output = imresize(input,size(input)*2,'nearest');
or if you want to scale both x & y equally,
output = imresize(input,2,'nearest');
You just need a more generalized formula for calculating xloc and yloc.
xloc = (j * (newwidth+1)) / (x+1);
yloc = (i * (newheight+1)) / (y+1);
This assumes your variables have enough range for the multiplication results.
I am trying to parallelize a customer's Fortran code with MPI. f is an array of 4-byte reals dimensioned f(dimx,dimy,dimz,dimf). I need the various processes to work on different parts of the array's first dimension. (I would rather have started with the last, but it wasn't up to me.) So I define a derived type mpi_x_interface like so:
call mpi_type_vector(dimy*dimz*dimf, 1, dimx, MPI_REAL, &
mpi_x_interface, mpi_err)
call mpi_type_commit(mpi_x_interface, mpi_err)
My intent is that a single mpi_x_interface will contain all of the data in 'f' at some given first index "i". That is, for given i, it should contain f(i,:,:,:). (Note that at this stage of the game, all procs have a complete copy of f. I intend to eventually split f up between the procs, except I want proc 0 to have a full copy for the purpose of gathering.)
ptsinproc is an array containing the number of "i" indices handled by each proc. x_slab_displs is the displacement from the beginning of the array for each proc. For two procs, which is what I am testing on, they are ptsinproc=(/61,60/), x_slab_displs=(/0,61/). myminpt is a simple integer giving the minimum index handled in each proc.
So now I want to gather all of f into proc 0 and I run
if (myrank == 0) then
call mpi_gatherv(MPI_IN_PLACE, ptsinproc(myrank),
+ mpi_x_interface, f(1,1,1,1), ptsinproc,
+ x_slab_displs, mpi_x_interface, 0,
+ mpi_comm_world, mpi_err)
else
call mpi_gatherv(f(myminpt,1,1,1), ptsinproc(myrank),
+ mpi_x_interface, f(1,1,1,1), ptsinproc,
+ x_slab_displs, mpi_x_interface, 0,
+ mpi_comm_world, mpi_err)
endif
I can send at most one "slab" like this. If I try to send all 60 "slabs" from proc 1 to proc 0, I get a seg fault due to an "invalid memory reference". BTW, even when I send that single slab, the data winds up in the wrong places.
I've checked all the obvious stuff, like making sure myrank, ptsinproc and x_slab_displs are what they should be on all procs. I've looked into the difference between "size" and "extent" and so on, to no avail. I'm at my wit's end; I just don't see what I am doing wrong. Some may remember that I asked a similar (but different!) question a few months back. I admit I'm just not getting it. Your patience is appreciated.
First off, I just want to say that the reason you're running into so many problems is that you are trying to split up the first (fastest-varying) axis. This is not recommended at all, because as it stands, packing your mpi_x_interface requires a lot of non-contiguous memory accesses. We're talking a huge loss in performance.
Splitting up the slowest axis across MPI processes is a much better strategy. I would highly recommend transposing your 4D matrix so that the x axis is last if you can.
Now to your actual problem(s)...
Derived datatypes
As you have deduced, one problem is that the size and extent of your derived datatype might be incorrect. Let's simplify your problem a bit so I can draw a picture. Say dimy*dimz*dimf=3, and dimx=4. As-is, your datatype mpi_x_interface describes the following data in memory:
| X |   |   |   | X |   |   |   | X |   |   |   |
That is, every 4th MPI_REAL, 3 of them in total. Since this is what you want, so far so good: the size of your datatype is correct. However, if you try to send "the next" mpi_x_interface, your MPI implementation will start it one extent after the first one. The extent of a vector type runs from the start of its first block to the end of its last block, here (3-1)*4+1 = 9 reals, so the next element touches offsets 9, 13 and 17 of a 12-real array. That memory has not been allocated, and MPI throws an "invalid memory reference" at you:
                                          tries to access and bombs
                                                     vvv
| X |   |   |   | X |   |   |   | X | Y |   |   |   | Y |   |   |   | Y | ...
What you need to tell MPI as part of your datatype is that "the next" mpi_x_interface starts only 1 real into the array. This is accomplished by redefining the "extent" of your derived datatype by calling MPI_Type_create_resized(). In your case, you need to write
integer :: mpi_x_interface, mpi_x_interface_resized
integer(kind=MPI_ADDRESS_KIND) :: lb, real_extent  ! resized takes address-sized integers
call mpi_type_vector(dimy*dimz*dimf, 1, dimx, MPI_REAL, &
     mpi_x_interface, mpi_err)
! extent of one MPI_REAL (4 bytes for f), queried instead of hard-coded
call mpi_type_get_extent(MPI_REAL, lb, real_extent, mpi_err)
lb = 0
call mpi_type_create_resized(mpi_x_interface, lb, real_extent, &
     mpi_x_interface_resized, mpi_err)
call mpi_type_commit(mpi_x_interface_resized, mpi_err)
Then the first mpi_x_interface_resized and "the next" 3 are laid out as:
| X | Y | Z | A | X | Y | Z | A | X | Y | Z | A |
as expected.
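The extent arithmetic can be checked without MPI. A minimal Python sketch (plain offset arithmetic, not MPI calls), assuming f is stored column-major as Fortran does and counting in units of one MPI_REAL; it uses the MPI definition in which element k of a datatype starts k extents after the buffer start, and the function name is made up for illustration:

```python
def vector_element_offsets(count, stride, extent, k):
    """Offsets touched by element k (0-based) of a blocklength-1 vector type."""
    return [k * extent + n * stride for n in range(count)]


dimx, count = 4, 3                       # dimx = 4 and dimy*dimz*dimf = 3, as in the picture
array_len = dimx * count                 # f holds 12 reals
default_extent = (count - 1) * dimx + 1  # vector extent: start of first block to end of last
resized_extent = 1                       # after mpi_type_create_resized to one real

# Second element with the default extent: [9, 13, 17], past the 12-real array.
print(vector_element_offsets(count, dimx, default_extent, 1))

# With the resized extent, dimx consecutive elements tile the array exactly once.
print([vector_element_offsets(count, dimx, resized_extent, k) for k in range(dimx)])
```

The same rule explains the gather: mpi_gatherv counts displacements in extents of the receive type, so with the resized type a displacement of 61 lands at offset 61, that is, at f(62,1,1,1).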
MPI_Gatherv
Now that you have correctly defined the extent of your datatype, calling mpi_gatherv with displacements in units of your datatype should work as expected.
Personally, I don't think there is a need for fancy logic with MPI_IN_PLACE in a collective operation. You can simply set myminpt=1 on myrank==0. Then you can call, on every rank:
call mpi_gatherv(f(myminpt,1,1,1), ptsinproc(myrank),
+ mpi_x_interface_resized, f, ptsinproc,
+ x_slab_displs, mpi_x_interface_resized, 0,
+ mpi_comm_world, mpi_err)
I have a byte[] buffer that may contain one or more data frames; I need to read the first bytes to know how long the actual frame is.
This is a "non-working" version of what I want to do:
let extractFrame (buffer:byte[]) =
match buffer with
| [|head1;head2;head3;..|] when head2 < (byte)128 -> processDataFrame buffer head2
| <...others....>
| _ -> raise(new System.Exception())
Basically, I need to evaluate the first three bytes, and then call processDataFrame with the buffer and the actual length of the frame. Depending on the headers, the frame can be data, control, etc...
Can this be done with any kind of match (lists, sequences, etc.)? Or will I have to create another small array with just the length of the header? (I would like to avoid this.)
If you want to use matching, you could create an active pattern (http://msdn.microsoft.com/en-us/library/dd233248.aspx):
let (|Head1|_|) (buffer:byte[]) =
if(buffer.[0] (* add condition here *)) then Some buffer.[0]
else None
let (|Head2|_|) (buffer:byte[]) =
if(buffer.[1] < (byte)128) then Some buffer.[1]
else None
let extractFrame (buffer:byte[]) =
match buffer with
| Head1 h1 -> processDataFrame buffer h1
| Head2 h2 -> processDataFrame buffer h2
........
| _ -> raise(new System.Exception())
I think that this might actually be easier to do using the plain if construct.
But as Petr mentioned, you can use active patterns and define your own patterns that extract specific information from the array. To model what you're doing, I would actually use a parameterized active pattern - you can give it the number of elements from the array that you need and it gives you an array with e.g. 3 elements back:
let (|TakeSlice|_|) count (array:_[]) =
if array.Length < count then None
else Some(array.[0 .. count-1])
let extractFrame (buffer:byte[]) =
match buffer with
| TakeSlice 3 [|head1;head2;head3|] when head2 < (byte)128 ->
processDataFrame buffer head2
| <...others....>
| _ -> raise(new System.Exception())
One disadvantage of this approach is that the pattern [|h1; h2; h3|] has to match the length you specified (3), and the compiler cannot check this for you.
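For comparison, the plain-if approach suggested above looks like this when written out. A minimal Python sketch (not F#), in which `take_slice` plays the role of the parameterized TakeSlice pattern by returning either the header or None for "no match"; `process_data_frame` is passed in as a callback because the question does not define it, and only the data-frame case is shown:

```python
def take_slice(count, buffer):
    """Return the first `count` bytes as a tuple, or None if the buffer is too short."""
    if len(buffer) < count:
        return None
    return tuple(buffer[:count])  # indexing bytes yields ints


def extract_frame(buffer, process_data_frame):
    """Dispatch on the first three header bytes."""
    header = take_slice(3, buffer)
    if header is not None:
        head1, head2, head3 = header
        if head2 < 128:
            return process_data_frame(buffer, head2)
        # ... other header cases (control frames, etc.) would go here ...
    raise ValueError("unrecognised frame header")


print(extract_frame(bytes([1, 5, 9, 0, 0]), lambda buf, length: length))
```

Unpacking into exactly three names fails loudly if the slice length and the pattern disagree, which is the check the F# compiler cannot do for the array pattern.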