Stat2Data package doesn't allow me to use the objects in it

The Stat2Data package contains many example datasets. I get no errors when installing it or when loading it with the library function. However, it doesn't allow me to work with the objects.
Is anyone familiar with this package who knows what I can do about it? Here's the code that I used:
install.packages("Stat2Data")
library(Stat2Data)
# Attempt 1
MedGPA_ds <- ggplot(MedGPA, aes(x = GPA, y = Acceptance))
# Attempt 2
MedGPA_ds <- ggplot(Stat2Data::MedGPA, aes(x = GPA, y = Acceptance))

You missed just one step: calling the data() function to load the specific dataset (MedGPA) contained in the Stat2Data package.
Try the following:
library(Stat2Data)
data(MedGPA)
head(MedGPA)
Accept Acceptance Sex BCPM GPA VR PS WS BS MCAT Apps
1 D 0 F 3.59 3.62 11 9 9 9 38 5
2 A 1 M 3.75 3.84 12 13 8 12 45 3
3 A 1 F 3.24 3.23 9 10 5 9 33 19
4 A 1 F 3.74 3.69 12 11 7 10 40 5
5 A 1 F 3.53 3.38 9 11 4 11 35 11
6 A 1 M 3.59 3.72 10 9 7 10 36 5
Happy coding!

Related

How do I grab values from datafile to make legend?

I want to use system call(s) to grab values from the first data line in a file and print them into the legend of a plot.
The command returns a syntax error. I admit some confusion about the use of system calls and their syntax, despite reading a few advanced questions here.
This is what I have:
gnuplot> plot for [i=20:30:1] '4He-processed' index i u 8:($22>0.2&&$22<2?$9:1/0):10 w yerr t system("head -2 4He-processed | tail -1 | awk '{printf "%s %8.3f %8.3f %s" , "'", $3, $4,"'"}'")
with the response: ')' expected with the pointer at the "f" in printf.
I want to have the values in $3 and $4 written to the legend.
This alternate command
gnuplot> plot for [i=20:30:1] '4He-processed' index i u 8:($22>0.2&&$22<2?$9:1/0):10 w yerr t system("head -2 4He-processed | tail -1 ")
puts the entire first line, of each index loop, to the legend
It likely has to do with syntax?
I want the values from $3 and $4, not the column headings:
Here are some lines (but not all the columns) from the file:
nz na e0 theta nu xsect ert y fy fye
2 4 0.150 60.000 0.025 0.330E+02 0.752E+00 -0.0459 0.956E+00 0.218E-01
2 4 0.150 60.000 0.030 0.497E+02 0.784E+00 -0.0001 0.146E+01 0.230E-01
2 4 0.150 60.000 0.035 0.483E+02 0.766E+00 0.0315 0.144E+01 0.229E-01
2 4 0.150 60.000 0.040 0.408E+02 0.728E+00 0.0573 0.125E+01 0.224E-01
This continues for many blocks. Here, if I were to start my loop with the first block, the values (at $3 and $4) would be 0.150 and 60.000, which correspond to the energy and angle of the projectile, and they would hopefully appear in the legend. The plotted quantities ($8, $22 and $23) are not pasted here (too many columns).
The problem is that the double-quoted string in your command starts at "head and ends at printf ", because the inner double quote closes it. That is clearly not what you intended.
Also I am uncertain what you mean by values in $3 and $4. Can you show an example of the first line (2 lines?) of your file and state, in words, what you are trying to extract from it?
Amended answer after seeing the data and a fuller explanation of the requirements
The answer from @theozh should work, but for completeness here is a different approach that works in either gnuplot 5.2 or 5.4.
It uses two passes through the data file; one pass to construct the titles, and a second pass for the actual plot.
$Data <<EOD
nz na e0 theta nu xsect ert y fy fye
1 11 0.150 60.000 5 6 7 8 9 10
2 12 0.150 60.000 5 6 7 8 9 10
3 13 0.150 60.000 5 6 7 8 9 10
4 14 0.150 60.000 5 6 7 8 9 10
nz na e0 theta nu xsect ert y fy fye
1 21 0.250 65.000 5 6 7 8 9 10
2 22 0.250 65.000 5 6 7 8 9 10
3 23 0.250 65.000 5 6 7 8 9 10
4 24 0.250 65.000 5 6 7 8 9 10
nz na e0 theta nu xsect ert y fy fye
1 31 0.350 70.000 5 6 7 8 9 10
2 32 0.350 70.000 5 6 7 8 9 10
3 33 0.350 70.000 5 6 7 8 9 10
4 34 0.350 70.000 5 6 7 8 9 10
EOD
set key out
set key title "e0 theta" left
#set datafile columnheader # omit this line for gnuplot version < 5.4
# columnheaders are not actually used anyhow
#
# First pass is just to accumulate an array of titles
#
nblocks = 30 # maximum number of data blocks
array Title[nblocks] # one title for each data block
set term push # remember current terminal
set term dumb # could be anything
set out '/dev/null' # throw away the output
plot for [i=0:2:1] $Data index i using 1:2:($0 == 1 ? Title[i+1]=sprintf("%g %g",$3,$4) : "") with labels
#
# Second pass is the actual plot
#
set term pop # restore original terminal
plot for [i=0:2:1] $Data index i using 1:2 with lp title Title[i+1]
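The title-collecting first pass can also be done outside gnuplot, before plotting. Below is a minimal sketch in Python, assuming every block starts with a header line whose first field is nz (as in the sample data) and that e0 and theta sit in columns 3 and 4. The name block_titles is a hypothetical helper, not part of gnuplot.

```python
def block_titles(text):
    """Return one "e0 theta" title per data block."""
    titles = []
    expect_first_row = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank separator lines between blocks
        if fields[0] == "nz":
            # Assumption: a header line marks the start of each block.
            expect_first_row = True
        elif expect_first_row:
            # Columns 3 and 4 (1-based) hold e0 and theta.
            titles.append("%g %g" % (float(fields[2]), float(fields[3])))
            expect_first_row = False
    return titles


sample = """nz na e0 theta nu xsect ert y fy fye
1 11 0.150 60.000 5 6 7 8 9 10
2 12 0.150 60.000 5 6 7 8 9 10


nz na e0 theta nu xsect ert y fy fye
1 21 0.250 65.000 5 6 7 8 9 10
2 22 0.250 65.000 5 6 7 8 9 10


nz na e0 theta nu xsect ert y fy fye
1 31 0.350 70.000 5 6 7 8 9 10
2 32 0.350 70.000 5 6 7 8 9 10
"""

print(block_titles(sample))  # ['0.15 60', '0.25 65', '0.35 70']
```

The resulting strings could then be written to a small file or passed to gnuplot as variables. Which of those is more convenient depends on how the plot is driven.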
I guess you can't use columnheader() (check help columnheader), because by this you will sacrifice the first data line of each block and I assume you do want to keep all data.
The following should work for gnuplot>=5.4.0, assuming that columns 3 and 4 do not change within each block. However, it will not work for gnuplot<5.4.0 because (as I understand it) the legend title is evaluated before each plotting iteration in those versions, and after each plotting iteration in gnuplot>=5.4.0. For gnuplot<5.4.0 you have to think about other workarounds.
I hope you get the idea and can adapt the script to your case, i.e. filtering, etc.
Script: (works for gnuplot>=5.4.0)
### extract title without using columnheader()
reset session
$Data <<EOD
nz na e0 theta nu xsect ert y fy fye
1 11 0.150 60.000 5 6 7 8 9 10
2 12 0.150 60.000 5 6 7 8 9 10
3 13 0.150 60.000 5 6 7 8 9 10
4 14 0.150 60.000 5 6 7 8 9 10
nz na e0 theta nu xsect ert y fy fye
1 21 0.250 65.000 5 6 7 8 9 10
2 22 0.250 65.000 5 6 7 8 9 10
3 23 0.250 65.000 5 6 7 8 9 10
4 24 0.250 65.000 5 6 7 8 9 10
nz na e0 theta nu xsect ert y fy fye
1 31 0.350 70.000 5 6 7 8 9 10
2 32 0.350 70.000 5 6 7 8 9 10
3 33 0.350 70.000 5 6 7 8 9 10
4 34 0.350 70.000 5 6 7 8 9 10
EOD
set key out
plot for [i=0:2:1] $Data u 1:2:(myTitle=sprintf("%g, %g",$3,$4)) index i \
w p pt 7 ps 2 lc i title myTitle
### end of script
Result:
Addition: (script for gnuplot>=4.6.0, March 2012)
What Ethan did in his answer with an array (available only from 5.2.0 on) you can do with strings and word() for older versions (check help word).
The following works for 4.6.0 and 4.6.5 and >=5.0.0, however, not for 4.6.7 (it complains that ... '+' u (NaN):(NaN) ... "Skipping data file with no valid points").
For gnuplot 4.x you cannot add spaces in your myTitles, but for gnuplot>=5.0.0 you could use double quotes and include spaces, e.g. myTitles=myTitles.sprintf(' "%g, %g"',$3,$4).
You might have to adapt the color numbers in the second plot command depending on which subblocks (index) you are plotting in the first plot command.
Data: SO74320605.dat
nz na e0 theta nu xsect ert y fy fye
1 11 0.150 60.000 5 6 7 8 9 10
2 12 0.150 60.000 5 6 7 8 9 10
3 13 0.150 60.000 5 6 7 8 9 10
4 14 0.150 60.000 5 6 7 8 9 10
nz na e0 theta nu xsect ert y fy fye
1 21 0.250 65.000 5 6 7 8 9 10
2 22 0.250 65.000 5 6 7 8 9 10
3 23 0.250 65.000 5 6 7 8 9 10
4 24 0.250 65.000 5 6 7 8 9 10
nz na e0 theta nu xsect ert y fy fye
1 31 0.350 70.000 5 6 7 8 9 10
2 32 0.350 70.000 5 6 7 8 9 10
3 33 0.350 70.000 5 6 7 8 9 10
4 34 0.350 70.000 5 6 7 8 9 10
Script: (works for gnuplot 4.6.0, 4.6.5, >=5.0.0, but not for 4.6.7)
### extract title from lines without using columnheader()
reset
FILE = "SO74320605.dat"
set key out noautotitle
myTitles = ''
plot for [i=0:2:1] FILE u 1:2:($0==1?myTitles=myTitles.sprintf(' %g,%g',$3,$4):0) \
index i w lp pt 7 ps 2 lc i+1, \
for [i=1:words(myTitles)] '+' u (NaN):(NaN) every ::::0 \
w lp pt 7 ps 2 lc i ti word(myTitles,i)
### end of script
Result: (created with gnuplot 4.6.0)

What is the meaning of the addition at the end of this array declaration?

I'm tasked with porting an algorithm, which was supplied as MATLAB code (which none of us has any experience with), into our C++ application.
There is an array declared as such:
encrypted = [18 10 20 13 6 25 21 13 17;
2 26 4 29 22 9 5 29 1;
19 11 21 12 7 24 20 12 16;
% ... many rows like this ...
13 21 11 18 25 6 10 18 14]+1;
What is the semantic meaning of the +1 at the end of the array declaration?
Simply adding 1 to each entry:
>> [1 2 3; 4 5 6]
ans =
1 2 3
4 5 6
>> [1 2 3; 4 5 6] + 1
ans =
2 3 4
5 6 7
If you have MATLAB around, you could have figured that out by just trying. If you do not, I hope you have a very clear picture of what the code is doing and write a good test suite, since you won't be able to compare your new code's output to the MATLAB one.
The +1 means that all elements of the written matrix will be increased by one.
Example
out = [1 2;
3 4] + 1;
disp(out)
2 3
4 5
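For a port to another language, the same operation is just a loop over every element. Here is a minimal sketch in Python, used only for illustration; add_scalar is a hypothetical helper name, not something from the original algorithm.

```python
def add_scalar(matrix, k):
    """Return a new matrix with k added to every element."""
    return [[value + k for value in row] for row in matrix]


m = [[1, 2],
     [3, 4]]
print(add_scalar(m, 1))  # [[2, 3], [4, 5]]
```

The input matrix is left unchanged, which mirrors MATLAB, where the expression produces a new array.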

Rearranging an array using for loop in Matlab

I have a 1 x 15 array of values:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I need to rearrange them into a 3 x 5 matrix using a for loop:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
How would I do that?
I'm going to show you three methods. One where you need to have a for loop, and two others when you don't:
Method #1 - for loop
First, create a matrix that is 3 x 5, then keep track of an index that will go through your array. Afterwards, create a double for loop that populates the matrix.
index = 1;
array = 1 : 15; %// Array we wish to access
matrix = zeros(3,5); %// Initialize
for m = 1 : 3
    for n = 1 : 5
        matrix(m,n) = array(index);
        index = index + 1;
    end
end
matrix =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
Method #2 - Without a for loop
Simply put, use reshape:
matrix = reshape(1:15, 5, 3).';
matrix =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
reshape will take a vector and restructure it into a matrix so that you populate the matrix by columns first. As such, we want to put 1 to 5 in the first column, 6 to 10 in the second and 11 to 15 in the third column. Therefore, our output matrix is in fact 5 x 3. When you see this, this is actually the transposed version of the matrix we want, which is why you do .' to transpose the matrix back.
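To make the column-first fill concrete, here is a minimal sketch in Python that mimics it by hand. The helpers reshape_column_major and transpose are hypothetical names written for this illustration, not MATLAB or library functions.

```python
def reshape_column_major(values, rows, cols):
    """Fill a rows x cols matrix column by column, as MATLAB's reshape does."""
    if len(values) != rows * cols:
        raise ValueError("number of elements must equal rows * cols")
    return [[values[c * rows + r] for c in range(cols)] for r in range(rows)]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


tall = reshape_column_major(list(range(1, 16)), 5, 3)  # 5 x 3, columns 1-5, 6-10, 11-15
wide = transpose(tall)                                 # 3 x 5, rows 1-5, 6-10, 11-15
for row in wide:
    print(row)
```

The intermediate 5 x 3 matrix holds 1 to 5 down its first column, so transposing it puts 1 to 5 across the first row.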
Method #3 - Another method without a for loop (tip of the hat goes to Luis Mendo)
You can use vec2mat, and specify that you need to have 5 columns worth for your matrix:
matrix = vec2mat(1:15, 5);
matrix =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
vec2mat takes a vector and reshapes it into a matrix of as many columns as you specify in the second parameter. In this case, we need 5 columns.
For the sake of (bsx)fun, here is another option...
bsxfun(@plus,1:5,[0:5:10]')
ans =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
Less readable, maybe faster, but who cares for such a small array...
A = [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ] ;
A = reshape( A , 5 , 3 ) ; %// fill a 5 x 3 matrix column by column
A' =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15

Assigning a single value to all cells within a specified time period, matrix format

I have the following example dataset, which consists of the number of fish caught per check of a net. The nets are not checked at uniform intervals. The day of the check is given in Julian days, along with the number of days the net had been fishing since it was last checked (or since its deployment, in the case of the first check).
http://textuploader.com/9ybp
Site_Number Check_Day_Julian Set_Duration_Days Fish_Caught
2 5 3 100
2 10 5 70
2 12 2 65
2 15 3 22
100 4 3 45
100 10 6 20
100 18 8 8
450 10 10 10
450 14 4 4
In any case, I would like to turn the raw data above into the following format:
http://textuploader.com/9y3t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
2 0 0 100 100 100 70 70 70 70 70 65 65 22 22 22 0 0 0
100 0 45 45 45 20 20 20 20 20 20 8 8 8 8 8 8 8 8
450 10 10 10 10 10 10 10 10 10 10 4 4 4 4 0 0 0 0
This is a matrix which assigns the # of fish caught during the period to EACH of the days that were within that period. The columns of the matrix are Julian days, the rows are site numbers.
I have tried to do this with some matrix functions, but I have had much difficulty populating all the days that fall within a time period but for which I do not have a row of data.
I had posted my small bit of code here, but upon reflection, my approach is quite archaic and a bit off point. Can anyone suggest a method to convert the data into the matrix provided? I've been scratching my head and googling all day but now I am stumped.
Cheers,
C
Two answers, the second one is faster but a bit low level.
Solution #1:
library(IRanges)
with(d, {
ir <- IRanges(end=Check_Day_Julian, width=Set_Duration_Days)
cov <- coverage(split(ir, Site_Number),
weight=split(Fish_Caught, Site_Number),
width=max(end(ir)))
do.call(rbind, lapply(cov, as.vector))
})
Solution #2:
with(d, {
ir <- IRanges(end=Check_Day_Julian, width=Set_Duration_Days)
site <- factor(Site_Number, unique(Site_Number))
m <- matrix(0, length(levels(site)), max(end(ir)))
ind <- cbind(rep(site, width(ir)), as.integer(ir))
m[ind] <- rep(Fish_Caught, width(ir))
m
})
I don't see a super obvious matrix transformation here. This is all I've got, assuming the raw data is in a data.frame called dd:
dd$Site_Number<-factor(dd$Site_Number)
mm<-matrix(0, nrow=nlevels(dd$Site_Number), ncol=18)
for(i in 1:nrow(dd)) {
  # a check on day d after n days of fishing covers days (d-n+1) through d
  mm[as.numeric(dd[i,1]), (dd[i,2]-dd[i,3]+1):dd[i,2] ] <- dd[i,4]
}
mm
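The interval logic can be checked against the target matrix from the question with a short, language-neutral sketch. The Python below is only an illustration: it assumes a check on day d after n days of fishing covers days d-n+1 through d, which is what the expected output implies, and fill_catch_matrix is a hypothetical helper name.

```python
def fill_catch_matrix(records, n_days):
    """Map each site to a list of catches, one entry per Julian day 1..n_days."""
    matrix = {}
    for site, check_day, duration, caught in records:
        row = matrix.setdefault(site, [0] * n_days)
        first_day = max(check_day - duration + 1, 1)  # clamp to day 1
        for day in range(first_day, check_day + 1):
            row[day - 1] = caught  # days are 1-based, list indices 0-based
    return matrix


# (Site_Number, Check_Day_Julian, Set_Duration_Days, Fish_Caught) from the question
records = [
    (2, 5, 3, 100), (2, 10, 5, 70), (2, 12, 2, 65), (2, 15, 3, 22),
    (100, 4, 3, 45), (100, 10, 6, 20), (100, 18, 8, 8),
    (450, 10, 10, 10), (450, 14, 4, 4),
]

result = fill_catch_matrix(records, 18)
for site, row in result.items():
    print(site, row)
```

Each row printed should match the corresponding row of the desired matrix in the question.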

R shortcut to getting last n entries in a vector [duplicate]

This question already has answers here:
How to access the last value in a vector?
(12 answers)
Closed 10 years ago.
This may be redundant but I could not find a similar question on SO.
Is there a shortcut to getting the last n elements/entries in a vector or array without using the length of the vector in the calculation?
foo <- 1:23
> foo
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Let's say one wants the last 7 entries. I want to avoid this cumbersome syntax:
> foo[(length(foo)-6):length(foo)]
[1] 17 18 19 20 21 22 23
Python has foo[-7:]. Is there something similar in R? Thanks!
You want the tail function
foo <- 1:23
tail(foo, 5)
#[1] 19 20 21 22 23
tail(foo, 7)
#[1] 17 18 19 20 21 22 23
x <- 1:3
# If you ask for more than is currently in the vector it just
# returns the vector itself.
tail(x, 5)
#[1] 1 2 3
Along with head there are easy ways to grab everything except the last/first n elements of a vector as well.
x <- 1:10
# Grab everything except the first element
tail(x, -1)
#[1] 2 3 4 5 6 7 8 9 10
# Grab everything except the last element
head(x, -1)
#[1] 1 2 3 4 5 6 7 8 9
Not a good idea when you have the awesome tail function but here's an alternative:
n <- 3
rev(rev(foo)[1:n])
I'm preparing myself for the down votes.
