SAS: Using a Loop for Creating Many Data Sets and renaming the variables in them - arrays

I have a dataset in long format, for example:
time subject var1 var2 var3
1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47
I would like to get a dataset in wide format:
time var1_sub1 var2_sub1 var3_sub1 var1_sub2 var2_sub2 var3_sub2
1 0.41 0.48 0.85 0.76 0.49 0.03
2 0.58 0.38 0.15 0.36 0.26 0.93
3 0.08 0.39 0.96 0.83 0.88 0.63
4 0.58 0.87 0.15 0.19 0.65 0.99
5 0.55 0.40 0.67 0.89 0.91 0.47
So far, I have come up with the following approach:
data data_sub1;
set data;
if subject=1;
var1_sub1=var1;
var2_sub1=var2;
var3_sub1=var3;
run;
data data_sub2;
set data;
if subject=2;
var1_sub2=var1;
var2_sub2=var2;
var3_sub2=var3;
run;
proc sort data=data_sub1;
by time;
run;
proc sort data=data_sub2;
by time;
run;
data datamerged;
merge data_sub1 data_sub2;
by time;
run;
It works, but I would like to learn a more elegant way to code it, since in practice I have many more subjects and variables.

This is a PROC TRANSPOSE problem. To solve most PROC TRANSPOSE problems, make the data fully vertical (one value and one variable name per row), then transpose using the ID statement.
data have;
input time subject var1 var2 var3;
datalines;
1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47
;;;;
run;
data have_vert;
set have;
array vars var:;
do _t = 1 to dim(vars);
id=cats(vname(vars[_t]),'_','sub',subject); *this is our future variable name;
value = vars[_t]; *this is our future variable value;
output;
end;
keep time id value subject;
run;
proc sort data=have_vert;
by time subject id;
run;
proc transpose data=have_vert out=want;
by time;
var value;
id id;
run;
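The same two-step idea (go fully vertical, then pivot on a generated ID) can be sketched outside SAS. The Python below is only an illustration with plain dicts, not SAS code; the function name `long_to_wide` and the assumption that the value columns are exactly var1 to var3 are mine. The column names are built the same way as the `cats(vname(...), '_', 'sub', subject)` call above.

```python
from collections import defaultdict


def long_to_wide(rows, value_names=("var1", "var2", "var3")):
    """Reshape long rows (dicts with 'time', 'subject', var1..varN) into wide rows."""
    # Step 1 ("go vertical"): one (time, id, value) triple per value,
    # with id built like cats(vname, '_', 'sub', subject) in the SAS answer
    vertical = []
    for row in rows:
        for name in value_names:
            col_id = f"{name}_sub{row['subject']}"
            vertical.append((row["time"], col_id, row[name]))

    # Step 2 ("transpose by ID"): group by time and use the id as the column name
    wide = defaultdict(dict)
    for time, col_id, value in vertical:
        wide[time][col_id] = value

    # One output row per time, sorted by time
    return [{"time": t, **wide[t]} for t in sorted(wide)]


# The sample data from the question
raw = """1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47"""

rows = []
for line in raw.splitlines():
    t, s, v1, v2, v3 = line.split()
    rows.append({"time": int(t), "subject": int(s),
                 "var1": float(v1), "var2": float(v2), "var3": float(v3)})

wide = long_to_wide(rows)
for row in wide:
    print(row)
```

Because the column names are derived from the data, adding subjects needs no code changes, which is the same property that lets the PROC TRANSPOSE answer scale.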

Related

KNN Algorithm is Giving good Accuracy with Bad Confusion Matrix Results

I have a multilabel classification dataset and used a KNN model to classify it. There are 15 labels; I computed the accuracy for each label and averaged the results, which gives a model accuracy of 93%.
However, the confusion matrix shows bad numbers.
Could you tell me what this means? Is it overfitting? How can I solve my problem?
Accuracy and mean absolute error (mae) code
Input:
from sklearn.metrics import mean_absolute_error
# Getting the accuracy of the model
y_pred1 = level_1_knn_model.predict(X_val1)
accuracy = (sum(y_val1==y_pred1)/y_val1.shape[0])*100
accuracy = sum(accuracy)/len(accuracy)
print("Accuracy: "+str(accuracy)+"%\n")
# Getting the mean absolute error
mae1 = mean_absolute_error(y_val1, y_pred1)
print("Mean Absolute Error: "+str(mae1))
Output:
Accuracy: [96.55462575 97.82146336 99.23207908 95.39247451 98.69340807 74.22793801
78.67975909 97.47825108 99.80189098 77.67264969 91.69399776 99.97084683
99.42621267 99.32682688 99.74159693]%
Accuracy: 93.71426804569977%
Mean Absolute Error: 9.703818402273944
Confusion Matrix and classification report code
Input:
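import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix
# Calculate the confusion matrix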
cMatrix1 = confusion_matrix(y_val1.argmax(axis=1), y_pred1.argmax(axis=1))
# Plot the confusion matrix
plt.figure(figsize=(11,10))
sns.heatmap(cMatrix1, annot=True, fmt='g')
# Calculate the classification report
classReport1 = classification_report(y_val1, y_pred1)
print("\nClassification Report:")
print(classReport1)
Output:
Classification Report:
precision recall f1-score support
0 0.08 0.00 0.01 5053
1 0.03 0.00 0.01 3017
2 0.00 0.00 0.00 1159
3 0.07 0.00 0.01 6644
4 0.00 0.00 0.00 1971
5 0.58 0.65 0.61 47222
6 0.39 0.33 0.36 27302
7 0.02 0.00 0.00 3767
8 0.00 0.00 0.00 299
9 0.58 0.61 0.60 40823
10 0.13 0.02 0.03 11354
11 0.00 0.00 0.00 44
12 0.00 0.00 0.00 866
13 0.00 0.00 0.00 1016
14 0.00 0.00 0.00 390
micro avg 0.54 0.43 0.48 150927
macro avg 0.13 0.11 0.11 150927
weighted avg 0.43 0.43 0.42 150927
samples avg 0.43 0.43 0.43 150927
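A minimal, self-contained illustration with made-up toy labels (not the asker's data) of why averaging per-label accuracy can look good while precision and recall are poor: on a rare label, predicting all zeros already scores a high accuracy. The output above shows this pattern, e.g. label 11 has 99.97% accuracy but 0.00 precision and recall with a support of only 44. The helper names below are hypothetical; the accuracy helper mirrors the `sum(y_val1==y_pred1)/y_val1.shape[0]` computation for a single label.

```python
def per_label_accuracy(y_true, y_pred):
    """Share of samples where the 0/1 prediction for one label equals the truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_label_recall(y_true, y_pred):
    """Of the samples whose true value is 1, the share that were predicted as 1."""
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predicted_for_positives:
        return 0.0
    return sum(predicted_for_positives) / len(predicted_for_positives)


# Hypothetical toy labels: two rare labels and one balanced label
labels = {
    "rare_a": ([1] * 10 + [0] * 990, [0] * 1000),  # model never predicts 1
    "rare_b": ([1] * 5 + [0] * 995, [0] * 1000),   # model never predicts 1
    "balanced": ([1] * 500 + [0] * 500,
                 [1] * 400 + [0] * 100 + [1] * 100 + [0] * 400),  # 80% correct
}

accuracies = {name: per_label_accuracy(t, p) for name, (t, p) in labels.items()}
recalls = {name: per_label_recall(t, p) for name, (t, p) in labels.items()}

mean_accuracy = sum(accuracies.values()) / len(accuracies)
macro_recall = sum(recalls.values()) / len(recalls)

print(f"mean accuracy: {mean_accuracy:.3f}")  # mean accuracy: 0.928
print(f"macro recall: {macro_recall:.3f}")    # macro recall: 0.267
```

The averaged accuracy is dominated by the many true negatives of the rare labels, while recall only looks at the positives, which the model misses entirely.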

C function that takes in values and prints out an 11x11 grid of values ranging from -1 to 1

I am currently learning C and have hit a slump on a textbook problem. The question is: define a function that returns the value x^2 - y^2, and use it to print an 11 x 11 grid for values of x and y ranging from -1 to 1. I was able to finish the first row, but I am having trouble with the other rows. The correct output for this problem is:
0.00 0.36 0.64 0.84 0.96 1.00 0.96 0.84 0.64 0.36 0.00
-0.36 0.00 0.28 0.48 0.60 0.64 0.60 0.48 0.28 0.00 -0.36
-0.64 -0.28 0.00 0.20 0.32 0.36 0.32 0.20 -0.00 -0.28 -0.64
-0.84 -0.48 -0.20 0.00 0.12 0.16 0.12 -0.00 -0.20 -0.48 -0.84
-0.96 -0.60 -0.32 -0.12 0.00 0.04 -0.00 -0.12 -0.32 -0.60 -0.96
-1.00 -0.64 -0.36 -0.16 -0.04 0.00 -0.04 -0.16 -0.36 -0.64 -1.00
-0.96 -0.60 -0.32 -0.12 0.00 0.04 0.00 -0.12 -0.32 -0.60 -0.96
-0.84 -0.48 -0.20 0.00 0.12 0.16 0.12 0.00 -0.20 -0.48 -0.84
-0.64 -0.28 0.00 0.20 0.32 0.36 0.32 0.20 0.00 -0.28 -0.64
-0.36 0.00 0.28 0.48 0.60 0.64 0.60 0.48 0.28 0.00 -0.36
0.00 0.36 0.64 0.84 0.96 1.00 0.96 0.84 0.64 0.36 0.00
So far in my code I have
double y=1;
int count =0;
double xSq;
double origX = x;
double origY = y;
double ySq;
xSq = x * x;
ySq = y * y;
double update;
for (int i =0; i < 11; i++){
double sum = xSq - ySq;
printf("%f\t", sum);
count++;
y = y - 0.2;
ySq = y * y;
}
I think what they want you to do is different. They want you to first make a function that returns the difference between the squares.
Then they want you to call that function from a two-level (nested) loop in main that varies the values of x and y, respectively.
The values of x and y each go from -1 to +1.
Inside the nested loop, you'd call your function with the current values of x and y and get the result. Then you'd print x, y and the result.
You'll figure out how to print a newline after each pass of the outer loop so that you get your rows.
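The loop structure described above (a helper that returns x^2 - y^2, called from a nested loop, with a line break after each row) is sketched below in Python rather than C, purely to show the shape of the loops; `diff_of_squares` and `build_grid` are illustrative names. One design choice: x and y are computed from the loop index instead of by repeatedly subtracting 0.2, which avoids accumulating rounding error across the row.

```python
def diff_of_squares(x, y):
    """Return x^2 - y^2, the function the exercise asks for."""
    return x * x - y * y


def build_grid(n=11, lo=-1.0, hi=1.0):
    """Return an n x n list of lists where grid[i][j] = diff_of_squares(x_i, y_j)."""
    step = (hi - lo) / (n - 1)  # 0.2 for 11 points between -1 and 1
    grid = []
    for i in range(n):  # outer loop: one row per x value
        x = lo + i * step  # derive x from the index instead of repeated subtraction
        row = []
        for j in range(n):  # inner loop: one column per y value
            y = lo + j * step
            row.append(diff_of_squares(x, y))
        grid.append(row)
    return grid


# Print each row, ending the line after the inner loop finishes
for row in build_grid():
    print(" ".join(f"{value:5.2f}" for value in row))
```

In C the same shape becomes a `double` function plus two `for (int i = 0; i < 11; i++)` loops, with `printf("%5.2f ", ...)` inside the inner loop and `printf("\n")` after it. Some cells that should be exactly zero may still print as -0.00 because of floating-point rounding, as in the expected output.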

Spotfire Line chart with min max bars

I am trying to make a chart that has a line graph showing the change in the Count column for each month, and then two points showing the min and max values in that month. The table is below.
Date Min Max Count
1/1/2015 0.28 6.02 13
2/1/2015 0.2 7.72 8
3/1/2015 1 1 1
4/1/2015 0.4 6.87 7
5/1/2015 0.36 3.05 8
6/1/2015 0.17 1.26 13
7/1/2015 0.31 1.59 15
8/1/2015 0.39 3.35 13
9/1/2015 0.22 0.86 10
10/1/2015 0.3 2.48 13
11/1/2015 0.16 0.82 9
12/1/2015 0.33 2.18 5
1/1/2016 0.23 1.16 14
2/1/2016 0.38 1.74 7
3/1/2016 0.1 8.87 9
4/1/2016 0.28 0.68 3
5/1/2016 0.13 3.23 11
6/1/2016 0.33 1 5
7/1/2016 0.28 1.26 4
8/1/2016 0.08 0.41 2
9/1/2016 0.43 0.61 2
10/1/2016 0.49 1.39 4
11/1/2016 0.89 0.89 1
I tried doing a scatter plot, but when I try to Add a Line from Column value, I get an error saying that the line cannot work on categorical data.
Any suggestions on how I can prepare this visualization?
Thanks!
I would do this in a combination chart.
1. Insert a combination chart (Line & Bar Graph).
2. On your X-axis, put your date as <BinByDateTime([Date],"Year.Month",1)>
3. On your Y-axis, put your aggregations: Sum([Count]), Max([Max]), Min([Min])
4. Right-click > Properties > Series, and set Min and Max to the Line type.
5. (Optional) Change the Y-axis scale.

how to multiply a value in an array from a text file in perl

I've been trying to teach myself Perl. I found this project online and started working on it, but now I am stuck. The task asks me to print each record with the orbital period given in seconds rather than days. This is the input file:
Adrastea XV Jupiter 129000 0.30 0.00 0.00 Jewitt 1979
Amalthea V Jupiter 181000 0.50 0.40 0.00 Barnard 1892
Ananke XII Jupiter 21200000 -631 147.00 0.17 Nicholson 1951
Ariel I Uranus 191000 2.52 0.00 0.00 Lassell 1851
Atlas XV Saturn 138000 0.60 0.00 0.00 Terrile 1980
Belinda XIV Uranus 75000 0.62 0.03 0.00 Voyager2 1986
Bianca VIII Uranus 59000 0.43 0.16 0.00 Voyager2 1986
Caliban XVI Uranus 7169000 -580 140. 0.08 Gladman 1997
Callirrhoe XVII Jupiter 24100000 ? ? ? Sheppard 2000
Callisto IV Jupiter 1883000 16.69 0.28 0.01 Galileo 1610
Calypso XIV Saturn 295000 1.89 0.00 0.00 Pascu 1980
Carme XI Jupiter 22600000 -692 163.00 0.21 Nicholson 1938
Chaldene XXI Jupiter 23387000 -733.7 165.2 0.238 Sheppard 2000
Charon I Pluto 20000 6.39 98.80 0.00 Christy 1978
Cordelia VI Uranus 50000 0.34 0.14 0.00 Voyager2 1986
Cressida IX Uranus 62000 0.46 0.04 0.00 Voyager2 1986
Deimos II Mars 23000 1.26 1.80 0.00 Hall 1877
Desdemona X Uranus 63000 0.47 0.16 0.00 Voyager2 1986
Despina V Neptune 53000 0.33 0.00 0.00 Voyager2 1989
Dione IV Saturn 377000 2.74 0.02 0.00 Cassini 1684
Earth III Sun 149600000 365.26 0.00 0.02 - -
Elara VII Jupiter 11737000 259.65 28.00 0.21 Perrine 1905
Enceladus II Saturn 238000 1.37 0.02 0.00 Herschel 1789
Epimetheus XI Saturn 151000 0.69 0.34 0.01 Walker 1980
Erinome XXV Jupiter 23279000 728 164.9 0.266 Sheppard 2000
Europa II Jupiter 671000 3.55 0.47 0.01 Galileo 1610
Galatea VI Neptune 62000 0.43 0.00 0.00 Voyager2 1989
Ganymede III Jupiter 1070000 7.15 0.19 0.00 Galileo 1610
Harpalyke XXII Jupiter 21132000 623.3 148.6 0.226 Sheppard 2000
Helene XII Saturn 377000 2.74 0.20 0.01 Laques 1980
Himalia VI Jupiter 11480000 250.57 28.00 0.16 Perrine 1904
Hyperion VII Saturn 1481000 21.28 0.43 0.10 Bond 1848
Iapetus VIII Saturn 3561000 79.33 14.72 0.03 Cassini 1671
Io I Jupiter 422000 1.77 0.04 0.00 Galileo 1610
Iocaste XXIV Jupiter 20216000 ? ? ? Sheppard 2000
Isonoe XXVI Jupiter 23078000 ? ? ? Sheppard 2000
Janus X Saturn 151000 0.69 0.14 0.01 Dollfus 1966
Juliet XI Uranus 64000 0.49 0.06 0.00 Voyager2 1986
Jupiter V Sun 778330000 4332.71 1.31 0.05 - -
Kalyke XXIII Jupiter 23745000 ? ? ? Sheppard 2000
Larissa VII Neptune 74000 0.55 0.00 0.00 Reitsema 1989
Leda XIII Jupiter 11094000 238.72 27.00 0.15 Kowal 1974
Lysithea X Jupiter 11720000 259.22 29.00 0.11 Nicholson 1938
Mars IV Sun 227940000 686.98 1.85 0.09 - -
Megaclite XIX Jupiter 23911000 ? ? ? Sheppard 2000
Mercury I Sun 57910000 87.97 7.00 0.21 - -
Metis XVI Jupiter 128000 0.29 0.00 0.00 Synnott 1979
Mimas I Saturn 186000 0.94 1.53 0.02 Herschel 1789
Miranda V Uranus 130000 1.41 4.22 0.00 Kuiper 1948
Moon I Earth 384000 27.32 5.14 0.05 - -
Naiad III Neptune 48000 0.29 0.00 0.00 Voyager2 1989
Neptune VIII Sun 4504300000 60190.00 1.77 0.01 Adams 1846
Nereid II Neptune 5513000 360.13 29.00 0.75 Kuiper 1949
Oberon IV Uranus 583000 13.46 0.00 0.00 Herschel 1787
Ophelia VII Uranus 54000 0.38 0.09 0.00 Voyager2 1986
Pan XVIII Saturn 134000 0.58 0.00 0.00 Showalter 1990
Pandora XVII Saturn 142000 0.63 0.00 0.00 Collins 1980
Pasiphae VIII Jupiter 23500000 -735 147.00 0.38 Melotte 1908
Phobos I Mars 9000 0.32 1.00 0.02 Hall 1877
Phoebe IX Saturn 12952000 -550.48 175.30 0.16 Pickering 1898
Pluto IX Sun 5913520000 90550 17.15 0.25 Tombaugh 1930
Portia XII Uranus 66000 0.51 0.09 0.00 Voyager2 1986
Praxidike XXVII Jupiter 20964000 ? ? ? Sheppard 2000
Prometheus XVI Saturn 139000 0.61 0.00 0.00 Collins 1980
Prospero XVIII Uranus 16568000 -2019 152. 0.44 Holman 1999
Proteus VIII Neptune 118000 1.12 0.00 0.00 Voyager2 1989
Puck XV Uranus 86000 0.76 0.31 0.00 Voyager2 1985
Rhea V Saturn 527000 4.52 0.35 0.00 Cassini 1672
Rosalind XIII Uranus 70000 0.56 0.28 0.00 Voyager2 1986
Saturn VI Sun 1429400000 10759.50 2.49 0.06 - -
Setebos XIX Uranus 17681000 -2239 158. 0.57 Kavelaars 1999
Sinope IX Jupiter 23700000 -758 153.00 0.28 Nicholson 1914
Stephano XX Uranus 7948000 -674 143. 0.24 Gladman 1999
Sun - - - - - - - -
Sycorax XVII Uranus 12213000 -1289 153. 0.51 Nicholson 1997
Taygete XX Jupiter 23312000 ? ? ? Sheppard 2000
Telesto XIII Saturn 295000 1.89 0.00 0.00 Smith 1980
Tethys III Saturn 295000 1.89 1.09 0.00 Cassini 1684
Thalassa IV Neptune 50000 0.31 4.50 0.00 Voyager2 1989
Thebe XIV Jupiter 222000 0.67 0.80 0.02 Synnott 1979
Themisto XVIII Jupiter 7507000 ? ? ? Sheppard 2000
Titan VI Saturn 1222000 15.95 0.33 0.03 Huygens 1655
Titania III Uranus 436000 8.71 0.00 0.00 Herschel 1787
Trinculo XXI Uranus 8578000 -759 167.0 0.208 Gladman 2001
Triton I Neptune 355000 -5.88 157.00 0.00 Lassell 1846
Umbriel II Uranus 266000 4.14 0.00 0.00 Lassell 1851
Uranus VII Sun 2870990000 30685.00 0.77 0.05 Herschel 1781
Venus II Sun 108200000 224.70 3.39 0.01 - -
This is the code I came up with, but it doesn't work. Please help.
#!/usr/bin/perl
use strict;
my $period;
my $galaxy;
my $solar = 'solar.txt';
open(my $fh, '<:encoding(UTF -8)', $solar)
or die "could not open file!!!";
while ( my @galaxy = <$fh>){
my($planet,$number_moons,$obj_orbit,$orbital_radius,$orbital_period,$orbital_inclination, $orbtial_eccentricity, $discoverer, $year) = split / /, $galaxy;
if($orbital_period ne '0'){
$period = $orbital_period * 86400;
s/$orbital_period/$period/g;
print @galaxy;
}
}
You generally don't want to read a file into an array. Get rid of my @galaxy entirely and use the default internal "current record" variable $_ instead. You might as well use the default split separator (whitespace), too:
while (<$fh>){
... = split;
...
print;
...
Change [this reads into an array variable]:
while ( my @galaxy = <$fh>) {
Into [this reads into a scalar variable]:
while ( my $galaxy = <$fh>) {
Your split:
my($planet,$number_moons,...) = split / /, $galaxy;
uses a scalar variable.
In Perl, you can have three types of variables (scalar, array and hash), all with the same name, and they are unrelated. They are distinguished by the syntax:
# scalar:
$foo = 17;
# array:
@foo = (23);
# hash:
$bar{"x"} = 37;
%foo = %bar;
printf("scalar: %d\n",$foo);
printf("array: %d\n",$foo[0]);
printf("hash: %d\n",$foo{"x"});
I'd probably do something along these lines
#!/usr/bin/perl
use warnings;
use strict;
my $file = 'planets.txt';
open my $fh, '<', $file or die "Could not open '$file': $!";
while (my $galaxy = <$fh>){
chomp $galaxy;
my($planet,$number_moons,$obj_orbit,$orbital_radius,$orbital_period,$orbital_inclination, $orbtial_eccentricity, $discoverer, $year) = split ' ', $galaxy;
if (my ($inseconds) = $orbital_period =~ /([0-9.]+)/){
$inseconds = $orbital_period * 86400;
# \Q...\E quotes regex metacharacters such as '.' in the old value
$galaxy =~ s/\Q$orbital_period\E/$inseconds/;
}
print $galaxy . "\n";
}
close($fh);
The only problem I foresee with this is that if any value BEFORE the orbital period is the same as the orbital period, that value will be replaced by the substitution instead. Another option would be to load the line into a hash and update the value that way; I'll provide an example.
--- UPDATE ---
Here's an example using a hash. The downside of a hash is that it will not print things out in the same order, so to get around that you could populate the hash with arrays (one array reference per record) and print the keys in sorted order, like so:
#!/usr/bin/perl
use warnings;
use strict;
my $file = 'planets.txt';
my %celestial;
open my $fh, '<', $file or die "Could not open '$file': $!";
while (my $galaxy = <$fh>){
chomp $galaxy;
my($planet,$number_moons,$obj_orbit,$orbital_radius,$orbital_period,$orbital_inclination, $orbtial_eccentricity, $discoverer, $year) = split ' ', $galaxy;
if ($orbital_period =~ /([0-9.]+)/){ $orbital_period = $orbital_period * 86400; }
my @info;
push @info, ($number_moons,$obj_orbit,$orbital_radius,$orbital_period,$orbital_inclination, $orbtial_eccentricity, $discoverer, $year);
$celestial{$planet} = \@info;
}
close($fh);
for my $keys (sort keys %celestial){
print $keys;
foreach my $k (@{$celestial{$keys}}) {
print " $k";
}
print "\n";
}
Hope this helps you out
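# Read from STDIN (or files named on the command line) rather than explicitly opening a file
# Read each line into $_
while (<>) {
    # Split on whitespace to get individual fields (the data is space-separated, not comma-separated)
    my @satellite = split ' ', $_;
    # Update the fifth field (orbital period, days -> seconds) only when it is numeric
    $satellite[4] *= 86_400 if defined $satellite[4] && $satellite[4] =~ /^-?[0-9.]+$/;
    # Join the array back into a string
    print join(' ', @satellite), "\n";
}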
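For comparison, the same field-based idea (split on whitespace, convert the fifth field, join back) is sketched in Python. It sidesteps the substitution pitfall mentioned above because it replaces the field by position rather than by value. `convert_record` is a hypothetical helper, and rounding to whole seconds is an assumption about the desired output.

```python
SECONDS_PER_DAY = 86_400
PERIOD_FIELD = 4  # 0-based index of the orbital period (in days) in each record


def convert_record(line):
    """Return the record with its orbital period converted from days to seconds.

    Fields are whitespace-separated. Non-numeric periods such as '?' or '-'
    are left unchanged. Seconds are rounded to whole numbers (a presentation choice).
    """
    fields = line.split()
    if len(fields) > PERIOD_FIELD:
        try:
            days = float(fields[PERIOD_FIELD])
        except ValueError:
            return " ".join(fields)  # '?' or '-': nothing to convert
        fields[PERIOD_FIELD] = f"{days * SECONDS_PER_DAY:.0f}"
    return " ".join(fields)


print(convert_record("Adrastea XV Jupiter 129000 0.30 0.00 0.00 Jewitt 1979"))
# Adrastea XV Jupiter 129000 25920 0.00 0.00 Jewitt 1979
```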

how to split an array into separate arrays (R)? [duplicate]

This question already has answers here:
Split data.frame based on levels of a factor into new data.frames
I have the array:
>cent
b e r f
A19 60.46 0.77 -0.12 1
A15 16.50 0.53 0.08 2
A17 2.66 0.51 0.20 3
A11 36.66 0.40 -0.25 4
A12 38.96 0.91 0.23 1
A05 0.00 0.29 0.01 2
A09 3.40 0.35 0.03 3
A04 0.00 0.25 -0.03 4
Could someone please tell me how to split this array into 4 separate arrays, using the last column «f» as the flag? As a result, I would like to see:
>cent1
b e r f
A19 60.46 0.77 -0.12 1
A12 38.96 0.91 0.23 1
>cent2
b e r f
A15 16.50 0.53 0.08 2
A05 0.00 0.29 0.01 2
….
Should I use a for loop and check the flag "f", or is there a built-in function? Thanks.
We can use split to create a list of data.frames.
lst <- split(cent, cent$f)
NOTE: Here I assumed that 'cent' is a data.frame. If it is a matrix:
lst <- split(as.data.frame(cent), cent[,"f"])
Usually, the list is enough for most of the analysis. But if we need to create multiple objects in the global environment, we can use list2env (not recommended):
list2env(setNames(lst, paste0("cent", seq_along(lst))), envir = .GlobalEnv)
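A rough Python analogue of `split(cent, cent$f)`, only to show the grouping idea: rows are collected into a dict keyed by the flag, much like the named list that `split` returns. The `split_by` helper and the tuple layout of `cent` are assumptions for illustration.

```python
from collections import defaultdict


def split_by(rows, key):
    """Group rows by the value of `key`, keeping rows in their original order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# The 'cent' table from the question as (row name, b, e, r, f) tuples
cent = [
    ("A19", 60.46, 0.77, -0.12, 1),
    ("A15", 16.50, 0.53, 0.08, 2),
    ("A17", 2.66, 0.51, 0.20, 3),
    ("A11", 36.66, 0.40, -0.25, 4),
    ("A12", 38.96, 0.91, 0.23, 1),
    ("A05", 0.00, 0.29, 0.01, 2),
    ("A09", 3.40, 0.35, 0.03, 3),
    ("A04", 0.00, 0.25, -0.03, 4),
]
rows = [dict(zip(("name", "b", "e", "r", "f"), rec)) for rec in cent]

groups = split_by(rows, "f")
for flag in sorted(groups):
    print(f"cent{flag}:", [row["name"] for row in groups[flag]])
```

Unlike R, which orders the resulting list by factor level, the dict keeps first-appearance order, so the loop sorts the keys before printing.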
