How to use arrays in SAS?

How do I perform a calculation using arrays in SAS?
Source file (scholar):
Anne C A C D B E D D B A
Vicky C C C E B E D B A
Laurel D D C D B E D D B A
Victor C A C D B E D D A D
Dimple C A C D B E D D B A
Godfrey B D C B D D D B B A
Denny C D C B E E D B B A
Richard C A C D B E D D B A

Try this
data have;
infile datalines missover;
input name $ (q1 - q10)(:$1.);
datalines;
Anne C A C D B E D D B A
Vicky C C C E B E D B A
Laurel D D C D B E D D B A
Victor C A C D B E D D A D
Dimple C A C D B E D D B A
Godfrey B D C B D D D B B A
Denny C D C B E E D B B A
Richard C A C D B E D D B A
;
data want;
set have;
array a {10} $ _temporary_ ("C", "A", "C", "D", "B", "E", "D", "D", "B", "A");
array q q1 - q10;
total = 0;
do over q;
if q = a[_I_] then total + 1;
end;
Result = ifc(total ge 7, "Passed", "Failed");
run;
Result:
Obs name q1-q10 total Result
1 Anne C A C D B E D D B A 10 Passed
2 Vicky C C C E B E D B A 5 Failed
3 Laurel D D C D B E D D B A 8 Passed
4 Victor C A C D B E D D A D 8 Passed
5 Dimple C A C D B E D D B A 10 Passed
6 Godfrey B D C B D D D B B A 4 Failed
7 Denny C D C B E E D B B A 6 Failed
8 Richard C A C D B E D D B A 10 Passed
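For comparison, the scoring rule the DATA step applies (compare each answer with the same position in the key, count the matches, pass at 7 or more) can be sketched in C. This is an illustrative translation, not SAS code; the helper names score_answers and result_for are hypothetical, and the key is hard-coded from the _temporary_ array above.

```c
#include <assert.h>
#include <string.h>

/* Answer key copied from the _temporary_ array in the SAS code. */
static const char ANSWER_KEY[10] = {'C', 'A', 'C', 'D', 'B', 'E', 'D', 'D', 'B', 'A'};

/* Count answers that match the key position by position. A student with
   fewer than 10 answers (like Vicky) simply scores nothing for the rest. */
int score_answers(const char *answers, size_t n)
{
    int total = 0;
    assert(n <= 10);
    for (size_t i = 0; i < n; ++i)
        if (answers[i] == ANSWER_KEY[i])
            total += 1;
    return total;
}

/* Same rule as ifc(total ge 7, "Passed", "Failed"). */
const char *result_for(int total)
{
    return total >= 7 ? "Passed" : "Failed";
}
```

For example, score_answers("CCCEBEDBA", 9) returns 5 for Vicky's nine answers, matching the SAS result.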

If you also want to keep a per-question 0/1 score in an array (not sure why you would, but here you go):
data want;
set have;
array a {10} $ _temporary_ ("C", "A", "C", "D", "B", "E", "D", "D", "B", "A");
array t {10} _temporary_ (10 * 0);
array q q1 - q10;
do over q;
if q = a[_I_] then t[_I_] = 1;
end;
total = sum(of t[*]);
Result = ifc(total ge 7, "Passed", "Failed");
/* reset t for the next observation: _temporary_ arrays are retained across iterations */
call stdize('replace', 'mult=', 0, of t[*], _N_);
run;

I suspect you want to use simpler constructs, like the following:
data want;
set have;
array a {10} $ _temporary_ ("C", "A", "C", "D", "B", "E", "D", "D", "B", "A");
array correct_answer{10} correct_answer1-correct_answer10;
array q{10} q1 - q10; /* explicit array, so it can be indexed with i */
do i = 1 to dim(a);
if q[i] = a[i] then correct_answer[i] = 1;
else correct_answer[i] = 0;
end;
Total = sum(of correct_answer1-correct_answer10);
if total >= 7 then result = "Passed";
else result = "Failed";
drop i;
run;

Related

Join four columns into one according to each row

A B C D
E F G H
I J K L
M N O P
If I join the columns with ={A1:A;B1:B;C1:C;D1:D}, the result looks like this:
A
E
I
M
B
F
J
N
... and so on
I would like it to look like this:
A
B
C
D
E
F
G
... and so on
How to proceed in this case?
Note: some columns may have empty cells, and some may have more values than others, but I still want to follow the same pattern. Example:
A B D
E G H
I J K L
M N O P
Result:
A
B
D
E
G
H
... and so on
To join each row into a single space-separated cell, use:
=TRANSPOSE(QUERY(TRANSPOSE(A:D),, 9^9))
Then, to split everything back into a single column in row order, use:
=TRANSPOSE(SPLIT(QUERY(TRANSPOSE(QUERY(TRANSPOSE(A:D),,9^9)),,9^9), " "))
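The {A1:A;B1:B;C1:C;D1:D} literal stacks the columns one after another (column by column), while the desired result walks the grid row by row and skips empty cells. A minimal C sketch of that row-major traversal, assuming the grid is stored as a flat array of strings with NULL or "" marking empty cells (flatten_row_major is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flatten a rows x cols grid row by row (A B C D, then E F G H, ...),
   skipping empty cells (NULL or ""). Writes at most out_cap pointers
   into out and returns how many were written. */
size_t flatten_row_major(const char *const *grid, size_t rows, size_t cols,
                         const char **out, size_t out_cap)
{
    size_t n = 0;
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            const char *cell = grid[r * cols + c];
            if (cell == NULL || cell[0] == '\0')
                continue; /* empty cell: skip it and keep the same order */
            assert(n < out_cap);
            out[n++] = cell;
        }
    }
    return n;
}
```

With the question's second example (C and F missing), the output order is A B D E G H I J K L M N O P.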

Batch insert heading/newline to ASCII file if value of column changes

I have a file similar to this:
A B C
D E C
F G C
A B X
F G X
A B Q
D E Q
This is what I am looking for:
> C
A B C
D E C
F G C
> X
A B X
F G X
> Q
A B Q
D E Q
So far I have a somewhat complicated workaround.
First, I use AWK to add an empty line:
awk -v i=3 "NR>0 && $i!=p { print "A" }{ p=$i } 1" file.txt
I can't get awk to add the ">" line directly. Instead of "A", awk outputs an empty line, and I'm not really sure why.
Then I use
sed -e "s/^$/>/" file.txt
to insert ">" into the empty line, but the heading after it is still missing.
sed is for doing s/old/new, that is all. What you are attempting to do is not just s/old/new so you shouldn't be considering using sed, just use awk:
$ awk '$3!=p{print ">", $3; p=$3} 1' file
> C
A B C
D E C
F G C
> X
A B X
F G X
> Q
A B Q
D E Q
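The awk one-liner keeps the previous value of the third field in p and prints a "> value" header whenever the current value differs. The same logic, sketched in C over an in-memory array of lines (the function add_headers, the 63-character field limit and the caller-supplied output buffer are assumptions of this sketch, not part of the answer):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append prefix + text + "\n" to out; the sketch asserts that the
   caller's buffer (capacity cap, *used bytes filled) is large enough. */
static void append(char *out, size_t cap, size_t *used,
                   const char *prefix, const char *text)
{
    int w = snprintf(out + *used, cap - *used, "%s%s\n", prefix, text);
    assert(w >= 0 && (size_t)w < cap - *used);
    *used += (size_t)w;
}

/* Like awk '$3!=p{print ">", $3; p=$3} 1': copy every line, preceded by a
   "> KEY" header whenever the third field differs from the previous one.
   Assumes each field is at most 63 characters. */
void add_headers(const char *const *lines, size_t count, char *out, size_t cap)
{
    char prev[64] = ""; /* like awk's p, initially empty */
    char key[64];
    size_t used = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count; ++i) {
        /* %*s skips fields 1 and 2; %63s reads field 3 */
        if (sscanf(lines[i], "%*s %*s %63s", key) != 1)
            key[0] = '\0'; /* missing field: treat as empty, as awk does */
        if (strcmp(key, prev) != 0) {
            append(out, cap, &used, "> ", key);
            strcpy(prev, key);
        }
        append(out, cap, &used, "", lines[i]);
    }
}
```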
awk solution, assuming that lines with the same last field are grouped together (as in your sample; they do not need to be sorted):
awk '!a[$NF]++{ print ">",$NF }1' file
The output:
> C
A B C
D E C
F G C
> X
A B X
F G X
> Q
A B Q
D E Q
You could also try the following; let me know if it helps.
awk 'NR==1{print ">",$3 RS $0;prev=$3;next} prev!=$3{print ">",$3};1; {prev=$3}' Input_file
The output is as follows:
> C
A B C
D E C
F G C
> X
A B X
F G X
> Q
A B Q
D E Q

naming array from an array in GAWK

I have a file with repeating elements. I would like to assign records to an array until the file repeats, at which point I want to create a new array to assign the records to. I would like to do this an arbitrary amount of times.
For example:
$ cat repeat.txt
a
b
c
d
e
f
g
a
b
c
d
e
f
g
a
b
c
d
e
f
g
I want the output to be something like this:
0 a a a
1 b b b
2 c c c
3 d d d
4 e e e
5 f f f
6 g g g
Right now I am doing this with this hideous code:
awk 'BEGIN{n=0;z=0}
$1~"a" {n=0;z++}
z==1{a[n]=$0}
z==2{b[n]=$0}
z==3{c[n]=$0}
z==4{d[n]=$0}
z==5{e[n]=$0}
z==6{f[n]=$0}
{n++}
END{for (i in a)
print i,a[i],b[i],c[i],d[i],e[i],f[i],g[i],h[i],k[i],j[i]}'
repeat.txt
I would like the assignment of new arrays to be automatic.
I attempted this with the following:
echo "abcdefghijklmopqrstuvwxyz" > alphabet.txt
awk 'BEGIN{N=0}
NR==FNR{FS=""}
NR==FNR{for (zz=0;zz<=NF;zz++) a[zz]=$zz; next}
NR!=FNR{FS="\t"}
NR!=FNR{if ($0~a) N++; (a[N])[N]=$0}
END{for (I in (a[N])) print I,(a[N])[I]}' alphabet.txt repeat.txt
But this didn't work, because gawk does not accept multidimensional arrays written like this. I can't think of another way to do it.
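One way around naming a new array for every repetition is to make the repetition number a second index, so a[n], b[n], c[n], ... become cell[n][rep]. (gawk 4.0 and later can express the same shape directly as an array of arrays, a[n][z].) A C sketch of that idea, assuming that a record equal to the very first record starts a new repetition (the question's code uses the regex test $1~"a" instead) and using fixed, hypothetical limits MAX_POS and MAX_REP:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_POS 64 /* hypothetical limit: records per repetition */
#define MAX_REP 16 /* hypothetical limit: number of repetitions */

/* cell[pos][rep] replaces the separately named arrays a[], b[], c[], ...:
   the repetition number is a second index instead of part of a name. */
struct table {
    const char *cell[MAX_POS][MAX_REP];
    int reps;      /* repetitions seen so far */
    int positions; /* length of the longest repetition */
};

/* Assign records to the table; a record equal to the first record
   starts a new repetition and resets the position counter. */
void group_repeats(const char *const *records, size_t count, struct table *t)
{
    int pos = 0;
    memset(t, 0, sizeof *t);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(records[i], records[0]) == 0) {
            t->reps++;
            pos = 0;
        }
        assert(t->reps <= MAX_REP && pos < MAX_POS);
        t->cell[pos][t->reps - 1] = records[i];
        if (++pos > t->positions)
            t->positions = pos;
    }
}

/* Print "index value value ..." per position, like the desired output. */
void print_table(const struct table *t, FILE *out)
{
    for (int p = 0; p < t->positions; ++p) {
        fprintf(out, "%d", p);
        for (int r = 0; r < t->reps; ++r)
            fprintf(out, " %s", t->cell[p][r] ? t->cell[p][r] : "");
        fputc('\n', out);
    }
}
```

For the nine records a b c a b c a b c, print_table writes the lines 0 a a a, 1 b b b and 2 c c c; the number of repetitions is discovered at run time instead of being hard-coded as z==1, z==2, ...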

How to load a sliding diagonal vector from data stored column-wise with SSE

The sliding diagonal vector contains 16 elements, each one an 8-bit unsigned integer.
Without SSE and a bit simplified it would have looked like this in C:
int width=1000000; // a big number
uint8_t matrix[width][16];
fill_matrix_with_interesting_values(&matrix);
for (int i=0; i < width - 16; ++i) {
uint8_t diagonal_vector[16];
for (int j=0; j<16; ++j) {
diagonal_vector[j] = matrix[i+j][j];
}
do_something(&diagonal_vector);
}
But in my case I can only load column-wise (vertically) from the matrix with the _mm_load_si128 intrinsic function. The sliding diagonal vector moves horizontally, so I need to load 16 column vectors in advance and use one element from each of those column vectors to create the diagonal vector.
Is it possible to make a fast low-memory implementation for this with SSE?
Update Nov 14 2016: Providing some more details. In my case I read single-letter codes from a text file in FASTA format. Each letter represents a certain amino acid. Each amino acid has a specific column vector associated with it. That column vector is looked up from a constant table (a BLOSUM matrix). In C code it would look like this:
while (uint8_t c = read_next_letter_from_file()) {
column_vector = lookup_from_const_table(c)
uint8_t diagonal_vector[16];
... rearrange the values from the latest column
vectors into the diagonal_vector ...
do_something(&diagonal_vector)
}
The implementation I present here needs only one column load per iteration (note that _mm_blendv_epi8 requires SSE4.1). First, we initialize some variables:
const __m128i mask1=_mm_set_epi8(0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255);
const __m128i mask2=_mm_set_epi8(0,0,0,0,255,255,255,255,0,0,0,0,255,255,255,255);
const __m128i mask3=_mm_set_epi8(0,0,255,255,0,0,255,255,0,0,255,255,0,0,255,255);
const __m128i mask4=_mm_set_epi8(0,255,0,255,0,255,0,255,0,255,0,255,0,255,0,255);
__m128i v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15;
Then, at each step, the variable v_column_load is loaded with the next column:
v15 = v_column_load;
v7 = _mm_blendv_epi8(v7,v15,mask1);
v3 = _mm_blendv_epi8(v3,v7,mask2);
v1 = _mm_blendv_epi8(v1,v3,mask3);
v0 = _mm_blendv_epi8(v0,v1,mask4);
v_diagonal = v0;
In the next step the variable name numbers in v0, v1, v3, v7, v15 are incremented by 1 and adjusted to be in the range 0 to 15. In other words: newnumber = ( oldnumber + 1 ) modulo 16.
v0 = v_column_load;
v8 = _mm_blendv_epi8(v8,v0,mask1);
v4 = _mm_blendv_epi8(v4,v8,mask2);
v2 = _mm_blendv_epi8(v2,v4,mask3);
v1 = _mm_blendv_epi8(v1,v2,mask4);
v_diagonal = v1;
After 16 iterations the v_diagonal will start to contain the correct diagonal values.
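The statement above can be checked without SSE hardware. The following portable C sketch emulates _mm_blendv_epi8 byte by byte (a byte is taken from the second source when the mask byte's high bit is set), keeps the 16 registers in an array, and rotates the register names by the step number modulo 16 as described. The struct and function names are hypothetical. Lanes are stored in memory order (byte 0 first), whereas _mm_set_epi8 lists its arguments from byte 15 down to byte 0, so mask1 selects bytes 0 to 7.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 16 __m128i registers v0..v15 plus a step counter. */
struct diag_state {
    uint8_t v[16][16]; /* v[r][j] is byte j of register vr */
    unsigned step;     /* number of columns pushed so far */
};

void diag_init(struct diag_state *s)
{
    memset(s, 0, sizeof *s);
}

/* Byte-wise emulation of _mm_blendv_epi8: take b[j] where the high bit
   of mask[j] is set, otherwise a[j]. dst may be the same buffer as a. */
static void blendv(uint8_t dst[16], const uint8_t a[16], const uint8_t b[16],
                   const uint8_t mask[16])
{
    for (int j = 0; j < 16; ++j)
        dst[j] = (mask[j] & 0x80) ? b[j] : a[j];
}

/* Mask in memory order: 0xFF for bytes whose index has `bit` clear.
   bit = 8, 4, 2, 1 reproduces mask1, mask2, mask3, mask4. */
static void make_mask(uint8_t m[16], int bit)
{
    for (int j = 0; j < 16; ++j)
        m[j] = (j & bit) ? 0x00 : 0xFF;
}

/* Register named "vn" at the current step: names rotate by one per step. */
static uint8_t *reg(struct diag_state *s, unsigned n)
{
    return s->v[(n + s->step) % 16];
}

/* One iteration: load a column, run the four blends, return v_diagonal. */
void diag_push(struct diag_state *s, const uint8_t col[16], uint8_t out[16])
{
    uint8_t m[16];
    memcpy(reg(s, 15), col, 16);                  /* v15 = v_column_load */
    make_mask(m, 8);
    blendv(reg(s, 7), reg(s, 7), reg(s, 15), m);  /* v7 with mask1 */
    make_mask(m, 4);
    blendv(reg(s, 3), reg(s, 3), reg(s, 7), m);   /* v3 with mask2 */
    make_mask(m, 2);
    blendv(reg(s, 1), reg(s, 1), reg(s, 3), m);   /* v1 with mask3 */
    make_mask(m, 1);
    blendv(reg(s, 0), reg(s, 0), reg(s, 1), m);   /* v0 with mask4 */
    memcpy(out, reg(s, 0), 16);                   /* v_diagonal = v0 */
    s->step++;                                    /* n -> (n + 1) mod 16 */
}
```

In this emulation, once 16 columns have been pushed, byte j of the output comes from the column pushed j steps earlier (byte 0 from the newest column). Filling each column with its own step number makes that easy to check.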
Looking at mask1, mask2, mask3 and mask4, we see a pattern that can be used to generalize this algorithm to other vector lengths (2^n).
For instance, for vector length 8, we would only need 3 masks and the iteration steps would look like this:
v7 = a a a a a a a a
v6 =
v5 =
v4 =
v3 = a a a a
v2 =
v1 = a a
v0 = a
v0 = b b b b b b b b
v7 = a a a a a a a a
v6 =
v5 =
v4 = b b b b
v3 = a a a a
v2 = b b
v1 = a b
v1 = c c c c c c c c
v0 = b b b b b b b b
v7 = a a a a a a a a
v6 =
v5 = c c c c
v4 = b b b b
v3 = a a c c
v2 = a b c
v2 = d d d d d d d d
v1 = c c c c c c c c
v0 = b b b b b b b b
v7 = a a a a a a a a
v6 = d d d d
v5 = c c c c
v4 = b b d d
v3 = a a c d
v3 = e e e e e e e e
v2 = d d d d d d d d
v1 = c c c c c c c c
v0 = b b b b b b b b
v7 = a a a a e e e e
v6 = d d d d
v5 = a a c c e e
v4 = a b b d a
v4 = f f f f f f f f
v3 = e e e e e e e e
v2 = d d d d d d d d
v1 = c c c c c c c c
v0 = b b b b f f f f
v7 = a a a a e e e e
v6 = b b d d f f
v5 = a b c d e f
v5 = g g g g g g g g
v4 = f f f f f f f f
v3 = e e e e e e e e
v2 = d d d d d d d d
v1 = c c c c g g g g
v0 = b b b b f f f f
v7 = a a c c e e g g
v6 = a b c d e f g
v6 = h h h h h h h h
v5 = g g g g g g g g
v4 = f f f f f f f f
v3 = e e e e e e e e
v2 = d d d d h h h h
v1 = c c c c g g g g
v0 = b b d d f f h h
v7 = a b c d e f g h <-- this vector now contains the diagonal
v7 = i i i i i i i i
v6 = h h h h h h h h
v5 = g g g g g g g g
v4 = f f f f f f f f
v3 = e e e e i i i i
v2 = d d d d h h h h
v1 = c c e e g g i i
v0 = b c d e f g h i <-- this vector now contains the diagonal
v0 = j j j j j j j j
v7 = i i i i i i i i
v6 = h h h h h h h h
v5 = g g g g g g g g
v4 = f f f f j j j j
v3 = e e e e i i i i
v2 = d d f f h h j j
v1 = c d e f g h i j <-- this vector now contains the diagonal
Sidenote: I discovered this way of loading a diagonal vector when I was working on an implementation of the Smith-Waterman algorithm. Some more information can be found on the old SourceForge project web page.

C reading csv file

I'm running into a problem I haven't encountered before and I'm baffled. When I read a CSV file char by char, spaces somehow seem to appear between the characters, and what's weirder is that there are no space characters anywhere in the file. Here is an example:
#include <stdio.h>
#include <stdlib.h>

char *readgd(const char *fname)
{
char *gddata, *tmp;
FILE *fp;
size_t buff = 1024, c = 0;
int ch; /* int, not char, so that EOF can be detected */
if(!(fp = fopen(fname, "r")))
{
printf("\nError! Could not open %s!", fname);
return NULL;
}
if(!(gddata = malloc(buff)))
{
fclose(fp);
printf("\nError! Memory allocation failed!");
return NULL;
}
while((ch = fgetc(fp)) != EOF) /* read first, so EOF is never stored */
{
if(c + 1 >= buff) /* keep room for the terminating NUL */
{
buff += buff;
if(!(tmp = realloc(gddata, buff)))
{
free(gddata);
fclose(fp);
printf("\nError! Memory allocation failed!");
return NULL;
}
gddata = tmp;
}
gddata[c++] = (char)ch;
if(ch != ' ') printf("%c", ch); //no spaces?
}
if((tmp = realloc(gddata, c + 1))) /* shrink to fit; keep the old block if this fails */
gddata = tmp;
gddata[c] = '\0';
fclose(fp);
return gddata;
}
with the following CSV snippet:
:Tagname,Area,SecurityGroup,Container,ContainedName,ShortDesc,ExecutionRelativeOrder,ExecutionRelatedObject,UDAs,Extensions,CmdData,Address_ACbHAlmCfg,Address_ACbHWarnCfg,Address_ACbLAlmCfg,Address_ACbLWarnCfg,Address_ACbTfCfg,Address_ACrHAlmDb,Address_ACrHAlmSp,Address_ACrHAlmTmrSp,Address_ACrHWarnDb,Address_ACrHWarnSp,Address_ACrHWarnTmrSp,Address_ACrLAlmDb,Address_ACrLAlmSp,Address_ACrLAlmTmrSp,Address_ACrLWarnDb,Address_ACrLWarnSp,Address_ACrLWarnTmrSp,Address_ACrTfTmrSp,Address_bHalm,Address_bHWarn,Address_bLAlm,Address_bLwarn,Address_bMode,Address_bTfAlm,Address_rCCmd,Address_rVal,
outputs this onto the console:
 
■: T a g n a m e , A r e a , S e c u r i t y G r o u p , C o n t a i n e r , C
o n t a i n e d N a m e , S h o r t D e s c , E x e c u t i o n R e l a t i v e
O r d e r , E x e c u t i o n R e l a t e d O b j e c t , U D A s , E x t e n s
i o n s , C m d D a t a , A d d r e s s _ A C b H A l m C f g , A d d r e s s _
A C b H W a r n C f g , A d d r e s s _ A C b L A l m C f g , A d d r e s s _ A
C b L W a r n C f g , A d d r e s s _ A C b T f C f g , A d d r e s s _ A C r H
A l m D b , A d d r e s s _ A C r H A l m S p , A d d r e s s _ A C r H A l m T
m r S p , A d d r e s s _ A C r H W a r n D b , A d d r e s s _ A C r H W a r n
S p , A d d r e s s _ A C r H W a r n T m r S p , A d d r e s s _ A C r L A l m
D b , A d d r e s s _ A C r L A l m S p , A d d r e s s _ A C r L A l m T m r S
p , A d d r e s s _ A C r L W a r n D b , A d d r e s s _ A C r L W a r n S p ,
A d d r e s s _ A C r L W a r n T m r S p , A d d r e s s _ A C r T f T m r S p
, A d d r e s s _ b H a l m , A d d r e s s _ b H W a r n , A d d r e s s _ b L
A l m , A d d r e s s _ b L w a r n , A d d r e s s _ b M o d e , A d d r e s s
_ b T f A l m , A d d r e s s _ r C C m d , A d d r e s s _ r V a l ,
I am very confused as to where these spaces are coming from. Any help would be greatly appreciated.
Are you sure the CSV is not encoded with UTF-16 (using two bytes per character)?
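This is the most likely reason you'd see spaces between otherwise valid ASCII characters: in UTF-16LE each ASCII character is followed by a 0x00 byte, which the console typically prints as a blank, so your != ' ' check does not filter it out. Try verifying the encoding first.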
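To check the encoding from C, you can look at the first bytes of the file opened in binary mode. The sketch below (guess_encoding and the enum are hypothetical names) recognizes the common byte-order marks and, failing that, the zero-byte-after-each-ASCII-character pattern of UTF-16. The ■ at the start of the console output is consistent with the FE byte of a UTF-16LE byte-order mark (FF FE) as displayed in code page 437, but that is an inference from the output, not something the question confirms.

```c
#include <assert.h>
#include <stddef.h>

enum text_encoding { ENC_UNKNOWN, ENC_UTF8_BOM, ENC_UTF16LE, ENC_UTF16BE };

/* Guess the encoding from the first bytes of a file read in binary mode.
   Only byte-order marks and a simple zero-byte heuristic are checked;
   UTF-32 is not considered. */
enum text_encoding guess_encoding(const unsigned char *buf, size_t n)
{
    if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16LE;
    if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16BE;
    /* No BOM: mostly-ASCII UTF-16LE text has 0x00 after each character. */
    if (n >= 2 && buf[0] != 0 && buf[1] == 0)
        return ENC_UTF16LE;
    if (n >= 2 && buf[0] == 0 && buf[1] != 0)
        return ENC_UTF16BE;
    return ENC_UNKNOWN;
}
```

For example, read the first few bytes with fread from a file opened with fopen(fname, "rb") and pass them in; if the result is ENC_UTF16LE, convert the file to ASCII/UTF-8 or read it two bytes per character.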
