Struct element erasing itself - c

Ok so I am reading some things from files with this code:
for (i=0; i<start;i++)
{
    filename=files[i];
    if((fp=fopen(filename, "r"))==NULL)
    {
        printf("unable to open %s\n", filename);
        exit(1);
    }
    while(fgets(buffer, sizeof(buffer), fp) !=NULL)
    {
        d[counter].id=atoi(strtok(buffer, del));
        strcpy(buffer2, strtok(NULL, del));
        len=strlen(buffer2);
        if(buffer2[len-1]=='\n')
            buffer2[len-1]='\0';
        strcpy(d[counter].name, buffer2);
        counter++;
    }
    token = strtok (filename, del1);
    holder=token;
    token = strtok (NULL, del1); /* section*/
    token2 = strtok(holder, del2);
    token2 = strtok(NULL, del2); /*course name */
    for(x=z;x<counter;x++)
    {
        d[x].section=atoi(token);
        printf("%d ", d[x].section);
        strcpy(d[x].course, token2);
        printf("%s %d %s %d\n", d[x].course, d[x].section, d[x].name, d[x].id);
    }
    z=counter;
}
Struct definition:
struct student {
    char course[8];
    int section;
    char name[19];
    int id;
};
Everything prints fine except for "section" in the struct: for some reason that member becomes 0 after a certain number of records.
Here is the output:
1 CSE1325 1 Sally 3233
1 CSE1325 1 George 9473
2 CSE1325 2 Tom 1234
2 CSE1325 2 Ralph 3540
2 CSE1325 2 Mary 5678
1 CSE2312 1 Tom 1234
1 CSE2312 1 Ralph 3540
1 CSE2312 1 Mary 5678
1 CSE2315 1 Tom 1234
1 CSE2315 1 Ralph 3540
1 CSE2315 1 Mary 5678
2 CSE2315 2 Sally 3233
2 CSE2315 2 George 9473
4 ENGL1301 0 Tom 1234
4 ENGL1301 0 Sally 3233
4 ENGL1301 0 Ralph 3540
4 ENGL1301 0 Mary 5678
4 ENGL1301 0 George 9473
1 HIST1311 0 Tom 1234
1 HIST1311 0 Sally 3233
1 HIST1311 0 Ralph 3540
1 HIST1311 0 Mary 5678
1 HIST1311 0 George 9473
5 MATH1426 0 Sally 3233
5 MATH1426 0 George 9473
This is the expected output:
1 CSE1325 1 Sally 3233
1 CSE1325 1 George 9473
2 CSE1325 2 Tom 1234
2 CSE1325 2 Ralph 3540
2 CSE1325 2 Mary 5678
1 CSE2312 1 Tom 1234
1 CSE2312 1 Ralph 3540
1 CSE2312 1 Mary 5678
1 CSE2315 1 Tom 1234
1 CSE2315 1 Ralph 3540
1 CSE2315 1 Mary 5678
2 CSE2315 2 Sally 3233
2 CSE2315 2 George 9473
4 ENGL1301 4 Tom 1234
4 ENGL1301 4 Sally 3233
4 ENGL1301 4 Ralph 3540
4 ENGL1301 4 Mary 5678
4 ENGL1301 4 George 9473
1 HIST1311 1 Tom 1234
1 HIST1311 1 Sally 3233
1 HIST1311 1 Ralph 3540
1 HIST1311 1 Mary 5678
1 HIST1311 1 George 9473
5 MATH1426 5 Sally 3233
5 MATH1426 5 George 9473
See how the numbers match? In my output they do not. When I print d[x].section on its own inside the for loop, it prints the correct value, but in the combined print statement it prints 0 once the loop reaches ENGL1301.

The course number can be eight characters long, and strings in C are by convention null-terminated. Since your declaration is char course[8], when the course number has eight characters the terminating null is written past the end of course, where it lands in the section number and makes it zero. That is also why the standalone printf shows the right value: it runs before strcpy(d[x].course, token2) does the overwrite.
Declaring char course[9] should do the trick.

The 0 is almost certainly the null terminator from this value (and the other values of the same length):
"ENGL1301"
This 8 character string is 9 characters when you include the null terminator at the end of the string. In this case, the null terminator is being written past the end of the string, which happens to be where the section is stored.
To fix this, declare course as char course[9]
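The overwrite described above can be reproduced without running the original program. The following is only a sketch in Python using the standard ctypes module, which lays out a Structure by the platform's C rules. It assumes a typical platform with a 4-byte, little-endian int; the class names Student and FixedStudent are made up for this illustration.

```python
import ctypes
import sys


class Student(ctypes.Structure):
    # Mirrors the struct from the question (course is only 8 bytes wide).
    _fields_ = [
        ("course", ctypes.c_char * 8),
        ("section", ctypes.c_int),
        ("name", ctypes.c_char * 19),
        ("id", ctypes.c_int),
    ]


class FixedStudent(ctypes.Structure):
    # Same struct with the suggested fix: room for 8 characters plus '\0'.
    _fields_ = [
        ("course", ctypes.c_char * 9),
        ("section", ctypes.c_int),
        ("name", ctypes.c_char * 19),
        ("id", ctypes.c_int),
    ]


# strcpy copies the 8 characters AND the terminating '\0': 9 bytes in total.
src = b"ENGL1301\0"

s = Student()
s.section = 4
# Imitate strcpy(d[x].course, token2). The 9 bytes stay inside the Student
# object, so this demonstration does not touch memory it does not own.
ctypes.memmove(ctypes.addressof(s), src, len(src))

f = FixedStudent()
f.section = 4
ctypes.memmove(ctypes.addressof(f), src, len(src))

print("offset of section:", Student.section.offset, "vs", FixedStudent.section.offset)
print("char course[8]:", s.course, s.section)
print("char course[9]:", f.course, f.section)
```

With the 8-byte array, section starts directly after course, so the ninth byte of the copy lands on the first byte of section. On a little-endian machine that is the low-order byte, which is why small section numbers turn into 0. With the 9-byte array the terminator stays inside course and section keeps its value.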

Related

Nested for-loop: error variable already defined

I have a nested loop in Stata with four levels of foreach statements. With this loop, I am trying to create a new variable named strata that ranges from 1 to 40.
foreach x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 {
    foreach r in 1 2 3 4 5 {
        foreach s in 1 2 {
            foreach a in 1 2 3 4 {
                gen strata= `x' if race==`r' & sex==`s' & age==`a'
            }
        }
    }
}
I get an error :
"variable strata already defined"
Even with the error, the loop does assign strata = 1, but not the rest of the strata. All other cells are missing/empty.
Example data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(age sex race)
1 2 2
1 2 1
1 1 1
1 1 1
1 2 1
2 2 1
2 2 1
4 2 1
1 2 1
4 2 1
3 2 1
2 2 1
4 2 1
4 2 2
3 2 1
4 1 3
4 2 1
4 2 1
2 1 2
4 2 1
2 2 1
3 2 1
3 2 1
1 2 3
4 2 1
1 2 5
4 2 1
4 2 1
4 2 2
4 2 1
2 2 1
4 1 1
3 2 1
1 2 1
2 2 1
4 2 1
1 2 2
2 2 3
1 1 3
4 2 1
2 2 3
1 2 1
1 1 1
2 2 3
1 2 1
1 1 3
1 2 1
2 2 1
3 2 1
1 2 1
4 2 1
1 2 2
1 2 1
2 2 1
4 2 1
4 2 1
1 2 1
1 2 1
4 2 1
2 2 1
4 2 1
1 2 1
1 1 3
2 2 1
1 1 1
4 1 1
3 2 1
2 2 1
1 2 1
1 1 1
2 2 3
4 2 2
2 2 1
2 2 1
3 2 1
2 2 2
3 2 1
2 1 1
1 1 1
3 2 1
1 2 3
4 2 1
4 2 1
2 2 1
1 2 1
1 1 1
3 2 1
4 2 1
2 2 3
1 2 3
4 2 1
3 2 1
2 2 1
4 2 1
3 2 1
2 1 1
1 2 1
2 2 1
2 2 3
1 1 1
end
label values sex sex
label def sex 1 "male (1)", modify
label def sex 2 "female (2)", modify
label values race race
label def race 1 "non-Hispanic white (1)", modify
label def race 2 "black (2)", modify
label def race 3 "AAPI/other (3)", modify
label def race 5 "Hispanic (5)", modify
generate is for generating new variables. The second time your code reaches a generate statement, the code fails for the reason given.
One answer is that you need to generate your variable outside the loops and then replace inside.
For other reasons your code can be rewritten in stages.
First, integer sequences can be more easily and efficiently specified with forvalues, which can be abbreviated: I tend to write forval.
gen strata = .
forval x = 1/40 {
    forval r = 1/5 {
        forval s = 1/2 {
            forval a = 1/4 {
                replace strata = `x' if race==`r' & sex==`s' & age==`a'
            }
        }
    }
}
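Second, the code is flawed anyway. Every observation that matches some combination of race, sex and age is overwritten on each pass of the outer loop, so everything ends up as 40!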
Third, you can do allocations much more directly, say by
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
This is a self-contained reproducible demonstration:
clear
set obs 5
gen race = _n
expand 2
bysort race : gen sex = _n
expand 4
bysort race sex : gen age = _n
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
isid strata
Clearly you can and should vary the recipe for a different preferred scheme.
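As a quick cross-check of the arithmetic, here is a minimal sketch in Python (not Stata) that applies the same formula to every combination of race 1 to 5, sex 1 to 2 and age 1 to 4. The function name strata is chosen only for this illustration.

```python
from itertools import product


def strata(race, sex, age):
    # Same allocation as the Stata one-liner: 8 strata per race, 4 per sex.
    return 8 * (race - 1) + 4 * (sex - 1) + age


codes = [
    strata(race, sex, age)
    for race, sex, age in product(range(1, 6), range(1, 3), range(1, 5))
]

# Every combination gets its own code, and the codes run from 1 to 40.
print(sorted(codes) == list(range(1, 41)))
```

The check plays the same role as isid strata in the Stata demonstration: it confirms that the 40 combinations map to 40 distinct codes.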

Create a column that shows the day of the week based on a date column

I am attempting to return day of the week (i.e. Monday = 1, Tuesday = 2, etc) based on a date column ("Posting_date"). I tried a for loop but got it wrong:
# First date of table was a Sunday (31 March 2019) => so counter starts at 6
posting_df3['Day'] = (posting_df3['Posting_date'] - dt.datetime(2019,3,31)).dt.days.astype('int16')
# Start counter on the right date (31 March 2019 is a Sunday)
count = 7
for x in posting_df3['Day']:
    if count != 7:
        count = 1
    else:
        count = count + 1
    posting_df3['Day'] = count
Not sure if there are other ways of doing this. Attached is an image of my database structure:
level_0 Posting_date Reservation date Book_window ADR Day
0 9 2019-03-31 2019-04-01 -1 156.00 0
1 25 2019-04-01 2019-04-01 0 152.15 1
2 11 2019-04-01 2019-04-01 0 149.40 1
3 42 2019-04-01 2019-04-01 0 141.33 1
4 45 2019-04-01 2019-04-01 0 159.36 1
... ... ... ... ... ... ...
4278 739 2020-02-21 2019-04-17 310 253.44 327
4279 739 2020-02-22 2019-04-17 310 253.44 328
4280 31 2020-03-11 2019-04-01 345 260.00 346
In the final output, the Day column should return 7 for 2019-03-31, since it is a Sunday, 1 for 2019-04-01, since it is a Monday, and so on.
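You can do it this way. dt.weekday numbers the days from Monday = 0 to Sunday = 6, so adding 1 gives the Monday = 1 to Sunday = 7 scheme you want: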
df['weekday']=pd.to_datetime(df['Posting_date']).dt.weekday+1
Input
level_0 Posting_date Reservation_date Book_window ADR Day
0 9 3/31/2019 4/1/2019 -1 156.00 0
1 25 4/1/2019 4/1/2019 0 152.15 1
2 11 4/1/2019 4/1/2019 0 149.40 1
3 42 4/1/2019 4/1/2019 0 141.33 1
4 45 4/1/2019 4/1/2019 0 159.36 1
Output
level_0 Posting_date Reservation_date Book_window ADR Day weekday
0 9 3/31/2019 4/1/2019 -1 156.00 0 7
1 25 4/1/2019 4/1/2019 0 152.15 1 1
2 11 4/1/2019 4/1/2019 0 149.40 1 1
3 42 4/1/2019 4/1/2019 0 141.33 1 1
4 45 4/1/2019 4/1/2019 0 159.36 1 1
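The numbering can be checked with the standard library alone. This is a small sketch, not part of the answer above; the helper name day_number is made up for the example. Python's date.weekday() uses the same Monday = 0 convention as the pandas accessor, and date.isoweekday() returns the Monday = 1 to Sunday = 7 numbering directly.

```python
from datetime import date


def day_number(d):
    # date.weekday() counts Monday as 0 and Sunday as 6, so shift by one.
    return d.weekday() + 1


for d in (date(2019, 3, 31), date(2019, 4, 1)):
    print(d.isoformat(), day_number(d), d.isoweekday())
```

Both dates from the question come out as requested: 7 for Sunday 2019-03-31 and 1 for Monday 2019-04-01.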

Google Sheets, how to subtract one array from another?

I've got two arrays, and I want to subtract array2 from array1 to form a sorted result array. I'm wondering if it's even possible. I've searched everywhere, but haven't found a solution that I would know how to implement.
array1
NAME DATA1 DATA2 DATA3
MATT 6 2 4
ROBERT 3 2 1
JAKE 2 2 0
PETER 3 1 2
CHARLES 3 1 2
array2
NAME DATA1 DATA2 DATA3
MATT 6 2 4
JAKE 2 2 0
ROBERT 2 2 0
CHARLES 2 0 2
result array
NAME DATA1 DATA2 DATA3
PETER 3 1 2
CHARLES 1 1 0
ROBERT 1 0 1
MATT 0 0 0
JAKE 0 0 0
try:
=ARRAYFORMULA({A1:D1; QUERY(QUERY({A2:D6; IFERROR(A9:D12*-1, A9:D12)},
"select Col1,sum(Col2),sum(Col3),sum(Col4)
where Col1 is not null
group by Col1"),
"offset 1", 0)})
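The formula stacks array1 on top of a negated copy of array2 and then sums per name. The same idea can be sketched outside Sheets in Python, which may make the logic easier to follow. The function name subtract and the dictionary layout are assumptions made for this example; the numbers are the ones from the question.

```python
array1 = {
    "MATT": (6, 2, 4),
    "ROBERT": (3, 2, 1),
    "JAKE": (2, 2, 0),
    "PETER": (3, 1, 2),
    "CHARLES": (3, 1, 2),
}
array2 = {
    "MATT": (6, 2, 4),
    "JAKE": (2, 2, 0),
    "ROBERT": (2, 2, 0),
    "CHARLES": (2, 0, 2),
}


def subtract(a, b):
    # Start from the rows of a, then subtract the matching rows of b.
    totals = {name: list(row) for name, row in a.items()}
    for name, row in b.items():
        # A name that only occurs in b starts from zeros and ends up negative.
        current = totals.setdefault(name, [0] * len(row))
        for i, value in enumerate(row):
            current[i] -= value
    return {name: tuple(values) for name, values in totals.items()}


result = subtract(array1, array2)
for name, row in result.items():
    print(name, *row)
```

PETER has no row in array2, so his values pass through unchanged, just as in the expected result array. The sketch does not sort the rows.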

String array combination in R

I'm starting my studies in R, and even after looking for this topic in many forums, I couldn't find a good answer. Maybe I'm not searching with the right terms, or maybe it's not possible in R, so please excuse my ignorance.
I would like to find how many times two professionals participate together in a given project. In addition, I would like to map their positions when they are found together.
I'm not using a specific notation below. For example, assume I have the following string arrays:
Project1: Bob (President), Joe (Vice President), Mary (Participant), Paul (Participant)
Project2: Bob (President), Joe (Vice President), Sue (Participant), Bill (Participant)
Project3: Paul (President), Sue (Vice President), Bob (Participant), Joe (Participant)
Project'n: (...)
The output would be:
Bob (President) & Joe (Vice President) = 2
Bob (President) & Mary (Participant) = 1
Bob (President) & Paul (Participant) = 1
Bob (Participant) & Paul (President) = 1
Sue (Vice President) & Joe (Participant) = 1
And it goes on and on, and I assume these results could be aggregate in a histogram graph. I have 86 names, participating in 38 different projects, at 3 different possible positions.
Any ideas whether this is possible in R? How could it be accomplished? Are there any code templates or documentation that I could use to get to this answer?
## MY ATTEMPT (START)
Groups <- data.frame(
  Name = c('Paul','Paul','Paul','Bob','Bob','Sue','Bill'),
  Group = c('P1','P2','P3','P1','P2','P3','P3'),
  Role = c('President','President','President','Vice President','Vice President','Participant','Participant')
)
Table <- table (Groups)
When I print 'Table', it shows this output:
, , Role = Participant
Group
Name P1 P2 P3
Bill 0 0 1
Bob 0 0 0
Paul 0 0 0
Sue 0 0 1
, , Role = President
Group
Name P1 P2 P3
Bill 0 0 0
Bob 0 0 0
Paul 1 1 1
Sue 0 0 0
, , Role = Vice President
Group
Name P1 P2 P3
Bill 0 0 0
Bob 1 1 0
Paul 0 0 0
Sue 0 0 0
Now - for instance - in project "P1" we can see Paul as President and Bob as Vice President. Same happens in project "P2". In "P3", we have Paul as President plus Sue and Bill both as Participants.
My question now is how to count the occurrences of a given relationship across all the projects. Something like:
Paul/President & Bob/Vice = 2 occurrences,
Paul/President & Sue/Participant = 1 occurrence,
Paul/President & Bill/Participant = 1 occurrence, etc
Basically a 'hist' based on the occurrences of a particular people/role combination.
## MY ATTEMPT (END)
Now that you have your Table, you can count the occurrence of different types of relationships using apply over different sets of axes:
How many occurrences of different types of participants are there for each project?
> apply(Table, c(2,3), sum)
Role
Group Participant President Vice President
P1 0 1 1
P2 0 1 1
P3 2 1 0
How many occurrences of Person-Role combinations?
> apply(Table, c(1,3), sum)
Role
Name Participant President Vice President
Bill 1 0 0
Bob 0 0 2
Paul 0 3 0
Sue 1 0 0
Which projects is each person working in?
> apply(Table, c(1,2), sum)
Group
Name P1 P2 P3
Bill 0 0 1
Bob 1 1 0
Paul 1 1 1
Sue 0 0 1
How many projects is each person working on?
> apply(Table, 1, sum)
Bill Bob Paul Sue
1 2 3 1
How many people are involved in each project?
> apply(Table, 2, sum)
P1 P2 P3
2 2 3
How many people belong to each role?
> apply(Table, 3, sum)
Participant President Vice President
2 3 2
Thanks @ScottRitchie for your tips. After some additional reading and testing, I came up with the following:
A csv file was imported with columns containing the name, project and role. I also added another column at the end, like a counter (with a constant value of 1 from end to end).
I did:
Groupings <-read.csv("~/Documents/TCC_BIGDATA/Test.csv", sep=";")
Groupings$Counter <- as.integer(Groupings$Counter)
print(Groupings)
Project Name Role Counter
1 P1 Paul President 1
2 P1 Bob Vice President 1
3 P1 Sue Participant 1
4 P1 Bill Participant 1
5 P2 Paul Vice President 1
6 P2 Bob Participant 1
7 P2 Bill President 1
8 P3 Bob President 1
9 P3 Bill Vice President 1
10 P3 Sue Participant 1
How many times does a name show up in the list?
aggregate(Counter ~ Name, data = Groupings, sum)
Name Counter
1 Bill 3
2 Bob 3
3 Paul 2
4 Sue 2
How many times does a Name+Role combination show up in the list?
aggregate(Counter ~ Name + Role, data = Groupings, sum)
Name Role Counter
1 Bill Participant 1
2 Bob Participant 1
3 Sue Participant 2
4 Bill President 1
5 Bob President 1
6 Paul President 1
7 Bill Vice President 1
8 Bob Vice President 1
9 Paul Vice President 1
Other exercises and combinations can be made in the same way. In the end, it is just another way to achieve what you (@ScottRitchie) built to answer my question. I thought it would be a good idea to share it so others could apply it.
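The tables above count names and roles one at a time. The pairwise count asked for in the question (who appears together with whom, and in which roles) can be sketched as follows. This is an illustration in Python rather than R, using the three example projects from the question; the variable names are made up for the example.

```python
from collections import Counter
from itertools import combinations

projects = {
    "Project1": [("Bob", "President"), ("Joe", "Vice President"),
                 ("Mary", "Participant"), ("Paul", "Participant")],
    "Project2": [("Bob", "President"), ("Joe", "Vice President"),
                 ("Sue", "Participant"), ("Bill", "Participant")],
    "Project3": [("Paul", "President"), ("Sue", "Vice President"),
                 ("Bob", "Participant"), ("Joe", "Participant")],
}

pair_counts = Counter()
for members in projects.values():
    # Sorting gives every pair one canonical order, so the same two
    # (name, role) entries always produce the same key.
    for pair in combinations(sorted(members), 2):
        pair_counts[pair] += 1

for (first, second), n in pair_counts.most_common(3):
    print(f"{first[0]} ({first[1]}) & {second[0]} ({second[1]}) = {n}")
```

Each key is a pair of (name, role) tuples, so the same two people in different roles are counted separately, which matches the output format in the question. The resulting counts could then be plotted as a bar chart.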

Subsetting Last N Values From a Data Frame, R

I have a data frame of all the results of a football season, in a data frame called new. I want to extract the last 5 games of all teams home and away. The home variable is column 1 and away variable is column 2.
Say there are 20 teams in a character vector called teams, each with a unique name. If it was just a single team it would be easy to subset - say if team1 was "Arsenal", using something like
Arsenal <- "Arsenal"
head(new[new[,1] == Arsenal | new[,2] == Arsenal,], 5)
But I want to loop through the character vector teams to obtain the last 5 results of all teams, 20 in total. Can somebody help me please?
Edit: Here is some sample data. As an example, I would like to obtain the last two games of all teams- it would be easy to subset a single team but I'm not sure how to subset multiple teams.
V1 V2 V3 V4 V5
1 Chelsea Everton 2 1 19/05/2013
2 Liverpool QPR 1 0 19/05/2013
3 Man City Norwich 2 3 19/05/2013
4 Newcastle Arsenal 0 1 19/05/2013
5 Southampton Stoke 1 1 19/05/2013
6 Swansea Fulham 0 3 19/05/2013
7 Tottenham Sunderland 1 0 19/05/2013
8 West Brom Man United 5 5 19/05/2013
9 West Ham Reading 4 2 19/05/2013
10 Wigan Aston Villa 2 2 19/05/2013
11 Arsenal Wigan 4 1 14/05/2013
12 Reading Man City 0 2 14/05/2013
13 Everton West Ham 2 0 12/05/2013
14 Fulham Liverpool 1 3 12/05/2013
15 Man United Swansea 2 1 12/05/2013
16 Norwich West Brom 4 0 12/05/2013
17 QPR Newcastle 1 2 12/05/2013
18 Stoke Tottenham 1 2 12/05/2013
19 Sunderland Southampton 1 1 12/05/2013
20 Aston Villa Chelsea 1 2 11/05/2013
21 Chelsea Tottenham 2 2 08/05/2013
22 Man City West Brom 1 0 07/05/2013
23 Wigan Swansea 2 3 07/05/2013
24 Sunderland Stoke 1 1 06/05/2013
25 Liverpool Everton 0 0 05/05/2013
26 Man United Chelsea 0 1 05/05/2013
27 Fulham Reading 2 4 04/05/2013
28 Norwich Aston Villa 1 2 04/05/2013
29 QPR Arsenal 0 1 04/05/2013
30 Swansea Man City 0 0 04/05/2013
31 Tottenham Southampton 1 0 04/05/2013
32 West Brom Wigan 2 3 04/05/2013
33 West Ham Newcastle 0 0 04/05/2013
34 Aston Villa Sunderland 6 1 29/04/2013
35 Arsenal Man United 1 1 28/04/2013
36 Chelsea Swansea 2 0 28/04/2013
37 Reading QPR 0 0 28/04/2013
38 Everton Fulham 1 0 27/04/2013
39 Man City West Ham 2 1 27/04/2013
40 Newcastle Liverpool 0 6 27/04/2013
41 Southampton West Brom 0 3 27/04/2013
42 Stoke Norwich 1 0 27/04/2013
43 Wigan Tottenham 2 2 27/04/2013
Where df is your data.frame, this will create a list of 20 data.frames with each element being the dataset for one team. This also assumes that the dataset is already ordered, since you mentioned it.
setnames(df,c('hometeam','awayteam','homegoals','awaygoals','fixturedate'))
allteams <- sort(unique(df$hometeam))
eachteamlastfive <- vector(mode = "list", length = length(allteams))
for ( i in seq(length(allteams)))
{
  eachteamlastfive[[i]] <- head(df[df$hometeam==allteams[i] | df$awayteam == allteams[i], ],5)
}
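Take a look at sapply. Pass simplify = FALSE so that you get back a list of data frames, one per team, rather than a matrix of list elements:
sapply(unique(new[,1]), function(team) head(new[new[,1] == team | new[,2] == team,], 5), simplify = FALSE)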
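Both answers rely on the same idea: the table is already ordered with the most recent fixture first, so the first n rows that mention a team are its last n games. A minimal sketch of that idea in Python, using a few rows from the sample data and n = 2; the function name last_n_games is made up for the example.

```python
# (home, away, home goals, away goals, date), most recent fixture first.
matches = [
    ("Chelsea", "Everton", 2, 1, "19/05/2013"),
    ("Newcastle", "Arsenal", 0, 1, "19/05/2013"),
    ("Arsenal", "Wigan", 4, 1, "14/05/2013"),
    ("Aston Villa", "Chelsea", 1, 2, "11/05/2013"),
    ("Chelsea", "Tottenham", 2, 2, "08/05/2013"),
    ("QPR", "Arsenal", 0, 1, "04/05/2013"),
]


def last_n_games(rows, n):
    # Collect, per team, the first n rows in which it plays home or away.
    games = {}
    for row in rows:
        home, away = row[0], row[1]
        for team in (home, away):
            team_games = games.setdefault(team, [])
            if len(team_games) < n:
                team_games.append(row)
    return games


recent = last_n_games(matches, 2)
for team in ("Arsenal", "Chelsea"):
    print(team, recent[team])
```

This makes a single pass over the table instead of one filter per team, but the result has the same shape as the R list: one entry per team holding at most n rows.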
