I am trying to read a text file into a structure in C, but on the last iteration of my fscanf() loop, both the numbers and the text stored in the first, and part of the second, entries of my structure get changed.
Debugging has revealed that this behaviour is caused by the while fscanf() loop. Although changing the size of the input strings prevented the numbers from being changed, the string on the first line, PMs.Party[0], still changed from Labour to r. Here is my code:
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include<string.h>
void PartyPwr( int Runs, int Time[12], char Prty[12][15]);
struct Data
{
char *Name[12][15];
int StrtMnth[12];
int StrtYr[12];
int EndMnth[12];
int EndYr[12];
char Party[12][15]; // if this is 13 20 it runs without numbers changing.
int TimePwr[12];
};
int main(void)
{
int Max=0;
int i=0;
FILE *PriMins;
struct Data PMs;
if ((PriMins=fopen("PM.txt", "r")) == NULL)
{
printf("Error: PM.txt cannot be read.");
system("pause");
return(1);
}
while(fscanf(PriMins, "%s %d %d %d %d %s", &PMs.Name[Max], &PMs.StrtMnth[Max], &PMs.StrtYr[Max], &PMs.EndMnth[Max], &PMs.EndYr[Max], &PMs.Party[Max]) > 0)
{
PMs.TimePwr[Max]=((PMs.EndMnth[Max] +(PMs.EndYr[Max]*12)) - (PMs.StrtMnth[Max] + (PMs.StrtYr[Max]*12)));
printf("%s %d Total term %d\n",PMs.Name[Max], PMs.EndMnth[Max],PMs.TimePwr[Max]);
printf("Max val, %d bug check %d, %d, Party %s\n",Max, PMs.TimePwr[0], PMs.TimePwr[1], PMs.Party[0]);
Max++;
}
//PartyPwr(Max, PMs.TimePwr, PMs.Party);
//printf("%d, %d", PMs.TimePwr[0], PMs.TimePwr[1]);
fclose(PriMins);
system("pause");
return(0);
}
void PartyPwr( int Runs , int Time[12], char Prty[12][15])
{
int i=0;
int LabPwr=0;
int ConPwr=0;
for (i=0;i<Runs;i++)
{
printf("%s\n", Prty[i]);
if (strcmp(Prty[i],"Labour")==0)
{
LabPwr=(LabPwr+Time[i]);
}
if (strcmp(Prty[i],"Conservative")==0)
{
ConPwr=(ConPwr+Time[i]);
}
if ((strcmp(Prty[i],"Conservative")!=0) && (strcmp(Prty[i],"Labour")!=0))
{
printf("An invalid party was present in the list.");
}
}
printf ("Total Labour time in power: %d\nTotal Conservative time in power: %d\n", LabPwr, ConPwr);
}
This is the text file for the programme.
Attlee 7 1945 10 1951 Labour
Churchill 11 1951 5 1955 Conservative
Eden 6 1955 12 1956 Conservative
Macmillan 1 1957 10 1963 Conservative
Douglas-Home 11 1963 10 1964 Conservative
Wilson 11 1964 5 1970 Labour
Heath 6 1970 2 1974 Conservative
Wilson 3 1974 3 1976 Labour
Callaghan 4 1976 4 1979 Labour
Thatcher 5 1979 11 1990 Conservative
Major 12 1990 4 1997 Conservative
Blair 5 1997 6 2007 Labour
Brown 6 2007 5 2010 Labour
EDIT: I've just discovered if the size of every variable in Data is increased by one, the code runs without any of the issues. I assume this is some kind of overflow?
EDIT 2: Specifically if EndYr is [13] not [12] the problem is eliminated.
The word Conservative is 12 characters long, but you must also account for the null character '\0' at the end of every C string.
That is why your code works when you use a 13-char array for the Party field.
What you should do
Specify the maximum length of the Party field in the scanf format specifier. For example, if you keep a 12-char array for the Party field:
fscanf(PriMins, "%s %d %d %d %d %11s", &PMs.Name[Max], &PMs.StrtMnth[Max], &PMs.StrtYr[Max], &PMs.EndMnth[Max], &PMs.EndYr[Max], &PMs.Party[Max])
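You are reading 13 records from the file but have space to store only 12 of them, as all your data arrays are of size 12. The 13th record is written past the end of each array, which is undefined behaviour and typically overwrites whatever follows it in the struct. If you increase the size by 1, there is enough room for all 13 records: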
struct Data
{
char Name[13][15];
int StrtMnth[13];
int StrtYr[13];
int EndMnth[13];
int EndYr[13];
char Party[13][15];
int TimePwr[13];
};
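A minimal sketch that combines both suggestions, assuming a hypothetical per-record struct PM and a helper read_pms (neither name is in the original code): the loop checks the capacity before fscanf writes anything, and each %s carries a field width one less than its buffer so a long word cannot run past the end.

```c
#include <stdio.h>

#define MAX_PMS  13   /* assumed capacity: PM.txt has 13 lines */
#define NAME_LEN 15   /* 14 characters plus the terminating '\0' */

/* Hypothetical layout: one struct per record instead of parallel arrays. */
struct PM {
    char name[NAME_LEN];
    int  start_month, start_year;
    int  end_month, end_year;
    char party[NAME_LEN];
    int  months_in_power;
};

/* Reads at most cap records from fp; returns how many were stored. */
int read_pms(FILE *fp, struct PM *out, int cap)
{
    int n = 0;
    /* Check the capacity BEFORE fscanf writes into out[n]. */
    while (n < cap &&
           fscanf(fp, "%14s %d %d %d %d %14s",
                  out[n].name,
                  &out[n].start_month, &out[n].start_year,
                  &out[n].end_month, &out[n].end_year,
                  out[n].party) == 6) {
        out[n].months_in_power =
            (out[n].end_month + out[n].end_year * 12) -
            (out[n].start_month + out[n].start_year * 12);
        n++;
    }
    return n;
}
```

A caller would declare struct PM pms[MAX_PMS]; and call read_pms(PriMins, pms, MAX_PMS). Comparing the return value of fscanf with 6 (rather than > 0) also stops the loop on a partially read line.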
I'm trying to access a file called 'movies.dat' that contains a movie's title, gross revenue, and year of release.
Currently my program retrieves the title and year correctly. However, the gross revenue comes back wrong when I retrieve it as an int; it is only correct if the value is less than 2150000000. I tried retrieving it as a long, but that gives the same issue.
movies.dat
Gone_with_the_Wind 3713000000 1939
Avatar 3263000000 2009
Titanic 3087000000 1997
Star_Wars 3049000000 1977
Avengers:_Endgame 2798000000 2019
The_Sound_of_Music 2554000000 1965
E.T._the_Extra-Terrestrial 2493000000 1982
The_Ten_Commandments 2361000000 1956
Doctor_Zhivago 2238000000 1965
Star_Wars:_The_Force_Awakens 2206000000 2015
Snow_White 2150000000 1937
Jurassic_Park 2100000000 1993
Jaws 2100000000 1975
Avengers:_Infinity_War 2050000000 2018
The_Exorcist 2000000000 1973
Output
Gone_with_the_Wind -581967296 1939
Avatar -1031967296 2009
Titanic -1207967296 1997
Star_Wars -1245967296 1977
Avengers:_Endgame -1496967296 2019
The_Sound_of_Music -1740967296 1965
E.T._the_Extra-Terrestrial -1801967296 1982
The_Ten_Commandments -1933967296 1956
Doctor_Zhivago -2056967296 1965
Star_Wars:_The_Force_Awakens -2088967296 2015
Snow_White -2144967296 1937
Jurassic_Park 2100000000 1993
Jaws 2100000000 1975
Avengers:_Infinity_War 2050000000 2018
The_Exorcist 2000000000 1973
Code
#include<stdio.h>
#include<assert.h>
int main(void)
{
//Open File and check it opened correctly
FILE * fp;
fp = fopen("movies.dat", "r");
assert("Error: Could not open file" && (fp != NULL));
char text[256] = "\000";
int gross = 0;
int year = 0;
//loop
while(!feof(fp))
{
//basic attempt
fscanf(fp, "%s %d %d", text, &gross, &year);
printf("%s %d %d\n", text, gross, year);
}
//Close File
fclose(fp);
}
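2150000000 is just above 2^31 − 1 = 2147483647, which is the value of INT_MAX when int is 32 bits, so the larger values do not fit in an int. A long can also be 32 bits on some platforms (64-bit Windows, for example), which would explain why it behaved the same. Try reading the big numbers into a long long.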
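A small sketch of that suggestion, assuming a hypothetical helper parse_movie that works on one line of movies.dat at a time: long long is at least 64 bits wide, and %lld is its matching conversion.

```c
#include <stdio.h>

/* Parses one "title gross year" line; returns 1 on success, 0 otherwise.
   title must have room for at least 256 bytes. */
int parse_movie(const char *line, char *title, long long *gross, int *year)
{
    /* %255s bounds the title; %lld matches long long. */
    return sscanf(line, "%255s %lld %d", title, gross, year) == 3;
}
```

The value is printed with %lld as well. Feeding the helper from a while (fgets(buf, sizeof buf, fp) != NULL) loop also avoids the while(!feof(fp)) pattern, which can process the last record twice.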
This is the assignment I'm trying to do.
Download the sea level data from http://climate.nasa.gov/vital-signs/sea-level. Create a program that does the following:
a. Tell the user that the program uses data from NASA to predict sea level from the years 2020 to 2050.
b. Store the sea level data in an array. You only need to use one data point for each year from 1993 to the present year. Use the last column for each row (the Global Mean Sea Level GMSL with annual and semi-annual signal removed).
c. Find the average annual change in sea level over all the years specified in the data. (Hint - use a loop to store the annual change in an array over the years, and then use a loop to compute the average annual change).
d. Assume a linear increase and compute the predicted sea level rise for the years 2020, 2025, 2030, 2035, 2040, 2045, and 2050. Store these results in their own array. (Hint - just use the average you computed in part c as the annual change for the future years).
e. Display the results for the user and be sure to reference the data set as specified in the data file so the user knows where the data came from.
Sample output:
The predicted Global Mean Sea Level is
2020 64.32
2025 68.98
2030 73.51
2035 78.12
2040 83.43
2045 88.12
2050 93.04
These predictions were made using data provided by XXXXXXXXXX
This is the code so far. However, it seems not to use all of the data in the array when finding the average change in sea level.
#include <stdio.h>
#include <stdlib.h>
int main()
{
//creates a file object to read data
FILE* infile = fopen("nasa.txt","r");
//checks if file exists
if(infile == NULL)
{
printf("File does not exist.\n");
return -1;
}
//create 2d array to store years and their sea levels
int level[50][2];
//number of elements in array
int n = 0,i;
char word[5];
//read data from file word by word
while(fscanf(infile, "%s", word) != EOF)
{
if(word != ' ' && word != '\n')
{
//convert string to int and store in array
level[n][0] = atoi(word);
//store sea level
fscanf(infile, "%s", word);
level[n][1] = atoi(word);
//increment n
n++;
}
}
//store avg change
float avg=0;
for(i=1;i<n;i++)
{
//add difference of consecutive elements
avg += level[i][1] - level[i-1][1];
}
//calculate mean
avg = (float)avg/n;
int c = 7; //number of predictions
//array to store results
float predictions[][2] = {{2020,0},{2025,0},{2030,0},{2035,0},
{2040,0},{2045,0},{2050,0}};
//predict future sea levels
for(i=0;i<c;i++)
{
//multiply avg change by number of years
predictions[i][1] = level[n-1][1] +
(predictions[i][0] - level[n-1][0])*avg;
}
//print avg change
printf("Average change in sea level year over year is: %f mm\n",avg);
//print predictions
for(i = 0;i<c;i++)
{
printf("Predicted sea level change since 1993 for the year %.0f: %.2f mm\n",
predictions[i][0],predictions[i][1]);
}
printf("These predictions were made using data provided by the National Aeronautics and Space Administration.");
return 0;
}
Sea level change data
1993 4
1994 7
1995 11
1996 14
1997 21
1998 20
1999 19
2000 22
2001 27
2002 31
2003 34
2004 36
2005 40
2006 42
2007 43
2008 47
2009 48
2010 54
2011 53
2012 59
2013 65
2014 68
2015 75
2016 83
2017 85
2018 88
2019 94
However, it seems not to use all of the data in the array when finding the average change in sea level.
avg = (float)avg/n; is the same as avg = (float)(level[n-1][1] - level[0][1])/n.
Only the end-points matter.
The loop that accumulates the differences cancels out the in-between years' values.
If you add +100 to a mid-year value, that year's difference becomes 100 more and the next year's difference 100 less. In the end, the running sum of the differences is not affected by that +100.
All the mid-year values could be 0 and one would get the same average.
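The cancellation can be checked with a short sketch; the helper names sum_of_differences and average_change are made up for illustration. Note one assumption that goes beyond the answer: n values produce only n - 1 differences, so this version divides by n - 1 rather than by n as the question's code does.

```c
#include <stddef.h>

/* Sum of consecutive differences: (l[1]-l[0]) + (l[2]-l[1]) + ... */
long sum_of_differences(const int *levels, size_t n)
{
    long sum = 0;
    for (size_t i = 1; i < n; i++)
        sum += levels[i] - levels[i - 1];
    return sum;
}

/* Mean change per step; n values give n - 1 differences. */
double average_change(const int *levels, size_t n)
{
    if (n < 2)
        return 0.0;   /* no differences to average */
    return (double)(levels[n - 1] - levels[0]) / (double)(n - 1);
}
```

For both {4, 7, 11, 14, 21} and {4, 0, 0, 0, 21} the sum of differences is 21 - 4 = 17, which is the telescoping effect described above.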
I'm new to C and I've found a peculiar output from gcc that I'm having a hard time getting to the bottom of. The error upon running the application is:
*** stack smashing detected ***: /home/joshua/Research/cml/test terminated
Program received signal SIGABRT, Aborted.
0x00007ffff7a43428 in __GI_raise (sig=sig#entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
I've done some research on this, and it appears that this is often caused by trying to put too large a value into an array, for example. I'm not doing anything like that.
Here is my sample code for reference:
#include <stdio.h>
#include <string.h>

struct student
{
    int id;
    char name[10];
    float percentage;
};

int main()
{
    int i;
    struct student record[2];

    // 1st student's record
    record[0].id=1;
    strcpy(record[0].name, "Raju");
    record[0].percentage = 86.5;

    // 2nd student's record
    record[1].id=2;
    strcpy(record[1].name, "Surendren");
    record[1].percentage = 90.5;

    // 3rd student's record
    record[2].id=3;
    strcpy(record[2].name, "Thiyagu");
    record[2].percentage = 81.5;

    for(i=0; i<3; i++)
    {
        printf(" Records of STUDENT : %d \n", i+1);
        printf(" Id is: %d \n", record[i].id);
        printf(" Name is: %s \n", record[i].name);
        printf(" Percentage is: %f\n\n",record[i].percentage);
    }
    return 0;
}
The 2 in
struct student record[2];
is not the top index; it is the number of elements. As you seem to know, indexes start at zero, which means the valid indexes for the above array are 0 and 1. Going out of bounds leads to undefined behavior.
struct student record[2];
You've got an array of size 2 and you're trying to store 3 elements in it. Array indices go from 0 to n-1, so record[2] is an invalid index.
To check the integrity of a function's stack frame, GCC places protection variables (called canaries) with known values next to the return address. In your case, the out-of-bounds write to record[2] violated the stack's integrity and overwrote the canary value, which triggered the abort.
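One way to keep such writes in bounds is sketched below, assuming a hypothetical helper set_student and a named element count NUM_STUDENTS (neither is in the question). The helper refuses an out-of-range index, and snprintf truncates a name that would not fit in name[10].

```c
#include <stdio.h>

#define NUM_STUDENTS 3  /* element count, so valid indices are 0..2 */

struct student {
    int   id;
    char  name[10];
    float percentage;
};

/* Stores one record at index idx.
   Returns 0 on success, -1 if idx is out of bounds. */
int set_student(struct student *records, size_t count, size_t idx,
                int id, const char *name, float percentage)
{
    if (idx >= count)
        return -1;   /* refuse instead of writing past the array */
    records[idx].id = id;
    /* snprintf truncates rather than overflowing name[10]. */
    snprintf(records[idx].name, sizeof records[idx].name, "%s", name);
    records[idx].percentage = percentage;
    return 0;
}
```

With struct student record[NUM_STUDENTS]; the printing loop can use i < NUM_STUDENTS as well, so the declaration and the loop bound cannot drift apart.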
I am writing some code in C to read some file data into arrays and keep getting a segmentation fault when I run it (compiled with gcc). It reads the file up to the 11th line of data, then gives the fault. I've been through some other similar questions on here but can't find a solution.
Thanks
code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
int ainb(char a[],char b[])//returns 0 if str b contains a returns 1 otherwise
{
int i=0,j0=-1,j1=0,count=0;
if(strlen(b)<strlen(a)) {return 1;}
for(i=0;i<strlen(b);i++) {
if((b[i]==a[j1])&&(j1==j0+1)){
j0=j1;j1++;
} else {
j0=-1;j1=0;
}
if((j1+1)==strlen(a)) {break;}
}
if((j1+1)==strlen(a)){
return 0;
} else {
return 1;
}
}
void read_pdb(FILE* fp,char **atm,int *atnum,char **name,char **res,char *chain,int *resnum,double *x,double *y,double *z,double *occ,double *bfac,char **elem,double ac[2][3]) //reads file lines and stores in arrays
{
printf("\nReading pdb data\n");
int i=0,j=0;
char buff[7];
fpos_t position;
while(!feof(fp))
{
fgetpos(fp,&position);fgets(buff,sizeof buff,fp);
if((ainb("ATOM",buff)==0)||(ainb("HETATM",buff)==0))
{
fsetpos(fp,&position);printf("\ngetting position %d\n",i+1);
fscanf(fp,"%6s%5d %4s %3s %1s%4d %8lf%8lf%8lf%6lf%6lf %2s \n",atm[i],&atnum[i],name[i],res[i],&chain[i],&resnum[i],&x[i],&y[i],&z[i],&occ[i],&bfac[i],elem[i]);
printf("\nnode %d data found\n",i+1);
printf("\n%6s%5d %4s %3s %1s%4d %8.3lf%8.3lf%8.3lf%6.2lf%6.2lf %2s \n",atm[i],atnum[i],name[i],res[i],&chain[i],resnum[i],x[i],y[i],z[i],occ[i],bfac[i],elem[i]);
if(ainb("HETATM",atm[i])==0){
ac[j][0]=x[i];ac[j][1]=y[i];ac[j][2]=z[i];j++;
}
i++;
}
}
printf("\n%d Atoms read\n",i);
}
void main()
{
double ac[2][3];
int N,k;
double *x,*y,*z,*occ,*bfac;
char **atm,**name,**res,**elem,*chain;
int *atnum,*resnum;
FILE *out;
out=fopen("OUT.pdb","r");//something to check for file
N=66;
//make dynamic arrays
x=(double*)malloc(N*sizeof(double));
y=(double*)malloc(N*sizeof(double));
z=(double*)malloc(N*sizeof(double));
occ=(double*)malloc(N*sizeof(double));
bfac=(double*)malloc(N*sizeof(double));
atnum=(int*)malloc(N*sizeof(int));
resnum=(int*)malloc(N*sizeof(int));
atm=(char**)malloc(N*sizeof(char));
name=(char**)malloc(N*sizeof(char));
res=(char**)malloc(N*sizeof(char));
elem=(char**)malloc(N*sizeof(char));
chain=(char*)malloc(N*sizeof(char));
for(k=0;k<N;k++)
{
atm[k]=(char*)malloc(7*sizeof(char));
name[k]=(char*)malloc(5*sizeof(char));
res[k]=(char*)malloc(4*sizeof(char));
elem[k]=(char*)malloc(3*sizeof(char));
}
//read in data
read_pdb(out,atm,atnum,name,res,chain,resnum,x,y,z,occ,bfac,elem,ac);
fclose(out);
printf("\n-------------------------------------------\nTest Complete\n");
free(x);
free(y);
free(z);
free(occ);
free(bfac);
free(elem);
free(name);
free(atm);
free(res);
free(resnum);
free(atnum);
free(chain);
}
The output is:
Reading pdb data
getting position 1
node 1 data found
ATOM 1 CA PRO A 1 4.612 0.903 5.089 1.00 24.97 C
getting position 2
node 2 data found
ATOM 2 CA SER A 2 3.526 0.341 3.809 1.00 59.99 C
getting position 3
node 3 data found
ATOM 3 CA ARG A 3 6.208 1.550 6.551 1.00 20.40 C
getting position 4
node 4 data found
ATOM 4 CA TRP A 4 5.912 2.348 4.388 1.00 50.28 C
getting position 5
node 5 data found
ATOM 5 CA GLE A 5 4.087 4.359 6.884 1.00 54.04 C
getting position 6
node 6 data found
ATOM 6 CA THR A 6 4.405 1.292 2.566 1.00 62.06 C
getting position 7
node 7 data found
ATOM 7 CA TYR A 7 3.327 3.041 5.205 1.00 50.46 C
getting position 8
node 8 data found
ATOM 8 CA VAL A 8 5.276 0.109 0.387 1.00 58.00 C
getting position 9
node 9 data found
ATOM 9 CA LEU A 9 2.992 3.190 3.084 1.00 41.48 C
getting position 10
node 10 data found
ATOM 10 CA CYS A 10 3.565 0.287 0.721 1.00 47.65 C
getting position 11
Segmentation fault (core dumped)
Let's consider this code:
name=(char**)malloc(N*sizeof(char));
for(k=0;k<N;k++)
{
name[k]=(char*)malloc(5*sizeof(char));
}
You allocate an array of N*sizeof(char) bytes and try to store N pointers to char in it. But a pointer to char is larger than sizeof(char) (which is always 1), so you get a buffer overflow and undefined behavior as early as the initialization stage. You were lucky that the program did not crash at that point, but it fails once the arrays are used. To prevent this error, use sizeof(char*) in your allocation code.
Rather than hard-coding the type, and getting it wrong for name, let the compiler figure it out. This means less code that is easier to read, write, and maintain.
//bfac=(double*)malloc(N*sizeof(double));
//resnum=(int*)malloc(N*sizeof(int));
//name=(char**)malloc(N*sizeof(char)); OP was looking for `sizeof (char*)`
bfac = malloc(N * sizeof *bfac);
resnum = malloc(N * sizeof *resnum);
name = malloc(N * sizeof *name);
Also, in C there is no need to cast the result of malloc().
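Applied to the arrays of strings in the question, the idiom might look like the sketch below. The helper names alloc_strings and free_strings are assumptions for illustration. Both allocation levels use sizeof *pointer, and every allocation is checked.

```c
#include <stdlib.h>

/* Frees an array of n strings created by alloc_strings (NULL-safe). */
void free_strings(char **rows, size_t n)
{
    if (rows == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(rows[i]);
    free(rows);
}

/* Allocates n zero-filled strings of len bytes each.
   Returns NULL on failure (or possibly when n is 0). */
char **alloc_strings(size_t n, size_t len)
{
    /* sizeof *rows is sizeof(char *), whatever the pointer size is. */
    char **rows = malloc(n * sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        rows[i] = calloc(len, sizeof *rows[i]);
        if (rows[i] == NULL) {
            free_strings(rows, i);   /* release only the i strings made so far */
            return NULL;
        }
    }
    return rows;
}
```

For example, atm = alloc_strings(N, 7); replaces the malloc call and the per-element loop for atm. The question's cleanup calls free(atm) only, which leaks the N inner strings; free_strings(atm, N) releases both levels.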
A text file holds information about a softball team. Each line has data arranged as follows:
4 Jessie Joybat 5 2 1 1
The first item is the player's number, conveniently in the range 0–18. The second item is the player's first name, and the third is the player's last name. Each name is a single word. The next item is the player's official times at bat, followed by the number of hits, walks, and runs batted in (RBIs). The file may contain data for more than one game, so the same player may have more than one line of data, and there may be data for other players between those lines. Write a program that stores the data into an array of structures. The structure should have members to represent the first and last names, the at bats, hits, walks, and RBIs (runs batted in), and the batting average (to be calculated later). You can use the player number as an array index. The program should read to end-of-file, and it should keep cumulative totals for each player.
The world of baseball statistics is an involved one. For example, a walk or reaching base on an error doesn't count as an at-bat but could possibly produce an RBI. But all this program has to do is read and process the data file, as described next, without worrying about how realistic the data is.
The simplest way for the program to proceed is to initialize the structure contents to zeros, read the file data into temporary variables, and then add them to the contents of the corresponding structure. After the program has finished reading the file, it should then calculate the batting average for each player and store it in the corresponding structure member. The batting average is calculated by dividing the cumulative number of hits for a player by the cumulative number of at-bats; it should be a floating-point calculation. The program should then display the cumulative data for each player along with a line showing the combined statistics for the entire team.
team.txt (text file I'm working with):
4 Jessie Joybat 5 2 1 1
4 Jessie Joybat 7 3 5 3
7 Jack Donner 6 3 1 2
11 Martin Garder 4 3 2 1
15 Jaime Curtis 7 4 1 2
2 Curtis Michel 3 2 2 3
9 Gillan Morthim 9 6 6 7
12 Brett Tyler 8 7 4 3
8 Hans Gunner 7 7 2 3
14 Jessie James 11 2 3 4
12 Brett Tyler 4 3 1 3
Since I'm a beginner in C, either I misinterpreted the task or it's unfairly complex (I believe the former is the case). I'm so lost that I can't think of how to fill in every piece of data using the player number as the index, keep track of whether a player has more than one game, calculate and store the batting average, and then print everything.
What I have so far is:
#define LGT 30
struct profile {
int pl_num;
char name[LGT];
char lname[LGT];
int atbat[LGT/3];
int hits[LGT/3];
int walks[LGT/3];
int runs[LGT/3];
float batavg;
};
//It's wrong obviously but it's a starting point
int main(void)
{
FILE *flx;
int i,jc,flow=0;
struct profile stat[LGT]={{0}};
if((flx=fopen("team.txt","r"))==NULL) {
fprintf(stderr,"Can't read file team!\n");
exit(1);
}
for( jc = 0; jc < 11; jc++) {
fscanf(flx,"%d",&i);
stat[i].pl_num=i;
fscanf(flx,"%s",&stat[i].name);
fscanf(flx,"%s",&stat[i].lname);
fscanf(flx,"%d",&stat[i].atbat[flow]);
fscanf(flx,"%d",&stat[i].hits[flow]);
fscanf(flx,"%d",&stat[i].walks[flow]);
fscanf(flx,"%d",&stat[i].runs[flow]);
flow++;
}
}
Advice 1: don't declare arrays like atbat[LGT/3].
Advice 2: Instead of multiple fscanf calls, you could read the whole line in one go.
Advice 3: Since the number of players is limited and the player number has a good range (0-18), using that player number as an index into the struct array is a good idea.
Advice 4: Since you need cumulative data for each player (no need to store his history points), then you don't need arrays of integers, just an integer to represent the total.
So:
#include <stdio.h>
#include <stdlib.h> /* exit */
#include <string.h> /* strcpy */
#define PLAYERS_NO 19
typedef struct
{
char name[20+1];
char lastName[25+1];
int atbat;
int hits;
int walks;
int runs;
float batavg;
} Profile;
int main(int argc, char** argv)
{
Profile stats[PLAYERS_NO];
int i;
FILE* dataFile;
int playerNo;
Profile tmpProfile;
int games = 0;
for(i=0; i<PLAYERS_NO; ++i)
{
stats[i].name[0] = '\0';
stats[i].lastName[0] = '\0';
stats[i].atbat = 0;
stats[i].hits = 0;
stats[i].walks = 0;
stats[i].runs = 0;
}
dataFile = fopen("team.txt", "r");
if ( dataFile == NULL )
{
fprintf(stderr, "Can't read file team!\n");
exit(1);
}
/* Read one complete record per iteration; stop at end-of-file or on a malformed line. */
while (fscanf(dataFile, "%d %20s %25s %d %d %d %d",
&playerNo,
tmpProfile.name,
tmpProfile.lastName,
&tmpProfile.atbat,
&tmpProfile.hits,
&tmpProfile.walks,
&tmpProfile.runs) == 7)
{
++games;
/* Valid indexes are 0 .. PLAYERS_NO-1. */
if ( playerNo < 0 || playerNo >= PLAYERS_NO )
{
fprintf(stderr, "Player number out of range\n");
continue;
}
printf("READ: %d %s %s %d %d %d %d\n",
playerNo,
tmpProfile.name,
tmpProfile.lastName,
tmpProfile.atbat,
tmpProfile.hits,
tmpProfile.walks,
tmpProfile.runs);
strcpy(stats[playerNo].name, tmpProfile.name);
strcpy(stats[playerNo].lastName, tmpProfile.lastName);
stats[playerNo].atbat += tmpProfile.atbat;
stats[playerNo].hits += tmpProfile.hits;
stats[playerNo].walks += tmpProfile.walks;
stats[playerNo].runs += tmpProfile.runs;
}
/* exercise: compute the average */
fclose(dataFile);
for(i=0; i<PLAYERS_NO; ++i)
{
if ( stats[i].name[0] == '\0' )
continue;
printf("%d %s %s %d %d %d %d\n",
i,
stats[i].name,
stats[i].lastName,
stats[i].atbat,
stats[i].hits,
stats[i].walks,
stats[i].runs);
}
return 0;
}
The first rule of programming: Divide and conquer.
So you need to identify individual operations. One such operation is "load one row of input", another is "look up a player". If you have some of those operations (more will come up as you go), you can start building your program:
while( more_input ) {
row = load_one_row()
player = find_player( row.name )
if( !player ) {
player = create_player( row.name )
add_player( player )
}
... do something with row and player ...
}
When you have that, you can start to write all the functions.
An important point here is to write test cases. Start with a simple input and test the code to read a row. Do you get the correct results?
If so, test the code to find/create players.
The test cases make sure that you can forget about code that already works.
Use a framework like Check for this.
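As a starting point, the "load one row" operation could be sketched as follows, assuming a hypothetical struct row and field widths of 20 and 25 characters for the names. Parsing from a string keeps the function easy to test without a file.

```c
#include <stdio.h>

/* Hypothetical holder for one parsed line of team.txt. */
struct row {
    int  number;
    char first[21];
    char last[26];
    int  atbat, hits, walks, rbi;
};

/* Parses one line; returns 1 if all seven fields were read, 0 otherwise. */
int load_one_row(const char *line, struct row *out)
{
    return sscanf(line, "%d %20s %25s %d %d %d %d",
                  &out->number, out->first, out->last,
                  &out->atbat, &out->hits, &out->walks, &out->rbi) == 7;
}
```

A first test case is then a single call with a known line such as "4 Jessie Joybat 5 2 1 1", followed by checks on each field.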
If I were doing this, I'd start with a structure that only held one "set" of data, then create an array of those structs:
struct profile {
char name[NAMELEN];
char lname[NAMELEN];
int atbat;
int hits;
int walks;
int runs;
float batavg;
};
Since you're using the player's number as the index into an array, you don't need to store it into the structure too.
I think that will simplify the problem a little bit. You don't need to store multiple data items for a single player -- when you get a duplicate, you just ignore some of the new data (like the names, which should be identical) and sum up the others (e.g., at-bats, hits).
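The summing and the batting average can be sketched with two small helpers. The names add_game and update_average, and the value of NAMELEN, are assumptions for illustration; the struct is the one shown above.

```c
#define NAMELEN 30   /* assumed value; the answer leaves NAMELEN undefined */

struct profile {
    char  name[NAMELEN];
    char  lname[NAMELEN];
    int   atbat;
    int   hits;
    int   walks;
    int   runs;
    float batavg;
};

/* Adds one game's numbers to a player's running totals. */
void add_game(struct profile *p, int atbat, int hits, int walks, int runs)
{
    p->atbat += atbat;
    p->hits  += hits;
    p->walks += walks;
    p->runs  += runs;
}

/* Hits divided by at-bats as a floating-point value; 0 if there are no at-bats. */
void update_average(struct profile *p)
{
    p->batavg = (p->atbat > 0) ? (float)p->hits / (float)p->atbat : 0.0f;
}
```

For Jessie Joybat's two lines in team.txt, the totals come to 12 at-bats and 5 hits, so the average is 5/12. Calling add_game once per input line with stat[number] as the target and update_average once per player after the file is read matches the order the assignment describes.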