File I/O Extraction with structures in C

The task is to read in a .txt file given as a command-line argument. The file contains a list of unstructured information about every airport in the state of Florida (note that this is only a snippet of the full file). Some data must be ignored, such as ASO ORL PR A 0 18400: anything that does not pertain to the structured variables within airPdata.
The assignment is asking for the site number, locID, fieldname, city, state, latitude, longitude, and if there is a control tower or not.
INPUT
03406.20*H 2FD7 AIR ORLANDO ORLANDO FL ASO ORL PR 28-26-08.0210N 081-28-23.2590W PR NON-NPIAS N A 0 18400
03406.18*H 32FL MEYER- INC ORLANDO FL ASO ORL PR 28-30-05.0120N 081-22-06.2490W PR NON-NPAS N 0 0
OUTPUT
Site#       LocID  Airport Name  City     ST  Latitude        Longitude        Control Tower
--------------------------------------------------------------------------------------------
03406.20*H  2FD7   AIR ORLANDO   ORLANDO  FL  28-26-08.0210N  081-28-23.2590W  N
03406.18*H  32FL   MEYER         ORLANDO  FL  28-30.05.0120N  081-26-39.2560W  N
etc.        etc.   etc.          etc.     ..  etc.            etc.             ..
etc.        etc.   etc.          etc.     ..  etc.            etc.             ..
My code so far looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strcmp is declared in <string.h>, not <strings.h> */

typedef struct airPdata{
    char *siteNumber;
    char *locID;
    char *fieldName;
    char *city;
    char *state;
    char *latitude;
    char *longitude;
    char controlTower;
} airPdata;

int main (int argc, char* argv[])
{
    char text[1000];
    FILE *fp;

    /* argv[1] does not exist unless a file name was given */
    if (argc < 2)
    {
        fprintf(stderr, "Usage: %s orlando5.txt\n", argv[0]);
        return 1;
    }
    if (strcmp(argv[1], "orlando5.txt") == 0)
    {
        fp = fopen(argv[1], "r");
        if (fp == NULL)
        {
            perror("Error opening the file");
            return(-1);
        }
        /* echo every line of the file */
        while (fgets(text, sizeof(text), fp) != NULL)
        {
            printf("%s", text);
        }
        fclose(fp);   /* close only a file that was actually opened */
    }
    else
        printf("File name is incorrect\n");
    fflush(stdout);
    return 0;
}
So far I'm able to read the whole file and print the unstructured input to the console.
The next thing I tried to figure out was how to extract the strings piece by piece and store them in the variables within the structure. Currently I'm stuck at this phase. I've looked up strcpy and other string library functions, data extraction methods, and ETL; I'm just not sure which function to use or how to use it properly in my code.
I've done something very similar in Java using substrings. If there is a way to take a substring of the big string of text and set rules for which substrings go into which variable, that would potentially work. For example, LocID is never more than 4 characters long, so anything that is a four-character combination of numbers and letters could be stored in airPdata.locID.
After the variables are stored in the structures, I know I have to use strtok to organize them in the list under Site#, LocID, etc. However, that's only my best guess at how to approach this problem; I'm pretty lost.

I don't know what the format is. It can't be space-separated, since some of the fields have spaces in them. It doesn't look fixed-width either. Because you mentioned strtok, I'm going to assume it's tab-separated.
If you can use strsep, use that. strtok has a lot of problems that strsep solves, but strsep isn't standard C. I'm going to assume this is an assignment requiring standard C, so I'll begrudgingly use strtok.
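One concrete strtok problem that matters for a tab-separated file: it treats a run of delimiters as a single separator, so an empty field between two tabs simply disappears and every later column number shifts by one. The sketch below is a hypothetical standard-C helper, split_keep_empty (not a library function), that behaves more like strsep and keeps empty fields, next to a plain strtok loop for comparison.

```c
#include <string.h>

/* Split `line` in place on `delim`, keeping empty fields (strtok would
 * report "A\t\tC" as two fields). Standard C only; a strsep-like sketch,
 * not a drop-in replacement for strsep itself.
 * Stores at most `max` field pointers; any extra text is ignored.
 * Returns the number of fields stored. */
int split_keep_empty(char *line, char delim, char *fields[], int max)
{
    int n = 0;
    char *start = line;

    while (n < max) {
        char *end = strchr(start, delim);
        fields[n++] = start;
        if (end == NULL)
            break;          /* last field runs to the end of the string */
        *end = '\0';        /* terminate this field */
        start = end + 1;    /* next field begins after the delimiter */
    }
    return n;
}

/* For comparison: count the fields strtok reports for the same input. */
int count_strtok_fields(char *line, const char *delims)
{
    int n = 0;
    for (char *tok = strtok(line, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}
```

With "A\t\tC", split_keep_empty reports three fields (the middle one empty), while the strtok loop counts only two.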
The basic thing to do is to read each line, and then split it into columns with strtok or strsep.
char line[1024];
while (fgets(line, sizeof(line), fp) != NULL) {
    char *column;
    int col_num = 0;
    /* split on tabs; '\n' is included so the last column doesn't keep the newline */
    for( column = strtok(line, "\t\n");
         column;
         column = strtok(NULL, "\t\n") )
    {
        col_num++;
        printf("%d: %s\n", col_num, column);
    }
}
fclose(fp);
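Here is the same loop as a complete function, as a sketch that still assumes the file really is tab-separated. print_columns is a hypothetical name; it writes to a caller-supplied stream instead of stdout so its output can be checked, and it strips the trailing newline so the last column is clean.

```c
#include <stdio.h>
#include <string.h>

/* Read `in` line by line and write each tab-separated column, numbered,
 * to `out`. Returns the total number of columns seen. */
int print_columns(FILE *in, FILE *out)
{
    char line[1024];
    int total = 0;

    while (fgets(line, sizeof(line), in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* drop the trailing newline */
        int col_num = 0;
        for (char *column = strtok(line, "\t");
             column != NULL;
             column = strtok(NULL, "\t")) {
            col_num++;
            fprintf(out, "%d: %s\n", col_num, column);
        }
        total += col_num;
    }
    return total;
}
```

Calling print_columns(fp, stdout) reproduces the loop above for the file opened in the question.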
strtok is funny. It keeps its own internal state of where it is in the string. The first time you call it, you pass it the string you're looking at. To get the rest of the fields, you call it with NULL and it keeps reading through that same string. That's why the for loop looks like it's repeating itself. strtok also overwrites each delimiter it finds with '\0', so it modifies line in place.
Global state is dangerous and very error-prone. strsep and strtok_r fix this (strtok_r is POSIX; strsep comes from BSD and is also in glibc). If you're being told to use strtok, find a better resource to learn from.
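To see why the hidden state matters, consider nesting one tokenizing loop inside another: with plain strtok the inner loop would overwrite the outer loop's position. A sketch with strtok_r keeps a separate save pointer per loop. strtok_r is POSIX rather than ISO C, hence the feature-test macro; the ';'/',' record format and the name count_nested_fields are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <string.h>

/* Split `text` into records on ';' and each record into fields on ','.
 * Each loop has its own save pointer, so the inner loop cannot disturb
 * the outer one. Returns the total number of fields. */
int count_nested_fields(char *text)
{
    int fields = 0;
    char *rec_save, *fld_save;

    for (char *rec = strtok_r(text, ";", &rec_save);
         rec != NULL;
         rec = strtok_r(NULL, ";", &rec_save)) {
        for (char *fld = strtok_r(rec, ",", &fld_save);
             fld != NULL;
             fld = strtok_r(NULL, ",", &fld_save))
            fields++;
    }
    return fields;
}
```

For "a,b;c,d,e" this counts 5 fields; the same nesting written with strtok would lose its place in the outer string.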
Now that we have each column and its position, we can do what we like with it. I'm going to use a switch to choose only the columns we want.
for( column = strtok(line, "\t\n");
     column;
     column = strtok(NULL, "\t\n") )
{
    col_num++;
    switch( col_num ) {
        /* keep only the columns we want; all of them share one print */
        case 1:
        case 2:
        case 3:
        case 4:
        case 5:
        case 9:
        case 10:
        case 13:
            printf("%s\t", column);
            break;
        default:
            break;
    }
}
puts("");
You can do whatever you like with the columns at this point. You can print them immediately, or put them in a list, or a structure.
Just remember that column points to memory inside line, and line will be overwritten by the next fgets. If you want to keep column, you'll have to copy it first. You can do that with strdup but *sigh* that isn't standard C (it only became standard in C23). strcpy is really easy to use wrong. If you're stuck with standard C, write your own strdup.
#include <stdlib.h>
#include <string.h>

char *mystrdup( const char *src ) {
    /* strlen, not sizeof: sizeof(src) is only the size of the pointer */
    char *dst = malloc( strlen(src) + 1 );
    if (dst != NULL)
        strcpy( dst, src );
    return dst;
}
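Putting the pieces together, a line can be split and the wanted columns copied into the question's airPdata structure. This is a sketch under two assumptions: the file is tab-separated, and the columns the switch above picks out (1-5, 9, 10, 13) map to site number, LocID, field name, city, state, latitude, longitude and control tower, in that order. fill_airport, free_airport and copy_string are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

typedef struct airPdata {
    char *siteNumber;
    char *locID;
    char *fieldName;
    char *city;
    char *state;
    char *latitude;
    char *longitude;
    char controlTower;
} airPdata;

/* Standard-C strdup stand-in: allocate strlen + 1 bytes and copy.
 * Returns NULL if malloc fails. */
static char *copy_string(const char *src)
{
    char *dst = malloc(strlen(src) + 1);
    if (dst != NULL)
        strcpy(dst, src);
    return dst;
}

/* Fill *ap from one tab-separated line (modified by strtok). The column
 * mapping is an assumption about the file layout. Every stored string is
 * a copy, so `line` can be reused afterwards. Returns the column count. */
int fill_airport(char *line, airPdata *ap)
{
    int col_num = 0;
    *ap = (airPdata){ 0 };               /* all pointers start as NULL */
    line[strcspn(line, "\n")] = '\0';

    for (char *column = strtok(line, "\t"); column != NULL;
         column = strtok(NULL, "\t")) {
        col_num++;
        switch (col_num) {
        case 1:  ap->siteNumber   = copy_string(column); break;
        case 2:  ap->locID        = copy_string(column); break;
        case 3:  ap->fieldName    = copy_string(column); break;
        case 4:  ap->city         = copy_string(column); break;
        case 5:  ap->state        = copy_string(column); break;
        case 9:  ap->latitude     = copy_string(column); break;
        case 10: ap->longitude    = copy_string(column); break;
        case 13: ap->controlTower = column[0];           break;
        default: break;                  /* columns we don't need */
        }
    }
    return col_num;
}

/* Release the copies made by fill_airport (free(NULL) is harmless). */
void free_airport(airPdata *ap)
{
    free(ap->siteNumber);
    free(ap->locID);
    free(ap->fieldName);
    free(ap->city);
    free(ap->state);
    free(ap->latitude);
    free(ap->longitude);
    *ap = (airPdata){ 0 };
}
```

Because the fields are copies, overwriting line with the next fgets leaves the structure intact; an array of airPdata can then be printed under the Site#, LocID, ... headings.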

Related

Bus Error on void function return

I'm learning to use libcurl in C. To start, I'm using a randomized list of accession names to search for protein sequence files that may be found hosted here. These follow a set format: the first line is of variable length (and contains no information I'm trying to query), followed by a series of capital letters with a newline every sixty (60) characters (which is what I want to pull down, but reformatted to eighty (80) characters per line).
I have the call itself in a single function:
//finds and saves the fastas for each protein (assuming one exists)
void pullFasta (proteinEntry *entry, char matchType, FILE *outFile) {
    //Local variables
    URL_FILE *handle;
    char buffer[2] = "", url[32] = "http://www.uniprot.org/uniprot/", sequence[2] = "";
    //Build full URL
    /*printf ("u:%s\nt:%s\n", url, entry->title); /*This line was used for debugging.*/
    strcat (url, entry->title);
    strcat (url, ".fasta");
    //Open URL
    /*printf ("u:%s\n", url); /*This line was used for debugging.*/
    handle = url_fopen (url, "r");
    //If there is data there
    if (handle != NULL) {
        //Skip the first line as it's got useless info
        do {
            url_fread(buffer, 1, 1, handle);
        } while (buffer[0] != '\n');
        //Grab the fasta data, skipping newline characters
        while (!url_feof (handle)) {
            url_fread(buffer, 1, 1, handle);
            if (buffer[0] != '\n') {
                strcat (sequence, buffer);
            }
        }
        //Print it
        printFastaEntry (entry->title, sequence, matchType, outFile);
    }
    url_fclose (handle);
    return;
}
With proteinEntry being defined as:
//Entry for fasta formatable data
typedef struct proteinEntry {
    char title[7];
    struct proteinEntry *next;
} proteinEntry;
The url_fopen, url_fclose, url_feof, url_fread, and URL_FILE code is found here; they mimic the stdio file functions they are named after.
As you can see, I've been doing some debugging with the URL generator (UniProt URLs follow the same format for different proteins). I got it working properly, and I can pull down the data from the site and save it to a file in the format I want. I set the read size to 1 because I wanted a very simple but functional (if inelegant) program before I start experimenting, so I would have a base to return to as I learned.
I've tested the url_<function> calls and they give no errors. So I added incremental printf calls after each line to identify exactly where the bus error occurs, and it happens at return;.
My understanding of bus errors is that it's a memory access issue wherein I'm trying to get at memory that my program doesn't have control over. My confusion comes from the fact that this is happening at the return of a void function. There's nothing being read, written, or passed to trigger the memory error (as far as I understand it, at least).
Can anyone point me in the right direction to fix my mistake please?
EDIT: As @BLUEPIXY pointed out, I had a potential url_fclose(NULL). As @deltheil pointed out, I had sequence as a fixed-size array that was far too small. This also made me notice I was repeating the same bad memory allocation for url, so I updated it and it now works. Thanks for your help!
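On the url buffer mentioned in the edit: "http://www.uniprot.org/uniprot/" is 31 characters, so url[32] is already full (with its '\0') before the first strcat, and both strcat calls write past its end. A sketch of building the URL with snprintf, which never writes more than the size it is given; build_fasta_url is a hypothetical helper that returns the URL length, or -1 if the buffer was too small.

```c
#include <stdio.h>

/* Build "http://www.uniprot.org/uniprot/<title>.fasta" into `out`.
 * snprintf returns the length the full string would have had, so a
 * value >= size means it was truncated. */
int build_fasta_url(char *out, size_t size, const char *title)
{
    int n = snprintf(out, size, "http://www.uniprot.org/uniprot/%s.fasta", title);
    if (n < 0 || (size_t)n >= size)
        return -1;        /* encoding error or buffer too small */
    return n;
}
```

A check like `if (build_fasta_url(url, sizeof url, entry->title) < 0)` then replaces the two strcat calls. Separately, url_fclose(handle) belongs inside the `if (handle != NULL)` block.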
If we look at, e.g., http://www.uniprot.org/uniprot/Q6GZX1.fasta and skip the first line (as you do), we have:
MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
which is a 60-character string.
When you try to read this sequence with:
//Grab the fasta data, skipping newline characters
while (!url_feof (handle)) {
    url_fread(buffer, 1, 1, handle);
    if (buffer[0] != '\n') {
        strcat (sequence, buffer);
    }
}
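The problem is that sequence cannot grow and is not large enough: it is a fixed-length array of size 2, which holds one character plus the terminating '\0'. Every strcat after the first writes past the end of the array and corrupts the function's stack frame, which explains why the crash only surfaces when the function returns.
So make sure to choose a size large enough to hold any sequence, or implement the ability to expand it on the fly.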
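One way to expand it on the fly is a small growable buffer that doubles its allocation with realloc when it fills up. The strbuf type and strbuf_putc function below are a hypothetical sketch, not part of libcurl or of the url_* wrapper code.

```c
#include <stdlib.h>

/* A minimal growable string; `data` stays NUL-terminated once allocated.
 * Initialize with: strbuf sb = { 0 }; */
typedef struct {
    char  *data;
    size_t len;   /* characters stored, not counting the terminator */
    size_t cap;   /* bytes allocated for data */
} strbuf;

/* Append one character, doubling the allocation when it is full.
 * Returns 0 on success, -1 if realloc fails (the old data stays valid). */
int strbuf_putc(strbuf *sb, char c)
{
    if (sb->len + 1 >= sb->cap) {
        size_t new_cap = sb->cap ? sb->cap * 2 : 64;
        char *p = realloc(sb->data, new_cap);   /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';
    return 0;
}
```

In pullFasta, the strcat call would become strbuf_putc(&seq, buffer[0]) on a `strbuf seq = { 0 };`, with seq.data passed to printFastaEntry and freed afterwards. If nothing was appended, seq.data is still NULL, so check it before printing.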

Having trouble comparing strings in file to an array of strings inputted by user in C

I have tried to research this question, but was unable to find anything that would help me. I have been trying to debug with print statements, but I still cannot figure it out.
I am an intermediate programmer and would love some help here. Here is my code:
int i = 0;
enum { arraySize = 10 };   /* a true constant: with const int, beerNames would be a VLA, which cannot be initialized */
char buf[256];
char str[256];
char buffer[256];
char *beerNames[arraySize] = { 0 };
FILE *names;
FILE *percent;
i = 0;
int numBeers = 0;
printf("Please enter a name or (nothing to stop): ");
gets(buf);
while (strcmp(buf, "") != 0) {
    beerNames[i] = strdup(buf);
    i++;
    numBeers++;
    if (numBeers == arraySize)
        break;
    printf("Please enter a name or (nothing to stop): ");
    gets(buf);
}
// now open files and look for matches of names: //
names = fopen("Beer_Names.txt", "r");
percent = fopen("Beer_Percentage.txt", "r");
while (fgets(str, sizeof(str) / sizeof(str[0]), names) != NULL) {
    fgets(buffer, sizeof(buffer) / sizeof(buffer[0]), percent);
    for (i = 0; i < numBeers; i++) {
        if (strcmp(str, beerNames[i]) == 0) {
            printf("Beer: %s Percentage: %s\n", str, buffer);   /* buffer holds the matching percentage */
            break;
        }
    }
}
fclose(names);
fclose(percent);
So, the issue I am having is that strcmp() does not compare the strings the way I expect and returns either -1 or 1. I have also tried printing out the strcmp() values, and it never returns 0, so every match is skipped.
My Beer_Names.txt (shortened) looks like this:
Anchor Porter
Anchor Steam
Anheuser Busch Natural Light
Anheuser Busch Natural Ice
Aspen Edge
Big Sky I.P.A.
Big Sky Moose Drool Brown Ale
Big Sky Powder Hound (seasonal)
Big Sky Scape Goat Pale Ale
Big Sky Summer Honey Ale (seasonal)
Blatz Beer
Blatz Light
Blue Moon
And my Beer_Percentage.txt (shortened) looks like this:
5.6
4.9
4.2
5.9
4.1
6.2
5.1
6.2
4.7
14.7
4.8
0
5.4
This is not for a homework assignment; I am just doing a personal project and trying to get better at C.
Your problem is that gets() does not keep the newline character as part of the string, while fgets() does.
So when the user-entered value "Anchor Porter" is read with gets, your string looks like "Anchor Porter\0", but when you read it from the file with fgets it ends up as "Anchor Porter\n\0", and the two will not compare equal.
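A common fix is to strip the newline that fgets keeps before comparing. A minimal sketch; strip_newline is a hypothetical helper name, and it also removes a '\r' in case the files were saved with Windows line endings.

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n', if any. strcspn returns the
 * length of the prefix containing neither, i.e. the index to terminate at;
 * for a string with no newline that index is the existing '\0'. */
void strip_newline(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}
```

Calling strip_newline(str) right after each fgets from Beer_Names.txt (and on buffer before printing) makes strcmp(str, beerNames[i]) return 0 for a matching name.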
gets(buf);
I know gets(3) is convenient, and I know this is a toy, but please do not use gets(3). It is impossible to write secure code with gets(3), because it has no way of knowing how big your buffer is. It was removed from the C standard in C11, and POSIX.1-2008 marks it obsolescent, so future C libraries may not even include it. Reasonable compilers will warn you about its use. Use fgets(3) instead.
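A sketch of a gets replacement built on fgets: read_line (a hypothetical name) reads into a buffer of known size, removes the newline so the result matches what gets would have produced, and throws away the rest of an over-long line instead of letting it spill into the next read.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into buf (at most size-1 characters), strip the
 * newline, and discard any remainder of an over-long line.
 * Returns buf, or NULL on end of file or read error. */
char *read_line(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    char *nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';
    } else {
        int c;
        while ((c = getc(in)) != EOF && c != '\n')
            ;   /* throw away the rest of the line */
    }
    return buf;
}
```

In the question's input loop, both gets(buf) calls could become read_line(buf, sizeof buf, stdin), checking for NULL to stop at end of input.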
while (fgets(str, sizeof(str) / sizeof(str[0]), names) != NULL) {
sizeof(char) is defined to be 1, and you're unlikely to change the type of the array, so the division is redundant here. It's generally not a big deal, but you can use the sizeof(str) construct less often than you might expect: it works here only because str[] was declared in an enclosing scope of this line. If str were passed as a parameter, the sizeof(str) operator would return the size of a data pointer and not the size of the array. Don't get too used to this construct; it won't always work as you expect.
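A small demonstration of the difference; the two function names are made up for illustration. In the function that declares the array, sizeof sees the whole array. Once the array is passed as a parameter, the parameter is really a char *, and sizeof reports the pointer size (compilers typically warn about this).

```c
#include <stddef.h>

/* In the scope that declares the array, sizeof gives the array size. */
size_t size_in_scope(void)
{
    char str[256];
    return sizeof(str) / sizeof(str[0]);   /* 256 */
}

/* As a parameter, `char str[256]` is adjusted to `char *str`, so sizeof
 * gives the size of a pointer. Pass the length explicitly instead. */
size_t size_as_parameter(char str[256])
{
    return sizeof(str);                    /* same as sizeof(char *) */
}
```

This is why functions such as fgets take the buffer size as a separate argument.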
names = fopen("Beer_Names.txt", "r");
percent = fopen("Beer_Percentage.txt", "r");
while (fgets(str, sizeof(str) / sizeof(str[0]), names) != NULL) {
fgets(buffer, sizeof(buffer) / sizeof(buffer[0]), percent);
Please take the time to check fopen(3) for success or failure. It's a good habit to get into, and if you provide a good error message, it might save you time in the future, too. Replace the fopen() lines with something like this:
names = fopen("Beer_Names.txt", "r");
percent = fopen("Beer_Percentage.txt", "r");
if (!names) {
    perror("failed to open Beer_Names.txt");
    exit(1);
}
if (!percent) {
    perror("failed to open Beer_Percentage.txt");
    exit(1);
}
You could wrap that up into a function that does fopen(), checks the return value, and either prints the error message and quits or returns the FILE* object.
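Such a wrapper might look like the sketch below. xfopen is a common conventional name for this kind of helper, not a standard function.

```c
#include <stdio.h>
#include <stdlib.h>

/* Open `path` with `mode`; on failure, print the reason and exit.
 * Never returns NULL. */
FILE *xfopen(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        perror(path);   /* prints "<path>: <reason>" to stderr */
        exit(EXIT_FAILURE);
    }
    return fp;
}
```

The two fopen calls then become names = xfopen("Beer_Names.txt", "r"); and percent = xfopen("Beer_Percentage.txt", "r");.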
And now, the bug that brought you here: Robert has pointed out that fgets(3) and gets(3) handle the terminating newline of input differently. (One more reason to get rid of gets(3) as soon as possible.)

Parsing commands shell-like in C

I want to parse user input commands in my C (just C) program. Sample commands:
add node ID
add arc ID from ID to ID
print
exit
and so on. Then I want to do some validation with IDs and forward them to specified functions. Functions and validations are of course ready. It's all about parsing and matching functions...
I've made it with many ifs and strtoks, but I'm sure it's not the best way... Any ideas (libs)?
I think what you want is something like this:
while (1)
{
    char line[128];
    if (fgets(line, sizeof line, stdin) == NULL)   // EOF or read error: stop
        break;

    char command[20];
    int used;
    // %19s leaves room for the '\0'; %n records how many chars were consumed
    if (sscanf(line, "%19s%n", command, &used) != 1)
        continue;                                  // blank line
    char *rest = line + used;
    printf("The Command is: %s\n", command);

    unsigned argumentsCount = 0;
    char **arguments = NULL;
    char arg[20];
    while (sscanf(rest, "%19s%n", arg, &used) == 1)
    {
        // grow the array by one pointer (realloc(NULL, n) acts like malloc)
        char **grown = realloc(arguments, sizeof(char *) * (argumentsCount + 1));
        if (grown == NULL)
            break;
        arguments = grown;
        arguments[argumentsCount] = malloc(strlen(arg) + 1);
        if (arguments[argumentsCount] == NULL)
            break;
        strcpy(arguments[argumentsCount], arg);
        argumentsCount++;
        rest += used;                              // move past this argument
    }

    for (unsigned i = 0; i < argumentsCount; i++) {
        printf("Argument %u is: %s\n", i, arguments[i]);
    }
    for (unsigned i = 0; i < argumentsCount; i++) {
        free(arguments[i]);
    }
    free(arguments);
}
You can do what you wish with 'command' and 'arguments' just before you free it all.
It depends on how complicated your command language is. It might be worth going to the trouble of womping up a simple recursive descent parser if you have more than a couple of commands, or if each command can take multiple forms, such as your add command.
I've done a couple of RDPs by hand for some projects in the past. It's a bit of work, but it allows you to handle some fairly complex commands that wouldn't be straightforward to parse otherwise. You could also use a parser generator like lex/yacc or flex/bison, although that may be overkill for what you are doing.
Otherwise, it's basically what you've described: strtok and a bunch of nested if statements.
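To give a feel for the recursive descent approach suggested above, here is a sketch for just the four commands in the question, assuming IDs are decimal integers and words are separated by whitespace. Each grammar rule becomes one small function; all the names (parse_command, cmd_kind, accept_word, and so on) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Grammar sketched from the question:
 *   command := "add" add_cmd | "print" | "exit"
 *   add_cmd := "node" ID | "arc" ID "from" ID "to" ID            */
typedef enum { CMD_ERROR, CMD_ADD_NODE, CMD_ADD_ARC, CMD_PRINT, CMD_EXIT } cmd_kind;

typedef struct {
    cmd_kind kind;
    long ids[3];          /* node ID, or arc ID / from ID / to ID */
} command;

typedef struct {
    char *tok[16];        /* tokens of the current line */
    int n;                /* number of tokens */
    int pos;              /* index of the next unread token */
} parser;

/* If the next token is `word`, consume it and return 1. */
static int accept_word(parser *p, const char *word)
{
    if (p->pos < p->n && strcmp(p->tok[p->pos], word) == 0) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* ID := a whole decimal number; stores it in *out and consumes it. */
static int parse_id(parser *p, long *out)
{
    char *end;
    if (p->pos >= p->n)
        return 0;
    *out = strtol(p->tok[p->pos], &end, 10);
    if (end == p->tok[p->pos] || *end != '\0')
        return 0;         /* not entirely a number */
    p->pos++;
    return 1;
}

/* add_cmd: the word after "add" decides which branch to follow. */
static cmd_kind parse_add(parser *p, command *c)
{
    if (accept_word(p, "node"))
        return parse_id(p, &c->ids[0]) ? CMD_ADD_NODE : CMD_ERROR;
    if (accept_word(p, "arc"))
        return (parse_id(p, &c->ids[0]) &&
                accept_word(p, "from") && parse_id(p, &c->ids[1]) &&
                accept_word(p, "to")   && parse_id(p, &c->ids[2]))
               ? CMD_ADD_ARC : CMD_ERROR;
    return CMD_ERROR;
}

/* Tokenize `line` in place, then parse one command. Leftover tokens after
 * a complete command (including lines with more than 16 tokens) make the
 * result CMD_ERROR. */
command parse_command(char *line)
{
    parser p = { .n = 0, .pos = 0 };
    command c = { CMD_ERROR, { 0, 0, 0 } };

    for (char *t = strtok(line, " \t\r\n"); t != NULL && p.n < 16;
         t = strtok(NULL, " \t\r\n"))
        p.tok[p.n++] = t;

    if (accept_word(&p, "add"))
        c.kind = parse_add(&p, &c);
    else if (accept_word(&p, "print"))
        c.kind = CMD_PRINT;
    else if (accept_word(&p, "exit"))
        c.kind = CMD_EXIT;

    if (p.pos != p.n)
        c.kind = CMD_ERROR;
    return c;
}
```

The caller then switches on c.kind and hands c.ids to the validation and command functions the question already has; adding a command means adding one rule function and one branch.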
I just wanted to add something to Richard Ross's reply: check the return values of malloc and realloc. Ignoring a NULL return can lead to hard-to-find crashes in your program.
All your command line parameters will be stored in an array of strings called argv.
You can access those values using argv[0], argv[1] ... argv[argc - 1], where argv[0] is normally the program name.

fgets() seems to overflow input to other variables

I'm doing a read from a file, but the input seems to "overflow" into other variables.
I have these 2 variables:
char str[250]; //used to store input from stream
char *getmsg; //already points to some other string
The problem shows up when I use fgets() to read the input:
printf("1TOKEN:%s\n",getmsg);
fp=fopen("m.txt","r");
fp1=fopen("m1.txt","w");
if(fp!=NULL && fp1!=NULL)
    printf("2TOKEN:%s\n",getmsg);
while(fgets(str,250,fp)!=NULL){
    printf("3TOKEN:%s\n",getmsg);
    printf("read:%s",str);
    printf("4TOKEN:%s\n",getmsg);
I get something like this:
1TOKEN:c
2TOKEN:c
3TOKEN:b atob atobbody
read:a b atob atobbody
4TOKEN:b atob atobbody
You see how str kind of flows into getmsg. What happened there? How can I avoid this from happening?
Thanks in advance :)
In the actual code, getmsg is called token. I thought the problem might have something to do with identical names, so I renamed it to getmsg, but got the same error, so I changed it back:
if(buf[0]=='C'){
    int login_error=1;
    fp=fopen("r.txt","r");
    if(fp!=NULL){
        memcpy(&count,&buf[1],2);
        pack.boxid=ntohs(count);
        memcpy(pack.pword,&buf[3],10);
        printf("boxid:%u pword:%s\n",pack.boxid,pack.pword);
        while(fgets(str,250,fp)!=NULL){
            /*"getmsg"===>*/ token=strtok(str," ");
            token=strtok(NULL," ");//receiver uname
            token1=strtok(NULL," ");//pword
            token2=strtok(NULL," ");//boxid
            sscanf(token2,"%hu",&count);//convert char[] to unsigned short
            if(pack.boxid==count && strcmp(token1,pack.pword)==0){//uname & pword found
                login_error=0;
                printf("found:token:%s\n",token);
                break;
            }
        }
        if(login_error==1){
            count=65535;
            pack.boxid=htons(count);
        }
        if(login_error==0){
            count=0;
            pack.boxid=htons(count);
        }
        fclose(fp);
    }
    printf("1TOKEN:%s\n",token);
    if(login_error==0){
        int msg_error=1;
        fp=fopen("m.txt","r");
        fp1=fopen("m1.txt","w");
        if(fp!=NULL && fp1!=NULL){
            printf("2TOKEN:%s\n",token);
            while(fgets(str,250,fp)!=NULL){
                printf("3TOKEN:%s\n",token);
                printf("read:%s",str);
                token1=strtok(str," ");//sender
                token2=strtok(NULL," ");//receiver
                token3=strtok(NULL," ");//subject
                token4=strtok(NULL," ");//body
                printf("m.txt:token1:%s token2:%s token3:%s token4:%s\n",token1,token2,token3,token4);
                if(msg_error==1 && strcmp(token,token2)==0){//message found
                    msg_error=0;
                    count=0;
                    pack.boxid=htons(count);
                    strcpy(pack.uname,token1);
                    strcpy(pack.subject,token3);
                    strcpy(pack.body,token4);
                    printf("pack:uname:%s subject:%s body:%s token:%s token2:%s strcmp:%d\n",pack.uname,pack.subject,pack.body,token,token2,strcmp(token,token2));
                    continue;
                }
                fprintf(fp1,"%s %s %s %s\n",token1,token2,token3,token4);
            }
            if(msg_error==1){
                count=65534;
                pack.boxid=htons(count);
            }
            printf("count:%u -> boxid:%u\n",count,pack.boxid);
            fclose(fp);
            fclose(fp1);
        }
        str[0]='c';
        memcpy(&str[1],&pack.boxid,2);
        memcpy(&str[3],pack.uname,8);
        memcpy(&str[11],pack.subject,20);
        memcpy(&str[31],pack.body,200);
        str[231]='\0';
        bytes=232;
    }
}
Below is m.txt. It is used to store senders, receivers, subjects and message bodies;
the naming pattern is quite obvious >.^
a b atob atobbody
a c atoc atoccc
b c btoc btoccccc
b a btoa btoaaaaa
So I'm trying to get a message stored in m.txt for the recipient "c", but the input flows over and, by coincidence, it returns the message for "b"...
It looks like getmsg is pointing to the third character of your str buffer:
`str` is "a b atob atobbody"
            ^
            |
            \__ `getmsg` is pointing there.
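Therefore, every time you change str by calling fgets(), the string pointed to by getmsg also changes, since it uses the same memory. This happens because token (your getmsg) was set by strtok(str, " ") while you were reading r.txt, and strtok returns pointers into the buffer it splits, not copies of the text.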
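The fix is to copy the receiver name out of str into storage of its own before str is reused. A sketch, with copy_second_field as a hypothetical helper: it pulls out the second space-separated field (the receiver name, according to the comments in the question's code) and copies it into a caller-provided array with snprintf, which truncates instead of overflowing.

```c
#include <stdio.h>
#include <string.h>

/* Copy the second space-separated field of `line` into caller-owned
 * `out`. Because `out` is separate storage, later fgets() calls into
 * `line` cannot change it. Returns 1 if a second field exists, else 0. */
int copy_second_field(char *line, char *out, size_t outsz)
{
    char *tok = strtok(line, " ");            /* first field */
    if (tok == NULL || (tok = strtok(NULL, " ")) == NULL)
        return 0;
    snprintf(out, outsz, "%s", tok);          /* truncates safely if too long */
    return 1;
}
```

In the login loop, the receiver name would be copied into something like `char receiver[32];` and that array, rather than token, compared against token2 while reading m.txt.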

Why does my program read an extra structure?

I'm making a small console-based RPG to brush up on my programming skills.
I am using structures to store character data. Things like their HP, Strength, perhaps Inventory down the road. One of the key things I need to be able to do is load and save characters. Which means reading and saving structures.
Right now I'm just saving and loading a structure with first name and last name, and attempting to read it properly.
Here is my code for creating a character:
void createCharacter()
{
    char namebuf[20];
    printf("First Name:");
    if (NULL != fgets(namebuf, 20, stdin))
    {
        char *nlptr = strchr(namebuf, '\n');
        if (nlptr) *nlptr = '\0';
    }
    strcpy(party[nMember].fname,namebuf);
    printf("Last Name:");
    if (NULL != fgets(namebuf, 20, stdin))
    {
        char *nlptr = strchr(namebuf, '\n');
        if (nlptr) *nlptr = '\0';
    }
    strcpy(party[nMember].lname,namebuf);
    /*Character created, now save */
    saveCharacter(party[nMember]);
    printf("\n\n");
    loadCharacter();
}
And here is the saveCharacter function:
void saveCharacter(character party)
{
    FILE *fp;
    fp = fopen("data","a");
    fwrite(&party,sizeof(party),1,fp);
    fclose(fp);
}
and the loadCharacter function
void loadCharacter()
{
    FILE *fp;
    character tempParty[50];
    int loop = 0;
    int count = 1;
    int read = 2;
    fp= fopen("data","r");
    while(read != 0)
    {
        read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
        printf("%d. %s %s\n",count,tempParty[loop].fname,tempParty[loop].lname);
        loop++;
        count++;
    }
    fclose(fp);
}
So the expected result is that I enter a first and last name such as 'John Doe', it gets appended to the data file, and then the file is read back in, printing something like:
1. Jane Doe
2. John Doe
and the program ends.
However, my output seems to add one more blank structure to the end.
1. Jane Doe
2. John Doe
3.
I'd like to know why this is. Keep in mind I'm reading the file until fread returns a 0 to signify it's hit the EOF.
Thanks :)
Change your loop:
while( fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp) )
{
// other stuff
}
Whenever you write file reading code ask yourself this question - "what happens if I read an empty file?"
You have an algorithmic problem in your loop, change it to:
read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
while(read != 0)
{
    //read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
    printf("%d. %s %s\n",count,tempParty[loop].fname,tempParty[loop].lname);
    loop++;
    count++;
    read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
}
There are ways to get rid of the double fread, but first get it working and make sure you understand the flow.
Here:
read=fread(&tempParty[loop],sizeof(tempParty[loop]),1,fp);
printf("%d. %s %s\n",count,tempParty[loop].fname,tempParty[loop].lname);
You are not checking whether the read was successful (the return value of fread()).
while( 1==fread(&tempParty[loop],sizeof*tempParty,1,fp) )
{
/* do anything */
}
is the correct way.
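A slightly fuller sketch of that loop, which also stops before overflowing the 50-element tempParty array. The question never shows the character type, so the fname/lname array sizes here are an assumption, and load_characters is a hypothetical name.

```c
#include <stdio.h>

/* Assumed layout: the question only shows that character has fname and
 * lname strings. */
typedef struct {
    char fname[20];
    char lname[20];
} character;

/* Read whole records until fread returns something other than 1, never
 * storing more than `max` of them. Returns the number actually read. */
size_t load_characters(FILE *fp, character *out, size_t max)
{
    size_t n = 0;
    while (n < max && fread(&out[n], sizeof out[n], 1, fp) == 1)
        n++;
    return n;
}
```

loadCharacter would then call n = load_characters(fp, tempParty, 50) and print tempParty[0] through tempParty[n - 1], so no record that failed to load is ever printed.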
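Use fopen("data","rb")
instead of fopen("data","r"), which opens the file in text mode (equivalent to the Microsoft-specific "rt"). On Windows, text mode translates line endings and can corrupt binary records; for the same reason, save with "ab" rather than "a".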
You've got the answer to your immediate question but it's worth pointing out that blindly writing and reading whole structures is not a good plan.
Structure layouts can and do change depending on the compiler you use, the version of that compiler and even with the exact compiler flags used. Any change here will break your ability to read files saved with a different version.
If you have ambitions of supporting multiple platforms issues like endianness also come into play.
And then there's what happens if you add elements to your structure in later versions ...
For robustness you need to think about defining your file format independently of your code and having your save and load functions handle serialising and de-serialising to and from this format.
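As one illustration of a format defined independently of the struct layout, each character could be stored as a text line "fname<TAB>lname". This is a sketch assuming names never contain tabs or newlines; the function names are hypothetical and the character layout is again assumed.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    char fname[20];
    char lname[20];
} character;

/* Write one character as "fname<TAB>lname\n". Returns 0 on success. */
int save_character_text(FILE *fp, const character *c)
{
    return fprintf(fp, "%s\t%s\n", c->fname, c->lname) < 0 ? -1 : 0;
}

/* Parse the next line back into *c. Returns 1 on success, 0 at end of
 * file or on a malformed line. Over-long names are truncated to fit. */
int load_character_text(FILE *fp, character *c)
{
    char line[128];
    if (fgets(line, sizeof line, fp) == NULL)
        return 0;
    line[strcspn(line, "\n")] = '\0';
    char *tab = strchr(line, '\t');
    if (tab == NULL)
        return 0;
    *tab = '\0';                              /* split into two strings */
    snprintf(c->fname, sizeof c->fname, "%s", line);
    snprintf(c->lname, sizeof c->lname, "%s", tab + 1);
    return 1;
}
```

The file no longer depends on padding, struct size or endianness, and a later version can add fields (for example HP and Strength as extra tab-separated columns) while still reading older files.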
