How can I split an input using multiple delimiters? - c

I want to split a lessons.txt file into tokens. The file lists some people and the lessons each person takes. How can I do it?
Here is my lessons.txt file:
George Adam :Math,Science,Germany
Elizabeth McCurry :Music,Math,History
Tom Hans :Science,Music
First, I want to split on ":" and store the names in an array. Second, I want to split on "," and store the lessons in a different array. How can I do this?
Here is my code:
char names[100] , *token, *lecture;
file=fopen("C:\\lessons.txt","r");
while(!feof(file))
{
fgets(names,sizeof(names),file);
printf("%s",names);
token=strtok(names,":");
while(token!=NULL)
{
token=strtok(NULL,":");
printf(" \n %s",token);
lecture=strtok(token,",");
while(lecture!=NULL)
{
lecture=strtok(NULL,",");
printf(" \n\n %s",lecture);
}
}
}
fclose(file);

So you want names to be stored in a separate array, and lessons to be stored in another?
You will need two separate token pointers; you are using the same token for names and lessons.
Try this:
FILE *file;
char names[100], *token, *difftok;
file = fopen("C:\\lessons.txt", "r");
if (file == NULL) {
    perror("C:\\lessons.txt");
    return 1;
}
while (fgets(names, sizeof(names), file) != NULL) {
    token = strtok(names, ":");
    //puts(token); ---> George Adam (including the space before ':')
    difftok = strtok(NULL, ",\n");
    //puts(difftok); ---> Math
    difftok = strtok(NULL, ",\n");
    //puts(difftok); ---> Science
    difftok = strtok(NULL, ",\n");
    //puts(difftok); ---> Germany (NULL on a line with only two lessons)
}
fclose(file);
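In my excerpt, token always holds the name and difftok always holds a lecture. From here you should be able to store the tokens in arrays: token goes into one, difftok into another. Copy the text (for example with strcpy into a char array) rather than storing the pointers, because both point into names, which the next fgets overwrites.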
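To make that concrete, here is a minimal sketch that stores each name in one array and each lesson in another. The array names (names_arr, lessons_arr, owner), the size limits and the rstrip helper are assumptions made for illustration. owner[] records which person each lesson belongs to. Instead of a fixed number of strtok calls, it loops until no lessons are left, and it copies every token.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PEOPLE  100   /* assumed limits */
#define MAX_LESSONS 300
#define FIELD_LEN   64

char names_arr[MAX_PEOPLE][FIELD_LEN];    /* one entry per person */
char lessons_arr[MAX_LESSONS][FIELD_LEN]; /* one entry per lesson */
int owner[MAX_LESSONS];                   /* owner[i]: index into names_arr */
int name_count = 0;
int lesson_count = 0;

/* Remove trailing spaces in place: "George Adam " -> "George Adam". */
static void rstrip(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == ' ')
        s[--n] = '\0';
}

/* Split one line such as "George Adam :Math,Science,Germany\n" and copy
   the pieces into the arrays above. Copies are needed because strtok
   returns pointers into 'line', which the next fgets overwrites. */
void parse_line(char *line)
{
    char *name = strtok(line, ":\r\n");
    if (name == NULL || name_count == MAX_PEOPLE)
        return;                       /* blank line or no room left */
    rstrip(name);
    snprintf(names_arr[name_count], FIELD_LEN, "%s", name);

    char *lesson;
    while (lesson_count < MAX_LESSONS &&
           (lesson = strtok(NULL, ",\r\n")) != NULL) {
        snprintf(lessons_arr[lesson_count], FIELD_LEN, "%s", lesson);
        owner[lesson_count] = name_count;
        lesson_count++;
    }
    name_count++;
}
```

Call parse_line(names) inside the fgets loop. Afterwards, lessons_arr[i] belongs to names_arr[owner[i]].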
Also, your EOF condition is wrong. feof only returns non-zero after a read has already run into the end of the file, so this loop runs its body one extra time after the last line has been read:
while(!feof(file))
Rewriting it as while(feof(file) == 0) does not help; that is the same test.
That is why I used fgets(...) != NULL as the loop condition. fgets returns NULL at end of file (or on a read error), so the loop stops before the stale contents of names are tokenized again.
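The sketch below shows the difference with a small experiment. It writes three lines to a temporary file (tmpfile is standard C) and counts how many times each loop pattern runs its body. The function names are made up for this example.

```c
#include <stdio.h>

/* Loop pattern from the question: tests feof() before reading. */
int count_with_feof(FILE *f)
{
    char buf[100];
    int iterations = 0;
    while (!feof(f)) {
        fgets(buf, sizeof buf, f);   /* result ignored, as in the question */
        iterations++;
    }
    return iterations;
}

/* Loop pattern from the answer: stops as soon as fgets fails. */
int count_with_fgets(FILE *f)
{
    char buf[100];
    int iterations = 0;
    while (fgets(buf, sizeof buf, f) != NULL)
        iterations++;
    return iterations;
}

/* Writes 'text' to a temporary file and rewinds it for reading. */
FILE *make_input(const char *text)
{
    FILE *f = tmpfile();
    if (f != NULL) {
        fputs(text, f);
        rewind(f);
    }
    return f;
}
```

With three lines that each end in a newline, count_with_feof runs its body 4 times and count_with_fgets runs it 3 times. In the question's code, that fourth pass tokenizes whatever is left over in names.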

Related

fgets loop only works properly if string is malloc'd again at the end of the loop

I have to read comma-separated lines from a file, split each line into the arguments between the commas, and handle them accordingly. I have the following code set up to do exactly what I need:
char *string = malloc(MAX_INPUT);
char * current_parent;
char * current_child;
// while there are still lines to read from the file
while(fgets(string, BUFF_SIZE, file) != NULL){
// get the parent of a line and its first child
current_parent = trim(strtok(string, ","));
if(strcmp(current_parent, "") != 0){
current_child = trim(strtok(NULL, ","));
if(current_child == NULL || strcmp(current_child, "") == 0){
tree = add_child(tree, current_parent, "");
}
else{
while(current_child != NULL){
tree = add_child(tree, current_parent, current_child);
current_child = trim(strtok(NULL, ","));
}
}
current_parent = 0;
current_child = 0;
}
string = (char *)malloc(MAX_INPUT);
}
// close the file
free(string);
fclose(file);
Example of a file:
john 1, sam 2
sam 2, ben 4, frances, sam 3
ben 4, sam 4, ben 5, nancy 2, holly
ben 5, john 2, sam 5
Whatever is before the first comma is the parent's name, and all other strings after that are its children's names.
My problem is that, for whatever reason, it only reads the data properly if I malloc the string all over again at the end of the loop. In addition, if I try to free the memory before malloc'ing again, it doesn't work, so I end up with memory leaks. I've even tried setting up the loop to use a fixed-size string, but it still reads in jumbled data at the end. I've also tried manually emptying string at the end of each loop. I have a feeling that I may be overlooking something really simple, but this has been driving me crazy. Let me know if I can provide anything else that would help. Thanks.
Edit: Not sure if this matters, but MAX_INPUT and BUFF_SIZE are both 1024.
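The question doesn't show trim or add_child. The sketch below therefore assumes that trim strips leading and trailing whitespace and returns NULL when given NULL. It parses one line into a parent and its children and copies each name with malloc. That copy matters because strtok only returns pointers into string, so any pointer kept past the next fgets sees the next line's text. split_family and copy_str are hypothetical names.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed behaviour of the question's trim(): strip leading and trailing
   whitespace in place; return NULL for NULL so it can wrap strtok. */
char *trim(char *s)
{
    if (s == NULL)
        return NULL;
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Copies a string into newly allocated memory (caller frees). */
static char *copy_str(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    if (d != NULL)
        strcpy(d, s);
    return d;
}

/* Splits "sam 2, ben 4, frances, sam 3" into a copied parent name and up to
   max_children copied child names. Returns the number of children stored. */
int split_family(char *line, char **parent, char **children, int max_children)
{
    int n = 0;
    *parent = NULL;
    char *tok = trim(strtok(line, ","));
    if (tok == NULL || *tok == '\0')
        return 0;
    *parent = copy_str(tok);
    while ((tok = trim(strtok(NULL, ","))) != NULL && n < max_children) {
        if (*tok != '\0')
            children[n++] = copy_str(tok);
    }
    return n;
}
```

Because the parent and children are separate allocations, the same line buffer can be reused by fgets on every iteration without being malloc'd again.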

File I/O Extraction with structures in C

The task is to read a .txt file named by a command-line argument. The file holds unstructured information listing every airport in the state of Florida (the input below is only a snippet of the full file). Some data must be ignored, such as ASO ORL PR A 0 18400: anything that does not belong to the structure variables in airPdata.
The assignment asks for the site number, locID, field name, city, state, latitude, longitude, and whether or not there is a control tower.
INPUT
03406.20*H 2FD7 AIR ORLANDO ORLANDO FL ASO ORL PR 28-26-08.0210N 081-28-23.2590W PR NON-NPIAS N A 0 18400
03406.18*H 32FL MEYER- INC ORLANDO FL ASO ORL PR 28-30-05.0120N 081-22-06.2490W PR NON-NPAS N 0 0
OUTPUT
Site# LocID Airport Name City ST Latitude Longitude Control Tower
------------------------------------------------------------------------
03406.20*H 2FD7 AIR ORLANDO ORLANDO FL 28-26-08.0210N 081-28-23.2590W N
03406.18*H 32FL MEYER ORLANDO FL 28-30.05.0120N 081-26-39.2560W N
etc.. etc. etc.. etc.. .. etc.. etc.. ..
etc.. etc. etc.. etc.. .. etc.. etc.. ..
My code so far looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strcmp is declared in <string.h>, not <strings.h> */
typedef struct airPdata{
    char *siteNumber;
    char *locID;
    char *fieldName;
    char *city;
    char *state;
    char *latitude;
    char *longitude;
    char controlTower;
} airPdata;
int main (int argc, char* argv[])
{
    char text[1000];
    FILE *fp;
    char firstwords[200];
    if (argc < 2) /* argv[1] does not exist without an argument */
    {
        fprintf(stderr, "Usage: %s orlando5.txt\n", argv[0]);
        return 1;
    }
    if (strcmp(argv[1], "orlando5.txt") == 0)
    {
        fp = fopen(argv[1], "r");
        if (fp == NULL)
        {
            perror("Error opening the file");
            return(-1);
        }
        while (fgets(text, sizeof(text), fp) != NULL)
        {
            printf("%s", text);
        }
        fclose(fp); /* only close a file that was actually opened */
    }
    else
        printf("File name is incorrect\n");
    fflush(stdout);
    return 0;
}
So far I'm able to read the whole file and print the unstructured input to the terminal.
Next, I tried to work out how to extract the strings piece by piece and store them in the variables of the structure. This is where I'm stuck. I've looked up strcpy and other string library functions, data extraction methods and ETL, but I'm not sure which function to use in my code.
I've done something very similar in Java using substrings. If there is a way to take substrings of the large string of text and decide which substring goes into which variable, that could work. For example, LocID is never more than 4 characters long, so any four-character combination of letters and digits could be stored in airPdata.locID.
After the variables are stored in the structures, I think I have to use strtok to organize them into the list under Site#, LocID, etc. That's my best guess at an approach, but I'm pretty lost.
I don't know what the format is. It can't be space-separated, because some of the fields have spaces in them, and it doesn't look fixed-width. Because you mentioned strtok, I'm going to assume it's tab-separated.
You can use strsep for that. strsep solves a lot of the problems strtok has, but strsep isn't standard C. I'm going to assume this is an assignment that requires standard C, so I'll begrudgingly use strtok.
The basic thing to do is to read each line, and then split it into columns with strtok or strsep.
char line[1024];
while (fgets(line, sizeof(line), fp) != NULL) {
char *column;
int col_num = 0;
for( column = strtok(line, "\t");
column;
column = strtok(NULL, "\t") )
{
col_num++;
printf("%d: %s\n", col_num, column);
}
}
fclose(fp);
strtok is funny. It keeps its own internal state of where it is in the string. The first time you call it, you pass it the string you're looking at. To get the rest of the fields, you call it with NULL and it keeps reading through that same string. That's why the for loop looks like it repeats itself.
Global state is dangerous and very error-prone. For example, you can't tokenize two strings in a nested or interleaved way, because the inner strtok(string, ...) call resets the state the outer loop depends on. strsep and strtok_r fix this. If you're being told to use strtok, find a better resource to learn from.
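Here is a sketch of the strtok_r alternative. strtok_r is POSIX, not ISO C, which is why the feature-test macro is there. The example does a nested split: records separated by ';', each split into fields separated by ','. The input format and count_fields are made up for illustration. Each loop keeps its own save pointer, which is what plain strtok cannot do.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <string.h>

/* Counts records (separated by ';') and fields (separated by ',') in text.
   Each strtok_r call stores its position in the caller's save pointer,
   so the inner loop does not disturb the outer one. */
int count_fields(char *text, int *records)
{
    char *rec_save, *field_save;
    int fields = 0;
    *records = 0;
    for (char *rec = strtok_r(text, ";", &rec_save); rec != NULL;
         rec = strtok_r(NULL, ";", &rec_save)) {
        (*records)++;
        for (char *f = strtok_r(rec, ",", &field_save); f != NULL;
             f = strtok_r(NULL, ",", &field_save))
            fields++;
    }
    return fields;
}
```

With plain strtok, the inner strtok(rec, ",") call would replace the outer loop's saved position, so the outer loop would not resume at the next record.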
Now that we have each column and its position, we can do what we like with it. I'm going to use a switch to choose only the columns we want.
for( column = strtok(line, "\t");
column;
column = strtok(NULL, "\t") )
{
col_num++;
switch( col_num ) {
case 1:
case 2:
case 3:
case 4:
case 5:
case 9:
case 10:
case 13:
printf("%s\t", column);
break;
default:
break;
}
}
puts("");
You can do whatever you like with the columns at this point. You can print them immediately, or put them in a list, or a structure.
Just remember that column points to memory inside line, and the next fgets overwrites line. If you want to store column, you have to copy it first. You can do that with strdup, but *sigh* that isn't standard C (it's POSIX, and only became part of ISO C in C23). strcpy is really easy to use wrong. If you're stuck with standard C, write your own strdup:
#include <stdlib.h>
#include <string.h>
char *mystrdup( const char *src ) {
    /* strlen, not sizeof: sizeof(src) is the size of the pointer, not of the string */
    char *dst = malloc( strlen(src) + 1 );
    if ( dst != NULL )
        strcpy( dst, src );
    return dst;
}
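The sketch below fills the question's airPdata structure instead of printing the columns. Like the code above, it assumes that the columns are tab-separated and numbered as in the sample line. copy_str is a hypothetical standard-C stand-in for strdup, and parse_airport is also a made-up name.

```c
#include <stdlib.h>
#include <string.h>

typedef struct airPdata {
    char *siteNumber;
    char *locID;
    char *fieldName;
    char *city;
    char *state;
    char *latitude;
    char *longitude;
    char controlTower;
} airPdata;

/* Copies src into freshly allocated memory (caller frees). */
static char *copy_str(const char *src)
{
    char *dst = malloc(strlen(src) + 1);
    if (dst != NULL)
        strcpy(dst, src);
    return dst;
}

/* Fills *out from one line, ASSUMING tab-separated columns in the sample's
   order: 1 site, 2 locID, 3 name, 4 city, 5 state, 9 latitude,
   10 longitude, 13 control tower. Returns 1 if all eight were found. */
int parse_airport(char *line, airPdata *out)
{
    int col_num = 0, seen = 0;
    *out = (airPdata){0};             /* all pointers NULL, tower '\0' */
    for (char *column = strtok(line, "\t\r\n"); column != NULL;
         column = strtok(NULL, "\t\r\n")) {
        col_num++;
        switch (col_num) {
        case 1:  out->siteNumber = copy_str(column); seen++; break;
        case 2:  out->locID      = copy_str(column); seen++; break;
        case 3:  out->fieldName  = copy_str(column); seen++; break;
        case 4:  out->city       = copy_str(column); seen++; break;
        case 5:  out->state      = copy_str(column); seen++; break;
        case 9:  out->latitude   = copy_str(column); seen++; break;
        case 10: out->longitude  = copy_str(column); seen++; break;
        case 13: out->controlTower = column[0];      seen++; break;
        default: break;               /* ASO, ORL, PR, ... are ignored */
        }
    }
    return seen == 8;
}
```

One caveat: strtok treats a run of tabs as a single separator. An empty column would therefore shift the numbers of all later columns. strsep does not have that problem.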

C - Reading CSV file in char array

I have very little experience in C programming, particularly file handling. I am developing a project in which I'm supposed to create a sign-up/log-in system. I have a .csv file in which the fields are separated by commas.
What I am trying to do is read the first and second columns into two char arrays, respectively.
char userLogin[100];
char userPassword[100];
FILE *file3 = fopen("C:\\Users\\Kshitiz\\Desktop\\BAAS\\signup_db.csv","r");
if(file3 != NULL){
while(!feof(file3)){
fscanf(file3,"%[^,],%s",userLogin,userPassword);
puts(userLogin);
puts(userPassword);
}
}
fclose(file3);
Content of signup_db.csv:
Username,Password
SBI063DDN,Qazwsx1234
ICICIDDN456,WSXEDC1234r
Expected Output:
Username
Password
SBI063DDN
Qazwsx1234
ICICIDDN456
WSXEDC1234r
Output which I'm getting:
Username
Password
SBI063DDN
Qazwsx1234
ICICIDDN456
WSXEDC1234r
WSXEDC1234r
Can anyone please help me how can I resolve this issue? Thank you!
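fscanf() returns the number of input items successfully matched and assigned. So try this instead:
while (fscanf(file3, " %99[^,],%99s", userLogin, userPassword) == 2)
{
    puts(userLogin);
    puts(userPassword);
}
The leading space in the format skips the newline left over from the previous line, because %[^,] does not skip whitespace on its own. The 99 limits keep each field inside its 100-byte array.
The problem you mentioned is probably caused by the newline at the end of your file. After the last line is read, the end of file has not been reached yet, so feof is still false and the loop runs once more. That last fscanf call fails partway, userPassword still holds the previous value, and it is printed a second time. Checking the return value of fscanf, as above, solves this.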
In my case I get the expected results, but I don't know whether that's due to a different compiler or a different csv file (I tried to recreate yours). Here is another way to parse the file; check whether you get the expected results:
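#include <stdio.h>
#include <string.h>
#define LINE_LENGTH 1000
int main(void) {
    char line[LINE_LENGTH];
    /* also split on the line ending, or the password keeps its '\n' */
    const char *delimiter = ",\r\n";
    char *token;
    FILE *file3 = fopen("signup_db.csv", "r");
    if (file3 == NULL) {
        perror("signup_db.csv");
        return 1;
    }
    while (fgets(line, LINE_LENGTH, file3) != NULL) {
        token = strtok(line, delimiter);   /* first column: login */
        if (token == NULL)
            continue;                      /* skip empty lines */
        printf("%s\n", token);
        token = strtok(NULL, delimiter);   /* second column: password */
        if (token != NULL)
            printf("%s\n", token);
    }
    fclose(file3);
    return 0;
}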
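The question asks for the columns to end up in char arrays rather than printed. The sketch below stores every row in two parallel arrays. load_users, MAX_USERS and FIELD_LEN are assumed names and limits. The header row is stored like any other row, which matches the expected output.

```c
#include <stdio.h>
#include <string.h>

#define MAX_USERS 100
#define FIELD_LEN 100

/* Reads "login,password" rows into two parallel arrays and returns the
   number of rows stored. Each field is copied with a size limit, because
   strtok's tokens point into 'line', which the next fgets reuses. */
int load_users(FILE *fp, char logins[][FIELD_LEN],
               char passwords[][FIELD_LEN], int max_rows)
{
    char line[1000];
    int n = 0;
    while (n < max_rows && fgets(line, sizeof line, fp) != NULL) {
        char *login = strtok(line, ",\r\n");
        char *password = strtok(NULL, ",\r\n");
        if (login == NULL || password == NULL)
            continue;                 /* skip blank or malformed lines */
        snprintf(logins[n], FIELD_LEN, "%s", login);
        snprintf(passwords[n], FIELD_LEN, "%s", password);
        n++;
    }
    return n;
}
```

To skip the header, call fgets once before load_users, or start using the arrays at index 1.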

The last character is not printed to a file

I am trying to figure out why the C function strtok is not working properly for me. Here's the problem:
I have a file that contains two types of information: headers and text descriptions. Each line in the file is either a header or part of a text description. A header starts with '>'. The description text follows the header and can span multiple lines. At the end of the text there is an empty line that separates the description from the next header. My aim is to write two separate files: one contains the headers, one per line, and the other contains each corresponding description on a line by itself. To implement this in C, I used fgets to read the file one line at a time and copied each line into dynamically allocated memory. To write the description text on a single line, I used strtok to get rid of any newline characters in the text.
My code is working properly for the header files. However, for the descriptions file, I noticed that the last character of the text is not printed out to the file even though it is printed to the stdout.
FILE *headerFile = fopen("Headers", "w"); //to write headers
FILE *desFile = fopen("Descriptions", "w"); //to write descriptions
FILE *pfile = fopen("Data","r");
if ( pfile != NULL )
{
int numOfHeaders =0;
char **data1 = NULL; //an array to hold a header line
char **data2 = NULL; //an array to hold a description line
char line[700] ; //maximum size for the line
while (fgets(line, sizeof line, pfile ))
{
if(line[0] =='>') //It is a header
{
data1 = realloc(data1,(numOfHeaders +1)* sizeof(*data1));
data1[numOfHeaders]= malloc(strlen(line)+1);
strcpy(data1[numOfHeaders],line);
fprintf(headerFile, "%s",line);//writes the header
if(numOfHeaders >0)
fprintf(desFile, "\n");//writes a new line in the desc file
numOfHeaders++;
}
//it is not a header and not an empty line
if(line[0] != '>' && strlen(line)>2)
{
data2 = realloc(data2,(numOfHeaders +1)* sizeof(*data2));
data2[numOfHeaders]= malloc(strlen(line)+1);
char *s = strtok(line, "\n ");
strcpy(data2[numOfHeaders],s);
fprintf(desFile, "%s",data2[numOfHeaders]);
printf(desFile, "%s",data2[numOfHeaders]);
}
} //end-while
fclose(desFile);
fclose(headerFile);
fclose(pfile );
printf("There are %d headers in the file.\n",numOfHeaders);
}
As mentioned in the comments:
fprintf(desFile, "%s",data2[numOfHeaders]); //okay
printf(desFile, "%s",data2[numOfHeaders]); //wrong
Second line should be:
printf("%s",data2[numOfHeaders]); //okay
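Or, you could format the text once and write the result to both streams. Write buffer with fputs rather than passing it as the format string, because any '%' in the description would otherwise be treated as a conversion:
char buffer[700];
snprintf(buffer, sizeof buffer, "%s", data2[numOfHeaders]);
fputs(buffer, desFile);
fputs(buffer, stdout);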
Other possible issues:
Without the input file it is not possible to know for certain what strtok() is doing, but here is a guess based on what you have described:
In these two lines:
data2[numOfHeaders]= malloc(strlen(line)+1);
char *s = strtok(line, "\n ");
if the text in line has any embedded spaces, s will only contain the segment before the first space, because the delimiter string "\n " includes a space. And because you only call strtok once before line gets refreshed by:
while (fgets(line, sizeof line, pfile ))
only one token (the very first segment) will be read.
Not always, but normally, strtok() is called in a loop:
char *s = NULL;
s = strtok(stringToParse, "\n ");//make initial call before entering loop
while(s)//ALWAYS test to see if s contains new content, else NULL
{
    //do something with s (note: strcpy overwrites the previous token each time)
    strcpy(data2[numOfHeaders],s);
    //get next token from string
    s = strtok(NULL, "\n ");//continue to tokenize string until s is null
}
But, as I said above, you are calling it only once on that string before its content changes. It is possible, then, that the segment that isn't printing has simply not been tokenized by strtok() yet.
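If the goal is only to drop the line ending rather than to split on spaces, strtok isn't needed at all. As an alternative, here is a sketch that uses strcspn instead; chomp is a hypothetical helper name. Calling it on line before the strcpy into data2 would keep every word of the description.

```c
#include <string.h>

/* Removes a trailing "\n" (and the "\r" of a Windows line ending) in place,
   leaving spaces inside the text untouched. strcspn returns the length of
   the prefix containing no '\r' or '\n', which is the whole string when
   neither character is present. */
void chomp(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```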

All objects have the same name

I'm doing my final project for my algorithms course in C. For the project, we have to take an input text file that contains lines like:
P|A|0
or
E|0|1|2
The first form indicates a vertex to add to the graph we're using in the program. The second token is the name of the vertex, and the last token is its index in the vertices[] array of the graph struct.
I've got a while loop that goes through the file line by line. It takes the first token to decide whether to make a vertex or an edge, and then proceeds accordingly.
When I finish the file traversal, I call my show_vertices function, which is just a for-loop that prints each name (g->vertices[i].name) sequentially.
The problem is that where the name should go in the output (%s), I keep getting the last "token1" I collected. With the input file I'm using, that happens to be the source node of the last edge in the list. That's odd, because two other values are passed through strtok() after it. The line in the file looks like:
E|6|7|1
which creates an edge from index 6 to index 7 with a weight of 1. The edge comes out fine. But whenever I call printf with a %s for a vertex name, it prints "6", regardless.
This is the file traversal.
fgets(currLn, sizeof(currLn), infile);
maxv = atoi(currLn);
if(maxv == 0) // '==' compares; 'maxv = 0' would assign 0 and the test would always be false
{
//file not formatted correctly, print error message
return;
}
t_graph *g = new_graph(maxv, TRUE);
while((fgets(currLn, sizeof(currLn), infile)) != NULL)
{
token1 = strtok(currLn, "|");
key = token1[0];
if(key == 'P' || key == 'p')
{
token1 = strtok(NULL, "|");
if(!add_vertex(g, token1))
{
//file integration fail, throw error!
return;
}
//***If I print the name here, it works fine and gives me the right name!****
continue;
}
if(key == 'E' || key == 'e')
{
token1 = strtok(NULL, "|");
token2 = strtok(NULL, "|");
token3 = strtok(NULL, "|");
src = atoi(token1);
dst = atoi(token2);
w = atoi(token3);
if(!add_edge(g, src, dst, w))
{
//file integration fail, throw error
return;
}
continue;
}
else
{
//epic error message because user doesn't know what they're doing.
return;
}
}
If I run show_vertices here, I get:
0. 6
1. 6
2. 6
etc...
You aren't copying the name. You end up with a pointer (returned by strtok) into currLn, the single array you read each line into. Since the name is always at offset 2, that pointer will always be currLn+2. When you traverse and print, you see whatever is at currLn+2 after the last line was read. For E|6|7|1, strtok has replaced the '|' after the 6 with '\0', so every name prints as "6".
You need to strdup(token1) before passing it to add_vertex (or inside add_vertex).
No, there isn't enough information to be certain this is the answer, but I'll bet money this is it.
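Here is a sketch of what copying inside add_vertex could look like. The question doesn't show t_graph or add_vertex, so the types below are hypothetical minimal stand-ins. This version also appends vertices in order instead of using the index given in the file. Where strdup is available (POSIX, or ISO C since C23), it does the malloc and strcpy in one call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal graph types; the question's t_graph is not shown. */
typedef struct { char *name; } t_vertex;
typedef struct {
    t_vertex *vertices;
    int nvertices;
    int maxv;
} t_graph;

/* Stores a private copy of name. Keeping the pointer itself would make
   every vertex point into the caller's line buffer (currLn). */
int add_vertex(t_graph *g, const char *name)
{
    if (g->nvertices >= g->maxv)
        return 0;
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, name);
    g->vertices[g->nvertices++].name = copy;
    return 1;
}
```

Even when the caller reuses the same buffer for every line, each vertex keeps its own name.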
