How to get the position of a delimiter-separated string in C

How do I get the position of each field in a delimiter-separated string?
My text file looks like this:
at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
avahi:x:109:111:User for Avahi:/var/run/avahi-daemon:/bin/false
beagleindex:x:110:112:User for Beagle indexing:/var/cache/beagle:/bin/bash
My C code looks like this:
#include <stdio.h>
#include <string.h> /* strtok_r (POSIX) */
int main(int argc, char *argv[])
{
    char *str, *saveptr;
    char ch[100];
    char *sp;
    FILE *f;
    int j;
    char searchString[20];
    char *pos;
    f = fopen("passwd", "r");
    if (f == NULL)
    {
        printf("Error while opening the file");
        return 1;
    }
    while (fgets(ch, sizeof ch, f) != NULL)
    {
        /*printf("%s\n", ch); */
        for (j = 1, str = ch; ; j++, str = NULL)
        {
            char *token = strtok_r(str, ": ", &saveptr);
            if (token == NULL)
                break;
            //printf("%s---\n---", token);
            printf("%s", token);
        }
    }
    fclose(f);
    return 0;
}

Using strtok with ": " as the delimiter string splits your string on spaces as well as colons, which is probably not what you want. In addition, strtok treats consecutive delimiter characters as a single delimiter (so it never returns an empty string between two colons), which is not what you want for parsing passwd.
Instead, you probably just want to use strchr:
while (fgets(ch, sizeof ch, f) != NULL) {
    char *token, *end;
    for (j = 1, token = ch; token; j++, token = end) {
        if ((end = strchr(token, ':'))) *end++ = 0;
        /* ...do something with token (the field text) and j (the field number) */
    }
}

I do not think you need strtok() just to get the position of a token separated by delimiters. Instead, simply walk through each line and compare it character by character against the delimiter. (I hope this helps.)
I prepared an input file called GetDelimPosition.txt:
at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
avahi:x:109:111:User for Avahi:/var/run/avahi-daemon:/bin/false
jamil:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
javier:x:109:111:User for Avahi:/var/run/avahi-daemon:/bin/false
jiame:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
jose:x:109:111:User for Avahi:/var/run/avahi-daemon:/bin/false
And used the following code (modify it as needed):
#include <stdio.h>
#include <string.h>
//edit this line as needed:
#define FILE_LOC "C:\\dev\\play\\GetDelimPosition.txt"
int main(void)
{
    FILE *fp;
    char ch[260];
    int line = -1;
    int position[80][100] = {0}; //lines x DelimPosition
    int j = 0, k = 0;
    int len;
    fp = fopen(FILE_LOC, "r");
    if (fp == NULL)
        return 1; //could not open the file
    while (line < 79 && fgets(ch, sizeof ch, fp) != NULL)
    {
        line++; //increment line
        len = (int)strlen(ch);
        for (j = 0; j < len; j++)
        {
            if (ch[j] == ':' && k < 100)
            {
                position[line][k] = j + 1; //position of token (1 after delim)
                k++; //increment position index for next token
            }
        }
        k = 0; //getting new line, zero position index
    }
    fclose(fp);
    return 0;
}
To get the following results: (rows are lines in file, columns are positions of each token. First token is assumed at position 0, and not reported)

Related

ascii file processing in C

I have a hard time understanding how you process ascii files in c. I have no problem opening files and closing them or reading files with one value on each line. However, when the data is separated with characters, I really don't understand what the code is doing at a lower level.
Example: I have a file containing names separated with commas that looks like this:
"MARY","PATRICIA","LINDA","BARBARA","ELIZABETH","JENNIFER"
I have created an array to store them:
char names[6000][20];
And now, my code to process it is while (fscanf(data, "\"%s\",", names[index]) != EOF) { index++; }
The code executes for the 1st iteration and names[0] contains the whole file.
How can I separate all the names?
Here is the full code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char names[6000][20]; // an array to store 6k names of max length 19
FILE * data = fopen("./022names.txt", "r");
int index = 0;
int nbNames;
while (fscanf(data, "\"%s\",", names[index]) != EOF) {
index++;
}
nbNames = index;
fclose(data);
printf("%d\n", index);
for (index=0; index<nbNames; index++) {
printf("%s \n", names[index]);
}
printf("\n");
return 0;
}
PS: I am thinking this might also be because of the data structure of my array.
If you want a simple solution, you can read the file character by character using fgetc. Since there are no newlines in the file, just ignore quotation marks and move to the next index when you find a comma.
char names[6000][20]; // an array to store 6k names of max length 19
FILE * data = fopen("./022names.txt", "r");
int name_count = 0, current_name_ind = 0;
int c;
while ((c = fgetc(data)) != EOF) {
if (c == ',') {
names[name_count][current_name_ind] = '\0';
current_name_ind = 0;
++name_count;
} else if (c != '"') {
names[name_count][current_name_ind] = c;
++current_name_ind;
}
}
names[name_count][current_name_ind] = '\0';
fclose(data);
"The code executes for the 1st iteration and names[0] contains the whole file...., How can I separate all the names?"
Regarding the first few statements:
char names[6000][20]; // an array to store 6k names of max length 19
FILE * data = fopen("./022names.txt", "r");
What if there are 6001 names? Or one of the names has more than 19 characters?
Or what if there are far fewer than 6000 names?
The point is that with some effort to enumerate the tasks you have listed, and some time spent mapping out what information is needed to meet your criteria, you can create a better product. The following is derived from your post:
Process ascii files in c
Read file content that is separated by characters
input is a comma separated file, with other delimiters as well
Choose a method best suited to parse a file of variable size
As mentioned in the comments under your question, there are ways to write your algorithm so that it flexibly allows for extra-long names or for a variable number of names. This can be done using a few C standard functions commonly used in parsing files. (Although fscanf() has its place, it is not the best option for parsing file contents into array elements.)
The following approach performs the following steps to accomplish the user needs enumerated above
Read file to determine number of, and longest element
Create array sized to contain exact contents of file using count of elements and longest element using variable length array (VLA)
Create function to parse file contents into array. (using this technique of passing VLA as function argument.)
Following is a complete example of how to implement each of these, while breaking the tasks into functions when appropriate...
Note, code below was tested using the following input file:
names.txt
"MARY","PATRICIA","LINDA","BARBARA","ELIZABETH","JENNIFER",
"Joseph","Bart","Daniel","Stephan","Karen","Beth","Marcia",
"Calmazzothoulumus"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//Prototypes
int count_names(const char *filename, size_t *count);
size_t filesize(const char *fn);
void populateNames(const char *fn, int longest, char arr[][longest]);
const char *filename = ".\\names.txt";
int main(void)
{
size_t count = 0;
int longest = count_names(filename, &count);
if(count == 0) return 1; //nothing was read; a zero-length VLA is not allowed
char names[count][longest+1];//VLA - See linked info
// +1 is room for null termination
memset(names, 0, sizeof names);
populateNames(filename, longest+1, names);
return 0;
}
//populate VLA with names in file
void populateNames(const char *fn, int longest, char names[][longest])
{
char line[80] = {0};
char *delim = "\",\n ";
char *tok = NULL;
FILE * fp = fopen(fn, "r");
if(fp)
{
int i=0;
while(fgets(line, sizeof line, fp))
{
tok = strtok(line, delim);
while(tok)
{
strcpy(names[i], tok);
tok = strtok(NULL, delim);
i++;
}
}
fclose(fp);
}
}
//passes back count of tokens in file, and return longest token
int count_names(const char *filename, size_t *count)
{
int len=0, lenKeep = 0;
FILE *fp = fopen(filename, "r");
if(fp)
{
char *tok = NULL;
char *delim = "\",\n ";
int cnt = 0;
size_t fSize = filesize(filename);
char *buf = calloc(fSize + 1, 1); //+1 leaves room for the null terminator
if(!buf) { fclose(fp); return 0; }
while(fgets(buf, (int)fSize + 1, fp)) //goes to newline for each get
{
tok = strtok(buf, delim);
while(tok)
{
cnt++;
len = strlen(tok);
if(lenKeep < len) lenKeep = len;
tok = strtok(NULL, delim);
}
}
*count = cnt;
fclose(fp);
free(buf);
}
return lenKeep;
}
//return file size in bytes (binary read)
size_t filesize(const char *fn)
{
size_t size = 0;
FILE*fp = fopen(fn, "rb");
if(fp)
{
fseek(fp, 0, SEEK_END);
size = ftell(fp);
fseek(fp, 0, SEEK_SET);
fclose(fp);
}
return size;
}
You can use the in-built strtok() function which is easy to use.
I have used the tok+1 instead of tok to omit the first " and strlen(tok) - 2 to omit the last ".
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char names[6000][20]; // an array to store 6k names of max length 19
FILE * data = fopen("./022names.txt", "r");
int index = 0;
int nbNames;
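char *str = malloc(120000); // buffer for one whitespace-delimited chunk of the file
while (fscanf(data, "%119999s", str) == 1) {
char *tok = strtok(str, ",");
while(tok != 0 && index < 6000){
size_t len = strlen(tok);
if (len >= 2) {
len -= 2; // drop the opening and closing quotes
if (len > 19) len = 19; // names[index] holds at most 19 characters
strncpy(names[index], tok+1, len);
names[index][len] = '\0'; // strncpy does not add the terminator here
index++;
}
tok = strtok(0, ",");
}
}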
nbNames = index;
fclose(data);
free(str); // just to free the memory occupied by the str variable in the heap.
printf("%d\n", index);
for (index=0; index<nbNames; index++) {
printf("%s \n", names[index]);
}
printf("\n");
return 0;
}
Also, the parameter 120000 is just the maximum number of characters that can be in the file. It is just 6000 * 20 as you mentioned.

Find text inside the opening and closing () parentheses in a text file and read/print it into a buffer, in C

I am new to C and am getting very frustrated with learning this language. Currently I'm trying to write a program that reads in a program text file and prints all the string literals and tokens, each on a separate line. I have most of it except for one snag: within the text file there is a line such as (..text..). I need to be able to search for, read, and print all the text that is inside the parentheses on its own line. Here is the idea I have so far:
#define KEY 32
#define BUFFER_SIZE 500
FILE *fp, *fp2;
int main()
{
char ch, buffer[BUFFER_SIZE], operators[] = "+-*%=", separators[] = "(){}[]<>,";
char *pus;
char source[200 + 1];
int i, j = 0, k = 0;
char *words = NULL, *word = NULL, c;
fp = fopen("main.txt", "r");
fp2 = fopen ("mynewfile.txt","w") ;
while ((ch = fgetc(fp)) != EOF)
{
// pus[k++] = ch;
if( ch == '(')
{
for ( k = 0;, k < 20, K++){
buffer[k] = ch;
buffer[k] = '\0';
}
printf("%s\n", buffer)
}
....
The textfile is this:
#include <stdio.h>
int main(int argc, char **argv)
{
for (int i = 0; i < argc; ++i)
{
printf("argv[%d]: %s\n", i, argv[i]);
}
}
So far I've been able to read char by char and place each character into a buffer. But this idea just isn't working, and I'm stumped. I've tried dabbling with strcpy() and strtok(), but they all take char arrays. Any ideas would be appreciated, thank you.
Most likely the best way would be to use fgets() with a file to read in each line as a string (char array) and then delimit that string. See the short example below:
char buffer[BUFFER_SIZE];
int current_line = 0;
//Continually read in lines until nothing is left...
while(fgets(buffer, BUFFER_SIZE - 1, fp) != NULL)
{
//Line from file is now in buffer. We can delimit it.
char copy[BUFFER_SIZE];
//Copy as strtok will overwrite a string.
strcpy(copy, buffer);
printf("Line: %d - %s", current_line, buffer); //Print the line.
char * found = strtok(copy, separators); //Will delimit based on the separators.
while(found != NULL)
{
printf("%s", found);
found = strtok(NULL, separators);
}
current_line++;
}
strtok returns a char pointer to the start of the next token. It replaces the delimiter that ends the token with the null terminator, thereby making a "new" string. We can pass NULL to strtok to tell it to continue where it left off. Using this, we can parse a file line by line based on multiple delimiters. You could save these individual strings or evaluate them further.

Reading data from a .csv file to append into a 2D array in c

for (int i = 0; i < 4; i++) {
for (int j = 0; j <4; i++) {
while (c != EOF)
token = strtok((fgets(token,5,fp)), delim);
}
}
Hey everyone, I'm new to C and I was given a project to take a csv file and count the average number of the values in each column. Right now I'm trying to parse the lines by the commas. I found the function [strtok], but I'm definitely implementing it incorrectly. I have the number of rows and columns in the csv file, I just need help figuring out how to parse each line by "," and place those values into a 2D array. Above is my current code that I was going to use to append the values to the array, but I keep getting a "Segmentation fault". Any help would be appreciated.
Here is the whole code for the function. I include stdio.h and stdlib.h:
void main() {
char *strcat(char *dest, const char *src);
char *strtok(char *str, const char *delim);
char *file_name = "test.txt";
FILE *fp = fopen(file_name, "r");
int array[4][4];
//int array1 [2] = {1, 3};
int counter = 0;
char *token = " ";
const char *delim = (const char *)',';
char c = fgetc(fp);
for (int i = 0; i < 4; i++) {
token = "";
for (int j = 0; j <4; i++) {
while (c != EOF)
token = strtok((fgets(token,5,fp)), delim);
}
}
fclose(fp);
}
A sample input would be something like this:
10,20,30,60
40,50,60,70
70,80,90,80
100,110,120,70
Yes, you are right: you are using strtok incorrectly.
The first thing I would do is read each line and then parse the line using
strtok, like this:
char line[1024];
const char *delim = ",\n";
while(fgets(line, sizeof line, fp))
{
    char *token = strtok(line, delim);
    while(token != NULL) { // an empty line yields no tokens at all
        printf("token: %s\n", token);
        token = strtok(NULL, delim);
    }
}
After the first call, all subsequent calls of strtok on the same string must be
made with NULL. strtok returns NULL when no more tokens can be found, usually
because the end of the line has been reached. Note that I added the newline to
the delimiters argument. When the destination buffer is large enough, fgets
writes the newline as well. Putting the newline in the delimiter list is a nice
trick because strtok will get rid of the newline for you.
The code above gives you a way of getting each cell of the csv as a string. You
have to convert the values yourself. This is the tricky bit: if the csv
contains spaces, quotes, etc., you need different strategies to parse the
correct value of the cell. You can use functions like strtol and friends, which
allow you to detect conversion errors, but they are not bulletproof; there will
be cases where they fail as well.
An easy example would be:
char line[1024];
const char *delim = ",\n";
while(fgets(line, sizeof line, fp))
{
    char *token = strtok(line, delim);
    while(token != NULL) {
        int val;
        if(sscanf(token, "%d", &val) != 1)
            fprintf(stderr, "'%s' is not a number!\n", token);
        else
            printf("number found: %d\n", val);
        token = strtok(NULL, delim);
    }
}
Note that this does not cover all cases, for example cells that are in quotes.
The last thing to be done would be to store the values. One way of doing it is
to allocate memory for an array of pointers to int rows and grow it with
realloc for every new row. Here again the problem lies in the csv file: sometimes
it has the wrong format, some rows are empty, or some rows have more or fewer
columns than the other rows, and this can be tricky. At this point it would be a
good idea to use a library for parsing csv.
The following code assumes that the csv is well formatted, that the number of
columns is the same across all rows, and that no line is longer than 1023
characters. When *cols is 0, I calculate the number of columns based on
the first line. If other rows have fewer columns, all remaining values will be 0
(because calloc sets newly allocated memory to 0). If there are more
columns than in the first row, those columns will be ignored:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int **parse_csv(const char *filename, size_t *rows, size_t *cols)
{
    if(filename == NULL || rows == NULL || cols == NULL)
        return NULL;
    FILE *fp = fopen(filename, "r");
    if(fp == NULL)
        return NULL;
    int **csv = NULL, **tmp;
    *rows = 0;
    *cols = 0;
    char line[1024];
    char *token;
    const char *delim = ",\n";
    while(fgets(line, sizeof line, fp))
    {
        if(*cols == 0)
        {
            // calculating number of columns from the first line
            char copy[1024];
            strcpy(copy, line);
            token = strtok(copy, delim);
            while(token != NULL)
            {
                (*cols)++;
                token = strtok(NULL, delim);
            }
            if(*cols == 0)
                continue; // blank line before any data: skip it
        }
        tmp = realloc(csv, (*rows + 1) * sizeof *csv);
        if(tmp == NULL)
            break; // return all parsed rows so far
        csv = tmp;
        int *row = calloc(*cols, sizeof *row);
        if(row == NULL)
            break; // return all parsed rows so far
        size_t idx = 0;
        token = strtok(line, delim);
        while(token != NULL && idx < *cols)
        {
            if(sscanf(token, "%d", row + idx) != 1)
                row[idx] = 0; // in case the conversion fails,
                              // just to make sure to have a defined value
                              // in the cell
            idx++;
            token = strtok(NULL, delim);
        }
        csv[*rows] = row;
        // increment rows count
        (*rows)++;
    }
    fclose(fp); // also reached on the early exits above, so the file is not leaked
    if(*rows == 0)
    {
        free(csv); // nothing was parsed
        return NULL;
    }
    return csv;
}
void free_csv(int **csv, size_t rows)
{
if(csv == NULL)
return;
for(size_t i = 0; i < rows; ++i)
free(csv[i]);
free(csv);
}
Now you can parse it like this:
size_t cols, rows;
int **csv = parse_csv("file.csv", &rows, &cols);
if(csv == NULL)
{
// error handling...
// do not continue
}
...
free_csv(csv, rows);
Now csv[3][2] would give you the cell at row 3, col 2 (counting from 0), provided the file has at least 4 rows and 3 columns.
edit
Things I noticed in your code:
void main() is wrong. main should have one of the following prototypes:
int main(void);
int main(int argc, char **argv);
int main(int argc, char *argv[]);
Another:
int main(void)
{
char *strcat(char *dest, const char *src);
char *strtok(char *str, const char *delim);
...
}
Don't put those declarations inside the main function; put them outside. Also,
there are standard header files for this. In this case, include string.h:
#include <string.h>
int main(void)
{
...
}
Another
const char *delim = (const char *)',';
This is just wrong; it's like trying to sell an apple and calling it an orange. ','
is a single character constant (of type int in C). It has the value 44 in ASCII.
It's the same as doing:
const char *delim = (const char*) 44;
You are setting delim to point to address 44.
You have to use double quotes:
const char *delim = ",";
Note that 'x' and "x" are not the same. 'x' is 120 (see ASCII), it's
a single char. "x" is a string literal, it returns you a pointer to the start
of a sequence of characters that ends with the '\0'-terminating byte, aka a
string. Those are fundamentally different things in C.

c read block of lines and store them [duplicate]

I am really new to C, and reading files drives me crazy...
I want to read a file containing a name, birthplace, phone number, etc., all separated by tabs.
The format might be like this:
Bob Jason Los Angeles 33333333
Alice Wong Washington DC 111-333-222
So I create a struct to record it.
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
I tried many ways to read this file into the struct, but they all failed.
I tried fread:
read_file = fopen("read.txt", "r");
Person temp;
fread(&temp, sizeof(Person), 100, read_file);
printf("%s %s %s \n", temp.name, temp.address, temp.phone);
But the strings are not stored in temp separated by tab; it reads the whole file into temp.name and I get weird output.
Then I tried fscanf and sscanf, but neither works for separating on tabs:
fscanf(read_file, "%s %s %s", temp.name, temp.address, temp.phone);
Or
fscanf(read_file, "%s\t%s\t%s", temp.name, temp.address, temp.phone);
This separates the string by spaces, so I get Bob and Jason separately, whereas I need "Bob Jason" as one string. And I did separate the fields with tabs when I created the text file.
Same for sscanf; I tried different ways many times...
Please help...
I suggest:
Use fgets to read the text line by line.
Use strtok to separate the contents of the line by using tab as the delimiter.
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
char* token = strtok(line, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
// Copy token at most the number of characters
// temp.name can hold. Similar logic applies to address
// and phone number.
temp.name[0] = '\0';
strncat(temp.name, token, sizeof(temp.name)-1);
}
token = strtok(NULL, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.address[0] = '\0';
strncat(temp.address, token, sizeof(temp.address)-1);
}
token = strtok(NULL, "\n");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.phone[0] = '\0';
strncat(temp.phone, token, sizeof(temp.phone)-1);
}
Update
Using a helper function, the code can be reduced in size. (Thanks @chux)
// The helper function.
void copyToken(char* destination,
char* source,
size_t maxLen,
char const* delimiter)
{
char* token = strtok(source, delimiter);
if ( token != NULL )
{
destination[0] = '\0';
strncat(destination, token, maxLen-1);
}
}
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
copyToken(temp.name, line, sizeof(temp.name), "\t");
copyToken(temp.address, NULL, sizeof(temp.address), "\t");
copyToken(temp.phone, NULL, sizeof(temp.phone), "\n");
This is only for demonstration; there are better ways to initialize variables. But to illustrate your main question, i.e. reading a file delimited by tabs, you can write a function something like this.
Assuming a strict field definition and your struct definition, you can get the tokens using strtok():
//for a file with constant field definitions
void GetFileContents(const char *file, Person *person)
{
    char line[260];
    FILE *fp;
    char *buf = 0;
    int i = -1;
    fp = fopen(file, "r");
    if(!fp) return; //could not open the file
    while(fgets(line, sizeof line, fp))
    {
        i++;
        buf = strtok(line, "\t\n");
        if(buf) strcpy(person[i].name, buf);
        buf = strtok(NULL, "\t\n");
        if(buf) strcpy(person[i].address, buf);
        buf = strtok(NULL, "\t\n");
        if(buf) strcpy(person[i].phone, buf);
        //Note: strcpy assumes each field fits its destination array.
        //Note: if you have more fields, add more strtok/strcpy sections
        //Note: This method will ONLY work for consistent number of fields.
        //If variable number of fields, suggest 2 dimensional string array.
    }
    fclose(fp);
}
Call it in main() like this:
int main(void)
{
//...
Person person[NUM_LINES], *pPerson; //NUM_LINES defined elsewhere
//and there are better ways
//this is just for illustration
pPerson = &person[0];//initialize pointer to person
GetFileContents(filename, pPerson); //call function to populate person.
//...
return 0;
}
First thing,
fread(&temp, sizeof(temp), 100, read_file);
will not work because the fields are not fixed width: fread always reads 20 bytes for name, 30 for address and so on, regardless of where the tabs are. (Asking for 100 items also lets it write past the end of the single temp struct.)
You need to read one line at a time and then parse the line. You can use any method you like to read a line; a simple one is fgets(), like this:
char line[100];
Person persons[100];
int index;
index = 0;
while (fgets(line, sizeof(line), read_file) != NULL)
{
persons[index++] = parseLineAndExtractPerson(line);
}
Now we need a function to parse the line and store the data in your Person struct instance:
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
return (pointer != NULL) ? pointer + 1 : NULL; /* NULL once the last field is consumed */
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
Here is a sample implementation of a loop to read at most 100 records
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/path/to/the/file.type", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
/* remove the '\n' left by `fgets()'. */
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Here is a complete program that does what I think you need
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
return (pointer != NULL) ? pointer + 1 : NULL; /* NULL once the last field is consumed */
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/home/iharob/data.dat", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Parsing strings returned by fgets can be very annoying, especially when input is truncated. In fact, fgets leaves a lot to be desired. Did you get the correct string or was there more? Is there a newline at the end? For that matter, is the end 20 bytes away or 32768 bytes away? It would be nice if you didn't need to count that many bytes twice -- once with fgets and once with strlen, just to remove a newline that you didn't want.
Things like fscanf don't necessarily work as intended in this situation unless you use the "scanset" conversion (%[...]), and that conversion automatically adds a null terminator, so you need to leave room for it. The return value of any of the scanf family is your friend in determining whether success or failure occurred.
You can avoid the null terminator by using %NNc, where NN is the width, but if there's a \t in those NN bytes, then you need to separate it and move it to the next field, except that means bytes in the next field must be moved to the field after that one, and the 90th field will need its bytes moved to the 91st field... And hopefully you only need to do that once... Obviously that isn't actually a solution either.
Given those reasons, I feel it's easier just to read until you encounter one of the expected delimiters and let you decide the behavior of the function when the size specified is too small for a null terminator, yet large enough to fill your buffer. Anyway, here's the code. I think it's pretty straightforward:
/*
* Read a token.
*
* tok: The buffer used to store the token.
* max: The maximum number of characters to store in the buffer.
* delims: A string containing the individual delimiter bytes.
* fileptr: The file pointer to read the token from.
*
* Return value:
* - max: The buffer is full. In this case, the string _IS NOT_ null terminated.
* This may or may not be a problem: it's your choice.
* - (size_t)-1: An I/O error occurred before the last delimiter
* (just like with `fgets`, use `feof`).
* - any other value: The length of the token as `strlen` would return.
* In this case, the string _IS_ null terminated.
*/
size_t
read_token(char *restrict tok, size_t max, const char *restrict delims,
FILE *restrict fileptr)
{
int c = 0;
size_t n;
for (n = 0; n < max && (c = fgetc(fileptr)) != EOF &&
strchr(delims, c) == NULL; ++n)
*tok++ = c;
if (c == EOF)
return (size_t)-1;
if (n == max)
return max;
*tok = 0;
return n;
}
Usage is pretty straightforward as well:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strchr, used by read_token */
typedef struct person {
char name[20];
char address[30];
char phone[20];
} Person;
int
main(void)
{
FILE *read_file;
Person temp;
size_t line_num;
size_t len;
int c;
int exit_status = EXIT_SUCCESS;
read_file = fopen("read.txt", "r");
if (read_file == NULL) {
fprintf(stderr, "Error opening read.txt\n");
return 1;
}
for (line_num = 0;; ++line_num) {
/*
* Used for detecting early EOF
* (e.g. the last line contains only a name).
*/
temp.name[0] = temp.phone[0] = 0;
len = read_token(temp.name, sizeof(temp.name), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.name)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = fgetc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
len = read_token(temp.address, sizeof(temp.address), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.address)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = fgetc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
/* the phone number is the last field, so it ends at the newline */
len = read_token(temp.phone, sizeof(temp.phone), "\n",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.phone)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = fgetc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
// Do something with the input here. Example:
printf("Entry %zu:\n"
"\tName: %.*s\n"
"\tAddress: %.*s\n"
"\tPhone: %.*s\n\n",
line_num + 1,
(int)sizeof(temp.name), temp.name,
(int)sizeof(temp.address), temp.address,
(int)sizeof(temp.phone), temp.phone);
}
if (ferror(read_file)) {
fprintf(stderr, "error reading from file\n");
exit_status = EXIT_FAILURE;
}
else if (feof(read_file) && temp.phone[0] == 0 && temp.name[0] != 0) {
fprintf(stderr, "Unexpected end of file while reading entry %zu\n",
line_num + 1);
exit_status = EXIT_FAILURE;
}
//else feof(read_file) is still true, but we parsed a full entry/record
fclose(read_file);
return exit_status;
}
Notice how nearly the same 8 lines of code appear three times in the read loop to handle the return value of read_token? Because of that, there's probably room for another function that calls read_token and handles its return value, allowing main to simply call this "read_token handler". Still, I think the code above gives you the basic idea of how to work with read_token and how it applies to your situation. You might change the behavior in some way, but the read_token function above would suit me rather well when working with delimited input like this (things get a bit more complex when you add quoted fields into the mix, but not much more, as far as I can tell). You can decide what happens when max is returned. I opted to consider it an error, but you might think otherwise. You might even read one extra character when n == max and treat max as a successful return value, with something like (size_t)-2 as the "token too large" error indicator instead.

Read files separated by tab in c

I am really new to C, and the reading files thing drives me crazy...
I want read a file including name, born place and phone number, etc. All separated by tab
The format might be like this:
Bob Jason Los Angeles 33333333
Alice Wong Washington DC 111-333-222
So I create a struct to record it.
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
I tried many ways to read this file into struct but it failed.
I tired fread:
read_file = fopen("read.txt", "r");
Person temp;
fread(&temp, sizeof(Person), 100, read_file);
printf("%s %s %s \n", temp.name, temp.address, temp.phone);
But the strings are not split on tabs when stored into temp; it reads the whole file into temp.name and I get weird output.
Then I tried fscanf and sscanf, but neither of them splits on tabs:
fscanf(read_file, "%s %s %s", temp.name, temp.address, temp.phone);
Or
fscanf(read_file, "%s\t%s\t%s", temp.name, temp.address, temp.phone);
This separates the string by spaces, so I get Bob and Jason separately, when I actually need "Bob Jason" as one string. And I did separate the fields with tabs when I created the text file.
The same goes for sscanf; I tried many different ways...
Please help...
I suggest:
Use fgets to read the text line by line.
Use strtok to separate the contents of the line by using tab as the delimiter.
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
char* token = strtok(line, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
// Copy token at most the number of characters
// temp.name can hold. Similar logic applies to address
// and phone number.
temp.name[0] = '\0';
strncat(temp.name, token, sizeof(temp.name)-1);
}
token = strtok(NULL, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.address[0] = '\0';
strncat(temp.address, token, sizeof(temp.address)-1);
}
token = strtok(NULL, "\n");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.phone[0] = '\0';
strncat(temp.phone, token, sizeof(temp.phone)-1);
}
Update
Using a helper function, the code can be reduced in size. (Thanks @chux)
// The helper function.
void copyToken(char* destination,
char* source,
size_t maxLen,
char const* delimiter)
{
char* token = strtok(source, delimiter);
if ( token != NULL )
{
destination[0] = '\0';
strncat(destination, token, maxLen-1);
}
}
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
copyToken(temp.name, line, sizeof(temp.name), "\t");
copyToken(temp.address, NULL, sizeof(temp.address), "\t");
copyToken(temp.phone, NULL, sizeof(temp.phone), "\n");
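One caveat with the strtok approach above: strtok treats a run of consecutive delimiters as a single delimiter, so an empty field (two tabs in a row) is silently skipped and the later fields shift left. If empty fields can occur in your file, a strchr-based splitter keeps them. The following is only a minimal sketch; split_fields and its parameters are hypothetical names, not part of the answer above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split `line` in place on `delim` and store up to
 * `max_fields` pointers in `fields`. Empty fields are kept as "".
 * Anything after the last stored field is ignored.
 * Returns the number of fields stored.
 */
size_t split_fields(char *line, char delim, char **fields, size_t max_fields)
{
    size_t count = 0;
    char *start = line;

    /* Drop the trailing newline left by fgets, if there is one. */
    line[strcspn(line, "\n")] = '\0';

    while (count < max_fields) {
        char *end = strchr(start, delim);
        fields[count++] = start;
        if (end == NULL)
            break;           /* this was the last field on the line */
        *end = '\0';         /* terminate the current field */
        start = end + 1;     /* the next field begins after the delimiter */
    }
    return count;
}
```

After a call such as split_fields(line, '\t', fields, 3), each fields[i] can be copied into the struct with the same strncat idiom as above, and a return value other than 3 tells you the line was short.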
This is only for demonstration; there are better ways to initialize variables. But to illustrate your main question, i.e. reading a file delimited by tabs, you can write a function something like the one below.
Assuming a strict field definition and your struct definition, you can get the tokens using strtok().
//for a file with constant field definitions
//person must point to an array of at least NUM_LINES elements
void GetFileContents(const char *file, Person *person)
{
char line[260];
FILE *fp;
char *buf=0;
int i = -1;
fp = fopen(file, "r");
if(!fp) return; //could not open the file
while(i + 1 < NUM_LINES && fgets(line, sizeof line, fp))
{
i++;
buf = strtok(line, "\t\n");
//snprintf truncates instead of overflowing the destination array
if(buf) snprintf(person[i].name, sizeof person[i].name, "%s", buf);
buf = strtok(NULL, "\t\n");
if(buf) snprintf(person[i].address, sizeof person[i].address, "%s", buf);
buf = strtok(NULL, "\t\n");
if(buf) snprintf(person[i].phone, sizeof person[i].phone, "%s", buf);
//Note: if you have more fields, add more strtok/snprintf sections
//Note: This method will ONLY work for consistent number of fields.
//If variable number of fields, suggest 2 dimensional string array.
}
fclose(fp);
}
Call it in main() like this:
int main(void)
{
//...
Person person[NUM_LINES], *pPerson; //NUM_LINES defined elsewhere
//and there are better ways
//this is just for illustration
pPerson = &person[0];//initialize pointer to person
GetFileContents(filename, pPerson); //call function to populate person.
//...
return 0;
}
First thing,
fread(&temp, sizeof(temp), 100, read_file);
will not work because the fields are not fixed width: fread copies raw bytes, so it fills the 20 bytes of name, then the 30 of address, and so on, regardless of where the tabs fall. Asking for 100 elements also writes far past the end of the single temp object.
You need to read one line at a time and then parse the line. You can use any method you like to read a line; a simple one is fgets(), like this
char line[100];
Person persons[100];
int index;
index = 0;
while (fgets(line, sizeof(line), read_file) != NULL)
{
persons[index++] = parseLineAndExtractPerson(line);
}
Now we need a function to parse the line and store the data in your Person struct instance
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
/* no more delimiters: return NULL so that later calls stop cleanly */
return (pointer != NULL) ? pointer + 1 : NULL;
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
Here is a sample implementation of a loop to read at most 100 records
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/path/to/the/file.type", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
/* remove the '\n' left by `fgets()'. */
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Here is a complete program that does what I think you need
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
/* no more delimiters: return NULL so that later calls stop cleanly */
return (pointer != NULL) ? pointer + 1 : NULL;
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/home/iharob/data.dat", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
/* remove the '\n' left by `fgets()'. */
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Parsing strings returned by fgets can be very annoying, especially when input is truncated. In fact, fgets leaves a lot to be desired. Did you get the correct string or was there more? Is there a newline at the end? For that matter, is the end 20 bytes away or 32768 bytes away? It would be nice if you didn't need to count that many bytes twice -- once with fgets and once with strlen, just to remove a newline that you didn't want.
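To make the truncation question concrete, here is one way to wrap fgets so the caller learns whether the whole line fit. This is a sketch under assumptions: read_line is a hypothetical name, the buffer is assumed to hold at least 2 bytes, and the input is assumed to contain no embedded NUL bytes. It still scans the buffer a second time (with strcspn), so it does not remove the double counting complained about above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one line into buf (capacity `size`, assumed >= 2)
 * and strip the trailing newline.
 * Returns  1 if a complete line was stored,
 *          0 if the line was too long (the rest of it is discarded),
 *         -1 if nothing could be read (end of file or I/O error).
 */
int read_line(char *buf, size_t size, FILE *fp)
{
    size_t len;
    int c;

    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {      /* newline found: the line fit */
        buf[len] = '\0';
        return 1;
    }
    if (len + 1 < size)          /* buffer not full: last line had no '\n' */
        return 1;

    /* Buffer is full. Peek at the next character to see if more follows. */
    c = getc(fp);
    if (c == EOF || c == '\n')
        return 1;                /* the line fit exactly */

    /* Truncated: skip the remainder so the next call starts on a new line. */
    while ((c = getc(fp)) != EOF && c != '\n')
        ;
    return 0;
}
```

A caller would typically skip or report lines for which this returns 0 instead of parsing a partial record.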
Things like fscanf don't necessarily work as intended in this situation unless you use a scanset conversion such as %19[^\t] with an explicit field width; that conversion always appends a null terminator, so the width must leave room for it. The return value of any of the scanf family is your friend in determining whether success or failure occurred.
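For illustration, this is roughly what the scanset route looks like when applied to a line that has already been read. It is a sketch, not a recommendation: parse_person is a hypothetical name, and the widths 19/29/19 are simply the array sizes from the question minus one for the terminator.

```c
#include <stdio.h>

typedef struct Person {
    char name[20];
    char address[30];
    char phone[20];
} Person;

/*
 * Hypothetical helper: parse "name<TAB>address<TAB>phone" from `line`.
 * Returns 1 if all three fields were converted, 0 otherwise.
 */
int parse_person(const char *line, Person *p)
{
    /*
     * %19[^\t\n] stores up to 19 characters that are neither tab nor
     * newline, then appends the terminator. The "\t" between conversions
     * is a whitespace directive: it skips any amount of whitespace,
     * including none at all.
     */
    int n = sscanf(line, "%19[^\t\n]\t%29[^\t\n]\t%19[^\t\n]",
                   p->name, p->address, p->phone);

    return n == 3;   /* sscanf returns the number of fields it stored */
}
```

The limitations are worth knowing before relying on this: an empty field makes the conversion fail, leading spaces in the second and third fields are swallowed by the whitespace directive, and a field longer than its width spills into the next conversion instead of being reported as an error.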
You can avoid the null terminator by using %NNc, where NN is the width, but if there's a \t in those NN bytes, then you need to separate it and move it to the next field, except that means bytes in the next field must be moved to the field after that one, and the 90th field will need its bytes moved to the 91st field... And hopefully you only need to do that once... Obviously that isn't actually a solution either.
Given those reasons, I feel it's easier just to read until you encounter one of the expected delimiters and let you decide the behavior of the function when the size specified is too small for a null terminator, yet large enough to fill your buffer. Anyway, here's the code. I think it's pretty straightforward:
/*
* Read a token.
*
* tok: The buffer used to store the token.
* max: The maximum number of characters to store in the buffer.
* delims: A string containing the individual delimiter bytes.
* fileptr: The file pointer to read the token from.
*
* Return value:
* - max: The buffer is full. In this case, the string _IS NOT_ null terminated.
* This may or may not be a problem: it's your choice.
* - (size_t)-1: End of file or an I/O error occurred before a delimiter
* was found (just like with `fgets`, use `feof`/`ferror` to tell which).
* - any other value: The length of the token as `strlen` would return.
* In this case, the string _IS_ null terminated.
*/
size_t
read_token(char *restrict tok, size_t max, const char *restrict delims,
FILE *restrict fileptr)
{
int c = 0; /* stays 0 if max == 0, so the EOF test below is well defined */
size_t n;
for (n = 0; n < max && (c = getc(fileptr)) != EOF &&
strchr(delims, c) == NULL; ++n)
*tok++ = c;
if (c == EOF)
return (size_t)-1;
if (n == max)
return max;
*tok = 0;
return n;
}
Usage is pretty straightforward as well:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strchr, used by read_token */
typedef struct person {
char name[20];
char address[30];
char phone[20];
} Person;
int
main(void)
{
FILE *read_file;
Person temp;
size_t line_num;
size_t len;
int c;
int exit_status = EXIT_SUCCESS;
read_file = fopen("read.txt", "r");
if (read_file == NULL) {
fprintf(stderr, "Error opening read.txt\n");
return 1;
}
for (line_num = 0;; ++line_num) {
/*
* Used for detecting early EOF
* (e.g. the last line contains only a name).
*/
temp.name[0] = temp.phone[0] = 0;
len = read_token(temp.name, sizeof(temp.name), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.name)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
len = read_token(temp.address, sizeof(temp.address), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.address)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
/* The phone number is the last field, so it ends at the newline. */
len = read_token(temp.phone, sizeof(temp.phone), "\n",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.phone)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
// Do something with the input here. Example:
printf("Entry %zu:\n"
"\tName: %.*s\n"
"\tAddress: %.*s\n"
"\tPhone: %.*s\n\n",
line_num + 1,
(int)sizeof(temp.name), temp.name,
(int)sizeof(temp.address), temp.address,
(int)sizeof(temp.phone), temp.phone);
}
if (ferror(read_file)) {
fprintf(stderr, "error reading from file\n");
exit_status = EXIT_FAILURE;
}
else if (feof(read_file) && temp.phone[0] == 0 && temp.name[0] != 0) {
fprintf(stderr, "Unexpected end of file while reading entry %zu\n",
line_num + 1);
exit_status = EXIT_FAILURE;
}
//else feof(read_file) is still true, but we parsed a full entry/record
fclose(read_file);
return exit_status;
}
Notice how the exact same 8 lines of code appear in the read loop to handle the return value of read_token? Because of that, I think there's probably room for another function to call read_token and handle its return value, allowing main to simply call this "read_token handler", but I think the code above gives you the basic idea about how to work with read_token and how it can apply in your situation. You might change the behavior in some way, if you like, but the read_token function above would suit me rather well when working with delimited input like this (things would be a bit more complex when you add quoted fields into the mix, but not much more complex as far as I can tell). You can decide what happens with max being returned. I opted for it being considered an error, but you might think otherwise. You might even add an extra getchar when n == max and consider max being a successful return value and something like (size_t)-2 being the "token too large" error indicator instead.
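As a rough idea of what that "read_token handler" could look like, here is a sketch. The names read_field and enum field_status are hypothetical; read_token is repeated (reading from its fileptr argument with getc, and without the restrict qualifiers) only so the example compiles on its own.

```c
#include <stdio.h>
#include <string.h>

/* read_token as described above, reading from fileptr. */
size_t read_token(char *tok, size_t max, const char *delims, FILE *fileptr)
{
    int c = 0;
    size_t n;

    for (n = 0; n < max && (c = getc(fileptr)) != EOF &&
         strchr(delims, c) == NULL; ++n)
        *tok++ = (char)c;
    if (c == EOF)
        return (size_t)-1;
    if (n == max)
        return max;
    *tok = 0;
    return n;
}

/* Hypothetical status codes for the wrapper below. */
enum field_status { FIELD_OK, FIELD_TOO_LONG, FIELD_EOF };

/*
 * Hypothetical wrapper: read one field into buf (capacity `size`).
 * If the field does not fit, the rest of the line is discarded so that
 * the next call starts on a fresh record.
 */
enum field_status read_field(char *buf, size_t size, const char *delims,
                             FILE *fp)
{
    size_t len = read_token(buf, size, delims, fp);
    int c;

    if (len == (size_t)-1)
        return FIELD_EOF;        /* end of file or I/O error */
    if (len == size) {
        /* Buffer full and not terminated: skip to the end of the line. */
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
        return FIELD_TOO_LONG;
    }
    return FIELD_OK;
}
```

With this, the body of the read loop shrinks to three read_field calls, each followed by a switch or if on the returned status (break on FIELD_EOF, report and continue on FIELD_TOO_LONG).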

Resources