Copying a line from a csv file using pointers - c

I'm writing a program that is supposed to search for a name in a CSV file and copy the record (all the info on the same line as the name) that goes along with it.
For example, if CSV file contains:
Bob, 13, 12345612
Eli, 12, 21398743
I would input "Bob" to get the first line, and copy this into an array called "record".
So far my code is as follows:
#include<stdio.h>
#include<stdlib.h>
void FindRecord(char *a, char *b, char c[]);
void main(void){
char arrayName[100];
char arrayNewname[100];
char *name = arrayName;
char *newname = arrayNewname;
char record[1000];
printf("Please input a name in the phonebook: ");
scanf("%s", name);
printf("Please input a replacement name: ");
scanf("%s", newname);
FindRecord("phonebook.csv",name,record);
}
void FindRecord(char *filename, char *name, char record[]){
//Create temp array of max size
char temp[1000];
//Open file
FILE *f = fopen(filename, "r");
//Make sure file exists
if (f == NULL){
printf("File does not exist");
fclose(f);
exit(1);
}
//While
while(!feof(f)){
//Read one line at a time
fgets(temp, 1000, f);
int i = 0;
int *p;
for(int i = 0; i < 1000; i++){
if(temp[i] == *name){
record[i] = temp[i];
name++;
}
size_t n = (sizeof record /sizeof record[0]);
if(temp[i] == *name){
*p = temp[i + n];
}
}
}
printf("%s", record);
fclose(f);
}
Basically, I've found Bob and copied Bob, but I do not understand how to proceed using pointers (I'm not allowed to use string.h) to copy the rest of the line. I've been trying to play around with the length of the word once I've found it, but this isn't working either because of pointers. Any help/hints would be appreciated.

When you are not allowed to use string.h you have to compare strings char by char.
Start by having a loop to find the name.
See: Compare two strings character by character in C
And then copy the rest of the record, either char by char or, where string.h is permitted (memcpy is declared there), with a more advanced function like memcpy:
https://www.techonthenet.com/c_language/standard_library_functions/string_h/memcpy.php
Now, what is int *p?
You never initialized this pointer, so *p = temp[i + n] writes through an indeterminate address. You have to either allocate memory for it with malloc or point it at an existing object.
See for more information:
https://pebble.gitbooks.io/learning-c-with-pebble/content/chapter08.html
I think you should read more about pointers and then things will work much better for you.
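As a rough illustration of the "char by char" advice above, here is a minimal sketch that compares and copies strings without string.h. The names str_equal and str_copy and the dstsz parameter are hypothetical helpers invented for this example, not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the NUL-terminated strings a and b are identical, else 0. */
int str_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *a == *b) {    /* advance while chars match and a has not ended */
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both stopped at '\0' */
}

/* Copy src into dst (capacity dstsz bytes), always NUL-terminating.
 * Returns the number of characters copied, excluding the terminator. */
size_t str_copy(char *dst, size_t dstsz, const char *src)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL && dstsz > 0);
    while (i + 1 < dstsz && src[i]) {   /* leave room for the terminator */
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

With these, str_copy(record, 1000, temp) would copy a matched line into record, truncating rather than overflowing if the line were longer than the destination.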

Not being allowed to use string.h is a good exercise in computing the string lengths manually (necessary to remove the trailing '\n' from the lines read with fgets) and also a good exercise for manual string comparisons.
For example, if you are reading lines into buf, you can use a simple for loop to get the length of buf, e.g.
int blen = 0;
for (; buf[blen]; blen++) {} /* get line length */
(note: you find the length of name, say nlen in a similar manner)
Then having the length in blen, you can easily check that the final character in buf is the '\n' character and remove it by overwriting the newline with the nul-terminating character, e.g.
if (blen && buf[blen - 1] == '\n') /* check/remove '\n' */
buf[--blen] = 0; /* overwrite with '\0' */
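The two steps above (measure the line, then drop the trailing newline) can be wrapped in one small function. This is only a sketch; the name chomp is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Remove one trailing '\n' from buf, if present, without using string.h.
 * Returns the resulting length of buf. */
size_t chomp(char *buf)
{
    size_t len = 0;
    assert(buf != NULL);
    while (buf[len])                    /* get line length */
        len++;
    if (len && buf[len - 1] == '\n')    /* check/remove '\n' */
        buf[--len] = '\0';
    return len;
}
```

Calling blen = chomp(buf) right after a successful fgets gives both the cleaned line and its length in one step.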
The remainder of your findrecord function is just a matter of iterating forward over each character looking for the character that is the first character in name. Once found, you simply compare the next nlen characters to see if you have found name in buf. You can easily do that with:
char *np = name, /* pointer to name */
*bp = p; /* current pointer in buf */
...
for (i = 0; /* compare name in buf */
i < nlen && *np && *bp && *np == *bp;
i++, np++, bp++) {}
/* validate nlen chars match in buf */
if (np - name == nlen && *(np-1) == *(bp-1)) {
Once you have validated that you found name in buf, simply copy buf to record, ensuring you nul-terminate record when done copying buf, e.g.
if (np - name == nlen && *(np-1) == *(bp-1)) {
bp = buf;
for (i = 0; buf[i]; i++) /* copy buf to record */
record[i] = buf[i];
record[i] = buf[i]; /* nul-terminate */
return record; /* return record */
}
Putting it all together, you could do something similar to the following:
#include <stdio.h>
#include <stdlib.h>
enum { MAXC = 100, MAXL = 1000 }; /* if you need constants, define them */
char *findrecord (FILE *fp, char *name, char *record);
/* main is type 'int', and has arguments -- use them */
int main (int argc, char **argv) {
char name[MAXC] = "",
replace[MAXC] = "",
record[MAXL] = "",
*matched;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : NULL;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argc > 1 ? argv[1] : "(no file given)");
return 1;
}
/* prompt, read, validate name */
printf ("Please input a name in the phonebook: ");
if (scanf ("%99[^\n]%*c", name) != 1) {
fprintf (stderr, "error: invalid input - name.\n");
return 1;
}
/* prompt, read, validate replace */
printf("Please input a replacement name: ");
if (scanf ("%99[^\n]%*c", replace) != 1) {
fprintf (stderr, "error: invalid input - replace.\n");
return 1;
}
/* search name, copy record, return indicates success/failure */
matched = findrecord (fp, name, record);
if (fp != stdin) fclose (fp); /* close file if not stdin */
if (matched) { /* if name matched */
printf ("record : '%s'\n", record);
}
return 0; /* main() returns a value */
}
char *findrecord (FILE *fp, char *name, char *record){
char buf[MAXL] = ""; /* buf for line */
while (fgets (buf, MAXL, fp)) { /* for each line */
char *p = buf;
int blen = 0;
for (; buf[blen]; blen++) {} /* get line length */
if (blen && buf[blen - 1] == '\n') /* check/remove '\n' */
buf[--blen] = 0; /* overwrite with '\0' */
for (; *p && *p != '\n'; p++) /* for each char in line */
if (*p == *name) { /* match start of name? */
char *np = name, /* pointer to name */
*bp = p; /* current pointer in buf */
int i = 0, /* general 'i' var */
nlen = 0; /* name length var */
for (nlen = 0; name[nlen]; nlen++) {} /* name length */
for (i = 0; /* compare name in buf */
i < nlen && *np && *bp && *np == *bp;
i++, np++, bp++) {}
/* validate nlen chars match in buf */
if (np - name == nlen && *(np-1) == *(bp-1)) {
bp = buf;
for (i = 0; buf[i]; i++) /* copy buf to record */
record[i] = buf[i];
record[i] = buf[i]; /* nul-terminate */
return record; /* return record */
}
}
}
return NULL; /* indicate no match in file */
}
Example Use/Output
$ ./bin/findrec dat/bob.csv
Please input a name in the phonebook: Bob
Please input a replacement name: Sam
record : 'Bob, 13, 12345612'
Non-Match Example
$ ./bin/findrec dat/bob.csv
Please input a name in the phonebook: Jerry
Please input a replacement name: Bob
Look things over and let me know if you have further questions.
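One thing to be aware of: the search above matches name anywhere in the line, so "Bob" would also match a record for "Bobby" or a line where "Bob" appears in a later field. If the name is always the first comma-separated field (an assumption based on the sample data in the question), a stricter check could look like the sketch below. The function name first_field_is is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the first comma-separated field of line equals name exactly,
 * else 0. Uses no string.h functions. */
int first_field_is(const char *line, const char *name)
{
    assert(line != NULL && name != NULL);
    while (*name && *line == *name) {   /* walk both while they agree */
        line++;
        name++;
    }
    /* match only if all of name was consumed and the field ends here */
    return *name == '\0' &&
           (*line == ',' || *line == '\0' || *line == '\n');
}
```

Inside the fgets loop, a call such as first_field_is (buf, name) could replace the inner character scan when only whole-name matches in the first column are wanted.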

Related

removing trailing and leading spaces from a file

I am trying to read lines from a text file of unknown length.
In the line there can be leading and trailing white-spaces until the string occurs.
So my first step is to read line by line and allocate memory for the strings. Then remove all the leading and trailing white spaces.
After that I want to check if the string has any white space characters in it which is an invalid character. For example the string can not look like this "bad string" but can look like this "goodstring".
However when I call the function to remove the leading and trailing white spaces it also removes characters before or after a white space.
Could someone tell me what I am doing wrong?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define NCHAR 64
char *readline (FILE *fp, char **buffer);
char *strstrip(char *s);
int main (int argc, char **argv) {
char *line = NULL;
size_t idx = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) {
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while (readline (fp, &line)) { /* read each line in 'fp' */
printf (" line[%2zu] : %s\n", idx++, line);
free (line);
line = NULL;
}
if (fp != stdin) fclose (fp);
return 0;
}
/* read line from 'fp' allocate *buffer NCHAR in size
* realloc as necessary. Returns a pointer to *buffer
* on success, NULL otherwise.
*/
char *readline (FILE *fp, char **buffer)
{
int ch;
size_t buflen = 0, nchar = NCHAR;
size_t n;
char *invalid_character = " ";
*buffer = malloc (nchar); /* allocate buffer nchar in length */
if (!*buffer) {
fprintf (stderr, "readline() error: virtual memory exhausted.\n");
return NULL;
}
while ((ch = fgetc(fp)) != '\n' && ch != EOF)
{
(*buffer)[buflen++] = ch;
if (buflen + 1 >= nchar) { /* realloc */
char *tmp = realloc (*buffer, nchar * 2);
if (!tmp) {
fprintf (stderr, "error: realloc failed, "
"returning partial buffer.\n");
(*buffer)[buflen] = 0;
return *buffer;
}
*buffer = tmp;
nchar *= 2;
}
strstrip(*buffer); //remove traiing/leading spaces
}
(*buffer)[buflen] = 0; /* nul-terminate */
if (invalid_character[n = strspn(invalid_character, *buffer)] == '\0') //check if a string has invalid character ' ' in it
{
puts(" invalid characters");
}
if (buflen == 0 && ch == EOF) { /* return NULL if nothing read */
free (*buffer);
*buffer = NULL;
}
return *buffer;
}
char *strstrip(char *s)
{
size_t size;
char *end;
size = strlen(s);
if (!size)
return s;
end = s + size - 1;
while (end >= s && isspace(*end))
end--;
*(end + 1) = '\0';
while (*s && isspace(*s))
s++;
return s;
}
You do not need to worry about the length of the string passed to strstrip(), simply iterate over all characters in the string removing whitespace characters, e.g. the following version removes ALL whitespace from s:
/** remove ALL leading, interleaved and trailing whitespace, in place.
* the original start address is preserved but due to reindexing,
* the contents of the original are not preserved. returns pointer
* to 's'. (ctype.h required)
*/
char *strstrip (char *s)
{
if (!s) return NULL; /* validate string not NULL */
if (!*s) return s; /* handle empty string */
char *p = s, *wp = s; /* pointer and write-pointer */
while (*p) { /* loop over each character */
while (isspace ((unsigned char)*p)) /* if whitespace advance ptr */
p++;
*wp++ = *p; /* use non-ws char */
if (*p)
p++;
}
*wp = 0; /* nul-terminate */
return s;
}
(note: if the argument to isspace() is type char, a cast to unsigned char is required, see NOTES Section, e.g. man 3 isalpha)
Removing only Excess Whitespace
The following version removes leading and trailing whitespace and collapses multiple sequences of whitespace to a single space:
/** remove excess leading, interleaved and trailing whitespace, in place.
* the original start address is preserved but due to reindexing,
* the contents of the original are not preserved. returns pointer
* to 's'. (ctype.h required) NOTE: LATEST
*/
char *strstrip (char *s)
{
if (!s) return NULL; /* validate string not NULL */
if (!*s) return s; /* handle empty string */
char *p = s, *wp = s; /* pointer and write-pointer */
while (*p) {
if (isspace((unsigned char)*p)) { /* test for ws */
if (wp > s) /* ignore leading ws, while */
*wp++ = *p; /* preserving 1 between words */
while (*p && isspace ((unsigned char)*p)) /* skip remainder */
p++;
if (!*p) /* bail on end-of-string */
break;
}
if (*p == '.') /* handle space between word and '.' */
while (wp > s && isspace ((unsigned char)*(wp - 1)))
wp--;
*wp++ = *p; /* use non-ws char */
p++;
}
while (wp > s && isspace ((unsigned char)*(wp - 1))) /* trim trailing ws */
wp--;
*wp = 0; /* nul-terminate */
return s;
}
(note: s must be mutable and therefore cannot be a string-literal)
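For the behavior the question actually asks for (trim only the ends, then reject strings with whitespace in the middle), something along the following lines may be closer. Note that in the question's code strstrip() is called inside the read loop on a buffer that is not yet nul-terminated, and its return value is discarded; the sketch assumes it is called once, after the line has been terminated. The names trim and has_space are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns s. */
char *trim(char *s)
{
    char *start = s, *end;
    assert(s != NULL);
    while (isspace((unsigned char)*start))      /* skip leading whitespace */
        start++;
    end = start + strlen(start);
    while (end > start && isspace((unsigned char)end[-1]))
        end--;                                  /* back over trailing whitespace */
    *end = '\0';
    if (start != s)                             /* shift left, including '\0' */
        memmove(s, start, (size_t)(end - start) + 1);
    return s;
}

/* Return 1 if s contains any whitespace character, else 0. */
int has_space(const char *s)
{
    assert(s != NULL);
    for (; *s; s++)
        if (isspace((unsigned char)*s))
            return 1;
    return 0;
}
```

In readline(), this would mean calling trim (*buffer) after the nul-terminating assignment and then reporting the line as invalid when has_space (*buffer) returns 1.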

Reading words separately from file

I'm trying to make a program that scans a file containing words line by line and removes words that are spelled the same if you read them backwards (palindromes)
This is the program.c file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "header.h"
int main(int argc, char **argv)
{
if(argc != 3)
{
printf("Wrong parameters");
return 0;
}
FILE *data;
FILE *result;
char *StringFromFile = (char*)malloc(255);
char *word = (char*)malloc(255);
const char *dat = argv[1];
const char *res = argv[2];
data = fopen(dat, "r");
result =fopen(res, "w");
while(fgets(StringFromFile, 255, data))
{
function1(StringFromFile, word);
fputs(StringFromFile, result);
}
free(StringFromFile);
free (word);
fclose(data);
fclose(result);
return 0;
}
This is the header.h file:
#ifndef HEADER_H_INCLUDED
#define HEADER_H_INCLUDED
void function1(char *StringFromFile, char *word);
void moving(char *StringFromFile, int *index, int StringLength, int WordLength);
#endif
This is the function file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "header.h"
void function1(char *StringFromFile, char *word)
{
int StringLength = strlen(StringFromFile);
int WordLength;
int i;
int p;
int k;
int t;
int m;
int match;
for(i = 0; i < StringLength; i++)
{ k=0;
t=0;
m=i;
if (StringFromFile[i] != ' ')
{ while (StringFromFile[i] != ' ')
{
word[k]=StringFromFile[i];
k=k+1;
i=i+1;
}
//printf("%s\n", word);
WordLength = strlen(word)-1;
p = WordLength-1;
match=0;
while (t <= p)
{
if (word[t] == word[p])
{
match=match+1;
}
t=t+1;
p=p-1;
}
if ((match*2) >= (WordLength))
{
moving(StringFromFile, &m, StringLength, WordLength);
}
}
}
}
void moving(char *StringFromFile, int *index, int StringLength, int WordLength)
{ int i;
int q=WordLength-1;
for(i = *index; i < StringLength; i++)
{
StringFromFile[i-1] = StringFromFile[i+q];
}
*(index) = *(index)-1;
}
It doesn't read each word correctly, though.
This is the data file:
abcba rttt plllp
aaaaaaaaaaaa
ababa
abbbba
kede
These are the separate words the program reads:
abcba
rttta
plllp
aaaaaaaaaaaa
ababa
abbbba
kede
This is the result file:
abcba rtttp
kede
It works fine if there is only one word in a single line, but it messes up when there are multiple words. Any help is appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "header.h"
# define MAX 255
int Find_Number_Words_in_Line( char str[MAX] )
{
char *ptr;
int count = 0;
int j;
/* advance character pointer ptr until end of str[MAX] */
/* everytime you see the space character, increase count */
/* might not always work, you'll need to handle multiple space characters before/between/after words */
ptr = str;
for ( j = 0; j < MAX; j++ )
{
if ( *ptr == ' ' )
count++;
else if (( *ptr == '\0' ) || ( *ptr == '\n' ))
break;
ptr++;
}
return count;
}
void Extract_Word_From_Line_Based_on_Position( char line[MAX], char word[MAX], const int position )
{
char *ptr;
/* move pointer down line[], counting past the number of spaces specified by position */
/* then copy the next word from line[] into word[] */
}
int Is_Palindrome ( char str[MAX] )
{
/* check if str[] is a palindrome, if so return 1, else return 0 */
}
int main(int argc, char **argv)
{
FILE *data_file;
FILE *result_file;
char *line_from_data_file = (char*)malloc(MAX);
char *word = (char*)malloc(MAX);
const char *dat = argv[1];
const char *res = argv[2];
int j, n;
if (argc != 3)
{
printf("Wrong parameters");
return 0;
}
data_file = fopen(dat, "r");
result_file = fopen(res, "w");
fgets( line_from_data_file, MAX, data_file );
while ( ! feof( data_file ) )
{
/*
fgets returns everything up to newline character from data_file,
function1 in original context would only run once for each line read
from data_file, so you would only get the first word
function1( line_from_data_file, word );
fputs( word, result_file );
fgets( line_from_data_file, MAX, data_file );
instead try below, you will need to write the code for these new functions
don't be afraid to name functions in basic English for what they are meant to do
make your code more easily readable
*/
n = Find_Number_Words_in_Line( line_from_data_file );
for ( j = 0; j < n; j++ )
{
Extract_Word_From_Line_Based_on_Position( line_from_data_file, word, j ); /* j-th word, not n */
if ( Is_Palindrome( word ) )
fputs( word, result_file ); /* this will put one palindrome per line in result file */
}
fgets( line_from_data_file, MAX, data_file );
}
free( line_from_data_file );
free( word );
fclose( data_file );
fclose( result_file );
return 0;
}
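The two stub functions in the outline above are left for the reader to write. One possible way to fill them in is sketched here; the shorter names is_palindrome and extract_word, the extra wordsz parameter, and the choice of a 0-based position are all assumptions made for this example rather than part of the original outline.

```c
#include <assert.h>
#include <string.h>

static int is_sep(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Return 1 if str reads the same forwards and backwards, else 0.
 * An empty string is treated as "not a word" and returns 0. */
int is_palindrome(const char *str)
{
    size_t i, j;
    assert(str != NULL);
    j = strlen(str);
    if (j == 0)
        return 0;
    for (i = 0, j--; i < j; i++, j--)   /* work from both ends to the middle */
        if (str[i] != str[j])
            return 0;
    return 1;
}

/* Copy the word at 0-based index position from line into word
 * (capacity wordsz). Returns 1 if such a word exists, else 0. */
int extract_word(const char *line, char *word, size_t wordsz, int position)
{
    size_t n = 0;
    assert(line != NULL && word != NULL && wordsz > 0);
    for (;;) {
        while (is_sep(*line))           /* skip separators before a word */
            line++;
        if (*line == '\0')
            break;                      /* ran out of words */
        if (position-- == 0) {          /* this is the requested word */
            while (*line && !is_sep(*line) && n + 1 < wordsz)
                word[n++] = *line++;
            word[n] = '\0';
            return 1;
        }
        while (*line && !is_sep(*line)) /* skip a word we do not want */
            line++;
    }
    word[0] = '\0';
    return 0;
}
```

Because extract_word reports whether a word was found, the caller could also loop on its return value instead of counting spaces first, which sidesteps the multiple-space problem mentioned in the comments.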
To follow up from the comments, you may be overthinking the problem a bit. To check whether each word in each line of a file is a palindrome, you have a 2 part problem. (1) reading each line (fgets is fine), and (2) breaking each line into individual words (tokens) so that you can test whether each token is a palindrome.
When reading each line with fgets, a simple while loop conditioned on the return of fgets will do. e.g., with a buffer buf of sufficient size (MAXC chars), and FILE * stream fp open for reading, you can do:
while (fgets (buf, MAXC, fp)) { /* read each line */
... /* process line */
}
(you can test that the line read into buf ends with '\n', or that its length is less than MAXC - 1 chars, to ensure you read the complete line; if not, any unread chars will be placed in buf on the next loop iteration. This check, and how you want to handle it, is left for you.)
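A minimal sketch of that completeness check follows. The function name read_line and its return convention are assumptions chosen for this example; discarding the rest of an over-long line is just one of several reasonable policies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line from fp into buf (capacity size), stripping the '\n'.
 * Returns  1 if a complete line was read,
 *          0 if the line was too long (the remainder is discarded),
 *         -1 on end-of-file or read error. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int ch;
    assert(fp != NULL && buf != NULL && size > 1);
    if (!fgets(buf, (int)size, fp))
        return -1;
    len = strlen(buf);
    if (len && buf[len - 1] == '\n') {  /* newline seen: complete line */
        buf[len - 1] = '\0';
        return 1;
    }
    if (len + 1 < size)                 /* buffer not full: last line of file */
        return 1;
    ch = fgetc(fp);                     /* buffer full: peek at what follows */
    if (ch == EOF || ch == '\n')
        return 1;                       /* line exactly filled the buffer */
    while (ch != '\n' && ch != EOF)     /* discard the rest of the long line */
        ch = fgetc(fp);
    return 0;
}
```

A caller could then loop with while ((rc = read_line (fp, buf, MAXC)) >= 0) and warn or skip whenever rc is 0.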
Once you have your line read, you can either use a simple pair of pointers (start and end pointers) to work your way through buf, or you can use strtok and let it return a pointer to the beginning of each word in the line based on the set of delimiters you pass to it. For example, to split a line into words, you probably want to use delimiters like " \t\n.,:;!?" to ensure you get words alone and not words with punctuation (e.g. in the line "sit here.", you want "sit" and "here", not "here.")
Using strtok is straightforward. On the first call, you pass the name of the buffer holding the string to be tokenized and a pointer to the string containing the delimiters (e.g. strtok (buf, delims) above), then for each subsequent call (until the end of the line is reached) you pass NULL in place of the buffer (e.g. strtok (NULL, delims)). You can either call it once and then loop until NULL is returned, or you can do it all using a single for loop given that for allows setting an initial condition as part of the statement, e.g., using separate calls:
char *delims = " \t\n.,:;"; /* delimiters */
char *p = strtok (buf, delims); /* first call to strtok */
while (p) { /* loop until no tokens remain */
... /* check for palindrome */
p = strtok (NULL, delims); /* all subsequent calls */
}
Or you can simply make the initial call and all subsequent calls in a for loop:
/* same thing in a single 'for' statement */
for (p = strtok (buf, delims); p; p = strtok (NULL, delims)) {
... /* check for palindrome */
}
Now you are at the point where you need to check for palindromes. That is a fairly easy process. Find the length of the token, then either using string indexes, or simply using a pointer to the first and last character, work from the ends to the middle of each token making sure the characters match. On the first mismatch, you know the token is not a palindrome. I find a start and end pointer just as easy as manipulating string indexes, e.g. with the token in s:
char *ispalindrome (char *s) /* function to check palindrome */
{
char *p = s, /* start pointer */
*ep = s + strlen (s) - 1; /* end pointer */
for ( ; p < ep; p++, ep--) /* work from end to middle */
if (*p != *ep) /* if chars !=, not palindrome */
return NULL;
return s;
}
If you put all the pieces together, you can do something like the following:
#include <stdio.h>
#include <string.h>
enum { MAXC = 256 }; /* max chars for line buffer */
char *ispalindrome (char *s);
int main (int argc, char **argv) {
char buf[MAXC] = "", /* line buffer */
*delims = " \t\n.,:;"; /* delimiters */
unsigned ndx = 0; /* line index */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
while (fgets (buf, MAXC, fp)) { /* read each line */
char *p = buf; /* pointer to pass to strtok */
printf ("\n line[%2u]: %s\n tokens:\n", ndx++, buf);
for (p = strtok (buf, delims); p; p = strtok (NULL, delims))
if (ispalindrome (p))
printf (" %-16s - palindrome\n", p);
else
printf (" %-16s - not palindrome\n", p);
}
if (fp != stdin) fclose (fp);
return 0;
}
char *ispalindrome (char *s) /* function to check palindrome */
{
char *p = s, *ep = s + strlen (s) - 1; /* ptr & end-ptr */
for ( ; p < ep; p++, ep--) /* work from end to middle */
if (*p != *ep) /* if chars !=, not palindrome */
return NULL;
return s;
}
Example Input
$ cat dat/palins.txt
abcba rttt plllp
aaaaaaaaaaaa
ababa
abbbba
kede
Example Use/Output
$ ./bin/palindrome <dat/palins.txt
line[ 0]: abcba rttt plllp
tokens:
abcba - palindrome
rttt - not palindrome
plllp - palindrome
line[ 1]: aaaaaaaaaaaa
tokens:
aaaaaaaaaaaa - palindrome
line[ 2]: ababa
tokens:
ababa - palindrome
line[ 3]: abbbba
tokens:
abbbba - palindrome
line[ 4]: kede
tokens:
kede - not palindrome
Look things over and think about what is taking place. As mentioned above, you should validate that each call to fgets read a complete line; that is left to you (with this input file, of course, it will). If you have any questions, let me know and I'll be happy to help further.
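The program above only labels each token. Since the original goal was to remove palindromes and write what remains, here is a hedged sketch of that last step. The function name drop_palindromes, the static helper is_palin, and the choice to rejoin kept words with single spaces are assumptions for this example. It uses strspn and strcspn instead of strtok so the input line is left unmodified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the len characters starting at s form a palindrome. */
static int is_palin(const char *s, size_t len)
{
    size_t i = 0, j = len;
    while (i + 1 < j) {                 /* compare s[i] with s[j-1] */
        if (s[i] != s[j - 1])
            return 0;
        i++;
        j--;
    }
    return 1;
}

/* Copy the words of line that are NOT palindromes into out (capacity
 * outsz), separated by single spaces. Returns the number of words kept. */
size_t drop_palindromes(const char *line, char *out, size_t outsz)
{
    const char *delims = " \t\n";
    size_t used = 0, kept = 0;
    assert(line != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    while (*line) {
        size_t len;
        line += strspn(line, delims);   /* skip separators */
        len = strcspn(line, delims);    /* length of the next word */
        if (len == 0)
            break;
        if (!is_palin(line, len)) {
            size_t need = len + (kept ? 1 : 0);
            if (used + need + 1 > outsz)
                break;                  /* out is full; stop copying */
            if (kept)
                out[used++] = ' ';
            memcpy(out + used, line, len);
            used += len;
            out[used] = '\0';
            kept++;
        }
        line += len;
    }
    return kept;
}
```

In the read loop, the result could be written with fprintf (result, "%s\n", out) whenever the returned count is non-zero.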

Verify file format first and then retrieve required information in C

<School>
</SchoolName>latha2 //skip, but keep
</School>
<Student>
<Team>power //skip,but keep
<StudentNo>1 //skip
<Sport>
<StartDate>16122016</StartDate> //*skip(May or maynot contained)
<SportType>All
<ExpiryDate>16122020</EndDate> //*skip (May or maynot contained)
</Sport>
<Personal>
<phone>50855466 //skip,but keep
<rollno>6 //skip,but keep
</Personal>
<hobby> //skip
</Student>
Note: There are 4 <Student> tags.
Assume that File1 is fixed and File2 is input-file.
In File 1,one school with 4 students. In File 2,there are many schools but have to check with File-1 format repeatedly according to the number of schools it has. Above is an example of File1.
"Scenario"
- There are 4 package of "Student" tags in one school. In each tags, the value of "Team" are repeated.
"Questions with restrictions"
From File 1,"Sport" Tag, "StartDate" and "ExpiryDate" are defined but they may not be contained in every "School" from File2.
If they are defined, how to verify that they should be at the correct line?.
How to verify that format is right even they are not defined in some schools of File2?
Some lines are skipped when the 2 files are compared, but some of those skipped lines still need to be collected from File2 to write a new text file. From File2, "SchoolName", "Team", "phone" and "rollno" are retrieved and written to the text file line by line.
****Important, retrieve "Team" once from one "School". Because it is repeated 4 times in four "Student" from same "School".
How to retrieve only SchoolName,Team,Phone,RollNo among the skipped lines?
How to retrieve only Team in writing new textfile even it is duplicated in students under one school?
Two things to be done. 1. Match File Format 2.New Text with specific values
"Example of new text"
latha2 // SchoolName
power // Team
5035546 // phone - student1
6 // rollno - student1
5089973 // phone - student2
5 // rollno - student2
5402734 // phone - student3
1 // rollno - student3
8540345 // phone - student4
2 // rollno - student4
Compared to the earlier question, this one isn't about comparing files for differences with certain exceptions. Here you have a format file that provides the valid tags and order, and then a data file that contains the tags with data. So rather than comparing for differences, you are reading the first to obtain the expected/valid tags, then reading/processing the second to obtain the wanted information.
Below, I also have the code check that the tags in the file appear in the correct order. You can loosen that restriction if you don't need it. Another bit of logic skips lines of less than 3 chars (a valid tag has at least 3, e.g. <t>).
The formatting of the output is very simple and you can improve it as needed. I had no data file to work with, so I used the information you provided and created one by duplicating your file above 3 times in a separate file. Look over the code. As others have mentioned, parsing XML in C, while a great assignment, is rarely done by hand in practice because other tools are readily available for handling the schemas. Let me know if you have any questions. This will provide you with one approach to handling this type of information:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXL 128
char *gettag (char *s, char *t);
int main (int argc, char **argv) {
if (argc < 3 ) {
fprintf (stderr, "error: insufficient input, usage: %s file1 file2\n",
argv[0]);
return 1;
}
char s1[MAXL] = {0}; /* line buffer */
char *tags[MAXL] = {0}; /* file 1 tags */
char *retr[] = { "<Team>", "<phone>", /* skip/print terms */
"<rollno>" };
char *retr1[] = { "</SchoolName>", /* skip/print once */
"<Team>" };
char *skip[] = { "<StudentNo>","<hobby>" }; /* skip terms */
char *opt[] = { "<StartDate>", /* optional tags */
"<ExpiryDate>"};
size_t retrsz = sizeof retr/sizeof *retr; /* elements in retr */
size_t retr1sz = sizeof retr1/sizeof *retr1;/* elements in retr1*/
size_t skipsz = sizeof skip/sizeof *skip; /* elements in skip */
size_t optsz = sizeof opt/sizeof *opt; /* elements in opt */
size_t tidx = 0; /* tags index */
size_t idx = 0; /* general index */
size_t i = 0; /* general variable */
FILE *f1, *f2; /* file pointers */
unsigned char retvd[retr1sz]; /* retr1 flag */
unsigned char tagok = 0; /* tag OK flag */
/* initialize retr1 VLA values */
for (i = 0; i < retr1sz; i++)
retvd[i] = 0;
/* open both files or exit */
if (!((f1 = fopen (argv[1], "r")) && (f2 = fopen (argv[2], "r")))) {
fprintf (stderr, "error: file open failure.\n");
return 1;
}
/* read lines from format file1, create tags array */
while (fgets (s1, MAXL, f1))
{
size_t len = strlen (s1);
while (len && (s1[len-1] == '\n' || s1[len-1] == '\r'))
s1[--len] = 0; /* strip newline or carriage return */
if (len < 3) /* skip blank, 3 char for valid tag */
continue;
char *tmp = NULL;
if ((tmp = gettag (s1, NULL)) == NULL) {
fprintf (stderr, "error: tag not found in '%s'\n", s1);
return 1;
}
tags[tidx++] = tmp;
}
fclose (f1); /* close file1 */
/* read each line in file2 */
while (fgets (s1, MAXL, f2))
{
char tag[MAXL] = {0};
size_t len = strlen (s1);
while (len && (s1[len-1] == '\n' || s1[len-1] == '\r'))
s1[--len] = 0; /* strip newline or carriage return */
if (len < 3) /* skip blank or lines < 3 chars */
goto skipping;
gettag (s1, tag);
/* verify that current tag is a valid tag from format file */
if (strncmp (tag, tags[idx], strlen (tags[idx])) != 0) {
tagok = 0;
for (i = 0; i < tidx; i++) {
if (strncmp (tag, tags[i], strlen (tags[i])) == 0) {
tagok = 1;
break;
}
}
if (!tagok) {
fprintf (stderr, "warning: invalid tag '%s', skipping.\n", tag);
goto skipping; /* or handle as desired (e.g. exit) */
}
}
/* check if tag is retr1 and not retvd, if so skip/print */
for (i = 0; i < retr1sz; i++)
if (strncmp (tag, retr1[i], strlen (retr1[i])) == 0) {
if (!retvd[i]) { /* print line skipped */
char *p = strchr (s1, '>'); /* print data */
printf ("%s\n", (p + 1));
retvd[i] = 1; /* set flag to skip next */
}
goto incriment; /* yes -- it lives.... */
}
/* check if tag is a known retr tag, if so skip/print */
for (i = 0; i < retrsz; i++) /* print if matches retr[i] */
if (strncmp (tag, retr[i], strlen (retr[i])) == 0) {
char *p = strchr (s1, '>');
printf ("%s\n", (p + 1)); /* print data */
goto incriment;
}
/* check if tag is a known skip tag, if so skip */
for (i = 0; i < skipsz; i++) /* skip if matches skip[i] */
if (strncmp (tag, skip[i], strlen (skip[i])) == 0)
goto incriment;
/* check if tag matches optional tag, if so skip */
for (i = 0; i < optsz; i++) {
if (strncmp (tag, opt[i], strlen (opt[i])) == 0)
goto incriment;
}
incriment:;
idx++; /* increment index */
if (idx == tidx) /* reset if tagsz */
idx = 0;
skipping:;
}
fclose (f2); /* close file2 */
for (i = 0; i < tidx; i++) /* free tags memory */
free (tags[i]);
return 0;
}
/* extract <tag> from s.
* if 't' is NULL, memory is allocated sufficient to hold <tag> + 1
* characters, else <tag> is copied to 't' without allocations.
* On success, the address of 't' is returned, NULL otherwise
*/
char *gettag (char *s, char *t)
{
if (!s) return NULL; /* test valid string */
char *p = strchr (s, '>'); /* find first '>' in s */
if (!p) return NULL; /* if no '>', return NULL */
size_t len = strlen (s);
unsigned char nt = 0;
int tmpc = 0;
if (len > (size_t)(p - s) + 1) {/* if chars after '>' */
tmpc = *(p + 1); /* save char that follows '>' */
*(p + 1) = 0; /* null-terminate at '>' */
nt = 1; /* set null-terminated flag */
}
char *sp = s;
while (sp < p && *sp != '<') /* trim space before '<' */
sp++;
if (!t)
t = strdup (sp); /* allocate/copy to t */
else
strncpy (t, sp, len + 1); /* copy w/terminator */
if (nt) /* if null-terminated */
*(p + 1) = tmpc; /* restore saved character */
return t;
}
File1 - format (list tags)
$ cat dat/student_format.txt
<School>
</SchoolName>latha2 //skip, but keep
</School>
<Student>
<Team>power //skip,but keep
<StudentNo>1 //skip
<Sport>
<StartDate>16122016</StartDate> //*skip(May or maynot contained)
<SportType>All
<ExpiryDate>16122020</EndDate> //*skip (May or maynot contained)
</Sport>
<Personal>
<phone>50855466 //skip,but keep
<rollno>6 //skip,but keep
</Personal>
<hobby> //skip
</Student>
File2 - data file (same as above 3 times)
$ cat dat/student_file.txt
<School>
</SchoolName>latha2
</School>
<Student>
<Team>power
<StudentNo>1
<Sport>
<StartDate>16122016</StartDate>
<SportType>All
<ExpiryDate>16122020</EndDate>
</Sport>
<Personal>
<phone>50855466
<rollno>6
</Personal>
<hobby>
</Student>
<School>
</SchoolName>latha2
</School>
<Student>
<Team>power
<StudentNo>1
<Sport>
<StartDate>16122016</StartDate>
<SportType>All
<ExpiryDate>16122020</EndDate>
</Sport>
<Personal>
<phone>50855466
<rollno>6
</Personal>
<hobby>
</Student>
<School>
</SchoolName>latha2
</School>
<Student>
<Team>power
<StudentNo>1
<Sport>
<StartDate>16122016</StartDate>
<SportType>All
<ExpiryDate>16122020</EndDate>
</Sport>
<Personal>
<phone>50855466
<rollno>6
</Personal>
<hobby>
</Student>
Example Output
$ ./bin/cmpf1f2_2 dat/student_format.txt dat/student_file.txt
latha2
power
50855466
6
50855466
6
50855466
6
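The program prints everything after the first '>' by passing p + 1 to printf directly, which assumes strchr always finds a '>' and that nothing follows the value. A slightly more defensive way to pull out the value is sketched below; the name tag_value and its stop-at-'<' behavior are assumptions made for this example, not something the program above does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the text between the first '>' in line and the next '<' (or the end
 * of the line) into out (capacity outsz). Returns 1 on success, 0 if line
 * contains no '>'. */
int tag_value(const char *line, char *out, size_t outsz)
{
    const char *p;
    size_t len;
    assert(line != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    p = strchr(line, '>');              /* end of the opening tag */
    if (!p)
        return 0;
    p++;
    len = strcspn(p, "<\r\n");          /* stop at a closing tag or line end */
    if (len >= outsz)
        len = outsz - 1;                /* truncate rather than overflow */
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

With this helper a line such as <StartDate>16122016</StartDate> yields just 16122016, and a line without any tag is reported instead of dereferencing a null pointer.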

c read block of lines and store them [duplicate]

I am really new to C, and the reading files thing drives me crazy...
I want to read a file containing a name, birthplace, phone number, etc., all separated by tabs.
The format might be like this:
Bob Jason Los Angeles 33333333
Alice Wong Washington DC 111-333-222
So I create a struct to record it.
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
I tried many ways to read this file into struct but it failed.
I tried fread:
read_file = fopen("read.txt", "r");
Person temp;
fread(&temp, sizeof(Person), 100, read_file);
printf("%s %s %s \n", temp.name, temp.address, temp.phone);
But the strings are not split on the tabs into temp; it reads the whole file into temp.name and I get weird output.
Then I tried fscanf and sscanf; neither works for separating on tabs:
fscanf(read_file, "%s %s %s", temp.name, temp.address, temp.phone);
Or
fscanf(read_file, "%s\t%s\t%s", temp.name, temp.address, temp.phone);
This separates the string by space, so I get Bob and Jason separately, while I actually need to get "Bob Jason" as one string. And I did separate these fields by tab when I created the text file.
Same for sscanf, I tried different ways many times...
Please help...
I suggest:
Use fgets to read the text line by line.
Use strtok to separate the contents of the line by using tab as the delimiter.
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
char* token = strtok(line, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
// Copy token at most the number of characters
// temp.name can hold. Similar logic applies to address
// and phone number.
temp.name[0] = '\0';
strncat(temp.name, token, sizeof(temp.name)-1);
}
token = strtok(NULL, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.address[0] = '\0';
strncat(temp.address, token, sizeof(temp.address)-1);
}
token = strtok(NULL, "\n");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.phone[0] = '\0';
strncat(temp.phone, token, sizeof(temp.phone)-1);
}
Update
Using a helper function, the code can be reduced in size. (Thanks @chux)
// The helper function.
void copyToken(char* destination,
char* source,
size_t maxLen,
char const* delimiter)
{
char* token = strtok(source, delimiter);
if ( token != NULL )
{
destination[0] = '\0';
strncat(destination, token, maxLen-1);
}
}
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
copyToken(temp.name, line, sizeof(temp.name), "\t");
copyToken(temp.address, NULL, sizeof(temp.address), "\t");
copyToken(temp.phone, NULL, sizeof(temp.phone), "\n");
This is only for demonstration, and there are better ways to initialize the variables, but to address your main question (reading a tab-delimited file) you can write a function something like this.
Assuming a fixed set of fields and your struct definition, you can get the tokens using strtok().
//for a file with constant field definitions
//(needs <stdio.h> and <string.h>)
void GetFileContents(char *file, Person *person)
{
    char line[260];
    FILE *fp;
    char *buf = 0;
    int i = -1;
    fp = fopen(file, "r");
    if(fp == NULL) return; //file could not be opened
    while(fgets(line, sizeof line, fp))
    {
        i++; //the caller must provide room in person[] for every line
        buf = strtok(line, "\t\n");
        //snprintf truncates instead of overflowing the destination
        if(buf) snprintf(person[i].name, sizeof person[i].name, "%s", buf);
        buf = strtok(NULL, "\t\n");
        if(buf) snprintf(person[i].address, sizeof person[i].address, "%s", buf);
        buf = strtok(NULL, "\t\n");
        if(buf) snprintf(person[i].phone, sizeof person[i].phone, "%s", buf);
        //Note: if you have more fields, add more strtok/snprintf sections
        //Note: This method will ONLY work for consistent number of fields.
        //If variable number of fields, suggest 2 dimensional string array.
    }
    fclose(fp);
}
Call it in main() like this:
int main(void)
{
//...
Person person[NUM_LINES], *pPerson; //NUM_LINES defined elsewhere
//and there are better ways
//this is just for illustration
pPerson = &person[0];//initialize pointer to person
GetFileContents(filename, pPerson); //call function to populate person.
//...
return 0;
}
First thing,
fread(&temp, sizeof(temp), 100, read_file);
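will not work because the fields are not fixed width: fread copies raw bytes, so name would receive the first 20 bytes of the file, address the next 30, and so on, wherever the tabs happen to fall. Asking for 100 items of sizeof(temp) bytes each also lets fread write far past the end of the single temp object.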
You need to read one line at a time and then parse the line. You can use any method you like to read a line; a simple one is fgets(), like this:
char line[100];
Person persons[100];
int index;
index = 0;
while (fgets(line, sizeof(line), read_file) != NULL)
{
persons[index++] = parseLineAndExtractPerson(line);
}
Now we need a function to parse the line and store the data in your Person struct instance:
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
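/* no tab found means this was the last field, so there is nothing left to parse */
return (pointer == NULL) ? NULL : pointer + 1;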
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
Here is a sample implementation of a loop to read at most 100 records
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/path/to/the/file.type", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
/* remove the '\n' left by `fgets()'. */
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Here is a complete program that does what I think you need
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
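/* no tab found means this was the last field, so there is nothing left to parse */
return (pointer == NULL) ? NULL : pointer + 1;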
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/home/iharob/data.dat", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
/* remove the '\n' left by `fgets()'. */
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Parsing strings returned by fgets can be very annoying, especially when input is truncated. In fact, fgets leaves a lot to be desired. Did you get the correct string or was there more? Is there a newline at the end? For that matter, is the end 20 bytes away or 32768 bytes away? It would be nice if you didn't need to count that many bytes twice -- once with fgets and once with strlen, just to remove a newline that you didn't want.
Things like fscanf don't necessarily work as intended in this situation unless you use a "scanset" conversion such as %[^\t], and that automatically adds a null terminator, so the field width has to leave room for one. The return value of any of the scanf family is your friend in determining whether success or failure occurred.
You can avoid the null terminator by using %NNc, where NN is the width, but if there's a \t in those NN bytes, then you need to separate it and move it to the next field, except that means bytes in the next field must be moved to the field after that one, and the 90th field will need its bytes moved to the 91st field... And hopefully you only need to do that once... Obviously that isn't actually a solution either.
Given those reasons, I feel it's easier just to read until you encounter one of the expected delimiters and let you decide the behavior of the function when the size specified is too small for a null terminator, yet large enough to fill your buffer. Anyway, here's the code. I think it's pretty straightforward:
/*
* Read a token.
*
* tok: The buffer used to store the token.
* max: The maximum number of characters to store in the buffer.
* delims: A string containing the individual delimiter bytes.
* fileptr: The file pointer to read the token from.
*
* Return value:
* - max: The buffer is full. In this case, the string _IS NOT_ null terminated.
* This may or may not be a problem: it's your choice.
* - (size_t)-1: End of file or an I/O error occurred before a delimiter was read
* (just like with `fgets`, use `feof`/`ferror` to tell which).
* - any other value: The length of the token as `strlen` would return.
* In this case, the string _IS_ null terminated.
*/
size_t
read_token(char *restrict tok, size_t max, const char *restrict delims,
FILE *restrict fileptr)
{
int c = 0;
size_t n;
/* read from fileptr rather than stdin */
for (n = 0; n < max && (c = getc(fileptr)) != EOF &&
strchr(delims, c) == NULL; ++n)
*tok++ = (char)c;
if (c == EOF)
return (size_t)-1;
if (n == max)
return max;
*tok = 0;
return n;
}
Usage is pretty straightforward as well:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strchr, used by read_token */
typedef struct person {
char name[20];
char address[30];
char phone[20];
} Person;
int
main(void)
{
FILE *read_file;
Person temp;
size_t line_num;
size_t len;
int c;
int exit_status = EXIT_SUCCESS;
read_file = fopen("read.txt", "r");
if (read_file == NULL) {
fprintf(stderr, "Error opening read.txt\n");
return 1;
}
for (line_num = 0;; ++line_num) {
/*
* Used for detecting early EOF
* (e.g. the last line contains only a name).
*/
temp.name[0] = temp.phone[0] = 0;
len = read_token(temp.name, sizeof(temp.name), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.name)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
len = read_token(temp.address, sizeof(temp.address), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.address)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
/* the phone is the last field on the line, so it ends at the newline */
len = read_token(temp.phone, sizeof(temp.phone), "\n",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.phone)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
// Do something with the input here. Example:
printf("Entry %zu:\n"
"\tName: %.*s\n"
"\tAddress: %.*s\n"
"\tPhone: %.*s\n\n",
line_num + 1,
(int)sizeof(temp.name), temp.name,
(int)sizeof(temp.address), temp.address,
(int)sizeof(temp.phone), temp.phone);
}
if (ferror(read_file)) {
fprintf(stderr, "error reading from file\n");
exit_status = EXIT_FAILURE;
}
else if (feof(read_file) && temp.phone[0] == 0 && temp.name[0] != 0) {
fprintf(stderr, "Unexpected end of file while reading entry %zu\n",
line_num + 1);
exit_status = EXIT_FAILURE;
}
//else feof(read_file) is still true, but we parsed a full entry/record
fclose(read_file);
return exit_status;
}
Notice how the same block of code appears three times in the read loop to handle the return value of read_token? There is probably room for another function that calls read_token and handles its return value, so that main only has to call this "read_token handler", but the code above should give you the basic idea of how to work with read_token and how it applies to your situation.
You might change the behavior in some way, but read_token as written suits me rather well for delimited input like this (things get a bit more complex once you add quoted fields to the mix, but not much more as far as I can tell). You can decide what happens when max is returned. I opted to treat it as an error, but you might think otherwise. You might even read one extra character when n == max, treat max as a successful return value, and use something like (size_t)-2 as the "token too large" error indicator instead.

Resources