Use scanf to read lines or break on special character - c

Is it possible to read lines of text with scanf() - excluding \n and break on special(chosen) character, but include that character
This is my current expression: while(scanf("%49[^:\n]%*c", x)==1)
but this one excludes :.
Is it possible to break reading on : but read that character too?

Ok I am using Johannes-Schaub-litb's code.
char * getline(char cp) {
char * line = malloc(100), * linep = line;
size_t lenmax = 100, len = lenmax;
int c;
if(line == NULL)
return NULL;
for(;;) {
c = fgetc(stdin);
if(c == EOF)
break;
if(--len == 0) {
len = lenmax;
intptr_t diff = line - linep;
char * linen = realloc(linep, lenmax *= 2);
if(linen == NULL) {
free(linep);
return NULL;
}
line = linen + diff;
linep = linen;
}
if((*line++ = c) == cp)
break;
}
*line = '\0';
return linep;
}
Still I use this code ...and it works fine.
The code will be modified a bit more later.

Is it possible to read lines of text with scanf() - excluding \n and break on special(chosen) character, but include that character(?)
Yes. But scanf() is notorious for being used wrong and difficult to use right. Certainly the scanf() approach will work for most user input. Only a rare implementation will completely meet OP's goal without missing corner cases. IMO, it is not worth it.
Alternatively, let us try the direct approach, repeatedly use fgetc(). Some untested code:
char *luvatar_readline(char *destination, size_t size, char special) {
assert(size > 0);
char *p = destination;
int ch;
while (((ch = fgetc(stdin)) != EOF) && (ch != '\n')) {
if (size > 1) {
size--;
*p++ = ch;
} else {
// Ignore extra input or
// TBD what should be done if no room left
}
if (ch == (unsigned char) special) {
break;
}
}
*p = '\0';
if (ch == EOF) {
// If rare input error
if (ferror(stdin)) {
return NULL;
}
// If nothing read and end-of-file
if ((p == destination) && feof(stdin)) {
return NULL;
}
}
return destination;
}
Sample usage
char buffer[50];
while (luvatar_readline(buffer, sizeof buffer, ':')) {
puts(buffer);
}
Corner cases TBD: Unclear what OP wants if special is '\n' or '\0'.
OP's while(scanf("%49[^:\n]%*c", x)==1) has many problems.
Does not cope with input that begins with : or '\n', leaving x unset.
Does not know if the character after the non-:, non-'\n' input was a :, '\n', or EOF.
Does not consume extra input past 49 characters.
Uses a fixed special character ':', rather than a general one.
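If staying with the scanf family is preferred, one way to keep the delimiter is to read the field with a scanset and then look at the character that stopped it. The following is only a minimal sketch: it works on a string with sscanf() so that it is easy to try out, and the name read_field_keep_delim and the FIELD_SIZE of 50 are assumptions made for illustration, not part of OP's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIELD_SIZE 50   /* assumed size, matching OP's 49-character limit */

/*
 * Hypothetical helper: copy one field from `input` into `out`, keeping a
 * terminating ':' but dropping a terminating '\n'.
 * Returns the number of characters of `input` that were consumed.
 */
size_t read_field_keep_delim(const char *input, char out[FIELD_SIZE])
{
    int used = 0;

    assert(input != NULL && out != NULL);
    out[0] = '\0';

    /* At most 48 characters, leaving room for ':' and the terminator.
       %n stores how many characters have been consumed so far. */
    if (sscanf(input, "%48[^:\n]%n", out, &used) != 1) {
        used = 0;           /* empty field: nothing matched the scanset */
        out[0] = '\0';
    }

    if (input[used] == ':') {
        strcat(out, ":");   /* include the special character */
        used++;
    } else if (input[used] == '\n') {
        used++;             /* consume the newline but do not store it */
    }
    return (size_t)used;
}
```

Calling it repeatedly while advancing the input pointer by the returned count walks through the text piece by piece; a return value of 0 means nothing was left to consume.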

I think that you want to do that:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char *line = NULL;
size_t len = 0;
ssize_t read;
while ((read = getline(&line, &len, stdin)) != -1) {
if (read > 0 && line[read - 1] == '\n') {
if (read > 1 && line[read - 2] == '\r') {
line[read - 2] = '\0'; // we can remove the carriage return
}
else {
line[read - 1] = '\0'; // we can remove the new line
}
}
char const *delim = ":";
printf("parsing line :\n");
char *token = strtok(line, delim);
while (token != NULL) {
printf("token: %s\n", token);
token = strtok(NULL, delim);
}
}
free(line);
}

I have done this in a slightly different way. Note that the %m conversion is a POSIX extension and not part of standard C, so this may not work (or may crash) on Windows.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *input_str;
/**
* Dynamic memory allocation.
* Might crash on windows.
*/
int status = scanf("%m[^.]", &input_str);
/**
* If the first character is the
* terminating character then scanf scans nothing
* and returns 0.
*/
if (status > 0) {
/**
* Calculate the length of the string.
*/
size_t len = strlen(input_str);
/**
* While allocating memory provide
* two extra cell. One for the character
* you want to include.
* One for the NULL character.
*/
char *new_str = (char*) calloc (len + 2, sizeof(char));
/**
* Check for memory allocation.
*/
if(new_str == NULL) {
printf("Memory Allocation failed\n");
exit(1);
}
/**
* Copy the string.
*/
strcpy(new_str, input_str);
/**
* get your desired terminating character
* from buffer
*/
new_str[len++] = getc(stdin);
/**
* Append the NULL character
*/
new_str[len++] = '\0';
/**
* eat other characters
* from buffer.
*/
for (int ch; (ch = getc(stdin)) != '\n' && ch != EOF; );
/**
* Free the memory used in
* dynamic memory allocation
* in scanf. Which is a must
* according to the scanf man page.
*/
free(input_str);
} else {
char new_str[2] = ".\0";
/**
* eat other characters
* from buffer.
*/
for (int ch; (ch = getc(stdin)) != '\n' && ch != EOF; );
}
}
I have used dot as a terminating character.

Related

Segmentation Fault when finding longest word in input

I have written a program to find the longest word in the input. I get no errors when using valgrind or running tests locally, but the grading program I email the code to reports a segmentation fault.
int main(void)
{
char *longest = malloc(1);
size_t size = 1;
do {
char word[20];
if (scanf("%s", word) > 0) {
if (strlen(word) > size) {
longest = realloc(longest,strlen(word)+1);
strcpy(longest,word);
size = strlen(word);
}
}
} while (getchar() != EOF);
printf("%zu characters in longest word: %s\n", strlen(longest),longest);
free(longest);
return 0;
}
Your problem is in the line char word[20]; and the way scanf reads words. From scanf's point of view, a word is any sequence of non-spaces. For example, realloc(longest,strlen(word)+1); is treated as one word, and that alone is longer than 20 characters.
You should use a more robust function to read words and allocate space for them. The most cost-efficient solution is getline() for reading the line followed by strsep() for extracting words.
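As a rough illustration of the word-extraction step, the sketch below finds the longest whitespace-delimited word in a line that has already been read (for example by getline()). It deliberately swaps strsep() for the standard strspn()/strcspn() pair so that it compiles without POSIX or BSD extensions and does not modify the line. The function name longest_word_in is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the start of the longest
 * whitespace-delimited word in `line` and store its length in *len.
 * The word is NOT null terminated; use the length when copying it.
 */
const char *longest_word_in(const char *line, size_t *len)
{
    static const char ws[] = " \t\r\n\v\f";
    const char *best;
    size_t best_len = 0;

    assert(line != NULL && len != NULL);
    best = line;

    while (*line != '\0') {
        line += strspn(line, ws);           /* skip leading whitespace */
        size_t n = strcspn(line, ws);       /* length of the next word */
        if (n > best_len) {
            best = line;
            best_len = n;
        }
        line += n;
    }
    *len = best_len;
    return best;
}
```

Because the word length is known before anything is copied, the caller can allocate exactly len + 1 bytes, so no fixed-size buffer such as char word[20] is involved.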
Without relying on the luxuries of POSIX functions, using only standard C and supporting variable word length:
#include <assert.h> // assert()
#include <stddef.h> // size_t
#include <stdbool.h> // bool, true, false
#include <stdlib.h> // EXIT_FAILURE, realloc(), free()
#include <stdio.h> // fscanf(), fgetc(), ungetc(), printf(), fputs()
#include <ctype.h> // isspace()
#include <string.h> // strlen(), strcat(), strcpy()
#define WORD_BUFFER_SIZE 20
#define STRING(value) #value
#define STRINGIFY(value) STRING(value)
// reads and discards whitespace, returns false on EOF
bool skip_ws(FILE *stream)
{
int ch;
while ((ch = fgetc(stream)) != EOF && isspace(ch));
if(!isspace(ch) && ch != EOF) // if it was not whitespace and not EOF
ungetc(ch, stream); // pretend we were never here.
return ch != EOF;
}
bool read_word(char **dst, FILE *stream)
{
assert(dst);
char chunk_buffer[WORD_BUFFER_SIZE + 1];
if (!skip_ws(stream)) // if we encounter EOF before any other non-whitespace
return false;
// read chunk by chunk
for (size_t i = 0; fscanf(stream, "%" STRINGIFY(WORD_BUFFER_SIZE) "s", chunk_buffer) == 1; ++i)
{
size_t chunk_length = strlen(chunk_buffer);
// adjust *dst's size
char *tmp = realloc(*dst, (i * WORD_BUFFER_SIZE + chunk_length + 1) * sizeof(*tmp));
if (!tmp) {
free(*dst);
*dst = NULL;
return false;
}
*dst = tmp;
if (i == 0) // zero terminate it if it is the first chunk
**dst = '\0'; // for strcat() to behave well.
strcat(*dst, chunk_buffer); // add the current chunk to *dst.
int next_char = fgetc(stream);
ungetc(next_char, stream);
if (chunk_length < WORD_BUFFER_SIZE || isspace(next_char) || next_char == EOF)
return true;
}
return true;
}
int main(void)
{
char *word = NULL;
char *longest_word = NULL;
size_t longest_length = 0;
while (read_word(&word, stdin)) {
size_t length = strlen(word);
if (length > longest_length) {
char *tmp = realloc(longest_word, (length + 1) * sizeof *tmp);
if (!tmp) {
fputs("Not enough memory. :(\n\n", stderr);
free(longest_word);
free(word);
return EXIT_FAILURE;
}
longest_length = length;
longest_word = tmp;
strcpy(longest_word, word);
}
}
free(word);
printf("%zu characters in the longest word: \"%s\"\n\n", longest_length, longest_word);
free(longest_word);
}

C printf changing contents of variable inside function

So I am writing a little function to parse paths, it looks like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int parse_path() {
char *pathname = "this/is/a/path/hello";
int char_index = 0;
char current_char = pathname[char_index];
char *buffer = malloc(2 * sizeof(char));
char *current_char_str = malloc(2 * sizeof(char));
while (current_char != '\0' && (int)current_char != 11) {
if (char_index == 0 && current_char == '/') {
char_index++; current_char = pathname[char_index];
continue;
}
while (current_char != '/' && current_char != '\0') {
current_char_str[0] = current_char;
current_char_str[1] = '\0';
buffer = (char *)realloc(buffer, (strlen(buffer) + 2) * sizeof(char));
strcat(buffer, current_char_str);
char_index++; current_char = pathname[char_index];
}
if (strlen(buffer)) {
printf("buffer(%s)\n", buffer);
current_char_str[0] = '\0';
buffer[0] = '\0';
}
char_index++; current_char = pathname[char_index];
}
};
int main(int argc, char *argv[]) {
parse_path();
printf("hello\n");
return 0;
}
Now, there is undefined behavior in my code, it looks like the printf call inside the main method is changing the buffer variable... as you can see, the output of this program is:
buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
buffer(buffer(%s)
)
buffer(hello)
hello
I have looked at other posts where the same sort of problem is mentioned and people have told me to use a static char array etc. but that does not seem to help.
Any suggestions?
For some reason, at one time in this program the "hello" string from printf is present in my buffer variable.
Sebastian, if you are still having problems after @PaulOgilvie's answer, then it is most likely due to not understanding his answer. Your problem is due to buffer being allocated but not initialized. When you call malloc, it allocates a block of at least the size requested, and returns a pointer to the beginning address for the new block -- but does nothing with the contents of the new block -- meaning the block is full of random values that just happened to be in the range of addresses for the new block.
So when you call strcat(buffer, current_char_str); the first time and there is nothing but random garbage in buffer and no nul-terminating character -- you do invoke Undefined Behavior. (there is no end-of-string in buffer to be found)
To fix the error, you simply need to make buffer an empty-string after it is allocated by setting the first character to the nul-terminating character, or use calloc instead to allocate the block which will ensure all bytes are set to zero.
For example:
int parse_path (const char *pathname)
{
int char_index = 0, ccs_index = 0;
char current_char = pathname[char_index];
char *buffer = NULL;
char *current_char_str = NULL;
if (!(buffer = malloc (2))) {
perror ("malloc-buffer");
return 0;
}
*buffer = 0; /* make buffer empty-string, or use calloc */
...
Also do not hardcode paths or numbers (that includes the 0 and 2, but we will let those slide for now). Hardcoding "this/is/a/path/hello" within parse_path() makes it a rather un-useful function. Instead, make your pathname variable your parameter so it can take any path you want to send to it...
While the whole idea of realloc'ing 2-characters at a time is rather inefficient, you always need to realloc with a temporary pointer rather than the pointer itself. Why? realloc can and does fail and when it does, it returns NULL. If you are using the pointer itself, you will overwrite your current pointer address with NULL in the event of failure, losing the address to your existing block of memory forever creating a memory leak. Instead,
void *tmp = realloc (buffer, strlen(buffer) + 2);
if (!tmp) {
perror ("realloc-tmp");
goto alldone; /* use goto to break nested loops */
}
buffer = tmp; /* assign only after realloc has succeeded */
...
}
alldone:;
/* return something meaningful, your function is type 'int' */
}
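To make the temporary-pointer idea concrete in isolation, here is a small sketch of a helper that appends one character to a heap-allocated string. The name append_char and its return convention are assumptions for this example and do not come from the code in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: append `c` to the heap string *buf.
 * Returns 1 on success. On failure it returns 0 and leaves *buf
 * untouched, so the caller can still use or free the old block.
 */
int append_char(char **buf, char c)
{
    assert(buf != NULL && *buf != NULL);

    size_t len = strlen(*buf);
    char *tmp = realloc(*buf, len + 2);   /* old text + c + '\0' */

    if (tmp == NULL)
        return 0;             /* *buf still points to the old block */

    tmp[len] = c;
    tmp[len + 1] = '\0';
    *buf = tmp;               /* commit only after realloc succeeded */
    return 1;
}
```

The caller must start from a valid empty string, for example one obtained with calloc(1, 1), because the helper relies on strlen() finding a terminator.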
A short example incorporating the fixes and temporary pointer would be:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int parse_path (const char *pathname)
{
int char_index = 0, ccs_index = 0;
char current_char = pathname[char_index];
char *buffer = NULL;
char *current_char_str = NULL;
if (!(buffer = malloc (2))) {
perror ("malloc-buffer");
return 0;
}
*buffer = 0; /* make buffer empty-string, or use calloc */
if (!(current_char_str = malloc (2))) {
perror ("malloc-current_char_str");
return 0;
}
while (current_char != '\0' && (int) current_char != 11) {
if (char_index == 0 && current_char == '/') {
char_index++;
current_char = pathname[char_index];
continue;
}
while (current_char != '/' && current_char != '\0') {
current_char_str[0] = current_char;
current_char_str[1] = '\0';
void *tmp = realloc (buffer, strlen(buffer) + 2);
if (!tmp) {
perror ("realloc-tmp");
goto alldone;
}
buffer = tmp; /* realloc may have moved the block */
strcat(buffer, current_char_str);
char_index++;
current_char = pathname[char_index];
}
if (strlen(buffer)) {
printf("buffer(%s)\n", buffer);
current_char_str[0] = '\0';
buffer[0] = '\0';
}
if (current_char != '\0') {
char_index++;
current_char = pathname[char_index];
}
}
alldone:;
free (current_char_str);
free (buffer);
return ccs_index;
}
int main(int argc, char* argv[]) {
parse_path ("this/is/a/path/hello");
printf ("hello\n");
return 0;
}
(note: your logic is quite tortured above and you could just use a fixed buffer of PATH_MAX size (include limits.h) and dispense with allocating. Otherwise, you should allocate some anticipated number of characters for buffer to begin with, like strlen (pathname) + 1 which would ensure sufficient space for each path component without reallocating. I'd rather over-allocate by 1000-characters than screw up indexing worrying about reallocating 2-characters at a time...)
Example Use/Output
> bin\parsepath.exe
buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
hello
A More Straight-Forward Approach Without Allocation
Simply using a buffer of PATH_MAX size or an allocated buffer of at least strlen (pathname) + 1 size will allow you to simply step down your string without any reallocations, e.g.
#include <stdio.h>
#include <limits.h> /* for PATH_MAX - but VS doesn't provide it, so we check */
#ifndef PATH_MAX
#define PATH_MAX 2048
#endif
void parse_path (const char *pathname)
{
const char *p = pathname;
char buffer[PATH_MAX], *b = buffer;
while (*p) {
if (*p == '/') {
if (p != pathname) {
*b = 0;
printf ("buffer (%s)\n", buffer);
b = buffer;
}
}
else
*b++ = *p;
p++;
}
if (b != buffer) {
*b = 0;
printf ("buffer (%s)\n", buffer);
}
}
int main (int argc, char* argv[]) {
char *path = argc > 1 ? argv[1] : "this/is/a/path/hello";
parse_path (path);
printf ("hello\n");
return 0;
}
Example Use/Output
> parsepath2.exe
buffer (this)
buffer (is)
buffer (a)
buffer (path)
buffer (hello)
hello
Or
> parsepath2.exe another/path/that/ends/in/a/filename
buffer (another)
buffer (path)
buffer (that)
buffer (ends)
buffer (in)
buffer (a)
buffer (filename)
hello
Now you can pass any path you would like to parse as an argument to your program and it will be parsed without having to change or recompile anything. Look things over and let me know if you have questions.
You strcat something to buffer but buffer has never been initialized. strcat will first search for the first null character and then copy the string to concatenate there. You are now probably overwriting memory that is not yours.
Before the outer while loop do:
*buffer= '\0';
There are 2 main problems in your code:
the arrays allocated by malloc() are not initialized, so you have undefined behavior when you call strlen(buffer) before setting a null terminator inside the array buffer points to. The program could just crash, but in your case whatever contents is present in the memory block and after it is retained up to the first null byte.
just before the end of the outer loop, you should only take the next character from the path if the current character is a '/'. In your case, you skip the null terminator and the program has undefined behavior as you read beyond the end of the string constant. Indeed, the parse continues through another string constant "buffer(%s)\n" and through yet another one "hello". The string constants seem to be adjacent without padding on your system, which is just a coincidence.
Here is a corrected version:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
void parse_path(const char *pathname) {
int char_index = 0;
char current_char = pathname[char_index];
char *buffer = calloc(1, 1);
char *current_char_str = calloc(2, 1); /* one character plus the null terminator */
while (current_char != '\0' && current_char != 11) {
if (char_index == 0 && current_char == '/') {
char_index++; current_char = pathname[char_index];
continue;
}
while (current_char != '/' && current_char != '\0') {
current_char_str[0] = current_char;
current_char_str[1] = '\0';
buffer = (char *)realloc(buffer, strlen(buffer) + 2);
strcat(buffer, current_char_str);
char_index++; current_char = pathname[char_index];
}
if (strlen(buffer)) {
printf("buffer(%s)\n", buffer);
current_char_str[0] = '\0';
buffer[0] = '\0';
}
if (current_char == '/') {
char_index++; current_char = pathname[char_index];
}
}
}
int main(int argc, char *argv[]) {
parse_path("this/is/a/path/hello");
printf("hello\n");
return 0;
}
Output:
buffer(this)
buffer(is)
buffer(a)
buffer(path)
buffer(hello)
hello
Note however some remaining problems:
allocation failure is not tested, resulting in undefined behavior,
allocated blocks are not freed, resulting in memory leaks,
it is unclear why you test current_char != 11: did you mean to stop at TAB or newline?
Here is a much simpler version with the same behavior:
#include <stdio.h>
#include <string.h>
void parse_path(const char *pathname) {
int i, n;
for (i = 0; pathname[i] != '\0'; i += n) {
if (pathname[i] == '/') {
n = 1; /* skip path separators and empty items */
} else {
n = strcspn(pathname + i, "/"); /* get the length of the path item */
printf("buffer(%.*s)\n", n, pathname + i);
}
}
}
int main(int argc, char *argv[]) {
parse_path("this/is/a/path/hello");
printf("hello\n");
return 0;
}

how can I append a char to a string allocating memory dynamically in C?

I wrote this code, but inserts garbage in the start of string:
void append(char *s, char c) {
int len = strlen(s);
s[len] = c;
s[len + 1] = '\0';
}
int main(void) {
char c, *s;
int i = 0;
s = malloc(sizeof(char));
while ((c = getchar()) != '\n') {
i++;
s = realloc(s, i * sizeof(char));
append(s, c);
}
printf("\n%s",s);
}
How can I do it?
There are multiple problems in your code:
you iterate until you read a newline ('\n') from the standard input stream. This will cause an endless loop if the end of file occurs before you read a newline, which would happen if you redirect standard input from an empty file.
c should be defined as int so you can test for EOF properly.
s should be null terminated at all times, you must set the first byte to '\0' after malloc() as this function does not initialize the memory it allocates.
i should be initialized to 1 so the first realloc() extends the array by 1 etc. As coded, your array is one byte too short to accommodate the extra character.
you should check for memory allocation failure.
for good style, you should free the allocated memory before exiting the program
main() should return an int, preferably 0 for success.
Here is a corrected version:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* append a character to a string, assuming s points to an array with enough space */
void append(char *s, char c) {
size_t len = strlen(s);
s[len] = c;
s[len + 1] = '\0';
}
int main(void) {
int c;
char *s;
size_t i = 1;
s = malloc(i * sizeof(char));
if (s == NULL) {
printf("memory allocation failure\n");
return 1;
}
*s = '\0';
while ((c = getchar()) != EOF && c != '\n') {
i++;
s = realloc(s, i * sizeof(char));
if (s == NULL) {
printf("memory allocation failure\n");
return 1;
}
append(s, c);
}
printf("%s\n", s);
free(s);
return 0;
}
When you call strlen, it searches for a '\0' char that ends the string. You don't have this char inside your string, so the behavior of strlen is unpredictable.
Your append function is actually good.
Also, a minor thing, you need to add return 0; to your main function. And i should start from 1 instead of 0.
Here is how it should look:
int main(void){
char *s;
size_t i = 1;
s = malloc (i * sizeof(char));//Just for fun. The i is not needed.
if(s == NULL) {
fprintf(stderr, "Could not allocate enough memory\n");
return 1;
}
s[0] = '\0';
for(int c = getchar(); c != '\n' && c != EOF; c = getchar()) {//c must be an int so that EOF can be told apart from valid characters.
i++;
s = realloc (s,i * sizeof(char) );
if(s == NULL) {
fprintf(stderr, "Could not allocate enough memory\n");
return 1;
}
append (s,c);
}
printf("%s\n",s);
return 0;
}
Thanks for the comments that helped me improve the code (and for my english). I am not perfect :)
The inner realloc needs to allocate one element more (for the trailing \0) and you have to initialize s[0] = '\0' before starting the loop.
Btw, you can replace your append by strcat() or write it like
size_t i = 0;
s = malloc(1);
/* TODO: check for s != NULL */
while ((c = getchar()) != '\n') {
s[i] = c;
i++;
s = realloc(s, i + 1);
/* TODO: check for s != NULL */
}
s[i] = '\0';
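Since the length is already tracked in a counter, the same idea can be taken one step further by growing the buffer geometrically instead of one byte at a time. The following is a sketch under assumptions: the name read_line_dyn, the starting capacity of 16 and the doubling policy are illustrative choices, not something taken from the answers above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one line from `stream` into a heap buffer
 * that doubles in capacity whenever it is full. The newline is not stored.
 * Returns NULL on allocation failure or if end-of-file is reached before
 * any character was read. The caller frees the result.
 */
char *read_line_dyn(FILE *stream)
{
    size_t cap = 16, len = 0;
    char *s;
    int c;                          /* int, so EOF can be distinguished */

    assert(stream != NULL);
    s = malloc(cap);
    if (s == NULL)
        return NULL;

    while ((c = getc(stream)) != EOF && c != '\n') {
        if (len + 1 == cap) {       /* keep one byte free for '\0' */
            char *tmp = realloc(s, cap * 2);
            if (tmp == NULL) {
                free(s);
                return NULL;
            }
            s = tmp;
            cap *= 2;
        }
        s[len++] = (char)c;
    }
    if (c == EOF && len == 0) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```

Doubling keeps the number of realloc() calls logarithmic in the line length, and storing at s[len] avoids calling strlen() for every character.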

c read block of lines and store them [duplicate]

I am really new to C, and the reading files thing drives me crazy...
I want to read a file including name, place of birth and phone number, etc., all separated by tabs.
The format might be like this:
Bob Jason Los Angeles 33333333
Alice Wong Washington DC 111-333-222
So I create a struct to record it.
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
I tried many ways to read this file into struct but it failed.
I tried fread:
read_file = fopen("read.txt", "r");
Person temp;
fread(&temp, sizeof(Person), 100, read_file);
printf("%s %s %s \n", temp.name, temp.address, temp.phone);
But the strings are not stored in temp split at the tabs; it reads the whole file into temp.name and I get weird output.
Then I tried fscanf and sscanf, but neither of them splits on tabs:
fscanf(read_file, "%s %s %s", temp.name, temp.address, temp.phone);
Or
fscanf(read_file, "%s\t%s\t%s", temp.name, temp.address, temp.phone);
This separates the string by space, so I get Bob and Jason separately, while indeed, I need to get "Bob Jason" as one char string. And I did separate these format by tab when I created the text file.
Same for sscanf, I tried different ways many times...
Please help...
I suggest:
Use fgets to read the text line by line.
Use strtok to separate the contents of the line by using tab as the delimiter.
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
char* token = strtok(line, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
// Copy token at most the number of characters
// temp.name can hold. Similar logic applies to address
// and phone number.
temp.name[0] = '\0';
strncat(temp.name, token, sizeof(temp.name)-1);
}
token = strtok(NULL, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.address[0] = '\0';
strncat(temp.address, token, sizeof(temp.address)-1);
}
token = strtok(NULL, "\n");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.phone[0] = '\0';
strncat(temp.phone, token, sizeof(temp.phone)-1);
}
Update
Using a helper function, the code can be reduced in size. (Thanks @chux)
// The helper function.
void copyToken(char* destination,
char* source,
size_t maxLen,
char const* delimiter)
{
char* token = strtok(source, delimiter);
if ( token != NULL )
{
destination[0] = '\0';
strncat(destination, token, maxLen-1);
}
}
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
copyToken(temp.name, line, sizeof(temp.name), "\t");
copyToken(temp.address, NULL, sizeof(temp.address), "\t");
copyToken(temp.phone, NULL, sizeof(temp.phone), "\n");
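One caveat with strtok() is that it treats a run of consecutive delimiters as a single delimiter, so an empty field (two tabs in a row) is silently skipped and the following fields shift into the wrong members. If empty fields can occur, a sketch along these lines, based on strcspn(), keeps them. The name next_field and its cursor-based interface are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the field starting at *cursor into dst
 * (truncated to dstsize - 1 characters) and advance *cursor past the tab.
 * Returns 1 if a field was read, 0 once the record is exhausted.
 */
int next_field(const char **cursor, char *dst, size_t dstsize)
{
    assert(cursor != NULL && dst != NULL && dstsize > 0);
    if (*cursor == NULL)
        return 0;

    const char *start = *cursor;
    size_t len = strcspn(start, "\t\n");    /* field length, may be 0 */
    size_t n = len < dstsize - 1 ? len : dstsize - 1;

    memcpy(dst, start, n);
    dst[n] = '\0';

    /* A tab means another field follows; anything else ends the record. */
    *cursor = (start[len] == '\t') ? start + len + 1 : NULL;
    return 1;
}
```

Usage would be to point a const char * at the line returned by fgets() and call next_field() once per struct member, checking the return value each time.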
This is only for demonstration, there are better ways to initialize variables, but to illustrate your main question i.e. reading a file delimited by tabs, you can write a function something like this:
Assuming a strict field definition, and your struct definition you can get tokens using strtok().
//for a file with constant field definitions
void GetFileContents(char *file, PERSON *person)
{
char line[260];
FILE *fp;
char *buf=0;
char temp[80];
int i = -1;
fp = fopen(file, "r");
if(!fp) return; //nothing to read if the file cannot be opened
while(fgets(line, 260, fp))
{
i++;
buf = strtok(line, "\t\n");
if(buf) strcpy(person[i].name, buf);
buf = strtok(NULL, "\t\n");
if(buf) strcpy(person[i].address, buf);
buf = strtok(NULL, "\t\n");
if(buf) strcpy(person[i].phone, buf);
//Note: if you have more fields, add more strtok/strcpy sections
//Note: This method will ONLY work for consistent number of fields.
//If variable number of fields, suggest 2 dimensional string array.
}
fclose(fp);
}
Call it in main() like this:
int main(void)
{
//...
PERSON person[NUM_LINES], *pPerson; //NUM_LINES defined elsewhere
//and there are better ways
//this is just for illustration
pPerson = &person[0];//initialize pointer to person
GetFileContents(filename, pPerson); //call function to populate person.
//...
return 0;
}
First thing,
fread(&temp, sizeof(temp), 100, read_file);
will not work because the fields are not fixed width, so it will always read 20 characters for name 30 for address and so on, which is not always the correct thing to do.
You need to read one line at a time and then parse the line. You can use any method you like to read a line; a simple one is fgets(), like this:
char line[100];
Person persons[100];
int index;
index = 0;
while (fgets(line, sizeof(line), read_file) != NULL)
{
persons[index++] = parseLineAndExtractPerson(line);
}
Now we need a function to parse the line and store the data in your Person struct instance:
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
return (pointer != NULL) ? pointer + 1 : NULL; /* NULL when no more fields follow */
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
Here is a sample implementation of a loop to read at most 100 records
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/path/to/the/file.type", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
/* remove the '\n' left by `fgets()'. */
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Here is a complete program that does what I think you need
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
return (pointer != NULL) ? pointer + 1 : NULL; /* NULL when no more fields follow */
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/home/iharob/data.dat", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Parsing strings returned by fgets can be very annoying, especially when input is truncated. In fact, fgets leaves a lot to be desired. Did you get the correct string or was there more? Is there a newline at the end? For that matter, is the end 20 bytes away or 32768 bytes away? It would be nice if you didn't need to count that many bytes twice -- once with fgets and once with strlen, just to remove a newline that you didn't want.
Things like fscanf don't necessarily work as intended in this situation unless you have C99's "scanset" feature available, and then that will automatically add a null terminator, if you have enough room. The return value of any of the scanf family is your friend in determining whether success or failure occurred.
You can avoid the null terminator by using %NNc, where NN is the width, but if there's a \t in those NN bytes, then you need to separate it and move it to the next field, except that means bytes in the next field must be moved to the field after that one, and the 90th field will need its bytes moved to the 91st field... And hopefully you only need to do that once... Obviously that isn't actually a solution either.
Given those reasons, I feel it's easier just to read until you encounter one of the expected delimiters and let you decide the behavior of the function when the size specified is too small for a null terminator, yet large enough to fill your buffer. Anyway, here's the code. I think it's pretty straightforward:
/*
* Read a token.
*
* tok: The buffer used to store the token.
* max: The maximum number of characters to store in the buffer.
* delims: A string containing the individual delimiter bytes.
* fileptr: The file pointer to read the token from.
*
* Return value:
* - max: The buffer is full. In this case, the string _IS NOT_ null terminated.
* This may or may not be a problem: it's your choice.
* - (size_t)-1: An I/O error occurred before the last delimiter
* (just like with `fgets`, use `feof`).
* - any other value: The length of the token as `strlen` would return.
* In this case, the string _IS_ null terminated.
*/
size_t
read_token(char *restrict tok, size_t max, const char *restrict delims,
FILE *restrict fileptr)
{
int c;
size_t n;
for (n = 0; n < max && (c = getc(fileptr)) != EOF &&
strchr(delims, c) == NULL; ++n)
*tok++ = c;
if (c == EOF)
return (size_t)-1;
if (n == max)
return max;
*tok = 0;
return n;
}
Usage is pretty straightforward as well:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strchr(), used by read_token */
typedef struct person {
char name[20];
char address[30];
char phone[20];
} Person;
int
main(void)
{
FILE *read_file;
Person temp;
size_t line_num;
size_t len;
int c;
int exit_status = EXIT_SUCCESS;
read_file = fopen("read.txt", "r");
if (read_file == NULL) {
fprintf(stderr, "Error opening read.txt\n");
return 1;
}
for (line_num = 0;; ++line_num) {
/*
* Used for detecting early EOF
* (e.g. the last line contains only a name).
*/
temp.name[0] = temp.phone[0] = 0;
len = read_token(temp.name, sizeof(temp.name), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.name)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
len = read_token(temp.address, sizeof(temp.address), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.address)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
/* the last field on the line ends at the newline, not at a tab */
len = read_token(temp.phone, sizeof(temp.phone), "\n",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.phone)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = getc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
// Do something with the input here. Example:
printf("Entry %zu:\n"
"\tName: %.*s\n"
"\tAddress: %.*s\n"
"\tPhone: %.*s\n\n",
line_num + 1,
(int)sizeof(temp.name), temp.name,
(int)sizeof(temp.address), temp.address,
(int)sizeof(temp.phone), temp.phone);
}
if (ferror(read_file)) {
fprintf(stderr, "error reading from file\n");
exit_status = EXIT_FAILURE;
}
else if (feof(read_file) && temp.phone[0] == 0 && temp.name[0] != 0) {
fprintf(stderr, "Unexpected end of file while reading entry %zu\n",
line_num + 1);
exit_status = EXIT_FAILURE;
}
//else feof(read_file) is still true, but we parsed a full entry/record
fclose(read_file);
return exit_status;
}
Notice how the same few lines of code appear three times in the read loop to handle the return value of read_token? Because of that, there's probably room for another function that calls read_token and handles its return value, so that main simply calls this "read_token handler". Still, I think the code above gives you the basic idea of how to work with read_token and how it can apply in your situation.
You might change the behavior in some way, if you like, but the read_token function above would suit me rather well when working with delimited input like this (things would be a bit more complex once you add quoted fields into the mix, but not much more complex as far as I can tell). You can decide what happens when max is returned. I opted to treat it as an error, but you might think otherwise. You might even read one extra character when n == max, treat max as a successful return value, and use something like (size_t)-2 as the "token too large" error indicator instead.

Read files separated by tab in c

I am really new to C, and reading files is driving me crazy...
I want to read a file containing a name, birthplace, phone number, etc., all separated by tabs.
The format might be like this:
Bob Jason Los Angeles 33333333
Alice Wong Washington DC 111-333-222
So I create a struct to record it.
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
I tried many ways to read this file into the struct, but they all failed.
I tried fread:
read_file = fopen("read.txt", "r");
Person temp;
fread(&temp, sizeof(Person), 100, read_file);
printf("%s %s %s \n", temp.name, temp.address, temp.phone);
But the strings are not split on tabs when stored in temp: it reads the whole file into temp.name and I get weird output.
Then I tried fscanf and sscanf, but neither of them splits on tabs:
fscanf(read_file, "%s %s %s", temp.name, temp.address, temp.phone);
Or
fscanf(read_file, "%s\t%s\t%s", temp.name, temp.address, temp.phone);
This splits the string on spaces, so I get Bob and Jason separately, when I need to get "Bob Jason" as one string. And I did separate the fields with tabs when I created the text file.
The same happens with sscanf; I tried it in many different ways...
Please help...
I suggest:
Use fgets to read the text line by line.
Use strtok to separate the contents of the line by using tab as the delimiter.
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
char* token = strtok(line, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
// Copy token at most the number of characters
// temp.name can hold. Similar logic applies to address
// and phone number.
temp.name[0] = '\0';
strncat(temp.name, token, sizeof(temp.name)-1);
}
token = strtok(NULL, "\t");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.address[0] = '\0';
strncat(temp.address, token, sizeof(temp.address)-1);
}
token = strtok(NULL, "\n");
if ( token == NULL )
{
// Deal with error.
}
else
{
temp.phone[0] = '\0';
strncat(temp.phone, token, sizeof(temp.phone)-1);
}
Update
Using a helper function, the code can be reduced in size. (Thanks @chux)
// The helper function.
void copyToken(char* destination,
char* source,
size_t maxLen,
char const* delimiter)
{
char* token = strtok(source, delimiter);
if ( token != NULL )
{
destination[0] = '\0';
strncat(destination, token, maxLen-1);
}
}
// Use an appropriate number for LINE_SIZE
#define LINE_SIZE 200
char line[LINE_SIZE];
if ( fgets(line, sizeof(line), read_file) == NULL )
{
// Deal with error.
}
Person temp;
copyToken(temp.name, line, sizeof(temp.name), "\t");
copyToken(temp.address, NULL, sizeof(temp.address), "\t");
copyToken(temp.phone, NULL, sizeof(temp.phone), "\n");
This is only for demonstration, and there are better ways to initialize variables. But to illustrate your main question, i.e. reading a file delimited by tabs, you can write a function something like the one below.
Assuming a strict field definition and your struct definition, you can get the tokens using strtok().
//for a file with constant field definitions
void GetFileContents(char *file, PERSON *person)
{
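char line[260];
FILE *fp;
char *buf=0;
int i = -1;
fp = fopen(file, "r");
if(!fp) return; //could not open the file
//Note: the caller must make sure person has room for every line in the file.
while(fgets(line, sizeof line, fp))
{
i++;
//snprintf truncates a long token instead of overflowing the field
buf = strtok(line, "\t\n");
if(buf) snprintf(person[i].name, sizeof person[i].name, "%s", buf);
buf = strtok(NULL, "\t\n");
if(buf) snprintf(person[i].address, sizeof person[i].address, "%s", buf);
buf = strtok(NULL, "\t\n");
if(buf) snprintf(person[i].phone, sizeof person[i].phone, "%s", buf);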
//Note: if you have more fields, add more strtok/strcpy sections
//Note: This method will ONLY work for consistent number of fields.
//If variable number of fields, suggest 2 dimensional string array.
}
fclose(fp);
}
Call it in main() like this:
int main(void)
{
//...
PERSON person[NUM_LINES], *pPerson; //NUM_LINES defined elsewhere
//and there are better ways
//this is just for illustration
pPerson = &person[0];//initialize pointer to person
GetFileContents(filename, pPerson); //call function to populate person.
//...
return 0;
}
First thing,
fread(&temp, sizeof(temp), 100, read_file);
will not work because the fields are not fixed width: fread copies raw bytes, so it would always take 20 characters for the name, 30 for the address and so on, regardless of where the tabs are. Asking for 100 items also writes past the end of the single temp struct.
You need to read one line at a time and then parse the line. You can use any method you like to read a line; a simple one is fgets(), like this
char line[100];
Person persons[100];
int index;
index = 0;
while (fgets(line, sizeof(line), read_file) != NULL)
{
persons[index++] = parseLineAndExtractPerson(line);
}
Now we need a function to parse the line and store the data in your Person struct instance
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
/* no more tabs: return NULL so the next call leaves its field empty */
return (pointer == NULL) ? NULL : pointer + 1;
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
Here is a sample implementation of a loop to read at most 100 records
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/path/to/the/file.type", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
/* remove the '\n' left by `fgets()'. */
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Here is a complete program that does what I think you need
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct Person{
char name[20];
char address[30];
char phone[20];
} Person;
char *extractToken(const char *const line, char *buffer, size_t bufferLength)
{
char *pointer;
size_t length;
if ((line == NULL) || (buffer == NULL))
return NULL;
pointer = strpbrk(line, "\t");
if (pointer == NULL)
length = strlen(line);
else
length = pointer - line;
if (length >= bufferLength) /* truncate the string if it was too long */
length = bufferLength - 1;
buffer[length] = '\0';
memcpy(buffer, line, length);
/* no more tabs: return NULL so the next call leaves its field empty */
return (pointer == NULL) ? NULL : pointer + 1;
}
Person parseLineAndExtractPerson(const char *line)
{
Person person;
person.name[0] = '\0';
person.address[0] = '\0';
person.phone[0] = '\0';
line = extractToken(line, person.name, sizeof(person.name));
line = extractToken(line, person.address, sizeof(person.address));
line = extractToken(line, person.phone, sizeof(person.phone));
return person;
}
int main(void)
{
char line[100];
Person persons[100];
int index;
FILE *read_file;
read_file = fopen("/home/iharob/data.dat", "r");
if (read_file == NULL)
return -1;
index = 0;
while ((index < 100) && (fgets(line, sizeof(line), read_file) != NULL))
{
size_t length;
/* remove the '\n' left by `fgets()'. */
length = strlen(line);
if ((length > 0) && (line[length - 1] == '\n'))
line[length - 1] = '\0';
persons[index++] = parseLineAndExtractPerson(line);
}
fclose(read_file);
while (--index >= 0)
printf("%s: %s, %s\n", persons[index].name, persons[index].address, persons[index].phone);
return 0;
}
Parsing strings returned by fgets can be very annoying, especially when input is truncated. In fact, fgets leaves a lot to be desired. Did you get the correct string or was there more? Is there a newline at the end? For that matter, is the end 20 bytes away or 32768 bytes away? It would be nice if you didn't need to count that many bytes twice -- once with fgets and once with strlen, just to remove a newline that you didn't want.
Things like fscanf don't necessarily work as intended in this situation unless you use a scanset conversion such as %[^\t]. A scanset always appends a null terminator, so the field width you give it must leave room for one. The return value of any of the scanf family is your friend in determining whether success or failure occurred.
You can avoid the null terminator by using %NNc, where NN is the width, but if there's a \t in those NN bytes, then you need to separate it and move it to the next field, except that means bytes in the next field must be moved to the field after that one, and the 90th field will need its bytes moved to the 91st field... And hopefully you only need to do that once... Obviously that isn't actually a solution either.
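For comparison, here is a minimal sketch of the scanset approach mentioned above, assuming the three-field Person layout from the question. The function name scan_person and its return convention are made up for this illustration. The widths (19, 29, 19) leave room for the null terminator, and %*1[\t] is used to consume exactly one tab, because a bare \t in a scanf format would skip any run of whitespace.

```c
#include <assert.h>
#include <stdio.h>

typedef struct Person {
    char name[20];
    char address[30];
    char phone[20];
} Person;

/*
 * Hypothetical helper: read one tab-separated record with scansets.
 * Returns 1 for a well-formed record, 0 for a malformed line
 * (which is discarded), and EOF when there is no more input.
 */
int
scan_person(FILE *fp, Person *p)
{
    int n, c, ok;

    assert(fp != NULL && p != NULL);
    /* Each scanset stops at a tab or newline and null-terminates. */
    n = fscanf(fp, "%19[^\t\n]%*1[\t]%29[^\t\n]%*1[\t]%19[^\t\n]",
               p->name, p->address, p->phone);
    if (n == EOF)
        return EOF;
    /* A good record has 3 fields and ends right at the newline (or EOF). */
    c = fgetc(fp);
    ok = (n == 3 && (c == '\n' || c == EOF));
    /* Discard whatever is left of the current line. */
    while (c != '\n' && c != EOF)
        c = fgetc(fp);
    return ok;
}
```

A caller would loop with something like while ((r = scan_person(fp, &temp)) != EOF) and skip the record whenever r is 0. Over-long fields show up as malformed lines, because the tab or newline expected after the field is not where the format wants it.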
Given those reasons, I feel it's easier just to read until you encounter one of the expected delimiters and let you decide the behavior of the function when the size specified is too small for a null terminator, yet large enough to fill your buffer. Anyway, here's the code. I think it's pretty straightforward:
/*
* Read a token.
*
* tok: The buffer used to store the token.
* max: The maximum number of characters to store in the buffer.
* delims: A string containing the individual delimiter bytes.
* fileptr: The file pointer to read the token from.
*
* Return value:
* - max: The buffer is full. In this case, the string _IS NOT_ null terminated.
* This may or may not be a problem: it's your choice.
* - (size_t)-1: End of file or an I/O error occurred before a delimiter
* was read (just like with `fgets`, use `feof`/`ferror` to tell which).
* - any other value: The length of the token as `strlen` would return.
* In this case, the string _IS_ null terminated.
*/
size_t
read_token(char *restrict tok, size_t max, const char *restrict delims,
FILE *restrict fileptr)
{
int c;
size_t n;
for (n = 0; n < max && (c = fgetc(fileptr)) != EOF &&
strchr(delims, c) == NULL; ++n)
*tok++ = c;
if (c == EOF)
return (size_t)-1;
if (n == max)
return max;
*tok = 0;
return n;
}
Usage is pretty straightforward as well:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* for strchr, used by read_token */
typedef struct person {
char name[20];
char address[30];
char phone[20];
} Person;
int
main(void)
{
FILE *read_file;
Person temp;
size_t line_num;
size_t len;
int c;
int exit_status = EXIT_SUCCESS;
read_file = fopen("read.txt", "r");
if (read_file == NULL) {
fprintf(stderr, "Error opening read.txt\n");
return 1;
}
for (line_num = 0;; ++line_num) {
/*
* Used for detecting early EOF
* (e.g. the last line contains only a name).
*/
temp.name[0] = temp.phone[0] = 0;
len = read_token(temp.name, sizeof(temp.name), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.name)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = fgetc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
len = read_token(temp.address, sizeof(temp.address), "\t",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.address)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = fgetc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
/* The last field ends at the newline, not at a tab. */
len = read_token(temp.phone, sizeof(temp.phone), "\n",
read_file);
if (len == (size_t)-1)
break;
if (len == sizeof(temp.phone)) {
fprintf(stderr, "Skipping bad line %zu\n", line_num + 1);
while ((c = fgetc(read_file)) != EOF && c != '\n')
; /* nothing */
continue;
}
// Do something with the input here. Example:
printf("Entry %zu:\n"
"\tName: %.*s\n"
"\tAddress: %.*s\n"
"\tPhone: %.*s\n\n",
line_num + 1,
(int)sizeof(temp.name), temp.name,
(int)sizeof(temp.address), temp.address,
(int)sizeof(temp.phone), temp.phone);
}
if (ferror(read_file)) {
fprintf(stderr, "error reading from file\n");
exit_status = EXIT_FAILURE;
}
else if (feof(read_file) && temp.phone[0] == 0 && temp.name[0] != 0) {
fprintf(stderr, "Unexpected end of file while reading entry %zu\n",
line_num + 1);
exit_status = EXIT_FAILURE;
}
//else feof(read_file) is still true, but we parsed a full entry/record
fclose(read_file);
return exit_status;
}
Notice how the same few lines of code appear three times in the read loop to handle the return value of read_token? Because of that, there's probably room for another function that calls read_token and handles its return value, so that main simply calls this "read_token handler". Still, I think the code above gives you the basic idea of how to work with read_token and how it can apply in your situation.
You might change the behavior in some way, if you like, but the read_token function above would suit me rather well when working with delimited input like this (things would be a bit more complex once you add quoted fields into the mix, but not much more complex as far as I can tell). You can decide what happens when max is returned. I opted to treat it as an error, but you might think otherwise. You might even read one extra character when n == max, treat max as a successful return value, and use something like (size_t)-2 as the "token too large" error indicator instead.
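One possible shape for that "read_token handler" is sketched below. The names read_field and enum field_status are invented for this example and are not part of the code above. The sketch carries its own copy of read_token, written to read from fileptr, so that it compiles on its own. It assumes, as in the loop above, that a token filling the whole buffer means the rest of the line should be thrown away.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same contract as read_token above, reading from fileptr. */
size_t
read_token(char *tok, size_t max, const char *delims, FILE *fileptr)
{
    int c = EOF;
    size_t n;

    for (n = 0; n < max && (c = fgetc(fileptr)) != EOF &&
         strchr(delims, c) == NULL; ++n)
        *tok++ = (char)c;
    if (c == EOF)
        return (size_t)-1;
    if (n == max)
        return max;
    *tok = 0;
    return n;
}

/* Hypothetical status codes for the wrapper. */
enum field_status { FIELD_OK, FIELD_EOF, FIELD_TOO_LONG };

/*
 * Hypothetical wrapper: read one field into buf and fold the three
 * possible read_token outcomes into a status code. When the token does
 * not fit, the remainder of the line is discarded.
 */
enum field_status
read_field(char *buf, size_t size, const char *delims, FILE *fp)
{
    int c;
    size_t len;

    assert(size > 0);
    len = read_token(buf, size, delims, fp);
    if (len == (size_t)-1)
        return FIELD_EOF;        /* end of file or read error */
    if (len == size) {
        while ((c = fgetc(fp)) != EOF && c != '\n')
            ; /* skip the rest of the bad line */
        return FIELD_TOO_LONG;
    }
    return FIELD_OK;             /* buf is null terminated */
}
```

With this in place, the read loop in main shrinks to three read_field calls, each followed by a break on FIELD_EOF or a continue on FIELD_TOO_LONG. The "Skipping bad line" message can stay in main, where the line number is known.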

Resources