Reading from two columns into variables - c

I'm writing a C program that takes an input file and stores it. The input file has two columns, with an integer in the first and a string in the second, like so:
12 apple
17 frog
20 grass
I've tried using fgets to take an entire line as a string then break it apart using scanf but I'm getting lots of issues. I have searched quite a lot but haven't found anything that answers my question, but sorry if I missed something obvious.
This is the code that I've been trying:
while(fgets(line, sizeof(line), fp))
{
scanf(line, "%d\t%s", &key, value);
insert(key, value, newdict);
}

Let's have a quick go at doing with strtok since someone mentioned it. Let's imagine your file is called file.txt and has the following contents:
10 aaa
20 bbb
30 ccc
This is how we can parse it:
#include <stdio.h>
#include <string.h>
#define MAX_NUMBER_OF_LINES 10 // parse a maximum of 10 lines
#define MAX_LINE_SIZE 50 // parse a maximum of 50 chars per line
int main ()
{
FILE* fh = fopen("file.txt", "r"); // open the file
char temp[MAX_LINE_SIZE]; // some buffer storage for each line
// storage for MAX_NUMBER_OF_LINES integers
int d_out[MAX_NUMBER_OF_LINES];
// storage for MAX_NUMBER_OF_LINES strings each MAX_LINE_SIZE chars long
char s_out[MAX_NUMBER_OF_LINES][MAX_LINE_SIZE];
// i is a special variable that tells us if we're parsing a number or a string (0 for num, 1 for string)
// di and si are indices to keep track of which line we're currently handling
int i = 0, di = 0, si = 0;
while (fgets(temp, MAX_LINE_SIZE, fh) && di < MAX_NUMBER_OF_LINES) // read the input file and parse the string
{
temp[strlen(temp) -1] = '\0'; // get rid of the newline in the buffer
char* c = strtok(temp, " "); // set the delimiters
while(c != NULL)
{
if (i == 0) // i equal to 0 means we're parsing a number
{
i = 1; // next we'll parse a string, let's indicate that
sscanf(c, "%d", &d_out[di++]);
}
else // i must be 1 parsing a string
{
i = 0; // next we'll parse a number
sprintf(s_out[si++], "%s", c);
}
c = strtok(NULL, " ");
}
printf("%d %s\n", d_out[di -1], s_out[si - 1]); // print what we've extracted
}
fclose(fh);
return 0;
}
This will extract the contents from the file and store them in respective arrays, we then print them and get back our original contents:
$ ./a.out
10 aaa
20 bbb
30 ccc

Use:
fgets (name, 100, stdin);
100 is the max length of the buffer. You should adjust it as per your need.
Use:
scanf ("%[^\n]%*c", name);
The [] is the scanset character. [^\n] tells that while the input is not a newline ('\n') take input. Then with the %*c it reads the newline character from the input buffer (which is not read), and the * indicates that this read in input is discarded (assignment suppression), as you do not need it, and this newline in the buffer does not create any problem for next inputs that you might take.

The problem here seems to be that you are reading from the file twice. First with fgets and then with scanf. You will probably not get an errors from the compiler in your use of scanf, but should be getting warnings as you use line for the format string and the other arguments does not match the format. It would also be pretty obvious if you checked the return value from scanf, as it returns the number of successfully scanned items. Your call would most likely return zero (or minus one when you have hit end of file).
You should be using sscanf instead to parse the line you read with fgets.
See e.g. this reference for the different scanf variants.

Your problem can be solved by using sscanf (with the support of getline) like below:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
FILE *fp;
char *line = NULL;
size_t len = 0;
ssize_t read;
/* tokens bags */
char tok_str[255];
int tok_int;
fp = fopen("./file.txt", "r");
if (fp == NULL)
exit(EXIT_FAILURE);
/* Reads the line from the stream. */
while ((read = getline(&line, &len, fp)) != -1) {
/* Scans the character string pointed by line, according to given format. */
sscanf(line, "%d\t%s", &tok_int, tok_str);
printf("%d-%s\n", tok_int, tok_str);
}
if (line)
free(line);
exit(EXIT_SUCCESS);
}
Or, even simpler. You could use fscanf (with the support of feof) and replace the while loop shown above (along with some other redundant code cleanups) with the following one:
/* Tests the end-of-file indicator for the stream. */
while (!feof(fp)) {
/* Scans input from the file stream pointer. */
fscanf(fp,"%d\t%s\n",&tok_int, tok_str);
printf("%d-%s\n", tok_int, tok_str);
}
Assuming that your file contains following lines (where single line format is number[tab]string[newline]):
12 apple
17 frog
20 grass
the output will be:
12-apple
17-frog
20-grass

Related

How to get each string within a buffer fetched with "getline" from a file in C

I'm trying to read every string separated with commas, dots or whitespaces from every line of a text from a file (I'm just receiving alphanumeric characters with scanf for simplicity). I'm using the getline function from <stdio.h> library and it reads the line just fine. But when I try to "iterate" over the buffer that was fetched with it, it always returns the first string read from the file. Let's suppose I have a file called "entry.txt" with the following content:
test1234 test hello
another test2
And my "main.c" contains the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_WORD 500
int main()
{
FILE *fp;
int currentLine = 1;
size_t characters, maxLine = MAX_WORD * 500;
/* Buffer can keep up to 500 words of 500 characters each */
char *word = (char *)malloc(MAX_WORD * sizeof(char)), *buffer = (char *)malloc((int)maxLine * sizeof(char));
fp = fopen("entry.txt", "r");
if (fp == NULL) {
return 1;
}
for (currentLine = 1; (characters = getline(&buffer, &maxLine, fp)) != -1; currentLine++)
{
/* This line gets "test1234" onto "word" variable, as expected */
sscanf(buffer, "%[a-zA-Z_0-9]", word);
printf("%s", word); // As expected
/* This line should get "test" string, but again it obtains "test1234" from the buffer */
sscanf(buffer, "%[a-zA-Z_0-9]", word);
printf("%s", word); // Not intended...
// Do some stuff with the "word" and "currentLine" variables...
}
return 0;
}
What happens is that I'm trying to get every alphanumeric string (namely word from now on) in sequence from the buffer, when the sscanf function just gives me the first occurrence of a word within the specified buffer string. Also, every line on the entry file can contain an unknown amount of words separated by either whitespaces, commas, dots, special characters, etc.
I'm obtaining every line from the file separately with "getline" because I need to get every word from every line and store it in other place with the "currentLine" variable, so I'll know from which line a given word would've come. Any ideas of how to do that?
fscanf has an input stream argument. A stream can change its state, so that the second call to fscanf reads a different thing. For example:
fscanf(stdin, "%s", str1); // str1 contains some string; stdin advances
fscanf(stdin, "%s", str2); // str2 contains some other sting
scanf does not have a stream argument, but it has a global stream to work with, so it works exactly like fscanf(stdin, ...).
sscanf does not have a stream argument, nor there is any global state to keep track of what was read. There is an input string. You scan it, some characters get converted, and... nothing else changes. The string remains the same string (how could it possibly be otherwise?) and no information about how far the scan has advanced is stored anywhere.
sscanf(buffer, "%s", str1); // str1 contains some string; nothing else changes
sscanf(buffer, "%s", str2); // str2 contains the same sting
So what does a poor programmer fo?
Well I lied. No information about how far the scan has advanced is stored anywhere only if you don't request it.
int nchars;
sscanf(buffer, "%s%n", str1, &nchars); // str1 contains some string;
// nchars contains number of characters consumed
sscanf(buffer+nchars, "%s", str2); // str2 contains some other string
Error handling and %s field widths omitted for brevity. You should never omit them in real code.

How to get string terminated with new line in file handling using fscanf

I have these in my file:
JOS BUTTLER
JASON ROY
DAWID MALAN
JONNY BAISTROW
BEN STOKES
in different lines. And I want them to extract in my program to print them on the exact way they are in file. My imaginary output screen is:
JOS BUTTLER
JASON ROY
DAWID MALAN
JONNY BAISTROW
BEN STOKES
How would I do it using fscanf() and printf(). Moreover suggest me the way to change the delimiters of fscanf() to \n
I have tried something like this:
char n[5][30];
printf("Name of 5 cricketers read from the file:\n");
for(i=0;i<5;i++)
{
fscanf(fp,"%[^\n]s",&n[i]);
printf("%s ",n[i]);
}
fclose(fp);
}
But it works only for the first string and other string could not be displayed. There were garbage values.
Be protected from buffer overflows limiting the length of the input, use "%29[^\n]" instead of "%[^\n]s" (you don't need the s specifier)
Consume the trailing new line (your wildcard ^\n reads until a new line is found) using %*c, * means that a char will be read but won't be assigned:
fscanf(fp, "%29[^\n]%*c", n[i]);
or better yet (as pointed out by #WeatherVane), add a space before %, this will consume any blank space including tabs, spaces and new lines that may be left in the buffer from the previous read:
fscanf(fp, " %29[^\n]", n[i]);
Notice that you don't need an ampersand in &n[i], fscanf wants a pointer but n[i] is already (decays into) a pointer when passed as an argument.
Finally, as pointed out by #paddy, fgets does all that for you and is a safer function, always prefer fgets.
How to get string terminated with new line in file handling using fscanf
Use fgets() to read a line of input into a string. It also reads and saves the line's '\n'.
Buffer size 30 may be too small, consider 60.
To detect if the line is too long and lop off the '\n', read in at least + 2 characters.
// char n[5][30];
#define NAME_N 5
#define NAME_SIZE 30
char n[NAME_N][NAME_SIZE]
char buf[NAME_SIZE + 2]
while (i < NAME_N && fgets(buf, sizeof buf, fp) != NULL) {
size_t len = strlen(buf);
// Lop off potential trailing '\n'
if (len > 0 && buf[len - 1] == '\n') {
buf[--len] = '\0';
}
// Handle unusual name length
if (len >= NAME_SIZE || len == 0) {
fprintf(stderr, "Unacceptable name <%s>.\n", buf);
exit(EXIT_FAILURE);
}
// Success
strcpy(n[i], buf);
printf("%s\n", n[i]);
i++;
}

Issue with fread () and fwrite () functions in C

I have written a basic code which writes into a file a string in binary mode (using fwrite()). Also I can read the same string from the file (using fread()) in to the buffer and print it. It works but in the part where I read from the file, extra junk is also read into the buffer. My question is how to know the length of the bytes to be read, correctly?
The following is the code --
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define BUFSZ 81
char * get_string (char *, size_t);
int main (int argc, char * argv[])
{
if (argc != 2)
{
fprintf (stderr, "Invalid Arguments!!\n");
printf ("syntax: %s <filename>\n", argv[0]);
exit (1);
}
FILE * fp;
if ((fp = fopen(argv[1], "ab+")) == NULL)
{
fprintf (stderr, "Cannot openm file <%s>\n", argv[1]);
perror ("");
exit (2);
}
char string[BUFSZ];
char readString[BUFSZ];
size_t BYTES, BYTES_READ;
puts ("Enter a string: ");
get_string (string, BUFSZ);
// printf ("You have entered: %s\n", string);
BYTES = fwrite (string, sizeof (char), strlen (string), fp);
printf ("\nYou have written %zu bytes to file <%s>.\n", BYTES, argv[1]);
printf ("\nContents of the file <%s>:\n", argv[1]);
rewind (fp);
BYTES_READ = fread (readString, sizeof (char), BUFSZ, fp);
printf ("%s\n", readString);
printf ("\nYou have read %zu bytes from file <%s>.\n", BYTES_READ, argv[1]);
getchar ();
fclose (fp);
return 0;
}
char * get_string (char * str, size_t n)
{
char * ret_val = fgets (str, n, stdin);
char * find;
if (ret_val)
{
find = strchr (str, '\n');
if (find)
* find = '\0';
else
while (getchar () != '\n')
continue;
}
return ret_val;
}
in the part where I read from the file, extra junk is also read into the buffer.
No, it isn't. Since you're opening the file in append mode, it's possible that you're reading in extra data preceding the string you've written, but you are not reading anything past the end of what you wrote, because there isn't anything there to read. When the file is initially empty or absent, you can verify that by comparing the value of BYTES to the value of BYTES_READ.
What you are actually seeing is the effect of the read-back data not being null terminated. You did not write the terminator to the file, so you could not read it back. It might be reasonable to avoid writing the terminator, but in that case you must supply a new one when you read the data back in. For example,
readString[BYTES_READ] = '\0';
My question is how to know the length of the bytes to be read, correctly?
There are various possibilities. Among the prominent ones are
use fixed-length data
write the string length to the file, ahead of the string data.
Alternatively, in your particular case, when the file starts empty and you write only one string in it, there is also the possibility of capturing and working with how many bytes were read instead of knowing in advance how many should be read.
First of all you get string from the user, which will contain up to BUFSZ-1 characters (get_string() function will remove the trailing newline or skip any character exceeding the BUFSZ limit.
For example, the user might have inserted the word Hello\n, so that after get_string() call string array contains
-------------------
|H|e|l|l|o|'\0'|...
-------------------
Then you fwrite the string buffer to the output file, writing strlen (string) bytes. This doesn't include the string terminator '\0'.
In our example the contents of the output file is
--------------
|H|e|l|l|o|...
--------------
Finally you read back from the file. But since readString array is not initialized, the file contents will be followed by every junk character might be present in the uninitialized array.
For example, readString could have the following initial contents:
---------------------------------------------
|a|a|a|a|a|T|h|i|s| |i|s| |j|u|n|k|!|'\0'|...
---------------------------------------------
and after reading from the file
---------------------------------------------
|H|e|l|l|o|T|h|i|s| |i|s| |j|u|n|k|!|'\0'|...
---------------------------------------------
So that the following string would be printed
HelloThis is junk!
In order to avoid these issues, you have to make sure that a trailing terminator is present in the target buffer. So, just initialize the array in this way:
char readString[BUFSZ] = { 0 };
In this way at least a string terminator will be present in the target array.
Alternatively, memset it to 0 before every read:
memset (readString, 0, BUFSZ);

C how to search string in a file?

I have a problem with my code, I'm trying to search a string in a file and I can read it but, when I compare two strings it takes only the last one of the file as equal to the the first string entered with the scanf().
So imagine I wrote in my file three words and each one is returning to the line.
test
test12
test123
If in my scanf() I write test12 for example or test when it's going to read it will return false to the compare so (!== 0). But if I write test123 it will works because it's the last word of the file but I don't know why?
char word[26];
char singleLine[26];
FILE *file = fopen("bin/Release/myWords.txt", "a+");
scanf("%26s", word);
if (file != NULL) {
while (!feof(file)) {
fgets(singleLine, 26, file);
compare = strcmp(singleLine, word);
if (compare == 0) {
printf("\n%s\n",word);
}
}
fclose(file);
}
Your program only works in very special cases and has several problems:
scanf("%26s", word); may affect up to 27 bytes in the destination array, which is defined with a length of only 26.
furthermore, you should check the return value to avoid undefined behavior on invalid input.
fopen("bin/Release/myWords.txt", "a+"); opens the file in append mode: is this necessary?
while (!feof(file)) is always wrong, you should instead check the return value of fgets() that returns NULL at end of file.
compare = strcmp(singleLine, word); only compares for an exact math of the full line, which can only happen if the word has 25 characters, otherwise the trailing newline in singleLine will cause the comparison to fail. Furthermore, broken lines may cause unexpected results, as well as if the file does not end with a newline.
the reason it matches the last word in the file is you forget to write a trailing newline after the last word in the file, so the last fgets() fills the buffer with the exact word and no trailing newline.
if you search for matches inside the line, you should use a larger buffer and search for a match with strstr.
if you search for a exact match, you should strip the trailing newline before the comparison.
Here is a modified version:
#include <stdio.h>
#include <string.h>
int main() {
char word[27];
char singleLine[256];
FILE *file = fopen("bin/Release/myWords.txt", "r");
if (scanf("%26s", word) != 1)
return 1;
if (file != NULL) {
while (fgets(singleLine, sizeof singleLine, file)) {
singleLine[strcspn(singleLine, "\n")] = '\0'; // strip the newline if any
compare = strcmp(singleLine, word);
if (compare == 0) {
printf("\n%s\n", word);
}
}
fclose(file);
}
return 0;
}

fscanf() how to go in the next line?

So I have a wall of text in a file and I need to recognize some words that are between the $ sign and call them as numbers then print the modified text in another file along with what the numbers correspond to.
Also lines are not defined and columns should be max 80 characters.
Ex:
I $like$ cats.
I [1] cats.
[1] --> like
That's what I did:
#include <stdio.h>
#include <stdlib.h>
#define N 80
#define MAX 9999
int main()
{
FILE *fp;
int i=0,count=0;
char matr[MAX][N];
if((fp = fopen("text.txt","r")) == NULL){
printf("Error.");
exit(EXIT_FAILURE);
}
while((fscanf(fp,"%s",matr[i])) != EOF){
printf("%s ",matr[i]);
if(matr[i] == '\0')
printf("\n");
//I was thinking maybe to find two $ but Idk how to replace the entire word
/*
if(matr[i] == '$')
count++;
if(count == 2){
...code...
}
*/
i++;
}
fclose(fp);
return 0;
}
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array..also I don't know how to replace $word$ with a number.
Not only will fscanf("%s") read one whitespace-delimited string at a time, it will also eat all whitespace between those strings, including line terminators. If you want to reproduce the input whitespace in the output, as your example suggests you do, then you need a different approach.
Also lines are not defined and columns should be max 80 characters.
I take that to mean the number of lines is not known in advance, and that it is acceptable to assume that no line will contain more than 80 characters (not counting any line terminator).
When you say
My problem is that fscanf doesn't recognize '\0' so it doesn't go in the next line when I print the array
I suppose you're talking about this code:
char matr[MAX][N];
/* ... */
if(matr[i] == '\0')
Given that declaration for matr, the given condition will always evaluate to false, regardless of any other consideration. fscanf() does not factor in at all. The type of matr[i] is char[N], an array of N elements of type char. That evaluates to a pointer to the first element of the array, which pointer will never be NULL. It looks like you're trying to determine when to write a newline, but nothing remotely resembling this approach can do that.
I suggest you start by taking #Barmar's advice to read line-by-line via fgets(). That might look like so:
char line[N+2]; /* N + 2 leaves space for both newline and string terminator */
if (fgets(line, sizeof(line), fp) != NULL) {
/* one line read; handle it ... */
} else {
/* handle end-of-file or I/O error */
}
Then for each line you read, parse out the "$word$" tokens by whatever means you like, and output the needed results (everything but the $-delimited tokens verbatim; the bracket substitution number for each token). Of course, you'll need to memorialize the substitution tokens for later output. Remember to make copies of those, as the buffer will be overwritten on each read (if done as I suggest above).
fscanf() does recognize '\0', under select circumstances, but that is not the issue here.
Code needs to detect '\n'. fscanf(fp,"%s"... will not do that. The first thing "%s" directs is to consume (and not save) any leading white-space including '\n'. Read a line of text with fgets().
Simple read 1 line at a time. Then march down the buffer looking for words.
Following uses "%n" to track how far in the buffer scanning stopped.
// more room for \n \0
#define BUF_SIZE (N + 1 + 1)
char buffer[BUF_SIZE];
while (fgets(buffer, sizeof buffer, stdin) != NULL) {
char *p = buffer;
char word[sizeof buffer];
int n;
while (sscanf(p, "%s%n", word, &n) == 1) {
// do something with word
if (strcmp(word, "$zero$") == 0) fputs("0", stdout);
else if (strcmp(word, "$one$") == 0) fputs("1", stdout);
else fputs(word, stdout);
fputc(' ', stdout);
p += n;
}
fputc('\n', stdout);
}
Use fread() to read the file contents to a char[] buffer. Then iterate through this buffer and whenever you find a $ you perform a strncmp to detect with which value to replace it (keep in mind, that there is a 2nd $ at the end of the word). To replace $word$ with a number you need to either shrink or extend the buffer at the position of the word - this depends on the string size of the number in ascii format (look solutions up on google, normally you should be able to use memmove). Then you can write the number to the cave, that arose from extending the buffer (just overwrite the $word$ aswell).
Then write the buffer to the file, overwriting all its previous contents.

Resources