Ignoring "=" character with fscanf - c

I am trying to read a file that contains lines in the format abc=1234. How can I make fscanf ignore the = and store str1="abc" and str2="1234"?
I tried this:
fscanf(fich1, "%[^=]=%[^=]" , palavra, num_char)

I'd recommend using fgets to read lines and then parsing them with sscanf. But you can apply the same principle with fscanf alone if you want.
#include <stdio.h>
int main(void) {
char buf[100];
char str1[100];
char str2[100];
if(! fgets(buf, sizeof buf, stdin)) return 1;
if(sscanf(buf, "%[^=]=%s", str1, str2) != 2) return 1;
puts(str1);
puts(str2);
}
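So what does %[^=]=%s do? First, %[^=] reads everything up to (but not including) the first = and stores it in str1. Then the literal = in the format matches the = in the input and discards it. Finally, %s reads a whitespace-delimited string into str2. And here you can see the problem with your format string: your second conversion is also %[^=], which keeps reading until the next = (or the end of input). It does not stop at the end of the line, so with several lines in the file it would swallow 1234, the newline, and the key of the next line. Your format only fits input shaped like abc=1234=.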
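To see this concretely, here is a small sketch (demo_bad_format is a hypothetical name) that runs the question's format over two lines held in a string. sscanf stands in for fscanf only so that the input is self-contained; a stream behaves the same way.

```c
#include <stdio.h>

/* Demonstration: the question's format applied to two lines at once.
   The second %[^=] does not stop at '\n', so it swallows the rest of
   the first line plus the key of the next one. */
int demo_bad_format(char str1[100], char str2[100])
{
    const char *input = "abc=1234\ndef=5678\n";
    return sscanf(input, "%99[^=]=%99[^=]", str1, str2);
}
```

The call still reports two successful conversions, but afterwards str2 holds 1234, a newline, and def.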
Note that %[^=] and %s treat white space differently: %s skips leading white space and stops at the first white-space character, while %[^=] accepts white space as part of the field. If that's a concern, you need to account for it, for example with %[^=]=%[^\n].
And to avoid buffer overflows, you might also want to use %99[^=]=%99[^\n].
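If you do want to stick with fscanf alone, here is a minimal sketch of the same idea applied to a stream with several lines. The helper name read_pair is hypothetical, and it assumes 100-byte buffers as in the example above. The leading space in the format matters: unlike %s, %[^=] does not skip white space, so without it the newline left over from the previous line would end up at the start of the next key.

```c
#include <stdio.h>

/* Hypothetical helper: reads one "key=value" line from fp.
   The leading space in the format skips the newline left over from the
   previous line; %99[^\n] stops the value at the end of the line.
   Returns 1 on success, 0 on a malformed line or end of input. */
int read_pair(FILE *fp, char key[100], char value[100])
{
    return fscanf(fp, " %99[^=]=%99[^\n]", key, value) == 2;
}
```

In the question's terms, assuming palavra and num_char are at least 100 bytes, a loop like while (read_pair(fich1, palavra, num_char)) { ... } stops at end of file or at the first malformed line.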

Related

How to get each string within a buffer fetched with "getline" from a file in C

I'm trying to read every string separated by commas, dots or whitespace from every line of a text file (for simplicity, I only match alphanumeric characters with sscanf). I'm using the getline function from <stdio.h> and it reads the line just fine. But when I try to "iterate" over the buffer it fetched, sscanf always returns the first string of the line. Suppose I have a file called "entry.txt" with the following content:
test1234 test hello
another test2
And my "main.c" contains the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_WORD 500
int main()
{
FILE *fp;
int currentLine = 1;
size_t characters, maxLine = MAX_WORD * 500;
/* Buffer can keep up to 500 words of 500 characters each */
char *word = (char *)malloc(MAX_WORD * sizeof(char)), *buffer = (char *)malloc((int)maxLine * sizeof(char));
fp = fopen("entry.txt", "r");
if (fp == NULL) {
return 1;
}
for (currentLine = 1; (characters = getline(&buffer, &maxLine, fp)) != -1; currentLine++)
{
/* This line gets "test1234" onto "word" variable, as expected */
sscanf(buffer, "%[a-zA-Z_0-9]", word);
printf("%s", word); // As expected
/* This line should get "test" string, but again it obtains "test1234" from the buffer */
sscanf(buffer, "%[a-zA-Z_0-9]", word);
printf("%s", word); // Not intended...
// Do some stuff with the "word" and "currentLine" variables...
}
return 0;
}
The problem is that I'm trying to get every alphanumeric string (a "word" from now on) in sequence from the buffer, but sscanf only ever gives me the first word in the buffer. Also, each line of the input file can contain an unknown number of words separated by whitespace, commas, dots, special characters, etc.
I'm reading each line of the file separately with getline because I need to store every word together with the currentLine variable, so I'll know which line a given word came from. Any ideas how to do that?
fscanf takes an input stream argument. A stream has state that changes as it is read, so a second call to fscanf reads something different. For example:
fscanf(stdin, "%s", str1); // str1 contains some string; stdin advances
fscanf(stdin, "%s", str2); // str2 contains some other string
scanf does not have a stream argument, but it always reads from stdin, so it works exactly like fscanf(stdin, ...).
sscanf does not have a stream argument, nor is there any global state to keep track of what was read. There is only an input string. You scan it, some characters get converted, and... nothing else changes. The string remains the same string (how could it possibly be otherwise?) and no information about how far the scan has advanced is stored anywhere.
sscanf(buffer, "%s", str1); // str1 contains some string; nothing else changes
sscanf(buffer, "%s", str2); // str2 contains the same string
So what does a poor programmer do?
Well, I lied. No information about how far the scan has advanced is stored anywhere, unless you ask for it.
int nchars;
sscanf(buffer, "%s%n", str1, &nchars); // str1 contains some string;
// nchars contains number of characters consumed
sscanf(buffer+nchars, "%s", str2); // str2 contains some other string
Error handling and %s field widths omitted for brevity. You should never omit them in real code.
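Applied to the question, a minimal sketch could walk through each line returned by getline with two alternating sscanf calls: one that skips separators and one that reads a word, with %n telling it how far to advance each time. The helper scan_words is a hypothetical name. It reuses the question's [a-zA-Z_0-9] scanset; note that ranges like a-z inside %[...] are implementation-defined in ISO C, although common implementations such as glibc support them.

```c
#include <stdio.h>

#define MAX_WORD 500

/* Sketch: walk through one line with sscanf, using %n to remember how
   far each call got. Words are runs of [a-zA-Z_0-9]; everything else
   is treated as a separator. Prints each word with its line number and
   returns the number of words found. */
int scan_words(const char *line, int lineno)
{
    char word[MAX_WORD];
    int consumed;
    int count = 0;

    while (*line != '\0') {
        /* Skip any run of separator characters first. If the line
           already starts with a word character, the conversion fails
           and consumed stays 0. */
        consumed = 0;
        sscanf(line, "%*[^a-zA-Z_0-9]%n", &consumed);
        line += consumed;

        /* Read the next word (at most MAX_WORD - 1 characters). */
        consumed = 0;
        if (sscanf(line, "%499[a-zA-Z_0-9]%n", word, &consumed) != 1)
            break;                  /* no more words on this line */
        printf("line %d: %s\n", lineno, word);
        count++;
        line += consumed;
    }
    return count;
}
```

In the question's loop, both sscanf calls would be replaced by a single scan_words(buffer, currentLine); a real program would store each word instead of printing it.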

Reading multiple strings using sscanf() based on a delimiter

I have a string with multiple words separated by commas, like
char str[]="K&R,c89,c99,c11";
I am trying to read the first 2 words into separate character arrays using sscanf().
sscanf(str, "%[^,] s%[^,]s", str1, str2);
I intended sscanf() to scan through str until reaching a comma, store that in str1, continue scanning until the next comma, and store that in str2.
But the value is being stored only in str1, while str2 seems to contain garbage.
I tried removing the space between the %[^,] and the s to see if it was significant, but it made no difference to the output.
What am I doing wrong? Or is this not possible for multiple words?
I've heard of doing something like this with strtok() but I was wondering if sscanf() could be used for this.
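Duh... It took me a while to see it. Get rid of the s in your format string. The character class [...] takes the place of s, so by putting an s after it you force sscanf to look for a literal s in str. It finds a , instead, stops there, and never touches str2 (hence the garbage). You also need a literal , between the two conversions to consume the comma, e.g.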
#include <stdio.h>
#define MAX 8
int main (void) {
char str[]="K&R,c89,c99,c11";
char str1[MAX] = "";
char str2[MAX] = "";
if (sscanf(str, "%[^,],%[^,]", str1, str2) == 2)
printf ("str1 : %s\nstr2 : %s\n", str1, str2);
return 0;
}
Example Use/Output
$ ./bin/sscanfcomma
str1 : K&R
str2 : c89
Also, consider protecting your arrays from overflow with, e.g.
if (sscanf(str, "%7[^,],%7[^,]", str1, str2) == 2)
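To read all of the fields rather than just the first two, a sketch could combine the field width with %n and step through the string one field at a time. split_commas and FIELD_LEN are hypothetical names; the sketch assumes fields of at most 7 characters, like MAX above, and does not handle empty fields.

```c
#include <stdio.h>

#define FIELD_LEN 8   /* room for 7 characters plus the terminator */

/* Sketch: split a comma-separated string into at most max fields,
   using %n to step past each field and its trailing comma.
   Empty fields (",,") are not handled: %[^,] needs at least one
   character, so scanning stops there. Returns the number of fields. */
int split_commas(const char *s, char fields[][FIELD_LEN], int max)
{
    int n = 0, used;

    while (n < max && sscanf(s, "%7[^,]%n", fields[n], &used) == 1) {
        s += used;
        n++;
        if (*s != ',')
            break;          /* end of string (or an over-long field) */
        s++;                /* step over the comma */
    }
    return n;
}
```

With char fields[4][FIELD_LEN], calling split_commas(str, fields, 4) on the question's string fills in all four words.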

Split string with multiple delimiters using strtok in C

I have a problem with splitting a string. The code below works, but only if the strings are separated by ' ' (spaces). I need to split strings on any whitespace character. Is strtok() even necessary?
char input[1024];
char *string[3];
int i=0;
if(fgets(input,1024,stdin)!='\0') //get input
{
string[0]=strtok(input," "); //parse first string
while(string[i]!=NULL) //parse others
{
printf("string [%d]=%s\n",i,string[i]);
i++;
string[i]=strtok(NULL," ");
}
}
A simple example that shows how to use multiple delimiters and potential improvements in your code. See embedded comments for explanation.
Be warned about the general shortcomings of strtok() (from the manual):
These functions modify their first argument.
These functions cannot be used on constant strings.
The identity of the delimiting byte is lost.
The strtok() function uses a static buffer while parsing, so it's not thread safe. Use strtok_r() if this matters to you.
#include <stdio.h>
#include <string.h>
int main(void)
{
char input[1024];
char *string[256]; // 1) 3 is dangerously small; 256 will last a while ;-)
// You may want to dynamically allocate the pointers
// in a general, robust case.
char delimit[]=" \t\r\n\v\f"; // 2) POSIX whitespace characters
int i = 0, j = 0;
if(fgets(input, sizeof input, stdin)) // 3) fgets() returns NULL on error or end of file.
// 4) Better practice to use sizeof
// input rather than hard-coding the size
{
string[i]=strtok(input,delimit); // 5) Make use of i to be explicit
while(string[i]!=NULL)
{
printf("string [%d]=%s\n",i,string[i]);
i++;
if (i == 256) // 6) Stop before writing past the end of string[]
break;
string[i]=strtok(NULL,delimit);
}
for (j=0;j<i;j++)
printf("%s\n", string[j]); // 7) Index with j: string[i] is NULL here
}
return 0;
}
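As for whether strtok() is necessary: it is not. One strtok()-free alternative is sketched below using the standard strspn() and strcspn() functions. Because the input is only read, it sidesteps the shortcomings listed above. count_tokens is a hypothetical name, and it only prints the tokens; a real program would copy or store them.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of a strtok()-free tokenizer: strspn() skips a run of
   delimiters, strcspn() measures the token that follows. The input is
   never modified, so it may be a string literal, and no static state
   is kept between calls. Returns the number of tokens found. */
int count_tokens(const char *s, const char *delims)
{
    int n = 0;

    for (;;) {
        s += strspn(s, delims);          /* skip leading delimiters */
        if (*s == '\0')
            break;
        size_t len = strcspn(s, delims); /* length of this token */
        printf("string [%d]=%.*s\n", n, (int)len, s);
        n++;
        s += len;                        /* move past the token */
    }
    return n;
}
```

Called as count_tokens(input, " \t\r\n\v\f") after fgets, it splits on any of the whitespace characters from the answer's delimit array.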

Input using sscanf with regular expression

I want to extract a particular part of a string like
"First (helloWorld): last"
From that string I want to extract only "helloWorld" using a regular expression. I am using
"%*[^(] (%s):"
But that does not serve my purpose. Please somebody help me to solve this problem.
The format specifiers in the scanf family of functions are not generally considered to be a species of regular expression.
However, you can do what you want with something like this.
#include <stdio.h>
int main() {
char str[256];
sscanf("First (helloWorld): last", "%*[^(](%[^)]%*[^\n]", str);
printf("%s\n", str);
return 0;
}
%*[^(] read and discard everything up to opening paren
( read and discard the opening paren
%[^)] read and store up to (but not including) the closing paren
%*[^\n] read and discard up to (but not including) the newline
The last format specifier is not necessary in the context of the above sscanf, but would be useful if reading from a stream and you want it positioned at the end of the current line for the next read. Note that the newline is still left in the stream, though.
Rather than using fscanf (or scanf) to read from a stream directly, it's pretty much always better to read a line with fgets and then extract the fields of interest with sscanf.
// Read lines, extracting the first parenthesized substring.
#include <stdio.h>
int main() {
char line[256], str[128];
while (fgets(line, sizeof line, stdin)) {
if (sscanf(line, "%*[^(](%127[^)]", str) == 1) // skip lines with no match
printf("|%s|\n", str);
}
return 0;
}
Sample run:
one (two) three
|two|
four (five) six
|five|
seven eight (nine) ten
|nine|
Sorry, no true regular expression parser in standard C.
The format strings of the scanf() family are not full-fledged regular expressions, but they can do the job. "%n" tells sscanf() to save the current scanning offset.
#include <stdio.h>
#include <stdlib.h>
char *foo(char *buf) {
#define NotOpenParen "%*[^(]"
#define NotCloseParen "%*[^)]"
int start;
int end = 0;
sscanf(buf, NotOpenParen "(%n" NotCloseParen ")%n", &start, &end);
if (end == 0) {
return NULL; // End never found
}
buf[end-1] = '\0';
return &buf[start];
}
// Usage example
int main(void) {
char buf[] = "First (helloWorld): last";
char *s = foo(buf);
printf("%s\n", s ? s : "(not found)");
return 0;
}
But this approach fails on "First (): last". More code would be needed.
A pair of strchr() calls is better.
#include <string.h>
char *foo(char *buf) {
char *start = strchr(buf, '(');
if (start == NULL) {
return NULL; // start never found
}
char *end = strchr(start, ')');
if (end == NULL) {
return NULL; // end never found
}
*end = '\0';
return &start[1];
}
Otherwise, you need a solution that is not part of the C standard, such as the POSIX regcomp()/regexec() functions.
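For completeness, a minimal sketch of the POSIX route, assuming a system that provides <regex.h> (it is POSIX, not ISO C). The helper paren_contents is a hypothetical name; the extended regular expression \(([^)]*)\) captures everything between the first ( and the following ), so unlike the sscanf version it also accepts "First (): last".

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Sketch using POSIX <regex.h>: copy the text between the first '('
   and the following ')' into out (outsz must be at least 1).
   Returns 1 on a match, 0 otherwise. */
int paren_contents(const char *s, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];               /* m[0] = whole match, m[1] = group 1 */
    int found = 0;

    if (regcomp(&re, "\\(([^)]*)\\)", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, s, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outsz)
            len = outsz - 1;       /* truncate to fit */
        memcpy(out, s + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Compiling the pattern on every call keeps the sketch short; a program that matches many lines would call regcomp() once and reuse the regex_t.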

Tell sscanf to consider \0 as a valid character to read

I am parsing a file of formatted strings and need some help with the parsing.
Consider the example below.
#include <stdio.h>
int main()
{
char value[32], name[32];
int buff_ret1, buff_ret2;
char *buff = "1000000:Hello";
char *buff_other ="200000:\0";
buff_ret1 = sscanf(buff,"%[^:]:%s", value, name);
printf("buff_ret1 is %d\n", buff_ret1);
buff_ret2 = sscanf(buff_other,"%[^:]:%s", value, name);
printf("buff_ret2 is %d\n", buff_ret2);
return 0;
}
I expect both buff_ret1 and buff_ret2 to be 2, but buff_ret2 comes out as 1. I understand that sscanf is not treating the NUL as a character. Is there a way to tell sscanf to treat NUL as a character to read?
No, this is not possible. From the sscanf() specification:
Reaching the end of the string in sscanf() shall be equivalent to encountering end-of-file for fscanf().
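This means \0 (end of string) is treated as end of file. In your example, sscanf sees buff_other simply as "200000:"; after the colon there is nothing left for %s to match, so sscanf stops with only one successful conversion and returns 1.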
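If the goal is simply to accept an empty name, one workaround is to stop asking sscanf for the name at all: record the offset just after the colon with %n and copy the rest of the string yourself. This is a sketch with a hypothetical helper, parse_entry, assuming the 32-byte buffers from the question. Unlike %s, it copies the remainder verbatim, including any spaces or a trailing newline.

```c
#include <stdio.h>
#include <string.h>

/* Sketch (hypothetical helper): accept "value:" with an empty name.
   %n records where the name starts; whatever follows the colon
   (possibly nothing) is copied into name.
   Returns 1 if a value and a colon were found, 0 otherwise. */
int parse_entry(const char *buf, char value[32], char name[32])
{
    int pos = -1;

    /* %n is not counted in the return value, so success is 1. */
    if (sscanf(buf, "%31[^:]:%n", value, &pos) != 1 || pos < 0)
        return 0;                      /* no value, or no colon */
    strncpy(name, buf + pos, 31);
    name[31] = '\0';                   /* strncpy may not terminate */
    return 1;
}
```

With this helper, both "1000000:Hello" and "200000:" parse successfully; the second just yields an empty name.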

Resources