Parsing string with sscanf that has a string in it - c

Project is in C. I need to parse strings that are always formatted the following way: integer, whitespace, plus sign, multi-word string, plus sign, white space, integer, whitespace, integer, end-of-line
Example:
10 +This is 1 string+ 2 -1
I'm having a hard time figuring out what to put in the sscanf format string so that the string surrounded by the '+' signs gets parsed correctly, without including the + signs (assuming sscanf can be used for this case at all).
I tried "%d +%s+ %d %d" and that didn't work.

You used %s, but that stops reading at the first white space character. You want to read a string of not-plus-signs, so tell sscanf() to do exactly that:
"%d +%[^+]+ %d %d"
That's a scan set — see POSIX sscanf(). You should also protect yourself from buffer overflow. If you have:
char buffer[256];
use:
"%d +%255[^+]+ %d %d"
Note the off-by-one in the lengths: the width in the format does not count the terminating null byte, so it must be one less than the buffer size. This is a design feature of the scanf() family of functions. You could skip leading spaces by putting a space after the first + in the format string. It is not possible to skip trailing spaces before the second + in the data; you'll have to remove those separately.
You ask for 'end of line' after the 3rd number. That's fairly hard. You might use:
"%d +%255[^+]+ %d %d %n"
passing an extra pointer to int argument to hold the number of characters consumed up to that point. The blank before the %n skips white space, including newlines, so if you read into int nbytes; (passing &nbytes), then you'd check if (buffer[nbytes] != '\0') { …handle trailing garbage… }, where buffer here is the string being scanned, but only after checking that you had four successful conversions (%n conversion specifications are not counted in the return value from sscanf() et al). There are other solutions to that; they're all grubby to some extent.

Related

C language: scanf and sscanf expressions

I've encountered expressions that go something like this inside scanf and sscanf arguments:
sscanf(buffer, "%d,%100[^,]%*c%f", destination_pointer)
or
scanf("\n%99s", destination);
What is the correct way of interpreting these? I know what things like "%s %c %d" are, and also that the %100 or generally "%number" is the size of the input to be read. But what about the rest? All I can find are basic examples, nothing near this complex. Is there any reference guide?
What is the correct way to interpret these?
sscanf(buffer, "%d,%100[^,]%*c%f", destination_pointer)
is an invalid call. There are 3 conversion specifiers that need an argument: %d, %[], and %f. That means exactly 3 arguments are needed after the format string, but only one, destination_pointer, is provided.
%d - ignore any whitespace characters, read an int in base 10
, - read a comma
%100[^,] - read at most 100 characters that are not a comma. Up to 101 bytes (100 characters + null byte) are stored in the destination buffer.
%[set] - reads characters in the set
%[^set] - reads characters that are not in the set
%*c - ignore one character (a comma, because %100[^,] reads up until a comma, or the string has ended, which would make scanf return here). Note - ignoring the result of conversion with * makes scanf not increment the return value in the case reading was successful.
%f - ignore any whitespace characters, read a float (in any format - decimal, scientific or hexadecimal)
scanf("\n%99s", destination);
\n - read (and ignore) any number of whitespace characters (whitespace means any character for which isspace() returns nonzero: space, form feed, line feed, carriage return, tab, or vertical tab)
%99s - ignore any leading whitespace characters (the \n in front of it is useless...), then read up to 99 non-whitespace characters (the destination buffer has to be at least 100 bytes long).

What is the effect of space at different places in the scanf format string?

The below is what I understood so far. Please confirm, add, correct as the case may be -
scanf (" %c %d %s", &a, &b, c);
In the above - the first space before %c makes sure the buffer is cleared before scanf starts accepting new string into it from stdin for this particular function scanf call. This clears any of the delimiters from any previous input function calls..
The remaining two spaces before %d and %s allow any number of spaces or tabs but not enter key press between the user's entry of a, b and b,c , respectively.
Even with the above, none of the inputs can contain a space, i.e. space is the delimiter for each of the three inputs. To allow a space, it has to be specified in the scan set brackets [], so "%[a-z A-Z_0-9]" can contain any upper or lower case letters, digits 0-9, a space and an underscore, but will treat all other characters as invalid. The invalid character will go to the next input in the format string, if any, so if %[___] above was followed by %c and an asterisk is pressed, the asterisk is put into the character corresponding to %c.
Please confirm, correct, add. Thanks.
From cppreference.com:
Any single whitespace character in the format string consumes all available consecutive whitespace characters from the input
So all the spaces in the format string just mean to skip over any whitespace in the input.
There's no difference between the spaces before %c and the other spaces. The initial spaces don't clear the buffer, it just skips over any initial whitespace in the input. This ensures that %c will read the first non-whitespace character in the input.
Whitespace includes space, TAB, and newline characters. So you can put spaces or press enter between each input.
You don't actually need the spaces before %d or %s. These formats don't read anything that contains whitespace, and they automatically skip over any whitespace before the object they read. The spaces in the format string are redundant but do no harm, and may make it easier to read.
All conversion specifiers other than [, c, and n consume and discard all leading whitespace characters (determined as if by calling isspace) before attempting to parse the input.
...
The conversion specifiers that do not consume leading whitespace, such as %c, can be made to do so by using a whitespace character in the format string
the first space before %c makes sure the buffer is cleared before scanf starts accepting new string into it from stdin for this particular function scanf call. This clears any of the delimiters from any previous input function calls.
No. The buffer is not cleared and there are no delimiters.
The remaining two spaces before %d and %s allow any number of spaces or tabs but not enter key press between the user's entry of a, b and b,c , respectively.
No. The enter key produces a newline character, which is whitespace.
Even with the above, none of the inputs can contain a space in it i.e. space is the delimiter for each of the three inputs.
No.
The invalid character will go to the next input in the format string
Yes. This isn't limited to %[ ]: If e.g. %d sees 12foo in the input stream, it will consume 12 and leave foo to be read by the rest of the format string (however, if there are no leading digits at all, %d will fail and abort processing).
Any whitespace character in the format string reads and consumes all available whitespace characters at this point in the input stream, including spaces, tabs, and newlines. It doesn't matter whether the space appears before %c or %d or %s: All whitespace (including newlines) in the input is skipped.
%c accepts spaces just fine. Space is not a delimiter because %c has no delimiters; it always reads a single character. The only reason it can't read a space in your code is that it is preceded by a space in the format string, which will have skipped over all available whitespace.
As for %d and %s, they implicitly skip leading whitespace. That is, " %d" is equivalent to "%d" and " %s" is equivalent to "%s".

Using scanf in for loop

Here is my c code:
int main()
{
int a;
for (int i = 0; i < 3; i++)
scanf("%d ", &a);
return 0;
}
When I input something like 1 2 3, it keeps asking for more input, and I have to type something that is not ' ' before it finishes.
However, when I change it to the following (or use any other character that is not ' ')
scanf("%d !", &a);
and input 1 ! 2! 3!, it will not ask more input.
The final space in scanf("%d ", &a); instructs scanf to consume all white space following the number. It will keep reading from stdin until you type something that is not white space. Simplify the format this way:
scanf("%d", &a);
scanf will still ignore white space before the numbers.
Conversely, the format "%d !" consumes any white space following the number and a single !. It stops scanning when it gets this character, or another non space character which it leaves in the input stream. You cannot tell from the return value whether it matched the ! or not.
scanf is very clunky, it is very difficult to use it correctly. It is often better to read a line of input with fgets() and parse that with sscanf() or even simpler functions such as strtol(), strspn() or strcspn().
scanf("%d", &a);
This should do the job.
Basically, scanf() consumes as much of the stdin input as matches its pattern. If you pass "%d" as the pattern, it will stop reading input after an integer is found. However, if you feed it "%dx", for example, it matches an integer immediately followed by the character 'x'.
More Details:
Your pattern string could have the following characters:
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or tab) or part of a format specifier (which begin with a % character) causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a format specifier, which is used to specify the type and format of the data to be retrieved from the stream and stored into the locations pointed by the additional arguments.
Source: http://www.cplusplus.com/reference/cstdio/scanf/

C fscanf reading in the correct format

I'm totally stuck with the fscanf format string in C
Alice:(44;69) Bob:(74;68) John:(57;98)
This is what I need to read from a file: Name:(score1;score2). But I failed to construct the correct format string for it:
while(fscanf(f, "%[a-zA-Z]%[;(]%d %d", &buff, &garbage, &s1, &s2)!= EOF){
What am I doing wrong?
First of all if you check e.g. this scanf (and family) reference you can see that you can add an asterisk to a format code to suppress assignment, so no need to pass "garbage" variables.
Secondly for your problem, the numbers are split with semicolon, but you have a space in the format which corresponds to whitespace.
In fact, thanks to the pattern matching built into scanf, you should be able to simplify the format specification to e.g.
fscanf(f, " %[^:]:(%d;%d)", buff, &s1, &s2)
The "%[^:]" format reads everything as a string until it sees a colon. The rest of the format then matches the colon, the left parenthesis, a decimal number, a semicolon, another decimal number and a right parenthesis. I also added a leading space in the format, to skip leading whitespace if there is any.

Reading text with sscanf and fgets

So my text file looks similar to this
1. First 1.1
2. Second 2.2
Essentially an integer, string and then a float.
Using sscanf() and fgets(), in theory I should be able to scan this in (I have to do it this way), but I only get the integer. Can someone help point out what I am doing wrong?
while(!feof(foo))
{
fgets(name, sizeof(name) - 1, foo);
sscanf(name,"%d%c%f", &intarray[i], &chararray[i], &floatarray[i]);
i++;
}
Where intarray, chararray, and floatarray are 1D arrays and i is an int initialized to 0.
The structure of the loop is wrong; you should not use feof() like that and you must always check the status of both fgets() and sscanf(). This code avoids overflowing the input arrays, too.
enum { MAX_ENTRIES = 10 };
int i;
int intarray[MAX_ENTRIES];
float floatarray[MAX_ENTRIES];
char chararray[MAX_ENTRIES][50];
for (i = 0; i < MAX_ENTRIES && fgets(name, sizeof(name), foo) != 0; i++)
{
if (sscanf(name,"%d. %49s %f", &intarray[i], chararray[i], &floatarray[i]) != 3)
...process format error...
}
Note the major changes:
The dot after the integer must be scanned by the format string.
The chararray has to be a 2D array to make any sense. If you read a single character with %c, it would hold only the character immediately after the number (the '.' in the sample data, or the space once the dot is matched explicitly), and the subsequent conversion specification (for the float value) would fail because the name that follows is not a floating point value.
The & in front of chararray[i] is not wanted when it is a 2D array. It would be needed if you were really reading a single character in a 1D array of characters instead of the whole string such as 'First' or 'Second' from the sample data.
The test checks that three values were converted successfully. Any smaller value indicates problems. With sscanf(), you'd only get EOF returned if there was nothing in the string for the first conversion specification to work on (empty string, all white space); you'd get 0 returned if the first non-blank was alphabetic or a punctuation character other than + or -, etc.
If you really want a single character instead of the name, then you'll have to arrange to read the extra characters in the word, maybe using:
if (sscanf(name,"%d %c%*s %f", &intarray[i], chararray[i], &floatarray[i]) != 3)
There's a space before the %c which is crucial; it will skip white space in the input, and then the %c will pick up the first non-blank character. The %*s will read more characters, skipping any white space (there won't be any) and then scanning a string of characters up to the next white space. The * suppresses an assignment; the scanned data won't be stored anywhere.
One of the major advantages of the fgets() plus sscanf() paradigm is that when you report the format error, you can report to the user the complete line of input that caused problems. If you use raw fscanf() or scanf(), you can only report on the first character that caused trouble, typically up to the end of the line, and then only if you write code to read that data. It is fiddlier (so the reporting is usually not very careful), and the available information is not as helpful to the user on those rare occasions when the reporting tries to be careful.
You need to change your format string to:
"%d %s %f"
The spaces are because you have spaces in your input data, the %s because you want to read a multi-character string at that point (%c only reads one character); don't worry though, as %s won't read past a space. You'll need to make sure you've got enough space in the target buffer to read the string, of course.
If you only want the first character of the second word, try:
"%d %c%s %f"
And add an extra (dummy) buffer to receive the string parsed by %s which you want to discard.
Shouldn't it be %s for a string? Otherwise %c will only read a single character, and then the float value might be affected.
Try "%d %s %f".
%s won't help since it may read the float value itself. As far as I know, %c reads a single character, and the rest of the word then causes the problem. To scan the word, you can use a loop (terminated by a space, of course).
