does sscanf() delete characters from stdin when %*[^|] delimiter used? - c

According to the man page for sscanf(), the * character is:
An optional '*' assignment-suppression character: scanf() reads input
as directed by the conversion specification, but discards the input.
From this, I (rightly?) assume that something like
sscanf(string,"%*[^|]%*c%[^|]%*c", vars)
Would take the input "text|neededtext|", ignore all text before the first "|" (i.e. delete it from stdin?), ignore (and delete from stdin) the next character, i.e. the "|", store the "neededtext" and then delete the final "|" character, leaving stdin empty?
If yes, is it ever necessary to run a cleanup function after this sscanf() call, to catch some weird exception just in case something goes wrong, or is the code above always guaranteed to work?
I have run some tests, and it appears that sscanf() does eat up all characters from stdin, but I just want to make sure.

Not always. See #2
As @Jonathan Leffler commented, sscanf() works with strings and does not affect stdin. Let us assume the question is about the *scanf() family instead.
Problems with *scanf(string,"%*[^|]%*c%[^|]%*c", vars)
1. The code does not check the result of *scanf(). This leads to problems that are subtle and hard to detect. Best to code a check.
char vars[100];
if (sscanf(string,"%*[^|]%*c%99[^|]%*c", vars) != 1) Handle_Error();
2. "%*[^|]" scans nothing and stops the whole scan function if the leading character is '|'. "%*[^|]" scans 1 or more non-'|', not 0 or more. Hence the "Not always" above. With scanf(), no characters are consumed in that case; they remain in stdin.
3. The only possible character after "%*[^|]" to continue scanning is '|'. Might as well code "%*[^|]|" rather than "%*[^|]%*c".
4. sscanf() differs from scanf() and fscanf() in that the latter two can scan and save a null character '\0'. sscanf() stops scanning upon reaching the null character. IMO: use of "%s" and "%[^...]" with fscanf() and scanf() should be avoided to prevent a hacker exploit. OP's fgets()+sscanf() is the better approach.
5. %[^|] lacks a width. Without this limit, buffer overrun is not prevented. See #1.
6. "%*c" at the end may or may not scan a character. The return value does not reflect its success or failure, as scan directives with '*' do not contribute to the return value. If successful trailing scanning of "%*c" is needed (which could be "|" per point #3), suggest using "%n" to detect whether scanning proceeded that far.
char vars[100];
int n = 0;
sscanf(string,"%*[^|]|%99[^|]| %n", vars, &n);
if (n > 0) Success();
// May also want to use
if (string[n]) Fail_ExtraTextOnLine();
In C, %*[^|] is a scanf specifier, not delimiter.

Related

scanf function for reading full line input using ^\n

What is the difference between
scanf(" %[^\n]", str_1);
and
scanf("%[^/n]", str_2);
I am trying to read a full line as an input using scanf function, the first one works fine but as soon as I remove the space between " and % the function takes no input.
scanf(" %[^\n]", str_1);
This will skip whitespace (until it finds a non-whitespace character) and then start reading characters into str_1 until it reaches a newline or EOF. It will not do any bounds checking, so it may overflow the storage for str_1. If EOF is reached before a non-whitespace character is found, it will return EOF without writing anything into str_1 (not even a NUL character).
scanf("%[^/n]", str_2);
This will read characters other than / and n into str_2 until it sees a /, an n, or EOF. Like the first one, it does not do any bounds checking, so it may (will likely?) overflow the storage of str_2. If the first character of input is a / or an n, it will fail to match the pattern at all, returning 0 and not storing anything into str_2 (not even a NUL terminator); if EOF is hit immediately, it returns EOF instead.
Basic scanf semantics -- whitespace in the format string will skip 0 or more whitespace characters in the input until a non-whitespace character is reached to start whatever the next pattern is. %[ matches characters with no bounds checking, so it should never be used with untrusted input -- you should always use an explicit bound or the m modifier.
If you want to read lines of input (as opposed to whitespace-delimited words, ignoring differences between newlines and other whitespace), you should use fgets or getline.

What is the difference between these two scanf statements?

I am having some doubt. The doubt is
What is the difference between the following two scanf statements.
scanf("%s",buf);
scanf("%[^\n]", buf);
If I put the second scanf in a while loop, it loops forever, because the \n is still in stdin.
The first statement also reads only up to the \n and does not read the \n itself.
But the first statement does not loop forever. Why?
Regarding the properties of the %s format specifier, quoting the C11 standard, chapter §7.21.6.2, fscanf()
s Matches a sequence of non-white-space characters.
The newline is a whitespace character, so a lone newline won't be a match for %s.
So, in case the newline is left in the buffer, %s does not scan the newline alone; it waits for the next non-whitespace input to appear on stdin.
The %s format specifier specifies that scanf() should, after skipping any leading whitespace, read all characters in the standard input buffer stdin until it encounters the first whitespace character, and then stop there. That whitespace ('\n') remains in the stdin buffer until consumed by another function, like getchar().
In the second case there is no mention of stopping.
You can think of scanf as extracting words separated by whitespace from a stream of characters. Imagine reading a file which contains a table of numbers, for example, without worrying about the exact number count per line or the exact space count and nature between numbers.
Whitespace, for the record, is horizontal and vertical (these exist) tabs, carriage returns, newlines, form feeds and, last but not least, actual spaces.
In order to free the user from details, scanf treats all whitespace the same: It normally skips it until it hits a non-whitespace and then tries to convert the character sequence starting there according to the specified input conversion. E.g. with "%d" it expects a sequence of digits, perhaps preceded by a minus sign.
The input conversion "%s" also starts with skipping whitespace (and that's more clearly documented in the opengroup's man page than in the Linux one).
After skipping leading whitespace, "%s" accepts everything until another whitespace is read (and put back in the input, because it isn't made part of the "word" being read). That sequence of non-whitespace chars -- basically a "word" -- is stored in the buffer provided. For example, scanning a string from "   a bc " results in skipping 3 spaces and storing "a" in the buffer. (The next scanf would skip the intervening space and put "bc" in the buffer. The next scanf after that would skip the remaining whitespace, encounter the end of file and return EOF.) So if a user is asked to enter three words they could give three words on one line or on three lines or on any number of lines preceded or separated by any number of empty lines, i.e. any number of consecutive newlines. Scanf couldn't care less.
There are a few exceptions to the "skip leading whitespace" strategy. Both concern conversions which usually indicate that the user wants to have more control over the input conversion. One of them is "%c" which just reads the next character. The other one is the "%[" spec which details exactly which characters are considered part of the next "word" to read. The conversion specification you use, "%[^\n]", reads everything except newline. Input from the keyboard is normally passed to a program line by line, and each line is by definition terminated by a newline. The newline of the first line passed to your program will be the first character from the input stream which does not match the conversion specification. Scanf will read it, inspect it and then put it back in the input stream (with ungetc()) for somebody else to consume. Unfortunately, it will itself be the next consumer, in another loop iteration (as I assume). Now the very first character it encounters (the newline) does not match the input conversion (which demands anything but the newline). Scanf therefore gives up immediately, puts the offending character dutifully back in the input for somebody else to consume and returns 0, indicating the failure to perform even the very first conversion in the format string. But alas, it itself will be the next consumer. Yes, machines are stupid.
The first, scanf("%s",buf);, scans only a single word, but the second, scanf("%[^\n]", buf);, reads a string until the user enters a newline character.
Let's take a look at these two code snippets :
#include <stdio.h>
int main(void){
    char sentence[100] = {'\0'};
    scanf("%99s", sentence);        /* width keeps the read inside sentence */
    printf("\n%s\n", sentence);
    return 0;
}
Input : Hello, my name is Claudio.
Output : Hello,
#include <stdio.h>
int main(void){
    char sentence[100] = {'\0'};    /* the 26-character input would overflow a 20-byte array */
    scanf("%99[^\n]", sentence);
    printf("\n%s\n", sentence);
    return 0;
}
Input : Hello, my name is Claudio.
Output : Hello, my name is Claudio.
%[^\n] is an inverted group scan and this is how I personally use it, as it allows me to input a sentence with blank spaces in it.
Common
Both expect buf to be a pointer to a character array. Both append a null character to that array if at least 1 character was saved. Both return 1 if something was saved. Both return EOF if end-of-file is detected before saving anything. Both return EOF if an input error is detected. Both may save buf with embedded '\0' characters in it.
scanf("%s",buf);
scanf("%[^\n]", buf);
Differences
"%s" 1) consumes and discards leading white-space including '\n', space, tab, etc. 2) then saves non-white-space to buf until 3) a white-space is detected (which is then put back into stdin). buf will not contain any white-space.
"%[^\n]" 1) does not consume or discard leading white-space. 2) it saves non-'\n' characters to buf until 3) a '\n' is detected (which is then put back into stdin). If the first character read is a '\n', then nothing is saved in buf and 0 is returned. The '\n' remains in stdin and explains OP's infinite loop.
Failure to test the return value of scanf() is a common code oversight. Better code checks the return value of scanf().
IMO: code should never use either form as written:
Both fail to limit the number of characters read. Use fgets().
You can think of %s as %[^\n \t\f\r\v], that is, after skipping any leading whitespace, a group of non-whitespace characters.

scanf() behaviour for strings with more than one word

Well I've been programming in C for quite a while now, and there is this question about the function scanf()
here is my problem:
I know that every element in the ASCII table is a character, and I even know that %s is the format specifier for a string, which is a collection of characters.
My questions:
1. Why does scanf() stop scanning after we press Enter? If Enter is also a character, why can't it be added as a component of the string that is being scanned?
2. My second question, and the one I need answered the most: why does it stop scanning after a space, when a space is again a character?
Note: My question is not about how to avoid these but how this happens.
I'd be happy if this is already addressed, I'd gladly delete my question, and if I've presumed something wrong please let me know.
"Why does scanf() stop scanning after we press Enter" is not always true.
The "%s" directs scanf() as follows
char buffer[100];
scanf("%s", buffer);
Scan and consume all white-space including '\n' generated from multiple Enters. This data is not saved.
Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier C11dr §7.21.6.2 8
Scan and save all non-white-space characters. Continue doing so until a white-space is encountered.
Matches a sequence of non-white-space characters §7.21.6.2 12
This white-space is put back into stdin for the next input function. (OP's 2nd question)
A null character is appended to buffer.
Operations may stop short if EOF occurs.
If too much data is saved in buffer, it is UB.
If some non-white-space data is saved, return 1. If EOF encountered, return EOF.
Note: stdin is usually line buffered, so no keyboard data is given to stdin until a '\n' occurs.
From my reading of your question, both of your numbered questions are the same:
Why does scanf with a format specifier of %s stop reading after encountering a space or newline?
And the answer to both of your questions is: Because that is what scanf with the %s format specifier is documented to do.
From the documentation:
%s Matches a sequence of bytes that are not white-space characters.
A space and a newline character (generated by the enter key) are white-space characters.
I made a small program that uses scanf to read several names without stopping at a space or even at Enter.
I used a while loop:
scanf("%s", text);
while (1)
{
    scanf("%s", text1);
    if (strcmp(text1, ".") == 0) { break; }  /* needs <string.h> */
    // here I simply append text1 to text
}
This way I get one line, ended by entering a lone ".".
Now I use
scanf("%[^\n]",text);
It works great.

What is the behavior of %(limit)[^\n] in scanf? Is it safe from overflow?

Is the format %(limit)[^\n] for the scanf function unsafe? (where (limit) is the length of the string minus 1)
If it is unsafe, why?
And is there a safe way to implement a function that reads strings using only scanf()?
In the Linux Programmer's Manual (typing man scanf in a terminal), the s format says:
Matches a sequence of non-white-space characters; the next pointer must be a pointer to character array that is long enough to hold the input sequence and the terminating null byte ('\0'), which is added automatically. The input string stops at white space or at the maximum field width, whichever occurs first.
Does the input string always stop at the maximum field width? Or is that just on GCC?
Thanks.
"%(limit)[^\n] for scanf" is usually safe.
In the below example, at most 99 char will be read and saved into buf. If any char are saved, a '\0' will be appended and cnt will be 1.
char buf[100];
int cnt = scanf("%99[^\n]", buf);
This functionality is certainly safe, but what about others?
Problems occur when the input is a lone "\n".
In this case, nothing is saved in buf and 0 is returned. Had the next line of code been the following, the output is Undefined Behavior as buf is not initialized to anything.
puts(buf);
A better following line would be
if (cnt == 1) puts(buf);
else printf("Return count = %d\n", cnt);
Problems because the '\n' was not consumed.
The '\n' is still waiting to be read and another call to scanf("%99[^\n]", buf); will not read the '\n'.
Q: Is there a safe way to implement a function that catches strings just using scanf()?
A: Pedantically: Not easily.
scanf(), fgets(), etc. are best used for reading text, not strings. In C a string is an array of char terminated with a '\0'. Input via scanf(), fgets(), etc. typically have issues reading '\0' and typically that char is not in the input anyways. Usually input is thought of as groups of char terminated by '\n' or other white-space.
If code is reading input terminated with '\n', using fgets() works well and is portable. fgets() too has its weaknesses, which are handled in various ways. getline() is a nice alternative.
A close approximation would be scanf(" %99[^\n]", buf) (note the added " "), but alone that does not solve handling excessively long lines, reading multiple empty lines, embedded '\0' detection, loss of ability to report length read (strlen() does not work due to embedded '\0') and its leaving the trailing '\n' in stdin.
Short of using scanf("%c", &ch) with lots of surrounding code (which is silly, just use fgetc()), I see no way to use a single scanf() absolutely safely when reading a line of user input.
Q: The input string stops at maximum field width always ?
A: With scanf("%99[^\n]", buf), input stops 1) when a '\n' is encountered (the '\n' is not saved and remains in the file input buffer), 2) when 99 char have been read, 3) when EOF occurs or 4) when an IO error occurs (rare).
The [^\n] makes scanf read input until it meets a newline character, while the limit is the maximum number of characters scanf should read.

how to read scanf with spaces

I'm having a weird problem
i'm trying to read a string from a console with scanf()
like this
scanf("%[^\n]",string1);
but it doesn't read anything. It just skips the entire scanf.
I'm trying it in gcc compiler
Trying to use scanf to read strings with spaces can bring unnecessary problems: buffer overflow, and stray newlines staying in the input buffer to be read later. gets() is often suggested as a solution to this; however,
From the manpage:
Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
So instead of using gets, use fgets with the stdin stream to read strings from the keyboard.
That should work fine, so something else is going wrong. As hobbs suggests, you might have a newline on the input, in which case this won't match anything. It also won't consume a newline, so if you do this in a loop, the first call will get up to the newline and then the next call will get nothing. If you want to read the newline, you need another call, or use a space in the format string to skip whitespace. It's also a good idea to check the return value of scanf to see if it actually matched any format specifiers.
Also, you probably want to specify a maximum length in order to avoid overflowing the buffer. So you want something like:
char buffer[100];
if (scanf(" %99[^\n]", buffer) == 1) {
    /* read something into buffer */
}
This will skip (ignore) any blank lines and whitespace on the beginning of a line and read up to 99 characters of input up to and not including a newline. Trailing or embedded whitespace will not be skipped, only leading whitespace.
I'll bet your scanf call is inside a loop. I'll bet it works the first time you call it. I'll bet it only fails on the second and later times.
The first time, it will read until it reaches a newline character. The newline character will remain unread. (Odds are that the library internally does read it and calls ungetc to unread it, but that doesn't matter, because from your program's point of view the newline is unread.)
The second time, it will read until it reaches a newline character. That newline character is still waiting at the front of the line and scanf will read all 0 of the characters that are waiting ahead of it.
The third time ... the same.
You probably want this:
if (scanf("%99[^\n]%*c", buffer) == 1) { /* use buffer */ }
Edit: I accidentally copied and pasted from another answer instead of from the question, before inserting the %*c as intended. This resulting line of code will behave strangely if you have a line of input longer than 100 bytes, because the %*c will eat an ordinary byte instead of the newline.
However, notice how dangerous it would be to do this:
scanf("%[^\n]%*c", string1);
because there, if you have a line of input longer than your buffer, the input will walk all over your other variables and stack and everything. This is called buffer overflow (even if the overflow goes onto the stack).
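#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *text(int n);
int main(void)
{
    char str[10] = "";
    printf("enter username : ");
    fflush(stdout);                  /* make sure the prompt is shown */
    char *fmt = text(9);             /* builds "%9[^\n]" */
    if (fmt == NULL) return 1;
    if (scanf(fmt, str) != 1) str[0] = '\0';
    printf("username = %s\n", str);
    free(fmt);
    return 0;
}
/* Returns a malloc'ed scanf format string; the caller frees it. */
char *text(int n)
{
    char str[50];
    // n == -1: no width, so no buffer protection
    if (n != -1) snprintf(str, sizeof str, "%%%d[^\n]", n);
    else strcpy(str, "%[^\n]");
    char *s = malloc(strlen(str) + 1);   /* fflush(stdin), itoa() and the stray "s" are gone */
    if (s != NULL) strcpy(s, str);
    return s;
}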

Resources