formatted input using scanf() - c

I need to take a string as input and discard everything that is not a space, a hyphen or a digit. In other words, I only want positive and negative integers to be read in.
I'm not dead set on using scanf, but I would prefer it.
What I've tried so far is:
char buffer[200];
scanf("%[0-9 ' ']*%c", buffer); /*this works perfectly, except the hyphen part*/
scanf("%[0-9 - ' ']%*c", buffer); /*no change*/
scanf("%[0-9 '-' ' ']%*c", buffer); /*still no change*/
Obviously only tried one of them at a time.
Grateful for any insight or help you can offer.

I'm surprised this works for you:
scanf("%[0-9 ' ']*%c", buffer);
^--- There's a typo, here. The * should probably be after the %.
Onto more pressing matters... In the C11 standard, at §7.21.6.2p12, the description of the [ conversion specifier explains:
If a - character is in the scanlist and is not the first, nor the
second where the first character is a ^, nor the last character, the
behavior is implementation-defined.
As a result, I would suggest two things:
You can't rely on 0-9 portably representing the range of characters 0123456789 within your scanset. You're better off explicitly stating 0123456789.
Your implementation isn't interpreting - as you'd like it to.
I suggest something like this:
assert(scanf("%[- 0123456789]%*c", buffer) == 1);
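You need to check the return value. If scanf returns 0 or EOF, nothing has been stored in buffer and you can't expect anything well-defined from using it. (In real code, do the check with an if rather than assert: when NDEBUG is defined, the assert expression, scanf call included, is not evaluated at all.) I put the - at the start of the scanset, so that it's well defined, and expanded your 0-9 to 0123456789 as mentioned earlier.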
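As a rough sketch of the same idea without assert, the helper below applies that scanset to a string with sscanf and adds a field width so the 200-byte buffer cannot overflow. The name read_number_chars and the use of sscanf instead of scanf are assumptions made for illustration, not part of the original answer.

```c
#include <stdio.h>

/* Hypothetical helper: copies the leading run of spaces, hyphens and digits
 * from `line` into `out` (at most 199 characters plus the terminator).
 * Returns 1 on success, 0 if `line` does not start with such a character. */
int read_number_chars(const char *line, char out[200])
{
    /* '-' is first in the scanlist, so it is an ordinary member of the set;
     * the digits are spelled out instead of relying on "0-9". */
    return sscanf(line, "%199[- 0123456789]", out) == 1;
}
```

A caller would test the result with an if and only look at the buffer when the function returns 1.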
I hope that helps :)

Related

Why there should not be type specifier like s or c after [0-9A-Z^%]?

For example consider the following code -
fscanf(fp,"%d:%d:%[^:]:%[^\n]\n",&pow->no,&pow->seen,pow->word,pow->means);
printf("\ntthis is what i read--\n%d:%d:%s:%s:\n",pow->no,pow->seen,pow->word,pow->means);
here pow is pointer to an object declared before,
when I put s as in fscanf(fp,"%d:%d:%[^:]s:%[^\n]\n" the 3rd one is read but not the last one
output is --
4:0:Abridge::
but when i do fscanf(fp,"%d:%d:%[^:]:%[^\n]s\n" all are read
output is --
4:0:Abridge:To condense:
AND without s anywhere fscanf(fp,"%d:%d:%[^:]:%[^\n]\n" all are read
output is --
`4:0:Abridge:To condense:
WHY??
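To answer your question about what %[^\n]s means: it is two separate directives. The first is the conversion specification %[^\n]; the second is the ordinary character s, which must match a literal s in the input.
The %[^\n] scans everything other than \n, stops when it reaches a \n and leaves that \n unread in the input stream. fscanf does not stop there, though: it then tries to match the letter s. If the next character is not an s, matching fails at that point and fscanf returns. With %[^\n]s at the very end of the format, all four conversions have already been stored by the time the s fails, which is why everything still appears to be read. With %[^:]s in the middle, the s fails against the : that follows Abridge, so the last field is never reached.
Now decide whether this is what you really want. %[^\n] on its own is the right form: it scans until \n is found (and, unlike %s, it does not skip leading whitespace). The scanset already accepts the letter s along with every other non-newline character, so %[^\n]s can never match its trailing s and there is no use for it.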
%d:%d:%[^:]s:%[^\n]
%d - Matches an optionally signed decimal integer. (Ignore whitespace)
: - Then looks for ':'
%d - Matches an optionally signed decimal integer. (Ignore whitespace)
: - Then looks for ':'
%[^:] - No white space ignored - everything is taken into input except `:`
':' is unread.
s - Tries to match 's'. No white space ignored.
%[^\n] - Everything except '\n' inputted. `\n` left unread.
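The effect of the stray s shows up in the return value. The sketch below runs the three formats from the question against one line using sscanf; the function name fields_assigned, the variant numbering, the buffer sizes and the added field widths are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: applies one of the three formats from the question
 * to `line` and returns the number of items sscanf assigned.
 *   variant 0: no stray s
 *   variant 1: s after %[^:]
 *   variant 2: s after %[^\n]                                           */
int fields_assigned(const char *line, int variant)
{
    int no = 0, seen = 0;
    char word[64] = "", means[64] = "";

    switch (variant) {
    case 0:
        return sscanf(line, "%d:%d:%63[^:]:%63[^\n]", &no, &seen, word, means);
    case 1:
        /* the literal s must match the ':' after the word, so matching stops */
        return sscanf(line, "%d:%d:%63[^:]s:%63[^\n]", &no, &seen, word, means);
    default:
        /* the literal s fails too, but only after all four assignments */
        return sscanf(line, "%d:%d:%63[^:]:%63[^\n]s", &no, &seen, word, means);
    }
}
```

For the line "4:0:Abridge:To condense\n", variants 0 and 2 report four assigned items and variant 1 reports three, which matches the output shown in the question.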
The specifier IS "%[]", you don't need the "s" there.
Read the manual page for scanf()
Your format string doesn't match the input because the "s" is not part of the specifier and is not present in the input where the format expects it.
By reading the documentation in the link above, you will find out, if you don't already know, that you should also check the return value of scanf() before calling printf(). Otherwise your code will invoke undefined behavior, because some of the objects the passed pointers refer to are never assigned.

take a specific number from a txt file in c program

I have this .txt file that contains only:
THN1234 54
How can I take only the number 54, to isolate it from the rest and to use it as an integer variable in my program?
If the input is from standard input, then you could use:
int value;
if (scanf("%*s %d", &value) != 1)
…Oops - incorrectly formatted data…
…use value…
The %*s reads but discards optional leading blanks and a sequence of one or more non-blanks (THN1234); the blank skips more optional blanks; the %d reads the integer, leaving a newline behind in the input buffer. If what follows the blank is not convertible to a number, or if you get EOF, you get to detect it in the if condition and report it in the body of the if.
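Since the question reads from a file, one option is to read the line first (for example with fgets) and apply the same format with sscanf. The sketch below does only the sscanf step; the name second_field_as_int is an assumption made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: skips the first whitespace-delimited word of `line`
 * and stores the integer that follows in *value.
 * Returns 1 on success, 0 if the line does not have that shape. */
int second_field_as_int(const char *line, int *value)
{
    /* %*s is suppressed, so it is not counted in the return value:
     * success means exactly one assignment, the %d. */
    return sscanf(line, "%*s %d", value) == 1;
}
```

Called with "THN1234 54", it stores 54 in *value and returns 1; with a line whose second field is not a number it returns 0 and leaves *value alone.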
Hmmm…and I see that BLUEPIXY said basically the same (minus the explanation) in their comment, even down to the choice of integer variable name.
Wow. It's been a long time since I have used C. However, I think the answer is similar for C and C++ in this case. You can use strtok_r to split the string into tokens then take the second token and parse it into an int. See http://www.cplusplus.com/reference/clibrary/cstring/strtok/.
You might also want to look at this question as well.

What is [^\n] in C? [duplicate]

I have run into some code and was wondering what the original developer was up to. Below is a simplified program using this pattern:
#include <stdio.h>
int main() {
char title[80] = "mytitle";
char title2[80] = "mayataiatale";
char mystring[80];
/* hugh ? */
sscanf(title,"%[^a]",mystring);
printf("%s\n",mystring); /* Output is "mytitle" */
/* hugh ? */
sscanf(title2,"%[^a]",mystring); /* Output is "m" */
printf("%s\n",mystring);
return 0;
}
The man page for scanf has relevant information, but I'm having trouble reading it. What is the purpose of using this sort of notation? What is it trying to accomplish?
The main reason for the character classes is that the %s notation stops at the first white space character, even if you specify field lengths, and you quite often don't want it to. In that case, the character class notation can be extremely helpful.
Consider this code to read a line of up to 10 characters, discarding any excess, but keeping spaces:
#include <ctype.h>
#include <stdio.h>
int main(void)
{
char buffer[10+1] = "";
int rc;
while ((rc = scanf("%10[^\n]%*[^\n]", buffer)) >= 0)
{
int c = getchar();
printf("rc = %d\n", rc);
if (rc >= 0)
printf("buffer = <<%s>>\n", buffer);
buffer[0] = '\0';
}
printf("rc = %d\n", rc);
return(0);
}
This was actually example code for a discussion on comp.lang.c.moderated (circa June 2004) related to getline() variants.
At least some confusion reigns. The first format specifier, %10[^\n], reads up to 10 non-newline characters and they are assigned to buffer, along with a trailing null. The second format specifier, %*[^\n], contains the assignment suppression character (*) and reads zero or more remaining non-newline characters from the input. When the scanf() function completes, the input is pointing at the next newline character. The body of the loop reads that character with getchar(), so that when the loop restarts, the input is looking at the start of the next line. The process then repeats. If the line is shorter than 10 characters, then those characters are copied to buffer, and the 'zero or more non-newlines' format processes zero non-newlines.
The constructs like %[a] and %[^a] exist so that scanf() can be used as a kind of lexical analyzer. These are sort of like %s, but instead of collecting a span of as many "stringy" characters as possible, they collect just a span of characters as described by the character class. There might be cases where writing %[a-zA-Z0-9] might make sense, but I'm not sure I see a compelling use case for complementary classes with scanf().
IMHO, scanf() is simply not the right tool for this job. Every time I've set out to use one of its more powerful features, I've ended up eventually ripping it out and implementing the capability in a different way. In some cases that meant using lex to write a real lexical analyzer, but usually doing line at a time I/O and breaking it coarsely into tokens with strtok() before doing value conversion was sufficient.
Edit: I ended up ripping out scanf() typically because when faced with users insisting on providing incorrect input, it just isn't good at helping the program give good feedback about the problem, and having an assembler print "Error, terminated." as its sole helpful error message was not going over well with my user. (Me, in that case.)
It's like character sets from regular expressions; [0-9] matches a string of digits, [^aeiou] matches anything that isn't a lowercase vowel, etc.
There are all sorts of uses, such as pulling out numbers, identifiers, chunks of whitespace, etc.
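A small sketch of that regex-like use: the helper below pulls a run of digits and then a run of "anything but a lowercase vowel" out of a string. The function name, the buffer sizes and the choice to spell the digits out rather than write 0-9 are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: stores the leading digits of `text` in `digits`,
 * then the following run of characters that are not lowercase vowels in
 * `rest`. Returns the number of items assigned (0, 1 or 2), or EOF. */
int split_digits_nonvowels(const char *text, char digits[16], char rest[16])
{
    /* Scansets do not skip whitespace, so a space can end up in `rest`. */
    return sscanf(text, "%15[0123456789]%15[^aeiou]", digits, rest);
}
```

For the input "2024 rhythm and blues", digits receives "2024" and rest receives " rhythm " (spaces included), because the second scanset stops at the a of "and".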
You can read about it in the ISO/IEC 9899 standard, which is available online.
Here is a paragraph I quote from the document about [ (Page 286):
Matches a nonempty sequence of characters from a set of expected
characters.
The conversion specifier includes all subsequent characters in the
format string, up to and including the matching right bracket (]). The
characters between the brackets (the scanlist) compose the scanset,
unless the character after the left bracket is a circumflex (^), in
which case the scanset contains all characters that do not appear in
the scanlist between the circumflex and the right bracket. If the
conversion specifier begins with [] or [^], the right bracket
character is in the scanlist and the next following right bracket
character is the matching right bracket that ends the specification;
otherwise the first following right bracket character is the one that
ends the specification. If a - character is in the scanlist and is not
the first, nor the second where the first character is a ^, nor the
last character, the behavior is implementation-defined.
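The rule about [] at the start of the specifier is easier to see in code. In the sketch below, the ] directly after the opening bracket is a member of the scanset and the second ] ends the specifier; the function name and buffer size are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies the leading run of '[', ']' and digit
 * characters from `text` into `out` (at most 31 characters).
 * Returns 1 on success, 0 otherwise. */
int read_brackets_and_digits(const char *text, char out[32])
{
    /* Scanlist is: ] [ 0 1 2 3 4 5 6 7 8 9
     * The ']' right after "%31[" belongs to the scanlist; the final ']'
     * closes the specifier. */
    return sscanf(text, "%31[][0123456789]", out) == 1;
}
```

Given "[12][3]=x", the helper stores "[12][3]" and stops at the = character.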

reading the remainder of a string with sscanf

I'm trying to read a string which consists of a set of numbers followed by a string, wrapped with some other basic text.
In other words, the format of the line is something like this:
Stuff<5,10,-5,8,"Test string here.">
Naively, I tried:
sscanf(str,"Stuff<%d,%d,%d,%d,\"%s\">",&i1,&i2,&i3,&i4,str2);
But after some research I discovered %s is supposed to stop parsing when it gets to a whitespace character. I found this question, but none of the answers addresses the problem I have: the string could contain any character in it, including newline characters and properly escaped quotes. The latter is not a problem, if I can just get sscanf to put everything after the first quote in the pre-allocated buffer I provide, I can strip the end off myself.
But how do I do this? I can't use %[] because it requires something in it to terminate the string, and the only thing I want to terminate it is the null terminator. So I thought, "Hey, I'll just use the null terminator!" But %[\0] made the compiler grumpy:
warning: no closing ‘]’ for ‘%[’ format
warning: embedded ‘\0’ in format
warning: no closing ‘]’ for ‘%[’ format
warning: embedded ‘\0’ in format
Using something like %*c won't work either, because I don't know exactly how many characters need to be taken. I tried passing strlen(str) since it will be less than that, but sscanf returns 4 and nothing is put into str2, suggesting that perhaps because the length was too long it gave up and didn't bother.
Update: I guess I could do something like:
sscanf(str,"Stuff<%d,%d,%d,%d,\"%n",&i1,&i2,&i3,&i4,&n);
str2 = str+n;
Your update seems to be a good answer. I was going to suggest strchr to find the location of the first quote char, after using sscanf to get i1 thru i4. Side note, you should always check the return value from sscanf to make sure that the conversions worked. This is even more important with your suggested answer, since n will be left uninitialized if the first four conversions aren't successful.
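One way to guard against the uninitialized n is sketched below: n starts at -1 and is only trusted when sscanf reports four conversions and %n was actually reached. The function name parse_prefix and the array parameter are assumptions made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: parses the four integers of a line shaped like
 *   Stuff<5,10,-5,8,"Test string here.">
 * into v[0..3] and returns a pointer to the text after the opening quote,
 * or NULL if the prefix does not match. */
const char *parse_prefix(const char *str, int v[4])
{
    int n = -1;  /* stays -1 unless the whole prefix, quote included, matches */

    /* %n does not count towards the return value, so 4 means "all four
     * integers converted"; n >= 0 additionally means the quote was seen. */
    if (sscanf(str, "Stuff<%d,%d,%d,%d,\"%n",
               &v[0], &v[1], &v[2], &v[3], &n) != 4 || n < 0)
        return NULL;
    return str + n;
}
```

The returned pointer refers into the original string, so the closing "> still has to be stripped by the caller, as the question already planned to do.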
Scan for '\"', then for everything not '\"', then '\"' again.
Be sure to check sscanf() result and limit how long the test string may be.
char test_string[100];
int n = 0;
if (sscanf(str, "Stuff<%d,%d,%d,%d, \"%99[^\"]\"> %n",
&i1, &i2, &i3, &i4, test_string, &n) == 5 && str[n] == '\0') Good();
Your attempt using "...%[\0]...", from sscanf() point-of-view, is "...%[".
Everything in the format from "\0" on is ignored.
Using the int n = 0, appending " %n" to the format string, appending &n to the parameters and checking str[n] == '\0' is a neat trick with sscanf() to ensure the entire line parsed correctly. Note: "%n" does not add to sscanf() result.
This is not the only way to achieve what you want to achieve, but probably the neatest way to do it: You'll need to use the scansets. I won't tell you the solution directly with this answer, I'll explain how to use scansets as far as I know them, and you'll hopefully be able to do it yourself.
Scansets %[...] are like %s when it comes to assignment, they interpret values as characters and store them into character arrays. %s is whitespace-terminated, %[...] is the flexible version of that.
There are two ways of using the scanset, first one being without a preceding caret ^, second one being with a preceding caret ^.
When you use scanset without the preceding caret ^, the characters you put inside the brackets will be the only ones that will be read, stored and then left behind. As soon as scanf encounters a non-matching character, that %[...] will be over. For example:
// input: asdasdasdwasdasd
char s[100] = { 0 };
scanf( "%[das]", s );
printf( "%s", s );
// output: asdasdasd
When you use scanset with the preceding caret ^, the search is inverted. It reads, stores and leaves behind every character until it reaches any one of the characters that you've put down after the preceding caret ^. Example:
// input: abcdefgh^kekQ
char s[100] = { 0 };
scanf( "%[^Q^]", s );
printf( "%s", s );
// output: abcdefgh
Beware: the remaining characters are still waiting to be read from the stream, and the file position won't move past the character that caused termination. I.e. for the first one, getchar( ); would give a 'w', and for the second one it would give a '^'.
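The same behaviour can be checked on a string rather than a stream. The sketch below uses sscanf with %n to find how far each scanset got and returns the first unread character; the function names and the field width of 99 are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: applies "%[das]" to `text` and returns the first
 * character left unread ('\0' if nothing matched or all was consumed). */
char unread_after_set(const char *text, char out[100])
{
    int n = 0;
    if (sscanf(text, "%99[das]%n", out, &n) != 1)
        return '\0';
    return text[n];   /* n = number of characters consumed so far */
}

/* Hypothetical helper: same idea for the negated scanset "%[^Q^]". */
char unread_after_negated_set(const char *text, char out[100])
{
    int n = 0;
    if (sscanf(text, "%99[^Q^]%n", out, &n) != 1)
        return '\0';
    return text[n];
}
```

With the two inputs used above, the first helper returns 'w' and the second returns '^', which are the characters a following getchar() would see on a stream.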
I hope this will be enough. If you still cannot find your way out, ask away, I can give you a solution.
