C programming language (scanf) - c

I have read strings with spaces in them using the following scanf() statement.
scanf("%[^\n]", &stringVariableName);
What is the meaning of the control string [^\n]?
Is is okay way to read strings with white space like this?

This mean "read anything until you find a '\n'"
This is OK, but would be better to do this "read anything until you find a '\n', or read more characters than my buffer support"
char stringVariableName[256] = {}
if (scanf("%255[^\n]", stringVariableName) == 1)
...
Edit: removed & from the argument, and check the result of scanf.

The format specifier "%[^\n]" instructs scanf() to read up to but not including the newline character. From the linked reference page:
matches a non-empty sequence of character from set of characters.
If the first character of the set is ^, then all characters not
in the set are matched. If the set begins with ] or ^] then the ]
character is also included into the set.
If the string is on a single line, fgets() is an alternative but the newline must be removed as fgets() writes it to the output buffer. fgets() also forces the programmer to specify the maximum number of characters that can be read into the buffer, making it less likely for a buffer overrun to occur:
char buffer[1024];
if (fgets(buffer, 1024, stdin))
{
/* Remove newline. */
char* nl = strrchr(buffer, '\n');
if (nl) *nl = '\0';
}
It is possible to specify the maximum number of characters to read via scanf():
scanf("%1023[^\n]", buffer);
but it is impossible to forget to do it for fgets() as the compiler will complain. Though, of course, the programmer could specify the wrong size but at least they are forced to consider it.

Technically, this can't be well defined.
Matches a nonempty sequence of characters from a set of expected
characters (the scanset).
If no l length modifier is present, the corresponding argument shall
be a pointer to the initial element of a character array large enough
to accept the sequence and a terminating null character, which will be
added automatically.
Supposing the declaration of stringVariableName looks like char stringVariableName[x];, then &stringVariableName is a char (*)[x];, not a char *. The type is wrong. The behaviour is undefined. It might work by coincidence, but anything that relies on coincidence doesn't work by my definition.
The only way to form a char * using &stringVariableName is if stringVariableName is a char! This implies that the character array is only large enough to accept a terminating null character. In the event where the user enters one or more characters before pressing enter, scanf would be writing beyond the end of the character array and invoking undefined behaviour. In the event where the user merely presses enter, the %[...] directive will fail and not even a '\0' will be written to your character array.
Now, with that all said and done, I'll assume you meant this: scanf("%[^\n]", stringVariableName); (note the omitted ampersand)
You really should be checking the return value!!
A %[ directive causes scanf to retrieve a sequence of characters consisting of those specified between the [ square brackets ]. A ^ at the beginning of the set indicates that the desired set contains all characters except for those between the brackets. Hence, %[^\n] tells scanf to read as many non-'\n' characters as it can, and store them into the array pointed to by the corresponding char *.
The '\n' will be left unread. This could cause problems. An empty field will result in a match failure. In this situation, it's possible that no data will be copied into your array (not even a terminating '\0' character). For this reason (and others), you really need to check the return value!
Which manual contains information about the return values of scanf? The scanf manual.

Other people have explained what %[^\n] means.
This is not an okay way to read strings. It is just as dangerous as the notoriously unsafe gets, and for the same reason: it has no idea how big the buffer at stringVariableName is.
The best way to read one full line from a file is getline, but not all C libraries have it. If you don't, you should use fgets, which knows how big the buffer is, and be aware that you might not get a complete line (if the line is too long for the buffer).

Reading from the man pages for scanf()...
[ Matches a non-empty sequence of characters from the
specified set of accepted characters; the next pointer must be a
pointer to char, and there must be enough room for all the characters
in the string, plus a terminating null byte. The usual skip of
leading white space is suppressed. The string is to be made up of
characters in (or not in) a particular set; the set is defined by the
characters between the open bracket [ character and a close bracket ]
character. The set excludes those characters if the first character
after the open bracket is a circumflex (^). To include a close
bracket in the set, make it the first character after the open bracket
or the circumflex; any other position will end the set. The hyphen
character - is also special; when placed between two other
characters, it adds all intervening characters to the set. To
include a hyphen, make it the last character before the final close
bracket. For instance, [^]0-9-] means the set "everything except
close bracket, zero through nine, and hyphen". The string ends with
the appearance of a character not in the (or, with a
circumflex, in) set or when the field width runs out.
In a nutshell, the [^\n] means that read everything from the string that is not a \n and store that in the matching pointer in the argument list.

Related

C - format specifier for scanf?

float lat, lon;
char info[50];
scanf("%f, %f, %49[^\n]", &lat, &lon, info);
In the above snippet, what kind of format specifier is %49[^\n].
I do understand that it is the format specifier for the character array which is going to accept input upto 49 characters (+ the sentinal \0), and [^\n] looks like its a regex (although I had read somewhere that scanf doesn't support regex) OR a character set which is to expand to "any other character" that is NOT "newline" \n. Am I correct?
Also, why is there no s in the format specifier for writing into array info?
The program this snippet is from works. But is this good C style?
The specifier %[ is a different conversion specifier from %s, even if it also must be paired with an argument of type char * (or wchar_t *). See e.g. the table here
[set] matches a non-empty sequence of character from set of characters.
If the first character of the set is ^, then all characters not in the set are matched. If the set begins with ] or ^] then the ] character is also included into the set. It is implementation-defined whether the character - in the non-initial position in the scanset may be indicating a range, as in [0-9]. If width specifier is used, matches only up to width. Always stores a null character in addition to the characters matched (so the argument array must have room for at least width+1 characters)
My apologies, I incorrectly answered below. If you can skip to the end, I'll give you the correct answer.
*** Incorrect Answer Begins ***
It would not be a proper format specifier, as there is no type.
%[parameter][flags][width][.precision][length]type
are the rules for a format statement. As youc an see, the type is non-optional. The author of this format item is thinking they can combine regex with printf, when the two have entirely different processing rules (and printf doesn't follow regex's patterns)
*** Correct Answer Begins ***
scanf uses different format string rules than printf Within scanf's man page is this addition to printf's rules
[
Matches a nonempty sequence of characters from the specified set
of accepted characters; the next pointer must be a pointer to char,
and there must be enough room for all the characters in the string,
plus a terminating null byte. The usual skip of leading white space is
suppressed. The string is to be made up of characters in (or not in) a
particular set; the set is defined by the characters between the open
bracket [ character and a close bracket ] character. The set excludes
those characters if the first character after the open bracket is a
circumflex (^). To include a close bracket in the set, make it the
first character after the open bracket or the circumflex; any other
position will end the set. The hyphen character - is also special;
when placed between two other characters, it adds all intervening
characters to the set. To include a hyphen, make it the last character
before the final close bracket. For instance, [^]0-9-] means the set
"everything except close bracket, zero through nine, and hyphen". The
string ends with the appearance of a character not in the (or, with a
circumflex, in) set or when the field width runs out.
Which basically means that scanf can scan with a subset of regex's rules (the character set subset) but not all of regex's rules

What is [^\n] in C? [duplicate]

I have run into some code and was wondering what the original developer was up to. Below is a simplified program using this pattern:
#include <stdio.h>
int main() {
char title[80] = "mytitle";
char title2[80] = "mayataiatale";
char mystring[80];
/* hugh ? */
sscanf(title,"%[^a]",mystring);
printf("%s\n",mystring); /* Output is "mytitle" */
/* hugh ? */
sscanf(title2,"%[^a]",mystring); /* Output is "m" */
printf("%s\n",mystring);
return 0;
}
The man page for scanf has relevant information, but I'm having trouble reading it. What is the purpose of using this sort of notation? What is it trying to accomplish?
The main reason for the character classes is so that the %s notation stops at the first white space character, even if you specify field lengths, and you quite often don't want it to. In that case, the character class notation can be extremely helpful.
Consider this code to read a line of up to 10 characters, discarding any excess, but keeping spaces:
#include <ctype.h>
#include <stdio.h>
int main(void)
{
char buffer[10+1] = "";
int rc;
while ((rc = scanf("%10[^\n]%*[^\n]", buffer)) >= 0)
{
int c = getchar();
printf("rc = %d\n", rc);
if (rc >= 0)
printf("buffer = <<%s>>\n", buffer);
buffer[0] = '\0';
}
printf("rc = %d\n", rc);
return(0);
}
This was actually example code for a discussion on comp.lang.c.moderated (circa June 2004) related to getline() variants.
At least some confusion reigns. The first format specifier, %10[^\n], reads up to 10 non-newline characters and they are assigned to buffer, along with a trailing null. The second format specifier, %*[^\n] contains the assignment suppression character (*) and reads zero or more remaining non-newline characters from the input. When the scanf() function completes, the input is pointing at the next newline character. The body of the loop reads and prints that character, so that when the loop restarts, the input is looking at the start of the next line. The process then repeats. If the line is shorter than 10 characters, then those characters are copied to buffer, and the 'zero or more non-newlines' format processes zero non-newlines.
The constructs like %[a] and %[^a] exist so that scanf() can be used as a kind of lexical analyzer. These are sort of like %s, but instead of collecting a span of as many "stringy" characters as possible, they collect just a span of characters as described by the character class. There might be cases where writing %[a-zA-Z0-9] might make sense, but I'm not sure I see a compelling use case for complementary classes with scanf().
IMHO, scanf() is simply not the right tool for this job. Every time I've set out to use one of its more powerful features, I've ended up eventually ripping it out and implementing the capability in a different way. In some cases that meant using lex to write a real lexical analyzer, but usually doing line at a time I/O and breaking it coarsely into tokens with strtok() before doing value conversion was sufficient.
Edit: I ended ripping out scanf() typically because when faced with users insisting on providing incorrect input, it just isn't good at helping the program give good feedback about the problem, and having an assembler print "Error, terminated." as its sole helpful error message was not going over well with my user. (Me, in that case.)
It's like character sets from regular expressions; [0-9] matches a string of digits, [^aeiou] matches anything that isn't a lowercase vowel, etc.
There are all sorts of uses, such as pulling out numbers, identifiers, chunks of whitespace, etc.
You can read about it in the ISO/IEC9899 standard available online.
Here is a paragraph I quote from the document about [ (Page 286):
Matches a nonempty sequence of characters from a set of expected
characters.
The conversion specifier includes all subsequent characters in the
format string, up to and including the matching right bracket (]). The
characters between the brackets (the scanlist) compose the scanset,
unless the character after the left bracket is a circumflex (^), in
which case the scanset contains all characters that do not appear in
the scanlist between the circumflex and the right bracket. If the
conversion specifier begins with [] or [^], the right bracket
character is in the scanlist and the next following right bracket
character is the matching right bracket that ends the specification;
otherwise the first following right bracket character is the one that
ends the specification. If a - character is in the scanlist and is not
the first, nor the second where the first character is a ^, nor the
last character, the behavior is implementation-defined.

How To Read in Strings that only Contain Alphabet letters with fscanf?

I have been struggling to figure out the fscanf formatting. I just want to read in a file of words delimited by spaces. And I want to discard any strings that contain non-alphabetic characters.
char temp_text[100];
while(fscanf(fcorpus, "%101[a-zA-Z]s", temp_text) == 1) {
printf("%s\n", temp_text);
}
I've tried the above code both with and without the 's'. I read in another stackoverflow thread that the s when used like that will be interpreted as a literal 's' and not as a string. Either way - when I include the s and when I do not include the s - I can only get the first word from the file I am reading through to print out.
The %[ scan specifier does not skip leading spaces. Either add a space before it or at the end in place of your s. Also you have your 100 and 101 backwards and thus a serious buffer overflow bug.
The s isn't needed.
Here are a few things to try:
Print out the return value from fscanf, and make sure it is 1.
Make sure that the fscanf is consuming the whitespace by using fgetc to get the next character and printing it out.

How can I read a string with spaces in it in C?

scanf("%s",str) won't do it. It will stop reading at the first space.
gets(str) doesn't work either when the string is large. Any ideas?
use fgets with STDIN as the file stream. Then you can specify the amount of data you want to read and where to put it.
char str[100];
Try this
scanf("%[^\n]s",str);
or this
fgets(str, sizeof str, stdin))
Create your own function to read a line. Here's what you basically have to do:
1. fgets into allocated (growable) memory
2. if it was a full line you're done
3. grow the array
4. fgets more characters into the newly allocated memory
5. goto 2.
The implementation may be a bit tricky :-)
You need to think about what you need to pass to your function (at the very least the address of the array and its size); and what the function returns when everything "works" or when there is an error. You need to decide what is an error (is a string 10Gbytes long with no '\n' an error?). You need to decide on how to grow the array.
Edit
Actually it may be better to fgetc rather than fgets
get a character
it it EOF? DONE
add to array (update length), possible growing it (update size)
is it '\n'? DONE
repeat
When do you want to stop reading? At EOF, at a specific character, or what?
You can read a specific number of characters with %c
c Matches a sequence of width
count characters (default 1); the next
pointer must be a pointer to char, and there must be enough room
for all the characters (no terminating NUL is added). The usual
skip of leading white space is suppressed. To skip white space
first, use an explicit space in the format.
You can read specific characters (or up to excluded ones) with %[
[ Matches a nonempty sequence of
characters from the specified set of
accepted characters; the next pointer must be a pointer to
char,
and there must be enough room for all the characters in the
string,
plus a terminating NUL character. The usual skip of leading
white
space is suppressed. The string is to be made up of characters
in
(or not in) a particular set; the set is defined by the
characters
between the open bracket [ character and a close bracket ]
charac-
ter. The set excludes those characters if the first
character
after the open bracket is a circumflex ^. To include a close
bracket in the set, make it the first character after the open
bracket or the circumflex; any other position will end the set.
The hyphen character - is also special; when placed between two
other characters, it adds all intervening characters to the set.
To include a hyphen, make it the last character before the final
close bracket. For instance, `[^]0-9-]' means the set
``everything
except close bracket, zero through nine, and hyphen''. The
string
ends with the appearance of a character not in the (or, with a
cir-
cumflex, in) set or when the field width runs out
To read string with space you can do as follows:
char name[30],ch;
i=1;
while((ch=getchar())!='\n')
{
name[i]=ch;
i++;
}
i++;
name[i]='\n';
printf("String is %s",name);

What is the purpose of using the [^ notation in scanf?

I have run into some code and was wondering what the original developer was up to. Below is a simplified program using this pattern:
#include <stdio.h>
int main() {
char title[80] = "mytitle";
char title2[80] = "mayataiatale";
char mystring[80];
/* hugh ? */
sscanf(title,"%[^a]",mystring);
printf("%s\n",mystring); /* Output is "mytitle" */
/* hugh ? */
sscanf(title2,"%[^a]",mystring); /* Output is "m" */
printf("%s\n",mystring);
return 0;
}
The man page for scanf has relevant information, but I'm having trouble reading it. What is the purpose of using this sort of notation? What is it trying to accomplish?
The main reason for the character classes is so that the %s notation stops at the first white space character, even if you specify field lengths, and you quite often don't want it to. In that case, the character class notation can be extremely helpful.
Consider this code to read a line of up to 10 characters, discarding any excess, but keeping spaces:
#include <ctype.h>
#include <stdio.h>
int main(void)
{
char buffer[10+1] = "";
int rc;
while ((rc = scanf("%10[^\n]%*[^\n]", buffer)) >= 0)
{
int c = getchar();
printf("rc = %d\n", rc);
if (rc >= 0)
printf("buffer = <<%s>>\n", buffer);
buffer[0] = '\0';
}
printf("rc = %d\n", rc);
return(0);
}
This was actually example code for a discussion on comp.lang.c.moderated (circa June 2004) related to getline() variants.
At least some confusion reigns. The first format specifier, %10[^\n], reads up to 10 non-newline characters and they are assigned to buffer, along with a trailing null. The second format specifier, %*[^\n] contains the assignment suppression character (*) and reads zero or more remaining non-newline characters from the input. When the scanf() function completes, the input is pointing at the next newline character. The body of the loop reads and prints that character, so that when the loop restarts, the input is looking at the start of the next line. The process then repeats. If the line is shorter than 10 characters, then those characters are copied to buffer, and the 'zero or more non-newlines' format processes zero non-newlines.
The constructs like %[a] and %[^a] exist so that scanf() can be used as a kind of lexical analyzer. These are sort of like %s, but instead of collecting a span of as many "stringy" characters as possible, they collect just a span of characters as described by the character class. There might be cases where writing %[a-zA-Z0-9] might make sense, but I'm not sure I see a compelling use case for complementary classes with scanf().
IMHO, scanf() is simply not the right tool for this job. Every time I've set out to use one of its more powerful features, I've ended up eventually ripping it out and implementing the capability in a different way. In some cases that meant using lex to write a real lexical analyzer, but usually doing line at a time I/O and breaking it coarsely into tokens with strtok() before doing value conversion was sufficient.
Edit: I ended ripping out scanf() typically because when faced with users insisting on providing incorrect input, it just isn't good at helping the program give good feedback about the problem, and having an assembler print "Error, terminated." as its sole helpful error message was not going over well with my user. (Me, in that case.)
It's like character sets from regular expressions; [0-9] matches a string of digits, [^aeiou] matches anything that isn't a lowercase vowel, etc.
There are all sorts of uses, such as pulling out numbers, identifiers, chunks of whitespace, etc.
You can read about it in the ISO/IEC9899 standard available online.
Here is a paragraph I quote from the document about [ (Page 286):
Matches a nonempty sequence of characters from a set of expected
characters.
The conversion specifier includes all subsequent characters in the
format string, up to and including the matching right bracket (]). The
characters between the brackets (the scanlist) compose the scanset,
unless the character after the left bracket is a circumflex (^), in
which case the scanset contains all characters that do not appear in
the scanlist between the circumflex and the right bracket. If the
conversion specifier begins with [] or [^], the right bracket
character is in the scanlist and the next following right bracket
character is the matching right bracket that ends the specification;
otherwise the first following right bracket character is the one that
ends the specification. If a - character is in the scanlist and is not
the first, nor the second where the first character is a ^, nor the
last character, the behavior is implementation-defined.

Resources