sscanf function usage in c - c

I'm trying to parse a string in the format xxxxxx(xxxxx) using sscanf as follows:
sscanf(command, "%s(%s)", part1, part2)
but it seems that sscanf does not support this format; as a result, part1 contains the whole string.
If anyone has experience with this, please share.
Thank you

Converting your code into a program:
#include <stdio.h>
int main(void)
{
char part1[32];
char part2[32];
char command[32] = "xxxxx(yyyy)";
int n;
if ((n = sscanf(command, "%s(%s)", part1, part2)) != 2)
printf("Problem! n = %d\n", n);
else
printf("Part1 = <<%s>>; Part2 = <<%s>>\n", part1, part2);
return 0;
}
When run, it produces 'Problem! n = 1'.
This is because the first %s conversion specifier skips leading white space and then scans for 'non white-space' characters up to the next white space character (or, in this case, end of string).
You would need to use (negated) character classes or scansets to get the result you want:
#include <stdio.h>
int main(void)
{
char part1[32];
char part2[32];
char command[32] = "xxxxx(yyyy)";
int n;
if ((n = sscanf(command, "%31[^(](%31[^)])", part1, part2)) != 2)
printf("Problem! n = %d\n", n);
else
printf("Part1 = <<%s>>; Part2 = <<%s>>\n", part1, part2);
return 0;
}
This produces:
Part1 = <<xxxxx>>; Part2 = <<yyyy>>
Note the 31's in the format; they prevent overflows.
I'm wondering how %31 works. Does it behave like %s and also prevent overflow, or does it only prevent overflow?
With the given data, these two lines are equivalent and both safe enough:
if ((n = sscanf(command, "%31[^(](%31[^)])", part1, part2)) != 2)
if ((n = sscanf(command, "%[^(](%[^)])", part1, part2)) != 2)
The %[...] notation is a conversion specification; so is %31[...].
The C standard says:
Each conversion specification is introduced by the character %.
After the %, the following appear in sequence:
An optional assignment-suppressing character *.
An optional decimal integer greater than zero that specifies the maximum field width
(in characters).
An optional length modifier that specifies the size of the receiving object.
A conversion specifier character that specifies the type of conversion to be applied.
The 31 is an example of the (optional) maximum field width. The [...] part is a scanset, which could perhaps be regarded as a special case of the s conversion specifier. The %s conversion specifier is approximately equivalent to %[^ \t\n].
The 31 is one less than the size of each array; the null at the end is not counted in the field width. Since part1 and part2 are each an array of 32 char, the %31[^(] and %31[^)] conversion specifications prevent buffer overflows. If the first string of characters was more than 31 characters before the (, you'd get a return value of 1 because of a mismatch on the literal open parenthesis. Similarly, the second string would be limited to 31 characters, but you'd not easily be able to tell whether the ) was in the correct place or not.

If you know exactly how long the parts of your "command" are, then the simplest option is:
sscanf(command, "%6s(%5s)", part1, part2);
This assumes that 'part1' is always 6 characters long and 'part2' is always 5 characters long (as in your code sample).

Try this instead:
#include <stdio.h>
int main(void)
{
char str1[20];
char str2[20];
sscanf("Hello(World!)", "%[^(](%[^)])", str1, str2);
printf("str1=\"%s\", str2=\"%s\"\n", str1, str2);
return 0;
}
Output (ideone):
str1="Hello", str2="World!"

Related

Detecting mismatches against constants in scanf format string

From the man page of scanf:
A directive is one of the following:
A sequence of white-space characters (space, tab, newline, etc.; see isspace(3)). This directive matches any amount of
white space, including none, in the input.
An ordinary character (i.e., one other than white space or '%'). This character must exactly match the next character of
input. (emphasis mine)
A conversion specification, which commences with a '%' (percent) character. A sequence of characters from the input is
converted according to this specification, and the result is placed in the corresponding pointer
argument. If the next item of input does not match the conversion specification, the conversion fails—this is a matching
failure.
Now, consider the following code:
#include <stdio.h>
int main(void)
{
const char* fmt = "A %49s B";
char buf[50];
printf("%d\n", sscanf("A foo B", fmt, buf)); // 1
printf("%d\n", sscanf("blah blaaah blah", fmt, buf)); // 0
printf("%d\n", sscanf("A blah blah", fmt, buf)); // 1
return 0;
}
Lines 1 and 3 print 1 because matching "A" with "A" succeeds, as does matching "foo"/"blah" with %s. Line 2 prints 0 because "A" cannot be matched with "blah", so parsing stops there.
This is all fine and logical, but is there any way for me to detect that a matching failure occurred after all conversion specifications have been successfully matched and assigned? In that case, the value returned by scanf will be the number of conversion specifiers in my format string, so I can't use it to tell if matching succeeded till the very end.
In other words: the string fed to sscanf in line 3 is not "valid" in the sense that it's not in the format A [something] B. Can I use scanf to detect this, or is strtok my only option?
Employ a " %n" at the end of the format.
Directives:
" " scans 0 or more white-space. It does not fail.
"%n" saves the count of the number of characters parsed so far (as an int). It does not fail.
Set n to 0 and test to see that it changed. The change only happens if the entire preceding format succeeded. Also test that the scan ended on a null character, thus detecting unwanted trailing text.
The added " ", though optional, is very useful, as trailing white space (often a '\n') is typically not offensive. It removes the need to preprocess a scanned line of text to strip its line ending.
#include <stdio.h>
void test(const char *s) {
const char* fmt = "A %49s B %n";
char buf[50];
int n = 0;
int cnt = sscanf(s, fmt, buf, &n);
int success = n > 0 && s[n] == '\0';
printf("sscanf():%2d n:%2d success:%d '%s'\n", cnt, n, success, s);
}
int main(void) {
test("A foo B");
test("blah blaaah blah");
test("A blah blah");
test("A foo B ");
test("A foo B x");
test("");
return 0;
}
Output
sscanf(): 1 n: 7 success:1 'A foo B'
sscanf(): 0 n: 0 success:0 'blah blaaah blah'
sscanf(): 1 n: 0 success:0 'A blah blah'
sscanf(): 1 n: 8 success:1 'A foo B '
sscanf(): 1 n: 8 success:0 'A foo B x'
sscanf():-1 n: 0 success:0 ''
Note that success is determined from n (and the character at s[n]) alone, not from the return value. On lack of success, the destination scanned variables like buf should not be used. If a partial result is needed, then use the return value of sscanf().
If you want to parse more complex input, use a proper parser/lexer. Otherwise, have a look at the %n conversion specifier:
No input is consumed. The corresponding argument shall be a pointer to signed integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function. Execution of a %n directive does not increment the assignment count returned at the completion of execution of the fscanf function. No argument is converted, but one is consumed. If the conversion specification includes an assignment-suppressing character or a field width, the behavior is undefined.
You can use this multiple times: after the last variable conversion and one at the end.
For OP's use case, regex can equally be used to match the pattern.
/* see http://linux.die.net/man/3/regex */
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int regexp(const char *);
int main(int argc, char **argv){
printf("%d\n", regexp("A foo B"));
printf("%d\n", regexp("blah blaaah blah"));
printf("%d\n", regexp("A blah blah"));
return EXIT_SUCCESS;
}
int regexp(const char *input_str){
char msgbuf[100]; /* receives the regerror() message below */
regex_t regex;
int rcval;
/* compile regexp - see http://linux.die.net/man/3/regcomp */
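/* [[:space:]] is the portable POSIX class; \s is an extension that regcomp() is not required to support */
rcval = regcomp(&regex, "^A[[:space:]].*[[:space:]]B$", 0);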
if (rcval) {
fprintf(stderr, "Could not compile regex\n");
return -1;
}
/* execute regexp - see http://linux.die.net/man/3/regexec */
rcval = regexec(&regex, input_str, 0, NULL, 0);
if (!rcval) {
fprintf(stdout, "Match\n");
regfree(&regex);
return 1;
}else{
if (rcval == REG_NOMATCH) {
fprintf(stdout, "No match\n");
regfree(&regex);
return 0;
}else{
regerror(rcval, &regex, msgbuf, sizeof(msgbuf));
fprintf(stderr, "Regex match failed: %s\n", msgbuf);
regfree(&regex);
return -1;
}
}
return 0; // default to no match
}
If you are only interested in whether the entire format has been scanned, but do not care about trailing text in the stream, you can use this terse method:
int success = 0;
fscanf(stream, "A %49s B%n", buf, &success);
if(!success) handleError();
The trick is that the "%n" conversion does not happen unless the entire format is matched successfully. Since any matching of the format will consume one or more characters from the stream, the value that the "%n" writes will always be non-zero. Thus, success will be set to a truthy value if and only if the entire format was matched.

My function goes over the length of string

I am trying to make a function that compares all the letters of the alphabet to the string I enter and prints the letters I didn't use. But when I print those letters, it runs past the end and gives me random symbols. Here is a link to the function, how I call it, and the result: http://imgur.com/WJRZvqD,U6Z861j,PXCQa4V#0
Here is the code: (http://pastebin.com/fCyzFVAF)
void getAvailableLetters(char lettersGuessed[], char availableLetters[])
{
char alphabet[]={'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};
int LG,LG2,LA=0;
for (LG=0;LG<=strlen(alphabet)-1;LG++)
{
for(LG2=0;LG2<=strlen(lettersGuessed)-1;LG2++)
{
if (alphabet[LG]==lettersGuessed[LG2])
{
break;
}
else if(alphabet[LG]!=lettersGuessed[LG2] &&LG2==strlen(lettersGuessed)-1)
{
availableLetters[LA]=alphabet[LG];
LA++;
}
}
}
}
Here is the program that calls the function:
#include <stdio.h>
#include <string.h>
#include "hangman.c"
int main()
{
int i = 0;
char result[30];
char text[30];
scanf("%s", text);
while(i != strlen(text))
{
i++;
}
getAvailableLetters(text, result);
printf("%s\n", result);
printf ("%d", i);
printf ("\n");
}
Here is result when I typed in abcd: efghijklmnopqrstuvwxyzUw▒ˉ
If you want to print result as a string, you need to include a terminating null at the end of it (that's how printf knows when to stop).
For %s, printf stops printing when it reaches a null character '\0', because %s expects the string to be null-terminated. result is not null-terminated, and that's why you get random symbols at the end.
Just add availableLetters[LA] = '\0'; as the last line of the function getAvailableLetters.
http://pastebin.com/fCyzFVAF
Make sure your string is null-terminated (i.e. has a '\0' character at the end). That also implies ensuring the buffer that holds the string is large enough to contain the null terminator.
Sometimes you think you have a null-terminated string, but the write overflowed the buffer and the null terminator was lost. That is why, for any function that writes into a buffer, you should use the form that lets you explicitly limit the length, for example snprintf() instead of sprintf() (not applicable in this case), so that an overflow cannot be turned into an exploit.
char alphabet[]={'a','b','c', ... ,'x','y','z'}; is not a string. It is simply an "array 26 of char".
In C, "A string is a contiguous sequence of characters terminated by and including the first null character. ...". C11 §7.1.1 1
strlen(alphabet) expects a string. Since the code did not provide a string, the behavior is undefined.
To fix, ensure alphabet is a string.
char alphabet[]={'a','b','c', ... ,'x','y','z', 0};
// or
char alphabet[]={"abc...xyz"}; // compiler appends a \0
Now alphabet is "array 27 of char" and also a string.
2nd issue: for(LG2=0;LG2<=strlen(lettersGuessed)-1;LG2++) has 2 problems.
1) Each time through the loop, code recalculates the length of the string. Better to calculate the string length once since the string length does not change within the loop.
size_t len = strlen(lettersGuessed);
for (LG2 = 0; LG2 <= len - 1; LG2++)
2) strlen() returns the type size_t. This is some unsigned integer type. Should lettersGuessed have a length of 0 (it might have been ""), the string length - 1 is not -1, but some very large number as unsigned arithmetic "wraps around" and the loop may never stop. A simple solution follows. This solution would only fail if the length of the string exceeded INT_MAX.
int len = (int) strlen(lettersGuessed);
for (LG2 = 0; LG2 <= len - 1; LG2++)
A solution without this limitation would use size_t throughout.
size_t LG2;
size_t len = strlen(lettersGuessed);
for (LG2 = 0; LG2 < len; LG2++)

Printing specific character from a string in C

I'm working on printing the prefixes of a string; for example, com should give me c co com.
I know I can print the first five characters with printf("%.5s",string). I want to do this in a loop, so instead of 5, how can I use i, an incrementing value? Something like printf("%.is",string). How can I achieve this?
In printf format specifiers, all field widths (before the dot) and precisions (after the dot) can be given as asterisk *. For each asterisk, there must be one additional int argument before the printed object.
So, for your problem:
printf("%.*s", i, string);
Note that the additional parameter must be an int, so if you have another integer type, you should cast it:
size_t len = strlen(string);
if (len > 2) printf("%.*s", (int) (len - 2), string);
This is the simplest way of achieving what you want.
printf("%.*s\n", i, string);
If you want to generate the format string, you can do it too
char format[100]; /* the size should be estimated by you */
snprintf(format, sizeof(format), "%%.%ds", i);
printf(format, string);
Check the snprintf() return value to ensure that the string was not truncated. If you choose a reasonable size for the format string, truncation is unlikely, but you should check anyway.
Above, the format specifier means
A literal "%"
Then a "."
Then the integer "%d"
Then the letter "s"
so the resulting string will be the format string you need to pass to printf().
Try this:
char s[] = "com";
for(size_t i = 1; i <= strlen(s); i++)
{
for(int j = 0; j < i; j++)
printf("%c", s[j]);
printf(" ");
}

Input/Output scanset in c

#include<stdio.h>
int main()
{
char str[50]={'\0'};
scanf("%[A-Z]s",str);
printf("%s",str);
return 0;
}
1)
Input:
helloWORLD
output:
2)
Input:
HELLoworlD
output:
HELL
In case 1, I expected the output to be "WORLD", but it didn't give any output.
From case 2, I understood that this works only if the first few characters are in upper case.
Can you please explain how it actually works?
Interpretation of scansets
When it is given helloWORLD, the conversion specification %[A-Z] fails immediately because the h is not an upper-case letter. Therefore, scanf() returns 0, indicating that it did not successfully convert anything. If you tested the return value, you'd know that.
When it is given HELLoworlD, the scanset matches the HELL and stops at the first o. The format string also attempts to match a literal s, but there's no way for scanf() to report that it fails to match that after matching HELL.
Buffer overflow
Note that %[A-Z] is in general dangerous (as is %s) because there is no constraint on the number of characters read. If you have:
char str[50];
then you should use:
if (scanf("%49[A-Z]", str) != 1)
...some problem in the scan...
Also note that there is a 'difference by one' between the declared length of str and the number in the format string. This is awkward; there's no way to provide that number as an argument to scanf() separate from the format string (unlike printf()), so you may end up creating the format string on the fly:
int scan_upper(char *buffer, size_t buflen)
{
char format[16];
if (buflen < 2)
return EOF; // Or other error indication
snprintf(format, sizeof(format), "%%%zu[A-Z]", buflen-1); // Check this too!?
return scanf(format, buffer);
}
When you do
scanf("%[A-Z]s",str);
It takes input as long as you enter upper-case letters.
And since you set all the array to '\0', printf() will stop printing when it meets one.
Therefore, the first output is blank, and the second prints only the leading run of upper-case letters.

What does the n stand for in `sscanf(s, "%d %n", &i, &n)`?

The man page states that the signature of sscanf is
sscanf(const char *restrict s, const char *restrict format, ...);
I have seen an answer on SO where a function in which sscanf is used like this to check if an input was an integer.
bool is_int(char const* s) {
int n;
int i;
return sscanf(s, "%d %n", &i, &n) == 1 && !s[n];
}
Looking at !s[n] it seems to suggest that we check if sscanf scanned the character sequence until the termination character \0. So I assume n stands for the index where sscanf will be in the string s when the function ends.
But what about the variable i? What does it mean?
Edit:
To be more explicit: I see that the signature of sscanf wants a pointer of type char * as the first parameter, a format string as the second parameter so it knows how to parse the character sequence, and as many variables as there are conversion specifiers as the following parameters. I understand now that i is for holding the parsed integer.
Since there is only one format specifier, I tried to deduce the function of n.
Is my assumption above for n correct?
Looks like the op has his answer already, but since I bothered to look this up for myself and run the code...
From "C The Pocket Reference" (2nd Ed by Herbert Shildt) scanf() section:
%n Receives an integer of value equal to the number of characters read so far
and for the return value:
The scanf() function returns a number equal to the number of fields
that were successfully assigned values
The sscanf() function works the same; it just takes its input from the supplied buffer argument (s in this case). The "== 1" test makes sure that exactly one integer was parsed, and the !s[n] makes sure that nothing but white space follows the parsed integer, so that there's really only one integer in the string.
Running this code, an s value like "32" gives a "true" value (we don't have bool defined as a type on our system), but s as "3 2" gives a "false" value because s[n] in that case is '2' and n has the value 2 ("3 " is consumed in that case). If s is " 3 ", this function still returns true, as all that white space is ignored and n has the value 3.
Another example input, "3m", gives a "false" value as you'd expect.
Verbatim from sscanf()'s man page:
Conversions
[...]
n
Nothing is expected; instead, the number of characters consumed thus far from the input is stored through the next pointer, which must be a pointer to int. This is not a conversion, although it can be suppressed with the * assignment-suppression character. The C standard says: "Execution of a %n directive does not increment the assignment count returned at the completion of execution" but the Corrigendum seems to contradict this. Probably it is wise not to make any assumptions on the effect of %n conversions on the return value.
I would like to point out that the original code leaves n uninitialized, which looks like a bug but is safe as written:
bool is_int(char const* s) {
int n;
int i;
return sscanf(s, "%d %n", &i, &n) == 1 && !s[n];
}
I will explain why. And I will interpret the sscanf format string.
First, the uninitialized n:
Given input "1", which is the integer one, sscanf will store 1 into i. The white-space directive that follows matches any amount of white space, including none, and never fails, so sscanf goes on to %n and stores 1 into n. Because %n does not count as an assignment, the value returned by sscanf will be 1, meaning 1 field scanned. Since sscanf returns 1, the part of the expression
sscanf(s, "%d %n", &i, &n) == 1
will be true. Therefore the other part of the && expression will execute, and s[1] is the terminating null, so the function returns true. n is left untouched only when %d itself fails; in that case sscanf returns 0 or EOF, the && short-circuits, and s[n] is never read.
Interpreting the format:
"%d %n"
%d skips leading white space and then scans an optionally signed decimal integer; it does not accept a decimal fraction or scientific notation. The space matches zero or more white-space characters, that is, a space, \n, \t, and certain other non-printable characters. %n then sets n to the number of characters consumed up to that point, including any trailing white space.
If text after the integer does not matter, a simpler check is:
#include <stdbool.h>
#include <stdio.h>
static bool is_int(char const* s)
{
int i;
int fld;
return (fld = sscanf(s, "%i", &i)) == 1;
}
int main(int argc, char * argv[])
{
bool ans = false;
ans = is_int("1");
ans = is_int("m");
return 0;
}
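This code relies on the return value: if s begins with an integer, then sscanf will scan it and fld will be exactly one. If it does not, then fld will be zero or EOF (typically -1): zero if something else is there, like a word, and EOF if there is nothing but an empty string. Unlike the function in the question, it does not reject text after the integer, so "3m" is accepted, and %i also accepts octal and hexadecimal prefixes.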
The variable i receives the integer value that was read.
What are you trying to ask, though? It's not too clear. The code will (try to) read an integer from the string into 'i'.
