Working of sscanf - c

There is a big string. I want to store different parts of this in different variables. But it seems that either my understanding is not clear or there is a bug. Please help.
here is my section of code.
char sample[] = "abc,batsman,2,28.0,1800";
char name[10] ,speciality[10];
float batavg;
int pos, runs,j;
j = sscanf(sample,"%s,%s,%d,%f,%d", name, speciality, pos, batavg, runs);
printf("%s,%s,%d,%f,%d", name, speciality, pos, batavg, runs);
printf("\n%d\n",j);
Output
Some garbage values with the value of j = 1 in the above case is shown.
How can I settle this?

The scanf() family of functions require you to pass pointers to the locations where the scanned fields should be stored. That just works when you're scanning into a char array (field descriptor %s) because the name of a char array is converted to a pointer automatically, but for other kinds of fields you need to use address-of operator (&).
Additionally, as iharob first observed, the %s descriptor expects fields to be delimited by whitespace. You can get what you want via the %[] descriptor:
j=sscanf(sample,"%[^,],%[^,],%d,%f,%d",name,speciality,&pos,&batavg,&runs);

The "%s" specifier in *scanf() family of functions scans all the characters until a white space happens.
So the first "%s" is consuming the whole string, that's why j == 1, you must check the value of j before printing, since all the other parameters are uninitialized at the moment of printing.
You need a different format specifier, namely
sscanf("%[^,],%[^,],%d,%f,%d", name, speciality, &pos, &batavg, &runs);

Related

Why in C, when reading files, do we have to use a character array?

I've literally just started programming in C. Coming from a little understanding of Python.
Just had a lecture on C, the lecture was about this:
#include <stdio.h>
int main() {
FILE *file;
char name[10], degree[5];
int mark;
file = fopen("file.txt", "r");
while (fscan(file("%s %s %d", name, degree, &mark) != EOF);
printf("%s %s %d", name, degree, mark);
fclose(file);
}
I'm specifically asking why we the lecturer would have used an array rather than just declaring two string variables. Searching for a deeper answer than just, "that's just C for you".
There are multiple typos on this line:
while (fscan(file("%s %s %d", name, degree, &mark) != EOF);
printf("%s %s %d", name, degree, mark);
It should read:
while (fscanf(file, "%s %s %d", name, degree, &mark) == 3)
printf("%s %s %d", name, degree, mark);
Can you spot the 4 mistakes?
The function is called fscanf
file is an argument followed by ,, not a function name followed by (
you should keep looping for as long as fscanf converts 3 values. If conversion fails, for example because the third field is not a number, it will return a short count, not necessarily EOF.
you typed an extra ; after the condition. This is parsed as an empty statement: the loop keeps reading until end of file, doing nothing, and finally executes printf just once, with potentially invalid arguments.
The programmer uses char arrays and passes their address to fscanf. If he had used pointers (char *), he would have needed to allocate memory to make them point to something and pass their values to fscanf, a different approach that is not needed for fixed array sizes.
Note that the code should prevent potential buffer overflows by specifying the maximum number of characters to store into the arrays:
while (fscanf(file, "%9s %4s %d", name, degree, &mark) == 3) {
printf("%s %s %d", name, degree, mark);
}
Note also that these hard-coded numbers must match the array sizes minus 1, an there is no direct way to pass the array sizes to fscanf(), a common source of bugs when the code is modified. This function has many quirks and shortcomings, use with extreme care.
Take name[] for example. It's an array of chars, a collection of chars if you like.
There is no string type in c, so we use an array an array of chars when we want to use a string.
The code is written as such, so that we can read the actual string in a line of the file, in our array.
As a side note, this program will produce syntax errors.
What you have in every non-empty file is a series of bytes. Therefor, what your C program has to do is read bytes. Since the variable type char is used to represent a byte, and since you want to read multiple bytes at once for efficiency purposes, you read an array of chars. That's for the general understanding of what reading from a file means.
Going back to your example, there is no string type in C. A string is an array of bytes (char[]) ended by a nul character.
What the lecturer is doing is:
define an array of chars (which will contain a string) => char name[10] is an array that may contain a string of at most 9 characters (the last byte would be used for '\0').
ask fscanf to read a string, which means it will read multiple bytes (multiple chars) until it finds a nul character ('\0') and put all of that in the given array.
To understand what's happening, forget about string as an opaque data type (which might be true in other programming languages) and see them as what they really are: arrays of chars.

Sscanf not returning what I want

I have the following problem:
sscanf is not returning the way I want it to.
This is the sscanf:
sscanf(naru,
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]",
&jokeri, &paiva1, &keskilampo1, &minlampo1, &maxlampo1,
&paiva2, &keskilampo2, &minlampo2, &maxlampo2, &paiva3,
&keskilampo3, &minlampo3, &maxlampo3, &paiva4, &keskilampo4,
&minlampo4, &maxlampo4, &paiva5, &keskilampo5, &minlampo5,
&maxlampo5, &paiva6, &keskilampo6, &minlampo6, &maxlampo6,
&paiva7, &keskilampo7, &minlampo7, &maxlampo7);
The string it's scanning:
const char *str = "city;"
"2014-04-14;7.61;4.76;7.61;"
"2014-04-15;5.7;5.26;6.63;"
"2014-04-16;4.84;2.49;5.26;"
"2014-04-17;2.13;1.22;3.45;"
"2014-04-18;3;2.15;3.01;"
"2014-04-19;7.28;3.82;7.28;"
"2014-04-20;10.62;5.5;10.62;";
All of the variables are stored as char paiva1[22] etc; however, the sscanf isn't storing anything except the city correctly. I've been trying to stop each variable at ;.
Any help how to get it to store the dates etc correctly would be appreciated.
Or if there's a smarter way to do this, I'm open to suggestions.
There are multiple problems, but BLUEPIXY hit the first one — the scan-set notation doesn't follow %s.
Your first line of the format is:
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
As it stands, it looks for a space separated word, followed by a [, a ^, a ;, and a ] (which is self-contradictory; the character after the string is a space or end of string).
The first fixup would be to use scan-sets properly:
"%[^;]%[^;]%[^;]%[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
Now you have a problem that the first %[^;] scans everything up to the end of string or first semicolon, leaving nothing for the second %[;] to match.
"%[^;]; %[^;]; %[^;]; %[^;]; %f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
This looks for a string up to a semicolon, then for the semicolon, then optional white space, then repeats for three items. Apart from adding a length to limit the size of string, preventing overflow, these are fine. The %f is OK. The following material looks for an odd sequence of characters again.
However, when the data is looked at, it seems to consist of a city, and then seven sets of 'a date plus three numbers'.
You'd do better with an array of structures (if you've worked with those yet), or a set of 4 parallel arrays, and a loop:
char jokeri[30];
char paiva[7][30];
float keskilampo[7];
float minlampo[7];
float maxlampo[7];
int eoc; // End of conversion
int offset = 0;
char sep;
if (fscanf(str + offset, "%29[^;]%c%n", jokeri, &sep, &eoc) != 2 || sep != ';')
...report error...
offset += eoc;
for (int i = 0; i < 7; i++)
{
if (fscanf(str + offset, "%29[^;];%f;%f;%f%c%n", paiva[i],
&keskilampo[i], &minlampo[i], &maxlampo[i], &sep, &eoc) != 5 ||
sep != ';')
...report error...
offset += eoc;
}
See also How to use sscanf() in loops.
Now you have data that can be managed. The set of 29 separately named variables is a ghastly thought; the code using them will be horrid.
Note that the scan-set conversion specifications limit the string to a maximum length one shorter than the size of jokeri and the paiva array elements.
You might legitimately be wondering about why the code uses %c%n and &sep before &eoc. There is a reason, but it is subtle. Suppose that the sscanf() format string is:
"%29[^;];%f;%f;%f;%n"
Further, suppose there's a problem in the data that the semicolon after the third number is missing. The call to sscanf() will report that it made 4 successful conversions, but it doesn't count the %n as an assignment, so you can't tell that sscanf() didn't find a semicolon and therefore did not set &eoc at all; the value is left over from a previous call to sscanf(), or simply uninitialized. By using the %c to scan a value into sep, we get 5 returned on success, and we can be sure the %n was successful too. The code checks that the value in sep is in fact a semicolon and not something else.
You might want to consider a space before the semi-colons, and before the %c. They'll allow some other data strings to be converted that would not be matched otherwise. Spaces in a format string (outside a scan-set) indicate where optional white space may appear.
I would use strtok function to break your string into pieces using ; as a delimiter. Such a long format string may be a source of problems in future.

Get number of characters read by sscanf?

I'm parsing a string (a char*) and I'm using sscanf to parse numbers from the string into doubles, like so:
// char* expression;
double value = 0;
sscanf(expression, "%lf", &value);
This works great, but I would then like to continue parsing the string through conventional means. I need to know how many characters have been parsed by sscanf so that I may resume my manual parsing from the new offset.
Obviously, the easiest way would be to somehow calculate the number of characters that sscanf parses, but if there's no simple way to do that, I am open to alternative double parsing options. However, I'm currently using sscanf because it's fast, simple, and readable. Either way, I just need a way to evaluate the double and continue parsing after it.
You can use the format specifier %n and provide an additional int * argument to sscanf():
int pos;
sscanf(expression, "%lf%n", &value, &pos);
Description for format specifier n from the C99 standard:
No input is consumed. The corresponding argument shall be a pointer to
signed integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function. Execution of a %n directive does not increment the assignment count returned at the completion of execution of the fscanf function. No argument is converted, but one is consumed. If the conversion specification includes an assignment suppressing character or a field width, the behavior is undefined.
Always check the return value of sscanf() to ensure that assignments were made, and subsequent code does not mistakenly process variables whose values were unchanged:
/* Number of assignments made is returned,
which in this case must be 1. */
if (1 == sscanf(expression, "%lf%n", &value, &pos))
{
/* Use 'value' and 'pos'. */
}
int i, j, k;
char s[20];
if (sscanf(somevar, "%d %19s %d%n", &i, s, &j, &k) != 3)
...something went wrong...
The variable k contains the character count up to the point where the end of the integer stored in j was scanned.
Note that the %n is not counted in the successful conversions. You can use %n several times in the format string if you need to.

how can I printf in c

How can I make this print properly without using two printf calls?
char* second = "Second%d";
printf("First%d"second,1,2);
The code you showed us is syntactically invalid, but I presume you want to do something that has the same effect as:
printf("First%dSecond%d", 1, 2);
As you know, the first argument to printf is the format string. It doesn't have to be a literal; you can build it any way you like.
Here's an example:
#include <stdio.h>
#include <string.h>
int main(void)
{
char *second = "Second%d";
char format[100];
strcpy(format, "First%d");
strcat(format, second);
printf(format, 1, 2);
putchar('\n');
return 0;
}
Some notes:
I've added a newline after the output. Output text should (almost) always be terminated by a newline.
I've set an arbitrary size of 100 bytes for the format string. More generally, you could declare
char *format;
and initialize it with a call to malloc(), allocating the size you actually need (and checking that malloc() didn't signal failure by returning a null pointer); you'd then want to call free(format); after you're done with it.
As templatetypedef says in a comment, this kind of thing can be potentially dangerous if the format string comes from an uncontrolled source.
(Or you could just call printf twice; it's not that much more expensive than calling it once.)
Use the preprocessor to concatenate the two strings.
#define second "Second%d"
printf("First%d"second,1,2);
Do not do this in a real program.
char *second = "Second %d";
char *first = "First %d";
char largebuffer[256];
strcpy (largebuffer, first);
strcat (largebuffer, second);
printf (largebuffer, 1, 2);
The problem with using generated formats such as the method above is that the printf() function, since it is a variable length argument list, has no way of knowing the number of arguments provided. What it does is to use the format string provided and using the types as described in the format string it will then pick that number and types of arguments from the argument list.
If you provide the correct number of arguments like in the example above in which there are two %d formats and there are two integers provided to be printed in those places, everything is fine. However what if you do something like the following:
char *second = "Second %s";
char *first = "First %d";
char largebuffer[256];
strcpy (largebuffer, first);
strcat (largebuffer, second);
printf (largebuffer, 1);
In this example the printf() function is expecting the format string as well as a variable number of arguments. The format string says that there will be two additional arguments, an integer and a zero terminated character string. However only one additional argument is provided so the printf() function will just use what ever is next on the stack as being a pointer to a zero terminated character string.
If you are lucky, the data that the printf() function interprets as a pointer will a valid memory address for your application and the memory area pointed to will be a couple of characters terminated by a zero. If you are less lucky the pointer will be zero or garbage and you will get an access violation right then and it will be easy to find the cause of the application crash. If you have no luck at all, the pointer will be good enough that it will point to a valid address that is about 2K of characters and the result is that printf() will totally mess up your stack and go into the weeds and the resulting crash data will be pretty useless.
char *second = "Second%d";
char tmp[256];
memset(tmp, 0, 256);
sprintf(tmp, second, 2);
printf("First%d%s", 1,tmp);
Or something like that
I'm assuming you want the output:
First 1 Second 2
To do this we need to understand printf's functionality a little better. The real reason that printf is so useful is that it not only prints strings, but also formats variables for you. Depending on how you want your variable formatted you need to use different formatting characters. %d tells printf to format the variable as a signed integer, which you already know. However, there are other formats, such as %f for floats and doubles, %l% for long integers, and %s for strings, or char*.
Using the %s formatting character to print your char* variable, second, our code looks like this:
char* second = "Second";
printf ( " First %d %s %d ", 1, second, 2 );
This tells printf that you want the first variable formatted as an integer, the second as a string, and the third as another integer.

Why does C's printf format string have both %c and %s?

Why does C's printf format string have both %c and %s?
I know that %c represents a single character and %s represents a null-terminated string of characters, but wouldn't the string representation alone be enough?
Probably to distinguish between null terminated string and a character. If they just had %s, then every single character must also be null terminated.
char c = 'a';
In the above case, c must be null terminated. This is my assumption though :)
%s prints out chars until it reaches a 0 (or '\0', same thing).
If you just have a char x;, printing it with printf("%s", &x); - you'd have to provide the address, since %s expects a char* - would yield unexpected results, as &x + 1 might not be 0.
So you couldn't just print a single character unless it was null-terminated (very inefficent).
EDIT: As other have pointed out, the two expect different things in the var args parameters - one a pointer, the other a single char. But that difference is somewhat clear.
The issue that is mentioned by others that a single character would have to be null terminated isn't a real one. This could be dealt with by providing a precision to the format %.1s would do the trick.
What is more important in my view is that for %s in any of its forms you'd have to provide a pointer to one or several characters. That would mean that you wouldn't be able to print rvalues (computed expressions, function returns etc) or register variables.
Edit: I am really pissed off by the reaction to this answer, so I will probably delete this, this is really not worth it. It seems that people react on this without even having read the question or knowing how to appreciate the technicality of the question.
To make that clear: I don't say that you should prefer %.1s over %c. I only say that reasons why %c cannot be replaced by that are different than the other answer pretend to tell. These other answers are just technically wrong. Null termination is not an issue with %s.
The printf function is a variadic function, meaning that it has variable number of arguments. Arguments are pushed on the stack before the function (printf) is called. In order for the function printf to use the stack, it needs to know information about what is in the stack, the format string is used for that purpose.
e.g.
printf( "%c", ch ); tells the function the argument 'ch'
is to be interpreted as a character and sizeof(char)
whereas
printf( "%s", s ); tells the function the argument 's' is a pointer
to a null terminated string sizeof(char*)
it is not possible inside the printf function to otherwise determine stack contents e.g. distinguishing between 'ch' and 's' because in C there is no type checking during runtime.
%s says print all the characters until you find a null (treat the variable as a pointer).
%c says print just one character (treat the variable as a character code)
Using %s for a character doesn't work because the character is going to be treated like a pointer, then it's going to try to print all the characters following that place in memory until it finds a null
Stealing from the other answers to explain it in a different way.
If you wanted to print a character using %s, you could use the following to properly pass it an address of a char and to keep it from writing garbage on the screen until finding a null.
char c = 'c';
printf('%.1s', &c);
For %s, we need provide the address of string, not its value.
For %c, we provide the value of characters.
If we used the %s instead of %c, how would we provide a '\0' after the characters?
Id like to add another point of perspective to this fun question.
Really this comes down to data typing. I have seen answers on here that state that you could provide a pointer to the char, and provide a
"%.1s"
This could indeed be true. But the answer lies in the C designer's trying to provide flexibility to the programmer, and indeed a (albeit small) way of decreasing footprint of your application.
Sometimes a programmer might like to run a series of if-else statements or a switch-case, where the need is to simply output a character based upon the state. For this, hard coding the the characters could indeed take less actual space in memory as the single characters are 8 bits versus the pointer which is 32 or 64 bits (for 64 bit computers). A pointer will take up more space in memory.
If you would like to decrease the size through using actual chars versus pointers to chars, then there are two ways one could think to do this within printf types of operators. One would be to key off of the .1s, but how is the routine supposed to know for certain that you are truly providing a char type versus a pointer to a char or pointer to a string (array of chars)? This is why they went with the "%c", as it is different.
Fun Question :-)
C has the %c and %s format specifiers because they handle different types.
A char and a string are about as different as night and 1.
%c expects a char, which is an integer value and prints it according to encoding rules.
%s expects a pointer to a location of memory that contains char values, and prints the characters in that location according to encoding rules until it finds a 0 (null) character.
So you see, under the hood, the two cases while they look alike they have not much in common, as one works with values and the other with pointers. One is instructions for interpreting a specific integer value as an ascii char, and the other is iterating the contents of a memory location char by char and interpreting them until a zero value is encountered.
I have done a experiment with printf("%.1s", &c) and printf("%c", c).
I used the code below to test, and the bash's time utility the get the runing time.
#include<stdio.h>
int main(){
char c = 'a';
int i;
for(i = 0; i < 40000000; i++){
//printf("%.1s", &c); get a result of 4.3s
//printf("%c", c); get a result of 0.67s
}
return 0;
}
The result says that using %c is 10 times faster than %.1s. So, althought %s can do the job of %c, %c is still needed for performance.
Since no one has provided an answer with ANY reference whatsoever, here is a printf specification from pubs.opengroup.com which is similar to the format definition from IBM
%c
The int argument shall be converted to an unsigned char, and the resulting byte shall be written.
%s
The argument shall be a pointer to an array of char. Bytes from the array shall be written up to (but not including) any terminating null byte. If the precision is specified, no more than that many bytes shall be written. If the precision is not specified or is greater than the size of the array, the application shall ensure that the array contains a null byte.

Resources