Invalid output with fscanf() - c

The language I am using is C
I am trying to scan data from a file, and the code segment is like:
char lsm;
long unsigned int address;
int objsize;
while(fscanf(mem_trace,"%c %lx,%d\n",&lsm,&address,&objsize)!=EOF){
printf("%c %lx %d\n",lsm,address,objsize);
}
The file which I read from has the first line as follows:
S 00600aa0,1
I 004005b6,5
I 004005bb,5
I 004005c0,5
S 7ff000398,8
The results that show in stdout is:
8048350 134524916
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398,8
Obviously, the results had an extra line which comes nowhere.Is there anybody know how this could happen?
Thx!

This works for me on the data you supply:
#include <stdio.h>
int main(void)
{
char lsm[2];
long unsigned int address;
int objsize;
while (scanf("%1s %lx,%d\n", lsm, &address, &objsize) == 3)
printf("%s %9lx %d\n", lsm, address, objsize);
return 0;
}
There are multiple changes. The simplest and least consequential is the change from fscanf() to scanf(); that's for my convenience.
One important change is the type of lsm from a single char to an array of two characters. The format string then uses %1s reads one character (plus NUL '\0') into the string, but it also (and this is crucial) skips leading blanks.
Another change is the use of == 3 instead of != EOF in the condition. If something goes wrong, scanf() returns the number of successful matches. Suppose that it managed to read a letter but what followed was not a hex number; it would return 1 (not EOF). Further, it would return 1 on each iteration until it could find something that matched a hex number. Always test for the number of values you expect.
The output format was tidied up with the %9lx. I was testing on a 64-bit system, so the 9-digit hex converts fine. One problem with scanf() is that if you get an overflow on a conversion, the behaviour is undefined.
Output:
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398 8
Why did you get the results you got?
The first conversion read a space into lsm, but then failed to convert S into a hex number, so it was left behind for the next cycle. So, you got the left-over garbage printed in the address and object size columns. The second iteration read the S and was then in synchrony with the data until the last line. The newline at the end of the format (like any other white space in the format string) eats white space, which is why the last line worked despite the leading blank.

A directive that is a conversion specification defines a set of
matching input sequences, as described below for each specifier. A
conversion specification is executed in the following steps:
Input white-space characters (as specified by the isspace function)
are skipped, unless the specification includes a [, c, or n specifier.
An input item is read from the stream, unless the specification
includes an n specifier.
[...]
The first time you call fscanf, your %c reads the first blank space in the file. Your white-space character reads zero or more characters of white-space, this time zero of them. Your %lx fails to match the S character in the file, so fscanf returns. You don't check the result. Your variables contain values that they had from earlier operations.
The second time you call fscanf, your %c reads the first S character in the file. From that point on, everything else succeeds too.
Added in editing, here is the simplest change to your format string to solve your problem:
" %c %lx,%d\n"
The space at the beginning will read zero or more characters of white-space and then %c will read the first non-white-space character in the file.
Here is another format string that will also solve your problem:
" %c %lx,%d"
The reason is that if you read and discard zero or more white-space characters twice in a row, the result is the same as doing it just once.

I think that fsanf reads the first character [space] into lsm then fails to read address and objsize because the format shift doesn't match for the rest of the line.
Then it prints a space then whatever happened to be in address and objsize when it was declared
EDIT--
fscanf consumes the whitespaces after each call, if you call ftell you'll see
printf("%c %lx %d %d\n",lsm,address,objsize,ftell(mem_trace));

Related

Input stream reads and push backs by fscanf and scanf

Regarding fscanf (and I assume similarly for scanf), C17 7.21.6.2.9 states the following:
"An input item is read from the stream... An input item is defined as
the longest sequence of input characters which does not exceed any
specified field width and which is, or is a prefix of, a matching
input sequence. The first character, if any, after the input item
remains unread..."
Before reading this I had always assumed that the first character after the input item was read too, then pushed back. For example, if the input was 5X and the conversion specification was %d, both the 5 and the X would be read but the X would be pushed back. However, the quote above seems to indicate that each successive character in the input stream is being "peeked" at before it is read, so the X would never be read in the first place and a push back would never be necessary. However, footnote 289 states that fscanf pushes back at most one input character onto the input stream. So I guess my question is about what all of this really means. Does "read" mean to remove a character from the stream or could it also mean to "peek" at a character without removing it?
Input stream can push back at least 1 character.
Scanning "5X" with "%d" results in "5" being read and converted to an int 5, then saved. The "X" is read, but pushed back.
Trouble occurs with input like "-a" as the "-" is read and so is "a". C guarantees a successful push-back of "a", but if "-" is successfully pushed back depends on the implementation.
int main() {
int i;
scanf("%d", &i); // Enter -a
printf("%c\n", getchar());
}
My output: -, not a as expected with only 1 push back. YMMV.
This is one of the reasons that it is better to read a line of user input with fgets() into a string and then parse the string, than to use (f)scanf().
The pushback is not always necessary. For example, if the conversion specification is %3d and the code reads three decimal digits successfully, it doesn't need to read anything more and there is no pushback.
The pushback is always the character that was read, so beyond recording where to read next, the input buffer doesn't need to change. (Using ungetc(), you can unget (push back) a character other than the one that was read.)
Reading a character means logically removing it from the stream. If it isn't a usable character, it is pushed back, so the effect is the same as peeking.

Using sscanf for an unknown number of variables

For a school assignment, I have to read in a string that has at least one but up to three variables(named command, one, and two). There is always a character at the beginning of the string, but it may or may not be followed by integers. The format could be like any of the following:
i 5 17
i 3
p
d 4
I am using fgets to read the string from the file, but I'm having trouble processing it. I've been trying to use sscanf, but I'm getting segfaults reading in a string that only has one or two variables instead of three.
Is there a different function I should be using?
Or is there a way to format sscanf to do what I need?
I've tried sscanf(buffer, "%c %d %d", command, one, two) and several variations with no luck.
sscanf is probably up to this task, depending on the exact requirements and ranges of inputs.
The key here is is that the scanf family functions returns a useful value which indicates how many conversions were made. This can be less than zero: the value EOF (a negative value) can be returned if the end of the input occurs or an I/O error, before the first conversion is even attempted.
Note that the %c conversion specifier doesn't produce a null-terminated string. By default, it reads only one character and stores it through the argument pointer. E.g.
char ch;
sscanf("abc", "%c", &ch);
this will write the character 'a' into ch.
Unless you have an iron-clad assurance that the first field is always one character wide, it's probably better to read it as a string with %s. Always use a maximum width with %s not to overflow the destination buffer. For instance:
char field1[64]; /* one larger than field width, for terminating null */
sscanf(..., "%63s", field1, ...);
sscanf doesn't perform any overflow checks on integers. If %d is used to scan a large negative or positive value that doesn't fit into int, the behavior is simply undefined according to ISO C. So, just like with %s, %d is best used with a field width limitation. For instance, %4d for reading a four digit year. Four decimal digits will not overflow int.

sscanf not extracting pattern

I am trying to figure out the pattern I should be giving to sscanf.
I have a string abcde(1GB). I want to extract 1 and GB. I am using
char list[]= "abcde(1GB)";
int memory_size =0;
char unit[3]={0} ;
sscanf(list, "%*s%d%s" , &memory_size, unit);
I do not see tokens extracted when I print I see memory_size =0 and NULL in unit.
Thanks
your sscanf() string format should be:
sscanf(list, "%*[^(](%d%[^)]" , &memory_size, unit);
%[^)] means catch charachters and stop ctaching when finding the charachter ) or end of the string
%*[^(] means:
[^\(] means catch charachters and stop ctaching when finding the charachter ( - as opposed to a more conventional %s - catching charachters and stop ctaching when finding space characters"
* means "read but not store"
The %s conversion specifier in the format string of scanf means that scanf will read a sequence of non-whitespace characters. The * in %*s is called the assignment suppression character. It means that scanf will read and discard the non-whitespace characters.
Since %s matches any sequence of non-whitespace characters, %*s in "%*s%d%s" means that sscanf will read an discard all the characters in the string list. Therefore, there's nothing left in the array list to be read and assigned to the arguments &memory_size and unit. This explains why memory_size and unit are unchanged. What you need is the format string
sscanf(list, "%*[^(](%d%2[^)]%*s", &memory_size, unit);
Here, in the format string "%*[^(](%d%2[^)]%*s" -
%*[^(] means that sscanf will first read and discard a sequence of character not containing the character (.
( means that sscanf will read and discard (.
%d means the sscanf will read a decimal integer.
%2[^)] means that sscanf will read at most a sequence of 2 characters not containing ) and store them in the corresponding argument. This is to ensure that sscanf does not overrun the buffer unit in case the string to be stored is too large. It's one less than the size of unit to save space for the terminating null byte which is automatically added by sscanf at the end of the buffer.
%*s means sscanf will read and discard any sequence of leftover non-whitespace characters.
plz go through the following link for the sscanf() function.
sscanf function description
Now,
as per MOHAMED said %[^(] is for catch the characters till "("
means %*[^(] truncate the string until ( after that %d as integer data and then capture the string until ).
After accept answer - supplemental
#MOHAMED answer is good, but below are candidate improvements.
1) Always check the sscanf() result to insure data was scanned as intended.
if (sscanf(list, "%*[^(](%d%[^)]", &memory_size, unit) != 2) ScanFailure();
2) When using the "%s" or "%[]" specifiers, include a limiting length. #ajay
char unit[3]={0} ;
// v--- 1 less than buffer size.
if (sscanf(list, "%*[^(](%d%2[^)]" , &memory_size, unit) != 2) ScanFailure();
3) Appending a "%n" (save scan position) is a sure-fire means of detecting the trailing ) was there and extra junk was not at the end of the input string.
"%n" does not add to sscaanf() result.
int n = 0;
// )%n <--- Look for ) then save scan position
if (sscanf(list, "%*[^(](%d%2[^)])%n", &memory_size, unit, &n) != 2) ||
list[n] != '\0') ScanFailure();
4) Making room for optional white-space may/may not be useful. Specifiers already allow optional leading white-space. 3 exceptions: %c %[] %n
"%*[^(](%d%2[^)])%n"
// v v v
" %*[^(](%d %2[^)]) %n"

splitting string in c

I have a file where each line looks like this:
cc ssssssss,n
where the two first 'c's are individual characters, possibly spaces, then a space after that, then the 's's are a string that is 8 or 9 characters long, then there's a comma and then an integer.
I'm really new to c and I'm trying to figure out how to put this into 4 seperate variables per line (each of the first two characters, the string, and the number)
Any suggestions? I've looked at fscanf and strtok but i'm not sure how to make them work for this.
Thank you.
I'm assuming this is a C question, as the question suggests, not C++ as the tags perhaps suggest.
Read the whole line in.
Use strchr to find the comma.
Do whatever you want with the first two characters.
Switch the comma for a zero, marking the end of a string.
Call strcpy from the fourth character on to extract the sssssss part.
Call atoi on one character past where the comma was to extract the integer.
A string is a sequence of characters that ends at the first '\0'. Keep this in mind. What you have in the file you described isn't a string.
I presume n is an integer that could span multiple decimal places and could be negative. If that's the case, I believe the format string you require is "%2[^ ] %9[^,\n],%d". You'll want to pass fscanf the following expressions:
Your FILE *,
The format string,
An array of 3 chars silently converted to a pointer,
An array of 9 chars silently converted to a pointer,
... and a pointer to int.
Store the return value of fscanf into an int. If fscanf returns negative, you have a problem such as EOF or some other read error. Otherwise, fscanf tells you how many objects it assigned values into. The "success" value you're looking for in this case is 3. Anything else means incorrectly formed input.
I suggest reading the fscanf manual for more information, and/or for clarification.
fscanf function is very powerful and can be used to solve your task:
We need to read two chars - the format is "%c%c".
Then skip a space (just add it to the format string) - "%c%c ".
Then read a string until we hit a comma. Don't forget to specify max string size. So, the format is "%c%c %10[^,]". 10 - max chars to read. [^,] - list of allowed chars. ^, - means all except a comma.
Then skip a comma - "%c%c %10[^,],".
And finally read an integer - "%c%c %10[^,],%d".
The last step is to be sure that all 4 tokens are read - check fscanf return value.
Here is the complete solution:
FILE *f = fopen("input_file", "r");
do
{
char c1 = 0;
char c2 = 0;
char str[11] = {};
int d = 0;
if (4 == fscanf(f, "%c%c %10[^,],%d", &c1, &c2, str, &d))
{
// successfully got 4 values from the file
}
}
while(!feof(f));
fclose(f);

C: fscanf - infinite loop when first character matches

I am attempting to parse a text (CSS) file using fscanf and pull out all statements that match this pattern:
#import "some/file/somewhere.css";
To do this, I have the following loop set up:
FILE *file = fopen(pathToSomeFile, "r");
char *buffer = (char *)malloc(sizeof(char) * 9000);
while(!feof(file))
{
// %*[^#] : Read and discard all characters up to a '#'
// %8999[^;] : Read up to 8999 characters starting at '#' to a ';'.
if(fscanf(file, "%*[^#] %8999[^;]", buffer) == 1)
{
// Do stuff with the matching characters here.
// This code is long and not relevant to the question.
}
}
This works perfectly SO LONG AS the VERY FIRST character in the file is not a '#'. (Literally, a single space before the first '#' character in the CSS file will make the code run fine.)
But if the very first character in the CSS file is a '#', then what I see in the debugger is an infinite loop -- execution enters the while loop, hits the fscanf statement, but does not enter the 'if' statement (fscanf fails), and then continues through the loop forever.
I believe my fscanf formatters may need some tweaking, but am unsure how to proceed. Any suggestions or explanations for why this is happening?
Thank you.
I'm not an expert on scanf pattern syntax, but my interpretation of yours is:
Match a non-empty sequence of non-'#' characters, then
Match a non-empty sequence of up to 8999 non-';' characters
So yes, if your string starts with a '#', then the first part will fail.
I think if you start your format string with some whitespace, then fscanf will eat any leading whitespace in your data string, i.e. simply " %8999[^;]".
Oli already said why fscanf failed. And since failure is a normal state for fscanf your busy loop is not the consequence of the fscanf failure but of the missing handling for it.
You have to handle a fscanf failure even if your format would be correct (in your special case), because you cannot be sure that the input always is matchable by the format. Actually you can be sure that much more nonmatching input exists than matching input.
Your format string does the following actions:
Read (and discard) 1 or more non-# characters
Read (and discard) 0 or more whitespace characters (due to the space in the format string)
Read and store 1 to 8999 non-; characters
Unfortunately, there is no format specifier for reading "zero or more" characters from a user-defined set.
If you don't care about multiple #include statements on a line, you could change your code to read a single line (with fgets), and then extract the #include statement from that (if the first character does not equal #, you can use your current format string with sscanf, otherwise, you could use sscanf(line, "%8999[^;]", buffer)).
If multiple #include statemens on a line should be handled correctly, you could inspect the next character to be read with getc and then put it back with ungetc.

Resources