I'm trying to read a JSON-format file that contains an integer array (e.g. [ 0, 1, 2, 3, 4 ]).
I am wondering why fscanf skips the brackets in the file and goes straight to the numbers when I use:
// the type of value is integer
FILE* fp=fopen(file,"r");
fscanf( fp,"%d",&value);
I'm still new to file I/O and I have no idea why this happens. I thought whenever I call fscanf, the file pointer would move 1 position forward.
You should check return values. The fscanf in your example should return 0, since the first non-white-space character encountered is [, which cannot start a number, so parsing fails there. A return value of 0 indicates that no successful conversion took place. The value of value will probably stay unchanged (I couldn't find a specific statement for that in the man page). The reading position in the file will still be before the [, so subsequent attempts to read an int from the file will fail as well.
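Here is a small sketch of the return-value check, assuming the input is the array from the question. The helper name read_int_report is hypothetical. It makes the same fscanf call as the question and then peeks at the next character with getc/ungetc, which shows that the [ is still unread.

```c
#include <stdio.h>

/* Hypothetical helper: performs the question's fscanf(fp, "%d", value)
   and stores the next unread character (or EOF) in *next.
   Returns fscanf's own return value so the caller can check it. */
int read_int_report(FILE *fp, int *value, int *next)
{
    int rc = fscanf(fp, "%d", value);

    /* Peek at the next character without consuming it. */
    *next = getc(fp);
    if (*next != EOF)
        ungetc(*next, fp);

    return rc;
}
```

For the input "[ 0, 1, 2, 3, 4 ]" this should return 0 and report '[' as the next character, no matter how often it is called.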
How to read the json array:
Note that I do not handle errors in the examples below. You must do that though...
scanf format strings can contain conversion specifications with a list of allowed or forbidden characters. This can be used to read "away" anything which is not a number: char buf[some large enough value]; fscanf(fp, " %[^0-9]", buf);. The reading position is now before the first number. (This assumes the numbers are non-negative, because a leading - would be read away too.)
Then you'll have a loop doing two things:
The number ahead of us can be read trivially with fscanf(fp, "%d", &value);. This will also skip possible whitespace before the number in later iterations.
Now we must deal with the comma: fscanf(fp, " %[,]", buf); ("read a comma which is optionally preceded by white space"). Now you can read the next number.
The last number will not be followed by a comma. The attempt to read a comma will therefore fail (i.e. return 0); this can be used as end-of-array indicator.
If more arrays or other data may follow, you must read away the remaining whitespace and the closing square bracket, so that you leave the file position after the array for whatever comes next.
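Put together, the steps above could look roughly like this. It is a sketch that assumes a flat array of non-negative integers and at most max elements. read_int_array is a hypothetical name. Like the description, it checks return values only where they steer the parsing.

```c
#include <stdio.h>

/* Hypothetical helper: reads a flat JSON array of integers such as
   "[ 0, 1, 2, 3, 4 ]" from fp into out (at most max values) and returns
   how many were stored. ASSUMPTIONS: well-formed input, non-negative
   numbers (a '-' would be read away along with the '['). */
int read_int_array(FILE *fp, int *out, int max)
{
    char buf[256];
    int n = 0;

    /* Read "away" everything up to the first digit: the '[' and blanks. */
    if (fscanf(fp, " %255[^0-9]", buf) != 1)
        return 0;

    while (n < max && fscanf(fp, "%d", &out[n]) == 1) {
        n++;
        /* A comma, optionally preceded by blanks, means another number
           follows; no comma means we have reached the end of the array. */
        if (fscanf(fp, " %1[,]", buf) != 1)
            break;
    }

    /* Consume trailing blanks and the closing ']' so the file position is
       left just after the array. */
    (void)fscanf(fp, " ]");
    return n;
}
```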
Regarding fscanf (and I assume similarly for scanf), C17 7.21.6.2.9 states the following:
"An input item is read from the stream... An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread..."
Before reading this I had always assumed that the first character after the input item was read too, then pushed back. For example, if the input was 5X and the conversion specification was %d, both the 5 and the X would be read but the X would be pushed back. However, the quote above seems to indicate that each successive character in the input stream is being "peeked" at before it is read, so the X would never be read in the first place and a push back would never be necessary. However, footnote 289 states that fscanf pushes back at most one input character onto the input stream. So I guess my question is about what all of this really means. Does "read" mean to remove a character from the stream or could it also mean to "peek" at a character without removing it?
An input stream is guaranteed to accept at least 1 character of push-back.
Scanning "5X" with "%d" results in "5" being read and converted to an int 5, then saved. The "X" is read, but pushed back.
Trouble occurs with input like "-a", as the "-" is read and so is "a". C guarantees a successful push-back of "a", but whether "-" is also pushed back depends on the implementation.
#include <stdio.h>

int main(void) {
    int i;
    scanf("%d", &i);            // Enter -a
    printf("%c\n", getchar());  // first character left unread by scanf
    return 0;
}
My output: -, not a as expected with only 1 push back. YMMV.
This is one of the reasons that it is better to read a line of user input with fgets() into a string and then parse the string, than to use (f)scanf().
The pushback is not always necessary. For example, if the conversion specification is %3d and the code reads three decimal digits successfully, it doesn't need to read anything more and there is no pushback.
The pushback is always the character that was read, so beyond recording where to read next, the input buffer doesn't need to change. (Using ungetc(), you can unget (push back) a character other than the one that was read.)
Reading a character means logically removing it from the stream. If it isn't a usable character, it is pushed back, so the effect is the same as peeking.
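You can observe the behaviour discussed in these answers with a small helper. It writes a string to a temporary file, scans it, and reports the first character fscanf left unread. char_after_scan is a hypothetical name. The "-a" case is deliberately left out because its result is implementation-dependent.

```c
#include <stdio.h>

/* Hypothetical helper: writes input to a temporary file, runs fscanf with
   fmt (which must contain exactly one int conversion), stores fscanf's
   result in *rc, and returns the next character read after the scan. */
int char_after_scan(const char *input, const char *fmt, int *value, int *rc)
{
    *rc = EOF;
    FILE *fp = tmpfile();
    if (fp == NULL)
        return EOF;

    fputs(input, fp);
    rewind(fp);

    *rc = fscanf(fp, fmt, value);
    int next = getc(fp);   /* first character fscanf did not consume */

    fclose(fp);
    return next;
}
```

With "5X" and %d, the X should be the next character read. With "12345" and %3d, fscanf stops after three digits and 4 is next, with no push-back needed.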
I'm trying to implement a cycle which will read lines off a file until it finds a line with a specific format. Namely, until it finds a line with:
number number character number
and nothing (but spaces or tabs) until the newline. Specifically, the stuff I have to go through until I find such a line will always be the contents of a 'lines.rows' matrix, but its data points are not necessarily neatly ordered in 'lines' lines of 'rows' elements. They can have any amount of spaces, tabs or newlines between each element.
However, after lines.rows elements there will always be a line in the format I'm scanning for, after an arbitrary number of spaces, tabs or newlines following the last element of the matrix.
I've been trying for several hours to use fgets and fscanf in different ways to achieve this, but the output is simply not correct. Right now I have this:
for (i=0;i<lines;i++)
{
for(j=0;j<rows;j++)
{
fscanf(in_fp, "%d", &temp);
}
}
This works, but it takes way too long for large matrices. Aside from this, I've tried, for example:
for (i=0;i<lines;i++)
{
fscanf(in_fp, "%*[^\n]\n", NULL);
}
This did not work. The idea was to skip to the end of each line and then also read the newline, so as to start at the beginning of the following line. However, by the end of this cycle my file was not pointing to the correct line (which would be one with the format specified above: %d %d %c %d). Instead, it was pointing to a 'random' line in the middle of the matrix at hand.
I also tried the same code as above but with an fgets. When I ran an fscanf following a cycle of fgets for all the lines of the matrix, I still did not read the line that should be coming next (with the format specified above).
If you have any input on how to achieve this, or how to make this question more understandable, I would be very thankful.
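One way to express the line test described in this question ("number number character number, then only spaces or tabs") is to read whole lines with fgets and check each one with sscanf plus %n. This is a sketch under two assumptions. First, the character field is a letter, because otherwise a matrix line such as 1 2 3 4 would also match. Second, lines fit in a 4096-byte buffer. The names is_target_line and find_target_line are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if line has the form
   "number number character number" followed only by blanks/tabs
   (and an optional newline). ASSUMPTION: the character field is a
   letter, so a matrix row such as "1 2 3 4" is not mistaken for it. */
int is_target_line(const char *line, int *a, int *b, char *c, int *d)
{
    int used = 0;   /* %n stores how many characters were consumed */
    if (sscanf(line, "%d %d %c %d%n", a, b, c, d, &used) != 4)
        return 0;
    if (!isalpha((unsigned char)*c))
        return 0;

    /* Only spaces or tabs may follow the last number. */
    for (line += used; *line == ' ' || *line == '\t'; line++)
        ;
    return *line == '\n' || *line == '\0';
}

/* Hypothetical helper: reads lines with fgets until one matches.
   Returns 1 if found, 0 at end of file. Sketch only: a line longer
   than the buffer is split across several fgets calls. */
int find_target_line(FILE *fp, int *a, int *b, char *c, int *d)
{
    char line[4096];
    while (fgets(line, sizeof line, fp) != NULL)
        if (is_target_line(line, a, b, c, d))
            return 1;
    return 0;
}
```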
I have a file where each line looks like this:
cc ssssssss,n
where the two first 'c's are individual characters, possibly spaces, then a space after that, then the 's's are a string that is 8 or 9 characters long, then there's a comma and then an integer.
I'm really new to C and I'm trying to figure out how to put this into 4 separate variables per line (each of the first two characters, the string, and the number).
Any suggestions? I've looked at fscanf and strtok but I'm not sure how to make them work for this.
Thank you.
I'm assuming this is a C question, as the question suggests, not C++ as the tags perhaps suggest.
Read the whole line in.
Use strchr to find the comma.
Do whatever you want with the first two characters.
Switch the comma for a zero, marking the end of a string.
Call strcpy from the fourth character on to extract the sssssss part.
Call atoi on one character past where the comma was to extract the integer.
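The steps above could be sketched as follows. split_line is a hypothetical name. It assumes the caller has already read the line (e.g. with fgets) into a writable buffer. It also adds a length check before strcpy so the destination cannot overflow.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for a line of the form "cc ssssssss,n".
   Modifies line (the comma is overwritten with '\0').
   Returns 1 on success, 0 if there is no usable comma or the string
   part does not fit in s (whose size is s_size). */
int split_line(char *line, char *c1, char *c2, char *s, size_t s_size, int *n)
{
    char *comma = strchr(line, ',');          /* find the comma */
    if (comma == NULL || comma - line < 3)
        return 0;

    *c1 = line[0];                            /* first two characters */
    *c2 = line[1];

    *comma = '\0';                            /* end the string part here */
    if (strlen(line + 3) >= s_size)
        return 0;
    strcpy(s, line + 3);                      /* from the fourth character on */

    *n = atoi(comma + 1);                     /* integer after the comma */
    return 1;
}
```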
A string is a sequence of characters that ends at the first '\0'. Keep this in mind. What you have in the file you described isn't a string.
I presume n is an integer that could span multiple digits and could be negative. If that's the case, I believe the format string you require is "%2[^ ] %9[^,\n],%d". You'll want to pass fscanf the following expressions:
Your FILE *,
The format string,
An array of 3 chars silently converted to a pointer,
An array of 10 chars (9 characters plus the terminating '\0') silently converted to a pointer,
... and a pointer to int.
Store the return value of fscanf into an int. If fscanf returns negative, you have a problem such as EOF or some other read error. Otherwise, fscanf tells you how many objects it assigned values into. The "success" value you're looking for in this case is 3. Anything else means incorrectly formed input.
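Here is a sketch of that call, applied with sscanf to a line read by fgets rather than with fscanf directly. parse_record is a hypothetical name. The swap matters when reading many lines. %[ does not skip leading white space, so with repeated fscanf calls the newline left after the previous %d would be taken as one of the two leading characters. Note also that %2[^ ] stops at a space. If either leading character can be a space, as the question allows, this format will not split the line as intended.

```c
#include <stdio.h>

/* Hypothetical helper: applies the format above to one line.
   first needs room for 3 chars, str for 10 (9 + '\0').
   Returns sscanf's result: 3 on success, fewer on malformed input,
   EOF for an empty string. */
int parse_record(const char *line, char first[3], char str[10], int *n)
{
    return sscanf(line, "%2[^ ] %9[^,\n],%d", first, str, n);
}
```

A string part longer than 9 characters makes the literal comma fail to match, so the result is 2 rather than 3.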
I suggest reading the fscanf manual for more information, and/or for clarification.
The fscanf function is very powerful and can be used to solve your task:
We need to read two chars - the format is "%c%c".
Then skip a space (just add it to the format string) - "%c%c ".
Then read a string until we hit a comma. Don't forget to specify the maximum string size. So the format is "%c%c %10[^,]". 10 is the maximum number of characters to read; [^,] is the set of allowed characters, where ^, means everything except a comma.
Then skip a comma - "%c%c %10[^,],".
And finally read an integer - "%c%c %10[^,],%d".
The last step is to be sure that all 4 tokens are read - check fscanf return value.
Here is the complete solution:
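#include <stdio.h>

int main(void)
{
    FILE *f = fopen("input_file", "r");
    if (f == NULL)
        return 1;

    char c1, c2;
    char str[11];   /* up to 10 characters plus the terminating '\0' */
    int d;

    /* %*1[\n] consumes the newline that ends each line without counting
       as an assignment, so the next %c%c starts at the next line's first
       character. Looping on the return value (instead of !feof(f)) also
       stops cleanly on malformed input instead of looping forever. */
    while (fscanf(f, "%c%c %10[^,],%d%*1[\n]", &c1, &c2, str, &d) == 4)
    {
        // successfully got 4 values from the file
    }

    fclose(f);
    return 0;
}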
The language I am using is C.
I am trying to scan data from a file, and the code segment is like:
char lsm;
long unsigned int address;
int objsize;
while(fscanf(mem_trace,"%c %lx,%d\n",&lsm,&address,&objsize)!=EOF){
printf("%c %lx %d\n",lsm,address,objsize);
}
The file I read from begins with the following lines:
 S 00600aa0,1
I 004005b6,5
I 004005bb,5
I 004005c0,5
 S 7ff000398,8
The results that show in stdout is:
8048350 134524916
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398,8
Obviously, the results have an extra line that comes from nowhere. Does anybody know how this could happen?
Thx!
This works for me on the data you supply:
#include <stdio.h>
int main(void)
{
char lsm[2];
long unsigned int address;
int objsize;
while (scanf("%1s %lx,%d\n", lsm, &address, &objsize) == 3)
printf("%s %9lx %d\n", lsm, address, objsize);
return 0;
}
There are multiple changes. The simplest and least consequential is the change from fscanf() to scanf(); that's for my convenience.
One important change is the type of lsm from a single char to an array of two characters. The format string then uses %1s, which reads one character (plus a terminating NUL '\0') into the string but also (and this is crucial) skips leading blanks.
Another change is the use of == 3 instead of != EOF in the condition. If something goes wrong, scanf() returns the number of successful matches. Suppose that it managed to read a letter but what followed was not a hex number; it would return 1 (not EOF). Further, it would return 1 on each iteration until it could find something that matched a hex number. Always test for the number of values you expect.
The output format was tidied up with the %9lx. I was testing on a 64-bit system, so the 9-digit hex converts fine. One problem with scanf() is that if you get an overflow on a conversion, the behaviour is undefined.
Output:
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398 8
Why did you get the results you got?
The first conversion read a space into lsm, but then failed to convert S into a hex number, so it was left behind for the next cycle. So, you got the left-over garbage printed in the address and object size columns. The second iteration read the S and was then in synchrony with the data until the last line. The newline at the end of the format (like any other white space in the format string) eats white space, which is why the last line worked despite the leading blank.
A directive that is a conversion specification defines a set of matching input sequences, as described below for each specifier. A conversion specification is executed in the following steps:
Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier.
An input item is read from the stream, unless the specification includes an n specifier.
[...]
The first time you call fscanf, your %c reads the first blank space in the file. The white-space character in your format string reads zero or more white-space characters, this time zero of them. Your %lx fails to match the S character in the file, so fscanf returns. You don't check the result. Your variables still contain whatever values they held before (indeterminate ones here, since they were never initialized).
The second time you call fscanf, your %c reads the first S character in the file. From that point on, everything else succeeds too.
Added in editing, here is the simplest change to your format string to solve your problem:
" %c %lx,%d\n"
The space at the beginning will read zero or more characters of white-space and then %c will read the first non-white-space character in the file.
Here is another format string that will also solve your problem:
" %c %lx,%d"
The reason is that if you read and discard zero or more white-space characters twice in a row, the result is the same as doing it just once.
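Here is a sketch of the corrected loop. It reads into arrays and tests for all three conversions, as the other answer recommends. read_trace is a hypothetical name. It assumes unsigned long is wide enough for the addresses, since a 9-digit value like 7ff000398 needs more than 32 bits.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max records such as " S 00600aa0,1"
   with the format " %c %lx,%d". Stops at end of file or at the first
   record that does not supply all three fields.
   Returns the number of records stored. */
int read_trace(FILE *fp, char *ops, unsigned long *addrs, int *sizes, int max)
{
    int n = 0;
    /* The leading space skips any white space (the previous line's
       newline and any leading blank) before %c reads the operation. */
    while (n < max &&
           fscanf(fp, " %c %lx,%d", &ops[n], &addrs[n], &sizes[n]) == 3)
        n++;
    return n;
}
```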
I think that fscanf reads the first character (a space) into lsm, then fails to read address and objsize because the format doesn't match the rest of the line.
Then it prints a space, then whatever happened to be in address and objsize when they were declared.
EDIT--
fscanf consumes the white space after each successful record (the \n at the end of your format matches any amount of white space); if you call ftell you'll see it:
printf("%c %lx %d %ld\n", lsm, address, objsize, ftell(mem_trace));
I am attempting to parse a text (CSS) file using fscanf and pull out all statements that match this pattern:
#import "some/file/somewhere.css";
To do this, I have the following loop set up:
FILE *file = fopen(pathToSomeFile, "r");
char *buffer = (char *)malloc(sizeof(char) * 9000);
while(!feof(file))
{
// %*[^#] : Read and discard all characters up to a '#'
// %8999[^;] : Read up to 8999 characters starting at '#' to a ';'.
if(fscanf(file, "%*[^#] %8999[^;]", buffer) == 1)
{
// Do stuff with the matching characters here.
// This code is long and not relevant to the question.
}
}
This works perfectly SO LONG AS the VERY FIRST character in the file is not a '#'. (Literally, a single space before the first '#' character in the CSS file will make the code run fine.)
But if the very first character in the CSS file is a '#', then what I see in the debugger is an infinite loop -- execution enters the while loop, hits the fscanf statement, but does not enter the 'if' statement (fscanf fails), and then continues through the loop forever.
I believe my fscanf formatters may need some tweaking, but am unsure how to proceed. Any suggestions or explanations for why this is happening?
Thank you.
I'm not an expert on scanf pattern syntax, but my interpretation of yours is:
Match a non-empty sequence of non-'#' characters, then
Match a non-empty sequence of up to 8999 non-';' characters
So yes, if your string starts with a '#', then the first part will fail.
I think if you start your format string with some whitespace, then fscanf will eat any leading whitespace in your data string, i.e. simply " %8999[^;]".
Oli already said why fscanf failed. And since failure is a normal outcome for fscanf, your busy loop is not a consequence of the fscanf failure itself but of the missing handling for it.
You have to handle an fscanf failure even if your format were correct (in your special case), because you cannot be sure that the input will always match the format. In fact, you can be sure that far more non-matching input exists than matching input.
Your format string does the following actions:
Read (and discard) 1 or more non-# characters
Read (and discard) 0 or more whitespace characters (due to the space in the format string)
Read and store 1 to 8999 non-; characters
Unfortunately, there is no format specifier for reading "zero or more" characters from a user-defined set.
If you don't care about multiple #import statements on a line, you could change your code to read a single line (with fgets) and then extract the #import statement from it. If the first character is not #, you can use your current format string with sscanf; otherwise, you could use sscanf(line, "%8999[^;]", buffer).
If multiple #import statements on a line should be handled correctly, you could inspect the next character to be read with getc and then put it back with ungetc.
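Here is a sketch of the getc/ungetc suggestion, assuming the same 9000-byte buffer as the question. next_import is a hypothetical name. It peeks at the next character and picks one of the two format strings accordingly. It returns 0 once no further # is found, so the caller's loop terminates instead of spinning.

```c
#include <stdio.h>

/* Hypothetical helper: stores the next "#..." statement (up to, but not
   including, the ';') in buf, which must have room for 9000 chars as in
   the question. Returns 1 if a statement was found, 0 otherwise. */
int next_import(FILE *fp, char *buf)
{
    /* Peek at the next character: read it, then push it back. */
    int ch = getc(fp);
    if (ch == EOF)
        return 0;
    ungetc(ch, fp);

    /* If '#' is next, %*[^#] would fail (it needs at least one
       character), so use a format without it. */
    int rc = (ch == '#')
        ? fscanf(fp, "%8999[^;]", buf)
        : fscanf(fp, "%*[^#] %8999[^;]", buf);
    if (rc != 1)
        return 0;

    getc(fp);   /* consume the ';' that ended the statement (or hit EOF) */
    return 1;
}
```

A caller can then loop with while (next_import(file, buffer)) { ... }. That loop stops once fscanf fails instead of testing !feof(file).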