i have a code snippet like this
while( fscanf(b,"%c,%[^,],%[^,],%f",&book.type,book.title,book.author,&book.price)!=EOF)
Reading up on the format string part of fscanf
Specifically this part
matches a non-empty sequence of character from set of characters. If
the first character of the set is ^, then all characters not in the
set are matched.
So, that format specifier matches all characters except the , character (which in your format string is matched afterward). So, if you had a struct like
typedef struct Book_t {
char type;
char title[100];
char author[100];
float price;
}Book ;
and then had a file that had the data in the schema:
BookType,BookTitle,BookAuthor,BookPrice
Then once could possibly read in each line into book as
fscanf(b,"%c,%[^,],%[^,],%f",&book.type,book.title,book.author,&book.price)
For a line of the file as:
A,old man and the sea,ernest hemingway,12.5
A would be read into book.type and then all characters not matching a comma would be read, so that would read in till sea and stop since the next character is a ,. This , would be matched with the , in the format string. The same process would repeat for the author field.
Note the caveat that reading in an unspecified number of characters till the matching stops before a comma is bad idea because the buffer that it's reading into is usually of a fixed length. That's why it's better to specify the maximum width (accounting for the null character) while doing so. Continuing with the above example, this would look something like
fscanf(b,"%c,%99[^,],%99[^,],%f",&book.type,book.title,book.author,&book.price)
To have the 99 denote that it should match only up to 99 characters at max to avoid any buffer overflows since the buffer title can hold only upto 100 characters and at least one byte would be required for the \0 character.
Related
I am attempting to read a line written in the format:
someword: .asciiz "want this as a char*"
There is an arbitrary amount of white space between words. I am curious if there is a simple way of getting the internal characters in the quotes into a char* variable using something like sscanf? I am guaranteed the quotes and that where will be no more than 32 characters (including spaces). There will also be a new line character immediately following the quotes.
Most scanf() field descriptors implicitly cause leading whitespace to be skipped and expect the field to be whitespace-terminated. To scan a string that may contain whitespace, however, you can use the %[] field descriptor with an appropriate scan set. Thus, you might scan sequence of lines following the pattern you describe like so by looping calls like this:
char keyword[32], value[32], description[32];
scanf("%s%s%*[ \t]\"%[^\"]\"", keyword, value, description);
That format string:
scans two whitespace-delimited strings into char arrays keyword and value,
scans but does not assign one or more whitespace characters followed by a quotation mark,
scans everything up to but not including the next quotation mark into char array description, and scans and discards a quotation mark.
It relies on the data to be correctly formatted; among other things, this is vulnerable to a buffer overflow if the data are malformed. You can address that by specifying maximum field widths in the format string.
Note, too, that you should check the return value of the function to ensure that all fields were successfully matched. That will allow you to terminate early in the event of malformed input, and even to present valid information about the location of the malformation.
You can use scanf ("%s%s%31[^\n]",s1,s2,s3);
Example:
#include <stdio.h>
int main()
{
char s1[32],s2[32],s3[32];
printf ("write something: ");
scanf ("%s%s%31[^\n]",s1,s2,s3);
printf ("%s %s %s",s1,s2,s3);
return 0;
}
s1 and s2 will ignore spaces but s3 won't
Use \"%32[^\"]\" to capture the quoted phrase. Use "%n" to detect success.
char w1[32+1];
char w2[32+1];
char w3[32+1];
int n = 0;
sscanf(buffer, "%32s%32s \"%32[^\"]\" %n", w1, w2, w3, &n);
if (n == 0) return fail; // format mis-match
if (buffer[n]) return fail; // Extra garbage detected
// else good to go.
"%32s" Skip white-space,then read & save up to 32 non-white-space char. Append '\0'.
" " Skip white space.
"\"" Match a '\"'.
"%32[^\"]" Read and save up to 32 non-'\"' char. Append '\0'.
"%n" Save the count of characters scanned.
I am trying to read a list of interfaces given in a config file.
char readList[3];
memset(readList,'\0',sizeof(readList));
fgets(linebuf, sizeof(linebuf),fp); //I get the file pointer(fp) without any error
sscanf(linebuf, "List %s, %s, %s \n",
&readList[0],
&readList[1],
&readList[2]);
Suppose the line the config file is something like this
list value1 value2 value3
I am not able to read this. Can someone please tell me the correct syntax for doing this. What am I doing wrong here.
The %s conversion specifier in the format string reads a sequence of non-whitespace characters and stores in buffer pointed to by the corresponding argument passed to sscanf. The buffer must be large enough to store the input string plus the terminating null byte which is added automatically by sscanf else it is undefined behaviour.
char readList[3];
The above statement defines readList to be an array of 3 characters. What you need is an array of characters arrays large enough to store the strings written by sscanf. Also the format string "List %s, %s, %s \n" in your sscanf call means that it must exactly match "List" and two commas ' in that order in the string linebuf else sscanf will fail due to matching failure. Make sure that you string linebuf is formatted accordingly else here is what I suggest. Also, you must guard against sscanf overrunning the buffer it writes into by specifying the maximum field width else it will cause undefined behaviour. The maximum field width should be one less than the buffer size to accommodate the terminating null byte.
// assuming the max length of a string in linebuf is 40
// +1 for the terminating null byte which is added by sscanf
char readList[3][40+1];
fgets(linebuf, sizeof linebuf, fp);
// "%40s" means that sscanf will write at most 40 chars
// in the buffer and then append the null byte at the end.
sscanf(linebuf,
"%40s%40s%40s",
readList[0],
readList[1],
readList[2]);
Your char readlist[3] is an array of three chars. What you need however is an array of three strings, like char readlist[3][MUCH]. Then, reading them like
sscanf(linebuf, "list %s %s %s",
readList[0],
readList[1],
readList[2]);
will perhaps succeed. Note that the alphabetic strings in scanf need to match character-by-character (hence list, not List), and any whitespace in the format string is a signal to skip all whitespace in the string up to the next non-whitespace character. Also note the absence of & in readList arguments since they are already pointers.
I am reading input from file (line by line) Each line is a state of a game board. Below is example of input:
(8,7,1,0,0,0,b,b,b,b,b,b,b,b,b,b,b,b,s,s,r,r,g,b,r,g,r,r,r,r,b,r,r,s,b,b,b,b,r,s,s,r,b,b,r,s,s,s,r,b,g,b,r,r,r,r,r,r,r,r,r,s) 0
I have used fgets() and strtok() to split the string at (),
My problem:
I want the first 6 integers in their individual variables such as:
int column = 8
int row = 7
so on..
I want to get rid of the last integer at the end of input- 0
and the chars should be stored in an array, because they represent pieces of a board.
Right now, I have an array with all the integers and chars stored together.
I can iterate through my array, and copy the integers to their variables and then chars to a new array. But that's inefficient.
Is there another way to do it?
I used fscanf() but don't know how to split the string using delimiters.
Thanks
WELL-FORMED INPUT ONLY
if (fscanf(FILE_PTR, "(%d,%d,...,%c,%c,%c,...,%c) %*d", &column, &row, ..., &chars[0], &chars[1], ...) == 60)
or something like that
the %*d specifier will discard that input (you didn't want the last number)
for the chars, give pointers to their indices for a preallocated array
for the ints, give the variable pointer/ref
Thank you to Jon Leffler for reminding that you should test the output of *scanf (number of things read)!
More information
REEDIT nope, it was right -
int fscanf ( FILE * stream, const char * format, ... );
format: C string that contains a sequence of characters that control how characters extracted from the stream are treated:
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or tab) or part of a format specifier (which begin with a % character) causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a format specifier, which is used to specify the type and format of the data to be retrieved from the stream and stored into the locations pointed by the additional arguments.
Above quote from here. I am aware of the hostility towards cplusplus.com here but I do not have access to the standard. please feel free to edit if you do
I have used fgets() and strtok() to split the string at "()"
later
I used fscanf() but don't know how to split the string using delimiters.
I guess if strtok() worked for parenthesis, it would work for commas too.
Apart from that: you have several possibilities for doing what you want. Without much context provided, I can't really tell you which one you want, but here we go:
Grab a pointer to the first non-integer, and use it as if it was a pointer to the first element of another array, containing the integers only. This avoids all copying and/or moving overhead.
Use memcpy() to copy only the necessary parts of the array to another array. memcpy() is generally highly optimized and faster than the naive for-loop-with-assignment approach.
if you have a char * you can think of it as an array or as a string, as the memory layout is the same...
char * input = "(8,7,1,0,0,0,b,b,b,b,b,b,b,b,b,b,b,b,s,s,r,r,g,b,r,g,r,r,r,r,b,r,r,s,b,b,b,b,r,s,s,r,b,b,r,s,s,s,r,b,g,b,r,r,r,r,r,r,r,r,r,s) 0";
size_t len = strlen(input);
int currentIndex = 0;
char * output = calloc(1,len);
for (int i = 0 ; i<len ; i++)
{
if (input[i] == '(' || input[i] == ')' || input[i] == ','|| input[i] == ' ') {
continue;
}
output[currentIndex++] = input[i];
}
assert(strlen(output) == 63); //well formatted?
char a = output[0];
char b = output[1];
char (* board)[60] = malloc(60); //pointer to array or is it a mal-formed string.
memcpy(board, output+2, 60);
char last = output[62];
the main thing that I would add, if you want to use it more like a string, then you have to make the array 1 bigger and set board[60] = \0;
I have read strings with spaces in them using the following scanf() statement.
scanf("%[^\n]", &stringVariableName);
What is the meaning of the control string [^\n]?
Is is okay way to read strings with white space like this?
This mean "read anything until you find a '\n'"
This is OK, but would be better to do this "read anything until you find a '\n', or read more characters than my buffer support"
char stringVariableName[256] = {}
if (scanf("%255[^\n]", stringVariableName) == 1)
...
Edit: removed & from the argument, and check the result of scanf.
The format specifier "%[^\n]" instructs scanf() to read up to but not including the newline character. From the linked reference page:
matches a non-empty sequence of character from set of characters.
If the first character of the set is ^, then all characters not
in the set are matched. If the set begins with ] or ^] then the ]
character is also included into the set.
If the string is on a single line, fgets() is an alternative but the newline must be removed as fgets() writes it to the output buffer. fgets() also forces the programmer to specify the maximum number of characters that can be read into the buffer, making it less likely for a buffer overrun to occur:
char buffer[1024];
if (fgets(buffer, 1024, stdin))
{
/* Remove newline. */
char* nl = strrchr(buffer, '\n');
if (nl) *nl = '\0';
}
It is possible to specify the maximum number of characters to read via scanf():
scanf("%1023[^\n]", buffer);
but it is impossible to forget to do it for fgets() as the compiler will complain. Though, of course, the programmer could specify the wrong size but at least they are forced to consider it.
Technically, this can't be well defined.
Matches a nonempty sequence of characters from a set of expected
characters (the scanset).
If no l length modifier is present, the corresponding argument shall
be a pointer to the initial element of a character array large enough
to accept the sequence and a terminating null character, which will be
added automatically.
Supposing the declaration of stringVariableName looks like char stringVariableName[x];, then &stringVariableName is a char (*)[x];, not a char *. The type is wrong. The behaviour is undefined. It might work by coincidence, but anything that relies on coincidence doesn't work by my definition.
The only way to form a char * using &stringVariableName is if stringVariableName is a char! This implies that the character array is only large enough to accept a terminating null character. In the event where the user enters one or more characters before pressing enter, scanf would be writing beyond the end of the character array and invoking undefined behaviour. In the event where the user merely presses enter, the %[...] directive will fail and not even a '\0' will be written to your character array.
Now, with that all said and done, I'll assume you meant this: scanf("%[^\n]", stringVariableName); (note the omitted ampersand)
You really should be checking the return value!!
A %[ directive causes scanf to retrieve a sequence of characters consisting of those specified between the [ square brackets ]. A ^ at the beginning of the set indicates that the desired set contains all characters except for those between the brackets. Hence, %[^\n] tells scanf to read as many non-'\n' characters as it can, and store them into the array pointed to by the corresponding char *.
The '\n' will be left unread. This could cause problems. An empty field will result in a match failure. In this situation, it's possible that no data will be copied into your array (not even a terminating '\0' character). For this reason (and others), you really need to check the return value!
Which manual contains information about the return values of scanf? The scanf manual.
Other people have explained what %[^\n] means.
This is not an okay way to read strings. It is just as dangerous as the notoriously unsafe gets, and for the same reason: it has no idea how big the buffer at stringVariableName is.
The best way to read one full line from a file is getline, but not all C libraries have it. If you don't, you should use fgets, which knows how big the buffer is, and be aware that you might not get a complete line (if the line is too long for the buffer).
Reading from the man pages for scanf()...
[ Matches a non-empty sequence of characters from the
specified set of accepted characters; the next pointer must be a
pointer to char, and there must be enough room for all the characters
in the string, plus a terminating null byte. The usual skip of
leading white space is suppressed. The string is to be made up of
characters in (or not in) a particular set; the set is defined by the
characters between the open bracket [ character and a close bracket ]
character. The set excludes those characters if the first character
after the open bracket is a circumflex (^). To include a close
bracket in the set, make it the first character after the open bracket
or the circumflex; any other position will end the set. The hyphen
character - is also special; when placed between two other
characters, it adds all intervening characters to the set. To
include a hyphen, make it the last character before the final close
bracket. For instance, [^]0-9-] means the set "everything except
close bracket, zero through nine, and hyphen". The string ends with
the appearance of a character not in the (or, with a
circumflex, in) set or when the field width runs out.
In a nutshell, the [^\n] means that read everything from the string that is not a \n and store that in the matching pointer in the argument list.
The language I am using is C
I am trying to scan data from a file, and the code segment is like:
char lsm;
long unsigned int address;
int objsize;
while(fscanf(mem_trace,"%c %lx,%d\n",&lsm,&address,&objsize)!=EOF){
printf("%c %lx %d\n",lsm,address,objsize);
}
The file which I read from has the first line as follows:
S 00600aa0,1
I 004005b6,5
I 004005bb,5
I 004005c0,5
S 7ff000398,8
The results that show in stdout is:
8048350 134524916
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398,8
Obviously, the results had an extra line which comes nowhere.Is there anybody know how this could happen?
Thx!
This works for me on the data you supply:
#include <stdio.h>
int main(void)
{
char lsm[2];
long unsigned int address;
int objsize;
while (scanf("%1s %lx,%d\n", lsm, &address, &objsize) == 3)
printf("%s %9lx %d\n", lsm, address, objsize);
return 0;
}
There are multiple changes. The simplest and least consequential is the change from fscanf() to scanf(); that's for my convenience.
One important change is the type of lsm from a single char to an array of two characters. The format string then uses %1s reads one character (plus NUL '\0') into the string, but it also (and this is crucial) skips leading blanks.
Another change is the use of == 3 instead of != EOF in the condition. If something goes wrong, scanf() returns the number of successful matches. Suppose that it managed to read a letter but what followed was not a hex number; it would return 1 (not EOF). Further, it would return 1 on each iteration until it could find something that matched a hex number. Always test for the number of values you expect.
The output format was tidied up with the %9lx. I was testing on a 64-bit system, so the 9-digit hex converts fine. One problem with scanf() is that if you get an overflow on a conversion, the behaviour is undefined.
Output:
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398 8
Why did you get the results you got?
The first conversion read a space into lsm, but then failed to convert S into a hex number, so it was left behind for the next cycle. So, you got the left-over garbage printed in the address and object size columns. The second iteration read the S and was then in synchrony with the data until the last line. The newline at the end of the format (like any other white space in the format string) eats white space, which is why the last line worked despite the leading blank.
A directive that is a conversion specification defines a set of
matching input sequences, as described below for each specifier. A
conversion specification is executed in the following steps:
Input white-space characters (as specified by the isspace function)
are skipped, unless the specification includes a [, c, or n specifier.
An input item is read from the stream, unless the specification
includes an n specifier.
[...]
The first time you call fscanf, your %c reads the first blank space in the file. Your white-space character reads zero or more characters of white-space, this time zero of them. Your %lx fails to match the S character in the file, so fscanf returns. You don't check the result. Your variables contain values that they had from earlier operations.
The second time you call fscanf, your %c reads the first S character in the file. From that point on, everything else succeeds too.
Added in editing, here is the simplest change to your format string to solve your problem:
" %c %lx,%d\n"
The space at the beginning will read zero or more characters of white-space and then %c will read the first non-white-space character in the file.
Here is another format string that will also solve your problem:
" %c %lx,%d"
The reason is that if you read and discard zero or more white-space characters twice in a row, the result is the same as doing it just once.
I think that fsanf reads the first character [space] into lsm then fails to read address and objsize because the format shift doesn't match for the rest of the line.
Then it prints a space then whatever happened to be in address and objsize when it was declared
EDIT--
fscanf consumes the whitespaces after each call, if you call ftell you'll see
printf("%c %lx %d %d\n",lsm,address,objsize,ftell(mem_trace));