sscanf not extracting pattern - c

I am trying to figure out the pattern I should be giving to sscanf.
I have a string abcde(1GB). I want to extract 1 and GB. I am using
char list[]= "abcde(1GB)";
int memory_size =0;
char unit[3]={0} ;
sscanf(list, "%*s%d%s" , &memory_size, unit);
I do not see tokens extracted when I print I see memory_size =0 and NULL in unit.
Thanks

your sscanf() string format should be:
sscanf(list, "%*[^(](%d%[^)]" , &memory_size, unit);
%[^)] means catch charachters and stop ctaching when finding the charachter ) or end of the string
%*[^(] means:
[^\(] means catch charachters and stop ctaching when finding the charachter ( - as opposed to a more conventional %s - catching charachters and stop ctaching when finding space characters"
* means "read but not store"

The %s conversion specifier in the format string of scanf means that scanf will read a sequence of non-whitespace characters. The * in %*s is called the assignment suppression character. It means that scanf will read and discard the non-whitespace characters.
Since %s matches any sequence of non-whitespace characters, %*s in "%*s%d%s" means that sscanf will read an discard all the characters in the string list. Therefore, there's nothing left in the array list to be read and assigned to the arguments &memory_size and unit. This explains why memory_size and unit are unchanged. What you need is the format string
sscanf(list, "%*[^(](%d%2[^)]%*s", &memory_size, unit);
Here, in the format string "%*[^(](%d%2[^)]%*s" -
%*[^(] means that sscanf will first read and discard a sequence of character not containing the character (.
( means that sscanf will read and discard (.
%d means the sscanf will read a decimal integer.
%2[^)] means that sscanf will read at most a sequence of 2 characters not containing ) and store them in the corresponding argument. This is to ensure that sscanf does not overrun the buffer unit in case the string to be stored is too large. It's one less than the size of unit to save space for the terminating null byte which is automatically added by sscanf at the end of the buffer.
%*s means sscanf will read and discard any sequence of leftover non-whitespace characters.

plz go through the following link for the sscanf() function.
sscanf function description
Now,
as per MOHAMED said %[^(] is for catch the characters till "("
means %*[^(] truncate the string until ( after that %d as integer data and then capture the string until ).

After accept answer - supplemental
#MOHAMED answer is good, but below are candidate improvements.
1) Always check the sscanf() result to insure data was scanned as intended.
if (sscanf(list, "%*[^(](%d%[^)]", &memory_size, unit) != 2) ScanFailure();
2) When using the "%s" or "%[]" specifiers, include a limiting length. #ajay
char unit[3]={0} ;
// v--- 1 less than buffer size.
if (sscanf(list, "%*[^(](%d%2[^)]" , &memory_size, unit) != 2) ScanFailure();
3) Appending a "%n" (save scan position) is a sure-fire means of detecting the trailing ) was there and extra junk was not at the end of the input string.
"%n" does not add to sscaanf() result.
int n = 0;
// )%n <--- Look for ) then save scan position
if (sscanf(list, "%*[^(](%d%2[^)])%n", &memory_size, unit, &n) != 2) ||
list[n] != '\0') ScanFailure();
4) Making room for optional white-space may/may not be useful. Specifiers already allow optional leading white-space. 3 exceptions: %c %[] %n
"%*[^(](%d%2[^)])%n"
// v v v
" %*[^(](%d %2[^)]) %n"

Related

Getting a char* with spaces in C from sscanf

I am attempting to read a line written in the format:
someword: .asciiz "want this as a char*"
There is an arbitrary amount of white space between words. I am curious if there is a simple way of getting the internal characters in the quotes into a char* variable using something like sscanf? I am guaranteed the quotes and that where will be no more than 32 characters (including spaces). There will also be a new line character immediately following the quotes.
Most scanf() field descriptors implicitly cause leading whitespace to be skipped and expect the field to be whitespace-terminated. To scan a string that may contain whitespace, however, you can use the %[] field descriptor with an appropriate scan set. Thus, you might scan sequence of lines following the pattern you describe like so by looping calls like this:
char keyword[32], value[32], description[32];
scanf("%s%s%*[ \t]\"%[^\"]\"", keyword, value, description);
That format string:
scans two whitespace-delimited strings into char arrays keyword and value,
scans but does not assign one or more whitespace characters followed by a quotation mark,
scans everything up to but not including the next quotation mark into char array description, and scans and discards a quotation mark.
It relies on the data to be correctly formatted; among other things, this is vulnerable to a buffer overflow if the data are malformed. You can address that by specifying maximum field widths in the format string.
Note, too, that you should check the return value of the function to ensure that all fields were successfully matched. That will allow you to terminate early in the event of malformed input, and even to present valid information about the location of the malformation.
You can use scanf ("%s%s%31[^\n]",s1,s2,s3);
Example:
#include <stdio.h>
int main()
{
char s1[32],s2[32],s3[32];
printf ("write something: ");
scanf ("%s%s%31[^\n]",s1,s2,s3);
printf ("%s %s %s",s1,s2,s3);
return 0;
}
s1 and s2 will ignore spaces but s3 won't
Use \"%32[^\"]\" to capture the quoted phrase. Use "%n" to detect success.
char w1[32+1];
char w2[32+1];
char w3[32+1];
int n = 0;
sscanf(buffer, "%32s%32s \"%32[^\"]\" %n", w1, w2, w3, &n);
if (n == 0) return fail; // format mis-match
if (buffer[n]) return fail; // Extra garbage detected
// else good to go.
"%32s" Skip white-space,then read & save up to 32 non-white-space char. Append '\0'.
" " Skip white space.
"\"" Match a '\"'.
"%32[^\"]" Read and save up to 32 non-'\"' char. Append '\0'.
"%n" Save the count of characters scanned.

Using fscanf to get 2 string in 2 columns

I'm trying to make my program read a file with 2 columns and the first column contains some strings and i cant make it to store into an array
Here is my code:
fp2=fopen("Symbol Table.txt","r");char str[100];
while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) {
puts(stemp[scnt++]);getch(); //This is just here to display conents of second col
}
fclose(fp2)
and here is my txt file:
void void
main Main
( Left Parenthesis
) Right Parenthesis
{ Left Brace
S Identifier
: Colon
$% Start of Block Comment
This program is a simple calculatorFuctions:ADD,SUB,MULT,DIV String
%$ End of Block Comment
Unsigned Noise Words
int Integer
the code store the long string is divided before going into the array
When you try to read strings with scanf or fscanf it reads the string until it encounters a space, a tab, or a newline. In this case, as I can see, in the first loop the str and stemp will be assigned to the stings "void" and "void", in the second loop, it will be "main" and "Main" and in the third loop, "(" and "Left" and so on.
You need to separate the two columns with multiple strings with a special character like the tab (\t) character so that one knows when the 1st column ends and read with the getc function instead of scanf until you reach that special character in your line and then combine all the characters (i.e. letters) in to a string. You can adopt the same method to read the second column with a getc function until you encounter the newline (\n).
while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) {
is not going to work
"%s" scans for non- white-space text. "This program is" and "Left Parenthesis" have embedded white-space.
Need to read a line-at-a-time and then parse the line into 2 strings for the 2 columns.
char buf[100];
if (fgets(buf, sizeof buf, fp2) == NULL) Handle_EOForIOError();
column[2][100];
size_t len = strlen(buf);
if (len < 62) Handle_ShortLine();
buf[61] = = '\0'; // cut the line in 2
column[0][0] = '\0'; // In case column is all white-space
sscanf(buf, " %[^\n]", column[0]);
column[1][0] = '\0';
sscanf(&buf[62], " %[^\n]", column[1]);
Other answers contain statements which are not true. Actually,
fscanf does not always read strings only until a white-space character. The conversion specifier [ also reads a string, and includes the set of expected characters.
You do not need to separate the two columns with a special character, since the columns are of known width 62.
You do not need to read a line at a time and then parse it; though, that might be easier.
So, if you prefer fscanf, you can use
while (fscanf(fp2, "%62[^\n]%[^\n]\n", str, stemp[scnt]) == 2)
(your comparing with NULL is wrong, because NULL is a pointer constant and fscanf doesn't return a pointer but the number of input items assigned or EOF).
A version without fscanf:
while (fgets(str, 62+1, fp2) && fgets(stemp[scnt], 62+1, fp2) && getc(fp2))
(The getc reads the \n.)
The Nature of while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) is will read only the two string separated by a space. if your text file contains long strings like this
This program is a simple calculatorFuctions:ADD,SUB,MULT,DIV
While fetching it will divide this strings into parts of %s %s and fetch. So change your text file in the following way and try:
void void
main Main
( LeftParenthesis
) RightParenthesis
{ LeftBrace
S Identifier
: Colon
$% StartofBlockComment
ThisProgramIsASimpleCalculatorFuctions:ADD,SUB,MULT,DIV String
%$ EndofBlockComment
Unsigned NoiseWords
int Integer

sscanf reading multiple characters

I am trying to read a list of interfaces given in a config file.
char readList[3];
memset(readList,'\0',sizeof(readList));
fgets(linebuf, sizeof(linebuf),fp); //I get the file pointer(fp) without any error
sscanf(linebuf, "List %s, %s, %s \n",
&readList[0],
&readList[1],
&readList[2]);
Suppose the line the config file is something like this
list value1 value2 value3
I am not able to read this. Can someone please tell me the correct syntax for doing this. What am I doing wrong here.
The %s conversion specifier in the format string reads a sequence of non-whitespace characters and stores in buffer pointed to by the corresponding argument passed to sscanf. The buffer must be large enough to store the input string plus the terminating null byte which is added automatically by sscanf else it is undefined behaviour.
char readList[3];
The above statement defines readList to be an array of 3 characters. What you need is an array of characters arrays large enough to store the strings written by sscanf. Also the format string "List %s, %s, %s \n" in your sscanf call means that it must exactly match "List" and two commas ' in that order in the string linebuf else sscanf will fail due to matching failure. Make sure that you string linebuf is formatted accordingly else here is what I suggest. Also, you must guard against sscanf overrunning the buffer it writes into by specifying the maximum field width else it will cause undefined behaviour. The maximum field width should be one less than the buffer size to accommodate the terminating null byte.
// assuming the max length of a string in linebuf is 40
// +1 for the terminating null byte which is added by sscanf
char readList[3][40+1];
fgets(linebuf, sizeof linebuf, fp);
// "%40s" means that sscanf will write at most 40 chars
// in the buffer and then append the null byte at the end.
sscanf(linebuf,
"%40s%40s%40s",
readList[0],
readList[1],
readList[2]);
Your char readlist[3] is an array of three chars. What you need however is an array of three strings, like char readlist[3][MUCH]. Then, reading them like
sscanf(linebuf, "list %s %s %s",
readList[0],
readList[1],
readList[2]);
will perhaps succeed. Note that the alphabetic strings in scanf need to match character-by-character (hence list, not List), and any whitespace in the format string is a signal to skip all whitespace in the string up to the next non-whitespace character. Also note the absence of & in readList arguments since they are already pointers.

moving elements of array in c, is there a better option

I am reading input from file (line by line) Each line is a state of a game board. Below is example of input:
(8,7,1,0,0,0,b,b,b,b,b,b,b,b,b,b,b,b,s,s,r,r,g,b,r,g,r,r,r,r,b,r,r,s,b,b,b,b,r,s,s,r,b,b,r,s,s,s,r,b,g,b,r,r,r,r,r,r,r,r,r,s) 0
I have used fgets() and strtok() to split the string at (),
My problem:
I want the first 6 integers in their individual variables such as:
int column = 8
int row = 7
so on..
I want to get rid of the last integer at the end of input- 0
and the chars should be stored in an array, because they represent pieces of a board.
Right now, I have an array with all the integers and chars stored together.
I can iterate through my array, and copy the integers to their variables and then chars to a new array. But that's inefficient.
Is there another way to do it?
I used fscanf() but don't know how to split the string using delimiters.
Thanks
WELL-FORMED INPUT ONLY
if (fscanf(FILE_PTR, "(%d,%d,...,%c,%c,%c,...,%c) %*d", &column, &row, ..., &chars[0], &chars[1], ...) == 60)
or something like that
the %*d specifier will discard that input (you didn't want the last number)
for the chars, give pointers to their indices for a preallocated array
for the ints, give the variable pointer/ref
Thank you to Jon Leffler for reminding that you should test the output of *scanf (number of things read)!
More information
REEDIT nope, it was right -
int fscanf ( FILE * stream, const char * format, ... );
format: C string that contains a sequence of characters that control how characters extracted from the stream are treated:
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or tab) or part of a format specifier (which begin with a % character) causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a format specifier, which is used to specify the type and format of the data to be retrieved from the stream and stored into the locations pointed by the additional arguments.
Above quote from here. I am aware of the hostility towards cplusplus.com here but I do not have access to the standard. please feel free to edit if you do
I have used fgets() and strtok() to split the string at "()"
later
I used fscanf() but don't know how to split the string using delimiters.
I guess if strtok() worked for parenthesis, it would work for commas too.
Apart from that: you have several possibilities for doing what you want. Without much context provided, I can't really tell you which one you want, but here we go:
Grab a pointer to the first non-integer, and use it as if it was a pointer to the first element of another array, containing the integers only. This avoids all copying and/or moving overhead.
Use memcpy() to copy only the necessary parts of the array to another array. memcpy() is generally highly optimized and faster than the naive for-loop-with-assignment approach.
if you have a char * you can think of it as an array or as a string, as the memory layout is the same...
char * input = "(8,7,1,0,0,0,b,b,b,b,b,b,b,b,b,b,b,b,s,s,r,r,g,b,r,g,r,r,r,r,b,r,r,s,b,b,b,b,r,s,s,r,b,b,r,s,s,s,r,b,g,b,r,r,r,r,r,r,r,r,r,s) 0";
size_t len = strlen(input);
int currentIndex = 0;
char * output = calloc(1,len);
for (int i = 0 ; i<len ; i++)
{
if (input[i] == '(' || input[i] == ')' || input[i] == ','|| input[i] == ' ') {
continue;
}
output[currentIndex++] = input[i];
}
assert(strlen(output) == 63); //well formatted?
char a = output[0];
char b = output[1];
char (* board)[60] = malloc(60); //pointer to array or is it a mal-formed string.
memcpy(board, output+2, 60);
char last = output[62];
the main thing that I would add, if you want to use it more like a string, then you have to make the array 1 bigger and set board[60] = \0;

Invalid output with fscanf()

The language I am using is C
I am trying to scan data from a file, and the code segment is like:
char lsm;
long unsigned int address;
int objsize;
while(fscanf(mem_trace,"%c %lx,%d\n",&lsm,&address,&objsize)!=EOF){
printf("%c %lx %d\n",lsm,address,objsize);
}
The file which I read from has the first line as follows:
S 00600aa0,1
I 004005b6,5
I 004005bb,5
I 004005c0,5
S 7ff000398,8
The results that show in stdout is:
8048350 134524916
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398,8
Obviously, the results had an extra line which comes nowhere.Is there anybody know how this could happen?
Thx!
This works for me on the data you supply:
#include <stdio.h>
int main(void)
{
char lsm[2];
long unsigned int address;
int objsize;
while (scanf("%1s %lx,%d\n", lsm, &address, &objsize) == 3)
printf("%s %9lx %d\n", lsm, address, objsize);
return 0;
}
There are multiple changes. The simplest and least consequential is the change from fscanf() to scanf(); that's for my convenience.
One important change is the type of lsm from a single char to an array of two characters. The format string then uses %1s reads one character (plus NUL '\0') into the string, but it also (and this is crucial) skips leading blanks.
Another change is the use of == 3 instead of != EOF in the condition. If something goes wrong, scanf() returns the number of successful matches. Suppose that it managed to read a letter but what followed was not a hex number; it would return 1 (not EOF). Further, it would return 1 on each iteration until it could find something that matched a hex number. Always test for the number of values you expect.
The output format was tidied up with the %9lx. I was testing on a 64-bit system, so the 9-digit hex converts fine. One problem with scanf() is that if you get an overflow on a conversion, the behaviour is undefined.
Output:
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398 8
Why did you get the results you got?
The first conversion read a space into lsm, but then failed to convert S into a hex number, so it was left behind for the next cycle. So, you got the left-over garbage printed in the address and object size columns. The second iteration read the S and was then in synchrony with the data until the last line. The newline at the end of the format (like any other white space in the format string) eats white space, which is why the last line worked despite the leading blank.
A directive that is a conversion specification defines a set of
matching input sequences, as described below for each specifier. A
conversion specification is executed in the following steps:
Input white-space characters (as specified by the isspace function)
are skipped, unless the specification includes a [, c, or n specifier.
An input item is read from the stream, unless the specification
includes an n specifier.
[...]
The first time you call fscanf, your %c reads the first blank space in the file. Your white-space character reads zero or more characters of white-space, this time zero of them. Your %lx fails to match the S character in the file, so fscanf returns. You don't check the result. Your variables contain values that they had from earlier operations.
The second time you call fscanf, your %c reads the first S character in the file. From that point on, everything else succeeds too.
Added in editing, here is the simplest change to your format string to solve your problem:
" %c %lx,%d\n"
The space at the beginning will read zero or more characters of white-space and then %c will read the first non-white-space character in the file.
Here is another format string that will also solve your problem:
" %c %lx,%d"
The reason is that if you read and discard zero or more white-space characters twice in a row, the result is the same as doing it just once.
I think that fsanf reads the first character [space] into lsm then fails to read address and objsize because the format shift doesn't match for the rest of the line.
Then it prints a space then whatever happened to be in address and objsize when it was declared
EDIT--
fscanf consumes the whitespaces after each call, if you call ftell you'll see
printf("%c %lx %d %d\n",lsm,address,objsize,ftell(mem_trace));

Resources