Getting a char* with spaces in C from sscanf

I am attempting to read a line written in the format:
someword: .asciiz "want this as a char*"
There is an arbitrary amount of whitespace between words. Is there a simple way to get the characters inside the quotes into a char* variable using something like sscanf? I am guaranteed the quotes, and that there will be no more than 32 characters (including spaces). There will also be a newline character immediately following the quotes.

Most scanf() field descriptors implicitly cause leading whitespace to be skipped and expect the field to be whitespace-terminated. To scan a string that may contain whitespace, however, you can use the %[] field descriptor with an appropriate scan set. Thus, you might scan a sequence of lines following the pattern you describe by looping over calls like this:
char keyword[32], value[32], description[32];
scanf("%s%s%*[ \t]\"%[^\"]\"", keyword, value, description);
That format string:
scans two whitespace-delimited strings into char arrays keyword and value,
scans but does not assign one or more whitespace characters followed by a quotation mark,
scans everything up to but not including the next quotation mark into char array description, and scans and discards a quotation mark.
It relies on the data being correctly formatted; among other things, it is vulnerable to a buffer overflow if the data are malformed. You can address that by specifying maximum field widths in the format string, for example %31s and %31[^"] for the 32-byte arrays above.
Note, too, that you should check the return value of the function to ensure that all fields were successfully matched. That will allow you to terminate early in the event of malformed input, and even to present valid information about the location of the malformation.
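A minimal sketch of the two suggestions above (bounded field widths and a return-value check), applied with sscanf() to a line that is already in memory. The function name parse_asciiz and the 33-byte buffer size are illustrative assumptions, and a single space in the format stands in for %*[ \t] so that whitespace before the quote is optional.

```c
#include <stdio.h>

/* Hypothetical helper: splits a line such as
 *     someword: .asciiz "want this as a char*"
 * into label, directive and quoted text.
 * Each output buffer must hold at least 33 bytes (32 characters + '\0').
 * Returns 1 on success, 0 if the line does not match. */
int parse_asciiz(const char *line, char *label, char *directive, char *text)
{
    /* %32s and %32[^"] stop after 32 characters, so the buffers cannot
     * overflow. The space before \" skips any whitespace that precedes
     * the opening quote. */
    int matched = sscanf(line, "%32s%32s \"%32[^\"]\"", label, directive, text);

    /* All three fields must have been assigned. */
    return matched == 3;
}
```

Called with the line from the question, this should leave the quoted phrase in text; a line without a quoted part makes it return 0. Note that an empty pair of quotes also counts as a mismatch, because %[ needs at least one character.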

You can use scanf ("%s%s%31[^\n]",s1,s2,s3);
Example:
#include <stdio.h>
int main()
{
char s1[32],s2[32],s3[32];
printf ("write something: ");
scanf ("%s%s%31[^\n]",s1,s2,s3);
printf ("%s %s %s",s1,s2,s3);
return 0;
}
s1 and s2 stop at whitespace, but s3 keeps everything up to the end of the line, including the spaces and the quotes.

Use \"%32[^\"]\" to capture the quoted phrase. Use "%n" to detect success.
char w1[32+1];
char w2[32+1];
char w3[32+1];
int n = 0;
sscanf(buffer, "%32s%32s \"%32[^\"]\" %n", w1, w2, w3, &n);
if (n == 0) return fail; // format mis-match
if (buffer[n]) return fail; // Extra garbage detected
// else good to go.
"%32s" Skip white-space,then read & save up to 32 non-white-space char. Append '\0'.
" " Skip white space.
"\"" Match a '\"'.
"%32[^\"]" Read and save up to 32 non-'\"' char. Append '\0'.
"%n" Save the count of characters scanned.

Related

Getting particular strings in scanf

I was wondering if it is possible to only read in particular parts of a string using scanf.
For example, since I am reading from a file, I use fscanf.
Suppose I want to read name and number (where number is the 111-2222) from a string such as:
Bob Hardy:sometext:111-2222:sometext:sometext
I use this, but it's not working:
(fscanf(read, "%23[^:] %27[^:] %10[^:] %27[^:] %d\n", name,var1, number, var2, var3))
Your initial format string fails because it does not consume the : delimiters.
If you want scanf() to read a portion of the input, but you don't care what is actually read, then you should use a field descriptor with the assignment-suppression flag (*):
char nl;
fscanf(read, "%23[^:]:%*[^:]:%10[^:]%*[^\n]%c", name, number, &nl);
As a bonus, you don't need to worry about buffer overruns for fields with assignment suppressed.
You should not attempt to match a single newline via a trailing newline character in the format, because a literal newline (or space or tab) in the format will match any run of whitespace. In this particular case, it would consume not just the line terminator but also any leading whitespace on the next line.
The last field is not suppressed, even though it will almost always receive a newline, because that way you can tell from the return value if you've scanned the last line of the file and it is not newline-terminated.
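As a rough illustration of assignment suppression, the sketch below applies the same idea with sscanf() to a line that has already been read. The function name parse_name_number is hypothetical, and the buffer sizes are assumed from the field widths in the question.

```c
#include <stdio.h>

/* Hypothetical helper: pulls the first and third colon-separated fields
 * out of a line such as "Bob Hardy:sometext:111-2222:sometext:sometext".
 * name must hold at least 24 bytes and number at least 11 bytes.
 * Returns 1 on success, 0 otherwise. */
int parse_name_number(const char *line, char *name, char *number)
{
    /* %*[^:] reads the second field and discards it; the literal ':'
     * characters consume the delimiters between the fields. */
    return sscanf(line, "%23[^:]:%*[^:]:%10[^:]", name, number) == 2;
}
```

Because the suppressed field stores nothing, it needs no buffer and no width. One limitation of this sketch: an empty second field (two colons in a row) is treated as a mismatch, since %[ must match at least one character.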
Check fscanf() return value.
fscanf(read, "%23[^:] %27[^:] ... fails because, after scanning the first field with %23[^:], fscanf() encounters a ':'. The space in the format matches zero or more white-space characters, so it consumes nothing; the following %27[^:] then needs at least one character other than ':' and finds none, so scanning stops.
Had the code checked the return value of fscanf(), which was certainly 1, the source of the problem might have been evident. The scan needs to consume the ':', so add it to the format: "%23[^:]: %27[^:]: ...
Better to use fgets()
Using fscanf() to read data and to tell properly formatted data from improperly formatted data is very challenging. It can be made to scan expected input correctly, yet it rarely copes well with incorrectly formatted input.
Instead, simply read a line of data and then parse it. Using '%n' is an easy way to detect a complete conversion, as it saves the count of characters scanned, if scanning gets that far.
char buffer[200];
if (fgets(buffer, sizeof buffer, read) == NULL) {
return EOF;
}
int n = 0;
sscanf(buffer, " %23[^:]: %27[^:]: %10[^:]: %27[^:]:%d %n",
name, var1, number, var2, &var3, &n);
if (n == 0) {
return FAIL; // scan incomplete
}
if (buffer[n]) {
return FAIL; // Extra data on line
}
// Success!
Note: sample input ended with text, but original format used "%d". Unclear on OP's intent.
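A hedged variant of the %n check, assuming the last field really is text as in the sample input rather than a number. The function name parse_record is hypothetical; only name and number are stored, and the other three fields are read and discarded.

```c
#include <stdio.h>

/* Hypothetical helper: accepts a line only if it consists of exactly five
 * colon-separated fields, such as
 *     "Bob Hardy:sometext:111-2222:sometext:sometext\n".
 * Assumption: the last field is text, as in the sample input.
 * name must hold at least 24 bytes and number at least 11 bytes.
 * Returns 1 on success, 0 otherwise. */
int parse_record(const char *line, char *name, char *number)
{
    int n = 0;

    /* Fields 2, 4 and 5 are suppressed with '*'. The space before %n
     * swallows the trailing newline, and %n records how far the scan got. */
    sscanf(line, " %23[^:]:%*[^:]:%10[^:]:%*[^:]:%*[^:\n] %n",
           name, number, &n);

    if (n == 0)
        return 0;   /* scan stopped before reaching %n */
    if (line[n] != '\0')
        return 0;   /* extra data after the fifth field */
    return 1;
}
```

A line with too few fields never reaches %n, so n stays 0; a line with a sixth field reaches %n but leaves unread characters, so the second check rejects it.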

Taking formatted input using fscanf()

I am trying to read non-numeric words from a text file, which can be separated by a comma, dot, colon or quotes, or some combination of these like ". The code I have so far reads non-numeric words
correctly, but leaves the delimiters attached. Am I using fscanf() right?
int ReadWords(FILE* fp, char *words[])
{
int i=0;
char temp[50],tmp[50]; // assuming the words cannot be too long
while (fscanf(fp,"%s%*[,.\":]",temp)==1) //ignore punctuation
{
if (isNumeric(temp))
continue;
printf("%s\n",temp);
words[i] = strdup(temp);
i++;
}
fclose(fp);
// The result of this function is the number of words in the file
return i;
}
I am getting output like
emergency,"
"an
unknown
car
entered,
I need like
emergency
an
unknown
car
entered
The %s format scans "words", i.e. chunks of contiguous non-space. This includes punctuation.
You want to scan non-numeric words, i.e. alphabetic characters only. You could use the %[...] format as you already do for punctuation, for these characters:
while (fscanf(fp, "%49[a-zA-Z]%*[^a-zA-Z]", temp) == 1) ...
Things to note:
The minus sign defines ranges of characters in brackets unless it's the first or last character, so %[a-zA-Z] scans unaccented Latin letters.
I've added a maximum word length of 49 in the format so that you don't overflow the char buffer.
I treat anything except letters as punctuation. That's a simple assumption, but it divides your input neatly into letter/punctuation sequences. A caret ^ as the first character inside the brackets negates the set, so %*[^a-zA-Z] skips everything that is not a letter.
You should probably do a (possibly empty) scan of punctuation first, so that the real scanning starts with a letter.
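Putting the notes above together, a reading loop might look roughly like the sketch below. The function name read_words and the max_words parameter are illustrative assumptions, and malloc plus strcpy stands in for the strdup call in the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores up to max_words letter-only words from fp
 * in words[] (each one allocated with malloc) and returns how many
 * were stored. */
size_t read_words(FILE *fp, char *words[], size_t max_words)
{
    char temp[50];
    size_t count = 0;

    /* Skip any leading non-letters. Matching nothing is fine; only an
     * empty stream (EOF) ends the function early. */
    if (fscanf(fp, "%*[^a-zA-Z]") == EOF)
        return 0;

    /* Read one word of at most 49 letters, then discard the run of
     * non-letters that follows it. */
    while (count < max_words &&
           fscanf(fp, "%49[a-zA-Z]%*[^a-zA-Z]", temp) == 1) {
        char *copy = malloc(strlen(temp) + 1);
        if (copy == NULL)
            break;              /* out of memory: stop reading */
        strcpy(copy, temp);
        words[count++] = copy;
    }
    return count;
}
```

Digits count as separators here, so numeric tokens are skipped without a separate isNumeric test. A word longer than 49 letters would be split across two entries.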
The reason is that %s consumes the punctuation together with the word, before the ignore clause gets a chance to run.
Try: "%[^,.\":]%*[,.\":]"

Using fscanf to get 2 string in 2 columns

I'm trying to make my program read a file with 2 columns. The first column contains some strings, and I can't get them stored into an array.
Here is my code:
fp2 = fopen("Symbol Table.txt", "r");
char str[100];
while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) {
puts(stemp[scnt++]);getch(); //This is just here to display contents of second col
}
fclose(fp2);
and here is my txt file:
void void
main Main
( Left Parenthesis
) Right Parenthesis
{ Left Brace
S Identifier
: Colon
$% Start of Block Comment
This program is a simple calculatorFuctions:ADD,SUB,MULT,DIV String
%$ End of Block Comment
Unsigned Noise Words
int Integer
The long string is split up before it is stored in the array.
When you read strings with %s in scanf or fscanf, each string ends at the first space, tab, or newline. In this case, on the first pass str and stemp are assigned "void" and "void"; on the second pass, "main" and "Main"; on the third, "(" and "Left"; and so on.
Because a column can contain several words, you need to separate the two columns with a special character such as a tab (\t), so that the reader knows where the first column ends. Then read with getc instead of scanf until you reach that character, and combine the characters you read into a string. You can read the second column the same way, calling getc until you encounter the newline (\n).
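Assuming the file is rewritten so that exactly one tab separates the two columns, the same idea can be sketched with scan sets on a line that has already been read, in place of a getc loop. The function name split_tab_columns and the 100-byte buffers are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: splits "first column<TAB>second column" into two
 * strings. Each output buffer must hold at least 100 bytes.
 * Returns 1 if both columns were found, 0 otherwise. */
int split_tab_columns(const char *line, char *col1, char *col2)
{
    /* %99[^\t]  reads the first column up to the tab,
     * %*1[\t]   consumes exactly one tab and stores nothing,
     * %99[^\n]  reads the second column up to the end of the line. */
    return sscanf(line, "%99[^\t]%*1[\t]%99[^\n]", col1, col2) == 2;
}
```

With a line such as a left parenthesis, a tab, then Left Parenthesis, the second buffer keeps the embedded space because only the tab and the newline act as delimiters. A line without a tab yields 0.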
while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) {
is not going to work
"%s" scans for non- white-space text. "This program is" and "Left Parenthesis" have embedded white-space.
Need to read a line-at-a-time and then parse the line into 2 strings for the 2 columns.
char buf[100];
if (fgets(buf, sizeof buf, fp2) == NULL) Handle_EOForIOError();
char column[2][100];
size_t len = strlen(buf);
if (len < 62) Handle_ShortLine();
buf[61] = '\0'; // cut the line in 2
column[0][0] = '\0'; // In case column is all white-space
sscanf(buf, " %[^\n]", column[0]);
column[1][0] = '\0';
sscanf(&buf[62], " %[^\n]", column[1]);
Other answers contain statements which are not true. Actually,
fscanf does not always read strings only until a white-space character. The conversion specifier [ also reads a string, and includes the set of expected characters.
You do not need to separate the two columns with a special character, since the columns are of known width 62.
You do not need to read a line at a time and then parse it; though, that might be easier.
So, if you prefer fscanf, you can use
while (fscanf(fp2, "%62[^\n]%[^\n]\n", str, stemp[scnt]) == 2)
(Comparing with NULL is wrong, because NULL is a pointer constant and fscanf doesn't return a pointer but the number of input items assigned or EOF.)
A version without fscanf:
while (fgets(str, 62+1, fp2) && fgets(stemp[scnt], 62+1, fp2) && getc(fp2))
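(The getc reads the \n. This works only if the second column is always exactly 62 characters wide; if it is shorter, the second fgets has already consumed the newline and getc eats the first character of the next line.)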
By its nature, while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) reads only two strings separated by whitespace. If your text file contains long strings like this
This program is a simple calculatorFuctions:ADD,SUB,MULT,DIV
it will be split at every space as it is fetched, one %s at a time. So change your text file in the following way and try again:
void void
main Main
( LeftParenthesis
) RightParenthesis
{ LeftBrace
S Identifier
: Colon
$% StartofBlockComment
ThisProgramIsASimpleCalculatorFuctions:ADD,SUB,MULT,DIV String
%$ EndofBlockComment
Unsigned NoiseWords
int Integer

sscanf not extracting pattern

I am trying to figure out the pattern I should be giving to sscanf.
I have a string abcde(1GB). I want to extract 1 and GB. I am using
char list[]= "abcde(1GB)";
int memory_size =0;
char unit[3]={0} ;
sscanf(list, "%*s%d%s" , &memory_size, unit);
I do not see the tokens extracted: when I print, I see memory_size = 0 and NULL in unit.
Thanks
Your sscanf() format string should be:
sscanf(list, "%*[^(](%d%[^)]" , &memory_size, unit);
%[^)] means: read characters and stop at the character ) or at the end of the string.
%*[^(] means:
[^(] reads characters and stops at the character ( (as opposed to the more conventional %s, which reads characters and stops at whitespace).
* means "read but do not store".
The %s conversion specifier in the format string of scanf means that scanf will read a sequence of non-whitespace characters. The * in %*s is called the assignment suppression character. It means that scanf will read and discard the non-whitespace characters.
Since %s matches any sequence of non-whitespace characters, %*s in "%*s%d%s" means that sscanf will read and discard all the characters in the string list. Therefore, there's nothing left in the array list to be read and assigned to the arguments &memory_size and unit. This explains why memory_size and unit are unchanged. What you need is the format string
sscanf(list, "%*[^(](%d%2[^)]%*s", &memory_size, unit);
Here, in the format string "%*[^(](%d%2[^)]%*s" -
%*[^(] means that sscanf will first read and discard a sequence of characters not containing the character (.
( means that sscanf will read and discard (.
%d means that sscanf will read a decimal integer.
%2[^)] means that sscanf will read a sequence of at most 2 characters not containing ) and store them in the corresponding argument. This ensures that sscanf does not overrun the buffer unit if the string to be stored is too long. The width is one less than the size of unit, to leave room for the terminating null byte that sscanf automatically adds at the end of the buffer.
%*s means sscanf will read and discard any sequence of leftover non-whitespace characters.
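The pieces described above can be wrapped in a small function, sketched here with a return-value check added. The name parse_size is hypothetical, and the trailing %*s is left out because nothing needs to be read after the unit.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the number and the unit from a string
 * such as "abcde(1GB)". unit must hold at least 3 bytes.
 * Returns 1 on success, 0 otherwise. */
int parse_size(const char *text, int *amount, char *unit)
{
    /* Discard everything before '(', match '(', read the integer, then
     * read at most 2 characters that are not ')'. Both assignments must
     * succeed for the result to be usable. */
    return sscanf(text, "%*[^(](%d%2[^)]", amount, unit) == 2;
}
```

For "abcde(1GB)" this should store 1 in the integer and GB in the unit buffer. A string that begins directly with ( would be rejected by this sketch, because %*[^(] has to match at least one character.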
Please go through the sscanf() function description.
As MOHAMED said, %[^(] reads characters up to "(".
So %*[^(] discards the string up to (, then %d reads the integer, and %[^)] captures the string up to ).
Supplemental, after the accepted answer
#MOHAMED's answer is good, but below are candidate improvements.
1) Always check the sscanf() result to ensure the data was scanned as intended.
if (sscanf(list, "%*[^(](%d%[^)]", &memory_size, unit) != 2) ScanFailure();
2) When using the "%s" or "%[]" specifiers, include a limiting length. #ajay
char unit[3]={0} ;
// v--- 1 less than buffer size.
if (sscanf(list, "%*[^(](%d%2[^)]" , &memory_size, unit) != 2) ScanFailure();
3) Appending a "%n" (save scan position) is a sure-fire means of detecting that the trailing ) was there and that there was no extra junk at the end of the input string.
"%n" does not add to the sscanf() result.
int n = 0;
// )%n <--- Look for ) then save scan position
if (sscanf(list, "%*[^(](%d%2[^)])%n", &memory_size, unit, &n) != 2 ||
list[n] != '\0') ScanFailure();
4) Making room for optional white-space may/may not be useful. Specifiers already allow optional leading white-space. 3 exceptions: %c %[] %n
"%*[^(](%d%2[^)])%n"
// v v v
" %*[^(](%d %2[^)]) %n"

Sscanf delimiters for parsing?

I am trying to parse the following string with sscanf:
query=testword&diskimg=simple.img
How can I use sscanf to parse out "testword" and "simple.img"? The delimiter arguments for sscanf really confuse me :/
Thank you!
If you know that the length of "testword" will always be 8 characters, you can do it like this:
char str[] = "query=testword&diskimg=simple.img";
char buf1[100];
char buf2[100];
sscanf(str, "query=%8s&diskimg=%s", buf1, buf2);
buf1 will now contain "testword" and buf2 will contain "simple.img".
Alternatively, if you know that testword will always be preceded by = and followed by &, and that simple.img will always be preceded by =, you can use this:
sscanf(str, "%*[^=]%*c%[^&]%*[^=]%*c%s", buf1, buf2);
It's pretty cryptic, so here's the summary: each % designates the start of a chunk of text. If there's a * following the %, that means that we ignore that chunk and don't store it in one of our buffers. The ^ within the brackets means that this chunk contains any number of characters that are not the characters within the brackets (excepting ^ itself). %s reads a run of non-whitespace characters of arbitrary length, and %c reads a single character.
So to sum up:
We keep reading and ignoring characters if they are not =.
We read and ignore another character (the equal sign).
Now we're at testword, so we keep reading and storing characters into buf1 until we encounter the & character.
More characters to read and ignore; we keep going until we hit = again.
We read and ignore a single character (again, the equal sign).
Finally, we store what's left ("simple.img") into buf2.
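The steps above can be sketched as a function with bounded widths and a return-value check. The name parse_query is hypothetical, the 100-byte buffers follow the declarations earlier in the answer, and literal = and & characters are used in place of %*c so that a missing delimiter is reported as a mismatch.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the two values from a string such as
 * "query=testword&diskimg=simple.img".
 * Each output buffer must hold at least 100 bytes.
 * Returns 1 if both values were found, 0 otherwise. */
int parse_query(const char *str, char *value1, char *value2)
{
    /* Skip the first key, match '=', read up to '&', match '&',
     * skip the second key, match '=', then read the remaining token. */
    return sscanf(str, "%*[^=]=%99[^&]&%*[^=]=%99s", value1, value2) == 2;
}
```

The widths of 99 keep either value from overrunning its buffer, which the unbounded %[^&] and %s in the one-line version do not guarantee.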
