How to read string until two consecutive spaces? - c

A well-known feature of the scanf() family of functions is that you can pass a format string to scan input according to that format.
In my case, I cannot find a solution despite searching the documentation.
I have a string (sInput) like the following:
#something VAR1 this is a constant string //some comment
Here VAR1 is meant to be the name of the constant string "this is a constant string".
Now I am scanning this string like this:
if(sscanf(sInput, "%*s %s %s", paramname, constantvalue) != 2)
//do something
And of course, when I output paramname and constantvalue I get:
VAR1
this
But I would like constantvalue to contain the string up to the first two consecutive spaces (so it would contain "this is a constant string").
So I tried:
sscanf(sInput, "%*s %s %[^( )]s", paramname, constantvalue)
sscanf(sInput, "%*s %s %[^ ]s", paramname, constantvalue)
But without luck. Is there a way to achieve my goal with sscanf()? Or should I implement another way of storing the string?

The scanf family of functions is good for simple parsing, but not for more complicated things like what you are trying to do.
You could probably solve it by using e.g. strstr to find the comment starter "//", terminate the string there, and then remove trailing space.

Related

Extract formatted input from user

I have some user input following this format:
Playa Raco#path#5#39.244|-0.257#0-23
The # here acts as a separator, and the | is also a separator for the latitude and longitude. I would like to extract this information. Note that the strings could have spaces.
I tried using the %[^\n]%*c format with scanf, adding # and |, but it doesn't work because it matches the whole line.
I would like to keep this as simple as possible. I know that I could do this by reading each char, but I'm curious to see best practices and to check whether there is a scanf or similar alternative for this.
As mentioned in the comments, there are many ways you can parse the information from the string. You can walk a pair of pointers down the string, testing each character and taking the appropriate action. You can use strtok(), but note that strtok() modifies the original string, so it cannot be used on a string literal. You can use sscanf() to parse the values from the string, or you can use any combination of strcspn(), strspn(), strchr(), etc. and then manually copy each field between a start and end pointer.
However, your question also adds the requirement "I would like to keep this as simple as possible...", and that points directly to sscanf(). You simply need to validate the return value and you are done. For example, you could do:
#include <stdio.h>
#define MAXC 16 /* adjust as necessary */
int main (void) {
    const char *str = "Playa Raco#path#5#39.244|-0.257#0-23";
    char name[MAXC], path[MAXC], last[MAXC];
    int num;
    double lat, lon;
    if (sscanf (str, "%15[^#]#%15[^#]#%d#%lf|%lf#%15[^\n]",
                name, path, &num, &lat, &lon, last) == 6) {
        printf ("name : %s\npath : %s\nnum : %d\n"
                "lat : %f\nlon : %f\nlast : %s\n",
                name, path, num, lat, lon, last);
    }
    else
        fputs ("error: parsing values from str.\n", stderr);
}
(note: the %[..] conversion does not consume leading whitespace, so if there is a possibility of leading whitespace or a space following '#' before a string conversion, include a space in the format string, e.g. " %15[^#]# %15[^#]#%d#%lf|%lf# %15[^\n]")
Here each string portion of the input is stored in a 16-character array. Looking at the format string, you will note that the read of each string is limited to 15 characters (plus the nul-terminating character) to ensure you do not attempt to store more characters than your arrays can hold (that would invoke Undefined Behavior). Since six conversions are requested, you validate the conversion by ensuring the return value is 6.
Example Use/Output
Taking this approach, the output would be:
./bin/parse_sscanf
name : Playa Raco
path : path
num : 5
lat : 39.244000
lon : -0.257000
last : 0-23
No one way is necessarily "better" than another so long as you validate the conversions and protect the array bounds for any character arrays filled. However, as far as simple as possible goes, it's hard to beat sscanf() here -- and it doesn't modify your original string, so it is safe to use with string-literals.

How to combine multiple arrays as a string

I have multiple different pieces of data that I am trying to combine into a single array, as though I were writing a string to it, and I am trying to find a way of doing so. My current attempt is shown below, but of course it doesn't work. I was hoping someone could point me in the right direction.
newline = "%s %s\t%d\t%d %d %d \t%.2f\n",
arr_student[printing].fname, arr_student[printing].sname, arr_student[printing].UP_no, arr_student[printing].marks_1,
arr_student[printing].marks_2, arr_student[printing].marks_3, arr_student[printing].average_mark;
If you're trying to create a string with that information, you should use the sprintf function which will generate a string according to your format string and format parameters:
Edit: As pointed out by @PeteKirkham, you should use the snprintf function instead. It takes the size of the output buffer (including room for the terminating null byte) and never writes more than that many bytes.
char newline[100]; // or however many characters you want to allocate for
snprintf(newline, sizeof newline, "%s %s\t%d\t%d %d %d \t%.2f\n",
arr_student[printing].fname, arr_student[printing].sname, arr_student[printing].UP_no, arr_student[printing].marks_1,
arr_student[printing].marks_2, arr_student[printing].marks_3, arr_student[printing].average_mark); // sizeof newline keeps the limit in sync with the array size

Using sscanf to read strings with white spaces in C

I am trying to use sscanf to scan a string of text and store the values into an array. When it comes to storing the last string, it stops scanning at the first white space. For example, with the string below it would only store the word "STRING". I have tried using %[^ \t\n] and various other specifiers, but it seems I am missing something.
I just can't get the function to include white space; I'm sure it's probably something simple.
string test = "9999:STRING OF TEXT";
scan = sscanf(test, "%d:%s", rec[i].ref, rec[i].string);
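You should have posted a minimal working example.
However, the issue is that %s stops reading at the first white-space character: it matches only a sequence of non-white-space characters (like %d and %f, it does skip leading white space first). Use something like sscanf(test, "%d:%[^\n]", &rec[i].ref, rec[i].string); to capture whatever is after the : (assuming ref is an int, %d needs its address).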
Look here for details: http://www.cplusplus.com/reference/cstdio/sscanf/
So this does not work?
sscanf(test, "%d:%[^\t\n]", &rec[i].ref, rec[i].string);
Check out this answer: reading a string with spaces with sscanf
Basically, this will match the number, followed by anything that is not in the brackets (tab, newline); note the ^ symbol.

fscanf problem with reading in String

I'm reading in a .txt file. I'm using fscanf to get the data as it is formatted.
The line I'm having problems with is this:
result = fscanf(fp, "%s", ap->name);
This is fine until I have a name with white space, e.g. St Ives.
So I use this to read in the white space:
result = fscanf(fp, "%[^\n]s", ap->name);
However, when I try to read in the first name (with no white space) it just doesn't work and messes up the other fscanf.
But when I use [^\n] with a different file, it works fine. I'm not sure what is happening.
If I use fgets in the place of the fscanf above I get "\n" in the variable.
Edit:
Ok, so if I use:
result = fscanf(fp, "%s", ap->name);
result = fscanf(fp, "%[^\n]s", ap->name);
This allows me to read in a string with no white space, but when I get a "name" with white space, it doesn't work.
One problem with this:
result = fscanf(fp, "%[^\n]s", ap->name);
is that you have an extra s at the end of your format specifier. The entire format specifier should just be %[^\n], which says "read in a string which consists of characters which are not newlines". The extra s is not part of the format specifier, so it's interpreted as a literal: "read the next character from the input; if it's an "s", continue, otherwise fail."
The extra s doesn't actually hurt you, though. You know exactly what the next character of input is: a newline. It doesn't match, and input processing stops there, but that doesn't really matter since it's the end of your format string. It would cause problems, though, if you had other format specifiers after this one in the same format string.
The real problem is that you're not consuming the newline: you're only reading in all of the characters up to the newline, but not the newline itself. To fix that, you should do this:
result = fscanf(fp, "%[^\n]%*c", ap->name);
The %*c specifier says to read in a character (c), but don't assign it to any variable (*). If you omitted the *, you would have to pass fscanf() another parameter containing a pointer to a character (a char*), where it would then store the resulting character that it read in.
You could also use %[^\n]\n, but that would also read in any whitespace which followed the newline, which may not be what you want. When fscanf finds whitespace in its format specifier (a space, newline, or tab), it consumes as much whitespace as it can (i.e. you can think of it consuming the longest string that matches the regular expression [ \t\n]*).
Finally, you should also specify a maximum field width to avoid buffer overruns. You do this by placing the width between the % and the [. It must be at most the buffer length minus one, to leave room for the terminating null character. For example, if ap->name is a buffer of 256 characters, you should do this:
result = fscanf(fp, "%255[^\n]%*c", ap->name);
This works great for statically sized arrays. Unfortunately, if the array is dynamically sized at run time, there's no easy way to pass the buffer size to fscanf: unlike printf, a * in a scanf conversion means "suppress assignment", not "take the width from an argument". You'll have to build the format string with snprintf, e.g.:
char format[256];
snprintf(format, sizeof(format), "%%%d[^\n]%%*c", (int)(buffer_size - 1)); /* cast so the argument matches %d */
result = fscanf(fp, format, ap->name);
Jumm wrote:
If I use fgets in the place of the fscanf above I get "\n" in the variable.
Which is a far easier problem to solve so go with it:
if( fgets( ap->name, MAX, fp ) != NULL )
{
    char *nlptr = strrchr( ap->name, '\n' ) ;  /* needs <string.h> */
    if( nlptr != NULL )
    {
        *nlptr = '\0' ;  /* strip the newline that fgets() keeps */
    }
}
Note how [^\n] works: [] introduces a scanset, which says "accept a sequence of characters from the set inside this block", and the ^ inverts the set, so [^\n] accepts everything except a newline. %s with fscanf only reads until it comes across white space. For strings with spaces and newlines in them, use a combination of fgets and sscanf instead, and specify a restriction on the length.
I gather you are trying to use a regular expression in the fscanf function; to my knowledge no such feature exists (a scanset is the closest thing), so enlighten me on this.
The format specifier for reading a string is %s; it could be that you need to use %s\n, which will also consume the newline (and any other white space that follows it).
But for Pete's sake, do not use the old standard gets() family of functions, as in Clifford's answer above: that is where buffer overflows happen. gets() was exploited by the infamous Morris Worm of 1988; more specifically, the fingerd daemon called gets(), and that caused chaos. Fortunately, that has long since been patched, and a lot of programmers have been drilled into not using the function.
Even Microsoft has adopted a safer version, gets_s(), which takes a parameter giving the length of the buffer.
EDIT
My bad - I did not realize that Clifford indeed has specified the max length for input...Whoops! Sorry! Clifford's answer is correct! So +1 to Clifford's answer.
Thanks Neil for pointing out my error...
Hope this helps,
Best regards,
Tom.
I found the problem.
As Paul Tomblin said, I had an extra new line character in the field above. So using what tommieb75 said I used:
result = fscanf(fp, "%s\n", ap->code);
result = fscanf(fp, "%[^\n]s", ap->name);
And this fixed it!
Thanks for your help.

Can scanf identify a format character within a string?

Let's say that I expect a list of items from standard input, separated by commas, like this:
item1, item2, item3,...,itemn
and I also want to allow the user to omit the white space between items and commas, so this kind of input is also legal in my program:
item1,item2,item3,...,itemn
If I use scanf like this:
scanf("%s,%s,%s,%s,...,%s", s1, s2, s3, s4,...,sn);
it fails when there is no white space (I tested it), because %s treats the whole input as one string. So how can I solve this problem using only C standard library functions?
The quick answer is: never, ever use scanf to read user input. It is intended for reading strictly formatted input from files, and even then it isn't much good. At the least, you should be reading entire lines and then parsing them with sscanf(), which gives you some chance to correct errors. At best, you should be writing your own parsing functions.
If you are actually using C++, investigate the C++ string and stream classes, which are much more powerful and safe.
You could have a look at strtok. First read the line into a buffer, then tokenize:
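#include <stdio.h>
#include <string.h>
#define BUFFERSIZE 32768
int main(void)
{
    static char buffer[BUFFERSIZE];  /* static: keeps 32 KB off the stack */
    const char* delimiters = " ,\n";
    if (fgets(buffer, sizeof(buffer), stdin) == NULL)
        return 1;
    char* p = strtok(buffer, delimiters);
    while (p != NULL)
    {
        printf("%s\n", p);  /* print each token on its own line */
        p = strtok(NULL, delimiters);
    }
    return 0;
}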
However, with strtok you'll need to be aware of the potential issues related to reentrance.
I guess it is better to write your own parsing function for this. But if you still prefer scanf despite its pitfalls, there is a workaround: just substitute %s with %[^, \t\r\n].
The problem is that %s matches a sequence of non-white-space characters, so it swallows the comma too. If you replace %s with %[^, \t\r\n], it will work almost the same way. The difference is that %s uses isspace(3) to decide which characters are white space, while here you list them explicitly, and the lists differ: in the C locale, isspace() also accepts \v and \f.
Please note that if you want to allow spaces before and after the comma, you must add white space to your format string; a white-space directive matches zero or more white-space characters. The format string "%[^, \t\r\n] , %[^, \t\r\n]" matches strings like "hello,world", "hello, world", and "hello , world".
