How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
The array name can be 31 characters at most and the index 3 at most.
I found the following example which suppose to work with "Matrix[index1][index2]". I really couldn't understand how it does it in order to take apart the part I need to get my strings.
sscanf(inputString, "%32[^[]%*[[]%3[^]]%*[^[]%*[[]%3[^]]", matrixName, index1,index2) == 3
This try over here wasn't a success, what am I missing?
sscanf(inputString, "%32[^[]%*[[]%3[^]]", arrayName, index) == 2
How do I split a string into two strings (array name, index number) only if the string is matching the following string structure: "ArrayName[index]".
With sscanf, you don't. Not if you mean that you can rely on nothing being modified in the event that the input does not match the pattern. This is because sscanf, like the rest of the scanf family, processes its input and format linearly, without backtracking, and by design it fills input fields as they are successfully matched. Thus, if you scan with a format that assigns multiple fields or has trailing literal characters then it is possible for results to be stored for some fields despite a matching failure occurring.
But if that's ok with you then #gsamaras's answer provides a nearly-correct approach to parsing and validating a string according to your specified format, using sscanf. That answer also presents a nice explanation of the meaning of the format string. The problem with it is that it provides no way to distinguish between the input fully matching the format and the input failing to match at the final ], or including additional characters after.
Here is a variation on that code that accounts for those tail-end issues, too:
char array_name[32] = {0}, idx[4] = {0}, c = 0;
int n;
if (sscanf(str, "%31[^[][%3[^]]%c%n", array_name, idx, &c, &n) >= 3
&& c == ']' && str[n] == '\0')
printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
printf("Not in the expected format \"ArrayName[idx]\"\n");
The difference in the format is the replacement of the literal terminating ] with a %c directive, which matches any one character, and the addition of a %n directive, which causes the number of characters of input read so far to be stored, without itself consuming any input.
With that, if the return value is at least 3 then we know that the whole format was matched (a %n never produces a matching failure, but docs are unclear and behavior is inconsistent on whether it contributes to the returned field count). In that event, we examine variable c to determine whether there was a closing ] where we expected to find one, and we use the character count recorded in n to verify that all characters of the string were parsed (so that str[n] refers to a string terminator).
You may at this point be wondering at how complicated and cryptic that all is. And you would be right to do so. Parsing structured input is a complicated and tricky proposition, for one thing, but also the scanf family functions are pretty difficult to use. You would be better off with a regex matcher for cases like yours, or maybe with a machine-generated lexical analyzer (see lex), possibly augmented by machine-generated parser (see yacc). Even a hand-written parser that works through the input string with string functions and character comparisons might be an improvement. It's still complicated any way around, but those tools can at least make it less cryptic.
Note: the above assumes that the index can be any string of up to three characters. If you meant that it must be numeric, perhaps specifically a decimal number, perhaps specifically non-negative, then the format can be adjusted to serve that purpose.
A naive example to get you started:
#include <stdio.h>
#include <string.h>
int main(void)
{
char str[] = "myArray[123]";
char array_name[32] = {0}, idx[4] = {0};
if(sscanf(str, "%31[^[][%3[^]]]", array_name, idx) == 2)
printf("arrayName = %s\nindex = %s\n", array_name, idx);
else
printf("Not in the expected format \"ArrayName[idx]\"\n");
return 0;
}
Output:
arrayName = myArray
index = 123
which will find easy not-in-the-expected format cases, such as "ArrayNameidx]" and "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOP[idx]", but not "ArrayName[idx".
The essence of sscanf() is to tell it where to stop, otherwise %s would read until the next whitespace.
This negated scanset %[^[] means read until you find an opening bracket.
This negated scanset %[^]] means read until you find a closing bracket.
Note: I used 31 and 3 as the width specifiers respectively, since we want to reserve the last slot for the NULL terminator, since the name of the array is assumed to be 31 characters at the most, and the index 3 at the most. The size of the array for its token is the max allowed length, plus one.
How can I use sscanf to analyze string data?
Use "%n" to detect a completed scan.
array name can be 31 characters at most and the index 3 at most.
For illustration, let us assume the index needs to limit to a numeric value [0 - 999].
Use string literal concatenation to present the format more clearly.
char name[32]; // array name can be 31 characters
#define NAME_FMT "%31[^[]"
char idx[4]; //
#define IDX_FMT "%3[0-9]"
int n = 0; // be sure to initialize
sscanf(str, NAME_FMT "[" IDX_FMT "]" "%n", array_name, idx, &n);
// Did scan complete (is `n` non-zero) with no extra text?
if (n && str[n] == '\0') {
printf("arrayName = %s\nindex = %d\n", array_name, atoi(idx));
} else {
printf("Not in the expected format \"ArrayName[idx]\"\n");
}
Related
I have some user input following this format:
Playa Raco#path#5#39.244|-0.257#0-23
The # here acts as a separator, and the | is also a separator for the latitude and longitude. I would like to extract this information. Note that the strings could have spaces.
I tried using the %[^\n]%*c formatter with scanf and adding # and |, but it doesn't work because it matches the whole line.
I would like to keep this as simple as possible, I know that I could do this reading each char, but I'm curious to see best practices and check if there is a scanf or similar alternative for this.
As mentioned in the comments, there are many ways you can parse the information from the string. You can walk a pair of pointers down the string, testing each character and taking the appropriate action, you can use strtok(), but note strtok() modifies the original string, so it cannot be used on a string-literal, you can use sscanf() to parse the values from the string, or you can use any combination of strcspn(), strspn(), strchr(), etc. and then manually copy each field between a start and end pointer.
However, your question also imposes "I would like to keep this as simple as possible..." and that points directly to sscanf(). You simply need to validate the return and you are done. For example, you could do:
#include <stdio.h>
#define MAXC 16 /* adjust as necessary */
int main (void) {
const char *str = "Playa Raco#path#5#39.244|-0.257#0-23";
char name[MAXC], path[MAXC], last[MAXC];
int num;
double lat, lon;
if (sscanf (str, "%15[^#]#%15[^#]#%d#%lf|%lf#%15[^\n]",
name, path, &num, &lat, &lon, last) == 6) {
printf ("name : %s\npath : %s\nnum : %d\n"
"lat : %f\nlon : %f\nlast : %s\n",
name, path, num, lat, lon, last);
}
else
fputs ("error: parsing values from str.\n", stderr);
}
(note: the %[..] conversion does not consume leading whitespace, so if there is a possibility of leading whitespace or a space following '#' before a string conversion, include a space in the format string, e.g. " %15[^#]# %15[^#]#%d#%lf|%lf# %15[^\n]")
Where each string portion of the input to be split is declared as a 16 character array. Looking at the format-string, you will note the read of each string is limited to 15 characters (plus the nul-terminating) character to ensure you do not attempt to store more characters than your arrays can hold. (that would invoke Undefined Behavior). Since there are six conversions requested, you validate the conversion by ensuring the return is 6.
Example Use/Output
Taking this approach, the output above would be:
./bin/parse_sscanf
name : Playa Raco
path : path
num : 5
lat : 39.244000
lon : -0.257000
last : 0-23
No one way is necessarily "better" than another so long as you validate the conversions and protect the array bounds for any character arrays filled. However, as far as simple as possible goes, it's hard to beat sscanf() here -- and it doesn't modify your original string, so it is safe to use with string-literals.
I have the following problem:
sscanf is not returning the way I want it to.
This is the sscanf:
sscanf(naru,
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]",
&jokeri, &paiva1, &keskilampo1, &minlampo1, &maxlampo1,
&paiva2, &keskilampo2, &minlampo2, &maxlampo2, &paiva3,
&keskilampo3, &minlampo3, &maxlampo3, &paiva4, &keskilampo4,
&minlampo4, &maxlampo4, &paiva5, &keskilampo5, &minlampo5,
&maxlampo5, &paiva6, &keskilampo6, &minlampo6, &maxlampo6,
&paiva7, &keskilampo7, &minlampo7, &maxlampo7);
The string it's scanning:
const char *str = "city;"
"2014-04-14;7.61;4.76;7.61;"
"2014-04-15;5.7;5.26;6.63;"
"2014-04-16;4.84;2.49;5.26;"
"2014-04-17;2.13;1.22;3.45;"
"2014-04-18;3;2.15;3.01;"
"2014-04-19;7.28;3.82;7.28;"
"2014-04-20;10.62;5.5;10.62;";
All of the variables are stored as char paiva1[22] etc; however, the sscanf isn't storing anything except the city correctly. I've been trying to stop each variable at ;.
Any help how to get it to store the dates etc correctly would be appreciated.
Or if there's a smarter way to do this, I'm open to suggestions.
There are multiple problems, but BLUEPIXY hit the first one — the scan-set notation doesn't follow %s.
Your first line of the format is:
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
As it stands, it looks for a space separated word, followed by a [, a ^, a ;, and a ] (which is self-contradictory; the character after the string is a space or end of string).
The first fixup would be to use scan-sets properly:
"%[^;]%[^;]%[^;]%[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
Now you have a problem that the first %[^;] scans everything up to the end of string or first semicolon, leaving nothing for the second %[;] to match.
"%[^;]; %[^;]; %[^;]; %[^;]; %f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
This looks for a string up to a semicolon, then for the semicolon, then optional white space, then repeats for three items. Apart from adding a length to limit the size of string, preventing overflow, these are fine. The %f is OK. The following material looks for an odd sequence of characters again.
However, when the data is looked at, it seems to consist of a city, and then seven sets of 'a date plus three numbers'.
You'd do better with an array of structures (if you've worked with those yet), or a set of 4 parallel arrays, and a loop:
char jokeri[30];
char paiva[7][30];
float keskilampo[7];
float minlampo[7];
float maxlampo[7];
int eoc; // End of conversion
int offset = 0;
char sep;
if (fscanf(str + offset, "%29[^;]%c%n", jokeri, &sep, &eoc) != 2 || sep != ';')
...report error...
offset += eoc;
for (int i = 0; i < 7; i++)
{
if (fscanf(str + offset, "%29[^;];%f;%f;%f%c%n", paiva[i],
&keskilampo[i], &minlampo[i], &maxlampo[i], &sep, &eoc) != 5 ||
sep != ';')
...report error...
offset += eoc;
}
See also How to use sscanf() in loops.
Now you have data that can be managed. The set of 29 separately named variables is a ghastly thought; the code using them will be horrid.
Note that the scan-set conversion specifications limit the string to a maximum length one shorter than the size of jokeri and the paiva array elements.
You might legitimately be wondering about why the code uses %c%n and &sep before &eoc. There is a reason, but it is subtle. Suppose that the sscanf() format string is:
"%29[^;];%f;%f;%f;%n"
Further, suppose there's a problem in the data that the semicolon after the third number is missing. The call to sscanf() will report that it made 4 successful conversions, but it doesn't count the %n as an assignment, so you can't tell that sscanf() didn't find a semicolon and therefore did not set &eoc at all; the value is left over from a previous call to sscanf(), or simply uninitialized. By using the %c to scan a value into sep, we get 5 returned on success, and we can be sure the %n was successful too. The code checks that the value in sep is in fact a semicolon and not something else.
You might want to consider a space before the semi-colons, and before the %c. They'll allow some other data strings to be converted that would not be matched otherwise. Spaces in a format string (outside a scan-set) indicate where optional white space may appear.
I would use strtok function to break your string into pieces using ; as a delimiter. Such a long format string may be a source of problems in future.
I want to compare my string to a giving format.
the format that I want to use in the check is :
"xxx://xxx:xxx#xxxxxx" // all the xxx are with variable length
so I used the sscanf() as follow :
if (sscanf(stin,"%*[^:]://%*[^:]:%*[^#]#") == 0) { ... }
is it correct to compare the return of scanf to 0 in this case?
You will only get zero back if all the fields match; but that won't tell you diddly-squat in practice. It might have failed with a colon in the first character and it would still return 0.
You need at least one conversion in there that is counted (%n is not counted), and that occurs at the end so you know that what went before also matched. You can never tell if trailing context (data after the last conversion specification) matched, and sscanf() won't backup if it has converted data, even if backing up would allow the trailing context to match.
For your scenario, that might be:
char c;
int n;
if (sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%n%c", &n, &c) == 1)
This requires at least one character after the #. It also tells you how many characters there were up to and including the #.
OP's suggestion is close.
#Jonathan Leffler is correct in that comparing the result of a specifier-less sscanf() against 0 does not distinguish between a match and no-match.
To test against "xxx://xxx:xxx#xxxxxx", (and assuming any part with "x" needs at least 1 matching), use
int n = 0;
sscanf(stin, "%*[^:]://%*[^:]:%*[^#]#%*c%n", &n);
if (n > 0) {
match();
}
There is a obscure hole using this method with fscanf(). A stream of data with a \0 is a problem.
I am reading input from file (line by line) Each line is a state of a game board. Below is example of input:
(8,7,1,0,0,0,b,b,b,b,b,b,b,b,b,b,b,b,s,s,r,r,g,b,r,g,r,r,r,r,b,r,r,s,b,b,b,b,r,s,s,r,b,b,r,s,s,s,r,b,g,b,r,r,r,r,r,r,r,r,r,s) 0
I have used fgets() and strtok() to split the string at (),
My problem:
I want the first 6 integers in their individual variables such as:
int column = 8
int row = 7
so on..
I want to get rid of the last integer at the end of input- 0
and the chars should be stored in an array, because they represent pieces of a board.
Right now, I have an array with all the integers and chars stored together.
I can iterate through my array, and copy the integers to their variables and then chars to a new array. But that's inefficient.
Is there another way to do it?
I used fscanf() but don't know how to split the string using delimiters.
Thanks
WELL-FORMED INPUT ONLY
if (fscanf(FILE_PTR, "(%d,%d,...,%c,%c,%c,...,%c) %*d", &column, &row, ..., &chars[0], &chars[1], ...) == 60)
or something like that
the %*d specifier will discard that input (you didn't want the last number)
for the chars, give pointers to their indices for a preallocated array
for the ints, give the variable pointer/ref
Thank you to Jon Leffler for reminding that you should test the output of *scanf (number of things read)!
More information
REEDIT nope, it was right -
int fscanf ( FILE * stream, const char * format, ... );
format: C string that contains a sequence of characters that control how characters extracted from the stream are treated:
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or tab) or part of a format specifier (which begin with a % character) causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a format specifier, which is used to specify the type and format of the data to be retrieved from the stream and stored into the locations pointed by the additional arguments.
Above quote from here. I am aware of the hostility towards cplusplus.com here but I do not have access to the standard. please feel free to edit if you do
I have used fgets() and strtok() to split the string at "()"
later
I used fscanf() but don't know how to split the string using delimiters.
I guess if strtok() worked for parenthesis, it would work for commas too.
Apart from that: you have several possibilities for doing what you want. Without much context provided, I can't really tell you which one you want, but here we go:
Grab a pointer to the first non-integer, and use it as if it was a pointer to the first element of another array, containing the integers only. This avoids all copying and/or moving overhead.
Use memcpy() to copy only the necessary parts of the array to another array. memcpy() is generally highly optimized and faster than the naive for-loop-with-assignment approach.
if you have a char * you can think of it as an array or as a string, as the memory layout is the same...
char * input = "(8,7,1,0,0,0,b,b,b,b,b,b,b,b,b,b,b,b,s,s,r,r,g,b,r,g,r,r,r,r,b,r,r,s,b,b,b,b,r,s,s,r,b,b,r,s,s,s,r,b,g,b,r,r,r,r,r,r,r,r,r,s) 0";
size_t len = strlen(input);
int currentIndex = 0;
char * output = calloc(1,len);
for (int i = 0 ; i<len ; i++)
{
if (input[i] == '(' || input[i] == ')' || input[i] == ','|| input[i] == ' ') {
continue;
}
output[currentIndex++] = input[i];
}
assert(strlen(output) == 63); //well formatted?
char a = output[0];
char b = output[1];
char (* board)[60] = malloc(60); //pointer to array or is it a mal-formed string.
memcpy(board, output+2, 60);
char last = output[62];
the main thing that I would add, if you want to use it more like a string, then you have to make the array 1 bigger and set board[60] = \0;
I have a file where each line looks like this:
cc ssssssss,n
where the two first 'c's are individual characters, possibly spaces, then a space after that, then the 's's are a string that is 8 or 9 characters long, then there's a comma and then an integer.
I'm really new to c and I'm trying to figure out how to put this into 4 seperate variables per line (each of the first two characters, the string, and the number)
Any suggestions? I've looked at fscanf and strtok but i'm not sure how to make them work for this.
Thank you.
I'm assuming this is a C question, as the question suggests, not C++ as the tags perhaps suggest.
Read the whole line in.
Use strchr to find the comma.
Do whatever you want with the first two characters.
Switch the comma for a zero, marking the end of a string.
Call strcpy from the fourth character on to extract the sssssss part.
Call atoi on one character past where the comma was to extract the integer.
A string is a sequence of characters that ends at the first '\0'. Keep this in mind. What you have in the file you described isn't a string.
I presume n is an integer that could span multiple decimal places and could be negative. If that's the case, I believe the format string you require is "%2[^ ] %9[^,\n],%d". You'll want to pass fscanf the following expressions:
Your FILE *,
The format string,
An array of 3 chars silently converted to a pointer,
An array of 9 chars silently converted to a pointer,
... and a pointer to int.
Store the return value of fscanf into an int. If fscanf returns negative, you have a problem such as EOF or some other read error. Otherwise, fscanf tells you how many objects it assigned values into. The "success" value you're looking for in this case is 3. Anything else means incorrectly formed input.
I suggest reading the fscanf manual for more information, and/or for clarification.
fscanf function is very powerful and can be used to solve your task:
We need to read two chars - the format is "%c%c".
Then skip a space (just add it to the format string) - "%c%c ".
Then read a string until we hit a comma. Don't forget to specify max string size. So, the format is "%c%c %10[^,]". 10 - max chars to read. [^,] - list of allowed chars. ^, - means all except a comma.
Then skip a comma - "%c%c %10[^,],".
And finally read an integer - "%c%c %10[^,],%d".
The last step is to be sure that all 4 tokens are read - check fscanf return value.
Here is the complete solution:
FILE *f = fopen("input_file", "r");
do
{
char c1 = 0;
char c2 = 0;
char str[11] = {};
int d = 0;
if (4 == fscanf(f, "%c%c %10[^,],%d", &c1, &c2, str, &d))
{
// successfully got 4 values from the file
}
}
while(!feof(f));
fclose(f);