I am parsing a file with formatted strings and need some help with the parsing.
Consider the example below:
#include <stdio.h>

int main(void)
{
    char value[32], name[32];
    int buff_ret1, buff_ret2;
    const char *buff = "1000000:Hello";
    const char *buff_other = "200000:\0";
    /* 31 = sizeof value - 1, leaving room for the '\0' */
    buff_ret1 = sscanf(buff, "%31[^:]:%31s", value, name);
    printf("buff_ret1 is %d\n", buff_ret1);
    buff_ret2 = sscanf(buff_other, "%31[^:]:%31s", value, name);
    printf("buff_ret2 is %d\n", buff_ret2);
    return 0;
}
I expect both buff_ret1 and buff_ret2 to be 2, but buff_ret2 comes out as 1. I understand that sscanf does not treat the NUL as a character. Is there a way to tell sscanf to consider NUL as a character to read?
No, this is not possible. From the POSIX specification of sscanf():
Reaching the end of the string in sscanf() shall be equivalent to encountering end-of-file for fscanf().
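This means \0 (the end of the string) is treated like end of file: %s has nothing left to read, so that conversion fails and sscanf returns the number of items assigned so far, which is 1.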
I am trying to read a file that contains lines in the format abc=1234. How can I make fscanf ignore the = and store str1="abc" and str2="1234"?
I tried this:
fscanf(fich1, "%[^=]=%[^=]", palavra, num_char);
I'd recommend using fgets to read lines and then parse them with sscanf. But you can use the same principle for just fscanf if you want.
#include <stdio.h>
int main(void) {
char buf[100];
char str1[100];
char str2[100];
if(! fgets(buf, sizeof buf, stdin)) return 1;
if(sscanf(buf, "%[^=]=%s", str1, str2) != 2) return 1;
puts(str1);
puts(str2);
}
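So what does %[^=]=%s do? First, %[^=] reads everything up to the first = and stores it in str1. Then the literal = in the format matches the = in the input and discards it. Finally, %s skips any leading white space and reads the next whitespace-delimited word into str2. This also shows the problem with your format string: the second %[^=] keeps reading until it finds another = (or the end of input), so it only stops where you intend on input such as abc=1234=. With fscanf on a multi-line file, it also swallows the newline and the key of the next line.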
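Note that %[^=] and %s treat white space differently: %s skips leading white space and stops at the first white-space character, while a scanset such as %[^=] keeps spaces as part of the field. If that's a concern, you need to account for it, for example with %[^=]=%[^\n].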
And in order to avoid buffer overflow, you also might want to do %99[^=]=%99[^\n].
I have tried the code below to split my Str[] string into two strings, but what I actually want is to get John (the name) as a string and 100 (the marks) as an integer. How can I do that? Any suggestions?
#include <stdio.h>
#include <string.h>
int main(void)
{
    char Str[] = "John,100";
    int i, j, xchange;
    char name[50] = {0};   /* zero-filled, so name is always terminated */
    char marks[10] = {0};
    j = 0; xchange = 0;
    for (i = 0; Str[i] != '\0'; i++) {
        if (Str[i] != ',' && xchange != -1) {
            name[i] = Str[i];
        } else {
            xchange = -1;
        }
        if (xchange == -1) {
            marks[j++] = Str[i + 1];
        }
    }
    printf("Student name is %s\n", name);
    printf("Student marks is %s\n", marks);
    return 0;
}
How to separate "John,100" into 2 strings?
There are three common approaches:
Use strtok() to split the string into individual tokens. This will modify the original string, but is quite simple to implement:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char line[] = "John,100;passed";
    char *name, *score, *status;
    /* Detach the initial part of the line,
       up to the first comma, and set name
       to point to that part. */
    name = strtok(line, ",");
    /* Detach the next part of the line,
       up to the next comma or semicolon,
       setting score to point to that part. */
    score = strtok(NULL, ",;");
    /* Detach the final part of the line,
       setting status to point to it. */
    status = strtok(NULL, "");
    printf("name=%s score=%s status=%s\n",
           name ? name : "(none)", score ? score : "(none)",
           status ? status : "(none)");
    return 0;
}
Note that if you change the initializer to char line[] = "John,100"; then status will be NULL, but the code is otherwise safe to run.
So, in practice, if you require all three fields to exist in line, it is sufficient to check that the last one is not NULL:
if (!status) {
fprintf(stderr, "line[] did not have three fields!\n");
return EXIT_FAILURE;
}
Use sscanf() to convert the string. For example,
char line[] = "John,100";
char name[20];
int score;
if (sscanf(line, "%19[^,],%d", name, &score) != 2) {
fprintf(stderr, "Cannot parse line[] correctly.\n");
return EXIT_FAILURE;
}
Here, 19 is the maximum number of chars stored in name, one less than its size of 20, because one is always reserved for the end-of-string nul char, '\0'. %[^,] is a scanset conversion that consumes everything up to, but not including, the first comma, and %d converts an int. The return value is the number of successful conversions.
This approach does not modify the original string, and it lets you try several different parsing patterns: as long as you try the most complex one first, you can accept multiple input formats with very little added code. I do this regularly when taking 2D or 3D vectors as inputs.
The downside is that sscanf() (like all functions in the scanf family) does not report overflow. For example, where int is 32 bits, the largest int is 2147483647, but scanf functions will happily convert e.g. 9999999999 to 1410065407 (or some other value!) without returning an error; the C standard leaves this case undefined. With scanf alone, you can only assume that numerical inputs are sane and within limits; you cannot verify it.
Use helper functions to tokenise and/or parse the string.
Typically, the helper functions are something like
char *parse_string(char *source, char **to);
char *parse_long(char *source, long *to);
where source is a pointer to the next character in the string to be parsed, and to is a pointer to where the parsed value will be stored; or
char *next_string(char **source);
long next_long(char **source);
where source is a pointer to a pointer to the next character in the string to be parsed, and the return value is the value of the extracted token.
These tend to be longer than the approaches above and, when I write them, quite paranoid about the inputs they accept. (I want my programs to complain if their input cannot be reliably parsed, rather than silently produce garbage.)
If the data is some variant of CSV (comma-separated values) read from a file, then the proper approach is a different one: instead of reading the file line by line, you read the file token by token.
The only "trick" is to remember the separator character that ended each token (you can push it back with ungetc()), and to use a separate function that reads and ignores any remaining tokens in the current record and then consumes the newline that ends it.
#include <stdio.h>
int main()
{
char string[80]="abcdef";
char buffer[80];
int num;
sscanf(string,"%*[^0-9a-fA-F]%n%s",&num,buffer);
printf("%d\n",num);
puts(buffer);
return 0;
}
Output:
-149278720
And what I expect is
0
abcdef
I believe that the regex %*[^0-9a-fA-F] discards all characters other than hex digits ("xdigits"). However, when the first character in the string is a hex digit, sscanf seems to return immediately.
How can I fix this?
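%*[^0-9a-fA-F] matches a non-empty sequence of characters that aren't in the character set. Since you don't have any non-hexdigits at the beginning of the string, this conversion fails and sscanf returns immediately. The %n is never reached, so num is never assigned and printf prints whatever garbage the uninitialized variable happened to hold; buffer is never written either.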
As far as I can tell, there's no way to make this optional in sscanf. If you just want to skip over the non-hexdigits, use strcspn().
num = (int)strcspn(string, "0123456789abcdefABCDEF");
strcpy(buffer, string + num);
I'm teaching myself programming. I read that a string stored in an array of characters can be indexed to extract the nth character.
However, I've been stuck on this for hours. While solving an exercise, I found that I can only access the first character of the array, myarray[0]; indexing the rest (1, 2, 3, ...) prints nothing. Yet puts prints the whole string. Curiously, strlen returns the length of my input plus 1.
example:
int main (void)
{
char myarray[1000]={0};
int i;
fgets(myarray,1000,stdin);
for(i=0;i<strlen(myarray);i++)
printf("myarray[%d]:%c\n",i,myarray[i]);
printf("\n");
printf("strlen:%d\n",strlen(myarray));
puts(myarray);
return 0;
}
input:
6536
output:
strlen:5
myarray[0]:6
myarray[1]:
myarray[2]:
myarray[3]:
myarray[4]:
6536
You are most probably getting this result because your program has undefined behavior: you are using the wrong format specifier to print a size_t (strlen returns size_t). Change the format specifier to %zu.
Also note that, since the for loop compares i against strlen's size_t result, you should declare i as size_t.
Here is the fixed code: http://ideone.com/0sMadV
fgets keeps the newline character \n that ends the input line and stores it in the buffer, so strlen returns 5 for the input 6536 (four digits plus \n).
The actual output is:
6536
myarray[0]:6
myarray[1]:5
myarray[2]:3
myarray[3]:6
myarray[4]:
strlen:5
6536
As you can see, myarray[4] stores the newline (due to fgets). Also, it would be better to calculate strlen once by placing it above the loop, instead of in every iteration.
From the POSIX specification of fgets():
char *fgets(char *restrict s, int n, FILE *restrict stream);
The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
A simple way is to loop only up to strlen(myarray) - 1, which assumes the line really ends with a newline. Another way is to overwrite the newline with a null terminating character:
size_t length = strlen(myarray);
if (length > 0 && myarray[length - 1] == '\n')
    myarray[length - 1] = '\0';
strlen counts the '\n' that fgets leaves at the end of the string. You can "fix" that with strtok:
strtok(myarray, "\n");
While I could use strings, I would like to understand why this small example I'm working on behaves in this way, and how can I fix it ?
int ReadInput() {
char buffer [5];
printf("Number: ");
fgets(buffer,5,stdin);
return atoi(buffer);
}
void RunClient() {
int number;
int i = 5;
while (i != 0) {
number = ReadInput();
printf("Number is: %d\n",number);
i--;
}
}
This should, in theory or at least in my head, let me read 5 numbers from input (albeit overwriting them).
However this is not the case, it reads 0, no matter what.
I understand that a \0 null terminator is appended ... but I still think I should be able to read the first number instead of always getting 0. And I don't understand why the rest of the numbers are OK (not all 0).
CLARIFICATION: I can only read 4/5 numbers, first is always 0.
EDIT:
I've tested and it seems that this was causing the problem:
main.cpp
scanf("%s",&cmd);
if (strcmp(cmd, "client") == 0 || strcmp(cmd, "Client") == 0)
RunClient();
somehow.
EDIT:
Here is the code if anyone wants to compile it. I still don't know how to fix it:
http://pastebin.com/8t8j63vj
FINAL EDIT:
Could not get rid of the error, so I decided to simply add a flag.
In ReadInput():
int ReadInput(BOOL check) {
...
if (check)
printf ("Number: ");
...
In RunClient():
void RunClient() {
...
ReadInput(FALSE); // a pseudo buffer flush. Not really, but I ignore the line with garbage data
while (...) {
number = ReadInput(TRUE);
...
}
And call it a day.
fgets reads the input as well as the newline character, so when you type a number, the buffer holds something like 123\n.
atoi doesn't report errors: if no number can be converted, it simply returns 0.
Remove the newline character from the buffer:
size_t length = strlen(buffer);
if (length > 0 && buffer[length - 1] == '\n')
    buffer[length - 1] = '\0';
Then use strtol to convert the string to a number; unlike atoi, it lets you detect when the conversion fails.
char * fgets ( char * str, int num, FILE * stream );
Get string from stream.
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str. (This means the \n ends up in your buffer.)
A terminating null character is automatically appended after the characters copied to str.
Notice that fgets is quite different from gets: not only fgets accepts a stream argument, but also allows to specify the maximum size of str and includes in the string any ending newline character.
P.S.: Try using a larger buffer.