fscanf with whitespaces as separators - what format should I use? - c

I have a text file whose lines are formatted as follows:
[7 chars string][whitespace][5 chars string][whitespace][integer]
I want to use fscanf() to read all of these into memory, and I'm confused about which format I should use.
Here's an example of such line:
hello box 94324
Note the padding whitespace inside each string, in addition to the separating whitespace.
Edit: I know about the recommendation to use fgets() first, I cannot use it here.
Edit: here's my code
typedef struct Product {
char* id; //Product ID number. This is the key of the search tree.
char* productName; //Name of the product.
int currentQuantity; //How many items are there in stock, currently.
} Product;
int main()
{
FILE *initial_inventory_file = NULL;
Product product = { NULL, NULL, 0 };
//open file
initial_inventory_file = fopen(INITIAL_INVENTORY_FILE_NAME, "r");
product.id = malloc(sizeof(char) * 10); //- Product ID: 9 digits exactly. (10 for null character)
product.productName = malloc(sizeof(char) * 11); //- Product name: 10 chars exactly.
//go through each line in initial inventory
while (fscanf(initial_inventory_file, "%9c %10c %i", product.id, product.productName, &product.currentQuantity) != EOF)
{
printf("%9c %10c %i\n", product.id, product.productName, product.currentQuantity);
}
//cleanup...
...
}
Here's a file example (the real fields are a 9-digit ID, a 10-character name, and an integer, as in the code above):
022456789 box-large 1234
023356789 cart-small 1234
023456789 box 1234
985477321 dog food 2
987644421 cat food 5555
987654320 snaks 4444
987654321 crate 9999
987654322 pillows 44

Assuming your input file is well-formed, this is the most straightforward version:
char str1[8] = {0};
char str2[6] = {0};
int val;
...
int result = fscanf( input, "%7s %5s %d", str1, str2, &val );
If result is equal to 3, you successfully read all three inputs. If it's less than 3 but not EOF, then you had a matching failure on one or more of your inputs. If it's EOF, you've either hit the end of the file or there was an input error; use feof( input ) to test for EOF at that point.
If you can't guarantee your input file is well-formed (which most of us can't), you're better off reading in the entire line as text and parsing it yourself. You said you can't use fgets, but there's a way to do it with fscanf:
char buffer[128]; // or whatever size you think would be appropriate to read a line at a time
/**
* " %127[^\n]" tells scanf to skip over leading whitespace, then read
* up to 127 characters or until it sees a newline character, whichever
* comes first; the newline character is left in the input stream.
*/
if ( fscanf( input, " %127[^\n]", buffer ) == 1 )
{
// process buffer
}
You can then parse the input buffer using sscanf:
int result = sscanf( buffer, "%7s %5s %d", str1, str2, &val );
if ( result == 3 )
{
// process inputs
}
else
{
// handle input error
}
or by some other method.
EDIT
Edge cases to watch out for:
Missing one or more inputs per line
Malformed input (such as non-numeric text in the integer field)
More than one set of inputs per line
Strings that are longer than 7 or 5 characters
Value too large to store in an int
EDIT 2
The reason most of us don't recommend fscanf is that it sometimes makes error detection and recovery difficult. For example, suppose you have the input records
foo bar 123r4
blurga blah 5678
and you read them with fscanf( input, "%7s %5s %d", str1, str2, &val );. fscanf will read 123 and assign it to val, leaving r4 in the input stream. On the next call, r4 will get assigned to str1, blurg (only the first five characters of blurga) will get assigned to str2, and you'll get a matching failure on the leftover a. Ideally you'd like to reject the whole first record, but by the time you know there's a problem it's too late.
If you read it as a string first, you can parse and check each field, and if any of them are bad, you can reject the whole thing.

Let's assume the input is
<LWS>* <first> <LWS>+ <second> <LWS>+ <integer>
where <LWS> is any whitespace character, including newlines; <first> has one to seven non-whitespace characters; <second> has one to five non-whitespace characters; <integer> is an optionally signed decimal integer (the %d conversion used below accepts only decimal; %i would also accept hexadecimal with a 0x or 0X prefix and octal with a leading 0); * indicates zero or more of the preceding element; and + indicates one or more of the preceding element.
Let's say you have a structure,
struct record {
char first[8]; /* 7 characters + end-of-string '\0' */
char second[6]; /* 5 characters + end-of-string '\0' */
int number;
};
then you can read the next record from the stream in into the structure pointed to by rec using, for example:
#include <stdlib.h>
#include <stdio.h>
/* Read a record from stream 'in' into *'rec'.
Returns: 0 if success
-1 if invalid parameters
-2 if read error
-3 if non-conforming format
-4 if bug in function
+1 if end of stream (and no data read)
*/
int read_record(FILE *in, struct record *rec)
{
int rc;
/* Invalid parameters? */
if (!in || !rec)
return -1;
/* Try scanning the record. */
rc = fscanf(in, " %7s %5s %d", rec->first, rec->second, &(rec->number));
/* All three fields converted correctly? */
if (rc == 3)
return 0; /* Success! */
/* Only partially converted? */
if (rc > 0)
return -3;
/* Read error? */
if (ferror(in))
return -2;
/* End of input encountered? */
if (feof(in))
return +1;
/* Must be a bug somewhere above. */
return -4;
}
The conversion specifier %7s converts up to seven non-whitespace characters, and %5s up to five; the array (or char pointer) must have room for an additional end-of-string nul byte, '\0', which the scanf() family of functions adds automatically.
If you do not specify the length limit and just use %s, the input can overrun the buffer. This is a common cause of buffer overflow bugs.
The return value from the scanf() family of functions is the number of successful conversions (possibly 0), or EOF if an error occurs. Above, we need three conversions to fully scan a record. If we scan just 1 or 2, we have a partial record. Otherwise, we check if a stream error occurred, by checking ferror(). (Note that you want to check ferror() before feof(), because an error condition may also set feof().) If not, we check if the scanning function encountered end-of-stream before anything was converted, using feof().
If none of the above cases were met, then the scanning function returned zero or a negative value without either ferror() or feof() returning true. Because the scanning pattern starts with (whitespace and) a conversion specifier, it should never return zero. The only nonpositive return value left is EOF, which should cause feof() to return true. So, if none of the above cases were met, there must be a bug in the code, triggered by some odd corner case in the input.
A program that reads structures from some stream into a dynamically allocated buffer typically implements the following pseudocode:
Set ptr = NULL # Dynamically allocated array
Set num = 0 # Number of entries in array
Set max = 0 # Number of entries allocated for in array
Loop:
If (num >= max):
Calculate new max; num + 1 or larger
Reallocate ptr
If reallocation failed:
Report out of memory
Abort program
End if
End if
rc = read_record(stream, ptr + num)
If rc == 1:
Break out of loop
Else if rc != 0:
Report error (based on rc)
Abort program
End if
Set num = num + 1 # Record stored; advance to the next slot
End Loop

The issue in your code using the "%9c ..." format is that %9c does not write the string-terminating '\0'. So your strings are probably filled with garbage and not terminated at all, which leads to undefined behaviour when you print them with printf. (Your printf call has a second problem: %9c there prints a single char, but you pass a char *; use %9s, as in the code below.)
If you set the complete content of the strings to 0 before the first scan, it should work as intended. To achieve this, you can use calloc instead of malloc; this will initialise the memory with 0.
Note that the code also has to consume the newline character somehow; here that is done by an additional fscanf(f,"%*c") statement (the * indicates that the value is read but not stored in a variable). This works only if there is no other whitespace between the last digit and the newline character:
int main()
{
FILE *initial_inventory_file = NULL;
Product product = { NULL, NULL, 0 };
//open file
initial_inventory_file = fopen(INITIAL_INVENTORY_FILE_NAME, "r");
product.id = calloc(10, sizeof(char)); //- Product ID: 9 digits exactly. (10 for null character)
product.productName = calloc(11, sizeof(char)); //- Product name: 10 chars exactly.
//go through each line in initial inventory
while (fscanf(initial_inventory_file, "%9c %10c %i", product.id, product.productName, &product.currentQuantity) == 3)
{
printf("%9s %10s %i\n", product.id, product.productName, product.currentQuantity);
fscanf(initial_inventory_file,"%*c");
}
//cleanup...
}

Have you tried the format specifiers?
char seven[8] = {0};
char five[6] = {0};
int myInt = 0;
// loop here
fscanf(fp, "%s %s %d", seven, five, &myInt);
// save to structure / do whatever you want
If you're sure that the format is fixed and the strings always have the same length, you could also iterate over the input character by character (using something like fgetc()) and process it manually. The example above can overflow seven or five (undefined behavior, often a segmentation fault) if a string in the file is longer than 7 or 5 characters; adding field widths, as in "%7s %5s %d", prevents that.
EDIT Manual Scanning Loop:
#include <assert.h> // for assert()
char seven[8] = {0};
char five[6] = {0};
int myInt = 0;
int c;
// loop this part
for (int i = 0; i < 7; i++) {
seven[i] = fgetc(fp);
}
c = fgetc(fp); // consume the space; keep fgetc() outside assert(), which is compiled out under NDEBUG
assert(c == ' ');
for (int i = 0; i < 5; i++) {
five[i] = fgetc(fp);
}
c = fgetc(fp); // consume the space
assert(c == ' ');
fscanf(fp, "%d", &myInt);

Related

How to split string (character) and variable in 1 line on C?

How can I split character and variable in 1 line?
Example
INPUT
car1900food2900ram800
OUTPUT
car 1900
food 2900
ram 800
Code
char namax[25];
int hargax;
scanf ("%s%s",&namax,&hargax);
printf ("%s %s",namax,hargax);
If I use code like that, I need to press Enter twice (or type a space) to get any output. How can I split the input without that?
You should be able to use code like this to read one name and number:
if (scanf("%24[a-zA-Z]%d", namax, &hargax) == 2)
…got name and number OK…
else
…some sort of problem to be reported and handled…
You would need to wrap that in a loop of some sort in order to get three pairs of values. Note that using &namax as an argument to scanf() is technically wrong. The %s, %c and %[…] (scan set) notations all expect a char * argument, but you are passing a char (*)[25] which is quite different. A fortuitous coincidence means you usually get away with the abuse, but it is still not correct and omitting the & is easy (and correct).
You can find details about scan sets etc in the POSIX specification of scanf().
You should consider reading a whole line of input with fgets() or POSIX getline(), and then processing the resulting string with sscanf(). This makes error reporting and error recovery easier. See also How to use sscanf() in loops.
Since you are asking this question, I presume you are somewhat of a beginner in C programming. So instead of trying to split the input while reading it, which can be a bit complicated if you are new to C, I would suggest something simpler (though not efficient in terms of memory).
Just read the entire input as one string, then examine it character by character to tell digits from letters; I have used their ASCII values for the check. Whenever a digit is followed by a letter (or by the end of the string), print the part of the string from the end of the previous such point up to the current position. While printing that part, do a similar check the other way round: wherever a letter is followed by a digit, print a separator (a tab in the code below).
For reference:
ASCII values of digits (0-9) => 48 to 57
ASCII values of uppercase letters (A-Z) => 65 to 90
ASCII values of lowercase letters (a-z) => 97 to 122
Here is the code:
#include<stdio.h>
#include<string.h>
int main() {
char s[100];
int i, len, j, k = 0;
printf("\nenter the string:");
if (scanf("%99s", s) != 1) // the width limit keeps the input inside s
return 1;
len = strlen(s);
for(i = 0; i < len; i++){
if(((int)s[i]>=48)&&((int)s[i]<=57)) {
if((((int)s[i+1]>=65)&&((int)s[i+1]<=90))||(((int)s[i+1]>=97)&&((int)s[i+1]<=122))||(i==len-1)) {
for(j = k; j < i+1; j++) {
// j > 0 guard: s[j-1] must not be read when j is 0
if((j > 0)&&((int)s[j]>=48)&&((int)s[j]<=57)) {
if((((int)s[j-1]>=65)&&((int)s[j-1]<=90))||(((int)s[j-1]>=97)&&((int)s[j-1]<=122))) {
printf("\t");
}
}
printf("%c",s[j]);
}
printf("\n");
k = i + 1;
}
}
}
return(0);
}
the output:
enter the string: car1900food2900ram800
car 1900
food 2900
ram 800
In addition to using a character class to include the characters to read as a string, you can also use the character class to exclude digits which would allow you to scan forward in the string until the next digit is found, taking all characters as your name and then reading the digits as an integer. You can then determine the number of characters consumed so far using the "%n" format specifier and use the resulting number of characters to offset your next read within the line, e.g.
char namax[MAXNM],
*p = buf;
int hargax,
off = 0;
while (sscanf (p, "%24[^0-9]%d%n", namax, &hargax, &off) == 2) {
printf ("%-24s %d\n", namax, hargax);
p += off;
}
Note how the sscanf format string reads up to 24 characters that are not digits into namax and then the integer that follows into hargax, storing the number of characters consumed in off; off is then added to the pointer p to advance within the buffer in preparation for the next parse with sscanf.
Putting it all together in a short example, you could do:
#include <stdio.h>
#define MAXNM 25
#define MAXC 1024
int main (void) {
char buf[MAXC] = "";
while (fgets (buf, MAXC, stdin)) {
char namax[MAXNM],
*p = buf;
int hargax,
off = 0;
while (sscanf (p, "%24[^0-9]%d%n", namax, &hargax, &off) == 2) {
printf ("%-24s %d\n", namax, hargax);
p += off;
}
}
}
Example Use/Output
$ echo "car1900food2900ram800" | ./bin/fgetssscanf
car 1900
food 2900
ram 800

Reading line by line, and evaluating strings into coordinates using fgets() and sscanf()

I'm trying to read multiple lines of vertices with varying length using fgets and sscanf.
(1,6),(2,6),(2,9),(1,9)
(1,5)
My program goes into an infinite loop stuck within the first vertex.
char temp3[255];
while(fgets(temp3, 255, fp)!= NULL){
printf("Polygon %d: ", polycount);
while(sscanf(temp3, "(%d,%d)", &polygonx[polycount][vertcount], &polygony[polycount][vertcount]) != EOF){
sscanf(temp3, ",");
printf("(%d,%d),",polygonx[polycount][vertcount], polygony[polycount][vertcount]);
vertcount++;
}
vertcounts[polycount] = vertcount;
vertcount = 0;
polycount++;
}
I must be able to feed the x and y values of the vertices into the polygon arrays, so I'm stuck with using sscanf. I'm also having a problem because I can't find anything on the internet that scans a varying number of elements per line.
It's because this
while(sscanf(temp3, "(%d,%d)",
&polygonx[polycount][vertcount], &polygony[polycount][vertcount]) != EOF)
{
}
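never becomes false: sscanf() returns the number of items successfully converted (here 2), not EOF, and since it rescans temp3 from the beginning every time, it keeps returning 2. I would test for success instead (the scan position must also advance through the string on each iteration, otherwise this loop never ends either):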
while(sscanf(temp3, "(%d,%d)",
&polygonx[polycount][vertcount], &polygony[polycount][vertcount]) == 2)
{
}
Your code doesn't work because it never meets the condition for sscanf() to return EOF. The following is from the manual page referenced at the end:
The value EOF is returned if the end of input is reached before either the first successful conversion or a matching failure occurs. EOF is also returned if a read error occurs, in which case the error indicator for the stream (see ferror(3)) is set, and errno is set to indicate the error.
So it appears that you never reach the end of input before the first successful conversion or a matching failure occurs, which makes sense given the contents of the file. And the second part applies only to file streams, of course.
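And instead of sscanf(temp3, ","), which doesn't do what you think, you could advance a pointer through the line yourself and pass that pointer, rather than temp3, to sscanf() (temp3 is an array, so it cannot be reassigned):
char *cursor = temp3;
char *next;
/* inside the loop, after scanning a pair at cursor: */
next = strchr(cursor, ')'); /* end of the current pair; searching for ',' would stop inside "(1,6)" */
if (next != NULL) {
cursor = next + 1;
if (*cursor == ',')
cursor++; /* skip the comma between pairs */
} else {
/* you've reached the end here */
}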
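This is a suggestion on how to parse one line of this file:
#include <stdio.h>
#include <string.h>
int
main(void)
{
const char temp3[] = "(1,6),(2,6),(2,9),(1,9)\n"; /* one line, as read by fgets() */
const char *source;
int x, y;
int count;
source = temp3;
for (;;)
{
count = 0;
/* %n stores how many characters were consumed; it is only set once ')' has matched */
if (sscanf(source, "(%d,%d)%n", &x, &y, &count) != 2 || count == 0)
break;
/* store here, e.g. polygonx[polycount][vertcount] = x; polygony[polycount][vertcount] = y; */
printf("(%d,%d)\n", x, y);
/* Advance the pointer past the pair just scanned ... */
source += count;
/* ... and skip everything up to the next '(' (the comma, or the newline at the end) */
source += strcspn(source, "(");
}
return 0;
}
The "%n" specifier captures the number of characters scanned so far, so you can use it to advance the pointer to the last position scanned in the source string.
strcspn(source, "(") returns the number of characters before the next '(' (or before the end of the string), so adding it skips the separator. Skipping inside the format with "(%d,%d)%*[^(]%n" instead fails on the last pair: %*[^(] finds no character left, so %n is never reached and count keeps its value from the previous iteration.
Please refer to sscanf(3) for more information on the "%n" specifier, and to strcspn(3).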
If the read succeeds, sscanf will return 2 in this case; sscanf returns the number of variables it filled.
Check whether it returns 2, which indicates success here.
while(sscanf(temp3,"(%d,%d)",&polygonx[polycount][vertcount],&polygony[polycount][vertcount]) != EOF)
Instead of this, check like this (the %*c skips the comma after the pair and is not counted in the return value):
while(sscanf(temp3,"(%d,%d)%*c",&polygonx[polycount][vertcount],&polygony[polycount][vertcount])== 2)
Also, to ignore the ',' after the coordinates, you use
sscanf(temp3, ",");
which is not correct. In the sscanf above you can read the comma and discard it by using the %*c specifier. Keep in mind that sscanf() does not advance through temp3 between calls, so the loop must also move a pointer forward (for example with %n), or it will keep rescanning the first pair.

Find the length of an integer within a string

If I've got a text file like:
8f5
I can easily use strstr to parse the values 8 and 5 out of it.
As such:
//while fgets.. etc (other variables and declarations before it)
char * ptr = strstr(str,"f");
if(ptr != NULL)
{
int a = atol(ptr-1); // value of 8
int b = atol(ptr+1); // value of 5
}
But what if the values were two digits long? I could add +2 and -2 to each atol call. But I can't predict whether the values are less than 10 or greater, for instance 12f6 or 15f15, as the values are random each time (either one digit or two). Is there a way to check the length of the values within the string, and then use atol()?
Use atol(str) and atol(ptr+1), if I am reading the question correctly. This will get you the two numbers separated by the f, regardless of how long they are.
Set *ptr = '\0' first if you don't wish to rely on the fact that garbage characters stop atol from parsing.
If the text is always similar to the one you posted, then you can get the three parts of the string with the following code, and you can parse another token if there is whitespace between them:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h> /* strtol() */
int main(void)
{
char string[] = "12f5 1234x2912";
char *next;
next = string;
while (*next != '\0') /* While not at the end of the string */
{
char separator[100];
size_t counter;
int firstNumber;
int secondNumber;
/* Get the first number */
firstNumber = strtol(next, &next, 10);
counter = 0;
/* Skip all non-numeric characters and store them in `separator' (without overflowing it) */
while ((*next != '\0') && (isdigit((unsigned char)*next) == 0))
{
if (counter < sizeof separator - 1)
separator[counter++] = *next;
next++;
}
/* nul terminate `separator' */
separator[counter] = '\0';
/* extract the second number */
secondNumber = strtol(next, &next, 10);
/* show me how you did it */
printf("%d:%s:%d\n", firstNumber, separator, secondNumber);
/* skip any number of white space characters */
while ((*next != '\0') && (isspace((unsigned char)*next) != 0))
next++;
}
return 0;
}
In the example above you can see that two strings (12f5 and 1234x2912) are parsed; read the strtol() manual page to understand why this algorithm works.
Normally you should not use the atoi() or atol() functions, because you can't validate the input string: there is no way to know whether the function succeeded or not.

How to get the length of standard input in C? [duplicate]

This question already has answers here:
Capturing a variable length string from the command-line in C
I will start with my code:
char input[40];
fgets( input, 40, stdin );
if( checkPalin(input) == 0 ) {
printf("%s ist ein Palindrom \n", input);
}
else {
printf("%s ist kein Palindrom \n", input);
}
What I want to do is read some standard input and check with my function whether it is a palindrome or not.
My problems are the following:
How can I get the length of the standard input? If it is longer than 40 characters I want to print an error message, and I also want my char array to be exactly as long as the actual input.
Can anybody help me?
fgets( input, 40, stdin );
With this call, the string stored in input cannot be longer than 40 characters: 39 characters + the nul character.
If you enter a string longer than 39 characters, fgets() reads the first 39 characters and places the nul character ('\0') as the 40th character; the remaining characters are not discarded but stay in stdin, where the next read will pick them up.
If you enter a string shorter than 39 characters, for example 5 characters, fgets() also stores the newline, so the length becomes 6 (excluding the nul character).
Do not forget to remove the newline character.
char input[60];
fgets(input,sizeof input,stdin);
For example, if you declare the input buffer with a size of 60 and want to report an error for more than 40 characters, you can simply check whether strlen() returns more than 40 (after removing the newline) and then show an error message.
If you want to detect an error from fgets() itself, check its return value against NULL.
There is no standard function that does this; you need to write it yourself, i.e., read character by character until you get a newline or EOF. But I guess you're doing this to avoid overflow, right? If the input is longer than 40 characters, you don't need to, because fgets() guarantees that the extra characters are not put into your buffer: it never writes more than the size you requested, 40 (including the terminating '\0'). The amount written may be less than or equal to that, but never greater.
EDIT:
By "How to get the length of a standard input in C?" I thought you were asking how many bytes there are in stdin. I'm sorry for that. If you want to know how many bytes fgets() has written, just use strlen().
With
fgets( input, 40, stdin );
input is guaranteed to hold at most 40 characters (null terminator included).
You don't have to check for buffer overflow.
And for getting size of the input you can always use strlen() function on input, as the produced character string from fgets is always null terminated.
It just turned out that it is not so easy to write a function which uses fgets() repeatedly in order to return a malloc()ed string.
The function does no proper error reporting: if there was an error in realloc() or fgets(), the data retrieved so far is returned.
Apart from these, the function proved quite usable.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char * read_one_line(FILE * in)
{
size_t alloc_length = 64;
size_t cumulength = 0;
char * data = malloc(alloc_length);
if (!data)
return NULL;
data[0] = '\0'; // fgets() leaves the buffer untouched if it hits EOF immediately
while (1) {
char * cursor = data + cumulength; // here we continue.
char * ret = fgets(cursor, (int)(alloc_length - cumulength), in);
if (!ret) {
// Suppose we had EOF, no error.
// We just return what we read so far;
// there is still a \0 at cursor, so we are fine.
break;
}
size_t newlength = strlen(cursor); // how much is new?
cumulength += newlength; // add it to what we have.
if (cumulength < alloc_length - 1 || data[cumulength-1] == '\n') {
// not used the whole buffer... so we are probably done.
break;
}
// We (probably) need more.
size_t newlen = alloc_length * 2;
char * r = realloc(data, newlen);
if (r) {
data = r;
alloc_length = newlen;
} else {
// realloc error. Return at least what we have...
// TODO: or better free and return NULL?
return data;
}
}
char * r = realloc(data, cumulength + 1);
return r ? r : data; // shrinking should always succeed, but who knows?
}
int main(void)
{
char * p = read_one_line(stdin);
if (!p)
return 1;
printf("%zu\n", strlen(p)); // malloc_usable_size() is glibc-specific, so only strlen() is printed here
printf("%s\n", p);
free(p);
return 0;
}

I don't understand the behavior of fgets in this example

While I could use strings, I would like to understand why this small example I'm working on behaves in this way, and how can I fix it ?
int ReadInput() {
char buffer [5];
printf("Number: ");
fgets(buffer,5,stdin);
return atoi(buffer);
}
void RunClient() {
int number;
int i = 5;
while (i != 0) {
number = ReadInput();
printf("Number is: %d\n",number);
i--;
}
}
This should, in theory or at least in my head, let me read 5 numbers from input (albeit overwriting them).
However this is not the case, it reads 0, no matter what.
I understand printf puts a \0 null terminator ... but I still think I should be able to read the first number instead of always getting 0. And I don't understand why the rest of the numbers are OK (not all 0).
CLARIFICATION: I can only read 4/5 numbers, first is always 0.
EDIT:
I've tested and it seems that this was causing the problem:
main.cpp
scanf("%s",&cmd);
if (strcmp(cmd, "client") == 0 || strcmp(cmd, "Client") == 0)
RunClient();
somehow.
EDIT:
Here is the code if someone wishes to compile. I still don't know how to fix
http://pastebin.com/8t8j63vj
FINAL EDIT:
Could not get rid of the error. Decided to simply add #ReadInput
int ReadInput(BOOL check) {
...
if (check)
printf ("Number: ");
...
RunClient():
void RunClient() {
...
ReadInput(FALSE); // a pseudo - buffer flush. Not really but I ignore
while (...) { // line with garbage data
number = ReadInput(TRUE);
...
}
And call it a day.
fgets reads the input as well as the newline character. So when you input a number, it's like: 123\n.
atoi doesn't report errors when the conversion fails.
Remove the newline character from the buffer:
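char buffer[5]; // filled by fgets(buffer, 5, stdin) as in ReadInput()
size_t length = strlen(buffer);
if (length > 0 && buffer[length - 1] == '\n') // only strip a newline that is actually there
buffer[length - 1] = '\0';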
Then use strtol to convert the string into number which provides better error detection when the conversion fails.
char * fgets ( char * str, int num, FILE * stream );
Get string from stream.
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str. (This means the \n is kept in the string.)
A terminating null character is automatically appended after the characters copied to str.
Notice that fgets is quite different from gets: not only fgets accepts a stream argument, but also allows to specify the maximum size of str and includes in the string any ending newline character.
P.S.: Try to use a larger buffer.
