Read lines from an input file of varying line sizes - c

Currently, I am using getline to read lines from a file and I can access individual characters the following way from stdin:
char buffer[1024];
while((lineSize = getline(&line, &len, stdin)) != -1) {
if (line[0] != 84) {
// ...
continue; // continue to next line in file
}
if (line[0] == 84){ // (the 'T' character)
printf("TEST: Line Value: %s\n", line);
buffer[0] = line[1]; // this is a single digit number in char form
buffer[1] = '\0';
// send buffer somewhere
send(clientSocket, buffer, strlen(buffer), 0);
// ...
}
}
A sample file is as follows:
T3
T9
S0
S4
T55
T6
However, as you can see, I run into issues when a number > 9 is given, such as the T55 line here: I can only grab the first digit with this method. Therefore, I may have to completely redo the way I read the file. Is there a better, simpler way to read through an input file, check the first character, and convert the remaining character(s) up to the end of the line into an int? (The integer can be at most 100.)

Continuing from my comments, you can use fgets() to read each line and then use sscanf() with the format string " %c%d%n" to extract the first character, convert the next set of digits to an int, and finally obtain the total number of characters consumed by sscanf() in that conversion using the "%n" specifier. Validate that both the character and integer conversions took place (the "%n" specifier does not add to the count sscanf() returns, so success is a return of 2) and that the first non-whitespace character read was 'T'. You can then use mychar and myint as desired and use mylen as the length to pass to send().
(note: you can scan forward in line to determine whether any whitespace was included at the beginning and ignore that in your call to send(); that is left to you)
Putting it all together, you can do something like:
char line[1024],
mychar = 0;
int myint = 0,
mylen;
/* read using fgets */
while (fgets (line, sizeof line, stdin)) {
/* parse with sscanf, validate both conversions and 'T' as 1st char,
* use "%n" to get number of chars through int conversion
*/
if (sscanf (line, " %c%d%n", &mychar, &myint, &mylen) != 2 ||
mychar != 'T') {
fputs ("error: invalid format.\n", stderr);
continue;
}
send (clientSocket, line, mylen, 0); /* send mylen chars */
}
To be more specific, I will need to see your Minimal Complete Reproducible Example to ensure there is nothing outside what you have posted that will impact the code above.
Adding Example
Below is a short example showing the result of parsing expected and unexpected input with the approach above, with the forward scan added to remove leading whitespace from the line. It is a short program that reads input from stdin, writes to stdout, and outputs the lines matching 'T'(int):
#include <stdio.h>
#include <unistd.h>
#include <ctype.h>
int main (void) {
char line[1024],
mychar = 0,
nl = '\n';
int myint = 0,
mylen;
/* read using fgets */
while (fgets (line, sizeof line, stdin)) {
char *p = line;
/* parse with sscanf, validate both conversions and 'T' as 1st char,
* use "%n" to get number of chars through int conversion
*/
if (sscanf (line, " %c%d%n", &mychar, &myint, &mylen) != 2 ||
mychar != 'T') {
fprintf (stderr, "error: invalid format: %s", line);
continue;
}
while (isspace ((unsigned char) *p)) { /* remove leading whitespace */
p += 1;
mylen -= 1;
}
// send (clientSocket, p, mylen, 0); /* send mylen chars */
write (STDOUT_FILENO, p, mylen);
write (STDOUT_FILENO, &nl, 1);
}
}
(note: write (STDOUT_FILENO, &nl, 1); is simply included above to output a newline after each output -- it would not be part of what you send() over your socket -- unless the receiving program is using the '\n' as the line termination character)
Example Input File:
$ cat dat/charint.txt
T4
T44
T444
TT
T3 and more gibberish
P55
(note: the last two lines beginning with 'T', " TT" (an invalid line format) and " T3 and more gibberish", include leading whitespace, and the latter also has trailing characters)
Example Use/Output
$ ./bin/charandintsend < dat/charint.txt
T4
T44
T444
error: invalid format: TT
T3
error: invalid format: P55
Let me know if you have questions, or if I misunderstood some aspect of your question.

Related

C program to read file reading an extra line

The code I'm working on involves reading a file with input structured as follows:
(spaces)name(spaces) val (whatever) \n
(spaces)name(spaces) val (whatever) \n
(spaces)name(spaces) val (whatever) \n
where (spaces) denotes an arbitrary amount of whitespace. My code is supposed to output both the name and the value. There is another condition: everything on the line after a '#' is ignored (treated like a comment). The output is supposed to be:
"name: (name) value: val \n"
For the most part the code is working, except that it adds an extra line where the name is empty and the value is whatever the last number read was. For example, my test file:
a 12
b 33
#c 15
nice 6#9
The output is:
Line after: a 12
name: a value: 12 :
Line after: b 33
name: b value: 33 :
Line after: # c 15
Line after: nice 6#9
name: nice value: 6 :
Line after:
name: value: 6 : //why is this happening
The code is here.
void readLine(char *filename)
{
FILE *pf;
char name[10000];
char value[20];
pf = fopen(filename, "r");
char line[10000];
if (pf){
while (fgets(line, sizeof(line), pf) != NULL) {
//printf("Line: %s\n",line);
printf("Line after: %s\n",line);
while(true){
int i=0;
char c=line[i]; //parse every char of the line
//assert(c==' ');
int locationS=0; //index in the name
int locationV=0; //index in the value
while((c==' ')&& i<sizeof(line)){
//look for next sequence of chars
++i;
c=line[i];
if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c!=' ');
while (c!=' '&&i<sizeof(line))
{
name[locationS]=c;
locationS++;
//printf("%d",locationS);
++i;
c=line[i];if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c==' ');
while(c==' '&&i<sizeof(line)){
//look for next sequence of chars
++i;
c=line[i];
if(c=='#'){
break;
}
}
if(c=='#'){ break;}
assert(c!=' ');
printf("\n");
while ((c!=' '&& c!='\n')&&i<sizeof(line))
{
value[locationV]=c;
locationV++;
++i;
c=line[i];if(c=='#'){
break;
}
}
printf("name: %s value: %s : \n",name, value);
memset(&name[0], 0, sizeof(name));
memset(&value[0], 0, sizeof(value));
break; //nothing interesting left
}
}
fclose(pf);
}else{
printf("Error in file\n");
exit(EXIT_FAILURE);
}
}
Pasha, you are doing some things correctly, but you are making what you are trying to do much more difficult than it needs to be. First, avoid using magic numbers in your code, such as char name[10000];. Instead:
...
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char line[MAXC];
...
(you did very good following the rule Don't skimp on Buffer Size :)
Likewise, you have done well in opening the file and validating that it is open for reading before attempting to read from it with fgets(). You can do that validation in a single block and handle the error at that time, which has the effect of removing one level of indentation throughout the rest of your code, e.g.
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
Now with the file open and validated that it is open for reading and any error handled, you can proceed to reading each line in your file. Unless you are storing the names in an array that needs to survive your read loop, you can simply declare name[MAXC]; within the read-loop block, e.g.
while (fgets (line, MAXC, fp)) { /* read each line of input */
char name[MAXC]; /* storage for name */
int val; /* integer value for val */
(note: rather than declare another array to hold value, we have simply declared val as an int and will use sscanf to parse name and val converting the value directly to int at that time)
Anytime you are using a line-oriented input function (like fgets() or POSIX getline()), you will want to trim the '\n' that is read and included in the buffer that is filled. You can do that easily with strcspn (see the strspn(3) Linux manual page, which documents strcspn as well). It is a simple, single call: use the return from strcspn as the index of the '\n' and overwrite the '\n' with the nul-terminating character (which is '\0', or simply 0)
line[strcspn (line, "\n")] = 0; /* trim '\n' from end of line */
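Now all you need to do is check for the presence of the first '#' in line that marks the beginning of a comment. If found, simply overwrite the '#' with the nul-terminating character as you did for the '\n' (if there is no '#', strcspn returns the index of the existing nul-terminator, so the assignment changes nothing), e.g.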
line[strcspn (line, "#")] = 0; /* overwrite '#' with nul-char */
Now that you have your line and have removed the '\n' and any comment that may be present, you can check that line isn't empty (meaning it began with a '#' or was simply an empty line containing only a '\n')
if (!*line) /* if empty-string */
continue; /* get next line */
(note: if (!*line) is simply shorthand for if (line[0] == 0). When you dereference your buffer, e.g. *line, you are simply returning the first element (first char), since *line == *(line + 0) in pointer notation, which is equivalent to line[0] in array-index notation. The [] operator performs a dereference as well.)
Now simply parse for the name and val directly from line using sscanf. Both the "%s" and "%d" conversion specifiers will ignore all leading whitespace before the conversion specifier. You can use this simple method so long as name itself does not contain whitespace. Just as you validate the return of your file opening, you will validate the return of sscanf to determine if the number of conversions you specified successfully took place. For example:
if (sscanf (line, "%1023s %d", name, &val) == 2) /* have name/value? */
printf ("\nline: %s\nname: %s\nval : %d\n", line, name, val);
else
printf ("\nline: %s (doesn't contain name/value)\n", line);
(note: by using the field-width modifier for your string, e.g. "%1023s", you protect the array bounds of name. The field width limits sscanf to writing no more than 1023 chars + '\0' to name. The width cannot be supplied by a variable, and a macro can only be used through preprocessor stringification, so this is one of the occasions where a magic number in your format string is the usual choice... For every rule there is generally a caveat or two...)
If you asked for 2 conversions and sscanf returned 2, then you know that both requested conversions were successful. Further, since you specified an integer conversion for val, you know that val holds the converted integer.
That's all there is to it. All that remains is closing the file (if not reading from stdin) and you are done. A full example could be:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char line[MAXC];
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) { /* read each line of input */
char name[MAXC]; /* storage for name */
int val; /* integer value for val */
line[strcspn (line, "\n")] = 0; /* trim '\n' from end of line */
line[strcspn (line, "#")] = 0; /* overwrite '#' with nul-char */
if (!*line) /* if empty-string */
continue; /* get next line */
if (sscanf (line, "%1023s %d", name, &val) == 2) /* have name/value? */
printf ("\nline: %s\nname: %s\nval : %d\n", line, name, val);
else
printf ("\nline: %s (doesn't contain name/value)\n", line);
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(note: if you want to print the raw line before trimming the '\n' and comments, just move the printing of line before the calls to strcspn. Above line is printed showing the final state of line before the call to sscanf)
Example Use/Output
Using your input file stored in dat/nameval.txt on my system, you could simply do the following to read values redirected from stdin:
$ ./bin/parsenameval <dat/nameval.txt
line: a 12
name: a
val : 12
line: b 33
name: b
val : 33
line: nice 6
name: nice
val : 6
(note: just remove the redirection < to have the program open and read the file itself rather than having the shell do it for you. Six of one, half a dozen of the other.)
Look things over and let me know if you have further questions. If for some reason you cannot use any function to help you parse the line and must use only pointers or array-indexing, let me know. Following the approach above, it takes only a little effort to replace each of the operations with its manual equivalent.

C: The first element of a char cannot be detected

I am learning to get input from the keyboard. I want the user to create one or more strings from the input, where each string is a line, and the program should not terminate until a specified char is pressed. Then I store these strings in the buffer.
However, when I print out the buffer, the first few elements of the string are always missing. Here is my code:
#include<stdio.h>
int main(void){
printf("Please type the string:\n");
char buffer[1000];
int c;
while( (c = getchar()) != ' ' ) {
fgets(buffer, sizeof(buffer), stdin);
printf("The output string is: \n%s\n", buffer);
if((c = getchar())== ' '){
printf("A space is detected!\n");
break;
}
}
}
The output is:
Please type the string:
abcdefg
The output string is:
bcdefg
hijklmn
The output string is:
jklmn
opqrst
The output string is:
qrst
A space is detected!
Program ended with exit code: 0
Which part did I go wrong? Any hints are very much appreciated.
The problem you are having is that both getchar() and fgets() in your code are reading from stdin. Since you call getchar() first in your test, it consumes the first character of your string; when you call it again, another character disappears...
You don't need getchar() at all to end your loop. All you care about for breaking your loop as you have explained is whether the user enters a space as the first character. fgets does not skip leading whitespace, so any leading space entered by the user will be captured at the beginning of buffer. So to satisfy your loop-exit condition, all you need to do is check if the first character of buffer is a space.
How? The simplest way is to dereference buffer, e.g. *buffer returns the first character in buffer. Why does that work? In pointer notation, buffer + 0 is the offset you want in buffer, so to get the character at that location you dereference it, e.g. *(buffer + 0), which of course is just *buffer, the equivalent of buffer[0].
So, putting it all together, getting rid of getchar(), and adding strlen both to validate that the string fit in buffer and to locate the trailing '\n' that fgets reads and includes in buffer (which gives you the length of the trimmed string as a bonus), you could do something similar to:
#include <stdio.h>
#include <string.h>
#define MAXC 1000 /* if you need a constant, define one (or more) */
int main (void) {
char buffer[MAXC] = ""; /* initialize strings zero (good practice) */
for (;;) { /* loop continually taking input */
size_t len; /* variable for buffer length */
printf ("\nenter string: "); /* prompt */
if (!fgets (buffer, sizeof buffer, stdin)) /* read input */
break; /* exit if user cancels input */
len = strlen (buffer); /* get length */
if (len && buffer[len-1] == '\n') /* check if last char is \n */
buffer[--len] = 0; /* overwrite with nul-char */
else { /* otherwise string too long */
fputs ("error: string too long.\n", stderr);
return 1;
}
if (*buffer == ' ') /* check if 1st char of buffer is ' ' */
break;
printf ("buffer: %s (%zu chars)\n", buffer, len); /* output */
}
}
Example Use/Output
$ ./bin/fgetsspace
enter string: my dog has fleas
buffer: my dog has fleas (16 chars)
enter string: my cat has none
buffer: my cat has none (15 chars)
enter string: bye
(note: a space was entered before bye above, e.g. " bye")
Look things over and let me know if you have further questions.
Separating Words with strtok
To separate each line into individual words, you can use strtok. The first argument is the buffer (for the 1st call); the second parameter is a list of characters to use as delimiters between the words (e.g. to separate on spaces, include a space; to drop the '.' at the end of a sentence, include that as well; and include the '\n'). After the 1st call to strtok, all subsequent calls to get the remaining words use NULL in place of buffer, e.g.
#include <stdio.h>
#include <string.h>
#define MAXC 1000 /* if you need a constant, define one (or more) */
int main (void) {
char buffer[MAXC] = ""; /* initialize strings zero (good practice) */
for (;;) { /* loop continually taking input */
size_t len; /* variable for buffer length */
char *delim = " .\n", /* delimiters for strtok */
*p = buffer; /* pointer to buffer for strtok */
printf ("\nenter string: "); /* prompt */
if (!fgets (buffer, sizeof buffer, stdin)) /* read input */
break; /* exit if user cancels input */
len = strlen (buffer); /* get length */
if (len && buffer[len-1] == '\n') /* check if last char is \n */
buffer[--len] = 0; /* overwrite with nul-char */
else { /* otherwise string too long */
fputs ("error: string too long.\n", stderr);
return 1;
}
if (*buffer == ' ') /* check if 1st char of buffer is ' ' */
break;
printf ("buffer: %s (%zu chars)\n", buffer, len); /* output */
p = strtok (buffer, delim); /* 1st call to strtok uses buffer */
while (p != NULL) {
printf (" %s\n", p);
p = strtok (NULL, delim); /* subsequent calls use NULL */
}
}
}
(note: the original buffer is modified, so make a copy if you need to preserve the original)
Example Use/Output
$ ./bin/fgetsspace
enter string: my dog has fleas
buffer: my dog has fleas (16 chars)
my
dog
has
fleas
enter string: my cat has none
buffer: my cat has none (15 chars)
my
cat
has
none
enter string: bye
getchar consumes a character. Your first iteration loses one character to the initial call in the while condition, and successive iterations lose two characters: one to the getchar you use to detect a space, and another to the one in the while condition.
Answering in addition to my initial comment and the issue:
First, quoting myself:
I believe that when using getchar(), you effectively remove the character from the stdin buffer.
As others have since stated, the problem is that your calls to the getchar function consume input, effectively removing it from the stdin buffer.
See Jim Buck's answer for detailed information on the precise behavior of your application.
Now, what should you do?
First, the if inside the while loop is not necessary, and it must make your application pretty odd to use right now. Try doing:
#include<stdio.h>
int main(void){
printf("Please type the string:\n");
char buffer[1000];
int c;
while( (c = getchar()) != ' ' ) {
fgets(buffer, sizeof(buffer), stdin);
printf("The output string is: \n%s\n", buffer);
}
printf("A space is detected!\n");
}
That avoids asking the user for unnecessary input. There is no need to check at the end of every iteration whether the loop should terminate; the while statement is already doing that pretty damn well. :P
Now, to prevent the input from being taken out of the buffer, I would consider reading the character into the buffer's first element instead of into the c variable.
Like so:
#include<stdio.h>
int main(void){
printf("Please type the strings:\n");
char buffer[1000];
while( (buffer[0] = getchar()) != ' ' ) { // Now reads directly into buffer
if (!fgets(buffer + 1, sizeof(buffer) - 1, stdin)) // + 1 skips the char just read; - 1 keeps fgets within buffer
break; // stop on EOF or a read error
printf("The output string is: \n%s\n", buffer);
}
printf("A space is detected!\n");
}
Have a nice day!

fgets prints out 'à' instead of an actual input

I am trying to write a program in C that takes input from the user, which can only be 'Q', 'q', 'N', 'n', '1' or '2'; everything else should be invalid. I am new to C, so I still can't figure out how fgets is supposed to work. Whatever I enter as input comes out as 'à'.
char C=' ';
int N=0;
int flag=1;
char buffer[20];
char input[20];
printMenu();
printf("\n\nPlease choose something: ");
fgets(buffer, sizeof(buffer), stdin);
sscanf(buffer, "$s",&input);
//checking the input, shows à instead of an actual input
printf("Input is %c\n", input);
if(*input=='C'||*input=='c')
C=userInputChar();
else
if(*input=='N'||*input=='n')
N=userInputInt();
else
if(*input=='1')
printTriangleLeft(C,N);
else
if(*input=='2')
printTriangleRight(C,N);
else
if(input[0]=='Q'||input[0]=='q'){
printf("Exiting the program...");
return 0;
}
else
printf("Invalid input");
You're not checking the return value of sscanf, so you don't know whether it successfully parsed anything.
Your sscanf format string is $s, which doesn't extract any values. The &input argument is ignored. Besides, there is no scanf format that would take an argument of type char (*)[20] anyway.
printf %c takes an int. You're passing it a char *. This is why you're getting garbage output.
*input=='C' doesn't work either because *input is uninitialized at this point.
You should not pass a pointer to the array into sscanf; the array itself is enough, since it decays to a pointer to its first element. "$s" is also incorrect: "%s" is what you should use to read a string.
The correct call is:
sscanf(buffer, "%s", input);
I changed the following:
char buffer[20];
char input[20];
printf("\n\nPlease choose something: ");
fgets(buffer, sizeof(buffer), stdin);
sscanf(buffer, "$s",&input);
printf("Input is %c\n", input);
to
char buffer[20];
char input;
fgets(buffer, sizeof(buffer), stdin);
sscanf(buffer, "%c",&input);
printf("Input is %c\n", input);
and it seems to be working, at least for one character.
Continuing from my comment, in addition to the various syntax errors you have in your code, you are simply making this harder on yourself than it needs to be. Here, if I understand your question correctly, you want to take input and validate that it is made up of only the characters "Qqn12".
To validate that your input is entirely made up of "Qqn12", you need to loop over each character in your input buffer and check that it is one of "Qqn12". If not, the input is invalid.
Before we get there, let's talk about the proper way to validate and remove the '\n' included in the buffer filled by fgets (POSIX getline also includes the trailing '\n' in its buffer). To remove the trailing newline, you can either use strrchr to locate it, or you can just use strlen to get the length of the buffer and check that buffer[len - 1] == '\n'. If buffer[len - 1] is NOT equal to '\n', then you know characters remain unread in stdin (because the line, including its '\n', did not fit in the buffer, or the final line ended without a '\n') and you need to handle that error. For example, you can use the following:
char buf[MAXC] = "";
size_t len;
printf ("enter string: "); /* prompt, read and validate */
if (!fgets (buf, MAXC, stdin)) {
fprintf (stderr, "error: invalid input or user canceled.\n");
return 1;
}
len = strlen (buf); /* get buf length */
if (len && buf[len-1] == '\n') /* check for '\n' */
buf[--len] = 0; /* overwrite with '\0' */
else { /* input equals or exceeds buffer size, '\n' not read */
fprintf (stderr, "error: input exceeds %d chars.\n", MAXC-2);
return 1;
}
(note the test condition len && buf[len-1] == '\n': you must check that len > 0 before testing buf[len-1] == '\n'. Since len is a size_t, len - 1 wraps around to SIZE_MAX when len is 0, and reading that far out of bounds invokes Undefined Behavior.)
Next, you want to limit your input to just the chosen characters, so make a string literal containing the characters you will accept, e.g. char *accept = "Qqn12";. string.h provides the strchr function that will locate and return a pointer to the first occurrence of a given character c in a string s. The declaration is:
char *strchr(const char *s, int c);
Obviously, it does no good to look for one of the characters "Qqn12" in your buffer, because there could be other characters not "Qqn12" there too. However, if we turn the search around and ask "Is each character in buffer in accept (e.g. one of "Qqn12")", then you have exactly the test you are looking for -- and strchr does the work of scanning each char in accept for you. For example you could do the following:
...
char *accept = "Qqn12";
...
for (char *p = buf; *p; p++) /* for each char in buf */
if (!strchr (accept, *p)) { /* if not in 'accept', error */
fprintf (stderr, "error: invalid input '%c'.\n", *p);
return 1;
}
If you are more comfortable with array indexing than pointers, you can simply use array indexing to iterate over each character in buffer, e.g. the following does the exact same thing:
for (int i = 0; buf[i]; i++) /* for each char in buf */
if (!strchr (accept, buf[i])) { /* if not in 'accept', error */
fprintf (stderr, "error: invalid input '%c'.\n", buf[i]);
return 1;
}
(you can even use i < len instead of buf[i] for the exit condition if you like)
Putting all the pieces together, you can validate that an entry is made up only of the characters "Qqn12" with something like the following:
#include <stdio.h>
#include <string.h>
#define MAXC 512
int main (void) {
char buf[MAXC] = "",
*accept = "Qqn12";
size_t len;
printf ("enter string: "); /* prompt, read and validate */
if (!fgets (buf, MAXC, stdin)) {
fprintf (stderr, "error: invalid input or user canceled.\n");
return 1;
}
len = strlen (buf); /* get buf length */
if (len && buf[len-1] == '\n') /* check for '\n' */
buf[--len] = 0; /* overwrite with '\0' */
else { /* input equals or exceeds buffer size, '\n' not read */
fprintf (stderr, "error: input exceeds %d chars.\n", MAXC-2);
return 1;
}
for (char *p = buf; *p; p++) /* for each char in buf */
if (!strchr (accept, *p)) { /* if not in 'accept', error */
fprintf (stderr, "error: invalid input '%c'.\n", *p);
return 1;
}
printf ("valid input : %s\n", buf);
return 0;
}
Example Use/Output
$ ./bin/validinput
enter string: Qqn12n21Qq
valid input : Qqn12n21Qq
$ ./bin/validinput
enter string: Qqn12n21Qbq
error: invalid input 'b'.
Look things over and let me know if you have any further questions.

Splitting user input into strings of specific length

I'm writing a C program that parses user input into a char, and two strings of set length. The user input is stored into a buffer using fgets, and then parsed with sscanf. The trouble is, the three fields have a maximum length. If a string exceeds this length, the remaining characters before the next whitespace should be consumed/discarded.
#include <stdio.h>
#define IN_BUF_SIZE 256
int main(void) {
char inputStr[IN_BUF_SIZE];
char command;
char firstname[6];
char surname[6];
fgets(inputStr, IN_BUF_SIZE, stdin);
sscanf(inputStr, "%c %5s %5s", &command, firstname, surname);
printf("%c %s %s\n", command, firstname, surname);
}
So, with an input of
a bbbbbbbb cc
the desired output would be
a bbbbb cc
but is instead the output is
a bbbbb bbb
Using a format specifier "%c%*s %5s%*s %5s%*s" runs into the opposite problem, where each substring needs to exceed the set length to get to the desired outcome.
Is there a way to achieve this using format specifiers, or is the only way to save the substrings in buffers of their own before cutting them down to the desired length?
In addition to the other answers, never forget that when facing string parsing problems, you always have the option of simply walking a pointer down the string to accomplish any type of parsing you require. When you read your string into a buffer (my buf below), you have an array of characters you are free to analyze manually (either with array indexes, e.g. buffer[i], or by assigning a pointer to the beginning, e.g. char *p = buffer;). With your string, you have the following in buffer, with p pointing to the first character in buffer:
--------------------------------
|a| |b|b|b|b|b|b|b|b| |c|c|\n|0| contents
--------------------------------
 0 1 2 3 4 5 6 7 8 9 0 1 2 3  4  index
 |
 p
To test the character pointed to by p, you simply dereference the pointer, e.g. *p. So to test whether you have an initial character between a-z followed by a space at the beginning of buffer, you simply need do:
/* validate first char is 'a-z' and followed by ' ' */
if (*p && 'a' <= *p && *p <= 'z' && *(p + 1) == ' ') {
cmd = *p;
p += 2; /* advance pointer to next char following ' ' */
}
note: you are testing *p first (which is shorthand for *p != 0, or the equivalent *p != '\0') to validate that the string is not empty (e.g. the first char isn't the nul-byte) before proceeding with further tests. You would also include an else { /* handle error */ } in the event any one of the tests failed (meaning you have no command followed by a space).
When you are done, you are left with p pointing to the third character in buffer (index 2), e.g.:
--------------------------------
|a| |b|b|b|b|b|b|b|b| |c|c|\n|0| contents
--------------------------------
 0 1 2 3 4 5 6 7 8 9 0 1 2 3  4  index
     |
     p
Now your job is simple: advance by no more than 5 characters (or until the next space is encountered), assigning the characters to firstname, and then nul-terminate following the last character:
/* read up to NMLIM chars into fname */
for (n = 0; n < NMLIM && *p && *p != ' ' && *p != '\n'; p++)
fname[n++] = *p;
fname[n] = 0; /* nul terminate */
note: since fgets reads and includes the trailing '\n' in buffer, you should also test for the newline.
When you exit the loop, p is pointing to the eighth character in the buffer (index 7), as follows:
--------------------------------
|a| |b|b|b|b|b|b|b|b| |c|c|\n|0| contents
--------------------------------
 0 1 2 3 4 5 6 7 8 9 0 1 2 3  4  index
               |
               p
You now simply read forward until you encounter the next space and then advance past the space, e.g.:
/* discard remaining chars up to next ' ' */
while (*p && *p != ' ') p++;
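if (*p) p++; /* advance past the ' ' (never past the nul) */
note: if you exited the firstname loop pointing at a space, the while loop above does not execute and p is simply advanced past that space. If the string ended instead, p is left on the nul-terminator, so the next loop reads nothing.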
Finally, all you do is repeat the same loop for surname that you did for firstname. Putting all the pieces of the puzzle together, you could do something similar to the following:
#include <stdio.h>
enum { NMLIM = 5, BUFSIZE = 256 };
int main (void) {
char buf[BUFSIZE] = "";
while (fgets (buf, BUFSIZE, stdin)) {
char *p = buf, cmd, /* pointer into buf, command char */
fname[NMLIM+1] = "",
sname[NMLIM+1] = "";
size_t n = 0;
/* validate first char is 'a-z' and followed by ' ' */
if (*p && 'a' <= *p && *p <= 'z' && *(p + 1) == ' ') {
cmd = *p;
p += 2; /* advance pointer to next char following ' ' */
}
else { /* handle error */
fprintf (stderr, "error: no single command followed by space.\n");
return 1;
}
/* read up to NMLIM chars into fname */
for (n = 0; n < NMLIM && *p && *p != ' ' && *p != '\n'; p++)
fname[n++] = *p;
fname[n] = 0; /* nul terminate */
/* discard remaining chars up to next ' ' */
while (*p && *p != ' ') p++;
if (*p) p++; /* advance past the ' ' (never past the nul) */
/* read up to NMLIM chars into sname */
for (n = 0; n < NMLIM && *p && *p != ' ' && *p != '\n'; p++)
sname[n++] = *p;
sname[n] = 0; /* nul terminate */
printf ("input : %soutput : %c %s %s\n",
buf, cmd, fname, sname);
}
return 0;
}
Example Use/Output
$ echo "a bbbbbbbb cc" | ./bin/walkptr
input : a bbbbbbbb cc
output : a bbbbb cc
Look things over and let me know if you have any questions. No matter how elaborate the string or what you need from it, you can always get what you need by simply walking a pointer (or a pair of pointers) down the length of the string.
One way to split the input buffer as OP desires is to use multiple calls to sscanf(), and to use the %n conversion specifier to keep track of the number of characters read. In the code below, the input string is scanned in three stages.
First, the pointer strPos is assigned to point to the first character of inputStr. Then the input string is scanned with " %c%n%*[^ ]%n". This format string skips over any whitespace that a user might enter before the first character, and stores the first character in command. The %n directive tells sscanf() to store the number of characters read so far in the variable n; then the %*[^ ] directive tells sscanf() to read and discard any characters until a space character is encountered. This effectively skips over any remaining characters that were entered after the initial command character. The %n directive appears again, and overwrites the previous value with the number of characters read up to this point. The reason for using %n twice is that, if the user enters a character followed by a space (as expected), the %*[^ ] directive will find no matches, and sscanf() will return without ever reaching the final %n directive.
The pointer strPos is moved to the beginning of the remaining string by adding n to it, and sscanf() is called a second time, this time with "%5s%n%*[^ ]%n". Here, up to 5 characters are read into the character array firstname[], the number of characters read is saved by the %n directive, any remaining characters up to the next space are read and ignored, and finally, if the scan made it this far, the number of characters read is saved again.
strPos is increased by n again, and the final scan only needs a single string conversion, "%5s", to complete the task (the width keeps surname[6] from overflowing).
Note that the return value of fgets() is checked to be sure that it was successful. The call to fgets() was changed slightly to:
fgets(inputStr, sizeof inputStr, stdin)
The sizeof operator is used here instead of IN_BUF_SIZE. This way, if the declaration of inputStr is changed later, this line of code will still be correct. Note that the sizeof operator works here because inputStr is an array, and an array that is the operand of sizeof does not decay to a pointer. But if inputStr were passed into a function, sizeof could not be used in this way inside the function: the array argument decays to a pointer to its first element, so sizeof would yield the size of a pointer, not the size of the array. Some, like @DavidC.Rankin, prefer constants as OP has used. If this seems confusing, I would suggest sticking with the constant IN_BUF_SIZE.
Also note that the return values for each of the calls to sscanf() are checked to be certain that the input matches expectations. For example, if the user enters a command and a first name, but no surname, the program will print an error message and exit. It is worth pointing out that, if the user enters, say, a command character and first name only, after the second sscanf() the match may have failed on \n, and strPos is then incremented to point to the \0 and so is still in bounds. But this relies on the newline being in the string. With no newline, the match might fail on \0, and then strPos would be incremented out of bounds before the next call to sscanf(). Fortunately, fgets() retains the newline, unless the input line is too long to fit in the buffer. Then there is no \n, only the \0 terminator. A more robust program would check the input string for \n, and add one if needed. It would not hurt to increase IN_BUF_SIZE.
#include <stdio.h>
#include <stdlib.h>
#define IN_BUF_SIZE 256
int main(void)
{
char inputStr[IN_BUF_SIZE];
char command;
char firstname[6];
char surname[6];
char *strPos = inputStr; // next scan location
int n = 0; // holds number of characters read
if (fgets(inputStr, sizeof inputStr, stdin) == NULL) {
fprintf(stderr, "Error in fgets()\n");
exit(EXIT_FAILURE);
}
if (sscanf(strPos, " %c%n%*[^ ]%n", &command, &n, &n) < 1) {
fprintf(stderr, "Input formatting error: command\n");
exit(EXIT_FAILURE);
}
strPos += n;
if (sscanf(strPos, "%5s%n%*[^ ]%n", firstname, &n, &n) < 1) {
fprintf(stderr, "Input formatting error: firstname\n");
exit(EXIT_FAILURE);
}
strPos += n;
if (sscanf(strPos, "%5s", surname) < 1) {
fprintf(stderr, "Input formatting error: surname\n");
exit(EXIT_FAILURE);
}
printf("%c %s %s\n", command, firstname, surname);
}
Sample interaction:
a Zaphod Beeblebrox
a Zapho Beebl
The scanf() family of functions has a reputation for being subtle and error-prone; the format strings used above may seem a little tricky. By writing a function to skip to the next word in the input string, the calls to sscanf() can be simplified. In the code below, skipToNext() takes a pointer to a string as input; if the first character of the string is a \0 terminator, the pointer is returned unchanged. Otherwise, all initial non-whitespace characters are skipped over, then any whitespace characters are skipped, up to the next non-whitespace character (which may be a \0). A pointer to this non-whitespace character is returned.
The resulting program is a little bit longer than the previous program, but it may be easier to understand, and it certainly has simpler format strings. This program does differ from the first in that it no longer accepts leading whitespace in the string. If the user enters whitespace before the command character, this is considered erroneous input.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#define IN_BUF_SIZE 256
char * skipToNext(char *);
int main(void)
{
char inputStr[IN_BUF_SIZE];
char command;
char firstname[6];
char surname[6];
char *strPos = inputStr; // next scan location
if (fgets(inputStr, sizeof inputStr, stdin) == NULL) {
fprintf(stderr, "Error in fgets()\n");
exit(EXIT_FAILURE);
}
if (sscanf(strPos, "%c", &command) != 1 || isspace((unsigned char)command)) {
fprintf(stderr, "Input formatting error: command\n");
exit(EXIT_FAILURE);
}
strPos = skipToNext(strPos);
if (sscanf(strPos, "%5s", firstname) != 1) {
fprintf(stderr, "Input formatting error: firstname\n");
exit(EXIT_FAILURE);
}
strPos = skipToNext(strPos);
if (sscanf(strPos, "%5s", surname) != 1) {
fprintf(stderr, "Input formatting error: surname\n");
exit(EXIT_FAILURE);
}
printf("%c %s %s\n", command, firstname, surname);
}
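char * skipToNext(char *c)
{
/* skip the rest of the current word; stop at the nul terminator,
* since isspace('\0') is false and would otherwise run off the string */
while (*c != '\0' && !isspace((unsigned char)*c)) {
++c;
}
/* skip whitespace up to the next word (or the terminator) */
while (isspace((unsigned char)*c)) {
++c;
}
return c;
}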

Reading multiple lines with different data types in C

I have a very strange problem: I'm trying to read a .txt file with C, and the data is structured like this:
%s
%s
%d %d
Since I have to read the strings all the way to \n I'm reading it like this:
while(!feof(file)){
fgets(s[i].title,MAX_TITLE,file);
fgets(s[i].artist,MAX_ARTIST,file);
char a[10];
fgets(a,10,file);
sscanf(a,"%d %d",&s[i].time.min,&s[i++].time.sec);
}
However, the very first integer I read in s.time.min shows a random big number.
I'm using the sscanf right now since a few people had a similar issue, but it doesn't help.
Thanks!
EDIT: The integers represent time, they will never exceed 5 characters combined, including the white space between.
Note, I take your post to be reading values from 3 different lines, e.g.:
%s
%s
%d %d
(primarily evidenced by your use of fgets, a line-oriented input function, which reads a line of input (up to and including the '\n') each time it is called.) If that is not the case, then the following does not apply (and can be greatly simplified).
Since you are reading multiple values into a single element in an array of struct, you may find it better (and more robust) to read and validate each value using temporary variables before you start copying information into your structure members. This allows you to (1) validate the read of all values, and (2) validate the parse, or conversion, of all required values before storing members in your struct and incrementing your array index.
Additionally, you will need to remove the trailing '\n' from both title and artist to prevent having embedded newlines dangling off the end of your strings (which will cause havoc with searching for either a title or artist). For instance, putting it all together, you could do something like:
void rmlf (char *s);
....
char title[MAX_TITLE] = "";
char artist[MAX_ARTIST] = "";
char a[10] = "";
int min, sec;
...
while (fgets (title, MAX_TITLE, file) && /* validate read of values */
fgets (artist, MAX_ARTIST, file) &&
fgets (a, 10, file)) {
if (sscanf (a, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
rmlf (title); /* remove trailing newline */
rmlf (artist);
s[i].time.min = min; /* copy to struct members & increment index */
s[i].time.sec = sec;
strncpy (s[i].title, title, MAX_TITLE);
strncpy (s[i++].artist, artist, MAX_ARTIST);
}
/** remove trailing newline from 's'. */
void rmlf (char *s)
{
if (!s || !*s) return;
for (; *s && *s != '\n'; s++) {}
*s = 0;
}
(note: this will also read all values until an EOF is encountered without using feof (see Related link: Why is “while ( !feof (file) )” always wrong?))
Protecting Against a Short-Read with fgets
Following on from Jonathan's comment, when using fgets you should really check to ensure you have actually read the entire line, and have not experienced a short read, where the maximum number of characters you supply is not sufficient to read the entire line (i.e. characters in that line remain unread).
If a short read occurs, that will completely destroy your ability to read any further lines from the file, unless you handle the failure correctly. This is because the next attempt to read will NOT start reading on the line you think it is reading and instead attempt to read the remaining characters of the line where the short read occurred.
You can validate a read by fgets by checking that the last character read into your buffer is in fact a '\n' character. (If the line is longer than the maximum you specify, the last character before the nul-terminating character will be an ordinary character instead.) If a short read is encountered, you must then read and discard the remaining characters in the long line before continuing with your next read (unless you are using a dynamically allocated buffer, in which case you can simply realloc as required to read the remainder of the line).
Your situation complicates the validation by requiring data from 3 lines from the input file for each struct element. You must always keep your 3-line read in sync, reading all 3 lines as a group during each iteration of your read loop (even if a short read occurs). That means you must validate that all 3 lines were read and that no short read occurred in order to handle any one short read without exiting your input loop. (You can validate each individually if you just want to terminate input on any one short read, but that leads to a very inflexible input routine.)
You can tweak the rmlf function above to a function that validates each read by fgets in addition to removing the trailing newline from the input. I have done that below in a function called, surprisingly, shortread. The tweaks to the original function and read loop could be coded something like this:
int shortread (char *s, FILE *fp);
...
for (idx = 0; idx < MAX_SONGS;) {
int t, a, b;
t = a = b = 0;
/* validate fgets read of complete line */
if (!fgets (title, MAX_TITLE, fp)) break;
t = shortread (title, fp);
if (!fgets (artist, MAX_ARTIST, fp)) break;
a = shortread (artist, fp);
if (!fgets (buf, MAX_MINSEC, fp)) break;
b = shortread (buf, fp);
if (t || a || b) continue; /* if any shortread, skip */
if (sscanf (buf, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
s[idx].time.min = min; /* copy to struct members & increment index */
s[idx].time.sec = sec;
strncpy (s[idx].title, title, MAX_TITLE);
strncpy (s[idx].artist, artist, MAX_ARTIST);
idx++;
}
...
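/** validate complete line read, remove trailing newline from 's'.
* returns 1 on shortread, 0 - valid read, -1 invalid/empty string.
* if shortread, read/discard remainder of long line.
*/
int shortread (char *s, FILE *fp)
{
if (!s || !*s) return -1;
for (; *s && *s != '\n'; s++) {}
if (*s != '\n') { /* no '\n' in the buffer */
int c, extra = 0;
/* read/discard the rest of the line, counting discarded chars */
while ((c = fgetc (fp)) != '\n' && c != EOF) extra++;
/* nothing discarded: the final line just lacked a '\n', or the
* line exactly filled the buffer, so the read was complete */
return extra ? 1 : 0;
}
*s = 0;
return 0;
}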
(note: in the example above, the result of the shortread check is saved for each of the three lines that make up a title, artist, time group, and the whole group is skipped if any one of them was a short read.)
To validate the approach I put together a short example that will help put it all in context. Look over the example and let me know if you have any further questions.
#include <stdio.h>
#include <string.h>
/* constant definitions */
enum { MAX_MINSEC = 10, MAX_ARTIST = 32, MAX_TITLE = 48, MAX_SONGS = 64 };
typedef struct {
int min;
int sec;
} stime;
typedef struct {
char title[MAX_TITLE];
char artist[MAX_ARTIST];
stime time;
} songs;
int shortread (char *s, FILE *fp);
int main (int argc, char **argv) {
char title[MAX_TITLE] = "";
char artist[MAX_ARTIST] = "";
char buf[MAX_MINSEC] = "";
int i, idx, min, sec;
songs s[MAX_SONGS] = {{ .title = "", .artist = "" }};
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
for (idx = 0; idx < MAX_SONGS;) {
int t, a, b;
t = a = b = 0;
/* validate fgets read of complete line */
if (!fgets (title, MAX_TITLE, fp)) break;
t = shortread (title, fp);
if (!fgets (artist, MAX_ARTIST, fp)) break;
a = shortread (artist, fp);
if (!fgets (buf, MAX_MINSEC, fp)) break;
b = shortread (buf, fp);
if (t || a || b) continue; /* if any shortread, skip */
if (sscanf (buf, "%d %d", &min, &sec) != 2) { /* validate conversion */
fprintf (stderr, "error: failed to parse 'min' 'sec'.\n");
continue; /* skip line - tailor to your needs */
}
s[idx].time.min = min; /* copy to struct members & increment index */
s[idx].time.sec = sec;
strncpy (s[idx].title, title, MAX_TITLE);
strncpy (s[idx].artist, artist, MAX_ARTIST);
idx++;
}
if (fp != stdin) fclose (fp); /* close file if not stdin */
for (i = 0; i < idx; i++)
printf (" %2d:%02d %-32s %s\n", s[i].time.min, s[i].time.sec,
s[i].artist, s[i].title);
return 0;
}
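/** validate complete line read, remove trailing newline from 's'.
* returns 1 on shortread, 0 - valid read, -1 invalid/empty string.
* if shortread, read/discard remainder of long line.
*/
int shortread (char *s, FILE *fp)
{
if (!s || !*s) return -1;
for (; *s && *s != '\n'; s++) {}
if (*s != '\n') { /* no '\n' in the buffer */
int c, extra = 0;
/* read/discard the rest of the line, counting discarded chars */
while ((c = fgetc (fp)) != '\n' && c != EOF) extra++;
/* nothing discarded: the final line just lacked a '\n', or the
* line exactly filled the buffer, so the read was complete */
return extra ? 1 : 0;
}
*s = 0;
return 0;
}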
Example Input
$ cat ../dat/titleartist.txt
First Title I Like
First Artist I Like
3 40
Second Title That Is Way Way Too Long To Fit In MAX_TITLE Characters
Second Artist is Fine
12 43
Third Title is Fine
Third Artist is Way Way Too Long To Fit in MAX_ARTIST
3 23
Fourth Title is Good
Fourth Artist is Good
32274 558212 (too long for MAX_MINSEC)
Fifth Title is Good
Fifth Artist is Good
4 27
Example Use/Output
$ ./bin/titleartist <../dat/titleartist.txt
3:40 First Artist I Like First Title I Like
4:27 Fifth Artist is Good Fifth Title is Good
Instead of sscanf(), I would use strtok() and atoi().
Just curious, why only 10 bytes for the two integers? Are you sure they are always that small?
By the way, I apologize for such a short answer. I'm sure there is a way to get sscanf() to work for you, but in my experience sscanf() can be rather finicky so I'm not a big fan. When parsing input with C, I have just found it a lot more efficient (in terms of how long it takes to write and debug the code) to just tokenize the input with strtok() and convert each piece individually with the various conversion functions (atoi, atof, atol, strtol, strtod, etc.; see stdlib.h). It keeps things simpler, because each piece of input is handled individually, which makes debugging any problems (should they arise) much easier. In the end I typically spend a lot less time getting such code to work reliably than I did when I used to try to use sscanf().
Use "%*s %*s %d %d" as your format string, instead...
You seem to be expecting sscanf to automagically skip the two tokens leading up to the decimal digit fields. It doesn't do that unless you explicitly tell it to (hence the pair of %*s).
You can't expect the people who designed C to have designed it the same way as you would. You NEED to check the return value, as iharob said.
That's not all. You NEED to read (and understand relatively well) the entire scanf manual (the one written by the Open Group is okay). That way you know how to use the function (including all of the subtle nuances of format strings) and what to do with the return value.
As a programmer, you need to read. Remember that well.

Resources