Splitting user input into strings of specific length - c

I'm writing a C program that parses user input into a char, and two strings of set length. The user input is stored into a buffer using fgets, and then parsed with sscanf. The trouble is, the three fields have a maximum length. If a string exceeds this length, the remaining characters before the next whitespace should be consumed/discarded.
#include <stdio.h>
#define IN_BUF_SIZE 256
int main(void) {
char inputStr[IN_BUF_SIZE];
char command;
char firstname[6];
char surname[6];
fgets(inputStr, IN_BUF_SIZE, stdin);
sscanf(inputStr, "%c %5s %5s", &command, firstname, surname);
printf("%c %s %s\n", command, firstname, surname);
}
So, with an input of
a bbbbbbbb cc
the desired output would be
a bbbbb cc
but instead the output is
a bbbbb bbb
Using a format specifier "%c%*s %5s%*s %5s%*s" runs into the opposite problem, where each substring needs to exceed the set length to get to the desired outcome.
Is there a way to achieve this by using format specifiers, or is the only way saving the substrings in buffers of their own before cutting them down to the desired length?

In addition to the other answers, never forget that when facing a string-parsing problem you always have the option of simply walking a pointer down the string to accomplish any type of parsing you require. When you read your string into a buffer (buf below), you have an array of characters you are free to analyze manually, either with array indexes, e.g. buffer[i], or by assigning a pointer to the beginning, e.g. char *p = buffer;. With your string, buffer holds the following, with p pointing to the first character:
--------------------------------
|a| |b|b|b|b|b|b|b|b| |c|c|\n|0|   contents
--------------------------------
 0 1 2 3 4 5 6 7 8 9 0 1 2 3  4    index
 |
 p
To test the character pointed to by p, you simply dereference the pointer, e.g. *p. So to test whether buffer begins with a character in the range 'a'-'z' followed by a space, you simply need to do:
/* validate first char is 'a-z' and followed by ' ' */
if (*p && 'a' <= *p && *p <= 'z' && *(p + 1) == ' ') {
cmd = *p;
p += 2; /* advance pointer to next char following ' ' */
}
note: you test *p first (shorthand for *p != 0, or equivalently *p != '\0') to validate that the string is not empty (i.e. the first char isn't the nul-byte) before proceeding with further tests. You would also include an else { /* handle error */ } in case any one of the tests fails (meaning you have no command followed by a space).
When you are done, you are left with p pointing to the third character in buffer (index 2), e.g.:
--------------------------------
|a| |b|b|b|b|b|b|b|b| |c|c|\n|0|   contents
--------------------------------
 0 1 2 3 4 5 6 7 8 9 0 1 2 3  4    index
     |
     p
Now your job is simple: advance by no more than 5 characters (or until the next space is encountered), assigning the characters to firstname, and then nul-terminate after the last character:
/* read up to NMLIM chars into fname */
for (n = 0; n < NMLIM && *p && *p != ' ' && *p != '\n'; p++)
fname[n++] = *p;
fname[n] = 0; /* nul terminate */
note: since fgets reads and includes the trailing '\n' in buffer, you should also test for the newline.
When you exit the loop, p is pointing at index 7 (the sixth 'b') in buffer, as follows:
--------------------------------
|a| |b|b|b|b|b|b|b|b| |c|c|\n|0|   contents
--------------------------------
 0 1 2 3 4 5 6 7 8 9 0 1 2 3  4    index
               |
               p
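You now simply read forward until you encounter the next space (or the end of the line) and then advance past the space, e.g.:
/* discard remaining chars up to the next ' ' (or end of line) */
while (*p && *p != ' ' && *p != '\n') p++;
if (*p == ' ') p++; /* step past the space, never past the nul-byte */
note: if you exited the firstname loop already pointing at a space, the body of the while loop never executes and only the space is skipped. Stopping at '\n' and stepping past a space only (rather than an unconditional p++) keeps p from moving beyond the terminating nul-byte when the line has no surname.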
Finally, all you do is repeat the same loop for surname that you did for firstname. Putting all the pieces of the puzzle together, you could do something similar to the following:
#include <stdio.h>
enum { NMLIM = 5, BUFSIZE = 256 };
int main (void) {
char buf[BUFSIZE] = "";
while (fgets (buf, BUFSIZE, stdin)) {
char *p = buf, cmd, /* read pointer & command char */
fname[NMLIM+1] = "",
sname[NMLIM+1] = "";
size_t n = 0;
/* validate first char is 'a-z' and followed by ' ' */
if (*p && 'a' <= *p && *p <= 'z' && *(p + 1) == ' ') {
cmd = *p;
p += 2; /* advance pointer to next char following ' ' */
}
else { /* handle error */
fprintf (stderr, "error: no single command followed by space.\n");
return 1;
}
/* read up to NMLIM chars into fname */
for (n = 0; n < NMLIM && *p && *p != ' ' && *p != '\n'; p++)
fname[n++] = *p;
fname[n] = 0; /* nul terminate */
/* discard remaining chars up to the next ' ' (or end of line) */
while (*p && *p != ' ' && *p != '\n') p++;
if (*p == ' ') p++; /* step past the space, never past the nul-byte */
/* read up to NMLIM chars into sname */
for (n = 0; n < NMLIM && *p && *p != ' ' && *p != '\n'; p++)
sname[n++] = *p;
sname[n] = 0; /* nul terminate */
printf ("input : %soutput : %c %s %s\n",
buf, cmd, fname, sname);
}
return 0;
}
Example Use/Output
$ echo "a bbbbbbbb cc" | ./bin/walkptr
input : a bbbbbbbb cc
output : a bbbbb cc
Look things over and let me know if you have any questions. No matter how elaborate the string or what you need from it, you can always get what you need by simply walking a pointer (or a pair of pointers) down the length of the string.

One way to split the input buffer as OP desires is to use multiple calls to sscanf(), and to use the %n conversion specifier to keep track of the number of characters read. In the code below, the input string is scanned in three stages.
First, the pointer strPos is set to point to the first character of inputStr. Then the input string is scanned with " %c%n%*[^ ]%n". This format string skips over any leading whitespace the user might enter before the first character, and stores the first non-whitespace character in command. The %n directive tells sscanf() to store the number of characters read so far in the variable n; then the %*[^ ] directive tells sscanf() to read and discard characters up to the next space (' '). This effectively skips any remaining characters entered after the initial command character. The %n directive then appears again and overwrites the previous value with the number of characters read up to this point. The reason for using %n twice is that if the user enters a single character followed by a space (as expected), the %*[^ ] directive matches nothing, which is a matching failure, so sscanf() returns without ever reaching the final %n directive; the first %n has already recorded the correct count. (Executing %n does not increment the count that sscanf() returns.)
The pointer strPos is moved to the beginning of the remaining string by adding n to it, and sscanf() is called a second time, this time with "%5s%n%*[^ ]%n". Here, up to 5 characters are read into the character array firstname[] (%5s first skips any leading whitespace), the number of characters read is saved by the %n directive, any remaining characters up to the next space are read and discarded, and finally, if the scan got that far, the number of characters read is saved again.
strPos is increased by n again, and the final scan only needs "%5s" to complete the task.
Note that the return value of fgets() is checked to be sure that it was successful. The call to fgets() was changed slightly to:
fgets(inputStr, sizeof inputStr, stdin)
The sizeof operator is used here instead of IN_BUF_SIZE. This way, if the declaration of inputStr is changed later, this line of code will still be correct. Note that sizeof works here because inputStr is an array, and an array does not decay to a pointer when it is the operand of sizeof. But if inputStr were passed into a function, sizeof could not be used this way inside the function, because arrays decay to pointers in most expressions, including when they are passed as function arguments. Some people, such as @DavidC.Rankin, prefer constants as OP has used. If this seems confusing, I would suggest sticking with the constant IN_BUF_SIZE.
Also note that the return value of each call to sscanf() is checked to be certain that the input matches expectations. For example, if the user enters a command and a first name but no surname, the program prints an error message and exits. It is worth pointing out that if the user enters, say, a command character and first name only, then after the second sscanf() the match may have failed on \n, and strPos is then incremented to point to the \0, so it is still in bounds. But this relies on the newline being in the string. With no newline, the match might fail on \0, and then strPos would be incremented out of bounds before the next call to sscanf(). Fortunately, fgets() retains the newline unless the input line is larger than the specified size of the buffer; then there is no \n, only the \0 terminator. A more robust program would check the input string for \n and add one if needed. It would not hurt to increase the size of IN_BUF_SIZE.
#include <stdio.h>
#include <stdlib.h>
#define IN_BUF_SIZE 256
int main(void)
{
char inputStr[IN_BUF_SIZE];
char command;
char firstname[6];
char surname[6];
char *strPos = inputStr; // next scan location
int n = 0; // holds number of characters read
if (fgets(inputStr, sizeof inputStr, stdin) == NULL) {
fprintf(stderr, "Error in fgets()\n");
exit(EXIT_FAILURE);
}
if (sscanf(strPos, " %c%n%*[^ ]%n", &command, &n, &n) < 1) {
fprintf(stderr, "Input formatting error: command\n");
exit(EXIT_FAILURE);
}
strPos += n;
if (sscanf(strPos, "%5s%n%*[^ ]%n", firstname, &n, &n) < 1) {
fprintf(stderr, "Input formatting error: firstname\n");
exit(EXIT_FAILURE);
}
strPos += n;
if (sscanf(strPos, "%5s", surname) < 1) {
fprintf(stderr, "Input formatting error: surname\n");
exit(EXIT_FAILURE);
}
printf("%c %s %s\n", command, firstname, surname);
}
Sample interaction:
a Zaphod Beeblebrox
a Zapho Beebl
The scanf() family of functions has a reputation for being subtle and error-prone, and the format strings used above may seem a little tricky. By writing a function to skip to the next word in the input string, the calls to sscanf() can be simplified. In the code below, skipToNext() takes a pointer to a string as input; if the first character of the string is the \0 terminator, the pointer is returned unchanged. Otherwise, all initial non-whitespace characters are skipped over, then any whitespace characters are skipped, up to the next non-whitespace character (which may be the \0). A pointer to this character is returned.
The resulting program is a little bit longer than the previous program, but it may be easier to understand, and it certainly has simpler format strings. This program does differ from the first in that it no longer accepts leading whitespace in the string. If the user enters whitespace before the command character, this is considered erroneous input.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#define IN_BUF_SIZE 256
char * skipToNext(char *);
int main(void)
{
char inputStr[IN_BUF_SIZE];
char command;
char firstname[6];
char surname[6];
char *strPos = inputStr; // next scan location
if (fgets(inputStr, sizeof inputStr, stdin) == NULL) {
fprintf(stderr, "Error in fgets()\n");
exit(EXIT_FAILURE);
}
if (sscanf(strPos, "%c", &command) != 1 || isspace((unsigned char)command)) {
fprintf(stderr, "Input formatting error: command\n");
exit(EXIT_FAILURE);
}
strPos = skipToNext(strPos);
if (sscanf(strPos, "%5s", firstname) != 1) {
fprintf(stderr, "Input formatting error: firstname\n");
exit(EXIT_FAILURE);
}
strPos = skipToNext(strPos);
if (sscanf(strPos, "%5s", surname) != 1) {
fprintf(stderr, "Input formatting error: surname\n");
exit(EXIT_FAILURE);
}
printf("%c %s %s\n", command, firstname, surname);
}
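char * skipToNext(char *c)
{
/* skip the rest of the current word, stopping at the '\0' terminator
so the pointer can never run past the end of the string */
while (*c != '\0' && !isspace((unsigned char)*c)) {
++c;
}
/* skip whitespace up to the next word (or the '\0' terminator) */
while (isspace((unsigned char)*c)) {
++c;
}
return c;
}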

Related

Read lines from an input file of varying line sizes

Currently, I am using getline to read lines from a file and I can access individual characters the following way from stdin:
char buffer[1024];
while((lineSize = getline(&line, &len, stdin)) != -1) {
if (line[0] != 84) {
// ...
continue; // continue to next line in file
}
if (line[0] == 84){ // (the 'T' character)
printf("TEST: Line Value: %s\n", line);
buffer[0] = line[1]; // this is a single digit number in char form
buffer[1] = '\0';
// send buffer somewhere
send(clientSocket, buffer, strlen(buffer), 0);
// ...
}
A sample file is as follows:
T3
T9
S0
S4
T55
T6
However, as you can see, I run into issues when a number > 9 is given, such as the T55 line here. I can only grab the first digit with this method, so I may have to completely redo the way I read the file. Is there a better and simpler way to read through an input file, check the first character, and convert the remaining character(s) up to the end of the line into an int? (The max the integer can be is 100, btw.)
Continuing from my comments, you can use fgets() to read the line and then use sscanf() with the format string " %c%d%n" to extract the first character, convert the next set of digits to an int, and finally obtain the total number of characters consumed by sscanf() in that conversion using the "%n" specifier. You validate that both the character and integer conversions took place and that the first non-whitespace character read was 'T'. You can then use mychar and myint as desired, and use mylen as the length to pass to send().
(note: you can scan forward in line to determine whether any whitespace was included at the beginning and exclude it from your call to send() -- that is left to you)
Putting it all together, you can do something like:
char line[1024],
mychar = 0;
int myint = 0,
mylen;
/* read using fgets */
while (fgets (line, sizeof line, stdin)) {
/* parse with sscanf, validate both conversions and 'T' as 1st char,
* use "%n" to get number of chars through int conversion
*/
if (sscanf (line, " %c%d%n", &mychar, &myint, &mylen) != 2 ||
mychar != 'T') {
fputs ("error: invalid format.\n", stderr);
continue;
}
send (clientSocket, line, mylen, 0); /* send mylen chars */
}
To be more specific, I will need to see your Minimal Complete Reproducible Example to ensure there is nothing outside what you have posted that will impact the code above.
Adding Example
Here is a short example showing the result of parsing both expected and unexpected input with the code above. It also adds the forward scan that removes leading whitespace from the line. The program reads input from stdin and writes the lines matching 'T'(int) to stdout:
#include <stdio.h>
#include <unistd.h>
#include <ctype.h>
int main (void) {
char line[1024],
mychar = 0,
nl = '\n';
int myint = 0,
mylen;
/* read using fgets */
while (fgets (line, sizeof line, stdin)) {
char *p = line;
/* parse with sscanf, validate both conversions and 'T' as 1st char,
* use "%n" to get number of chars through int conversion
*/
if (sscanf (line, " %c%d%n", &mychar, &myint, &mylen) != 2 ||
mychar != 'T') {
fprintf (stderr, "error: invalid format: %s", line);
continue;
}
while (isspace ((unsigned char)*p)) { /* remove leading whitespace */
p += 1;
mylen -= 1;
}
// send (clientSocket, p, mylen, 0); /* send mylen chars */
write (STDOUT_FILENO, p, mylen);
write (STDOUT_FILENO, &nl, 1);
}
}
(note: write (STDOUT_FILENO, &nl, 1); is simply included above to output a newline after each output -- it would not be part of what you send() over your socket -- unless the receiving program is using the '\n' as the line termination character)
Example Input File:
$ cat dat/charint.txt
T4
T44
T444
TT
T3 and more gibberish
P55
(note: leading whitespace and trailing characters included in last two lines beginning with 'T', including the invalid line format " TT")
Example Use/Output
$ ./bin/charandintsend < dat/charint.txt
T4
T44
T444
error: invalid format: TT
T3
error: invalid format: P55
Let me know if you have questions, or if I misunderstood some aspect of your question.

C: The first element of a char cannot be detected

I am learning how to get input from the keyboard. I want the user to create one or more strings from the input, each string being a line, and the program should not terminate until a specified char is pressed. These strings are then stored in the buffer.
However, when I print out the buffer, the first few elements of the string are always missing. Here is my code:
#include<stdio.h>
int main(void){
printf("Please type the string:\n");
char buffer[1000];
int c;
while( (c = getchar()) != ' ' ) {
fgets(buffer, sizeof(buffer), stdin);
printf("The output string is: \n%s\n", buffer);
if((c = getchar())== ' '){
printf("A space is detected!\n");
break;
}
}
}
The output is:
Please type the string:
abcdefg
The output string is:
bcdefg
hijklmn
The output string is:
jklmn
opqrst
The output string is:
qrst
A space is detected!
Program ended with exit code: 0
Which part did I go wrong? Any hints are very much appreciated.
The problem you are having is that both getchar() and fgets() in your code are reading from stdin. Since you call getchar() first in your test, it consumes the first character of your string; when you call it again, another character disappears...
You don't need getchar() at all to end your loop. All you care about for breaking your loop as you have explained is whether the user enters a space as the first character. fgets does not skip leading whitespace, so any leading space entered by the user will be captured at the beginning of buffer. So to satisfy your loop-exit condition, all you need to do is check if the first character of buffer is a space.
How? The simple way is to just dereference buffer, e.g. *buffer returns the first character in buffer. Why? In pointer notation, buffer + 0 is the offset you want in buffer, so to get the character at that location you dereference it, e.g. *(buffer + 0), which of course is just *buffer, the equivalent of buffer[0].
So, putting it altogether, and getting rid of getchar(), and adding strlen to properly validate that the string fit in buffer and to get the location for the trailing '\n' read and included in buffer by fgets (which leaves you with the length of trimmed string as a benefit), you could do something similar to:
#include <stdio.h>
#include <string.h>
#define MAXC 1000 /* if you need a constant, define one (or more) */
int main (void) {
char buffer[MAXC] = ""; /* initialize strings zero (good practice) */
for (;;) { /* loop continually taking input */
size_t len; /* variable for buffer length */
printf ("\nenter string: "); /* prompt */
if (!fgets (buffer, sizeof buffer, stdin)) /* read input */
break; /* exit if user cancels input */
len = strlen (buffer); /* get length */
if (len && buffer[len-1] == '\n') /* check if last char is \n */
buffer[--len] = 0; /* overwrite with nul-char */
else { /* otherwise string too long */
fputs ("error: string too long.\n", stderr);
return 1;
}
if (*buffer == ' ') /* check if 1st char of buffer is ' ' */
break;
printf ("buffer: %s (%zu chars)\n", buffer, len); /* output */
}
}
Example Use/Output
$ ./bin/fgetsspace
enter string: my dog has fleas
buffer: my dog has fleas (16 chars)
enter string: my cat has none
buffer: my cat has none (15 chars)
enter string: bye
(note: a space was entered before bye above, e.g. " bye")
Look things over and let me know if you have further questions.
Separating Words with strtok
To separate each line into individual words you can use strtok. The first argument is the buffer (for the 1st call); the second is a string listing the characters to use as delimiters between the words (e.g. to split on spaces include a space, to keep the '.' at the end of a sentence out of the last word include '.' as well, and include the '\n'). After the 1st call to strtok, all subsequent calls to get the remaining words pass NULL in place of buffer, e.g.
#include <stdio.h>
#include <string.h>
#define MAXC 1000 /* if you need a constant, define one (or more) */
int main (void) {
char buffer[MAXC] = ""; /* initialize strings zero (good practice) */
for (;;) { /* loop continually taking input */
size_t len; /* variable for buffer length */
char *delim = " .\n", /* delimiters for strtok */
*p = buffer; /* pointer to buffer for strtok */
printf ("\nenter string: "); /* prompt */
if (!fgets (buffer, sizeof buffer, stdin)) /* read input */
break; /* exit if user cancels input */
len = strlen (buffer); /* get length */
if (len && buffer[len-1] == '\n') /* check if last char is \n */
buffer[--len] = 0; /* overwrite with nul-char */
else { /* otherwise string too long */
fputs ("error: string too long.\n", stderr);
return 1;
}
if (*buffer == ' ') /* check if 1st char of buffer is ' ' */
break;
printf ("buffer: %s (%zu chars)\n", buffer, len); /* output */
p = strtok (buffer, delim); /* 1st call to strtok uses buffer */
while (p != NULL) {
printf (" %s\n", p);
p = strtok (NULL, delim); /* subsequent calls use NULL */
}
}
}
(note: the original buffer is modified, so make a copy if you need to preserve the original)
Example Use/Output
$ ./bin/fgetsspace
enter string: my dog has fleas
buffer: my dog has fleas (16 chars)
my
dog
has
fleas
enter string: my cat has none
buffer: my cat has none (15 chars)
my
cat
has
none
enter string: bye
getchar swallows up a character. Your first iteration gets one character swallowed up by the initial call in the while, and then successive iterations get two characters swallowed up, one by the getchar you use to detect a space and then again the one in the while.
Answering in addition to my initial comment and the issue:
First, quoting myself:
I believe that when using getchar(), you effectively remove the character from the stdin buffer.
As others have stated since then, the problem is that your calls to getchar consume an input character, effectively removing it from the stdin buffer.
See Jim Buck's answer for detailed information on the precise behavior of your application.
Now, what should you do ?
First, the if inside the while loop is not necessary, and using your application right now must be pretty odd. Try doing :
#include<stdio.h>
int main(void){
printf("Please type the string:\n");
char buffer[1000];
int c;
while( (c = getchar()) != ' ' ) {
fgets(buffer, sizeof(buffer), stdin);
printf("The output string is: \n%s\n", buffer);
}
printf("A space is detected!\n");
}
Use this instead, to avoid asking the user for unnecessary extra input. Your while condition already decides when the loop stops, so there is no need to also check at the end of every iteration whether the loop should terminate; the while statement is already doing that pretty damn well. :P
Now, to prevent the input from being taken out of the buffer, I would consider reading into the buffer's first element instead of the "c" variable.
Like so :
#include<stdio.h>
int main(void){
printf("Please type the strings:\n");
char buffer[1000];
while( (buffer[0] = getchar()) != ' ' ) { // Now reads directly into buffer
// buffer + 1 keeps the char we just read; sizeof(buffer) - 1 is the
// space left after it, so fgets cannot write past the end of buffer
if (fgets(buffer + 1, sizeof(buffer) - 1, stdin) == NULL)
break; // stop at end of input instead of looping forever
printf("The output string is: \n%s\n", buffer);
}
printf("A space is detected!\n");
}
Have a nice day!

sscanf doesnt seem to capture the correct parts of my strings

I've tried to run this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char a[1000];
void eliminatesp() {
char buff1[1000], buff2[1000];
LOOP: sscanf(a,"%s %s",buff1,buff2);
sprintf(a,"%s%s", buff1, buff2);
for(int i=0; i<strlen(a); ++i) {
if(a[i]==' ') goto LOOP;
}
}
void eliminateline() {
char buff1[1000]; char buff2[1000];
LOOP: sscanf(a,"%s\n\n%s",buff1,buff2);
sprintf(a,"%s\n%s", buff1, buff2);
for(int i=0; i<strlen(a)-1; ++i) {
if(a[i]=='\n'&&a[i+1]=='\n') goto LOOP;
}
}
int main() {sprintf(a,"%s\n\n%s", "hello world","this is my program, cris");
eliminatesp();
eliminateline();
printf("%s",a); return 0;
return 0;
}
but the output was:
hello world
world
How can I correct it? I was trying to remove spaces and empty lines.
Going with your idea of using sscanf and sprintf you can actually eliminate both spaces and newlines in a single function, as sscanf will ignore all whitespace (including newlines) when reading the input stream. So something like this should work:
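void eliminate() {
char word[1000], b[1000];
char *src = b, *dst = a; /* read from a copy, write back into the global a */
int n = 0;
strcpy(b, a);
*dst = '\0';
/* %s skips leading whitespace (spaces and newlines) and reads one word;
%n records how many chars that consumed, so src can move past it */
while (sscanf(src, "%999s%n", word, &n) == 1) {
strcpy(dst, word); /* append the word with no separator */
dst += strlen(word);
src += n;
}
}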
Pedro, %s stopping conversion at the first whitespace it encounters isn't the only difficulty when parsing with sscanf. To walk through the string with sscanf you also need the %n conversion specifier (which stores the number of characters consumed up to the point where %n appears) and must save that value in an integer (say offset). Your next conversion then begins at a + offset, and so on until you have exhausted all the words in 'a'. This can be a tedious process.
A better approach can simply be to loop over all characters in 'a' copying non-whitespace and single-delimiting whitespace to the new buffer as you go. (I often find it easier to copy the full string to a new buffer (say 'b') and then read from 'b' writing the new compressed string back to 'a').
As you work your way down the original string, you use simple if else logic to determine whether to store the current (or last) character or whether to just skip it and get the next. There are many ways to do this, no one way more right than the other as long as they are reasonably close in efficiency. Making use of the <ctype.h> functions like isspace() makes things easier.
Also, in your code, avoid the use of global variables. There is no reason you can't declare 'a' in main() and pass it as a parameter to your eliminate functions. If you need a constant in your code, like 1000, then #define a constant and avoid sprinkling magic numbers throughout your code.
Below is an example putting all those pieces together, and combining both your eliminatesp and eliminateline functions into a single eliminatespline function that does both trim whitespace and eliminate blank lines. This will handle blank lines and considers lines containing only whitespace characters as blank.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define MAXL 1000 /* if you need a constant, define one (or more) */
/** trim leading, compress included, and trim trailing whitespace.
* given non-empty string 'a', trim all leading whitespace, remove
* multiple included spaces and empty lines, and trim all trailing
* whitespace.
*/
void eliminatespline (char *a)
{
char b[MAXL] = "", /* buffer to hold copy of a */
*rp = b, /* read pointer to copy of a */
*wp = a, /* write pointer for a */
last = 0; /* last char before current */
if (!a || !*a) /* a NULL or empty - return */
return;
strcpy (b, a); /* copy a to b */
while (isspace ((unsigned char)*rp)) /* skip leading whitespace */
rp++;
if (!*rp) { /* a was all whitespace */
*a = 0; /* result is the empty string */
return;
}
last = *rp++; /* fill last with 1st non-whitespace */
while (*rp) { /* loop over remaining chars in b */
/* last '\n' and current whitespace - advance read pointer */
if (last == '\n' && isspace ((unsigned char)*rp)) {
rp++;
continue;
} /* last not current or last not space */
else if (last != *rp || !isspace ((unsigned char)last))
*wp++ = last; /* write last, advance write pointer */
last = *rp++; /* update last, advance read pointer */
}
if (!isspace ((unsigned char)last)) /* if last not space */
*wp++ = last; /* write last, advance write pointer */
*wp = 0; /* nul-terminate at write pointer */
}
int main() {
char a[] = " hello world\n \n\nthis is my program, cris ";
eliminatespline (a);
printf ("'%s'\n", a);
return 0;
}
note: the line being trimmed has both leading and trailing whitespace as well as embedded blank lines and lines containing only whitespace, e.g.
" hello world\n \n\nthis is my program, cris "
Example Use/Output
$ ./bin/elimspaceline
'hello world
this is my program, cris'
(note: the printf statement wraps the output in single quotes to confirm all leading and trailing whitespace was eliminated.)
If you did want to use sscanf, you could do essentially the same thing with sscanf (using the %n specifier to report the characters consumed) and an array of two characters that turns the character following each word into a string, and do something like the following:
void eliminatespline (char *a)
{
char b[MAXL] = "", /* string to hold build w/whitespace removed */
word[MAXL] = "", /* string for each word */
c[2] = ""; /* string made of char after word */
int n = 0, /* number of chars consumed by sscanf */
offset = 0, /* offset from beginning of a */
r; /* return value of sscanf */
size_t len; /* length of final string in b */
/* sscanf each word and the char that follows, reporting chars consumed */
while ((r = sscanf (a + offset, "%s%c%n", word, &c[0], &n)) >= 1) {
strcat (b, word); /* concatenate word */
if (r == 1) /* last word with nothing after it */
break;
strcat (b, c); /* concatenate next char */
offset += n; /* update offset with n */
}
len = strlen (b); /* get final length of b */
if (len && isspace(b[len - 1])) /* if last char is whitespace */
b[len - 1] = 0; /* remove last char */
strcpy (a, b); /* copy b to a */
}
Look things over, try both approaches and let me know if you have further questions.

Comparing String Arrays in C

This is the code:
#include <stdio.h>
int main(void)
{
char words[256];
char filename[64];
int count = 0;
printf("Enter the file name: ");
scanf("%s", filename);
FILE *fileptr;
fileptr = fopen(filename, "r");
if(fileptr == NULL)
printf("File not found!\n");
while ((fscanf(fileptr, " %s ", words))> 0)
{
if (words==' ' || words == '\n')
count++;
}
printf("%s contains %d words.\n", filename, count);
return 0;
}
I keep getting this error:
warning: comparison between pointer and integer [enabled by default]
if (words==' ' || words == '\n')
^
I don't get the error once I change words to *words, but that does not give me the correct results. I am trying to count the number of words in a file.
There is no need to compare, because a word read with %s (words) never contains whitespace (e.g. ' ' or '\n').
Try this:
while (fscanf(fileptr, "%s", words)> 0) {
count++;
}
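words is an array of char (in an expression like words == ' ' it decays to a char pointer), while ' ' is a char; *words is equivalent to words[0].
To examine each character, you would usually define a new pointer and walk it down the string, as below: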
char *p = words;
while(*p != '\0' )
{
// using *p something you need to do
p++;
}
There is no string type in C. Every string (including a string literal) is an array of chars. Use strcmp.
Take care that using the array name words by itself implies a pointer to the first element in the array. If what you need is to compare 2 strings in C then the strcmp is what you are looking for.
You cannot compare strings with == in C. Compare their contents with the standard library function strcmp, which compares them character by character. Here is its prototype, declared in the string.h header:
int strcmp(const char *s1, const char *s2);
The strcmp function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.
The fscanf format string " %s " (note the leading and trailing spaces) reads and discards any amount of whitespace. That is redundant: "%s" already skips leading whitespace on its own, and any whitespace after a word is skipped by the next call. No whitespace is ever written into the buffer words: fscanf stores only non-whitespace characters and stops when it reaches whitespace. So, to count the number of words, just increment the counter for each successful fscanf call.
Also, your program should guard against buffer overflow in the scanf and fscanf calls. If the input string is too big for the buffer, the result is undefined behaviour, possibly a crash due to a segfault. You can guard against it with a field width in the format string: scanf("%63s", filename); reads from stdin until it encounters whitespace, writes at most 63 non-whitespace characters into the buffer filename, and then adds a terminating null byte.
#include <stdio.h>
#include <string.h>
int main(void) {
// assuming max word length is 256
// +1 for the terminating null byte added by scanf
char words[256 + 1];
// assuming max file name length is 64
// +1 for the terminating null byte
char filename[64 + 1];
int count = 0; // counter for number of words
printf("Enter the file name: ");
scanf("%64s", filename);
FILE *fileptr;
fileptr = fopen(filename, "r");
if(fileptr == NULL) {
printf("File not found!\n");
return 1; // calling fscanf on a NULL stream is undefined behaviour
}
while((fscanf(fileptr, "%256s", words)) == 1)
count++;
fclose(fileptr);
printf("%s contains %d words.\n", filename, count);
return 0;
}

Parsing text in C

I have a file like this:
...
words 13
more words 21
even more words 4
...
(General format is a string of non-digits, then a space, then any number of digits and a newline)
and I'd like to parse every line, putting the words into one field of the structure, and the number into the other. Right now I am using an ugly hack of reading the line while the chars are not numbers, then reading the rest. I believe there's a clearer way.
Edit: You can use pNum - buf to get the length of the alphabetical part of the string, and use strncpy() to copy that into another buffer. Be sure to add a '\0' to the end of the destination buffer, because strncpy() does not add one when it copies only part of the source. I would insert this code before the pNum++, while pNum still points at the space:
int len = pNum - buf;
strncpy(newBuf, buf, len);
newBuf[len] = '\0';
You could read the entire line into a buffer and then use:
char *pNum;
if ((pNum = strrchr(buf, ' ')) != NULL) {
    pNum++;
}
to get a pointer to the number field.
fscanf(file, "%s %d", word, &value);
This gets the values directly into a string and an integer, and copes with variations in whitespace and numerical formats, etc.
Edit
Oops, I forgot that you had spaces between the words.
In that case, I'd do the following. (Note that it modifies the original text in 'line' by overwriting its last space with a NUL.)
// Scan to find the last space in the line
char *p = line;
char *lastSpace = NULL;
while (*p != '\0')
{
    if (*p == ' ')
        lastSpace = p;
    p++;
}
if (lastSpace == NULL)
    return "parse error"; // no space, so no number field
// Replace the last space in the line with a NUL
*lastSpace = '\0';
// Advance past the NUL to the first character of the number field
lastSpace++;
char *word = line; // 'line' now holds only the word part
int number = atoi(lastSpace);
You can solve this using stdlib functions, but the above is likely to be more efficient as you're only searching for the characters you are interested in.
Given the description, I think I'd use a variant of this (now tested) C99 code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
struct word_number
{
    char word[128];
    long number;
};
int read_word_number(FILE *fp, struct word_number *wnp)
{
    char buffer[140];
    if (fgets(buffer, sizeof(buffer), fp) == 0)
        return EOF;
    size_t len = strlen(buffer);
    if (len == 0 || buffer[len-1] != '\n') // Error if line too long to fit
        return EOF;
    buffer[--len] = '\0';
    if (len == 0) // Empty line: buffer[len-1] below would be out of bounds
        return EOF;
    char *num = &buffer[len-1];
    while (num > buffer && !isspace((unsigned char)*num))
        num--;
    if (num == buffer) // No space in input data
        return EOF;
    char *end;
    wnp->number = strtol(num+1, &end, 0);
    if (end == num+1 || *end != '\0') // Missing or invalid number as last word on line
        return EOF;
    *num = '\0';
    if ((size_t)(num - buffer) >= sizeof(wnp->word)) // Non-number part too long
        return EOF;
    memcpy(wnp->word, buffer, num - buffer + 1); // +1 also copies the '\0'
    return(0);
}
int main(void)
{
    struct word_number wn;
    while (read_word_number(stdin, &wn) != EOF)
        printf("Word <<%s>> Number %ld\n", wn.word, wn.number);
    return(0);
}
You could improve the error reporting by returning different values for different problems.
You could make it work with dynamically allocated memory for the word portion of the lines.
You could make it work with longer lines than I allow.
You could scan backwards over digits instead of non-spaces, but scanning over non-spaces allows the user to write "abc 0x123" and have the hex value handled correctly (strtol is called with base 0).
You might prefer to ensure there are no digits in the word part; this code does not care.
You could try using strtok() to tokenize each line, and then check whether each token is a number or a word (a fairly trivial check once you have the token string - just look at the first character of the token).
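Assuming that the number is immediately followed by '\n', you can read each line into a char buffer, use sscanf() to get the number, and then calculate the number of chars that this number takes at the end of the text string. Note that a plain sscanf(line, "%d", ...) fails on a line that starts with words, so the word part has to be skipped first, e.g. with "%*[^0-9]%d".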
Depending on how complex your strings become you may want to use the PCRE library. At least that way you can compile a perl'ish regular expression to split your lines. It may be overkill though.
Given the description, here's what I'd do: read each line as a single string using fgets() (making sure the target buffer is large enough), then split the line using strtok(). To determine if each token is a word or a number, I'd use strtol() to attempt the conversion and check the error condition. Example:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define MAX_LINE_LENGTH 256
#define MAX_NUM_STRINGS 16
#define MAX_STR_SIZE 64
/**
 * Read the next line from the file, splitting the tokens into
 * multiple strings and a single integer. Assumes input lines
 * never exceed MAX_LINE_LENGTH, there are at most MAX_NUM_STRINGS
 * words per line, and each individual string fits in MAX_STR_SIZE
 * (longer or extra words are skipped). Otherwise things get a little
 * more interesting. Also assumes that the integer is the last
 * thing on each line.
 */
int getNextLine(FILE *in, char (*strs)[MAX_STR_SIZE], int *numStrings, int *value)
{
    char buffer[MAX_LINE_LENGTH];
    int rval = 1;
    if (fgets(buffer, sizeof buffer, in))
    {
        /* '\n' is a delimiter too, so the last token has no newline */
        char *token = strtok(buffer, " \n");
        *numStrings = 0;
        while (token)
        {
            char *chk;
            long n = strtol(token, &chk, 10);
            if (chk == token || *chk != '\0')
            {
                /* not a complete integer: store it as a word */
                if (*numStrings < MAX_NUM_STRINGS && strlen(token) < MAX_STR_SIZE)
                    strcpy(strs[(*numStrings)++], token);
            }
            else
            {
                *value = (int) n;
            }
            token = strtok(NULL, " \n");
        }
    }
    else
    {
        /**
         * fgets() hit either EOF or error; either way return 0
         */
        rval = 0;
    }
    return rval;
}
/**
 * sample main
 */
int main(void)
{
    FILE *input;
    char strings[MAX_NUM_STRINGS][MAX_STR_SIZE];
    int numStrings;
    int value;
    input = fopen("datafile.txt", "r");
    if (input)
    {
        /* pass the array itself; it decays to char (*)[MAX_STR_SIZE] */
        while (getNextLine(input, strings, &numStrings, &value))
        {
            /**
             * Do something with strings and value here
             */
        }
        fclose(input);
    }
    return 0;
}
