Scanning in more than one word in C

I am trying to write a program that needs to scan in more than one word, and I do not know how to do this when the length is unspecified.
My first port of call was scanf, but it only scans in one word at a time (I know you can do scanf("%d %s",temp,temporary);, but I do not know how many words there will be), so I looked around and found fgets. One issue with this is that I cannot work out how to make it move on to the next piece of code, e.g.
scanf("%99s",temp);
printf("\n%s",temp);
if (strcmp(temp,"edit") == 0) {
editloader();
}
would run editloader(), while:
fgets(temp,99,stdin);
while(fgets(temporary,sizeof(temporary),stdin))
{
sprintf(temp,"%s\n%s",temp,temporary);
}
if (strcmp(temp,"Hi There")==0) {
editloader();
}
will not move onto the strcmp() code, and will stick on the original loop. What should I do instead?

I would read one word per loop iteration with scanf() and then append it to the "main" string, for example with strcpy() at the current end of that string.
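A minimal sketch of that idea, assuming the words are joined with single spaces and the "main" string has a fixed capacity. The name append_word and its return convention are made up for illustration; they are not from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends `word` to the string in `dst` (capacity `dstsize` bytes),
   inserting one space between words. Returns 1 on success, or 0 if the
   result would not fit, in which case `dst` is left unchanged. */
int append_word(char *dst, size_t dstsize, const char *word)
{
    assert(dst != NULL && word != NULL);
    size_t used = strlen(dst);
    size_t sep = (used > 0) ? 1 : 0;              /* no space before the first word */
    size_t need = used + sep + strlen(word) + 1;  /* +1 for the '\0' */

    if (need > dstsize)
        return 0;
    if (sep)
        dst[used++] = ' ';
    strcpy(dst + used, word);   /* copies the word and its terminator */
    return 1;
}
```

You would call it from a loop such as while (scanf("%99s", word) == 1 && append_word(line, sizeof line, word)). Keep in mind that scanf("%s") treats a newline like any other whitespace, so that loop only ends at end of input or when the string is full.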

Maybe you can use a getline function. I have used it in VC++, but if it exists in the standard C library too then you are good to go. (Note that the second link is to the C++ istream::getline; standard C has no getline, although POSIX provides one.)
Check here: http://www.daniweb.com/software-development/c/threads/253585
http://www.cplusplus.com/reference/iostream/istream/getline/
Hope you find what you are looking for.

I use this to read from stdin and get the same format that you would get by passing as arguments... so that you can have spaces in words and quoted words within a string. If you want to read from a specific file, just fopen it and change the fgets line.
#include <stdio.h>
#include <stdlib.h> /* malloc */
#include <string.h> /* strlen */
void getargcargvfromstdin(){
char s[255], **av = (char **)malloc(255 * sizeof(char *));
unsigned char i, pos, ac;
for(i = 0; i < 255; i++)
av[i] = (char *)malloc(255 * sizeof(char));
enum quotes_t{QUOTED=0,UNQUOTED}quotes=UNQUOTED;
while (fgets(s,255,stdin)){
i=0;pos=0;ac=0;
while (i<strlen(s)) {
/* '!'=33, 'ÿ'=-1, '¡'=-95 outside of these are non-printables */
if ( quotes && ((s[i] < 33) && (s[i] > -1) || (s[i] < -95))){
av[ac][pos] = '\0';
if (av[ac][0] != '\0') ac++;
pos = 0;
}else{
if (s[i]=='"'){ /* support quoted strings */
if (pos==0){
quotes=QUOTED;
}else{ /* support \" within strings */
if (s[i-1]=='\\'){
av[ac][pos-1] = '"';
}else{ /* end of quoted string */
quotes=UNQUOTED;
}
}
}else{ /* printable ascii characters */
av[ac][pos] = s[i];
pos++;
}
}
i++;
}
//your code here ac is the number of words and av is the array of words
}
}

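If the input exceeds the buffer size, you simply can't read it in one go.
You will have to loop and read it in several pieces.
The maximum size you can scan with scanf() is determined by the destination buffer, which must already exist, and by the field width you pass:
char name[100];
scanf("%99s",name);
Read this:
http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html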
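The "multiple loops" approach can be sketched as follows: keep calling fgets() and grow the buffer until a newline turns up. This is only a sketch under my own assumptions (the name read_whole_line, the starting capacity of 64 and the doubling strategy are not from the thread).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from `in`. Returns a malloc'ed string
   without the trailing newline (the caller frees it), or NULL on EOF
   with nothing read or on allocation failure. */
char *read_whole_line(FILE *in)
{
    assert(in != NULL);
    size_t cap = 64, len = 0;
    char *line = malloc(cap);
    if (line == NULL)
        return NULL;

    while (fgets(line + len, (int)(cap - len), in) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {   /* whole line read */
            line[len - 1] = '\0';
            return line;
        }
        /* No newline yet: the buffer was full (or input ended), so
           grow it and read the next chunk. */
        char *bigger = realloc(line, cap * 2);
        if (bigger == NULL) {
            free(line);
            return NULL;
        }
        line = bigger;
        cap *= 2;
    }
    if (len == 0) {            /* EOF before any character was read */
        free(line);
        return NULL;
    }
    return line;               /* last line had no newline */
}
```

Once the whole line is in one string, comparing it with strcmp(line, "Hi There") works as the question intended, because the newline has already been removed.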

Related

Replacing `goto` with a different programming construct

I'm trying to write this little program defensively, but I'm finding it hard to avoid the Loop/goto construct, which I know is considered bad practice. I tried while and do...while loops, and with a single case I have no problem. The problem begins when I add another do...while for the second case ("do not enter only a space or just press Enter"). I also tried nested do...while loops, but the result was even more complicated.
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int i;
int length;
char giventext [25];
Loop:
printf("String must have 25 chars lenght:\n");
gets(giventext);
length = strlen(giventext);
if (length > 25) {
printf("\nString has over %d chars.\nMust give a shorter string\n", length);
goto Loop;
}
/* Here i trying to not give space or nothing*/
if (length < 1) {
printf("You dont give anything as a string.\n");
goto Loop;
} else {
printf("Your string has %d\n",length);
printf("Letter in lower case are: \n");
for (i = 0; i < length; i++) {
if (islower(giventext[i])) {
printf("%c",giventext[i]);
}
}
}
return 0;
}

Note that your code is not defensive at all. You have no way to avoid a buffer overflow because
you check the length of the string after it has been read into your program, that is, after the buffer overflow has already occurred, and
you used gets(), which doesn't check the input length and is therefore very prone to buffer overflow (it was removed from the language in C11).
Use fgets() instead and just discard extra characters.
I think you need to understand that strlen() doesn't count the number of characters of input but the number of characters stored in the string.
If you want to ensure that fewer than N characters are entered, you could do this:
#include <stdio.h>
int
readinput(char *const buffer, int maxlen)
{
int count;
int next;
fputc('>', stdout);
fputc(' ', stdout);
count = 0;
while ((next = fgetc(stdin)) && (next != EOF) && (next != '\n')) {
// We need space for the terminating '\0';
if (count == maxlen - 1) {
// Discard extra characters before returning
// read until EOF or '\n' is found
while ((next = fgetc(stdin)) && (next != EOF) && (next != '\n'))
;
return -1;
}
buffer[count++] = next;
}
buffer[count] = '\0';
return count;
}
int
main(void)
{
char string[8];
int result;
while ((result = readinput(string, (int) sizeof(string))) == -1) {
fprintf(stderr, "you cannot input more than `%d' characters\n",
(int) sizeof(string) - 1);
}
fprintf(stdout, "accepted `%s' (%d)\n", string, result);
}
Note that by using a function, the flow control of this program is clear and simple. That's precisely why goto is discouraged, not because it's an evil thing but instead because it can be misused like you did.

Try using functions that label logical steps that your program needs to execute:
char * user_input() - returns an input from the user as a pointer to a char (using something other than gets()! For example, look at fgets(), or scanf() with a field width)
bool validate_input(char * str_input) - takes the user input from the above function and performs checks, such as validate the length is between 1 and 25 characters.
str_to_lower(char * str_input) - if validate_input() returns true you can then call this function and pass it the user input. The body of this function can then print the user input back to console in lower case. You could use the standard library function tolower() here to lower case each character.
The body of your main function will then be much simpler and perform a logical series of steps that tackle your problem. This is the essence of defensive programming - modularising your problem into separate steps that are self contained and easily testable.
A possible structure for the main function could be:
#include <stdbool.h>
char * user_input();
bool validate_input(char *);
void str_to_lower(char *);
int main()
{
char * str_input = user_input();
//continue to get input from the user until it satisfies the requirements of 'validate_input()'
while(!validate_input(str_input)) {
str_input = user_input();
}
//user input now satisfied 'validate_input' so lower case and print it
str_to_lower(str_input);
return 0;
}
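As a rough sketch of what two of those helpers might look like: validate_input below takes a const char * rather than the char * in the prototypes above, and lower_case_letters is a made-up name that collects the lower-case letters the way the question's final loop prints them. The limits 1 and 25 come from the question; everything else is an assumption. The length check is only meaningful if the string was read with a bounded function such as fgets().

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MIN_LEN 1
#define MAX_LEN 25   /* limits taken from the question */

/* True if the string has between MIN_LEN and MAX_LEN characters and is
   not made up of whitespace only. */
bool validate_input(const char *str_input)
{
    assert(str_input != NULL);
    size_t len = strlen(str_input);
    if (len < MIN_LEN || len > MAX_LEN)
        return false;
    for (size_t i = 0; i < len; i++)
        if (!isspace((unsigned char)str_input[i]))
            return true;          /* at least one visible character */
    return false;
}

/* Copies only the lower-case letters of `src` into `dst` (capacity
   `dstsize`). Returns the number of letters copied. */
size_t lower_case_letters(const char *src, char *dst, size_t dstsize)
{
    assert(src != NULL && dst != NULL && dstsize > 0);
    size_t n = 0;
    for (; *src != '\0' && n + 1 < dstsize; src++)
        if (islower((unsigned char)*src))
            dst[n++] = *src;
    dst[n] = '\0';
    return n;
}
```

Because each helper does one thing and returns a value, each can be tested on its own, and the retry loop in main needs neither goto nor nested do...while.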

How does C know the end of my string?

I have a program in which I wanted to remove the spaces from a string. I wanted to find an elegant way to do so, and I found the following code in a forum (I've changed it a little to make it more readable):
char* line_remove_spaces (char* line)
{
char *non_spaced = line;
int i;
int j = 0;
for (i = 0; i <= strlen(line); i++)
{
if ( line[i] != ' ' )
{
non_spaced[j] = line[i];
j++;
}
}
return non_spaced;
}
As you can see, the function takes a string and, using the same allocated memory space, selects only the non-spaced characters. It works!
Anyway, according to Wikipedia, a string in C is a "Null-terminated string". I always thought this way and everything was good. But the problem is: we put no "null-character" in the end of the non_spaced string. And somehow the compiler knows that it ends at the last character changed by the "non_spaced" string. How does it know?

This does not happen by magic. You have in your code:
for (i = 0; i <= strlen(line); i++)
              ^^
The loop index i runs up to and including strlen(line). At that index the character array holds the nul character, so it gets copied as well. As a result, your output has a nul character at the right position.
If you had
for (i = 0; i < strlen(line); i++)
              ^
then you would have to add the nul character manually:
for (i = 0; i < strlen(line); i++)
{
if ( line[i] != ' ' )
{
non_spaced[j] = line[i];
j++;
}
}
// put nul character
line[j] = 0;

Others have answered your question already, but here is a faster, and perhaps clearer version of the same code:
void line_remove_spaces (char* line)
{
char* non_spaced = line;
while(*line != '\0')
{
if(*line != ' ')
{
*non_spaced = *line;
non_spaced++;
}
line++;
}
*non_spaced = '\0';
}

The loop uses <= strlen so you will copy the null terminator as well (which is at i == strlen(line)).

You could try it. Debug it while it is processing a string containing only one space: " ". Watch carefully what happens to the index i.

How do you know that it "knows"? The most likely scenario is that you're simply having luck with your undefined behavior, and that there is a '\0'-character after the valid bytes of line end.
It's also highly likely that you're not seeing spaces at the end, which might be printed before hitting the stray "lucky '\0'".
A few other points:
There's no need to write this using indexing.
It's not very efficient to call strlen() on each loop iteration.
You might want to use isspace() to remove more whitespace characters.
Here's how I would write it, using isspace() and pointers:
char * remove_spaces(char *str)
{
char *ret = str, *put = str;
for(; *str != '\0'; str++)
{
if(!isspace((unsigned char) *str))
*put++ = *str;
}
*put = '\0';
return ret;
}
Note that this does terminate the space-less version of the string, so the returned pointer is guaranteed to point at a valid string.

The string parameter of your function is null-terminated, right?
And in the loop, the null character of the original string also gets copied into the returned string without spaces. So that string is null-terminated as well!
For your compiler, the null character is just another byte that doesn't get any special treatment, but string APIs use it as a handy marker to detect the end of a string.
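To make the "handy marker" point concrete, here is roughly how a library function finds the end of a string. It is a sketch that mirrors what strlen() does; my_strlen is an illustrative name, not a real library function.

```c
#include <assert.h>
#include <stddef.h>

/* Counts characters up to, but not including, the first '\0'.
   Nothing else marks the end: the function simply walks memory
   until it meets a zero byte. */
size_t my_strlen(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

For an array initialised with "ab\0cd" this returns 2 even though the array occupies 6 bytes, which is why a copied terminator is all it takes to "shorten" a string in place.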

If you use i <= strlen(line), the loop also visits index strlen(line), which holds the '\0', so the terminator is copied and your program works. You can step through it in a debugger to check.

Reading a file in C

I have an input file I need to extract words from. The words can only contain letters and numbers so anything else will be treated as a delimiter. I tried fscanf,fgets+sscanf and strtok but nothing seems to work.
while(!feof(file))
{
fscanf(file,"%s",string);
printf("%s\n",string);
}
Above one clearly doesn't work because it doesn't use any delimiters so I replaced the line with this:
fscanf(file,"%[A-z]",string);
It reads the first word fine but the file pointer keeps rewinding so it reads the first word over and over.
So I used fgets to read the first line and use sscanf:
sscanf(line,"%[A-z]%n",word,len);
line+=len;
This one doesn't work either because whatever I try I can't move the pointer to the right place. I tried strtok but I can't find how to set the delimiters:
while(p != NULL) {
printf("%s\n", p);
p = strtok(NULL, " ");
}
This one obviously takes the blank character as a delimiter, but I have literally hundreds of delimiters.
Am I missing something here? Extracting words from a file seemed a simple concept at first, but nothing I try really works.

Consider building a minimal lexer. When in state word it would remain in it as long as it sees letters and numbers. It would switch to state delimiter when encountering something else. Then it could do an exact opposite in the state delimiter.
Here's an example of a simple state machine which might be helpful. For the sake of brevity it works only with digits. echo "2341,452(42 555" | ./main will print each number in a separate line. It's not a lexer but the idea of switching between states is quite similar.
#include <stdio.h>
#include <string.h>
int main() {
static const int WORD = 1, DELIM = 2, BUFLEN = 1024;
int state = WORD, ptr = 0, c; /* c must be int to hold EOF */
char buffer[BUFLEN], *digits = "1234567890";
while ((c = getchar()) != EOF) {
if (strchr(digits, c)) {
if (WORD == state) {
buffer[ptr++] = c;
} else {
buffer[0] = c;
ptr = 1;
}
state = WORD;
} else {
if (WORD == state) {
buffer[ptr] = '\0';
printf("%s\n", buffer);
}
state = DELIM;
}
}
return 0;
}
If the number of states increases you can consider replacing if statements checking the current state with switch blocks. The performance can be increased by replacing getchar with reading a whole block of the input to a temporary buffer and iterating through it.
In case of having to deal with a more complex input file format you can use lexical analysers generators such as flex. They can do the job of defining state transitions and other parts of lexer generation for you.

Several points:
First of all, do not use feof(file) as your loop condition; feof won't return true until after you attempt to read past the end of the file, so your loop will execute once too often.
Second, you mentioned this:
fscanf(file,"%[A-z]",string);
It reads the first word fine but the file pointer keeps rewinding so it reads the first word over and over.
That's not quite what's happening; if the next character in the stream doesn't match the format specifier, scanf returns without having read anything, and string is unmodified.
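Both points come down to checking what the read function returns instead of asking feof() beforehand. A small sketch, assuming whitespace-separated words of at most 99 characters (count_words is a made-up name):

```c
#include <assert.h>
#include <stdio.h>

/* Counts whitespace-separated words in `file`. The loop is driven by
   the return value of fscanf(): it returns 1 while a word was stored
   and EOF once the input is exhausted, so there is no extra pass. */
size_t count_words(FILE *file)
{
    assert(file != NULL);
    char word[100];
    size_t count = 0;
    while (fscanf(file, "%99s", word) == 1)
        count++;
    return count;
}
```

The same test applies to a scanset such as %[A-z]: a return value of 0 means nothing matched and nothing was consumed, so the offending character has to be skipped before trying again.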
Here's a simple, if inelegant, method: it reads one character at a time from the input file, checks to see if it's either an alpha or a digit, and if it is, adds it to a string.
#include <stdio.h>
#include <ctype.h>
int get_next_word(FILE *file, char *word, size_t wordSize)
{
size_t i = 0;
int c;
/**
* Skip over any non-alphanumeric characters
*/
while ((c = fgetc(file)) != EOF && !isalnum(c))
; // empty loop
if (c != EOF)
word[i++] = c;
/**
* Read up to the next non-alphanumeric character and
* store it to word
*/
while (i < (wordSize - 1) && (c = fgetc(file)) != EOF && isalnum(c))
{
word[i++] = c;
}
word[i] = 0;
return i > 0; /* also reports a final word that ends exactly at EOF */
}
int main(void)
{
char word[SIZE]; // where SIZE is large enough to handle expected inputs
FILE *file;
...
while (get_next_word(file, word, sizeof word))
// do something with word
...
}

I would use:
FILE *file;
char string[200];
while(fscanf(file, "%*[^A-Za-z]"), fscanf(file, "%199[a-zA-Z]", string) > 0) {
/* do something with string... */
}
This skips over non-letters and then reads a string of up to 199 letters. The only oddness is that if you have any 'words' that are longer than 199 letters they'll be split up into multiple words, but you need the limit to avoid a buffer overflow...

What are your delimiters? The second argument to strtok should be a string containing your delimiters, and the first should be a pointer to your string the first time round then NULL afterwards:
char * p = strtok(line, ","); // assuming a , delimiter
while(p)
{
printf("%s\n", p);
p = strtok(NULL, ",");
}
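Since the question says that everything except letters and digits is a delimiter, the delimiter string does not have to be typed out by hand. A sketch that builds it with isalnum() and then uses strtok() as described above (split_alnum is a hypothetical helper, and the result depends on the current locale's idea of a letter):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Splits `line` in place into words made of letters and digits.
   Stores up to `max` pointers in `words` and returns how many were
   stored. */
size_t split_alnum(char *line, char **words, size_t max)
{
    assert(line != NULL && words != NULL);
    char delims[256];
    size_t n = 0;

    /* Every byte value except '\0' that is not alphanumeric becomes
       a delimiter. */
    for (int c = 1; c < 256; c++)
        if (!isalnum(c))
            delims[n++] = (char)c;
    delims[n] = '\0';

    size_t count = 0;
    for (char *p = strtok(line, delims); p != NULL && count < max;
         p = strtok(NULL, delims))
        words[count++] = p;
    return count;
}
```

Note that strtok() overwrites the delimiters in line with '\0', so pass it a writable copy if the original text is still needed.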

String manipulations

Is there a proper way to copy just the part of a string after a certain point?
Party City 1422 Evergreen Street
I use strpbrk() to copy the name out, and I could always tokenize it by whitespace, but is there a string function or technique that lets me copy out a specific section of a string other than the beginning, e.g. copy just [1422 Evergreen Street], or delete the first portion of the string?

If you want to specify it by starting position and length, you can always use strncpy and a bit of pointer arithmetic.
EDIT: When you know the starting string you can use
char *pos = strstr(src, "1422");
strcpy(dst, pos);

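If you know the indexes of the first and last characters of the substring you want, you can do this with strncpy. The following snippet copies substringLength characters from inputStr, starting at startIndex. Note that strncpy does not add a terminating '\0' in this case, so you must add it yourself, and outputStr must point to at least substringLength + 1 bytes.
char * inputStr;
char * outputStr;
strncpy(outputStr, inputStr + startIndex, substringLength);
outputStr[substringLength] = '\0';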
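A slightly more defensive variant of the same idea, as a sketch: it clamps the requested range to the source string and to the destination size, and always terminates the result. The function name substring and its parameter order are my own choices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies at most `len` characters of `src`, starting at index `start`,
   into `dst` (capacity `dstsize`) and always terminates the result.
   Returns the number of characters copied. */
size_t substring(char *dst, size_t dstsize, const char *src,
                 size_t start, size_t len)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    size_t srclen = strlen(src);
    if (start > srclen)
        start = srclen;               /* nothing to copy past the end */
    if (len > srclen - start)
        len = srclen - start;         /* clamp to the end of src */
    if (len > dstsize - 1)
        len = dstsize - 1;            /* leave room for the '\0' */
    memcpy(dst, src + start, len);
    dst[len] = '\0';
    return len;
}
```

With the address from the question, substring(out, sizeof out, "Party City 1422 Evergreen Street", 11, 4) leaves "1422" in out.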

If you want to split at the location of a particular string, you can do something like this:
#include <stdio.h>
#include <string.h>
#define MAX_STRING 1024
int main() {
char myleftBuffer[MAX_STRING]="";
char myrightBuffer[MAX_STRING]="";
char mystring[]="Party City 1422 Evergreen Street";
char *start = strstr(mystring, "1422");
if(start) {
strcpy(myrightBuffer, start);
strncpy(myleftBuffer, mystring, (start - mystring));
}
printf("%s -> %s\n", myleftBuffer, myrightBuffer);
return 0;
}
Which outputs:
Party City -> 1422 Evergreen Street

Actually, strncpy is not a particularly good choice for the task at hand. It always pads your value out to occupy the entire destination, which is generally pretty wasteful (it was originally designed for putting file names into the Unix file system; it's good for that, but not really much else).
I think I'd use sscanf. Assuming we always want to copy from the first digit to the end of the string, you could do something like this:
char street_name[256];
sscanf(input_buffer, "%*[^0-9]%255[^\n]", street_name);
FWIW, the %*[^0-9] part skips over characters until it reaches something in the range 0-9 (yes, I know it looks like a regex, but scanf and company support it too). The * in it means to scan but not assign what it finds. The %255[^\n] means to read and assign until the next newline in the input, or up to 255 characters, whichever comes first.
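Wrapped into a function, that could look like the sketch below. One thing to be aware of is that %*[^0-9] has to match at least one character, so an input that already starts with a digit makes the first sscanf() stop early; the sketch handles that with a second attempt. The name from_first_digit and the STREET_MAX macro are assumptions of mine.

```c
#include <assert.h>
#include <stdio.h>

#define STREET_MAX 256   /* same buffer size as in the snippet above */

/* Copies everything from the first digit of `input` up to the end of
   the line into `street`. Returns 1 on success, 0 if nothing matched. */
int from_first_digit(const char *input, char street[STREET_MAX])
{
    assert(input != NULL && street != NULL);
    street[0] = '\0';
    /* Usual case: some non-digit text comes before the number. */
    if (sscanf(input, "%*[^0-9]%255[^\n]", street) == 1)
        return 1;
    /* The input starts with a digit, so there is nothing to skip. */
    if (input[0] >= '0' && input[0] <= '9')
        return sscanf(input, "%255[^\n]", street) == 1;
    return 0;
}
```

Checking the return value matters here: when no digit is found, street is left as an empty string rather than holding stale data.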

int split_at(const char *in, const char *match, char *buf, size_t len)
{
char *pos;
if( (pos = strstr(in, match)) == NULL )
return -1; // No match
else if( pos == in )
return 0; // match is at the very start, nothing precedes it
if( strlcpy(buf, pos, len) >= len )
fprintf(stderr, "WARNING: match truncated: %s", buf);
return 1;
}

Probably impossible in the general case, and you would do better to get the input in separate fields, but if that's not an option, something like the following should work:
size_t street_extract(char* ret,size_t retsz,char* addr)
{
size_t i,nwrote;
for(i=0; addr[i] ;i++)
{
if(addr[i]!=' ') continue; /* only check at start of word */
if('0' <= addr[i+1] && addr[i+1] <= '9') { i++; break; } /* found street number */
}
if(!addr[i]) return -1; /* not found */
for(nwrote=0; addr[i+nwrote] && nwrote < retsz-1 ;nwrote++)
{
ret[nwrote] = addr[i+nwrote];
}
ret[nwrote] = 0;
while(addr[i+nwrote]) nwrote++;
return nwrote; /* result is nwrote characters in length */
}
modify and error-check as needed.

Parsing text in C

I have a file like this:
...
words 13
more words 21
even more words 4
...
(General format is a string of non-digits, then a space, then any number of digits and a newline)
and I'd like to parse every line, putting the words into one field of the structure, and the number into the other. Right now I am using an ugly hack of reading the line while the chars are not numbers, then reading the rest. I believe there's a clearer way.

Edit: You can use pNum-buf to get the length of the alphabetical part of the string, and use strncpy() to copy that into another buffer. Be sure to add a '\0' to the end of the destination buffer. I would insert this code before the pNum++.
int len = pNum-buf;
strncpy(newBuf, buf, len);
newBuf[len] = '\0';
You could read the entire line into a buffer and then use:
char *pNum;
if (pNum = strrchr(buf, ' ')) {
pNum++;
}
to get a pointer to the number field.

fscanf(file, "%s %d", word, &value);
This gets the values directly into a string and an integer, and copes with variations in whitespace and numerical formats, etc.
Edit
Ooops, I forgot that you had spaces between the words.
In that case, I'd do the following. (Note that it truncates the original text in 'line')
// Scan to find the last space in the line
char *p = line;
char *lastSpace = NULL;
while(*p != '\0')
{
if (*p == ' ')
lastSpace = p;
p++;
}
if (lastSpace == NULL)
return("parse error");
// Replace the last space in the line with a NUL
*lastSpace = '\0';
// Advance past the NUL to the first character of the number field
lastSpace++;
char *word = line;
int number = atoi(lastSpace);
You can solve this using stdlib functions, but the above is likely to be more efficient as you're only searching for the characters you are interested in.
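For comparison, a sketch of the stdlib-based version, using strrchr() to find the last space and strtol() to convert and validate the number. Like the code above it modifies the line in place. The name split_last_number and the return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Splits "some words 42" in place: cuts `line` at its last space and
   stores the trailing number in *number. Returns 1 on success, 0 if
   there is no space or the last field is not a whole number. */
int split_last_number(char *line, long *number)
{
    assert(line != NULL && number != NULL);
    line[strcspn(line, "\n")] = '\0';    /* drop the newline left by fgets() */

    char *last_space = strrchr(line, ' ');
    if (last_space == NULL)
        return 0;

    char *end;
    long value = strtol(last_space + 1, &end, 10);
    if (end == last_space + 1 || *end != '\0')
        return 0;                        /* no digits, or trailing junk */

    *last_space = '\0';                  /* line now holds only the words */
    *number = value;
    return 1;
}
```

Unlike atoi(), strtol() reports where the conversion stopped, so a line such as "words 12x" is rejected instead of silently yielding 12.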

Given the description, I think I'd use a variant of this (now tested) C99 code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>
struct word_number
{
char word[128];
long number;
};
int read_word_number(FILE *fp, struct word_number *wnp)
{
char buffer[140];
if (fgets(buffer, sizeof(buffer), fp) == 0)
return EOF;
size_t len = strlen(buffer);
if (buffer[len-1] != '\n') // Error if line too long to fit
return EOF;
buffer[--len] = '\0';
char *num = &buffer[len-1];
while (num > buffer && !isspace((unsigned char)*num))
num--;
if (num == buffer) // No space in input data
return EOF;
char *end;
wnp->number = strtol(num+1, &end, 0);
if (*end != '\0') // Invalid number as last word on line
return EOF;
*num = '\0';
if (num - buffer >= sizeof(wnp->word)) // Non-number part too long
return EOF;
memcpy(wnp->word, buffer, num - buffer + 1); // +1 copies the '\0' written at *num
return(0);
}
int main(void)
{
struct word_number wn;
while (read_word_number(stdin, &wn) != EOF)
printf("Word <<%s>> Number %ld\n", wn.word, wn.number);
return(0);
}
You could improve the error reporting by returning different values for different problems.
You could make it work with dynamically allocated memory for the word portion of the lines.
You could make it work with longer lines than I allow.
You could scan backwards over digits instead of non-spaces - but this allows the user to write "abc 0x123" and the hex value is handled correctly.
You might prefer to ensure there are no digits in the word part; this code does not care.

You could try using strtok() to tokenize each line, and then check whether each token is a number or a word (a fairly trivial check once you have the token string - just look at the first character of the token).

Assuming that the number is immediately followed by '\n', you can read each line into a char buffer, use sscanf() with "%d" on the line to get the number, and then calculate how many characters that number occupies at the end of the text string.

Depending on how complex your strings become you may want to use the PCRE library. At least that way you can compile a perl'ish regular expression to split your lines. It may be overkill though.

Given the description, here's what I'd do: read each line as a single string using fgets() (making sure the target buffer is large enough), then split the line using strtok(). To determine if each token is a word or a number, I'd use strtol() to attempt the conversion and check the error condition. Example:
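#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Example limits; adjust to your data */
#define MAX_LINE_LENGTH 256
#define MAX_STR_SIZE 64
#define MAX_NUM_STRINGS 16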
/**
* Read the next line from the file, splitting the tokens into
* multiple strings and a single integer. Assumes input lines
* never exceed MAX_LINE_LENGTH and each individual string never
* exceeds MAX_STR_SIZE. Otherwise things get a little more
* interesting. Also assumes that the integer is the last
* thing on each line.
*/
int getNextLine(FILE *in, char (*strs)[MAX_STR_SIZE], int *numStrings, int *value)
{
char buffer[MAX_LINE_LENGTH];
int rval = 1;
if (fgets(buffer, sizeof buffer, in))
{
char *token = strtok(buffer, " ");
*numStrings = 0;
while (token)
{
char *chk;
*value = (int) strtol(token, &chk, 10);
if (*chk != 0 && *chk != '\n')
{
strcpy(strs[(*numStrings)++], token);
}
token = strtok(NULL, " ");
}
}
else
{
/**
* fgets() hit either EOF or error; either way return 0
*/
rval = 0;
}
return rval;
}
/**
* sample main
*/
int main(void)
{
FILE *input;
char strings[MAX_NUM_STRINGS][MAX_STR_SIZE];
int numStrings;
int value;
input = fopen("datafile.txt", "r");
if (input)
{
while (getNextLine(input, strings, &numStrings, &value))
{
/**
* Do something with strings and value here
*/
}
fclose(input);
}
return 0;
}