Parsing user input in C

I'm having trouble wrapping my mind around parsing user input in C. My task (homework) is to read user input, then parse it the same way Bash does, so the delimiters are ' ', |, >, etc. My (wrong) solution so far uses strtok. I've been advised to use sscanf, but haven't been able to wrap my mind around how that will work for all cases of user input.
I'd love a strategy that will point me in the right direction. Here's what I have so far:
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#define MAX_LINE 80
int main ()
{
const char delim[]=" \\|\\>\\1>\\2>\\>>\\2>>\\&>\\<";
char* args[MAX_LINE/2 + 1];
char tok[MAX_LINE];
char* token;
printf("osh>");
fgets(tok, sizeof(tok), stdin);
token = strtok(tok,delim);
while (token != NULL)
{
printf("%s\n", token);
token = strtok(NULL, delim);
}
return 0;
}

Method 1)
You can use pointer arithmetic to locate the delimiter while still using strtok to extract the delimited strings. This seems to me the easiest solution but requires pointer arithmetic. Be sure you don't try to access 'tok' beyond the end of the array or before the array (by over-decrementing the pointer).
Example:
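char copy[MAX_LINE];
strcpy(copy, tok); /* strtok overwrites delimiters with '\0', so keep an untouched copy */
token = strtok(tok, delim);
char verb = copy[(token - tok) + strlen(token)]; /* the delimiter that ended this token */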
Method 2)
You could use sscanf in the same manner looking for strings, then single characters, then strings... and so forth till you hit the end of the line.
For either method you need to store the strings and delimiters somewhere and maintain the order so you can reconstruct the sequence.
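As a rough illustration of keeping the strings and the delimiters in order, here is a minimal sketch that walks the line by hand with strspn/strcspn instead of strtok, so the operators survive as tokens of their own. The names operator_len and split_command and the exact operator list are assumptions made for this example, not part of the assignment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the operator starting at s, or 0 if none.
   Longer operators come first so that "2>>" wins over "2>" and ">". */
static size_t operator_len(const char *s)
{
    static const char *ops[] = { "2>>", ">>", "&>", "1>", "2>", "|", ">", "<", "&" };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t n = strlen(ops[i]);
        if (strncmp(s, ops[i], n) == 0)
            return n;
    }
    return 0;
}

/* Splits line into tokens; every operator becomes a token of its own.
   Stores malloc'd copies in out (at most max) and returns how many. */
size_t split_command(const char *line, char **out, size_t max)
{
    size_t count = 0;
    const char *p = line;

    while (count < max) {
        p += strspn(p, " \t\n");               /* skip blanks between tokens */
        if (*p == '\0')
            break;
        size_t len = operator_len(p);
        if (len == 0)
            len = strcspn(p, " \t\n|<>&");     /* plain word: runs to the next delimiter */
        char *tok = malloc(len + 1);
        if (tok == NULL)
            break;
        memcpy(tok, p, len);
        tok[len] = '\0';
        out[count++] = tok;
        p += len;
    }
    return count;
}
```

Called on "ls -l | grep foo 2>> err.txt", this yields the seven tokens ls, -l, |, grep, foo, 2>> and err.txt, in that order. Quoting and escaping are not handled here.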
Good luck.

Thanks for the help on this. I ended up going a different route entirely, basically keeping track of the contents of each index of the fgets result, then parsing from there. I didn't end up using any c-ish methods (i.e. strtok) to do the job.
Here's a sample snippet.
{
//a token ends when the next character is one of the delimiters
LEN++;
if ((line[i+1] == '<') || (line[i+1] == '>') || (line[i+1] == '|') || (line[i+1] == '&') || (line[i+1] == ' ') || (line[i+1] == '\n'))
{
memcpy(substring, &line[string_start], LEN);
substring[LEN] = '\0';
args[token_number] = malloc(strlen(substring) + 1);
strcpy(args[token_number], substring);
token_number++;
string_start = i+1;
LEN = 0;
}
i++;
}

Related

How to use strtok to get a word

I'm having some problem with my code. I need to use strtok() in c to output the words "Sing" and "Toy" (which are both in between the words "Due" and "De") in the string "Date WEEk Dae Due Toy De Dae i Date Due Sing De". I tried to use the if statement found in the code to explicitly output the words "Sing" and "Toy" but my code would not produce any output and it had no warnings during compilation. I'm only a beginner at C so please be patient with me. I heard that other functions such as strstr() might be able to do the same job as strtok() so if those other functions are much more convenient to use, do not hesitate to use those functions instead. Thank you.
Summary: I'm trying to get the words between "Due" and "De" in the string above using strtok(). Is that possible, or should I use another function?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
char string[]="Date WEEk Dae Due Toy De Dae i Date Due Sing De";
char*pch;
pch=strtok(string,"De");
while(pch!=NULL){
if((*(pch-1)=='a')&&(*(pch-2)=='u'))
printf("%s\n",pch);
pch=strtok(NULL,"De");
}
return 0;
}
Keep in mind that the second parameter of strtok() is a delimiter list:
C string containing the delimiter characters.
These can be different from one call to another.
The way it is now in your code, the string is split at every capital D and every lowercase e, not at the word "De".
For the case mentioned in your description, it's more suitable to work around the problem using strstr().
You should pass " " as the second argument to strtok.
If you want to print Sing, check that pch is not NULL and that strcmp(pch,"Sing") == 0, then print it.
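A minimal sketch of the space-delimited strtok approach, generalised so that it does not hard-code "Sing" or "Toy": it remembers the two previous tokens and reports the middle one whenever it sits between "Due" and "De". The function name words_between and the output buffer are assumptions made for this example.

```c
#include <string.h>

/* Hypothetical helper: collects every word found between "Due" and "De"
   into out, separated by spaces. Modifies text, as strtok always does. */
void words_between(char *text, char *out, size_t outsize)
{
    char *prev2 = NULL;   /* token before the previous one */
    char *prev1 = NULL;   /* previous token */

    out[0] = '\0';
    for (char *tok = strtok(text, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (prev2 != NULL && strcmp(prev2, "Due") == 0 && strcmp(tok, "De") == 0) {
            /* room for a separator, the word and the terminating '\0' */
            if (strlen(out) + strlen(prev1) + 2 <= outsize) {
                if (out[0] != '\0')
                    strcat(out, " ");
                strcat(out, prev1);
            }
        }
        prev2 = prev1;
        prev1 = tok;
    }
}
```

For the string in the question this leaves "Toy Sing" in out. It only finds a single word between the two markers, which matches the stated task.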
Find "Due" followed by a space
char *due = strstr(string, "Due ");
Find "De" preceded by a space, searching only after "Due"
char *de = due ? strstr(due, " De") : NULL;
Check for errors
if (!due || !de) exit(EXIT_FAILURE);
Print what is between
printf("%.*s\n", (int)(de - due - 4), due + 4);
Use strstr like this
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int main(void){
char string[]="Date WEEk Dae Due Toy De Dae i Date Due Sing De";
char *pre_word = "Due", *post_word = "De";
size_t pre_word_len = strlen(pre_word), post_word_len = strlen(post_word);
char *p = string, *pre, *post;
while(pre = strstr(p, pre_word)){//find pre word
if((pre == string || isspace((unsigned char)pre[-1])) &&
isspace((unsigned char)pre[pre_word_len])){//word check
if(post = strstr(pre + pre_word_len, post_word)){//find post word
if(isspace((unsigned char)post[-1]) &&
(isspace((unsigned char)post[post_word_len]) || !post[post_word_len])){//word check
*post = 0;//The original string is changed
char word[32], dummy[2];
if(1==sscanf(pre + pre_word_len, "%31s %1s", word, dummy)){//There is one word between words
printf("'%s'\n", word);
}
}
p = post + post_word_len;//set next search position
} else {
break;//Since post_word does not exist, it ends loop.
}
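} else {
p = pre + pre_word_len;//not a whole word: resume the search after this match to avoid looping forever
}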
}
return 0;
}

String split in C with strtok function

I'm trying to split some strings on the {white_space} symbol.
There is a problem with some of the splits, though: I want to split on {white_space}, but keep quoted substrings together.
For example:
char *pch;
char str[] = "hello \"Stack Overflow\" good luck!";
pch = strtok(str," ");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok(NULL, " ");
}
This will give me
hello
"Stack
Overflow"
good
luck!
But what I want is:
hello
Stack Overflow
good
luck!
Any suggestion or idea please?
You'll need to tokenize twice. The program flow you currently have is as follows:
1) Search for space
2) Print all characters prior to space
3) Search for next space
4) Print all characters between last space, and this one.
You'll need to start thinking in a different manner: two layers of tokenization.
Search for quotation marks
On odd-numbered strings, perform your original program (search for spaces)
On even-numbered strings, print blindly
In this case, even-numbered strings are (ideally) within quotes. ab"cd"ef would result in ab being odd, cd being even, and so on.
The other side is remembering what you need to do: what you're actually looking for (in regex) is "[a-zA-Z0-9 \t\n]*" or [a-zA-Z0-9]+. The difference between the two options is whether the text is enclosed in quotes. So separate by quotes, and identify from there.
Try altering your strategy.
Look at non-white space things, then when you find quoted string you can put it in one string value.
So, you need a function that examines characters, between white space. When you find '"' you can change the rules and hoover everything up to a matching '"'. If this function returns a TOKEN value and a value (the string matched) then what calls it, can decide to do the correct output. Then you have written a tokeniser, and there actually exist tools to generate them called "lexers" as they are used widely, to implement programming languages/config files.
Assuming nextc() reads the next char from the string, begun by firstc(str):
for (firstc(str); (c = nextc()) != '\0'; ) {
if (isspace(c))
continue;
else if (c == '"')
return readQuote; /* Handle Quoted string */
else
return readWord; /* Terminated by space & '"' */
}
return EOS;
You'll need to define return values for EOS, QUOTE and WORD, and a way to get the text in each Quote or Word.
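The idea described above (skip white space, and switch rules when a '"' is found) could be sketched as a small in-place tokenizer. The name next_token and the cursor-based interface are assumptions made for this example; it does not distinguish QUOTE from WORD in its return value, and it does not handle escaped quotes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical in-place tokenizer: *cursor is the current read position.
   Returns the next token (NUL-terminated inside the buffer), or NULL at the end. */
char *next_token(char **cursor)
{
    char *p = *cursor;
    char *start;

    while (isspace((unsigned char)*p))    /* skip white space before the token */
        p++;
    if (*p == '\0') {
        *cursor = p;
        return NULL;
    }
    if (*p == '"') {                      /* quoted: take everything up to the closing quote */
        start = ++p;
        while (*p != '\0' && *p != '"')
            p++;
    } else {                              /* plain word: stop at white space */
        start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
    }
    if (*p != '\0')
        *p++ = '\0';                      /* terminate the token, step past the delimiter */
    *cursor = p;
    return start;
}
```

Starting with char *cursor = str; and calling next_token(&cursor) until it returns NULL gives hello, Stack Overflow, good and luck! for the string in the question. A quote in the middle of a word (as in ab"cd"ef) is left as part of that word in this sketch.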
Here's the code that works... in C
The idea is that you first tokenize on the quote, since that's the priority (if a string is inside the quotes then we don't tokenize it, we just print it). Each of the resulting strings is then tokenized on the space character, but only every other one, because the strings alternate between being outside and inside the quotes.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
int main() {
char *pch1, *pch2, *save_ptr1, *save_ptr2;
char str[] = "hello \"Stack Overflow\" good luck!";
pch1 = strtok_r(str,"\"", &save_ptr1);
bool in = false;
while (pch1 != NULL) {
if(in) {
printf ("%s\n", pch1);
pch1 = strtok_r(NULL, "\"", &save_ptr1);
in = false;
continue;
}
pch2 = strtok_r(pch1, " ", &save_ptr2);
while (pch2 != NULL) {
printf ("%s\n",pch2);
pch2 = strtok_r(NULL, " ", &save_ptr2);
}
pch1 = strtok_r(NULL, "\"", &save_ptr1);
in = true;
}
}
References
Tokenizing multiple strings simultaneously
http://linux.die.net/man/3/strtok_r
http://www.cplusplus.com/reference/cstring/strtok/

Changing the extension of a passed filename

My function is passed a filename of the type
char *myFilename;
I want to change the existing extension to ".sav", or if there is no extension, simply add ".sav" to the end of the file. But I need to consider files named such as "myfile.ver1.dat".
Can anyone give me an idea on the best way to achieve this.
I was considering using a function to find the last "." and remove all characters after it and replace them with "sav", or if no "." is found, simply add ".sav" to the end of the string. But I'm not sure how to do it, as I get confused by the '\0' part of the string and whether strlen counts the '\0' or whether I need to add 1 to the string length afterwards.
I want to eventually end up with a filename to pass to fopen().
Maybe something like this:
char *ptrFile = strrchr(myFilename, '/');
ptrFile = (ptrFile) ? ptrFile+1 : myFilename; /* skip the directory part, if any */
char *ptrExt = strrchr(ptrFile, '.');
if (ptrExt != NULL)
strcpy(ptrExt, ".sav");
else
strcat(ptrFile, ".sav"); /* myFilename must have room for the extra characters */
And then the traditional way , remove and rename
Here's something lazy I've whipped up, it makes minimum use of the standard library functions (maybe you'd like something that does?):
#include <stdio.h>
#include <string.h>
void change_type(char* input, const char* new_extension, size_t size)
{
char* end = input;
while(*end != '\0') // move pointer to the terminating '\0'
++end;
char* dot = end;
while(dot > input && *dot != '.') // go backwards until we encounter a dot or reach the start
--dot;
if(*dot != '.') // no dot found: append at the end instead of replacing
dot = end;
// only write if the new name, including its '\0', fits in the buffer of `size` bytes
if((size_t)(dot - input) + strlen(new_extension) + 1 <= size)
strcpy(dot, new_extension);
}
int main()
{
char input[10] = "file";
change_type(input, ".bff", sizeof(input));
printf("%s\n", input);
return 0;
}
And it indeed prints file.bff. The new extension can be any length, as long as the result fits in the buffer.
strlen returns the number of characters in the string, not counting the terminating '\0', and arrays are indexed from 0, so
filename [strlen(filename)]
is the terminating null.
int p;
for (p = strlen (filename) - 1; (p > 0) && (filename[p] != '.'); p--)
;
will loop down to zero if there is no extension and stop at the dot otherwise.
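Putting the pieces together, here is a minimal sketch that builds the new name in freshly allocated memory, which sidesteps the question of whether the caller's buffer is large enough. The name with_extension is an assumption made for this example, and '/' is assumed to be the path separator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of filename with its extension
   replaced by ext (for example ".sav"), or with ext appended if there is none.
   The caller frees the result. */
char *with_extension(const char *filename, const char *ext)
{
    const char *slash = strrchr(filename, '/');      /* last path separator, if any */
    const char *base = slash ? slash + 1 : filename; /* start of the file name itself */
    const char *dot = strrchr(base, '.');            /* last dot in the file name only */
    size_t stem = dot ? (size_t)(dot - filename) : strlen(filename);

    char *result = malloc(stem + strlen(ext) + 1);   /* +1 for the terminating '\0' */
    if (result == NULL)
        return NULL;
    memcpy(result, filename, stem);                  /* everything before the extension */
    strcpy(result + stem, ext);                      /* new extension plus '\0' */
    return result;
}
```

With this, with_extension("myfile.ver1.dat", ".sav") gives "myfile.ver1.sav" and with_extension("myfile", ".sav") gives "myfile.sav"; a dot in a directory name, as in "dir.d/file", is left alone. The result can be passed straight to fopen() and freed afterwards.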

Reading formatted strings from file into Array in C

I am new to the C programming language and trying to improve by solving problems from the Project Euler website using only C and its standard libraries. I have covered basic C fundamentals (I think), functions, pointers, and some basic file I/O, but am now running into some issues.
The question is about reading a text file of first names and calculating a "name score" blah blah, I know the algorithm I am going to use and have most of the program setup but just cannot figure out how to read the file correctly.
The file is in the format
"Nameone","Nametwo","billy","bobby","frank"...
I have searched and searched and tried countless things but cannot seem to read these as individual names into an array of strings (I think that's the right way to store them individually?). I have tried using sscanf/fscanf with %[^\",]. I have tried different combos of those functions and fgets, but my understanding of fgets is that every time I call it, it will get a new line, and this is a text file with over 45,000 characters all on the same line.
I am unsure if I am running into problems with my misunderstanding of the scanf functions, or my misunderstanding with storing an array of strings. As far as the array of strings goes, I (think) I have realized that when I declare an array of strings it does not allocate memory for the strings themselves, something that I need to do. But I still cannot get anything to work.
Here is the code I have now to try to just read in some names I enter from the command line to test my methods.
This code works to input any string up to buffer size(100):
int main(void)
{
int i;
char input[100];
char* names[10];
printf("\nEnter up to 10 names\nEnter an empty string to terminate input: \n");
for(int i = 0; i < 10; i++)
{
int length = 0;
printf("%d: ", i);
fgets(input, 100, stdin);
length = (int)strlen(input);
input[length-1] = 0; // Delete newline character
length--;
if(length < 1)
{
break;
}
names[i] = malloc(length+1);
assert(names[i] != NULL);
strcpy(names[i], input);
}
}
However, I simply cannot make this work for reading in the formatted strings.
PLEASE advise me as to how to read it in with the format. I have previously used sscanf on the input buffer and that has worked fine, but I don't feel like I can do that on a 45000+ char line. Am I correct in assuming this? Is this even an acceptable way to read strings into an array?
I apologize if this is long and/or not clear, it is very late and I am very frustrated.
Thank anyone and everyone for helping, and I am looking forward to finally becoming an active member on this site!
There are really two basic issues here:
Whether scanning string input is the proper strategy here. I would argue not because while it might work on this task you are going to run into more complicated scenarios where it too easily breaks.
How to handle a 45k string.
In reality you won't run into too many strings of this size, but it is nothing that a modern computer of any capacity can't easily handle. Insofar as this is for learning purposes, learn iteratively.
The easiest first approach is to fread() the entire line/file into an appropriately sized buffer and parse it yourself. You can use strtok() to break up the comma-delimited tokens and then pass the tokens to a function that strips the quotes and returns the word. Add the word to your array.
For a second pass you can do away with strtok() and just parse the string yourself by iterating over the buffer and breaking up the comma tokens yourself.
Last but not least you can write a version that reads smaller chunks of the file into a smaller buffer and parses them. This has the added complexity of handling multiple reads and managing the buffers to account for half-read tokens at the end of a buffer and so on.
In any case, break the problem into chunks and learn with each refinement.
EDIT
#define MAX_STRINGS 5000
#define MAX_NAME_LENGTH 30
char* stripQuotes(char *str, char *newstr)
{
char *temp = newstr;
while (*str)
{
if (*str != '"')
{
*temp = *str;
temp++;
}
str++;
}
*temp = '\0'; /* terminate the copy so it does not rely on a pre-zeroed buffer */
return(newstr);
}
int main(int argc, char *argv[])
{
char fakeline[] = "\"Nameone\",\"Nametwo\",\"billy\",\"bobby\",\"frank\"";
char *token;
char namebuffer[MAX_NAME_LENGTH] = {'\0'};
char *name;
int index = 0;
char nameArray[MAX_STRINGS][MAX_NAME_LENGTH];
token = strtok(fakeline, ",");
if (token)
{
name = stripQuotes(token, namebuffer);
strcpy(nameArray[index++], name);
}
while (token != NULL)
{
token = strtok(NULL, ",");
if (token)
{
memset(namebuffer, '\0', sizeof(namebuffer));
name = stripQuotes(token, namebuffer);
strcpy(nameArray[index++], name);
}
}
return(0);
}
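The example above parses a hard-coded line. For the first approach described in this answer (read the entire file into a buffer, then parse it), a minimal sketch of the reading step might look like the following. The name read_whole_file is an assumption made for this example, and it relies on fseek/ftell reporting the file size, which is reasonable for a regular file of this kind.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file into a malloc'd, NUL-terminated
   buffer. Returns NULL on failure; the caller frees the result. */
char *read_whole_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;

    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) == 0) {
        long size = ftell(fp);                     /* file length in bytes */
        if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);        /* +1 for the terminating '\0' */
            if (buf != NULL) {
                size_t got = fread(buf, 1, (size_t)size, fp);
                buf[got] = '\0';                   /* so strtok() and friends can be used */
            }
        }
    }
    fclose(fp);
    return buf;
}
```

The returned buffer can then take the place of fakeline in the code above, since strtok() is allowed to modify it.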
fscanf(stdin, "%s", input) reads one token (a run of non-whitespace characters) at a time. You can either scan the input until you encounter a specific "end-of-input" string, such as "!", or you can wait for the end-of-file signal, which is sent by pressing "Ctrl+D" on a Unix console or "Ctrl+Z" on a Windows console.
The first option:
fscanf(stdin, "%99s", input); // the width keeps a long token from overflowing input[100]
if (input[0] == '!') {
break;
}
// Put input on the array...
The second option:
result = fscanf(stdin, "%99s", input);
if (result == EOF) {
break;
}
// Put input on the array...
Either way, as you read one token at a time, there is no limit on the total size of the input; only each individual token has to fit in the buffer.
Why not search the giant string for quote characters instead? Something like this:
#include <stdio.h>
#include <string.h>
int main(void)
{
char mydata[] = "\"John\",\"Smith\",\"Foo\",\"Bar\"";
char namebuffer[20];
unsigned int i, j;
int begin = 1;
unsigned int beginName, endName;
for (i = 0; i < sizeof(mydata); i++)
{
if (mydata[i] == '"')
{
if (begin)
{
beginName = i;
}
else
{
endName = i;
for (j = beginName + 1; j < endName; j++)
{
namebuffer[j-beginName-1] = mydata[j];
}
namebuffer[endName-beginName-1] = '\0';
printf("%s\n", namebuffer);
}
begin = !begin;
}
}
}
You find the first double quote, then the second, and then read out the characters in between to your name string. Then you process those characters as needed for the problem in question.

Postfix evaluation in C

I’m taking a course in C and we have to make a program for the classic Postfix evaluation problem. Now, I’ve already completed this problem in java, so I know that we have to use a stack to push the numbers into, then pop them when we get an operator, I think I’m fine with all of that stuff. The problem I have been having is scanning the postfix expression in C. In java it was easier because you could use charAt and you could use the parseInt command. However, I’m not aware of any similar commands in C. So could anyone explain a method to read each value from a string in the form:
4 9 * 0 - =
Where the equals is the signal of the end of the input.
Any help would be greatly appreciated and thank you in advance :)
Let's suppose your input is in an array of characters.
char input[] = "4 9 * 0 - =";
you can access individual characters by accessing each individual array element
if (input[4] == '*') /* deal with star */;
or you can use pointer arithmetic and parse from a different point in the input (remember to #include <stdio.h> for the prototype of sscanf)
if (sscanf(input + 2, "%d", &number) != 1) /* deal with error */;
Or, as suggested by Chris Lutz in a comment, use strtol (after the proper #include <stdlib.h>)
number = strtol(input + 2, &next, 10);
/* don't forget to check for errors! */
/* `next` now points to the character after the `long` at position 2 in the array */
C strings are arrays of chars: char[] or char*.
You can use a for loop to iterate over it and get each character by its index:
for (int i = 0; i < strlen(yourString); i++)
{
char ch = yourString[i];
// ...
}
Also there is a function, strtok() that might be helpful here for tokenizing the string:
#include <stdio.h> /* for printf; NULL is already defined by the standard headers */
#include <string.h>
char yourString[] = "4 9 * 0 - =";
char delimiters[] = " "; // could be " +*/-=" depending on your implementation
char *token = NULL;
token = strtok(yourString, delimiters);
while(token != NULL)
{
printf("current token is: %s\n", token);
// do what ever you want with the token
token = strtok(NULL, delimiters); // next token
}
With sscanf you can also find out how many items were read (the return value of sscanf is the count of successfully converted items) and how far into the string it got (using the %n format specifier).
so you could also code
int pos = 0;
int endpos = 0;
int val = 0;
if (sscanf(input + pos, "%d %n", &val, &endpos) >= 1) {
// val has been read as an integer, handle it
stack[top++] = val;
pos += endpos; // skip to next token in input
}
There are many more ways of doing that. You might want to read about lexers and parsers, e.g. with flex and bison, or antlr, etc.
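To tie the scanning techniques above to the stack, here is a minimal sketch of a complete evaluator that uses strtol to read each number and advance through the string. The name eval_postfix, the fixed stack size and the error convention (0 on success, -1 on malformed input) are assumptions made for this example. It treats '-' as an operator only, so negative literals are not supported.

```c
#include <ctype.h>
#include <stdlib.h>

#define STACK_MAX 64   /* assumed upper bound on the stack depth */

/* Hypothetical helper: evaluates a space-separated postfix expression that
   ends with '='. Stores the value in *result and returns 0, or returns -1
   if the input is malformed. */
int eval_postfix(const char *expr, long *result)
{
    long stack[STACK_MAX];
    int top = 0;
    const char *p = expr;

    for (;;) {
        while (isspace((unsigned char)*p))        /* skip blanks between tokens */
            p++;
        if (*p == '\0')
            return -1;                            /* ran out of input before '=' */
        if (*p == '=')
            break;
        if (isdigit((unsigned char)*p)) {
            char *next;
            long value = strtol(p, &next, 10);    /* next points just past the number */
            if (top == STACK_MAX)
                return -1;
            stack[top++] = value;
            p = next;
        } else {
            if (top < 2)
                return -1;                        /* an operator needs two operands */
            long b = stack[--top];
            long a = stack[--top];
            switch (*p) {
            case '+': a += b; break;
            case '-': a -= b; break;
            case '*': a *= b; break;
            case '/': if (b == 0) return -1; a /= b; break;
            default:  return -1;                  /* unknown character */
            }
            stack[top++] = a;
            p++;
        }
    }
    if (top != 1)
        return -1;                                /* leftover or missing operands */
    *result = stack[0];
    return 0;
}
```

For the expression in the question, eval_postfix("4 9 * 0 - =", &value) returns 0 and leaves 36 in value.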
