Reading a formatted input using scanf - c

I want to read some variables and their values from stdin using scanf. The input is formatted as below:
MY_VARIABLE_BEGIN
var1
var2
...
MY_VARIABLE_END
MY_VALUES_BEGIN
val1
val2
...
MY_VALUES_END
The input is composed of 2 parts:
Part 1: the names of the variables. This part is delimited by MY_VARIABLE_BEGIN and MY_VARIABLE_END.
Part 2: the values of each variable. This part is delimited by MY_VALUES_BEGIN and MY_VALUES_END.
The problem is that I don't know the number of variables and values in advance.
Can anybody help me find the right format to pass to the scanf function, or is there another way to solve the problem?
Example of input
MY_VARIABLE_BEGIN
var1
var2
MY_VARIABLE_END
MY_VALUES_BEGIN
1
5
MY_VALUES_END
I should read 2 variables var1 and var2, var1=1 and var2=5

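You can try this:
char line[256];
if (fgets(line, sizeof line, stdin)) {
    line[strcspn(line, "\n")] = '\0'; /* fgets keeps the newline; strip it */
    if (strcmp(line, "MY_VARIABLE_BEGIN") == 0) { /* strcmp returns 0 on a match */
        while (fgets(line, sizeof line, stdin)) {
            line[strcspn(line, "\n")] = '\0';
            if (strcmp(line, "MY_VARIABLE_END") == 0)
                break;
            /* . . . do something with the line */
        }
    }
}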
Not sure if it'll work.
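Building on the fgets idea, here is a minimal sketch of reading both parts into arrays. The helper names read_block and read_input, and the limits MAX_VARS and MAX_LEN, are assumptions made for illustration; they do not come from the question.

```c
#include <stdio.h>
#include <string.h>

#define MAX_VARS 64   /* assumed limit, adjust as needed */
#define MAX_LEN  256  /* assumed maximum line length */

/* Expect the begin marker on the next line, then store lines in out[]
   until the end marker. Returns the number of lines stored, or -1 if
   either marker is missing. */
int read_block(FILE *in, const char *begin, const char *end, char out[][MAX_LEN])
{
    char line[MAX_LEN];
    int n = 0;

    if (!fgets(line, sizeof line, in))
        return -1;
    line[strcspn(line, "\n")] = '\0';         /* fgets keeps the newline */
    if (strcmp(line, begin) != 0)
        return -1;
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\n")] = '\0';
        if (strcmp(line, end) == 0)
            return n;
        if (n < MAX_VARS)                     /* extra lines are dropped */
            strcpy(out[n++], line);
    }
    return -1;                                /* end marker never seen */
}

/* Read both parts. Returns the number of name/value pairs, or -1 when the
   input is malformed or the two parts differ in length. */
int read_input(FILE *in, char names[][MAX_LEN], char values[][MAX_LEN])
{
    int nvars = read_block(in, "MY_VARIABLE_BEGIN", "MY_VARIABLE_END", names);
    int nvals = read_block(in, "MY_VALUES_BEGIN", "MY_VALUES_END", values);
    return (nvars >= 0 && nvars == nvals) ? nvars : -1;
}
```

Calling read_input(stdin, names, values) on the example input should leave names[i] paired with values[i]; converting the values to numbers (for instance with strtol) is left to the caller.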

Doing it with scanf is a pain. Why not use a regular expression from C?
Here's a complete working program to show how easy it can be.
Start by reading all the data into a single string, data. I'm just using a constant here.
Compile your pattern with regcomp, then apply it to your string with regexec.
regexec fills an array of matched groups, which correspond to the (.*?) parts of the pattern.
Group 0 is of no interest in this example, as it is just the entire match.
For the other 2 groups, you get the offsets in the string of the start and end of the match.
Use strndup() to copy each group, then use strtok to split the copy on the newline \n character.
At each step of the loop, ptr points to one variable name or one value.
Note that \s and the non-greedy .*? are extensions, not part of POSIX extended regular expressions, so check that your regex library accepts them.
/* regex example. meuh on stackoverflow */
#include <stdlib.h>
#include <sys/types.h>
#include <regex.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
void pexit(const char *str){
    perror(str);
    exit(EXIT_FAILURE);
}
#define NUMMATCH (1+2) /* whole match plus 2 capture groups in pattern */
int main(void){
regex_t myexpn;
regmatch_t matches[NUMMATCH] = {0};
int rc,i;
char *data = "\n\
MY_VARIABLE_BEGIN\n\
var1 \n\
var2\n\
...\n\
MY_VARIABLE_END\n\
MY_VALUES_BEGIN\n\
val1\n\
val2\n\
...\n\
MY_VALUES_END\n\
";
char *delim = "\n";
char *pattern = "\\s*MY_VARIABLE_BEGIN\\s*(.*?)MY_VARIABLE_END.*?MY_VALUES_BEGIN\\s*(.*?)MY_VALUES_END";
/* need REG_EXTENDED to use () in pattern else \\(\\) */
rc = regcomp(&myexpn, pattern, REG_EXTENDED);
if(rc!=0)pexit("regcomp");
rc = regexec(&myexpn, data, NUMMATCH, matches, 0);
if(rc==REG_NOMATCH)printf("no match\n");
else{
for(i = 1;i<NUMMATCH;i++){ /* ignore group 0 which is whole match */
if(matches[i].rm_so!=-1){
char *dup = strndup(data+matches[i].rm_so, matches[i].rm_eo-matches[i].rm_so);
printf(" match %d %d..%d \"%s\"\n",i, (int)matches[i].rm_so, (int)matches[i].rm_eo, dup);
char *ptr = strtok(dup, delim);
while(ptr){
printf(" token: %s\n",ptr);
ptr = strtok(NULL, delim);
}
free(dup);
}
}
}
regfree(&myexpn);
}
This prints out:
match 1 19..34 "var1
var2
...
"
token: var1
token: var2
token: ...
match 2 66..80 "val1
val2
...
"
token: val1
token: val2
token: ...

Related

Reading particular character from file

I am trying to make a function that reads a text file which contains data and assign it to a variable. However some lines start with $ which need to be ignored. For example:
$ Monday test results
10 12
$ Tuesday test results
4
This is what I have so far which just prints out:
10 12
4
The code that does this is:
#include <stdio.h>
#include <stdlib.h>
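void read_data(void){
    FILE *f = fopen("testdata.txt", "r");
    if (f == NULL) {
        exit(EXIT_FAILURE);
    }
    char line[100];
    /* loop on fgets itself; testing feof() first processes the last line twice */
    while (fgets(line, sizeof line, f)) {
        if (line[0] == '$') {
            continue; /* skip comment lines */
        }
        fputs(line, stdout); /* line already ends with '\n' */
    }
    fclose(f);
}
int main(void){
    read_data();
    return 0;
}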
I have tried fgetc and have googled extensively but am still stuck ;(
**Edits
Added #include and main
What I am asking is how to assign the values, e.g. a = 10, b = 12, c = 4. I had trouble because fgets reads whole lines. I tried fgetc, but it only skipped the $ character itself, not the whole line that starts with $.
C string.h library function - strtok()
char *strtok(char *str, const char *delim)
str − The contents of this string are modified and broken into smaller strings (tokens).
delim − This is the C string containing the delimiters. These may vary from one call to another.
This function returns a pointer to the first token found in the string. A null pointer is returned if there are no tokens left to retrieve.
Copied from: https://www.tutorialspoint.com/c_standard_library/c_function_strtok.htm
#include <string.h>
#include <stdio.h>
int main () {
char str[80] = "This is - www.tutorialspoint.com - website";
const char s[2] = "-";
char *token;
/* get the first token */
token = strtok(str, s);
/* walk through other tokens */
while( token != NULL ) {
printf( " %s\n", token );
token = strtok(NULL, s);
}
return(0);
}
Output:
This is
www.tutorialspoint.com
website
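Applied to the question, a minimal sketch might combine fgets (to skip whole $ lines), strtok (to split the remaining lines on whitespace) and strtol (to convert each token). The function name read_values and the 100-character line limit are assumptions for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Collect every integer from the non-'$' lines of f into out[].
   Returns how many values were stored (at most max). */
size_t read_values(FILE *f, int *out, size_t max)
{
    char line[100];
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, f)) {
        if (line[0] == '$')
            continue;                       /* skip the whole comment line */
        for (char *tok = strtok(line, " \t\r\n"); tok && n < max;
             tok = strtok(NULL, " \t\r\n")) {
            char *end;
            long v = strtol(tok, &end, 10);
            if (*end == '\0')               /* keep only clean integers */
                out[n++] = (int)v;
        }
    }
    return n;
}
```

With the sample file, the array would hold 10, 12 and 4 in that order, which can then be assigned to a, b and c. Lines longer than the buffer are not handled by this sketch.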

C - Split string with repeated delimiter char into 2 substrings

I'm making a very simple C program that simulates the export command, getting an input with fgets().
Input example:
KEY=VALUE
Has to be converted to:
setenv("KEY", "VALUE", 1);
That's easy to solve with something similar to this code:
key = strtok(aux, "=");
value = strtok(NULL, "=");
The problem comes when the user inputs a value that starts with one or more = characters. For example:
KEY===VALUE
This should be converted to:
setenv("KEY", "==VALUE", 1);
But with my current code it is converted to:
setenv("KEY", NULL, 1);
How can I solve this?
Thanks in advance.
Your second strtok() should not use = as the delimiter. You would only do that if there were another = that ended the value. But the value ends at the end of the string. Use an empty delimiter for this part.
key = strtok(aux, "=");
value = strtok(NULL, "");
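As a rough sketch of how those two calls could be wrapped for a line read with fgets, the hypothetical helper below (split_assignment is an assumed name) also strips the trailing newline and reports a missing key or value. Note that strtok skips leading delimiters, so a line that starts with = is not handled sensibly here.

```c
#include <string.h>

/* Hypothetical helper: split "KEY=VALUE\n" in place.
   Returns 0 on success, -1 if there is no key or no value. */
int split_assignment(char *line, char **key, char **value)
{
    line[strcspn(line, "\n")] = '\0';   /* drop the newline left by fgets */
    *key = strtok(line, "=");           /* text before the first '=' */
    if (*key == NULL) {
        *value = NULL;
        return -1;
    }
    *value = strtok(NULL, "");          /* everything after it, '=' included */
    return *value ? 0 : -1;
}
```

On success the two pointers can be passed straight to setenv(key, value, 1).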
strtok is probably overkill (and non-reentrant) when it's just one token. This will do,
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv) {
char *key, *equals, *value;
if(argc != 2 || !(equals = strchr(key = argv[1], '=')))
return fprintf(stderr, "KEY=VALUE\n"), EXIT_FAILURE;
value = equals + 1;
*equals = '\0';
printf("key: <%s>; value: <%s>.\n", key, value);
return EXIT_SUCCESS;
}
Although strtok is probably easier to read. One may try strsep, but it is a BSD/GNU extension rather than standard C.

strtok: How to store tokens in two different buffers

I have a string containing datatypes and addresses of variables. These values are separated by "///" and they are alternating (type /// address /// type /// address ...). The amount of these tuples is not fixed and can vary from execution to execution.
Now my problem is how to process the string in a loop, as strtok needs to be called first with the original string and then with the NULL parameter but in the loop it has to be called twice. So after the first loop strtok is called three times which leads to an uneven count of strtok executions whereas it should be an even count. I tried to solve this problem by processing the first tuple outside the loop (because strtok has to be called with the original string) and process the remaining tuples inside the loop.
char mystring[128];
char seperator[] = "///";
char *part;
int type [128];
int address [128];
number_of_variables = 0;
part = strtok(mystring, seperator);
type[number_of_variables] = (int) atoi(part);
part = strtok(NULL, seperator);
address[number_of_variables] = (int)strtol(part, NULL, 16);
while(part != NULL){
part = strtok(NULL, seperator);
type[number_of_variables] = (int) atoi(part);
part = strtok(NULL, seperator);
address[number_of_variables] = (int)strtol(part, NULL, 16);
number_of_variables++;
}
So now I have an even count of strtok executions but if my strings contains for example 2 tuples it will enter the loop for a second time so strtok is called for a fifth time which causes the program to crash as atoi() gets a bad pointer.
EDIT:
Example for mystring:
"1///0x37660///2///0x38398"
1 and 2 are type identifiers for the further program.
I can suggest the following loop as it is shown in the demonstrative program below.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void)
{
char mystring[128] = "1///0x37660///2///0x38398";
char separator[] = "/ ";
int type [128];
int address [128];
size_t number_of_variables = 0;
for ( char *part = strtok( mystring, separator ); part; part = strtok( NULL, separator ) )
{
type[number_of_variables] = atoi(part);
part = strtok( NULL, separator );
address[number_of_variables] = part ? (int)strtol(part, NULL, 16) : 0;
++number_of_variables;
}
for ( size_t i = 0; i < number_of_variables; i++ )
{
printf( "%d\t%x\n", type[i], address[i] );
}
return 0;
}
The program output is
1 37660
2 38398
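A variation on the same loop, sketched here under the assumption that a malformed string should be rejected rather than silently padded with 0. The name parse_pairs and its return convention are illustrative only. As in the program above, the delimiter set treats any run of slashes as one separator.

```c
#include <stdlib.h>
#include <string.h>

/* Parse "type///address///type///address..." in place.
   Returns the number of complete pairs, or -1 if a type has no address
   or a field is not a valid number. */
int parse_pairs(char *s, int *types, unsigned long *addrs, int max)
{
    int n = 0;
    char *end;
    char *part = strtok(s, "/");

    while (part && n < max) {
        long type = strtol(part, &end, 10);
        if (end == part || *end != '\0')
            return -1;
        part = strtok(NULL, "/");          /* the address that belongs to it */
        if (!part)
            return -1;                     /* odd number of fields */
        unsigned long addr = strtoul(part, &end, 16);
        if (end == part || *end != '\0')
            return -1;
        types[n] = (int)type;
        addrs[n] = addr;
        n++;
        part = strtok(NULL, "/");          /* next type, or NULL at the end */
    }
    return n;
}
```

Fetching the next type at the bottom of the loop keeps the number of strtok calls even per iteration, and the loop condition stops before atoi or strtol ever sees a NULL pointer.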
You can also write a more robust parser with flex and bison, like this:
File: lexer.l
%{
#include <stdio.h>
#include "parser.tab.h"
int yyerror(const char *const message);
%}
%option noyywrap
%x IN_ADDRESS
DECIMAL [0-9]+
HEX "0x"[a-fA-F0-9]+
DELIMITER "///"
%%
<*>{DELIMITER} { return DELIMITER; }
<INITIAL>{DECIMAL} {
char *endptr;
// Make the lexer know that we are expecting a
// hex number
BEGIN(IN_ADDRESS);
// Assign the value for bison to use
yylval = strtol(yytext, &endptr, 10);
// Check conversion's success
if (*endptr != '\0')
return ERROR;
return TYPE;
}
<IN_ADDRESS>{HEX} {
char *endptr;
// Restore the initial state
BEGIN(INITIAL);
// Assign the value for bison to use
yylval = strtol(yytext, &endptr, 16);
// Check conversion's success
if (*endptr != '\0')
return ERROR;
return ADDRESS;
}
%%
File: parser.y
%{
#include <stdio.h>
extern int yylex();
extern FILE *yyin;
int yyerror(const char *const message);
#define YYSTYPE int
%}
%token TYPE
%token DELIMITER
%token ADDRESS
%token ERROR
%%
program:
| program statement
;
command: TYPE DELIMITER ADDRESS {
fprintf(stdout, "type %d, address 0x%08x\n", $1, $3);
}
;
statement: command
| statement DELIMITER command;
;
%%
int
yyerror(const char *const message)
{
return fprintf(stdout, "error: %s\n", message);
}
int
main(void)
{
yyin = fopen("program.txt", "r");
if (yyin == NULL)
return -1;
yyparse();
}
File: program.txt
1///0x37660///2///0x38398
Compiling this with gcc, bison and flex is rather simple
bison -d parser.y
flex lexer.l
gcc -Wno-unused-function -Wall -Werror lex.yy.c parser.tab.c -o parserparser
Of course, this program needs some tweaking and adjusting it to your needs should be straightforward.
Just find a simple tutorial on bison and flex to help you fully understand this code.

Segmentation fault: 11 while trying to parse string

I'm trying to parse an input string into a command string and an array of arguments strings.
I'm having some issue using strtok and strcpy, I think that my command string is not being null terminated properly which is leading to the seg fault.
#include <stdio.h>
#include <string.h>
#define delims " \t\r\n"
int main() {
char input[] = "ls -a -f";
char *buffer;
char command[256];
char arguments[256][256];
int i = 0, j;
buffer = strtok(input, delims);
strcpy(command, buffer);
printf("Command: %s\n\r", command);
while (buffer != NULL)
{
buffer = strtok(NULL, delims);
strcpy(arguments[++i], buffer);
}
buffer = strtok(NULL, delims);
for (j = 0; j < i; ++i)
{
printf("Argument[%d]: %s", j, arguments[j]);
}
return 0;
}
Current Output:
Command: ls
Segmentation fault: 11
Expected Output:
Command: ls
Argument[0]: -a
Argument[1]: -f
I don't pretend to be very good with C, so any pointers in the right direction would be extremely helpful.
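Your problem likely revolves around the line strcpy(arguments[++i], buffer);. You are incrementing i, and then using it as an array index. The first round through the loop will copy into array index 1. When you print from the loop, you start at index 0. Since you don't initialize the arrays, they're full of garbage and bad things happen when you try to print index 0 (full of garbage) as a string. Also note that the loop tests buffer before calling strtok again, so when strtok finally returns NULL, that NULL is passed straight to strcpy, which is the most likely cause of the segmentation fault. The printing loop also increments i instead of j, so it never terminates.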
Two suggestions to fix this: First, move expressions with side effects (like ++i) to a line of their own. This makes things simpler and eliminates any order-of-operations gotchas. Second, print out the arguments as soon as you read them instead of looping back through everything a second time. Since you're just printing the values, this would mean you wouldn't need an entire array to store all of the arguments. You'd only need enough buffer to store the current argument long enough to print it.
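A minimal sketch of both suggestions, assuming it is acceptable to keep pointers into the (modified) input string instead of copying each argument. The function name split_args is made up for this example.

```c
#include <string.h>

/* Split input in place on whitespace. argv[0] is the command, the rest are
   its arguments. Returns the number of tokens stored (at most max). */
size_t split_args(char *input, char *argv[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(input, " \t\r\n");

    while (tok != NULL && n < max) {   /* test the result before using it */
        argv[n] = tok;                 /* store a pointer, no copy needed */
        n++;                           /* side effect on its own line */
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

Because strtok is called at the end of the loop body, its result is always checked by the loop condition before it is used.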
the following code:
compiles cleanly
removes unneeded local variables
outputs the proper items, then quits
defines magic numbers with meaningful names
uses NUL terminated array for the delimiters for strtok()
uses the 'typical' name for the returned value of strtok()
always checks the returned value from strtok()
and now the code:
#include <stdio.h>
#include <string.h>
#define MAX_CMD_LEN (256)
#define MAX_ARGS (256)
#define MAX_ARG_LEN (256)
int main( void )
{
char input[] = "ls -a -f";
char *token;
char command[ MAX_CMD_LEN ] = {'\0'};
char arguments[ MAX_ARGS ][ MAX_ARG_LEN ] = {{'\0'}};
if ( NULL != (token = strtok(input, " \t\r\n" )) )
strcpy(command, token);
printf("Command: %s\n\r", command);
size_t count = 0;
while (count<MAX_ARGS && NULL != (token = strtok( NULL, " \t\r\n" ) ) )
{
strcpy(arguments[ count ], token);
count++;
}
for( size_t i=0; i<count; i++ ) // stop at the stored count, never past the array
{
printf("Argument[%zu]: %s\n", i, arguments[i]);
}
return 0;
} // end function: main

How do I make this shell to parse the statement with quotes around them in C?

I am trying to make this shell parse. How do I make the program implement parsing in a way so that commands that are in quotes will be parsed based on the starting and ending quotes and will consider it as one token? During the second while loop where I am printing out the tokens I think I need to put some sort of if statement, but I am not too sure. Any feedback/suggestions are greatly appreciated.
#include <stdio.h> //printf
#include <unistd.h> //isatty
#include <string.h> //strlen,sizeof,strtok
int main(int argc, char **argv[]){
int MaxLength = 1024; //size of buffer
int inloop = 1; //loop runs forever while 1
char buffer[MaxLength]; //buffer
bzero(buffer,sizeof(buffer)); //zeros out the buffer
char *command; //character pointer of strings
char *token; //tokens
const char s[] = "-,+,|, ";
/* part 1 isatty */
if (isatty(0))
{
while(inloop ==1) // check if the standard input is from terminal
{
printf("$");
command = fgets(buffer,sizeof(buffer),stdin); //fgets(string of char pointer,size of,input from where
token = strtok(command,s);
while (token !=NULL){
printf( " %s\n",token);
token = strtok(NULL, s); //checks for elements
}
if(strcmp(command,"exit\n")==0)
inloop =0;
}
}
else
printf("the standard input is NOT from a terminal\n");
return 0;
}
For an arbitrary command-line syntax, strtok is not the best function. It works for simple cases, where the words are delimited by special characters or white space, but there will come a time where you want to split something like this ls>out into three tokens. strtok can't handle this, because it needs to place its terminating zeros somewhere.
Here's a quick and dirty custom command-line parser:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int error(const char *msg)
{
printf("Error: %s\n", msg);
return -1;
}
int token(const char *begin, const char *end)
{
printf("'%.*s'\n", (int)(end - begin), begin);
return 1;
}
int parse(const char *cmd)
{
const char *p = cmd;
int count = 0;
for (;;) {
while (isspace((unsigned char)*p)) p++;
if (*p == '\0') break;
if (*p == '"' || *p == '\'') {
int quote = *p++;
const char *begin = p;
while (*p && *p != quote) p++;
if (*p == '\0') return error("Unmatched quote");
count += token(begin, p);
p++;
continue;
}
if (strchr("<>()|", *p)) {
count += token(p, p + 1);
p++;
continue;
}
if (isalnum((unsigned char)*p)) {
const char *begin = p;
while (isalnum((unsigned char)*p)) p++;
count += token(begin, p);
continue;
}
return error("Illegal character");
}
return count;
}
This code understands words separated by white-space, words enclosed in single or double quotation marks, and single-character operators. It doesn't understand escaped quotation marks inside quotes, or non-alphanumeric characters inside words, such as the dot.
The code is not hard to understand and you can extend it easily to understand double-char operators such as >> or comments.
If you want to escape quotation marks, you'll have to recognise the escape character in parse, and unescape it (and possibly other escape sequences) in token.
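One possible shape for that extension, assuming a backslash is the escape character (the parser above does not define one). Both helper names are hypothetical: find_closing_quote would replace the inner while loop in parse, and unescape would be called from token.

```c
#include <stddef.h>

/* Advance from just after an opening quote to its closing quote, stepping
   over backslash-escaped characters. Returns a pointer to the closing quote,
   or to the terminating NUL if the quote is never closed. */
const char *find_closing_quote(const char *p, int quote)
{
    while (*p && *p != quote) {
        if (*p == '\\' && p[1])
            p++;                /* skip the backslash itself */
        p++;
    }
    return p;
}

/* Copy [begin, end) into out, turning \x into x for any character x.
   Writes at most outsz - 1 characters plus a terminating NUL and returns
   the length of the result. */
size_t unescape(const char *begin, const char *end, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return 0;
    for (const char *p = begin; p < end && n + 1 < outsz; p++) {
        if (*p == '\\' && p + 1 < end)
            p++;                /* drop the backslash, keep what follows */
        out[n++] = *p;
    }
    out[n] = '\0';
    return n;
}
```

This sketch treats every escaped character literally; mapping sequences such as \n to a newline would need an extra lookup in unescape.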
First, you've declared argv to be an array of pointers to... pointers. In fact, it is an array of pointers to chars. So:
int main(int argc, char **argv){
You seem to reach for [] by habit, which led to the incorrect declaration here; the more common idiom in C/C++ is pointer syntax, e.g.:
const char* s = "-+| ";
FWIW.
Also, note that fgets() will return NULL when it hits end of file (e.g., the user types CTRL-D on *nix or CTRL-Z on DOS/Windows). You probably don't want a segmentation fault when that happens.
Also, bzero() is a nonportable function (you probably don't care in this context) and the C compiler will happily initialize an array to zeroes for you if you ask it to (possibly worth caring about; syntax demonstrated below).
Next, as soon as you allow quoted strings, the next language question that immediately arises is: "how do I quote a quote?". Then, you are immediately out of the territory that can be handled cleanly with strtok(). I'm not 100% sure how you want to break your string into tokens. Using strtok() in the way you do, I think the string "a|b" would produce two tokens, "a" and "b", making you overlook the "|". You're treating "|" and "-" and "+" like whitespace, to be ignored, which is not generally what a shell does. For example, given this command-line:
echo 'This isn''t so hard' | cp -n foo.h .. >foo.out
I would probably want to get the following list of tokens:
echo
'This isn''t so hard'
|
cp
-n
foo.h
..
>
foo.out
Usually, characters like '+' and '-' are not special for most shells' tokenizing process (unlike '|' and '&' and '<', etc. which are instructions to the shell that the spawned command never sees). They get passed onto the application that is then free to decide "'-' indicates this word is an option and not a filename" or whatever.
What follows is a version of your code that produces the output I described (which may or may not be exactly what you want) and allows either double or single-quoted arguments (trivial to extend to handle back-ticks too) that can contain quote marks of the same kind, etc.
#include <stdio.h> //printf
#include <unistd.h> //isatty
#include <string.h> //strlen,sizeof,strtok
#define MAXLENGTH 1024
int main(int argc, char **argv){
int inloop = 1; //loop runs forever while 1
char buffer[MAXLENGTH] = {'\0'}; //compiler inits entire array to NUL bytes
// bzero(buffer,sizeof(buffer)); //zeros out the buffer
char *command; //character pointer of strings
char *token; //tokens
char* rover;
const char* StopChars = "|&<> ";
size_t toklen;
/* part 1 isatty */
if (isatty(0))
{
while(inloop ==1) // check if the standard input is from terminal
{
printf("$");
token = command = fgets(buffer,sizeof(buffer),stdin); //fgets(string of char pointer,size of,input from where
if(command)
while(*token)
{
// skip leading whitespace
while(*token == ' ')
++token;
if(!*token) // strchr() below would also match the terminating NUL
break;
rover = token;
// if possible quoted string
if(*rover == '\'' || *rover == '\"')
{
char Quote = *rover++;
while(*rover)
if(*rover != Quote)
++rover;
else if(rover[1] == Quote)
rover += 2;
else
{
++rover;
break;
}
}
// else if special-meaning character token
else if(strchr(StopChars, *rover))
++rover;
// else generic token
else
while(*rover)
if(strchr(StopChars, *rover))
break;
else
++rover;
toklen = (size_t)(rover-token);
if(toklen)
printf(" %*.*s\n", (int)toklen, (int)toklen, token);
token = rover;
}
if(!command || strcmp(command,"exit\n")==0) // also stop at end of input
inloop =0;
}
}
else
printf("the standard input is NOT from a terminal\n");
return 0;
}
Regarding your specific request: commands that are in quotes will be parsed based on the starting and ending quotes.
You can use strtok() by tokenizing on the " character. Here's how:
char a[]={"\"this is a set\" this is not"};
char *buf;
buf = strtok(a, "\"");
In that code snippet, buf will contain "this is a set"
Note the use of \ to escape the " character so that it can be used as a token delimiter.
Also, this is not your main issue, but you need to:
Change this:
const char s[] = "-,+,|, "; //strtok will parse on -,+| and a " " (space)
To:
const char s[] = "-+| "; //strtok will parse on only -+| and a " " (space)
strtok() will parse out whatever you have in the delimiter string, including ","
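To see the difference, here is a small sketch that counts the tokens strtok produces for a given delimiter set. The helper count_tokens and its 128-byte working buffer are assumptions for the demonstration.

```c
#include <string.h>

/* Count the tokens strtok finds in a copy of text.
   Text longer than the buffer is truncated. */
size_t count_tokens(const char *text, const char *delims)
{
    char buf[128];
    size_t n = 0;

    strncpy(buf, text, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *t = strtok(buf, delims); t; t = strtok(NULL, delims))
        n++;
    return n;
}
```

For the input cp a,b, the delimiter string "-+| " yields the two tokens cp and a,b, while "-,+,|, " also splits on the comma and yields three.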

Resources