strtok: How to store tokens in two different buffers - c

I have a string containing datatypes and addresses of variables. These values are separated by "///" and they are alternating (type /// address /// type /// address ...). The amount of these tuples is not fixed and can vary from execution to execution.
My problem is how to process the string in a loop: strtok must be called first with the original string and afterwards with NULL, but inside the loop it has to be called twice per tuple. After the first iteration strtok has therefore been called three times, an odd number, whereas the count should always be even. I tried to solve this by processing the first tuple outside the loop (because strtok has to be called with the original string) and the remaining tuples inside the loop.
char mystring[128];
char seperator[] = "///";
char *part;
int type [128];
int address [128];
number_of_variables = 0;
part = strtok(mystring, seperator);
type[number_of_variables] = (int) atoi(part);
part = strtok(NULL, seperator);
address[number_of_variables] = (int)strtol(part, NULL, 16);
while(part != NULL){
part = strtok(NULL, seperator);
type[number_of_variables] = (int) atoi(part);
part = strtok(NULL, seperator);
address[number_of_variables] = (int)strtol(part, NULL, 16);
number_of_variables++;
}
So now I have an even number of strtok calls, but if my string contains, for example, 2 tuples, the loop is entered a second time, so strtok is called a fifth time, returns NULL, and the program crashes because atoi() gets a bad pointer.
EDIT:
Example for mystring:
"1///0x37660///2///0x38398"
1 and 2 are type identifiers for the further program.

I suggest the following loop, shown in the demonstration program below.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void)
{
char mystring[128] = "1///0x37660///2///0x38398";
char separator[] = "/ ";
int type [128];
int address [128];
size_t number_of_variables = 0;
for ( char *part = strtok( mystring, separator ); part; part = strtok( NULL, separator ) )
{
type[number_of_variables] = atoi(part);
part = strtok( NULL, separator );
address[number_of_variables] = part ? (int)strtol(part, NULL, 16) : 0;
++number_of_variables;
}
for ( size_t i = 0; i < number_of_variables; i++ )
{
printf( "%d\t%x\n", type[i], address[i] );
}
return 0;
}
The program output is
1 37660
2 38398
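The same loop can be wrapped in a function that refuses incomplete pairs instead of storing 0 for a missing address. This is only a sketch: the name parse_pairs and its parameters are hypothetical, and it assumes the caller passes a writable buffer, since strtok() modifies its input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): splits
   "type///address///type///address..." into pairs and returns the
   number of complete pairs stored. input is modified by strtok(). */
size_t parse_pairs(char *input, int *type, long *address, size_t max_pairs)
{
    /* strtok() treats the delimiter string as a set of characters,
       so "/" splits on "///" just as "///" itself would. */
    const char *delims = "/";
    size_t count = 0;
    char *type_tok = strtok(input, delims);

    while (type_tok != NULL && count < max_pairs) {
        char *addr_tok = strtok(NULL, delims);
        if (addr_tok == NULL)
            break;                          /* type without an address */
        type[count] = atoi(type_tok);
        address[count] = strtol(addr_tok, NULL, 16);
        ++count;
        type_tok = strtok(NULL, delims);    /* start of the next pair */
    }
    return count;
}
```

Called on a buffer holding "1///0x37660///2///0x38398", this returns 2; a trailing type with no address is dropped rather than stored.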

You can also write a more robust parser with flex and bison, for example like this:
File: lexer.l
%{
#include <stdio.h>
#include "parser.tab.h"
int yyerror(const char *const message);
%}
%option noyywrap
%x IN_ADDRESS
DECIMAL [0-9]+
HEX "0x"[a-fA-F0-9]+
DELIMITER "///"
%%
<*>{DELIMITER} { return DELIMITER; }
<INITIAL>{DECIMAL} {
char *endptr;
// Make the lexer know that we are expecting a
// hex number
BEGIN(IN_ADDRESS);
// Assign the value to use by bison
yylval = strtol(yytext, &endptr, 10);
// Check conversion's success
if (*endptr != '\0')
return ERROR;
return TYPE;
}
<IN_ADDRESS>{HEX} {
char *endptr;
// Restore the initial state
BEGIN(INITIAL);
// Assign the value to use by bison
yylval = strtol(yytext, &endptr, 16);
// Check conversion's success
if (*endptr != '\0')
return ERROR;
return ADDRESS;
}
%%
File: parser.y
%{
#include <stdio.h>
extern int yylex();
extern FILE *yyin;
int yyerror(const char *const message);
#define YYSTYPE int
%}
%token TYPE
%token DELIMITER
%token ADDRESS
%token ERROR
%%
program:
| program statement
;
command: TYPE DELIMITER ADDRESS {
fprintf(stdout, "type %d, address 0x%08x\n", $1, $3);
}
;
statement: command
| statement DELIMITER command;
;
%%
int
yyerror(const char *const message)
{
return fprintf(stdout, "error: %s\n", message);
}
int
main(void)
{
yyin = fopen("program.txt", "r");
if (yyin == NULL)
return -1;
yyparse();
}
File: program.txt
1///0x37660///2///0x38398
Compiling this with gcc, bison and flex is rather simple
bison -d parser.y
flex lexer.l
gcc -Wno-unused-function -Wall -Werror lex.yy.c parser.tab.c -o parserparser
Of course, this program needs some tweaking, but adjusting it to your needs should be straightforward.
Just find a simple tutorial on bison and flex to help you fully understand this code.

Related

Reading particular character from file

I am trying to write a function that reads a text file containing data and assigns the data to variables. However, some lines start with $ and need to be ignored. For example:
$ Monday test results
10 12
$ Tuesday test results
4
This is what I have so far which just prints out:
10 12
4
The code that does this is:
#include <stdio.h>
#include <stdlib.h>
void read_data(){
FILE* f;
if (f = fopen("testdata.txt", "r")) {
char line[100];
while (!feof(f)) {
fgets(line, 100, f);
if (line[0] == '$') {
continue;
} else{
puts(line);
}
}
} else {
exit(1);
}
fclose(f);
}
int main(void){
read_data();
return 0;
}
I have tried fgetc and have googled extensively but am still stuck ;(
Edits:
Added #include and main
What I am asking is how to assign the values, like a = 10, b = 12, c = 4. I had trouble because fgets reads whole lines. I tried fgetc, but it would only skip the actual $ sign, not the whole line that the $ is on.
C string.h library function - strtok()
char *strtok(char *str, const char *delim)
str − The contents of this string are modified and broken into smaller strings (tokens).
delim − This is the C string containing the delimiters. These may vary from one call to another.
This function returns a pointer to the first token found in the string. A null pointer is returned if there are no tokens left to retrieve.
Copied from: https://www.tutorialspoint.com/c_standard_library/c_function_strtok.htm
#include <string.h>
#include <stdio.h>
int main () {
char str[80] = "This is - www.tutorialspoint.com - website";
const char s[2] = "-";
char *token;
/* get the first token */
token = strtok(str, s);
/* walk through other tokens */
while( token != NULL ) {
printf( " %s\n", token );
token = strtok(NULL, s);
}
return(0);
}
Output:
This is
www.tutorialspoint.com
website
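Applied to the question above, strtok() can split each data line on whitespace once the $ lines have been skipped. The following is a sketch under stated assumptions: the function name read_values is hypothetical, every line is assumed to fit in the 100-byte buffer, and every token on a data line is assumed to be an integer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads whitespace-separated integers from f,
   skipping every line whose first character is '$'.
   Returns how many values were stored in values[]. */
size_t read_values(FILE *f, int *values, size_t max_values)
{
    char line[100];
    size_t count = 0;

    /* Loop on fgets() itself rather than on feof(), so a failed read
       is never processed as if it were a line. */
    while (fgets(line, sizeof line, f) != NULL) {
        if (line[0] == '$')
            continue;                       /* ignored line */
        for (char *tok = strtok(line, " \t\r\n");
             tok != NULL && count < max_values;
             tok = strtok(NULL, " \t\r\n")) {
            values[count++] = atoi(tok);
        }
    }
    return count;
}
```

With the sample file from the question, values[] would end up holding 10, 12 and 4, which can then be assigned to a, b and c.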

How to extract contents between a specific character in a c string?

Say I have char ch[] = "/user/dir1/file.txt";
I want to use a loop such that:
1st iteration:
prints: "user"
2nd iteration:
prints: "dir1"
3rd iteration:
prints: "file.txt"
Reaches the end of the string and exits the loop.
You can use strtok, or its thread-safe variant strtok_r (POSIX) if you are developing a multithreaded program:
#include <stdio.h>
#include <string.h>
int main() {
char ch[] = "/user/dir1/file.txt";
// Extract the first token
char * token = strtok(ch, "/");
// loop through the string to extract all other tokens
while( token != NULL ) {
printf( "%s\n", token ); //printing each token
token = strtok(NULL, "/"); // keep using the same delimiter
}
return 0;
}
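For the thread-safe variant mentioned above, a minimal sketch with POSIX strtok_r() could look like this. The helper name print_components is hypothetical, and the code assumes a POSIX system, since strtok_r() is not part of standard C.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each '/'-separated component of path on
   its own line and returns the number of components.
   path is modified in place, exactly as with strtok(). */
int print_components(char *path)
{
    char *saveptr;  /* caller-owned state instead of strtok()'s hidden static */
    int count = 0;

    for (char *token = strtok_r(path, "/", &saveptr);
         token != NULL;
         token = strtok_r(NULL, "/", &saveptr)) {
        printf("%s\n", token);
        ++count;
    }
    return count;
}
```

Because the parsing state lives in saveptr, two threads (or two nested loops) can tokenize different strings at the same time without interfering with each other.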
Here is a "simple", portable, thread-safe solution that, unlike the approach using strtok(), does not modify the string. It can therefore be applied to string literals as well!
#include <stdio.h>
#include <string.h>
int main(void)
{
const char * s = "/user/dir1/file.txt";
for (const char * ps = s, *pe;
pe = strchr(ps, '/'), ps != pe ?printf("%.*s\n", (int) (pe - ps), ps) :0, pe;
ps = pe + 1);
}
The only limitation of this code is that the tokens within the string to be parsed may not be longer than INT_MAX characters.
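A variation on the same non-modifying idea uses strspn() and strcspn() to measure each segment. It is offered as a sketch only: the name print_segments is hypothetical, and the main difference from the loop above is that it never computes a pointer difference involving a null pointer when the last segment is reached.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every non-empty segment of s between
   '/' characters without modifying s; returns the segment count. */
int print_segments(const char *s)
{
    int count = 0;

    while (*s != '\0') {
        s += strspn(s, "/");            /* skip any run of separators */
        size_t len = strcspn(s, "/");   /* length of the next segment */
        if (len == 0)
            break;                      /* only separators were left */
        printf("%.*s\n", (int)len, s);  /* precision must fit in an int */
        s += len;
        ++count;
    }
    return count;
}
```

Like the strchr() version, it works on string literals and shares the INT_MAX limit on segment length because of the %.*s precision argument.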

Splitting a command line argument in C

I want to split the first command line argument into two different numbers. I got a segmentation fault error when running the program this way:
gcc -ansi main.c -o main
./main 6000V7000
Here is the source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
char *token;
char arr[200];
strcpy(arr, argv[1]);
token = strtok(arr, "v,V");
int firstNumber = atoi(token);
token = strtok(NULL, "v,V");
int secondNumber = atoi(token);
return 0;
}
How can I fix this problem?
You do not test if there is at least one command line argument, nor that this argument is less than 200 characters long, nor do you test the return values of strtok: you would have undefined behavior if the command is given no argument or if the argument does not contain any of the characters v, V or ,.
If you actually compile the program with gcc -ansi main.c -o main and run it with the posted argument as ./main 6000V7000, you should not get a segmentation fault... There is something you are not telling us ;)
It is always better to avoid wild assumptions: test for the unexpected to give your program defined behavior in all cases.
Here is a simpler approach for your problem using sscanf():
#include <stdio.h>
int main(int argc, char *argv[]) {
int a, b;
if (argc > 1 && sscanf(argv[1], "%d%*1[vV,]%d", &a, &b) == 2)
printf("a=%d, b=%d\n", a, b);
return 0;
}
The code and command line arguments given do not seg-fault. However if the command line argument either omits the delimiter, includes a space before the second number, or omits any argument at all, then it will fail.
The following will prevent erroneous input causing a runtime error:
if( argc > 1 )
{
strcpy( arr, argv[1]);
int firstNumber = 0 ;
int secondNumber = 0 ;
token = strtok(arr, "v,V");
if( token != NULL )
{
firstNumber = atoi(token) ;
token = strtok(NULL, "v,V") ;
if( token != NULL )
{
secondNumber= atoi(token);
}
}
}
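The same checks can also be written without strtok(), using strtol() and its end pointer to detect missing digits or a missing delimiter. This is a sketch, not the original poster's code: the function name split_numbers is hypothetical, and it assumes C99 or later (it declares a variable mid-block, which strict C89 does not allow).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses "<int><sep><int>" where sep is one of
   'v', 'V' or ','. Returns 1 on success, 0 on malformed input. */
int split_numbers(const char *arg, long *first, long *second)
{
    char *end;

    if (arg == NULL)
        return 0;
    *first = strtol(arg, &end, 10);
    if (end == arg || *end == '\0' || strchr("vV,", *end) == NULL)
        return 0;               /* no digits, or no separator after them */
    const char *rest = end + 1;
    *second = strtol(rest, &end, 10);
    if (end == rest || *end != '\0')
        return 0;               /* no digits, or trailing characters */
    return 1;
}
```

In main(), it could be called as split_numbers(argc > 1 ? argv[1] : NULL, &a, &b), so that a missing argument is reported as malformed input rather than dereferenced.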
In C89 (which -ansi selects), variables must be declared at the beginning of a block.
The following code should work:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]){
char *token;
char arr[200];
int firstNumber;
int secondNumber;
strcpy( arr, argv[1]);
token = strtok(arr, "v,V");
firstNumber = atoi(token);
token = strtok(NULL, "v,V");
secondNumber= atoi(token);
return 0;
}
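There's only one "V" in your input, but that is enough: the first call to strtok() returns "6000" and the second returns the rest of the string, "7000". strtok() returns NULL only when no further token is left, for example if the argument were "6000V" or contained no delimiter at all.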

Segmentation fault: 11 while trying to parse string

I'm trying to parse an input string into a command string and an array of arguments strings.
I'm having some issue using strtok and strcpy, I think that my command string is not being null terminated properly which is leading to the seg fault.
#include <stdio.h>
#include <string.h>
#define delims " \t\r\n"
int main() {
char input[] = "ls -a -f";
char *buffer;
char command[256];
char arguments[256][256];
int i = 0, j;
buffer = strtok(input, delims);
strcpy(command, buffer);
printf("Command: %s\n\r", command);
while (buffer != NULL)
{
buffer = strtok(NULL, delims);
strcpy(arguments[++i], buffer);
}
buffer = strtok(NULL, delims);
for (j = 0; j < i; ++i)
{
printf("Argument[%d]: %s", j, arguments[j]);
}
return 0;
}
Current Output:
Command: ls
Segmentation fault: 11
Expected Output:
Command: ls
Argument[0]: -a
Argument[1]: -f
I don't pretend to be very good with C, so any pointers in the right direction would be extremely helpful.
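Your problem likely revolves around the line strcpy(arguments[++i], buffer);. You are incrementing i, and then using it as an array index. The first round through the loop will copy into array index 1. When you print from the loop, you start at index 0. Since you don't initialize the arrays, they're full of garbage and bad things happen when you try to print index 0 (full of garbage) as a string. In addition, the while loop calls strcpy() before checking whether strtok() returned NULL, so its last iteration passes a null pointer to strcpy(), and the final for loop increments i instead of j, so j never advances.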
Two suggestions to fix this: First, move expressions with side effects (like ++i) to a line of their own. This makes things simpler and eliminates any order-of-operations gotchas. Second, print out the arguments as soon as you read them instead of looping back through everything a second time. Since you're just printing the values, this would mean you wouldn't need an entire array to store all of the arguments. You'd only need enough buffer to store the current argument long enough to print it.
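The second suggestion, printing each argument as soon as it is read, might look like the following sketch. The helper name print_command is hypothetical, and no argument array is kept at all.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints the command and each argument as soon as
   strtok() yields it; returns the number of arguments printed.
   input is modified in place. */
int print_command(char *input)
{
    const char *delims = " \t\r\n";
    int nargs = 0;
    char *token = strtok(input, delims);

    if (token == NULL)
        return 0;                       /* empty or all-whitespace input */
    printf("Command: %s\n", token);

    /* Test the result of strtok() before using it, every time. */
    while ((token = strtok(NULL, delims)) != NULL) {
        printf("Argument[%d]: %s\n", nargs, token);
        ++nargs;                        /* side effect on its own line */
    }
    return nargs;
}
```

Because the NULL check is part of the loop condition, the token is never passed to another function after strtok() has run out of input.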
the following code:
compiles cleanly
removes unneeded local variables
outputs the proper items, then quits
defines magic numbers with meaningful names
uses NUL terminated array for the delimiters for strtok()
uses the 'typical' name for the returned value of strtok()
always checks the returned value from strtok()
and now the code:
#include <stdio.h>
#include <string.h>
#define MAX_CMD_LEN (256)
#define MAX_ARGS (256)
#define MAX_ARG_LEN (256)
int main( void )
{
char input[] = "ls -a -f";
char *token;
char command[ MAX_CMD_LEN ] = {'\0'};
char arguments[ MAX_ARGS ][ MAX_ARG_LEN ] = {{'\0'}};
if ( NULL != (token = strtok(input, " \t\r\n" )) )
strcpy(command, token);
printf("Command: %s\n\r", command);
size_t i = 0;
while (i<MAX_ARGS && NULL != (token = strtok( NULL, " \t\r\n" ) ) )
{
strcpy(arguments[ i ], token);
i++;
}
for( i=0; *arguments[i]; i++ )
{
printf("Argument[%zu]: %s\n", i, arguments[i]); /* %zu is the conversion for size_t */
}
return 0;
} // end function: main

Reading a formatted input using scanf

I want to read some variables and their values from stdin using scanf. The input is formatted as below:
MY_VARIABLE_BEGIN
var1
var2
...
MY_VARIABLE_END
MY_VALUES_BEGIN
val1
val2
...
MY_VALUES_END
The input is composed of 2 parts:
part 1: the names of the variables, delimited by MY_VARIABLE_BEGIN and MY_VARIABLE_END
part 2: the value of each variable, delimited by MY_VALUES_BEGIN and MY_VALUES_END
The problem is that I don't know the number of variables and values in advance.
Can anybody help me find the right format to pass to scanf, or is there another way to solve the problem?
Example of input
MY_VARIABLE_BEGIN
var1
var2
MY_VARIABLE_END
MY_VALUES_BEGIN
1
5
MY_VALUES_END
I should read 2 variables var1 and var2, var1=1 and var2=5
You can try this
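char line[256];
if (fgets(line, sizeof(line), stdin) != NULL) {
    line[strcspn(line, "\n")] = '\0'; /* fgets() keeps the newline, so strip it */
    if (strcmp(line, "MY_VARIABLE_BEGIN") == 0) { /* strcmp() returns 0 on a match */
        while (fgets(line, sizeof(line), stdin) != NULL) {
            line[strcspn(line, "\n")] = '\0';
            if (strcmp(line, "MY_VARIABLE_END") == 0)
                break;
            // . . . do something with the line
        }
    }
}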
Not sure if it'll work.
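A fuller sketch of the same line-by-line idea tracks which section is currently open and stores names and values in parallel arrays. Everything named here is an assumption for illustration: the function read_sections, the limits MAX_VARS and MAX_LEN, and the premise that each section holds exactly one entry per line with no blank lines.

```c
#include <stdio.h>
#include <string.h>

#define MAX_VARS 64      /* assumed limits, not from the question */
#define MAX_LEN  64

/* Hypothetical helper: reads the two marker-delimited sections from f.
   names[i] receives the i-th variable name and values[i] the i-th value
   line. Returns the number of names read, or -1 if the counts differ. */
int read_sections(FILE *f, char names[][MAX_LEN], char values[][MAX_LEN])
{
    enum { OUTSIDE, IN_NAMES, IN_VALUES } state = OUTSIDE;
    char line[256];
    int n_names = 0, n_values = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';     /* drop the line ending */
        if (strcmp(line, "MY_VARIABLE_BEGIN") == 0)
            state = IN_NAMES;
        else if (strcmp(line, "MY_VALUES_BEGIN") == 0)
            state = IN_VALUES;
        else if (strcmp(line, "MY_VARIABLE_END") == 0 ||
                 strcmp(line, "MY_VALUES_END") == 0)
            state = OUTSIDE;
        else if (state == IN_NAMES && n_names < MAX_VARS)
            snprintf(names[n_names++], MAX_LEN, "%s", line);
        else if (state == IN_VALUES && n_values < MAX_VARS)
            snprintf(values[n_values++], MAX_LEN, "%s", line);
    }
    return n_names == n_values ? n_names : -1;
}
```

Passing stdin as f matches the question. The values are kept as strings here, so a caller that needs numbers would still convert them, for example with strtol().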
Doing it with scanf is a pain. Why not use a regular expression from C?
Here's a complete working program to show how easy it can be.
Start by reading all the data into a single string, data. I'm just using a constant.
Compile your pattern with regcomp, then apply it with regexec to your string.
It returns an array of matched groups which correspond to the (.*?) parts of the pattern.
Group 0 is of no interest in this example as it is just the entire data.
For the other 2 groups, you get the indexes in the string of the start and end of the match.
Use strndup() to copy these. Use strtok to split this dup on the newline \n character.
You have in ptr at each point each var and value.
/* regex example. meuh on stackoverflow */
#include <stdlib.h>
#include <sys/types.h>
#include <regex.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
void pexit(char *str){
extern int errno;
perror(str);
exit(errno);
}
#define NUMMATCH (1+2) /* max num matching capture groups in pattern */
int main(int argc, char **argv){
regex_t myexpn;
regmatch_t matches[NUMMATCH] = {0};
int rc,i;
char *data = "\n\
MY_VARIABLE_BEGIN\n\
var1 \n\
var2\n\
...\n\
MY_VARIABLE_END\n\
MY_VALUES_BEGIN\n\
val1\n\
val2\n\
...\n\
MY_VALUES_END\n\
";
char *delim = "\n";
char *pattern = "\\s*MY_VARIABLE_BEGIN\\s*(.*?)MY_VARIABLE_END.*?MY_VALUES_BEGIN\\s*(.*?)MY_VALUES_END";
/* need REG_EXTENDED to use () in pattern else \\(\\) */
rc = regcomp(&myexpn, pattern, REG_EXTENDED);
if(rc!=0)pexit("regcomp");
rc = regexec(&myexpn, data, NUMMATCH, matches, 0);
if(rc==REG_NOMATCH)printf("no match\n");
else{
for(i = 1;i<NUMMATCH;i++){ /* ignore group 0 which is whole match */
if(matches[i].rm_so!=-1){
char *dup = strndup(data+matches[i].rm_so, matches[i].rm_eo-matches[i].rm_so);
printf(" match %d %d..%d \"%s\"\n",i, matches[i].rm_so, matches[i].rm_eo, dup);
char *ptr = strtok(dup, delim);
while(ptr){
printf(" token: %s\n",ptr);
ptr = strtok(NULL, delim);
}
free(dup);
}
}
}
regfree(&myexpn);
}
This prints out:
match 1 19..34 "var1
var2
...
"
token: var1
token: var2
token: ...
match 2 66..80 "val1
val2
...
"
token: val1
token: val2
token: ...
