Working with tokenizing in c - c

I am trying to tokenize a line and put the tokens into a two-dimensional array. So far I have come up with this, but I feel I am far off:
/**
* Function to tokenize an input line into separate tokens
*
* The first arg is the line to be tokenized and the second arg points to
* a 2-dimensional string array. The number of rows of this array should be
* at least MAX_TOKENS_PER_LINE, and the number of columns (i.e., the length
* of each string) should be at least MAX_TOKEN_SIZE
*
* Returns 0 on success and negative number on failure
*/
int __tokenize(char *line, char tokens[][MAX_TOKEN_SIZE], int *num_tokens){
char *tokenPtr;
tokenPtr = strtok(line, " \t");
for(int j =0; j<MAX_TOKEN_SIZE; j++){
while(tokenPtr != NULL){
if(!(tokens[][j] = tokenPtr)){return -1;}
num_tokens++;
tokenPtr = strtok(NULL, " \t");
}
}
return 0;
}

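int __tokenize(char *line, char tokens[][MAX_TOKEN_SIZE], int *num_tokens)
{
    char *tokenPtr = strtok(line, " \t");
    int i;
    for (i = 0; tokenPtr != NULL && i < MAX_TOKENS_PER_LINE; i++)
    {
        /* tokens[i] is an array, so the token must be copied, not assigned */
        strncpy(tokens[i], tokenPtr, MAX_TOKEN_SIZE - 1);
        tokens[i][MAX_TOKEN_SIZE - 1] = '\0';
        tokenPtr = strtok(NULL, " \t");
    }
    *num_tokens = i;
    return 0;
}
Something like this should work.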

You should implement a finite state machine. I've just finished my shell command lexer/parser (LL).
See: How to write a (shell) lexer by hand
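To give an idea of what a state machine looks like here, a minimal sketch follows. It only counts tokens and has two states; the names lex_state and count_tokens are made up for illustration, and a real shell lexer would need more states (quotes, escapes, operators).

```c
#include <stddef.h>

enum lex_state { BETWEEN_TOKENS, IN_TOKEN };

/* Counts whitespace-separated tokens in line without modifying it. */
size_t count_tokens(const char *line)
{
    enum lex_state state = BETWEEN_TOKENS;
    size_t count = 0;

    for (const char *p = line; *p != '\0'; p++) {
        int is_space = (*p == ' ' || *p == '\t' || *p == '\n');
        if (state == BETWEEN_TOKENS && !is_space) {
            state = IN_TOKEN;        /* a new token starts here */
            count++;
        } else if (state == IN_TOKEN && is_space) {
            state = BETWEEN_TOKENS;  /* the current token just ended */
        }
    }
    return count;
}
```

Unlike strtok, this never writes to the input, so it also works on string literals.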

tokenPtr is not initialized - it may or may not be NULL the first time through the loop.
strtok takes 2 arguments. If you want to split on multiple chars, include them all in the 2nd string.
After the strtok call, tokenPtr points to the string you want. Now what? You need somewhere to store it. Perhaps an array of char*? Or a 2-D array of characters, as in your edited prototype.
tokens[i] is storage for MAX_TOKEN_SIZE characters. strtok() returns a pointer to a string (a sequence of one or more characters). You need to copy one into the other.
What is the inner loop accomplishing?
Note that char tokens[][MAX] is usually referred to as a 2-D array of characters. (or a 1-D array of fixed-length strings). A 2-D array of strings would be char* tokens[][MAX]
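Putting those points together, one possible version is sketched below. The two macro values are assumptions (the question does not show them), and the function is renamed tokenize_line here because identifiers starting with a double underscore are reserved for the implementation.

```c
#include <string.h>

#define MAX_TOKENS_PER_LINE 16   /* assumed value, not from the question */
#define MAX_TOKEN_SIZE      32   /* assumed value, not from the question */

/* Splits line on spaces and tabs and copies each token into its own row.
 * Modifies line. Returns 0 on success, -1 on bad arguments, too many
 * tokens, or a token that does not fit in a row. */
int tokenize_line(char *line, char tokens[][MAX_TOKEN_SIZE], int *num_tokens)
{
    if (line == NULL || tokens == NULL || num_tokens == NULL)
        return -1;

    int count = 0;
    for (char *tok = strtok(line, " \t"); tok != NULL; tok = strtok(NULL, " \t")) {
        if (count >= MAX_TOKENS_PER_LINE || strlen(tok) >= MAX_TOKEN_SIZE)
            return -1;
        strcpy(tokens[count], tok);   /* safe: length was checked above */
        count++;
    }
    *num_tokens = count;
    return 0;
}
```

There is a single loop: each strtok call yields one token, and each token gets one row of the array.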

Related

How to add words from a string to an array of strings in c

What I have is a string, let's say char input[] = "one two three"; and what I want is a function that takes in two arguments, the input string and an array of strings where I want those words to be.
For example, in pseudo code, transferWords(input, words) would take every word in the input string and put it in the string array words so that words = {"one", "two", "three"}. I can't allocate memory (malloc(), etc...) to do this since the exercise does not allow me to.
What I've tried is using pointers but this isn't useful because if I happen to access words[21] it would be reading something else:
void transfer(char input[], char *words[20]){
char *p;
int i = 0;
p = strtok(input," \t\n");
while(p != 0)
{
words[i++] = p;
p = strtok(0, " \t\n");
}
}
where words would be initialized as char *words[20] = {0}; beforehand.
How could I go about doing this?
(I am still pretty new to C and I'm not very used to it yet, so apologies if this is something obvious.)
If you are not able to resize your arrays, you must allocate them initially with a sufficient size. For any input array a, the maximum number of words is at most (n/2)+1, where n is the number of characters in a (the worst case is single-character words separated by single spaces). The maximum size of any word is n, since the input could consist of a single word. If you declare your words array with these dimensions, you can guarantee that all the words of any input will fit. You will, in many cases, waste some (or a lot of) space, but every possible word can be stored. I'm not sure how the allocation is done beforehand, but see the following code for a general description.
int n;
//First get the size of the input array...
scanf("%d", &n); //scanf needs the address of n
getchar(); //discard the newline that scanf leaves in the input
//Now we need to get the entire input and allocate our arrays
char input[n + 1]; //plus one for the null terminator
char words[(n/2) + 1][n + 1]; //(n/2) + 1 max words with a max size of n for each
//plus one on n for the null terminator
//get input...
fgets(input, sizeof input, stdin); //reads at most n characters
//Now you can run your function
The generally more intuitive way of doing this is to use malloc and realloc to grow your array dynamically so you don't waste so much space, but since you explicitly said you cannot do that, this works as well: it over-allocates, but guarantees that every possible combination of words can be stored.
Then, to move the strings from the input to the words array, use strcpy to copy the individual words to the words array.
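//words is a 2-D array with rows of n + 1 chars, so the row length
//has to be part of the parameter type (a char ** would not match it)
void transfer(char *input, int n, char words[][n + 1]){
    char *p;
    int i = 0;
    p = strtok(input," \t\n");
    while(p != NULL)
    {
        strcpy(words[i++], p); //a word is at most n characters, so it fits
        p = strtok(NULL, " \t\n");
    }
}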
As a hint, give your functions and parameters more meaningful names.
For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*
* Breaks the string str into words (delimited by whitespace)
* and stores pointers to them in the array words. Modifies str.
*
* @param str a null-terminated string, must not be NULL
* @param words an array of char pointers, must not be NULL
* @param length the size of the array words, must be >0
*
* @return the number of words stored (at most length)
*/
int split(char *str, char *words[], unsigned length)
{
int i=0;
for (; i < length; ++i, str = NULL) {
words[i] = strtok(str, "\r\n\t\f ");
if (words[i] == NULL)
break;
}
return i;
}
int main()
{
#define N 20
char *words[N];
char *input = strdup("one two three");
int num = split(input, words, N);
printf("%d\n", num);
free(input);
return 0;
}

C programming arrays big level

So I'm reading a file of strings, and I want to tokenize each string whenever I come to a whitespace/newline. I am able to separate the tokens at the delimiters, but I'm not able to copy them into an array.
int lexer(FILE* file){
char line[50];
char* delim;
int i = 0;
int* intptr = &i;
while(fgets(line,sizeof(line),file)){
printf("%s\n", line);
if(is_empty(line) == 1)
continue;
delim = strtok(line," ");
if(delim == NULL)
printf("%s\n", "ERROR");
while(delim != NULL){
if(delim[0] == '\n'){
//rintf("%s\n", "olala");
break;
}
tokenArray[*intptr] = delim;
printf("Token IN array: %s\n", tokenArray[*intptr]);
*intptr = *intptr + 1;
delim = strtok(NULL, " ");
}
If I run this I get the output:
Token IN array: 012
Token IN array: 23ddd
Token IN array: vs32
Token IN array: ,344
Token IN array: 0sdf
which is correct according to my text file, but when I try to reprint the array at a later time in the same function and out
*intptr = *intptr + 1;
delim = strtok(NULL, " ");
}
}
printf("%s\n", tokenArray[3]);
fclose(file);
return 0;
I don't get any output. I tried writing all the contents of the array to a text file, and I got gibberish. I don't know what to do, please help.
First, your pointer to i is useless. Why not use i directly?
I'll assume that from now on.
Then, the real problem: strtok does not allocate the tokens for you. It returns pointers into line, and line is overwritten by every fgets call, so the pointers you stored end up referring to whatever the buffer holds last. You have to allocate a copy of each token that strtok returns.
Something like this would help:
tokenArray[i] = strdup(delim);
(instead of tokenArray[*intptr] = delim;). Note that I have replaced the index with i. Just do i++ afterwards.
BTW I wouldn't recommend using strtok for anything other than quick hacks. This function keeps internal state, so if you call several functions that use it in different parts of your program, they can conflict (I made that mistake a long time ago). Check the manual for strtok_r in that case (r for reentrant).
tokenArray[*intptr] = delim;
In this line, delim is a pointer into the line buffer, whose content changes on every iteration of the read loop. So in your case, the content that delim points to should be copied, and the copy stored in tokenArray[*intptr], that is:
tokenArray[*intptr] = strdup(delim);
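Here is a small sketch of the copy-per-token idea that does not depend on the rest of the lexer. The helper names copy_string and append_tokens are made up for illustration; copy_string does what POSIX strdup does, written with malloc so it also builds where strdup is unavailable.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of s, or NULL if the allocation fails. */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Tokenizes line (modifying it) and appends a private copy of each token
 * to out, starting at index count and never storing more than max entries.
 * Returns the new count, or -1 if an allocation fails. */
int append_tokens(char *line, char *out[], int count, int max)
{
    for (char *tok = strtok(line, " \n"); tok != NULL && count < max;
         tok = strtok(NULL, " \n")) {
        out[count] = copy_string(tok);
        if (out[count] == NULL)
            return -1;
        count++;
    }
    return count;
}
```

Because every entry is its own allocation, the array stays valid after the line buffer is reused for the next fgets call. Each entry has to be released with free when you are done with it.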

Fill char*[] from strtok

I have problems getting the following code to work. It parses a user's input into a char*[] and returns it. However, the char* commands[] does not accept any values and stays filled with NULL... what's going on here?
void* setCommands(int length){
char copy[strlen(commandline)]; //commandline is a char* read with gets();
strcpy(copy, commandline);
char* commands[length];
for (int x=0; x<length; x++)
commands[x] = "\0";
int i = 0;
char* temp;
temp = strtok (copy, " \t");
while (temp != NULL){
commands[i] = temp; //doesnt work here.. commands still filled with NULL afterwards
i++;
printf("word:%s\n", temp);
temp = strtok (NULL, " \t");
}
commands[i] = NULL;
for (int u=0; u<length; u++)
printf("%s ", commands[i]);
printf("\n");
return *commands;
}
You may assume, that commandline != NULL, length != 0
commands[i] = NULL;
for (int u=0; u<length; u++)
printf("%s ", commands[i]);
Take a very good look at that code. It uses u as the loop control variable but prints out the element based on i.
Hence, due to the fact you've set commands[i] to NULL in the line before the loop, you'll just get a series of NULLs.
Use commands[u] in the loop rather than commands[i].
In addition to that:
void* setCommands(int length){
char* commands[length];
:
return *commands;
}
will only return one pointer, the one to the first token, not the one to the array of token pointers. You cannot return addresses of local variables that are going out of scope (well, you can, but it may not work).
And, in any case, since that one pointer most likely points to yet another local variable (somewhere inside copy), it's also invalid.
If you want to pass back blocks of memory from functions, you'll need to look into using malloc, in this case both for the array of pointers and the strings themselves.
You have a number of issues... Your program will be exhibiting undefined behaviour currently, so until you address the issues you cannot hope to predict what's going on. Let's begin.
The following string is one character too short. You forgot to include a character for the string terminator ('\0'). This will lead to a buffer overrun during tokenising, which might be partly responsible for the behaviour you are seeing.
char copy[strlen(commandline)]; // You need to add 1
strcpy(copy, commandline);
The next part is your return value, but it's a temporary (local array). You are not allowed to return this. You should allocate it instead.
// Don't do this:
char* commands[length];
for (int x=0; x<length; x++)
commands[x] = "\0"; // And this is not the right way to zero a pointer
// Do this instead (calloc will zero your elements):
char ** commands = calloc( length, sizeof(char*) );
It's possible for the tokenising loop to overrun your buffer because you never check for length, so you should add in a test:
while( temp != NULL && i < length )
And because of the above, you can't just blindly set commands[i] to NULL after the loop. Either test i < length or just don't set it (you zeroed the array beforehand anyway).
Now let's deal with the return value. Currently you have this:
return *commands;
That returns a pointer to the first token in your temporary string (copy). Firstly, it looks like you actually intended to return an array of tokens, not just the first token. Secondly, you can't return a temporary string. So, I think you meant this:
return commands;
Now, to deal with those strings... There's an easy way, and a clever way. The easy way has already been suggested: you call strdup on each token before shoving them in memory. The annoying part of this is that when you clean up that memory, you have to go through the array and free each individual token.
Instead, let's do it all in one hit, by allocating the array AND the string storage in one call:
char **commands = malloc( length * sizeof(char*) + strlen(commandline) + 1 );
char *copy = (char*)(commands + length);
strcpy( copy, commandline );
The only thing I didn't do above is zero the array. You can do this after the tokenising loop, by just zeroing the remaining values:
while( i < length ) commands[i++] = NULL;
Now, when you return commands, you return an array of tokens which also contains its own token storage. To free the array and all strings it contains, you just do this:
free( commands );
Putting it all together:
void* setCommands(int length)
{
// Create array and string storage in one memory block.
char **commands = malloc( length * sizeof(char*) + strlen(commandline) + 1 );
if( commands == NULL ) return NULL;
char *copy = (char*)(commands + length);
strcpy( copy, commandline );
// Tokenise commands
int i = 0;
char *temp = strtok(copy, " \t");
while( temp != NULL && i < length )
{
commands[i++] = temp;
temp = strtok(NULL, " \t");
}
// Zero any unused tokens
while( i < length ) commands[i++] = NULL;
return commands;
}
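For reference, here is a variant of the same idea that does not rely on a global. This is only a sketch: the name split_commandline and the choice to pass the input as a parameter are my assumptions, not part of the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Returns an array of length token pointers (unused slots are NULL),
 * followed in the same allocation by the private copy of the string that
 * the tokens point into. Release everything with a single free().
 * Returns NULL if the allocation fails. */
char **split_commandline(const char *commandline, size_t length)
{
    char **commands = malloc(length * sizeof *commands + strlen(commandline) + 1);
    if (commands == NULL)
        return NULL;

    char *copy = (char *)(commands + length);  /* string storage follows the pointers */
    strcpy(copy, commandline);

    size_t i = 0;
    for (char *tok = strtok(copy, " \t"); tok != NULL && i < length;
         tok = strtok(NULL, " \t"))
        commands[i++] = tok;

    while (i < length)          /* zero any unused slots */
        commands[i++] = NULL;
    return commands;
}
```

Note that if the input has length or more tokens, every slot is used and there is no NULL at the end. If the caller relies on a NULL terminator, pass a length one larger than the maximum number of tokens expected.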

Double pointer to char[]

Alright, so I have the following code:
char** args = (char**)malloc(10*sizeof(char*));
memset(args, 0, sizeof(char*)*10);
char* curToken = strtok(string, ";");
for (int z = 0; curToken != NULL; z++) {
args[z] = strdup(curToken);
curToken = strtok(NULL, ";");
}
I want every args[z] cast into an array of chars -- char string[100] -- and then processed in the algorithms I have following. Every args[z] needs to be copied into the variable string at some point. I am confused by pointers, but I am slowly getting better at them.
EDIT:
char string[100] = "ls ; date ; ls";
arg[0] will be ls, arg[1] will be date, and arg[2] will be ls after the above code.
I want to put each argument back into char string[100] and process it through algorithms.
One easy way is to keep a backup of the original string in a temporary variable.
char string[100] = "ls ; date ; ls";
char temp_str[100] = {0};
strcpy (temp_str, string);
Another way is to rebuild it with strcat. z holds the number of arguments.
memset(string, '\0', 100);
for (int i = 0; i < z; i++)
{
    strcat(string, args[i]);
    if (i != (z - 1))
    {
        //not the last string, so append a semicolon
        strcat(string, ";");
    }
}
Note: check before each strcat that the combined length still fits in the 100-byte buffer.
If you want the parts of string copied into a fixed length string[100] then you need to malloc 100 chars for each args[] inside the loop and strncpy() the result of strtok into it. strdup will only allocate enough memory for the actual length of the supplied string (plus \0)
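If the goal is only to process one argument at a time in the existing char string[100], a bounded copy per argument may be enough, with no extra allocation. A sketch, where load_arg and STRING_SIZE are names I made up for illustration:

```c
#include <stdio.h>

#define STRING_SIZE 100   /* size of the question's char string[100] */

/* Copies arg into string, always null-terminating the result.
 * Returns 0 on success, -1 if arg did not fit and was truncated. */
int load_arg(char string[STRING_SIZE], const char *arg)
{
    int needed = snprintf(string, STRING_SIZE, "%s", arg);
    return (needed >= 0 && needed < STRING_SIZE) ? 0 : -1;
}
```

You would call load_arg(string, args[z]) inside a loop over the arguments and run the processing on string after each call.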
This:
char** args = (char**)malloc(10*sizeof(char*));
memset(args, 0, sizeof(char*)*10);
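is not strictly portable code. First, you shouldn't cast malloc()'s return value in C. Second, args is a pointer to ten pointers to char. Setting them to NULL using memset() relies on "all bytes zero" being the representation of NULL, which the C standard does not guarantee (even though it holds on most common platforms). To be strictly portable, you need to use a loop.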
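A minimal sketch of the loop version, with a hypothetical helper name (alloc_arg_slots):

```c
#include <stdlib.h>

/* Allocates n char pointers and sets each one to NULL explicitly.
 * Returns NULL if the allocation fails. */
char **alloc_arg_slots(size_t n)
{
    char **args = malloc(n * sizeof *args);   /* no cast needed in C */
    if (args == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        args[i] = NULL;
    return args;
}
```

Writing sizeof *args instead of sizeof(char*) keeps the size correct even if the element type changes later.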

Tokenize the string in c

I need to tokenize a string in C. Suppose I have a string like this:
"product=c,author=dennis,category=programming".
I want to extract only the values from these key-value pairs, like
[c,dennis,programming].
I have used the strtok function, which tokenizes on "=", and I get the values
[product,c,author,dennis,category,programming].
Is there any built-in function that can generate only the values, as shown above?
Just a simple scanf
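#include <stdio.h>
int main(void)
{
    char token[20] = { 0 };
    int i = 0;
    /* %19[a-z] reads a run of lowercase letters (at most 19, so token
       cannot overflow); %*[^a-z] then skips up to the next letter */
    while (scanf("%19[a-z]%*[^a-z]", token) == 1) {
        i++;
        if (i % 2 == 0) /* every second token is a value */
            printf("[%s]\n", token);
    }
    return 0;
}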
./a.out
product=c,author=dennis,category=programming,
[c]
[dennis]
[programming]
Ctrl+D
Note: I have added a , at the end of the string.
You could simply skip every second token like that:
#include <stdio.h>
#include <string.h>
int main(void) {
char str[] = "product=c,author=dennis,category=programming";
char* p = strtok(str, ",=");
while (p != NULL) {
p = strtok(NULL, ",=");
if (p != NULL) {
printf("%s\n", p);
strtok(NULL, ",="); // skip this
}
}
return 0;
}
I can think of a couple of ways:
First tokenize on ,, then split each part on the =.
Find the first =, then the , after it, and get the word in between. Repeat.
If there are always three values, you can use sscanf to read the values.
You can use a regex library to parse the string.
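As a rough illustration of the second option (find each =, then the , after it), here is a sketch that uses strchr and strcspn and leaves the input untouched. The function name extract_values and the bracketed output format are my own choices for the example.

```c
#include <string.h>

/* Writes "[v1,v2,...]" into out for input of the form "k1=v1,k2=v2,...".
 * The input is not modified. Returns the number of values found,
 * or -1 if out is too small. */
int extract_values(const char *input, char *out, size_t out_size)
{
    size_t used = 0;
    int count = 0;

    if (out_size < 3)                 /* need room for "[]" and '\0' */
        return -1;
    out[used++] = '[';

    const char *p = input;
    while ((p = strchr(p, '=')) != NULL) {
        p++;                                   /* first character of the value */
        size_t len = strcspn(p, ",");          /* value runs up to ',' or the end */
        size_t sep = (count > 0) ? 1 : 0;      /* comma before all but the first */
        if (used + sep + len + 2 > out_size)   /* keep room for ']' and '\0' */
            return -1;
        if (sep)
            out[used++] = ',';
        memcpy(out + used, p, len);
        used += len;
        p += len;
        count++;
    }
    out[used++] = ']';
    out[used] = '\0';
    return count;
}
```

Since nothing is written to the input, this also works when the data is a string literal.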
You can first tokenize on ,, splitting the contents into 3 different strings, then tokenize on '=' for each of those strings:
char *kvpair[N] = {NULL}; // where N is large enough for the expected
// number of key-value pairs
char *tok = strtok(input, ",");
size_t kvcount = 0;
while (tok != NULL && kvcount < N)
{
kvpair[kvcount++] = tok;
tok = strtok(NULL, ",");
}
...
char delim = '['; // printed before the first value; ',' before the rest
for (size_t i = 0; i < kvcount; i++)
{
    char *key = strtok(kvpair[i], "=");
    char *val = strtok(NULL, "=");
    printf("%c%s", delim, val);
    delim = ',';
}
putchar(']');
This is just a rough sketch; it assumes that the maximum number of key-value pairs is known ahead of time, it doesn't attempt to handle empty keys or values, or really do any sort of error handling at all. But it should point you in the right direction.
Remember that strtok modifies its input; if your original data is a string literal or if you need to preserve the original data, you'll need to make a copy and work on that copy.
Note that, because of how strtok works, you can't "nest" calls; that is, you can't tokenize the first key-value pair, then split it into key and value tokens, then get the next key-value pair. You'll have to tokenize all the key-value pairs first, then process each one in turn.
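If POSIX is available, strtok_r lifts that restriction, because each tokenization keeps its position in a variable supplied by the caller. A sketch under that assumption (strtok_r is POSIX, not ISO C; the function name split_values is made up for the example):

```c
#define _POSIX_C_SOURCE 200809L   /* strtok_r is POSIX, not ISO C */
#include <string.h>

/* Splits input (modified in place) of the form "k1=v1,k2=v2,..." and
 * stores a pointer to each value in values, up to max entries.
 * Returns the number of values stored. */
size_t split_values(char *input, char *values[], size_t max)
{
    size_t count = 0;
    char *outer_state;   /* position in the list of pairs */

    for (char *pair = strtok_r(input, ",", &outer_state);
         pair != NULL && count < max;
         pair = strtok_r(NULL, ",", &outer_state)) {
        char *inner_state;   /* position inside the current pair */
        char *key = strtok_r(pair, "=", &inner_state);
        char *val = strtok_r(NULL, "=", &inner_state);
        if (key != NULL && val != NULL)   /* skip pairs without a value */
            values[count++] = val;
    }
    return count;
}
```

The stored pointers refer into input, so input has to stay alive for as long as the values are used.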
