How to obtain a single word from a string in C?

In order to complete a program I am working on, I have to be able to put pieces of a string into a stack for later use. For example, say I had this string:
"22 15 - 2 +"
Ideally, I first want to extract 22 from the string, place it in a separate, temporary string, and then manipulate it as I would like. Here is the code that I'm using which I think would work, but it is very over-complicated.
void evaluatePostfix(char *exp){
stack *s = initStack();
char *temp_str;
char temp;
int temp_len, val, a, b, i=0, j;
int len = strlen(exp);
while(len > 0){
temp_str = malloc(sizeof(char)); //holds the string i am extracting
j=0; //first index in temp_str
temp = exp[i]; //current value in exp, incremented later on the function
temp_len = 1; //for reallocation purposes
while(!isspace(temp)){ //if a white space is hit, the full value is already scanned
if(ispunct(temp)) //punctuation will always be by itself
break; //break if it is encountered
temp_str = (char*)realloc(temp_str, temp_len+1); //or else reallocate the string to hold the new character
temp_str[j] = temp; //copy the character to the string
temp_len++; //increment for the length of temp_str
i++; //advance one value in exp
j++; //advance one value in temp_str
len--; //the number of characters left to scan is one less
temp = exp[i]; //prepare for the next loop
} //and so on, and so on...
} //more actions follow this, but are excluded
}
Like I said, overcomplicated. Is there a simpler way for me to extract these values? I can reliably depend upon there being white space between the values and characters I need to extract.

If you are free to use library functions, strtok is meant for exactly this:
#include <string.h>
#include <stdio.h>
int main()
{
char str[80] = "22 15 - 2 +";
const char s[2] = " ";
char *token;
/* get the first token */
token = strtok(str, s);
/* walk through other tokens */
while( token != NULL )
{
printf( " %s\n", token );
token = strtok(NULL, s);
}
return(0);
}
The limitation of strtok(char *str, const char *delim) is that it cannot work on several strings at the same time, because it keeps its position in the string being parsed in a hidden static pointer (so it is sufficient only if you tokenize one string at a time). A safer alternative, where it is available, is the POSIX function strtok_r(char *str, const char *delim, char **saveptr), which takes an explicit third argument in which the parse position is saved.
#include <string.h>
#include <stdio.h>
int main()
{
char str[80] = "22 15 - 2 +";
const char s[2] = " ";
char *token, *saveptr;
/* get the first token */
token = strtok_r(str, s, &saveptr);
/* walk through other tokens */
while( token != NULL )
{
printf( " %s\n", token );
token = strtok_r(NULL, s, &saveptr);
}
return(0);
}
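For the original use case (evaluating a postfix expression such as "22 15 - 2 +"), the tokens returned by strtok can be converted and pushed onto a stack as they arrive. The following is only a minimal sketch under assumptions that are not stated in the question: operands are integers, the only operators are + - * /, and a fixed-size array serves as the stack. The names eval_postfix and STACK_MAX are made up for illustration and are not the asker's stack API.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 64 /* assumed upper bound on the stack depth */

/* Hypothetical helper: evaluates a whitespace-separated postfix expression.
 * Returns 0 and stores the value in *result on success, -1 on malformed input. */
int eval_postfix(const char *expr, long *result)
{
    long stack[STACK_MAX];
    size_t top = 0;
    int status = 0;
    char *copy = malloc(strlen(expr) + 1);

    if (copy == NULL)
        return -1;
    strcpy(copy, expr); /* strtok modifies its input, so work on a copy */

    for (char *tok = strtok(copy, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        char *end;
        long value = strtol(tok, &end, 10);

        if (end != tok && *end == '\0') {
            /* the whole token is a number: push it */
            if (top == STACK_MAX) { status = -1; break; }
            stack[top++] = value;
        } else if (tok[1] == '\0' && strchr("+-*/", tok[0]) != NULL) {
            /* a single operator character: pop two operands, push the result */
            if (top < 2) { status = -1; break; }
            long b = stack[--top];
            long a = stack[--top];
            if (tok[0] == '/' && b == 0) { status = -1; break; }
            switch (tok[0]) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/': stack[top++] = a / b; break;
            }
        } else {
            status = -1; /* neither a number nor a known operator */
            break;
        }
    }

    /* a well-formed expression leaves exactly one value on the stack */
    if (status == 0 && top == 1)
        *result = stack[0];
    else
        status = -1;

    free(copy);
    return status;
}
```

Called as eval_postfix("22 15 - 2 +", &value), this should return 0 and leave 9 in value (22 - 15 = 7, then 7 + 2 = 9).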

Take a look at the strtok function; I think it's what you're looking for.

Related

How to split a string into separate words and create the array of these words in C language?

So, the task is the following:
Find the number of words in the text in which the first and last characters are the same.
In order to do this, I think I first should split the text and create the array of separate words.
For example, the string is:
"hello goodbye river dog level"
I want to split it and get the following array:
{"hello", "goodbye", "river", "dog", "level"}
I have the code that splits the string:
#include<stdio.h>
#include <string.h>
int main() {
char string[100] = "hello goodbye river dog level";
// Extract the first token
char * token = strtok(string, " ");
// loop through the string to extract all other tokens
while( token != NULL ) {
printf( " %s\n", token ); //printing each token
token = strtok(NULL, " ");
}
return 0;
}
However, it just prints these words, and I need to append each word to some array. The array shouldn't be of fixed size, because potentially I could add as many elements as the text requires. How to do this?
I don't see any reason to split into words. Just iterate the string while keeping a flag that tells whether you are inside or outside a word (i.e. a state variable). Then have variables for first and last character that you maintain as you iterate. Compare them when you go out of a word or reach end-of-string.
A simple approach could look like:
#include <stdio.h>
int count(const char* s)
{
int res = 0;
int in_word = 0;
char first = 0; // initialized only to avoid "may be used uninitialized" warnings
char last = 0;
while(*s)
{
if (in_word)
{
if (*s == ' ')
{
// Found end of a word
if (first == last) ++res;
in_word = 0;
}
else
{
// Word continues so update last
last = *s;
}
}
else
{
if (*s != ' ')
{
// Found start of new word. Update first and last
first = *s;
last = *s;
in_word = 1;
}
}
++s;
}
if (in_word && first == last) ++res;
return res;
}
int main(void)
{
char string[100] = "hello goodbye river dog level";
printf("found %d words\n", count(string));
return 0;
}
Output:
found 2 words
Note: the current code assumes that the word delimiter is always a space. It also doesn't handle punctuation such as , or . but all of that can be added fairly easily.
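One possible way to add that, as a rough sketch: treat every non-alphanumeric character as a delimiter by testing with isalnum instead of comparing against ' '. This assumes that a "word" is a maximal run of letters and digits, which may or may not match the task's definition; count_same_ends is just an illustrative name.

```c
#include <ctype.h>

/* Hypothetical variant of count(): a word is a maximal run of
 * alphanumeric characters; everything else acts as a delimiter. */
int count_same_ends(const char *s)
{
    int res = 0;
    int in_word = 0;
    char first = 0;
    char last = 0;

    for (; *s != '\0'; ++s) {
        if (isalnum((unsigned char)*s)) {
            if (!in_word) {
                /* start of a new word */
                first = *s;
                in_word = 1;
            }
            last = *s; /* last character seen so far in this word */
        } else if (in_word) {
            /* a delimiter ends the current word */
            if (first == last)
                ++res;
            in_word = 0;
        }
    }
    /* the string may end while still inside a word */
    if (in_word && first == last)
        ++res;
    return res;
}
```

With the input "hello, goodbye; river, dog; level." this counts river and level, so it returns 2. The comparison is case-sensitive, as in the code above.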
Here is a simple (but naive) implementation based on the existing strtok code. It doesn't just count the matching words but also records which ones were found, by storing a pointer to each of them in a separate array of pointers.
This works since strtok changes the string in-place, replacing spaces with null terminators.
#include <stdio.h>
#include <string.h>
int main(void)
{
char string[100] = "hello goodbye river dog level";
char* words[10]; // this is just assuming there's not more than 10 words
size_t count=0;
for(char* token=strtok(string," "); token!=NULL; token=strtok(NULL, " "))
{
if( token[0] == token[strlen(token)-1] ) // strlen(token)-1 gives index of last character
{
words[count] = token;
count++;
}
}
printf("Found: %zu words. They are:\n", count);
for(size_t i=0; i<count; i++)
{
puts(words[i]);
}
return 0;
}
Output:
Found: 2 words. They are:
river
level
Here is a version with strtok, based on Alexander's code, that also treats punctuation as delimiters:
#include <stdio.h>
#include <string.h>
int main(void)
{
char string[] = "hello, goodbye; river, dog; level.";
char *token = strtok(string, " ,;.");
int counter =0;
while( token != NULL )
{
if(token[0]==token[strlen(token)-1]) counter++;
token = strtok(NULL, " ,;.");
}
printf("found : %d", counter);
return 0;
}
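The question also asks for an array that is not of fixed size, which the answers above leave open. A minimal sketch of one common approach is to grow the pointer array with realloc as tokens arrive, doubling the capacity each time it fills up. The function name split_words and the initial capacity of 4 are arbitrary choices for illustration. As in the strtok answers above, the returned pointers point into the caller's (modified) buffer, so only the array itself has to be freed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits text in place and returns a malloc'd array of
 * pointers to the words. *count receives the number of words.
 * Returns NULL if allocation fails or if there are no words at all. */
char **split_words(char *text, const char *delims, size_t *count)
{
    char **words = NULL;
    size_t used = 0;
    size_t capacity = 0;

    for (char *tok = strtok(text, delims); tok != NULL; tok = strtok(NULL, delims)) {
        if (used == capacity) {
            /* array is full: double its capacity (start with 4 slots) */
            size_t new_capacity = capacity ? capacity * 2 : 4;
            char **grown = realloc(words, new_capacity * sizeof *grown);
            if (grown == NULL) {
                free(words);
                *count = 0;
                return NULL;
            }
            words = grown;
            capacity = new_capacity;
        }
        words[used++] = tok; /* points into text, no copy is made */
    }

    *count = used;
    return words;
}
```

The caller can then loop over the count words, compare words[i][0] with words[i][strlen(words[i]) - 1], and call free(words) when done.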

How to replace characters by strtok function - C?

I want to replace all the spaces ' ' in my char array with NULL:
#include <string.h>
void ReplaceCharactersInString(char *pcString, char *cOldChar, char *cNewChar) {
char *p = strtok(pcString, cOldChar);
strcpy(pcString, p);
while (p != NULL) {
strcat(pcString, p);
p = strtok(cNewChar, cOldChar);
}
}
int main() {
char pcString[] = "I am testing";
ReplaceCharactersInString(pcString, " ", NULL);
printf(pcString);
}
OUTPUT: Iamtesting
If I simply put the printf(p) function before:
p = strtok(cNewChar, cOldChar);
then I get the result I need, but the problem is how to store it (directly) in pcString.
Or is there perhaps a simpler way to do this?
While some functions expect a [single] string to be pre-parsed to: I\0am\0testing, that is rare.
And, if you have multiple spaces/delimiters, you'll get (e.g.) foo\0\0bar, which you probably don't want.
And your printf in main will only print the first token in the string, because it stops at the first EOS (i.e. '\0').
In other words, you probably don't want strcpy/strcat.
More likely, you want to fill an array of char * pointers to the tokens you parse.
So, you'd want to pass down char **argv, then do: argv[argc++] = strtok(...); and then do: return argc
Here's how I would refactor your code:
#include <stdio.h>
#include <string.h>
#define ARGMAX 100
int
ReplaceCharactersInString(int argmax,char **argv,char *pcString,
const char *delim)
{
char *p;
int argc;
// allow space for NULL termination
--argmax;
for (argc = 0; argc < argmax; ++argc, ++argv) {
// get next token
p = strtok(pcString,delim);
if (p == NULL)
break;
// zap the buffer pointer
pcString = NULL;
// store the token in the [returned] array
*argv = p;
}
*argv = NULL;
return argc;
}
int
main(void)
{
char pcString[] = "I am testing";
int argc;
char **av;
char *argv[ARGMAX];
argc = ReplaceCharactersInString(ARGMAX,argv,pcString," ");
printf("argc: %d\n",argc);
for (av = argv; *av != NULL; ++av)
printf("'%s'\n",*av);
return 0;
}
Here's the output:
argc: 3
'I'
'am'
'testing'
strcat and strcpy should not be used when the source and destination overlap in memory.
Iterate through the array and replace the matching character with the desired character.
Since zeros are part of the string, printf will stop at the first zero and strlen can't be used for the length to print. sizeof can be used as pcString is defined in the same scope.
Note that ReplaceCharactersInString would not work a second time as it would stop at the first zero. The function could be written to accept a length parameter and loop using the length.
#include <stdio.h>
#include <stdlib.h>
void ReplaceCharactersInString(char *pcString, char cOldChar,char cNewChar){
while ( pcString && *pcString) {//not NULL and not zero
if ( *pcString == cOldChar) {//match
*pcString = cNewChar;//replace
}
++pcString;//advance to next character
}
}
int main ( void) {
char pcString[] = "I am testing";
ReplaceCharactersInString ( pcString, ' ', '\0');
for ( int each = 0; each < sizeof pcString; ++each) {
printf ( "pcString[%02d] = int:%-4d char:%c\n", each, pcString[each], pcString[each]);
}
return 0;
}
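The length-parameter variant mentioned above could look roughly like the sketch below. The name replace_chars_n is made up for illustration. Because it loops over an explicit length instead of stopping at the first '\0', it can be called again on a buffer that already contains embedded zeros.

```c
#include <stddef.h>

/* Hypothetical variant: replaces old_char with new_char in the first
 * len bytes of buf, regardless of any '\0' bytes inside that range. */
void replace_chars_n(char *buf, size_t len, char old_char, char new_char)
{
    if (buf == NULL)
        return;
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == old_char)
            buf[i] = new_char;
    }
}
```

For an array such as char pcString[] = "I am testing"; the call would be replace_chars_n(pcString, sizeof pcString - 1, ' ', '\0'); passing sizeof pcString - 1 leaves the final terminator out of the range, so a later call that swaps '\0' back to ' ' restores the original string.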
You want to split the string into individual tokens separated by spaces such as "I\0am\0testing\0". You can use strtok() for this but this function is error prone. I suggest you allocate an array of pointers and make them point to the words. Note that splitting the source string is sloppy and does not allow for tokens to be adjacent such as in 1+1. You could allocate the strings instead.
Here is an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char **split_string(const char *str, const char *delim) {
size_t i, len, count;
const char *p;
/* count tokens */
p = str;
p += strspn(p, delim); // skip initial delimiters
count = 0;
while (*p) {
count++;
p += strcspn(p, delim); // skip token
p += strspn(p, delim); // skip delimiters
}
/* allocate token array */
char **array = calloc(count + 1, sizeof(*array)); // calloc takes (element count, element size)
if (array == NULL)
return NULL;
p = str;
p += strspn(p, delim); // skip initial delimiters
for (i = 0; i < count; i++) {
len = strcspn(p, delim); // token length
array[i] = strndup(p, len); // allocate a copy of the token
p += len; // skip token
p += strspn(p, delim); // skip delimiters
}
/* array ends with a null pointer */
array[count] = NULL;
return array;
}
int main() {
const char *pcString = "I am testing";
char **array = split_string(pcString, " \t\r\n");
for (size_t i = 0; array[i] != NULL; i++) {
printf("%zu: %s\n", i, array[i]);
}
return 0;
}
The strtok function pretty much does exactly what you want. It basically replaces the next delimiter with a '\0' character and returns the pointer to the current token. The next time you call strtok, you should pass a NULL argument (see the documentation for strtok) and it will point to the next token, which will again be delimited by '\0'. Read some more examples of correct strtok usage.

Appending words to an array based on a separator

I am trying to break up the sentence "once upon a time" into an array of words. I am doing this via a for loop, detecting three conditions:
It's the end of the loop (add the \0 and break);
It's the separator character (add the \0 and advance to the next word)
It's anything else (add the character)
Here is what I have now:
#include <stdlib.h>
#include <stdio.h>
char ** split_string(char * string, char sep) {
// Allow single separators only for now
// get length of the split string array
int i, c, array_length = 0;
for (int i=0; (c=string[i]) != 0; i++)
if (c == sep) array_length ++;
// allocate the array
char ** array_of_words = malloc(array_length + 1);
char word[100];
for (int i=0, char_num=0, word_num=0;; i++) {
c = string[i];
// if a newline add the word and break
if (c == '\0') {
word[char_num] = '\0';
array_of_words[word_num] = word;
break;
}
// if the separator, add a NUL, increment the word_num, and reset the character counter
if (c == sep) {
word[char_num] = '\0';
array_of_words[word_num] = word;
word_num ++;
char_num = 0;
}
// otherwise, just add the character in the string and increment the character counter
else {
word[char_num] = c;
char_num ++;
}
}
return array_of_words;
}
int main(int argc, char *argv[]) {
char * input_string = "Once upon a time";
// separate the string into a list of tokens separated by the separator
char ** array_of_words;
array_of_words = split_string(input_string, ' ');
printf("The array of words is: ");
// how to get the size of this array? sizeof(array_of_words) / sizeof(array_of_words[0]) gives 1?!
for (int i=0; i < 4 ;i++)
printf("%s[sep]%d", array_of_words[i], i);
return 0;
}
However, instead of printing "once", "upon", "a", "time" at the end, it's printing "time", "time", "time", "time".
Where is the mistake in my code that is causing this?
Here is a working example of the code: https://onlinegdb.com/S1ss6a4Ur
You need to allocate memory for each word, not just for one. char word[100]; only puts aside memory for one word, and once it goes out of scope, the memory is invalid. Instead, you could allocate the memory dynamically:
char* word = malloc(100);
And then, when you found a separator, allocate memory for a new word:
if (c == sep) {
word[char_num] = '\0';
array_of_words[word_num] = word;
word = malloc(100);
Also, this here is incorrect:
char ** array_of_words = malloc(array_length + 1);
You want enough memory for all the char pointers, but you only allocate 1 byte per pointer. Instead, do this:
char ** array_of_words = malloc(sizeof(char*)*(array_length + 1));
sizeof(array_of_words) / sizeof(array_of_words[0]) works for calculating the number of elements when array_of_words is an array, because then its size is known at compile time (barring VLAs). Here it is just a pointer, though, so it doesn't work: sizeof(array_of_words) gives you the size of the pointer. Instead, you'll have to keep track of the size yourself. You already compute it in the split_string function, so you just need to get that array_length out to the main function. There are multiple ways of doing this:
Have it be a global variable
Pass an int* to the function via which you can write the value to a variable in main (this is sometimes called an "out parameter")
Return it along with the other pointer you're returning by wrapping them up in a struct
Don't pass it at all and recalculate it
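The global variable solution is the simplest for this small program: just put int array_length = 0; before split_string instead of having it inside it. Keep in mind that array_length counts separators, so the number of words is array_length + 1.
Last but not least, since we used malloc to allocate memory, we should free it:
for (int i = 0; i < array_length + 1; i++) { // array_length counts separators, so there is one more word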
printf("%s[sep]%d", array_of_words[i], i);
free(array_of_words[i]); // free each word
}
free(array_of_words); // free the array holding the pointers to the words
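For comparison, the "out parameter" option from the list above might look something like the following sketch. It is not the asker's code: the name split_with_count is hypothetical, each word is copied into its own exactly sized allocation instead of a 100-byte buffer, and, like the original, it only handles single separators (two separators in a row produce an empty word).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits string at every sep character.
 * Returns a malloc'd array of malloc'd words and writes the number
 * of words to *out_count. Returns NULL on allocation failure. */
char **split_with_count(const char *string, char sep, size_t *out_count)
{
    size_t n = 1; /* number of words = number of separators + 1 */
    for (const char *p = string; *p != '\0'; ++p) {
        if (*p == sep)
            ++n;
    }

    char **words = malloc(n * sizeof *words);
    if (words == NULL)
        return NULL;

    const char *start = string;
    for (size_t i = 0; i < n; ++i) {
        const char *end = strchr(start, sep); /* NULL for the last word */
        size_t len = end ? (size_t)(end - start) : strlen(start);

        words[i] = malloc(len + 1); /* every word gets its own storage */
        if (words[i] == NULL) {
            while (i > 0)
                free(words[--i]);
            free(words);
            return NULL;
        }
        memcpy(words[i], start, len);
        words[i][len] = '\0';
        start += len + (end ? 1 : 0); /* step over the separator, if any */
    }

    *out_count = n;
    return words;
}
```

In main, this would be called as words = split_with_count(input_string, ' ', &count); after which the printing and freeing loops can run from 0 to count - 1.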
Is strtok not suitable?
char str[] = "once upon a time";
const char delim[] = " ";
char* word = strtok(str, delim);
while(word != NULL)
{
printf("%s\n", word);
word = strtok(NULL, delim);
}

How do I compare the tokenized line with the str "exit"?

I don't know how to compare the line to the word "exit" so that when the keyboard input is exit then the program will exit.
#define MAX_LINE 4096
#define MAX_WORDS MAX_LINE/2
int main()
{
char line[MAX_LINE], *words[MAX_WORDS], message[MAX_LINE];
int stop=0,nwords=0;
while(1)
{
printf("OSP CLI $ ");
fgets(line,MAX_LINE,stdin);
if(strcmp(line,"exit")==0)
{
exit(0);
}
void tokenize(char *line, char **words, int *nwords)
{
*nwords=1;
for(words[0]=strtok(line," \t\n");
(*nwords<MAX_WORDS)&&(words[*nwords]=strtok(NULL, " \t\n"));
*nwords=*nwords+1
); /* empty body */
return;
}
The code is correct, but I do not understand what it does. The first part, for(words[0]=strtok(line," \t\n");, reads the first word in the line. "line" is keyboard input that the user types in at runtime, which is just a string like: hello world blah dee doo. But I don't understand anything after that first line of the for, starting with the line containing *nwords<MAX_WORDS.
for (a; b; c) d;
Can be translated into:
a;
while (b) {
d;
c;
}
So:
void tokenize(char *line, char **words, int *nwords)
{
*nwords=1;
for(words[0]=strtok(line," \t\n");
(*nwords<MAX_WORDS)&&(words[*nwords]=strtok(NULL, " \t\n"));
*nwords=*nwords+1
); /* empty body */
return;
}
can be translated into the following (with some other improvements, e.g. for int *a; the test if (a) is the same as if (a != NULL)):
void tokenize(char *line, char **words, int *nwords)
{
*nwords = 1;
words[0] = strtok(line, " \t\n");
while (
*nwords < MAX_WORDS &&
(words[*nwords] = strtok(NULL, " \t\n")) != NULL
) {
/* the for loop's body was empty; only its increment step remains */
*nwords = *nwords + 1;
}
}
Let's make it a bit more verbose:
void tokenize(char *line, char **words, int *nwords)
{
*nwords = 1;
words[0] = strtok(line, " \t\n");
while (*nwords < MAX_WORDS) {
words[*nwords] = strtok(NULL, " \t\n");
if (words[*nwords] == NULL) {
break;
}
/* the for loop's body was empty; only its increment step remains */
*nwords = *nwords + 1;
}
}
This function is dangerous as it stands, or is probably part of something bigger: it does not check whether its arguments are null and does not handle an empty line.
words is a pointer to the first element of an array of char* pointers, which is apparently expected to be at least MAX_WORDS elements long. nwords is a pointer through which the number of tokens is returned. The caller expects this function to fill the words array with tokens from the string and to store their count in *nwords. It is assumed that all pointers are valid and not NULL, that MAX_WORDS > 0, and that line is neither empty nor made up solely of the " \t\n" delimiters we use, so that there is always a first token.
First, *nwords is initialized to 1 and the first token is extracted: words[0] = strtok(line, " \t\n");.
Then, as long as the number of tokens is lower than MAX_WORDS, the next token is extracted: words[*nwords] = strtok(NULL, " \t\n").
From the strtok manual: the value returned by strtok is "NULL if there are no more tokens". If strtok returns NULL, we have reached the end of the string, so we return from the function.
If, however, the number of tokens is lower than MAX_WORDS and we extracted another valid token, we increase the count: *nwords = *nwords + 1;
The caller is left with words filled with pointers into the line string, the memory behind nwords set to the count of tokens, and the line array modified to have terminating zeros '\0' in place of the token delimiters.
Let's re-write the code to be less terse and more readable:
void tokenize(char *line, char **words, int *nwords)
{
*nwords=1;
words[0]=strtok(line," \t\n");
while (*nwords < MAX_WORDS) {
words[*nwords] = strtok(NULL, " \t\n");
if (!words[*nwords])
break;
*nwords = *nwords + 1;
}
}
One thing which also makes this code a bit harder to understand is the fact that it always accesses the number of words indirectly, via the nwords pointer. Here's one more rewrite, without this shorthand:
void tokenize(char *line, char **words, int *nwords)
{
int wordCount = 1;
words[0]=strtok(line," \t\n");
while (wordCount < MAX_WORDS) {
words[wordCount] = strtok(NULL, " \t\n");
if (!words[wordCount])
break;
wordCount = wordCount + 1;
}
*nwords = wordCount;
}
Finally, for a pointer p, testing !p is the same as testing p == NULL. So the check if (!words[wordCount]) means "if the currently last element in words is a null pointer." That can happen when strtok returns a null pointer, indicating it has finished parsing.
Hopefully, it's a bit clearer now.
In general, the function uses strtok to extract words from line and store them into successive elements of the array words, with the number of words stored returned in nwords.
It will repeatedly extract one word, store it, and increment the word count. This continues until either:
MAX_WORDS are extracted, or
strtok returns a null pointer, meaning there are no more words left in line.
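One detail that relates to the title of the question: fgets stores the trailing newline in line, so strcmp(line,"exit") is really comparing "exit\n" with "exit" and does not report a match. A minimal sketch of one way around that, assuming the command has to be exactly exit with nothing after it; is_exit_command is a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: cuts the line at the first '\r' or '\n'
 * (if there is one) and reports whether what remains is "exit". */
int is_exit_command(char *line)
{
    line[strcspn(line, "\r\n")] = '\0'; /* strcspn returns the index of the first newline character, or the length if none */
    return strcmp(line, "exit") == 0;
}
```

It would be called right after fgets and before tokenize, since strtok overwrites the delimiters in line. Alternatively, after tokenizing, words[0] could be compared with "exit" (once it has been checked that words[0] is not NULL), because the tokens no longer contain the newline.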

How do I split elements in an array of unsigned char

I have the following:
unsigned char input[];
unsigned char *text = &input[];
I'm taking in user input as follows:
do {
printf ("Please enter an numeric message terminated by -1:\n");
fgets(input, sizeof(input), stdin);
}
while (input[0] == '\n')
Since my output will give me an array of individual characters, how
can I go about concatenating them. If I enter input such as:
14 156 23 72 122
when I try to work with it, it's breaking it into:
1 4 1 5 6 ...
In other words, when I want to pass it to a function as an unsigned char,
I want to pass '14', so the function can read the binary of 14, rather than
1, then 4, etc. Any help would be greatly appreciated!
As it stands now, your code does not compile.
You cannot declare these variables like this:
unsigned char input[];
unsigned char *text = &input[];
You need to say how big input is supposed to be. I'm not sure what you're doing with your second definition.
You also need to put a semicolon after this line
while (input[0] == '\n')
All of that aside, if the input is separated by a known delimiter, you could use strtok() instead of reading the string byte by byte.
I scrapped your program because it didn't compile. This is what I assume you're trying to do with your code, adjust accordingly:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/*
* Converts "input" separated by "delims" to an array of "numbers"
*/
size_t str_to_nums(const char* input, const char* delims, int* numbers, size_t numsize)
{
char* parsed = malloc(strlen(input) + 1); /* allocate memory for a string to tokenize */
char* tok; /* the current token */
size_t curr; /* the current index in the numbers array */
strcpy(parsed, input); /* copy the string so we don't modify the original */
curr = 0;
tok = strtok(parsed, delims);
while(tok != NULL && curr < numsize) { /* tokenize until NULL or we exceed the buffer size */
numbers[curr++] = atoi(tok); /* convert token to integer */
tok = strtok(NULL, delims); /* get the next token */
}
free(parsed);
return curr; /* return the number of tokens parsed */
}
int main(void)
{
char input[256];
int numbers[64];
size_t count, i;
puts("Please enter an numeric message terminated by -1:");
fgets(input, sizeof(input), stdin);
count = str_to_nums(input, " ", numbers, sizeof(numbers)/sizeof(*numbers)); /* string is separated by space */
for(i = 0; i < count; ++i) {
printf("%d\n", numbers[i]); /* show the results */
}
}
P.S. This is not concatenation. The phrase you're looking for is "string splitting" or "tokenizing".
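Since atoi cannot report errors, here is a rough variant of the same idea using strtol. It rests on assumptions taken from the question's wording rather than from any stated requirement: each value should fit in an unsigned char, and a token of -1 ends the message. The name parse_bytes is made up for illustration, and unlike str_to_nums above it tokenizes the caller's buffer in place.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses whitespace-separated decimal values from
 * input (which is modified by strtok) into out. Stops at the -1
 * terminator, at the first invalid or out-of-range token, or when out
 * is full. Returns the number of values stored. */
size_t parse_bytes(char *input, unsigned char *out, size_t outsize)
{
    size_t n = 0;

    for (char *tok = strtok(input, " \t\n");
         tok != NULL && n < outsize;
         tok = strtok(NULL, " \t\n")) {
        char *end;
        long v = strtol(tok, &end, 10);

        if (end == tok || *end != '\0')
            break; /* token is not a complete decimal number */
        if (v == -1)
            break; /* assumed end-of-message marker */
        if (v < 0 || v > UCHAR_MAX)
            break; /* does not fit in an unsigned char */
        out[n++] = (unsigned char)v;
    }
    return n;
}
```

For the input "14 156 23 72 122 -1" this stores the five values 14, 156, 23, 72 and 122 and returns 5, so each number can be passed on as a single unsigned char.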
Try this!
#include <string.h>
#include <stdio.h>
int main()
{
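char line[256]; /* real storage for the input; an uninitialized char * here would be undefined behavior */
char *token;
if (scanf(" %255[^\n]", line) != 1) /* read at most 255 characters, up to the newline */
return 1;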
/* get the first token */
token = strtok(line, " ");
/* walk through other tokens */
while( token != NULL )
{
printf( " %s\n", token );
token = strtok(NULL, " ");
}
return(0);
}
