How to store each sentence as an element of an array? - c

So, suppose I have an array (program asks me to write some text):
char sentences[] = "The first sentence.The second sentence.The third sentence";
And I need to store each sentence as an array, where I can have access to any word, or to store the sentences in a single array as elements.
(sentences[0] = "The first sentence"; sentences[1] = "The second sentence";)
How to print out each sentence separately I know:
char* sentence_1 = strtok(sentences, ".");
char* sentence_2 = strtok(NULL, ".");
char* sentence_3 = strtok(NULL, ".");
printf("#1 %s\n", sentence_1);
printf("#2 %s\n", sentence_2);
printf("#3 %s\n", sentence_3);
But how to make program store those sentences in 1 or 3 arrays I have no idea.
Please, help!

If you keep everything in main, the sentences array stays valid until the program ends (it is never freed), so you can simply do this:
#include <string.h>
#include <stdio.h>
int main()
{
char sentences[] = "The first sentence.The second sentence.The third sentence";
char* sentence[3];
unsigned int i;
sentence[0] = strtok(sentences, ".");
for (i=1;i<sizeof(sentence)/sizeof(sentence[0]);i++)
{
sentence[i] = strtok(NULL, ".");
}
for (i=0;i<sizeof(sentence)/sizeof(sentence[0]);i++)
{
printf("%u: %s\n",i,sentence[i]);
}
return 0;
}
In the general case, you first have to duplicate your input string:
char *sentences_dup = strdup(sentences);
sentence[0] = strtok(sentences_dup, ".");
There are several reasons for that:
you don't know the lifespan/scope of the input, which is generally a pointer or a parameter, so your sentence pointers could become invalid as soon as the input memory is freed or goes out of scope
the passed buffer may be const: you cannot modify its memory (strtok modifies the passed buffer)
replace sentences[] with *sentences in the example above and you are pointing to a read-only string literal: you have to make a copy of the buffer.
Don't forget to keep the duplicated pointer, because you will need to free it at some point.
Another alternative is to also duplicate each token:
for (i=1;i<sizeof(sentence)/sizeof(sentence[0]);i++)
{
sentence[i] = strdup(strtok(NULL, "."));
}
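so you can free your big tokenized string at once, and the sentences have their own, independent memory. (If the input may hold fewer sentences than expected, check the strtok result first: passing NULL to strdup is undefined behaviour.)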
EDIT: the remaining problem here is that you still have to know in advance how many sentences there are in your input.
For that, you could count the dots, and then allocate the proper number of pointers.
int j,nb_dots=0;
char pathsep = '.';
int nb_sentences;
int len = strlen(sentences);
char** sentence;
// first count how many dots we have
for (j=0;j<len;j++)
{
if (sentences[j]==pathsep)
{
nb_dots++;
}
}
nb_sentences = nb_dots+1; // one more!!
// allocate the array of strings
sentence=malloc((nb_sentences) * sizeof(*sentence));
Now that we have the number of strings, we can perform our strtok loop. Just be careful to use nb_sentences and not sizeof(sentence)/sizeof(sentence[0]), which is now meaningless (it evaluates to 1) because sentence is now a pointer rather than an array. Note that malloc requires #include <stdlib.h>.
But at this point you could also get rid of strtok completely, as proposed in another answer of mine.
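Putting the pieces of this answer together (count the dots, allocate the pointer array, tokenize a private copy, duplicate each token) could look roughly like the sketch below. The names split_sentences and free_sentences are made up for illustration, and malloc plus memcpy stand in for strdup so that only ISO C functions are used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the answer): splits text on '.' and returns
   a malloc'ed array of *count independently allocated strings, or NULL on
   allocation failure. */
char **split_sentences(const char *text, size_t *count)
{
    assert(text != NULL && count != NULL);
    *count = 0;

    /* Count the dots: dots + 1 is an upper bound on the number of pieces
       (strtok skips empty pieces, so fewer may come out). */
    size_t max = 1;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '.')
            max++;
    }

    /* Work on a private copy, because strtok writes '\0' into its input. */
    size_t len = strlen(text);
    char *work = malloc(len + 1);
    char **out = malloc(max * sizeof *out);
    if (work == NULL || out == NULL) {
        free(work);
        free(out);
        return NULL;
    }
    memcpy(work, text, len + 1);

    for (char *tok = strtok(work, "."); tok != NULL; tok = strtok(NULL, ".")) {
        size_t n = strlen(tok) + 1;
        char *copy = malloc(n);
        if (copy == NULL) {
            /* Undo the partial work before reporting failure. */
            while (*count > 0)
                free(out[--*count]);
            free(out);
            free(work);
            return NULL;
        }
        memcpy(copy, tok, n);
        out[(*count)++] = copy;
    }

    free(work);   /* the pieces no longer depend on the working copy */
    return out;
}

/* Releases everything returned by split_sentences. */
void free_sentences(char **sentences, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(sentences[i]);
    free(sentences);
}
```

A caller would pass the text and the address of a size_t, loop from 0 to count - 1 to use the sentences, and finish with free_sentences. Because the input is only read, a string literal can be passed directly.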

Related

Can't understand the difference between declaring a pointer char *str as str/&str?Whats the difference and what does it do?

I will say honestly, this isn't my code. It's my brother's, who's studying with me but he's ahead of me.
Please notice char *str and char *resultString in the function char *replaceWord().
/*Suppose you have a template letter.txt. You have to fill in values to a template. Letter.txt looks something like this:
Thanks {{name}} for purchasing {{item}} from our outlet {{outlet}}. Please visit our outlet {{outlet}} for any kind of problems. We plan to serve you again soon.
You have to write a program that will automatically fill the template.For this, read this file and replace these values:
{{name}} - Harry
{{item}} - Table Fan
{{outlet}} - Ram Laxmi fan outlet
Use file functions in c to accomplish the same.*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char * replaceWord(const char * str, const char * oldWord, const char * newWord)
{
char * resultString;
int i, count = 0;
int newWordLength = strlen(newWord);
int oldWordLength = strlen(oldWord);
for (i = 0; str[i] != '\0'; i++)
{
if (strstr(&str[i], oldWord) == &str[i])
{
count++;
//Jumping over the word and continuing
i = i + oldWordLength - 1;
}
}
//dynamically allocation memory to resultString since it can be big or samll depending on the size of the newWord.
/*i = old string length , count = no. of times the word appeared in the string,
newWordLength-oldWordLength=difference between the new word and the old word
+1 for the null character '\0'
Basically we are saying that add the size required for the newWord to the strings length i.e i;
*/
resultString = (char *)malloc(i + count * (newWordLength - oldWordLength) + 1);
i = 0; //refreshing the i for the while loop
while (*str)
{
if (strstr(str, oldWord) == str)
{
strcpy(&resultString[i], newWord);
i += newWordLength;
str += oldWordLength;
}
else
{
resultString[i] = *str;
i+=1;
str+=1;
}
}
resultString[i] = '\0';
return resultString;
}
int main()
{
FILE *ptr = NULL;
FILE *ptr2 = NULL;
ptr = fopen("letter.txt", "r"); //where the template is stored
ptr2 = fopen("newLetter.txt", "w"); //where the new bill will be stored.
char str[200];
fgets(str, 200, ptr); //store the bill template in the str variable.
printf("The original bill template is : %s\n", str);
//Calling the replacing function
char *newStr = str; //newStr will store the new bill i.e generated
newStr = replaceWord(str, "{{name}}", "Mary");
newStr = replaceWord(newStr, "{{item}}", "Waffle Machine");
newStr = replaceWord(newStr, "{{outlet}}", "Belgium Waffle");
printf("\nThe bill generated is:\n%s", newStr);
fprintf(ptr2, "%s", newStr);
fclose(ptr);
fclose(ptr2);
return 0;
}
Can someone explain why the pointer *str and *resultString are expressed different ways in the program and what are they doing? Sometimes it's *str, &str or str[i].
Please explain.
I know that a pointer is used to keep the address of the other variables but this code is still a mystery to me.
Also why was the function a pointer?
NOTE:"He said that's how it works" when I asked how.
Please help!! I can't focus on other things.
If you can't explain ;a link of explanation would be fine as well.
Sometimes it's *str, &str or str[i]
Those are operators.
*str
str is a pointer to a char, and putting a * in front of it dereferences it, meaning it fetches the value from the memory it points to. A pointer does not always point to a variable, though; it can hold any arbitrary memory address. But dereferencing memory that is not yours is undefined behaviour and typically results in a segmentation fault, which is my most beloved error and occurs almost every time when processing arrays.
str[i]
This is the same as *(str + i). It advances the memory address by i * sizeof(<datatype of what str points to>) bytes, then fetches the value at that address. This is used for getting elements of an array.
&str
This just gives the address of the variable str, which is itself a pointer. So it returns a pointer to a pointer (i.e. to str). A pointer to a pointer can exist. Note that the code in the question writes &str[i], which is something else: the address of the i-th character, i.e. str + i.
The function is not a pointer. Instead, it returns a pointer, namely resultString, so that a string can be returned. The memory for that string is allocated in this line:
resultString = (char *)malloc(i + count * (newWordLength - oldWordLength) + 1);
The comment explaining this is not complete.
//dynamically allocation memory to resultString since it can be big or samll depending on the size of the newWord.
/*i = old string length , count = no. of times the word appeared in the string,
newWordLength-oldWordLength=difference between the new word and the old word
+1 for the null character '\0'
Basically we are saying that add the size required for the newWord to the strings length i.e i;
*/
It also misses one key reason why malloc is used instead of a normal local array. malloc allocates memory on the heap, where it stays valid until it is freed, no matter which function allocated it. A normal local variable lives on the stack and is destroyed when the function returns, so it cannot be used after the function ends. That is why the result has to be on the heap. malloc is also what makes it possible to choose the size at run time.
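As a rough illustration of the operators discussed above, the sketch below checks the identities between *str, str[i], &str[i] and &str, and shows a function returning heap memory. Both function names are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the pointer identities described above
   hold for str at index i (i must be a valid index into the string). */
int pointer_identities_hold(const char *str, size_t i)
{
    assert(str != NULL && i < strlen(str));
    const char **pp = &str;          /* &str: address of the pointer variable */
    return *str == str[0]            /* *str is the first character */
        && str[i] == *(str + i)      /* indexing is arithmetic plus dereference */
        && &str[i] == str + i        /* &str[i] is the address of character i */
        && *pp == str;               /* dereferencing &str gives str back */
}

/* Heap memory outlives the function that allocated it, so a pointer to it
   can be returned. The caller must free() the result. */
char *heap_copy(const char *str)
{
    assert(str != NULL);
    size_t n = strlen(str) + 1;      /* +1 for the terminating '\0' */
    char *result = malloc(n);
    if (result != NULL)
        memcpy(result, str, n);
    return result;
}
```

Returning the address of a local char array instead of the malloc'ed block would hand the caller a pointer to memory that no longer exists once the function returns.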

How does the compiler allocate memory for an array of strings in C?

I typed up this block of code for an assignment:
char *tokens[10];
void parse(char* input);
void main(void)
{
char input[] = "Parse this please.";
parse(input);
for(int i = 2; i >= 0; i--) {
printf("%s ", tokens[i]);
}
}
void parse(char* input)
{
int i = 0;
tokens[i] = strtok(input, " ");
while(tokens[i] != NULL) {
i++;
tokens[i] = strtok(NULL, " ");
}
}
But, looking at it, I'm not sure how the memory allocation works. I didn't define the length of the individual strings as far as I know, just how many strings are in the string array tokens (10). Do I have this backwards? If not, then is the compiler allocating the length of each string dynamically? In need of some clarification.
strtok is a bad citizen.
For one thing, it retains state, as you've implicitly used when you call strtok(NULL,...). This state is stored in the private memory of the Standard C Library, which means strtok is not thread-safe and only one tokenization can be in progress at a time. Note that POSIX provides a reentrant version called strtok_r.
For another, and to answer your question, strtok modifies its input. It doesn't allocate space for the strings; it writes NUL characters in place of your delimiter in the input string, and returns a pointer into the input string.
You are correct that strtok can return more than 10 results. You should check for that in your code so you don't write beyond the end of tokens. A reliable program would either set an upper limit, like your 10, and check for it, reporting an error if it's exceeded, or dynamically allocate the tokens array with malloc, and realloc it if it gets too big. Then the error occurs when you run out of memory.
Note that you can also work around the problem of strtok modifying your input string by strduping it before passing it to strtok. Then you'll have to free the new string once neither it nor tokens, which points into it, is needed any more.
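The upper-limit check described above might be sketched like this. parse_bounded is a hypothetical variant of the question's parse function that takes the array and its capacity as parameters instead of using the global tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant of parse(): stores at most max pointers in
   tokens and returns how many were stored, or -1 if input has more tokens
   than the array can hold. */
int parse_bounded(char *input, char *tokens[], size_t max)
{
    assert(input != NULL && tokens != NULL);
    size_t n = 0;
    for (char *tok = strtok(input, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (n == max)
            return -1;          /* too many tokens for the caller's array */
        tokens[n++] = tok;      /* points into input, nothing is copied */
    }
    return (int)n;
}
```

The stored pointers are addresses inside input itself, which is why no length is ever declared for the individual strings: for "Parse this please." the second token is simply input + 6.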
tokens is an array of pointers.
The distinction between strings and pointers is often fuzzy. In some situations strings are better thought of as arrays, in other situations as pointers.
Anyway... in your example input is an array and tokens is an array of pointers to a place within input.
The data inside input is changed with each call to strtok()
So, step by step
// input[] = "foo bar baz";
tokens[0] = strtok(input, " ");
// input[] = "foo\0bar baz";
//            ^-- tokens[0] points here
tokens[1] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                 ^-- tokens[1] points here
tokens[2] = strtok(NULL, " ");
// input[] = "foo\0bar\0baz";
//                      ^-- tokens[2] points here
// next strtok returns NULL

Weird output from strtok

I was having some issues dealing with char*'s from an array of char*'s and used this for reference: Splitting C char array into words
So what I'm trying to do is read in char arrays and split them with a space delimiter so I can do stuff with it. For example if the first token in my char* is "Dog" I would send it to a different function that dealt with dogs. My problem is that I'm getting a strange output.
For example:
INPUT: *cmd = "Dog needs a vet appointment."
OUTPUT: (from print statements) "Doneeds a vet appntment."
I've checked for memory leaks using valgrind and I have none of them or other errors.
void parseCmd(char* cmd){ //passing in an individual char* from a char**
char** p_args = calloc(100, sizeof(char*));
int i = 0;
char* token;
token = strtok(cmd, " ");
while (token != NULL){
p_args[i++] = token;
printf("%s",token); //trying to debug
token = strtok(NULL, cmd);
}
free(p_args);
}
Any advice? I am new to C so please bear with me if I did something stupid. Thank you.
In your case,
token = strtok(NULL, cmd);
is not what you should be doing. You instead need:
token = strtok(NULL, " ");
As per the ISO standard:
char *strtok(char * restrict s1, const char * restrict s2);
A sequence of calls to the strtok function breaks the string pointed to by s1 into a sequence of tokens, each of which is delimited by a character from the string pointed to by s2.
The only difference between the first and subsequent calls (assuming, as per this case, you want the same delimiters) should be using NULL as the input string rather than the actual string. By using the input string as the delimiter list in subsequent calls, you change the behaviour.
You can see exactly what's happening if you try the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void parseCmd(char* cmd) {
char* token = strtok(cmd, " ");
while (token != NULL) {
printf("[%s] [%s]\n", cmd, token);
token = strtok(NULL, cmd);
}
}
int main(void) {
char x[] = "Dog needs a vet appointment.";
parseCmd(x);
return 0;
}
which outputs (first column will be search string to use next iteration, second is result of this iteration):
[Dog] [Dog]
[Dog] [needs a vet app]
[Dog] [intment.]
The first step worked fine since you were using space as the delimiter and it modified the string by placing a \0 at the end of Dog.
That means the next attempt (with the wrong separator) would use one of the letters from {D,o,g} to split. The first matching letter from that set is the o in appointment, which is why you see needs a vet app. The third attempt finds none of the candidate letters, so you just get back the rest of the string, intment..
token = strtok(NULL, cmd); should be token = strtok(NULL, " ");.
The second argument is for delimiter.
http://man7.org/linux/man-pages/man3/strtok.3.html
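To see the effect of the delimiter argument in isolation, here is a small sketch with a made-up helper, split_with, that uses one delimiter set for the first strtok call and another for all later calls. Passing "Dog" as the later set mimics what happens in the question after the first call has cut the buffer down to "Dog".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes s with first_delims on the first call and
   later_delims on every later call, storing at most max token pointers in
   out. Returns the number of tokens stored. */
size_t split_with(char *s, const char *first_delims, const char *later_delims,
                  char *out[], size_t max)
{
    assert(s != NULL && out != NULL);
    size_t n = 0;
    char *tok = strtok(s, first_delims);
    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = strtok(NULL, later_delims);   /* the delimiter set may differ per call */
    }
    return n;
}
```

With " " for both sets the sentence from the question splits into its five words. With " " followed by "Dog" it yields the three pieces shown in the output above.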

C String parsing errors with strtok(),strcasecmp()

So I'm new to C and the whole string manipulation thing, but I can't seem to get strtok() to work. It seems everywhere everyone has the same template for strtok being:
char* tok = strtok(source,delim);
do
{
{code}
tok=strtok(NULL,delim);
}while(tok!=NULL);
So I try to do this with the delimiter being the space key, and it seems that strtok() not only reads NULL after the first run (the first entry into the while/do-while) no matter how big the string, but it also seems to wreck the source, turning the source string into the same thing as tok.
Here is a snippet of my code:
char* str;
scanf("%ms",&str);
char* copy = malloc(sizeof(str));
strcpy(copy,str);
char* tok = strtok(copy," ");
if(strcasecmp(tok,"insert"))
{
printf(str);
printf(copy);
printf(tok);
}
Then, here is some output for the input "insert a b c d e f g"
aaabbbcccdddeeefffggg
"Insert" seems to disappear completely, which I think is the fault of strcasecmp(). Also, I would like to note that I realize strcasecmp() seems to all-lower-case my source string, and I do not mind. Anyhoo, input "insert insert insert" yields absolutely nothing in output. It's as if those functions just eat up the word "insert" no matter how many times it is present. I may* end up just using some of the C functions that read the string char by char but I would like to avoid this if possible. Thanks a million guys, i appreciate the help.
With the second snippet of code you have five problems. The first is that the format for the scanf function is not standard C: the m modifier is a POSIX.1-2008 extension (supported by glibc, for example) that makes scanf allocate the buffer itself, so the code is not portable. Note also that %s stops at the first whitespace character, so only the first word of the line is read. (See e.g. here for a good reference of the standard function.)
The second problem is that you use the address-of operator on a pointer, which means that you pass a pointer to a pointer to a char (i.e. char**) to the scanf function. That is what the m modifier expects, but with a plain %s it is wrong: scanf wants a pointer to the first character of a buffer, and a pointer to character (or an array, which decays to one) already is that, so you don't use the address-of operator for string arguments.
The third problem, once you drop the m and the address-of operator, is that the pointer str is uninitialized. You have to remember that uninitialized local variables are truly uninitialized, and their values are indeterminate. In reality, it means that their values will be seemingly random. So str will point to some "random" memory.
The fourth problem is with the malloc call, where you use the sizeof operator on a pointer. This will return the size of the pointer and not the size of what it points to.
The fifth problem is that copy therefore has room for only 4 or 8 bytes (depending on whether you're on a 32 or 64 bit platform, see the fourth problem), so strcpy writes past the end of the allocation for any longer string, and strtok then works on memory you do not own.
So, five problems in only four lines of code. That's pretty good! ;)
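A minimal sketch of how the fourth problem can be avoided, sizing the copy from strlen rather than from sizeof on a pointer. The helper name first_word_is is invented here. It also relies on the fact that strcasecmp (a POSIX function declared in <strings.h>) returns 0 when the strings are equal, so the result has to be compared against 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp is POSIX, not ISO C */

/* Hypothetical helper: returns 1 if the first space-delimited word of line
   equals word ignoring case, 0 if not, -1 on allocation failure. */
int first_word_is(const char *line, const char *word)
{
    assert(line != NULL && word != NULL);
    size_t size = strlen(line) + 1;   /* length of the text, not sizeof(pointer) */
    char *copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, line, size);         /* strtok will modify the copy only */

    char *tok = strtok(copy, " ");
    int match = (tok != NULL && strcasecmp(tok, word) == 0);  /* 0 means equal */
    free(copy);
    return match;
}
```

Writing if (strcasecmp(tok, "insert")) without the == 0, as in the question, runs the body only when the word is not "insert".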
It looks like you're trying to print space delimited tokens following the word "insert" 3 times. Does this do what you want?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <strings.h> /* strcasecmp (POSIX) */
int main(int argc, char **argv)
{
char str[BUFSIZ] = {0};
char *copy;
char *tok;
int i;
// safely read a string and chop off any trailing newline
if(fgets(str, sizeof(str), stdin)) {
int n = strlen(str);
if(n && str[n-1] == '\n')
str[n-1] = '\0';
}
// copy the string so we can trash it with strtok
copy = strdup(str);
// look for the first space-delimited token
tok = strtok(copy, " ");
// check that we found a token and that it is equal to "insert"
if(tok && strcasecmp(tok, "insert") == 0) {
// iterate over all remaining space-delimited tokens
while((tok = strtok(NULL, " "))) {
// print the token 3 times
for(i = 0; i < 3; i++) {
fputs(tok, stdout);
}
}
putchar('\n');
}
free(copy);
return 0;
}

Double pointer doubts in C

I have written a program in which in the main function I declare an array of pointers and then I call a function which splits a given sentence and then want to assign it to the array of pointers in main(). I am unable to do. Can you please check the code pasted below:
int main(void)
{
char *data[3];
allocate(data);
/* Unable to print the strings here */
printf("Main is %s\n", data[0] );
printf(""
}
void allocate(char **dt)
{
int i;
char buf[] = "The great Scorpion";
char delims[] = " ";
size_t len;
char *p;
char *result = NULL;
result = strtok(buf," ");
*dt = result;
int j = 1;
while(result!=NULL)
{
result = strtok( NULL, delims );
dt[j]=result;
j++;
}
/* able to print values here */
printf( "result is %s\n", dt[0]);
printf( "result is %s\n", dt[1] );
printf( "result is %s\n", dt[2] );
}
Can anyone please help me out?
strtok does not allocate new strings, it returns a pointer to an existing string (and substitutes delimiters with null characters in place). So in allocate, you fill dt with pointers into buf. Since buf is an automatic variable, its lifetime ends when allocate returns, and all pointers in dt are invalidated.
If I remember correctly, strtok() doesn't do dynamic allocation; it actually modifies the string that you pass to it the first time you call it. So, in this case, it modifies buf. So dt is an array of pointers into buf. And when you exit the function, buf is destroyed.
Actually you might just add static to your buf declaration.
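The static suggestion could be sketched as follows. allocate_static and its capacity parameter are assumptions made for this example, not the question's exact function. One caveat is worth showing: the static buffer is initialized only once and strtok leaves its '\0' characters in it, so a second call sees only the first word.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of allocate(): stores up to capacity pointers into a
   static buffer in dt and returns how many were stored. */
size_t allocate_static(char **dt, size_t capacity)
{
    static char buf[] = "The great Scorpion";   /* lives for the whole program */
    assert(dt != NULL);
    size_t n = 0;
    for (char *tok = strtok(buf, " "); tok != NULL && n < capacity;
         tok = strtok(NULL, " ")) {
        dt[n++] = tok;   /* still valid after the function returns */
    }
    return n;
}
```

The pointers stay valid in main because buf is never destroyed, but the function is effectively single-use: the first call finds three tokens, any later call finds one.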
Ok,
You have an allocate function to which you pass the array that receives the pointers.
Inside the function you put a string on the stack, and
make strtok return pointers into this stack area, which are then used outside the function.
At that point you effectively have 'dangling' pointers to the string.
Then you call printf, which overwrites the data in the now unallocated stack area,
so the strings are gone by the time printf reads them.
If you want to do this, the correct way would be to
really allocate strings in your allocate function and
then free them from main after you are done with them.
The older way to work with strtok was to use strdup in allocate and free later.
OldStuff...
I think you need to pass the &data to your allocate function.
Either that, or I am not all awake yet.
As a small aside (that won't actually have an effect on your program), your loop is structured wrong.
int j = 1;
while(result!=NULL)
{
result = strtok( NULL, delims );
dt[j]=result;
j++;
}
You set dt[0], dt[1], and dt[2] correctly. However, due to the nature of your loop (you check, call strtok, then insert into dt), you're also assigning NULL into dt[3], which is past the end of the three-element data array. This would probably run fine in this trivial example, but you're probably trashing your stack. You should really structure your loop like
result = strtok(buf," ");
int j=0;
while(result != NULL)
{
dt[j]=result;
j++;
result = strtok(NULL, delims);
}
You're trying to return the address of a local, non-static variable; once the allocate() function exits, the buf array no longer exists, so the pointers in your data array are no longer pointing to anything meaningful.
What you need to do is save a copy of the token, rather than just a pointer to it:
void allocate(char **dt)
{
char buf[] = "The Great Scorpion";
char delim[] = " ";
size_t i = 0;
char *result = strtok(buf, delim);
while (result)
{
dt[i] = malloc(strlen(result) + 1);
if (dt[i])
{
strcpy(dt[i++], result);
}
result = strtok(NULL, delim);
}
}
int main(void)
{
char *data[3];
allocate(data);
...
}
Alternately, you can define data to be a 2D array of char, rather than just an array of pointers, but you need to make sure it's sized to handle the maximum string length; also, the type passed to allocate changes:
#define STRING_SIZE ... /* large enough for longest string */
void allocate(char (*dt)[STRING_SIZE+1])
{
char buf[] = "The Great Scorpion";
char delim[] = " ";
size_t i = 0;
char *result = strtok(buf, delim);
while (result)
{
strcpy(dt[i++], result);
result = strtok(NULL, delim);
}
}
int main(void)
{
char data[3][STRING_SIZE+1];
allocate(data);
...
}
Dynamically allocating memory is more flexible, but requires some bookkeeping and you have to remember to free it when you're done with it.
To avoid trashing the stack, the while loop should be bounded by the size of the passed array char *data[3], which is 3 in this case, i.e. passed as an extra parameter.
But I agree that adding static, as in static char buf[] = ..., is the quickest way to avoid pointing into the de-allocated stack of a function that has returned.
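A sketch of the extra-parameter idea combined with copying each token, so that the loop can never write past the caller's array. allocate_bounded and capacity are names chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of allocate(): copies at most capacity tokens into
   freshly malloc'ed strings stored in dt and returns how many were stored.
   The caller must free() each stored string. */
size_t allocate_bounded(char **dt, size_t capacity)
{
    char buf[] = "The great Scorpion";   /* local: gone when the function returns */
    assert(dt != NULL);
    size_t n = 0;
    for (char *tok = strtok(buf, " "); tok != NULL && n < capacity;
         tok = strtok(NULL, " ")) {
        size_t len = strlen(tok) + 1;
        dt[n] = malloc(len);
        if (dt[n] == NULL)
            break;                       /* stop on allocation failure */
        memcpy(dt[n], tok, len);         /* the copy outlives buf */
        n++;
    }
    return n;
}
```

In main this would be called as allocate_bounded(data, 3) for char *data[3], using the return value as the loop bound when printing and freeing.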
You haven't allocated the memory that data[X] should point to; you have only allocated data itself.
data[0] is an uninitialized pointer

Resources