Split a string into double pointer in C

I am trying to convert a string (for example "hey there mister") into a double pointer that points to every word in the sentence.
So: split_string->|pointer1|pointer2|pointer3|, where pointer1->"hey", pointer2->"there" and pointer3->"mister".
char **split(char *s) {
char **nystreng = malloc(strlen(s));
char str[strlen(s)];
int i;
for(i = 0; i < strlen(s); i++){
str[i] = s[i];
}
char *temp;
temp = strtok(str, " ");
int teller = 0;
while(temp != NULL){
printf("%s\n", temp);
nystreng[teller] = temp;
temp = strtok(NULL, " ");
}
nystreng[teller++] = NULL;
//free(nystreng);
return nystreng;
}
My question is: why isn't this working?

Your code has multiple problems. Among them:
char **nystreng = malloc(strlen(s)); is just wrong. The amount of space you need is the size of a char * times one more than the number of pieces into which the string will be split (the extra one is for the NULL pointer terminator).
You fill *nystreng with pointers obtained from strtok() operating on local array str. Those pointers are valid only for the lifetime of str, which ends when the function returns.
You do not allocate space for a string terminator in str, and you do not write one, yet you pass it to strtok() as if it were a terminated string.
You do not increment teller inside your tokenization loop, so each token pointer overwrites the previous one.
You have an essential problem here in that you do not know before splitting the string how many pieces there will be. You could nevertheless get an upper bound on that by counting the number of delimiter characters and adding 1. You could then allocate space for that many char pointers plus one. Alternatively, you could build a linked list to handle the pieces as you tokenize, then allocate the result array only after you know how many pieces there are.
As for str, if you want to return pointers into it, as apparently you do, then it needs to be dynamically allocated, too. If your platform provides strdup() then you could just use
char *str = strdup(s);
Otherwise, you'll need to check the length, allocate enough space with malloc() (including space for the terminator), and copy the input string into the allocated space, presumably with strcpy(). Normally you would want to free the string afterward, but you must not do that if you are returning pointers into that space.
On the other hand, you might consider returning an array of strings that can be individually freed. For that, you must allocate each substring individually (strdup() would again be your friend if you have it), and in that event you would want to free the working space (or allow it to be cleaned up automatically if you use a VLA).
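A minimal sketch of the "upper bound" approach described above. The name split_words is hypothetical, and the sketch makes one choice that is not in the answer: the pointer array and the private copy of the string are placed in a single malloc() block, so the caller can release everything with one free(). It uses malloc() and memcpy() rather than strdup(), so it does not depend on the platform providing strdup().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits s on spaces and returns a NULL-terminated
 * array of pointers to the words. The array and a private copy of s live
 * in one malloc() block, so a single free() on the result releases both.
 * Returns NULL if the allocation fails. */
char **split_words(const char *s)
{
    size_t len = strlen(s);
    size_t max_words = 1;               /* upper bound: spaces + 1 */
    for (size_t i = 0; i < len; i++) {
        if (s[i] == ' ')
            max_words++;
    }

    /* Room for the pointers, the NULL terminator, and the string copy. */
    size_t table_bytes = (max_words + 1) * sizeof(char *);
    char **words = malloc(table_bytes + len + 1);
    if (words == NULL)
        return NULL;

    char *copy = (char *)words + table_bytes;
    memcpy(copy, s, len + 1);           /* len + 1 copies the '\0' too */

    size_t count = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " "))
        words[count++] = tok;           /* advance the index for every token */
    words[count] = NULL;
    return words;
}
```

The caller walks the result until it reaches the NULL entry and then calls free() once on the returned pointer. The individual words must not be freed separately, because they all point into the same block.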

There are two things you need to do.
char str[strlen(s)]; //size should be equal to strlen(s)+1
The extra 1 is for the '\0'. Right now you pass str (not terminated with '\0') to strtok, which causes undefined behaviour.
Second, you also need to allocate memory for each pointer of nystreng and then use strcpy instead of pointing to temp (don't forget space for the nul terminator).

Related

C Turning a dynamic char array into an array of char arrays

I'm working on a function, that has to take a dynamic char array, separate it at spaces, and put each word in an array of char arrays. Here's the code:
char** parse_cmdline(const char *cmdline)
{
char** arguments = (char**)malloc(sizeof(char));
char* buffer;
int lineCount = 0, strCount = 0, argCount = 0;
int spaceBegin = 0;
while((cmdline[lineCount] != '\n'))
{
if(cmdline[lineCount] == ' ')
{
argCount++;
arguments[argCount] = (char*)malloc(sizeof(char));
strCount = 0;
}
else
{
buffer = realloc(arguments[argCount], strCount + 1);
arguments[argCount] = buffer;
arguments[argCount][strCount] = cmdline[lineCount];
strCount++;
}
lineCount++;
}
arguments[argCount] = '\0';
free(buffer);
return arguments;
}
The problem is that somewhere along the way I get a segmentation fault and I don't exactly know where.
Also, this current version of the function assumes that the string does not begin with a space. That is for the next version and I can handle it, but I can't find the reason for the seg fault.

This code is surely not what you intended:
char** arguments = (char**)malloc(sizeof(char));
It allocates a block of memory large enough for one char, and sets a variable of type char ** (arguments) to point to it. But even if you wanted only enough space in arguments for a single char *, what you have allocated is not enough (not on any C system you're likely to meet, anyway). It is certainly not long enough for multiple pointers.
Supposing that pointers are indeed wider than single chars on your C system, your program invokes undefined behavior as soon as it dereferences arguments. A segmentation fault is one of the more likely results.
The simplest way forward is probably to scan the input string twice: once to count the number of individual arguments there are, so that you can allocate enough space for the pointers, and again to create the individual argument strings and record pointers to them in your array.
Note, too, that the return value does not carry any accessible information about how much space was allocated, or, therefore, how many argument strings you extracted. The usual approach to this kind of problem is to allocate space for one additional pointer, and to set that last pointer to NULL as a sentinel. This is much akin to, but not the same as, using a null char to mark the end of a C string.
Edited to add:
The allocation you want for arguments is something more like this:
arguments = malloc(sizeof(*arguments) * (argument_count + 1));
That is, allocate space for one more object than there are arguments, with each object the size of the type of thing that arguments is intended to point at. The value of arguments is not accessed by sizeof, so it doesn't matter that it is indeterminate at that point.
Edited to add:
The free() call at the end is also problematic:
free(buffer);
At that point, variable buffer points to the same allocated block as the last element of arguments points to (or is intended to point to). If you free it then all pointers to that memory are invalidated, including the one you are about to return to the caller. You don't need to free buffer at that point any more than you needed to free it after any of the other allocations.
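The two-pass approach and the NULL sentinel described in this answer could be sketched as follows. The names count_args, parse_cmdline_two_pass and free_args are hypothetical, and the sketch assumes that arguments are separated by one or more spaces and that the line ends at the first '\n' or '\0'.

```c
#include <stdlib.h>
#include <string.h>

/* First pass: counts the space-separated words before '\n' or '\0'. */
static size_t count_args(const char *line)
{
    size_t count = 0;
    int in_word = 0;
    for (; *line != '\0' && *line != '\n'; line++) {
        if (*line == ' ') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }
    return count;
}

/* Second pass: returns a NULL-terminated array of separately allocated
 * strings, or NULL on allocation failure. */
char **parse_cmdline_two_pass(const char *cmdline)
{
    size_t argument_count = count_args(cmdline);
    char **arguments = malloc(sizeof(*arguments) * (argument_count + 1));
    if (arguments == NULL)
        return NULL;

    size_t n = 0;
    const char *p = cmdline;
    while (n < argument_count) {
        while (*p == ' ')
            p++;                          /* skip blanks before the word */
        size_t len = strcspn(p, " \n");   /* length of this word */
        arguments[n] = malloc(len + 1);
        if (arguments[n] == NULL) {       /* undo the partial work */
            while (n > 0)
                free(arguments[--n]);
            free(arguments);
            return NULL;
        }
        memcpy(arguments[n], p, len);
        arguments[n][len] = '\0';
        n++;
        p += len;
    }
    arguments[n] = NULL;                  /* sentinel marks the end */
    return arguments;
}

/* Frees every string and then the array itself. */
void free_args(char **arguments)
{
    if (arguments == NULL)
        return;
    for (char **a = arguments; *a != NULL; a++)
        free(*a);
    free(arguments);
}
```

Because every string is allocated on its own, nothing is freed inside the parsing function on the success path; the caller hands the whole result to free_args() when it is done.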

This is probably why you have a segmentation fault:
In char** arguments = (char**)malloc(sizeof(char));, you have used malloc(sizeof(char)), which allocates space for only a single byte (enough space for one char). This is not enough to hold a single char* in arguments.
But even if it were enough on some system, arguments[argCount] would only access allocated memory for argCount = 0. For other values of argCount, the array index is out of bounds, leading to a segmentation fault.
For example, if your input string is something like "Hi. How are you doing?", then it has 4 ' ' characters before \n is reached, and the value of argCount will go up to 4.

What you want to do is something like this (it needs <stdlib.h> and <string.h>):
char** parse_cmdline( const char *cmdline )
{
// Allocate the array of argument pointers with room for 1 pointer and initialize it to NULL.
char** arguments = malloc( sizeof(char*) );
arguments[0] = NULL;
// Point actPos at the first char of the command line and remember the
// beginning of the current argument in argStart.
int argCount = 0, len = 0;
const char *argStart = cmdline;
const char *actPos = argStart;
// Continue until the end of the command line is reached.
// A blank (or the end of the line) closes an argument, which consists of the characters between argStart and actPos. Allocate it and copy it from the command line.
for(;;)
{
int atEnd = ( *actPos == '\n' || *actPos == '\0' );
if( *actPos == ' ' || atEnd )
{
if( actPos > argStart ) // skip empty arguments (repeated or leading blanks)
{
argCount++; // increment number of arguments
arguments = realloc( arguments, (argCount+1) * sizeof(char*) ); // allocate argCount + 1 (NULL at end of list of arguments)
arguments[argCount] = NULL; // list of arguments ends with NULL
len = actPos - argStart;
arguments[argCount-1] = malloc( len+1 ); // allocate number of characters + '\0'
memcpy( arguments[argCount-1], argStart, len ); // copy characters of argument
arguments[argCount-1][len] = 0; // set '\0' at end of argument string
}
if( atEnd )
break; // the last argument has been stored
argStart = actPos + 1; // next argument starts after blank
}
actPos++;
}
return arguments;
}

One suggestion: before calling malloc, first count the number of words you have. Then call malloc as char **charArray = malloc(wordCount * sizeof(char*));, where wordCount is that number. This gives you the space for the pointers in charArray. Each element in charArray should then be allocated according to the length of the word you are trying to store in it, and the word copied into that space.
Ex. *charArray = malloc(strlen(word) + 1); Then you can store it with strcpy(*charArray, word);
Be careful with pointer arithmetic, however.
The segmentation fault definitely arises from accessing an array element outside the allocated space, which in turn comes from not allocating enough space for the array.

strcat(s1, s2) continues to append to my temp variable array

Newbie to programming (school) and I'm a little confused on what/why this is happening.
I have a loop that is iterating over an array of elements. For each element I am taking the integer from the array, converting it to a char using the function getElementSymbol, and using strcat to append it to my temp array. The problem I am having is that my temp array still contains the residue of the elements preceding the current one. This is the snippet of my code. The output I receive is this:
word1
word1word2
word1word2word3
char* elementsBuildWord(const int symbols[], int nbSymbols){
/* ROLE takes a list of elements' atomic numbers and allocate a new string made
of the symbols of each of these elements
PARAMETERS symbols an array of nbSymbols int which each represent the atomic number
of an element
nbSymbols symbols array's size
RETURN VALUE NULL if the array is of size <= 0
or if one of the symbols is not found by our getElementSymbol function
other the address of a newly allocated string representing the concatenation
of the names of all symbols
*/
char s1[MAX_GENERATED_WORD_LENGTH];
int y;
char *s2;
size_t i;
for (i = 0; i < nbSymbols; i++){
y = symbols[i];
s2 = getElementSymbol(y);
strcat(s1, s2);
}
printf("%s ", s1);
}

Firstly, your s1 is not initialized. The strcat function appends a new string to an existing string. This means that your s1 has to be a string from the very beginning. An uninitialized char array is not a string. A good idea would be to declare your s1 as
char s1[MAX_GENERATED_WORD_LENGTH] = { 0 };
or at least do
s1[0] = '\0';
before starting your cycle.
Secondly, your getElementSymbol function returns a char * pointer. Where does that pointer point to? Who manages the memory it points to? This is non-obvious from your code. It is possible that the function returns an invalid pointer (like a pointer to a local buffer), which is why you might see various anomalies. There's no way to say without seeing how it is implemented.
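Both points can be illustrated with a small sketch. Since the real getElementSymbol is not shown, symbol_for below is a hypothetical stand-in that returns pointers to string literals (which stay valid for the whole program), and MAX_WORD_LENGTH is an assumed limit standing in for MAX_GENERATED_WORD_LENGTH. The buffer starts out as an empty string, and each strcat is preceded by a size check.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LENGTH 64   /* assumed limit for this sketch */

/* Hypothetical stand-in for getElementSymbol(): returns a pointer to a
 * string literal, or NULL for an unknown atomic number. */
static const char *symbol_for(int atomic_number)
{
    switch (atomic_number) {
    case 1:  return "H";
    case 2:  return "He";
    case 8:  return "O";
    default: return NULL;
    }
}

/* Returns a newly allocated string made of the symbols, or NULL if the
 * array is empty, a symbol is unknown, the word is too long, or the
 * allocation fails. The caller frees the result. */
char *build_word(const int symbols[], int nb_symbols)
{
    char word[MAX_WORD_LENGTH] = { 0 };   /* starts as the empty string */
    size_t used = 0;

    if (nb_symbols <= 0)
        return NULL;
    for (int i = 0; i < nb_symbols; i++) {
        const char *sym = symbol_for(symbols[i]);
        if (sym == NULL)
            return NULL;
        size_t len = strlen(sym);
        if (used + len >= sizeof word)    /* keep room for the '\0' */
            return NULL;
        strcat(word, sym);                /* safe: word is a valid string */
        used += len;
    }

    char *result = malloc(used + 1);
    if (result != NULL)
        memcpy(result, word, used + 1);
    return result;
}
```

Because word is a fresh, zero-initialized local on every call, nothing from a previous call can leak into the next result.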

strcat is supposed to append to a string. Use strcpy if you want to overwrite the existing string. You could also use s1[0] = '\0'; before strcat to "blank" the string if you really want to, but it looks like you really want strcpy.
From the snippet above it's not even clear why you need s1 - you could just print s2...

strcat problem with char *a[10]

include
#include <string.h>
int main()
{
char *array[10]={};
char* token;
token = "testing";
array[0] = "again";
strcat(array[0], token);
}
why it returns Segmentation fault?
I'm a little confused.

Technically, this isn't valid C before C23, which added empty initializers. (It is valid C++, though.)
char *array[10]={};
You should use
char *array[10] = {0};
This declares an array of 10 pointers to char and initializes them all to null pointers.
char* token;
token = "testing";
This declares token as a pointer to char and points it at a string literal which is non-modifiable.
array[0] = "again";
This points the first char pointer of array at a string literal which (again) is a non-modifiable sequence of char.
strcat(array[0], token);
strcat concatenates one string onto the end of another string. For it to work the first string must be contained in writeable storage and have enough excess storage to contain the second string at and beyond the first terminating null character ('\0') in the first string. Neither of these hold for array[0] which is pointing directly at the string literal.
What you need to do is something like this. (You need to #include <string.h> and <stdlib.h>.)
I've gone for runtime calculation of sizes and dynamic allocation of memory, as I'm assuming that this is a test for code where the strings may not be of known size in the future. With the strings known at compile time you can do some (or most) of the work at compile time; but then you may as well write "againtesting" as a single string literal.
char* token = "testing";
char* other_token = "again";
/* Include extra space for string terminator */
size_t required_length = strlen(token) + strlen(other_token) + 1;
/* Dynamically allocate a big enough buffer */
array[0] = malloc( required_length );
strcpy( array[0], other_token );
strcat( array[0], token );
/* More code... */
/* Free allocated buffer */
free( array[0] );

How this works: char *array[10] is an array of 10 char * pointers (basically 10 same things as token).
token = "testing" creates static space somewhere in your program's memory at build time and puts "testing" there. Then, at run time, it stores the address of that static "testing" in token.
array[0] = "again" does basically the same thing.
Then strcat(array[0], token) takes the address in array[0] and tries to append token's content to the string at that address. This gives you a segfault, since array[0] points to a read-only data segment in your memory.
How to do this properly:
char * initial = "first"; // pointer to static "first" string
char * second = "another"; // another one
char string[20]; // local array of 20 bytes
strcpy(string, initial); // copies first string into your read-write memory
strcat(string, second); // adds the second string there
Actually, if you don't want to shoot yourself in the foot, the better way to do something like the last two lines is:
snprintf(string, sizeof(string), "%s%s", initial, second);
snprintf then makes sure that you don't use more than 20 bytes of string. strcat and strcpy would happily go over the limit into invalid memory and cause another run-time segfault or something worse (think security exploits) if the copied string were longer than the destination space.
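One detail worth adding is that snprintf silently truncates when the destination is too small, but its return value lets you detect that: it returns the length the full result would have had. A minimal sketch, with join_strings as a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: writes first followed by second into dst.
 * Returns 0 on success and -1 if the result did not fit or snprintf
 * reported an error. On a "did not fit" result with dst_size > 0, dst
 * holds a truncated but still '\0'-terminated string. */
int join_strings(char *dst, size_t dst_size,
                 const char *first, const char *second)
{
    int needed = snprintf(dst, dst_size, "%s%s", first, second);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}
```

With a 20-byte buffer, joining "first" and "another" succeeds; with an 8-byte buffer the function returns -1 and the buffer contains only the first seven characters.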

To create an array of characters, char *array[10]={}; should instead be char array[10]={};
The segmentation fault occurs because array[0] points to "again", a string literal, and modifying string literals is a no-no (undefined behaviour).

If you're planning on changing the strings involved you should really allocate enough memory for what you need. For example instead of char *token; token = "testing"; you could use, say char token[20] = "testing";, which allows enough room for a 19 character string (plus the null byte at the end).
Similarly, you could use char array[10][20] = {"testing"}; to create an array of 10 strings and set the first one to testing.

You are putting a string at array[0] which is only one character.
Use array[0]='a' like this.

string parsing in C

I'm trying to pass a string to chdir(), but I always seem to have some trailing stuff that makes chdir() fail.
#define IN_LEN 128
int main(int argc, char** argv) {
int counter;
char command[IN_LEN];
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
size_t path_len; char path[IN_LEN];
...
fgets(command, IN_LEN, stdin);
counter = 0;
tmp = strtok(command, delim);
while(tmp != NULL) {
*(tokens+counter) = tmp;
tmp = strtok(NULL, delim);
counter++;
}
if(strncmp(*tokens, cd_command, strlen(cd_command)) == 0) {
path_len = strlen(*(tokens+1));
strncpy(path, *(tokens+1), path_len-1);
// this is where I try to remove the trailing junk...
// but it doesn't work on a second system
if(chdir(path) < 0) {
error_string = strerror(errno);
fprintf(stderr, "path: %s\n%s\n", path, error_string);
}
// just to check if the chdir worked
char buffer[1000];
printf("%s\n", getcwd(buffer, 1000));
}
return 0;
}
There must be a better way to do this. Can anyone help out? I've tried to use scanf, but when the program calls scanf, it just hangs.
Thanks

It looks like you've forgotten to append a null '\0' to path string after calling strncpy(). Without the null terminator chdir() doesn't know where the string ends and it just keeps looking until it finds one. This would make it appear like there are extra characters at the end of your path.

You have (at least) 2 problems in your example.
The first one (which is causing the immediately obvious problems) is the use of strncpy(), which doesn't necessarily place a '\0' terminator at the end of the buffer it copies into. In your case there's no need to use strncpy() (which I consider dangerous for exactly the reason you ran into). Your tokens will be '\0' terminated by strtok(), and they are guaranteed to be smaller than the path buffer (since the tokens come from a buffer that's the same size as the path buffer). Just use strcpy(), or if you want the code to be resilient to someone coming along later and mucking with the buffer sizes, use something like the non-standard strlcpy().
As a rule of thumb don't use strncpy().
Another problem with your code is that the tokens allocation isn't right.
char** tokens = (char**) malloc(sizeof(char)*IN_LEN);
will allocate an area as large as your input string buffer, but you're storing pointers to strings in that allocation, not chars. You'll have fewer tokens than characters (by definition), but each token pointer is typically 4 or 8 times larger than a character (depending on the platform's pointer size). If your string has enough tokens, you'll overrun this buffer.
For example, assume IN_LEN is 14 and the input string is "a b c d e f g". If you use spaces as the delimiter, there will be 7 tokens, which will require a pointer array of 28 bytes with 4-byte pointers (56 bytes with 8-byte pointers). That is quite a few more than the 14 allocated by the malloc() call.
A simple change to:
char** tokens = (char**) malloc((sizeof(char*) * IN_LEN) / 2);
should allocate enough space (is there an off-by-one error in there? Maybe a +1 is needed).
A third problem is that you potentially access *tokens and *(tokens+1) even if zero or only one token was added to the array. You'll need to add some checks of the counter variable before dereferencing those pointers.
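The points above can be combined into a short sketch. It assumes that the "trailing junk" is the newline that fgets() stores at the end of the line, so '\n' is simply added to the delimiter set; tokenize, cd_target and MAX_TOKENS are hypothetical names. The token array is a fixed array of pointers, and the token count is checked before any token is used.

```c
#include <string.h>

#define MAX_TOKENS 16   /* assumed limit for this sketch */

/* Hypothetical helper: splits line in place on blanks, tabs and the
 * newline left by fgets(), storing at most max_tokens pointers.
 * Returns the number of tokens stored. */
size_t tokenize(char *line, char *tokens[], size_t max_tokens)
{
    size_t count = 0;
    for (char *tok = strtok(line, " \t\n");
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, " \t\n"))
        tokens[count++] = tok;
    return count;
}

/* Returns the path argument of a "cd <path>" line, or NULL if the line
 * is not a cd command or has no argument. The result points into line,
 * so it is valid only as long as line is. */
const char *cd_target(char *line)
{
    char *tokens[MAX_TOKENS];
    size_t count = tokenize(line, tokens, MAX_TOKENS);
    if (count < 2 || strcmp(tokens[0], "cd") != 0)
        return NULL;
    return tokens[1];
}
```

The returned pointer is already '\0' terminated by strtok(), so it can be passed to chdir() directly without an intermediate strncpy().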

strcat() new line, duplicate string

I'm writing a function that gets the path environment variable of a system, splits up each path, then concats on some other extra characters onto the end of each path.
Everything works fine until I use the strcat() function (see code below).
char* prependPath( char* exeName )
{
char* path = getenv("PATH");
char* pathDeepCopy = (char *)malloc(strlen(path) + 1);
char* token[80];
int j, i=0; // used to iterate through array
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":"); //get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
for(j = 0; j <= i-1; j++) {
strcat(token[j], "/");
//strcat(token[j], exeName);
printf("%s\n", token[j]); //print out all of the tokens
}
}
My shell output is like this (I'm concatenating "/which" onto everything):
...
/usr/local/applic/Maple/bin/which
which/which
/usr/local/applic/opnet/8.1.A.wdmguru/sys/unix/bin/which
which/which
Bus error (core dumped)
I'm wondering why strcat is displaying a new line and then repeating which/which.
I'm also wondering about the Bus error (core dumped) at the end.
Has anyone seen this before when using strcat()?
And if so, anyone know how to fix it?
Thanks

strtok() does not give you a new string.
It mutilates the input string by inserting the char '\0' where the split character was.
So your use of strcat(token[j],"/") will put the '/' character where the '\0' was, and the new terminating '\0' then lands on the first character of the next token.
Also, the last token will start appending 'which' past the end of your allocated memory into uncharted memory.
You can use strtok() to split a string into chunks. But if you want to append anything onto a token, you need to make a copy of the token; otherwise what you're appending will spill over onto the next token.
You also need to take more care with your memory allocation; you are leaking memory all over the place :-)
PS. If you must use C strings, use strdup() to copy the string.
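// needs <stdio.h>, <stdlib.h> and <string.h>
char* prependPath( char* exeName )
{
char* path = getenv("PATH");
if (path == NULL) return NULL; // PATH is not set
char* pathDeepCopy = strdup(path);
if (pathDeepCopy == NULL) return NULL;
char* token[80];
int j, i = 0; // used to iterate through array
token[0] = strtok(pathDeepCopy, ":");
while((token[i] != NULL) && (i < 79)) // stop at the last slot of the array
{
++i;
token[i] = strtok(NULL, ":"); // store the next token in the next slot
}
for(j = 0; j < i; ++j) // stop before token[i]
{
// room for the token, '/', exeName and the terminating '\0'
char* tmp = (char*)malloc(strlen(token[j]) + 1 + strlen(exeName) + 1);
if (tmp == NULL) break;
strcpy(tmp,token[j]);
strcat(tmp,"/");
strcat(tmp,exeName);
printf("%s\n",tmp); //print out all of the tokens
free(tmp);
}
free(pathDeepCopy);
return NULL; // this version only prints the candidates
}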

strtok does not duplicate the token but instead just points to it within the string. So when you cat '/' onto the end of a token, you're writing a '\0' either over the start of the next token, or past the end of the buffer.
Also note that even if strtok did return copies of the tokens instead of the originals (which it doesn't), it wouldn't allocate the additional space for you to append characters, so it'd still be a buffer overrun bug.

strtok() tokenizes in place. When you start appending characters to the tokens, you're overwriting the next token's data.
Also, in general it's not safe to simply concatenate to an existing string unless you know that the size of the buffer the string is in is large enough to hold the resulting string. This is a major cause of bugs in C programs (including the dreaded buffer overflow security bugs).
So even if strtok() returned brand-new strings unrelated to your original string (which it doesn't), you'd still be overrunning the string buffers when you concatenated to them.
Some safer alternatives to strcpy()/strcat() that you might want to look into (you may need to track down implementations for some of these - they're not all standard):
strncpy() - includes the target buffer size to avoid overruns. Has the drawback of not always terminating the result string
strncat()
strlcpy() - similar to strncpy(), but intended to be simpler to use and more robust (http://en.wikipedia.org/wiki/Strlcat)
strlcat()
strcpy_s() - Microsoft variants of these functions
strncat_s()
And the API you should strive to use if you can use C++: the std::string class. If you use the C++ std::string class, you pretty much do not have to worry about the buffer containing the string - the class manages all of that for you.

OK, first of all, be careful. You are losing memory.
strtok() returns a pointer to the next token and you are storing it in an array of chars.
Instead of char token[80] it should be char *token.
Be careful also when using strtok. strtok practically destroys the char array called pathDeepCopy because it will replace every occurrence of ":" with '\0', as Mike F told you above.
Be sure to initialize pathDeepCopy using memset or calloc.
So when you are coding token[i] there is no way of knowing what is being pointed at.
And as token has no valid data in it, it is likely to throw a core dump because you are trying to concatenate a string to another that has no valid data (token).
Perhaps the thing you are looking for is an array of pointers to char in which to store all the pointers to the tokens that strtok is returning, in which case token will be like char *token[];
Hope this helps a bit.

If you're using C++, consider boost::tokenizer as discussed over here.
If you're stuck in C, consider using strtok_r because it's re-entrant and thread-safe. Not that you need it in this specific case, but it's a good habit to establish.
Oh, and use strdup to create your duplicate string in one step.
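A minimal sketch of the strtok_r and strdup suggestion, assuming a POSIX system (both functions are POSIX rather than ISO C before C23, hence the feature-test macro). The name build_candidates and its parameters are hypothetical. Each candidate gets its own buffer that is sized for the directory, the '/', the executable name and the terminator, so nothing is appended onto the tokens themselves.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r() and strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: for each entry of the ':'-separated path_list,
 * stores a newly allocated "<dir>/<exe_name>" string in out, up to max
 * entries. The caller frees each out[i].
 * Returns the number of entries stored, or -1 on allocation failure. */
int build_candidates(const char *path_list, const char *exe_name,
                     char *out[], int max)
{
    char *copy = strdup(path_list);       /* strtok_r() modifies its input */
    if (copy == NULL)
        return -1;

    int count = 0;
    char *save = NULL;                    /* strtok_r() keeps its state here */
    for (char *dir = strtok_r(copy, ":", &save);
         dir != NULL && count < max;
         dir = strtok_r(NULL, ":", &save)) {
        size_t size = strlen(dir) + 1 + strlen(exe_name) + 1;
        out[count] = malloc(size);
        if (out[count] == NULL) {         /* undo the partial work */
            while (count > 0)
                free(out[--count]);
            free(copy);
            return -1;
        }
        snprintf(out[count], size, "%s/%s", dir, exe_name);
        count++;
    }
    free(copy);
    return count;
}
```

A caller would pass getenv("PATH") as path_list after checking that it is not NULL, since getenv returns NULL when the variable is not set.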

Replace that with
strcpy(pathDeepCopy, path);
//parse and split
token[0] = strtok(pathDeepCopy, ":");//get pointer to first token found and store in 0
//place in array
while(token[i]!= NULL) { //ensure a pointer was found
i++;
token[i] = strtok(NULL, ":"); //continue to tokenize the string
}
// use new array for storing the new tokens
// pardon my C lang skills. It's been a "while" since I wrote device drivers in C.
char **newTokens = malloc(i * sizeof *newTokens); // one pointer per token
for (int k = 0; k < i; ++k) {
newTokens[k] = malloc(strlen(token[k]) + 2); // token + '/' + '\0'
sprintf(newTokens[k], "%s%c", token[k], '/');
printf("%s\n", newTokens[k]); //print out all of the tokens
}
This avoids overwriting the contents of the original buffer and prevents the core dump.

and don't forget to check if malloc returns NULL!
