C: creating array of strings from delimited source string

What would be an efficient way of converting a delimited string into an array of strings in C (not C++)? For example, I might have:
char *input = "valgrind --leak-check=yes --track-origins=yes ./a.out"
The source string will always have only a single space as the delimiter. And I would like a malloc'ed array of malloc'ed strings char *myarray[] such that:
myarray[0]=="valgrind"
myarray[1]=="--leak-check=yes"
...
Edit: I have to assume that there is an arbitrary number of tokens in the input string, so I can't just limit it to 10 or something.
I've attempted a messy solution with strtok and a linked list I've implemented, but valgrind complained so much that I gave up.
(If you're wondering, this is for a basic Unix shell I'm trying to write.)

What about something like:
#define MAX_ARGS 64 /* arbitrary upper bound, picked here only for illustration */
char string[] = "valgrind --leak-check=yes --track-origins=yes ./a.out"; /* an array, not a literal: strtok writes into it */
char** args = calloc(MAX_ARGS, sizeof(char*)); /* zero-filled, so the list always ends with NULL */
char* curToken = strtok(string, " \t");
for (int i = 0; curToken != NULL && i < MAX_ARGS - 1; ++i)
{
args[i] = strdup(curToken);
curToken = strtok(NULL, " \t");
}

If you have all of the input in input to begin with, then you can never have more tokens than strlen(input). If you don't allow "" as a token, then you can never have more than (strlen(input)+1)/2 tokens. So unless input is huge you can safely write:
char ** myarray = malloc( ((strlen(input)+1)/2) * sizeof(char*) );
int NumActualTokens = 0;
char * pToken;
/* get_token_copy() and skip_token() stand for helpers you write yourself */
while ((pToken = get_token_copy(input)) != NULL)
{
myarray[NumActualTokens++] = pToken;
input = skip_token(input);
}
myarray = realloc(myarray, NumActualTokens * sizeof(char*));
As a further optimization, you can keep input around and just replace spaces with \0 and put pointers into the input buffer into myarray[]. No need for a separate malloc for each token unless for some reason you need to free them individually.

Were you remembering to malloc an extra byte for the terminating null that marks the end of string?

From the strsep(3) manpage on OSX:
char **ap, *argv[10], *inputstring;
for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
if (**ap != '\0')
if (++ap >= &argv[10])
break;
Edited for arbitrary # of tokens:
char **ap, **argv, *inputstring;
int arglen = 10;
argv = calloc(arglen, sizeof(char*));
for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
if (**ap != '\0')
if (++ap >= &argv[arglen])
{
arglen += 10;
argv = realloc(argv, arglen * sizeof(char*)); /* size is in bytes, not elements */
ap = &argv[arglen-10];
}
Or something close to that. The above may not work, but if not it's not far off. Building a linked list would be more efficient than continually calling realloc, but that's really beside the point: the point is how best to make use of strsep.

Looking at the other answers, the code is so compact that it might look complex to a beginner in C, so I thought I would add this one. It may be easier to parse the string by hand instead of using strtok, something like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
char **parseInput(const char *str, int *nLen);
void resizeptr(char ***, int nLen);
int main(int argc, char **argv){
int maxLen = 0;
int i = 0;
char **ptr = NULL;
char *str = "valgrind --leak-check=yes --track-origins=yes ./a.out";
ptr = parseInput(str, &maxLen);
if (!ptr) printf("Error!\n");
else{
for (i = 0; i < maxLen; i++) printf("%s\n", ptr[i]);
}
for (i = 0; i < maxLen; i++) free(ptr[i]);
free(ptr);
return 0;
}
char **parseInput(const char *str, int *Index){
char **pStr = NULL;
const char *ptr = str;
int charPos = 0, indx = 0; /* charPos = length of the token being scanned */
for (; *ptr; ptr++){
if (!isspace((unsigned char)*ptr)) charPos++;
else if (charPos > 0){
/* ptr is on the space just after a token of charPos characters */
resizeptr(&pStr, ++indx);
pStr[indx-1] = (char *)malloc(charPos + 1);
if (!pStr[indx-1]) return NULL;
strncpy(pStr[indx-1], ptr - charPos, charPos);
pStr[indx-1][charPos]='\0';
charPos = 0;
}
}
if (charPos > 0){
/* last token: it ends at the terminating '\0' instead of a space */
resizeptr(&pStr, ++indx);
pStr[indx-1] = (char *)malloc(charPos + 1);
if (!pStr[indx-1]) return NULL;
strncpy(pStr[indx-1], ptr - charPos, charPos);
pStr[indx-1][charPos]='\0';
}
*Index = indx;
return pStr;
}
void resizeptr(char ***ptr, int nLen){
/* realloc(NULL, n) behaves like malloc(n), so one call covers both cases */
char **tmp = (char **)realloc(*ptr, nLen * sizeof(char *));
if (!tmp){
perror("error!");
exit(EXIT_FAILURE);
}
*ptr = tmp;
}
I slightly modified the code to make it easier. The only string function that I used was strncpy. Sure, it is a bit long-winded, but it reallocates the array of strings dynamically instead of using a hard-coded MAX_ARGS, which would reserve memory for the whole array even when only 3 or 4 entries are needed. Using realloc keeps the memory usage small. The parsing itself is simple: the pointer walks the string and isspace detects the separators. When a space is encountered, the array of pointers is grown by one and a new string is malloc'ed to hold the token.
Notice how the triple pointer is used in the resizeptr function. In fact, I thought this would serve as a good example of a simple C program: pointers, realloc, malloc, passing by reference, and the basic elements of parsing a string.
Hope this helps,
Best regards,
Tom.

Related

Dynamically adding elements to array C

I am trying to build a function that takes a string of chars and provides a list of those chars separated by a token.
This is what I have so far:
char * decode_args(char arguments[]){
char* token = strtok(arguments, "00");
while (token != NULL){
printf("%s\n", token);
token = strtok(NULL, "00");
}
return 0;
}
This function prints the desired values that I am looking for. For example:
>decode_args("77900289008764")
779
289
8764
The next step is to build an array that can be used in the execv command. The arguments need to be an array. An example here. I am a beginner so I don't even know if "array" is the right word. What data type should be built and how can I do that so I can call execv with the arguments that are currently being printed in a list?
For starters, let me explain a few things about storage and strings.
There are 3 basic storage types: automatic, dynamic, and static. Static storage is usually split into two kinds: read-only and read-write. The dynamic and static ones will be useful for you soon.
Automatic variables are function parameters and local variables. When you call a function they are pushed onto the stack, and when the function returns the stack is unwound and they are gone.
Dynamic storage is what you allocate at run time with the malloc family. This is how we create a dynamic array. You need to give this memory back with free when you are done. If you don't, it is called a memory leak; you can check for memory leaks and other memory errors with the tool valgrind. It is quite useful for a systems programming class.
Static objects stay there for the lifetime of the program.
If you define a global variable or static int i = 42, it creates a static read-write variable, so you can change it. Now here is the trick.
void foo() {
char *string1 = "hello, world"; //static read-only
char string2[] = "hello, world"; //automatic
}
So if you try to change string1 you will probably get a segmentation fault (modifying a string literal is undefined behavior), but it is OK to change string2. I don't know why you don't get a segmentation fault by doing decode_args("77900289008764"), but I get one on my machine. :D
Now C strings. They are null-terminated, which means there is a (char) 0 character at the end of each string that marks where it ends. strtok treats every character of its second argument as a separator and overwrites the separator that ends each token with a null character, so you get multiple substrings instead of one. In your example the tokens "779", "289" and "8764" each end up followed by a null character inside the original buffer.
So if I were you I would count the delimiters in the string and allocate that many character pointers, plus one for the terminating NULL. Something like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char ** decode_args(char *arguments, char *delim) {
int num_substrings = 1; // Note that C does not zero-initialize locals the way Java does.
// strtok treats every character in delim as a separator, so counting
// separator characters gives an upper bound on the number of tokens.
for (int i = 0; arguments[i] != '\0'; ++i)
if (strchr(delim, arguments[i]) != NULL)
++num_substrings;
char **result = (char **) malloc(sizeof(char *) * (num_substrings + 1));
int i = 0;
char* token = strtok(arguments, delim);
while (token != NULL){
result[i] = token; // points into arguments, so keep that buffer alive
++i;
token = strtok(NULL, delim);
}
result[i] = NULL; // End of results. execv requires this terminator.
return result;
}
int main(int argc, char *argv[])
{
char str[] = "foo00bar00baz";
char **results = decode_args(str, "00");
for (int i = 0; results[i] != NULL; ++i) {
char *result = results[i];
puts(result);
}
free(results);
return 0;
}
Try something like this:
#define MAX_ARGUMENTS 10
int decode_args(char arguments[], char ** pcListeArgs)
{
int iNumElet = 0;
char* token = strtok(arguments, "00");
while ((token != NULL) && (iNumElet < MAX_ARGUMENTS -1))
{
size_t len = strlen(token);
pcListeArgs [iNumElet] = (char*) calloc (len+1, sizeof (char));
memset(pcListeArgs [iNumElet], 0, len+1); // reset content
memcpy(pcListeArgs [iNumElet], token, len); // copy data
token = strtok(NULL, "00");
iNumElet++;
}
if (token != NULL) // tokens left over: the array was too small
return -1;
return iNumElet;
}
And in main:
int main() {
char *pListArgs[MAX_ARGUMENTS];
char args[] = "77900289008764";
int iNbArgs = decode_args (args, pListArgs);
if ( iNbArgs > 0)
{
for ( int i=0; i<iNbArgs; i++)
printf ("Argument number %d = %s\n", i, pListArgs[i]);
for ( int i=0; i<iNbArgs; i++)
free (pListArgs[i]);
}
return 0;
}
output:

Overlaying array of strings over char array

I am working on a function that needs to be re-entrant: the function is given a memory buffer as an argument and should use that buffer for all its memory needs. In other words, it can't use malloc, but rather should draw the memory from the supplied buffer.
The challenge that I ran into is how to overlay an array of strings over a char array of given size (the buffer is supplied as char *), but my result is array of strings (char **).
Below is a repro:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUFFER_SIZE 100
#define INPUT_ARRAY_SIZE 3
char *members[] = {
"alex",
"danny",
"max"
};
int main() {
// this simulates a buffer that is presented to my func
char *buffer = malloc(BUFFER_SIZE);
char *orig = buffer;
memset(buffer, NULL, BUFFER_SIZE);
// pointers will be stored at the beginning of the buffer
char **pointers = &buffer;
// strings will be stored after the pointers
char *strings = buffer + (sizeof(char *) * INPUT_ARRAY_SIZE);
for(int i = 0; i < INPUT_ARRAY_SIZE; i++) {
strncpy(strings, members[i], (strlen(members[i]) + 1));
// Need to store pointer to string in the pointers section
// pointers[i] = strings; // This does not do what I expect
strings += ((strlen(members[i]) + 1));
}
for (int i=0; i < BUFFER_SIZE; i++) {
printf("%c", orig[i]);
}
// Need to return pointers
}
With the problematic line commented out, the code above prints:
alex danny max
However, I need some assistance in figuring out how to write addresses of the strings at the beginning.
Of course, if there is an easier way of accomplishing this task, please let me know.
Here take a look at this.
/* conditions :
*
* 'buffer' should be large enough, 'arr_length','arr' should be valid.
*
*/
char ** pack_strings(char *buffer, char * arr[], int arr_length)
{
char **ptr = (char**) buffer;
char *string;
int index = 0;
string = buffer + (sizeof(char *) * (arr_length+1)); /* +1 for NULL */
while(index < arr_length)
{
size_t offset;
ptr[index] = string;
offset = strlen(arr[index])+1;
strcpy(string,arr[index]);
string += offset;
++index;
}
ptr[index] = NULL;
return ptr;
}
usage
char **ptr = pack_strings(buffer,members,INPUT_ARRAY_SIZE);
for (int i=0; ptr[i] != NULL; i++)
puts(ptr[i]);

How do I create a function in C that allows me to split a string based on a delimiter into an array?

I want to create a function in C, so that I can pass the function a string, and a delimiter, and it will return to me an array with the parts of the string split up based on the delimiter. Commonly used to separate a sentence into words.
e.g.: "hello world foo" -> ["hello", "world", "foo"]
However, I'm new to C and a lot of the pointer things are confusing me. I got an answer mostly from this question, but it does it inline, so when I try to separate it into a function the logistics of the pointers are confusing me:
This is what I have so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void split_string(char string[], char *delimiter, char ***result) {
char *p = strtok(string, delimiter);
int i, num_spaces = 0;
while (p != NULL) {
num_spaces++;
&result = realloc(&result, sizeof(char *) * num_spaces);
if (&result == NULL) {
printf("Memory reallocation failed.");
exit(-1);
}
&result[num_spaces - 1] = p;
p = strtok(NULL, " ");
}
// Add the null pointer to the end of our array
&result = realloc(split_array, sizeof(char *) * num_spaces + 1);
&result[num_spaces] = 0;
for (i = 0; i < num_spaces; i++) {
printf("%s\n", &result[i]);
}
free(&result);
}
int main(int argc, char *argv[]) {
char str[] = "hello world 1 foo";
char **split_array = NULL;
split_string(str, " ", &split_array);
return 0;
}
The gist of it being that I have a function that accepts a string, accepts a delimiter and accepts a pointer to where to save the result. Then it constructs the result. The variable for the result starts out as NULL and without memory, but I gradually reallocate memory for it as needed.
But I'm really confused as to the pointers, like I said. I know my result is of type char ** as a string it of type char * and there are many of them so you need pointers to each, but then I'm supposed to pass the location of that char ** to the new function, right, so it becomes a char ***? When I try to access it with & though it doesn't seem to like it.
I feel like I'm missing something fundamental here, I'd really appreciate insight into what is going wrong with the code.
You are confusing dereferencing with taking an address (which is the complete opposite). By the way, I couldn't find split_array anywhere in the function, as it is declared down in main. Even if you had the dereferencing and addressing correct, this would still have other issues.
I'm fairly sure you're trying to do this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void split_string(char string[], const char *delimiter, char ***result)
{
char *p = strtok(string, delimiter);
void *tmp = NULL;
int count=0;
*result = NULL;
while (p != NULL)
{
tmp = realloc(*result, (count+1)*sizeof **result);
if (tmp)
{
*result = tmp;
(*result)[count++] = p;
}
else
{ // failed to expand
perror("Failed to expand result array");
exit(EXIT_FAILURE);
}
p = strtok(NULL, delimiter);
}
// add null pointer
tmp = realloc(*result, (count+1)*sizeof(**result));
if (tmp)
{
*result = tmp;
(*result)[count] = NULL;
}
else
{
perror("Failed to expand result array");
exit(EXIT_FAILURE);
}
}
int main()
{
char str[] = "hello world 1 foo", **toks = NULL;
char **it;
split_string(str, " ", &toks);
for (it = toks; *it; ++it)
printf("%s\n", *it);
free(toks);
}
Output
hello
world
1
foo
Honestly this would be cleaner if the function result were utilized rather than an in/out parameter, but you chose the latter, so there you go.
Best of luck.

Parsing a string with multi-char delimiter and got junk data along with the result in the second call in c

I have written code to split a string on a multi-character delimiter.
It works fine the first time the function is called,
but when I call it a second time it returns the correct word followed by some unwanted symbols.
I think this problem occurs because the buffer is not cleared. I have tried a lot but can't solve this. Please help me to solve this problem.
char **split(char *phrase, char *delimiter) {
int i = 0;
char **arraylist= malloc(10 *sizeof(char *));
char *loc1=NULL;
char *loc=NULL;
loc1 = phrase;
while (loc1 != NULL) {
loc = strstr(loc1, delimiter);
if (loc == NULL) {
arraylist[i]=malloc(sizeof(loc1));
arraylist[i]=loc1;
break;
}
char *buf = malloc(sizeof(char) * 256); // memory for 256 char
int length = strlen(delimiter);
strncpy(buf, loc1, loc-loc1);
arraylist[i]=malloc(sizeof(buf));
arraylist[i]=buf;
i++;
loc = loc+length;
loc1 = loc;
}
return arraylist;
}
called this function first time
char **splitdetails = split("100000000<delimit>0<delimit>hellooo" , "<delimit>");
It gives
splitdetails[0]=100000000
splitdetails[1]=0
splitdetails[2]=hellooo
but i called this second time
char **splitdetails = split("20000000<delimit>10<delimit>testing" , "<delimit>");
splitdetails[0]=20000000��������������������������
splitdetails[1]=10����
splitdetails[2]=testing
Update:
Thanks to @fatelerror, I have changed my code to:
char** split(char *phrase, char *delimiter) {
int i = 0;
char **arraylist = malloc(10 *sizeof(char *));
char *loc1=NULL;
char *loc=NULL;
loc1 = phrase;
while (loc1 != NULL) {
loc = strstr(loc1, delimiter);
if (loc == NULL) {
arraylist[i]=malloc(strlen(loc1) + 1);
strcpy(arraylist[i], loc1);
break;
}
char *buf = malloc(sizeof(char) * 256); // memory for 256 char
int length = strlen(delimiter);
strncpy(buf, loc1, loc-loc1);
buf[loc - loc1] = '\0';
arraylist[i]=malloc(strlen(buf));
strcpy(arraylist[i], buf);
i++;
loc = loc+length;
loc1 = loc;
}
}
In the caller function, i used it as
char *id
char **splitdetails = split("20000000<delimit>10<delimit>testing" , "<delimit>");
id = splitdetails[0];
//some works done with id
//free the split details with this code.
for(int i=0;i<3;i++) {
free(domaindetails[i]);
}free(domaindetails);
domaindetails=NULL;
Then I called the same function a second time:
char **splitdetails1= split("10000000<delimit>1000<delimit>testing1" , "<delimit>");
It produces an error and I can't free the result.
Thanks in advance.
Your problem boils down to three basic things:
sizeof is not strlen()
Assignment doesn't copy strings in C.
strncpy() doesn't always nul-terminate strings.
So, when you say something like:
arraylist[i]=malloc(sizeof(loc1));
arraylist[i]=loc1;
this does not copy the string. The first line allocates sizeof(loc1) bytes, and loc1 is a char *. In other words, you allocated the size of a pointer. You want to allocate enough storage to hold the string, i.e. using strlen():
arraylist[i]=malloc(strlen(loc1) + 1);
Note the + 1 as well, because you also need room for the nul-terminator. Then, to copy the string you want to use strcpy():
strcpy(arraylist[i], loc1);
The way you had it just assigned a pointer to your old string (and in the process leaked the memory you had just allocated). It's also common to use strdup(), which combines both of these steps, i.e.
arraylist[i] = strdup(loc1);
This is convenient, but strdup() comes from POSIX and only became part of the C standard library in C23. You need to assess the portability needs of your code before you consider using it.
Additionally, with strncpy(), you should be aware that it does not always nul-terminate:
strncpy(buf, loc1, loc-loc1);
This copies fewer bytes than were in the original string and doesn't terminate buf. Thus, it's necessary to add a nul terminator yourself:
buf[loc - loc1] = '\0';
This is the root cause of what you are seeing with the garbage. Since you didn't nul terminate, C doesn't know where your string ends and so it keeps on reading whatever happens to be in memory.

Reversing a string in C using pointers?

Language: C
I am trying to write a C function with the prototype char *strrev2(const char *string) as part of interview preparation. The closest (working) solution is below; however, I would like an implementation which does not use malloc. Is this possible? Since it returns a pointer to characters, if I use malloc, the matching free would have to happen in another function.
char *strrev2(const char *string){
int l=strlen(string);
char *r=malloc(l+1);
for(int j=0;j<l;j++){
r[j] = string[l-j-1];
}
r[l] = '\0';
return r;
}
[EDIT] I have already written implementations using a buffer and without the char. Thanks tho!
No - you need a malloc.
Other options are:
Modify the string in-place, but since you have a const char * and you aren't allowed to change the function signature, this is not possible here.
Add a parameter so that the user provides a buffer into which the result is written, but again this is not possible without changing the signature (or using globals, which is a really bad idea).
You may do it this way and make the caller responsible for freeing the memory. Or you can let the caller pass in an allocated char buffer, so that both the allocation and the free are done by the caller:
void strrev2(const char *string, char* output)
{
// place the reversed string onto 'output' here
}
For caller:
char buffer[100];
char *input = "Hello World";
strrev2(input, buffer);
// the reversed string now in buffer
You could use a static char[1024]; (1024 is an example size), store all strings used in this buffer and return the memory address which contains each string. The following code snippet may contain bugs but will probably give you the idea.
#include <stdio.h>
#include <string.h>
char* strrev2(const char* str)
{
static char buffer[1024];
static int last_access; //Points to leftmost available byte;
//Check if buffer has enough place to store the new string
if( strlen(str) < (size_t)(1024 - last_access) ) // strictly less: leaves room for the terminating 0
{
char* return_address = &(buffer[last_access]);
int i;
//FixMe - Make me faster
for( i = 0; i < strlen(str) ; ++i )
{
buffer[last_access++] = str[strlen(str) - 1 - i];
}
buffer[last_access] = 0;
++last_access;
return return_address;
}else
{
return 0;
}
}
int main()
{
char* test1 = "This is a test String";
char* test2 = "George!";
puts(strrev2(test1));
puts(strrev2(test2));
return 0 ;
}
Reverse the string in place (this needs a modifiable string, so it does not fit the const char * signature):
char *reverse (char *str)
{
register char c, *begin, *end;
begin = end = str;
while (*end != '\0') end ++;
while (begin < --end)
{
c = *begin;
*begin++ = *end;
*end = c;
}
return str;
}
