separating a string with strtok - c

I'm looking to separate a line (given as one string) into words. For example:
" Hello world". There can be one or more tabs or spaces between the words and at the beginning. I'm trying to do something like this:
(findCommand is a function and line is the string I get as input; for this part I only need the first two words)
CommandResult findCommand (const char* line){
char* commandLine = malloc(strlen(line)+1);
strcpy(commandLine, line);
char space[] = " \t";
char* word1 = strtok(commandLine,space);
char* word2 = strtok(NULL,space);
I've tried to run this in Eclipse with different variations of spaces and tabs. Some of them worked fine; on others I get a segmentation fault and I can't figure out why.

This:
char* commandLine = malloc(sizeof(strlen(line)));
is wrong. You shouldn't use sizeof here, and certainly not on the result of calling strlen(). The above is the same as:
char *commandLine = malloc(sizeof (size_t));
since the return type of strlen() is size_t. Thus, the actual strlen() return value is ignored.
The proper code is:
char *commandLine = malloc(strlen(line) + 1);
since you must add 1 for the terminator, which is not included in the length returned by strlen().
There is no need for any sizeof here, since you're very obviously working with characters.
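As a minimal sketch of the allocation described above, the copy could be wrapped in a small function. The name copy_line is hypothetical and not part of the original code; the point is only that the buffer size comes from strlen(line) + 1.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of line, or NULL on failure.
   The caller owns the result and must free() it. */
char *copy_line(const char *line)
{
    size_t len = strlen(line);      /* characters, not counting '\0' */
    char *copy = malloc(len + 1);   /* +1 for the terminator */
    if (copy == NULL)
        return NULL;
    memcpy(copy, line, len + 1);    /* copies the '\0' as well */
    return copy;
}
```

By contrast, sizeof(strlen(line)) is a compile-time constant equal to sizeof(size_t), whatever the string contains.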

Use malloc((strlen(line) + 1) * sizeof(char)) instead of malloc(sizeof(strlen(line))).
With sizeof(strlen(line)) you allocate only sizeof(size_t) bytes (typically 4 or 8), because sizeof is applied to the type of the strlen() result and ignores its value.
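Putting the corrected allocation together with strtok, the first two words of a line could be extracted roughly like this. It is a sketch under assumptions: first_two_words and its output-buffer parameters are made up for illustration, and the delimiter set " \t" is the one from the question.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies up to the first two space/tab separated
   words of line into w1 and w2 (each of size n, n > 0). Returns how
   many words were found (0, 1 or 2), or -1 if allocation fails. */
int first_two_words(const char *line, char *w1, char *w2, size_t n)
{
    int found = 0;
    char *copy = malloc(strlen(line) + 1);   /* strtok modifies its input */
    if (copy == NULL)
        return -1;
    strcpy(copy, line);

    char *tok = strtok(copy, " \t");         /* skips leading delimiters */
    if (tok != NULL) {
        snprintf(w1, n, "%s", tok);
        found = 1;
        tok = strtok(NULL, " \t");
        if (tok != NULL) {
            snprintf(w2, n, "%s", tok);
            found = 2;
        }
    }
    free(copy);
    return found;
}
```

Checking each strtok result for NULL before using it matters here, because a line with fewer than two words would otherwise hand a null pointer to whatever reads word1 or word2.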


Is this unsafe to use in C?

Hello, I've come upon a problem. I'm not very experienced in C.
I am trying to concatenate one char to my path variable.
But when I run this line of code, my other string variable gets overwritten or behaves weirdly afterwards. When the line is commented out, everything works normally. I don't want to post the whole code here; instead I am just curious whether this single line is somehow unsafe to run.
strcat(path, "/");
I also tried:
(Edit: I actually tried strcat first; the strncat call below is what was left over when I copied the line while reverting my changes.)
char temp = '/';
strncat(path, &temp);
I have been stuck on this for way too long, so maybe someone can help.
For starters the function strncat has three parameters
char *strncat(char * restrict s1, const char * restrict s2, size_t n);
So this call
strncat(path, "/");
will not compile.
Apart from this error this code snippet
char temp = '/';
strncat(path, &temp);
has one more error that is the expression &temp does not point to a string.
You can append a character to a string only if the array containing the string has enough space to accommodate one more character. For example you may not change a string literal.
If the array containing the string has enough memory to accommodate the character '/' then you can write
strcat( path, "/" );
or
size_t n = strlen( path );
path[n++] = '/';
path[n] = '\0';
Or, as @Barmar correctly pointed out in a comment on this answer, you could use strncat the following way
char temp = '/';
strncat(path, &temp, 1);
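All three variants assume the array has room for one more character. A minimal sketch that makes that check explicit is shown here; append_char and its cap parameter (the total size of the array) are assumptions for illustration, not something from the original code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends c to the string stored in buf, where cap
   is the total size of the array. Returns 0 on success, -1 if there is
   no room for the extra character plus the terminator. */
int append_char(char *buf, size_t cap, char c)
{
    size_t n = strlen(buf);
    if (n + 2 > cap)          /* need room for c and the '\0' */
        return -1;
    buf[n] = c;
    buf[n + 1] = '\0';
    return 0;
}
```

If path was declared with exactly enough space for its initial contents, a call like this would report failure instead of overwriting whatever variable happens to sit next to it in memory, which is one plausible cause of the symptom described in the question.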

Create a file of a specific dimension filled with zeroes

I'm trying to create a method which, given multiple strings, merges them together. Now, this is what I came up with. The main is purely for test so in the end I'll have only one part string at the time, the location on which that part belongs in the full string, and the final string where I need to put these parts.
int mergeParts(char* text, char* part, int position){
int printSpot = position * CONTENTSIZE;
strcat(text[printSpot], part);
printf("%s\n", text);
return 0;
}
Now, the problem with this code is a segmentation error, I tried multiple things but the only one that seems to work is using strcat(text, part); without using the "location" on which the part of the string must be copy.
#define CONTENTSIZE 10
int main(){
int i;
char* part1 = "This is a ";
char* part2 = "test with ";
char* part3 = "something ";
char* part4 = "that i wro";
char* part5 = "te in it";
int totParts = 5;
char* parts[totParts] = {part1,part2,part3,part4,part5};
int stringSize = totParts * CONTENTSIZE;
char* finalString = malloc(stringSize);
for(i = 0; i<totParts; i++){
mergeParts(finalString, parts[i], i);
}
return 0;
}
How can I do this specifying to the string the location where to copy the parts.
A good example that I can give you to explain better what I'm looking for is:
I have a empty string "------------------------------"
I have to write inside "This "; "is an"; " exam"; "ple o"; "f the"; " text";
If I receive " exam";, the result in my string has to be "---------- exam---------------".
Then I receive " text"; and so the result will be "---------- exam---------- text"
And so on until I have "This is an example of the text";
The problem is the call strcat(text[printSpot], part);. Use strcat(&(text[printSpot]), part); instead. text[printSpot] is a single char value such as 'e', not the address of a string, and an address is what strcat requires.
Or you can use strcat(text+printSpot,part) simply.
The main issue causing the segfault is that you're not passing the correct argument to strcat:
strcat(text[printSpot], part);
Both arguments are expected to be of type char *, but for the first argument you're passing in a single char. Passing a non-pointer where a pointer is expected invokes undefined behavior. In this case, the character being passed in is being interpreted as an address (which is invalid), and that invalid address is dereferenced, causing a crash.
You should be passing in the address of that array element:
strcat(&text[printSpot], part);
You also haven't initialized the bytes in finalString. The strcat function expects its first argument to point to a null terminated string, but because none of the allocated bytes have been initialized, you potentially read past the end of allocated memory, which again invokes undefined behavior.
Putting an empty string in finalString will take care of this:
strcpy(finalString, "");
Or equivalently:
finalString[0] = '\x0';
This allows the test program to work properly, where you're appending to an empty string in order, but it doesn't satisfy the requirement of updating parts of an existing string, possibly in the middle. strcat null-terminates the destination string after the second argument is appended, so anything that came after it is lost.
Assuming finalString is initially set with an "empty" string as in your example of the proper length, you should instead use memcpy. This will copy over only the characters in the string and not add a null terminating byte:
memcpy(&text[printSpot], part, strlen(part));
You'll also want to populate finalString with '-' characters to start:
char* finalString = malloc(stringSize + 1);
memset(finalString, '-', stringSize);
finalString[stringSize]=0;
Output:
This is a ----------------------------------------
This is a test with ------------------------------
This is a test with something --------------------
This is a test with something that i wro----------
This is a test with something that i wrote in it--
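For reference, the memcpy approach can be sketched as a function that also checks the part fits inside the existing string. The name merge_part and the extra text_len parameter (the length of the pre-filled string) are assumptions added for this example.

```c
#include <stddef.h>
#include <string.h>

#define CONTENTSIZE 10   /* slot width, as in the question */

/* Sketch: writes part into slot number position of text without adding
   a terminator, so the rest of text is preserved. text_len is the
   length of the pre-filled string (an assumed extra parameter).
   Returns 0 on success, -1 if the part would not fit. */
int merge_part(char *text, size_t text_len, const char *part, size_t position)
{
    size_t offset = position * CONTENTSIZE;
    size_t part_len = strlen(part);

    if (part_len > CONTENTSIZE || offset > text_len ||
        part_len > text_len - offset)
        return -1;
    memcpy(text + offset, part, part_len);   /* no '\0' is written */
    return 0;
}
```

Because nothing is appended, the parts can arrive in any order, and the terminator written when the buffer was first filled stays in place.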

Split a string with delimiters with support for missing values C99 [duplicate]

I am trying to tokenize a string, but I need to know exactly when no data is seen between two tokens. E.g. when tokenizing the string "a,b,c,,,d,e" I need to know about the two empty slots between 'c' and 'd', which I am unable to find out simply using strtok(). My attempt is shown below:
char arr_fields[num_of_fields];
char delim[]=",\n";
char *tok;
tok=strtok(line,delim);//line contains the data
for(i=0;i<num_of_fields;i++,tok=strtok(NULL,delim))
{
if(tok)
sprintf(arr_fields[i], "%s", tok);
else
sprintf(arr_fields[i], "%s", "-");
}
Executing the above code with the aforementioned examples put characters a,b,c,d,e into first five elements of arr_fields which is not desirable. I need the position of each character to go in specific indexes of array: i.e if there is a character missing between two characters, it should be recorded as is.
7.21.5.8 the strtok function
The standard says the following regarding strtok:
[#3] The first call in the sequence searches the string
pointed to by s1 for the first character that is not
contained in the current separator string pointed to by s2.
If no such character is found, then there are no tokens in
the string pointed to by s1 and the strtok function returns
a null pointer. If such a character is found, it is the
start of the first token.
From the above quote we can see that you cannot use strtok as a solution to your specific problem, since it treats any run of consecutive characters found in delims as a single delimiter.
Am I doomed to weep in silence, or can somebody help me out?
You can easily implement your own version of strtok that does what you want, see the snippets at the end of this post.
strtok_single makes use of strpbrk (char const* src, const char* delims) which will return a pointer to the first occurrence of any character in delims that is found in the null-terminated string src.
If no matching character is found the function will return NULL.
strtok_single
char *
strtok_single (char * str, char const * delims)
{
static char * src = NULL;
char * p, * ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk (src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
} else if (*src) {
ret = src;
src = NULL;
}
return ret;
}
sample use
char delims[] = ",";
char data [] = "foo,bar,,baz,biz";
char * p = strtok_single (data, delims);
while (p) {
printf ("%s\n", *p ? p : "<empty>");
p = strtok_single (NULL, delims);
}
output
foo
bar
<empty>
baz
biz
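Since strtok_single above keeps its position in a static variable, it cannot be used on two strings at once. One possible variation, sketched here under the assumption that the caller supplies a save pointer (in the style of strtok_r), moves that state out of the function. Note that, unlike the version above, this sketch also reports an empty field after a trailing delimiter.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of a reentrant variant: the scan position lives in *save
   (a caller-supplied pointer, an assumption of this example) instead
   of a static variable. Returns NULL when no fields remain. */
char *strtok_single_r(char *str, const char *delims, char **save)
{
    char *start = (str != NULL) ? str : *save;
    if (start == NULL)
        return NULL;

    char *p = strpbrk(start, delims);
    if (p != NULL) {
        *p = '\0';          /* terminate this field */
        *save = p + 1;      /* next field starts after the delimiter */
    } else {
        *save = NULL;       /* this was the last field */
    }
    return start;
}
```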
You can't use strtok() if that's what you want. From the man page:
A sequence of two or more contiguous delimiter characters in the parsed
string is considered to be a single delimiter. Delimiter characters at
the start or end of the string are ignored. Put another way: the
tokens returned by strtok() are always nonempty strings.
Therefore it is just going to jump from c to d in your example.
You're going to have to parse the string manually or perhaps search for a CSV parsing library that would make your life easier.
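The behaviour quoted from the man page is easy to confirm with a small counting function. This is only an illustration; count_strtok_tokens is a made-up name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() reports for s.
   s is modified in place, as with any use of strtok(). */
size_t count_strtok_tokens(char *s, const char *delims)
{
    size_t n = 0;
    for (char *tok = strtok(s, delims); tok != NULL;
         tok = strtok(NULL, delims))
        n++;
    return n;
}
```

For a writable copy of "a,b,c,,,d,e" and the delimiter "," this returns 5, even though the string has seven comma-separated positions.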
Lately I was looking for a solution to the same problem and found this thread.
You can use strsep().
From the manual:
The strsep() function was introduced as a replacement for strtok(3),
since the latter cannot handle empty fields.
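A rough sketch of how strsep() could be used for this follows. strsep() is a BSD extension that glibc also provides, not part of ISO C, so the feature-test macro below is an assumption about the build environment; split_keep_empty is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE   /* exposes strsep() on glibc; not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place on any character in delims
   and stores up to max field pointers in fields. Empty fields are kept
   as empty strings. Returns the number of fields stored. */
size_t split_keep_empty(char *line, const char *delims,
                        char **fields, size_t max)
{
    size_t n = 0;
    char *rest = line;
    char *tok;

    while (n < max && (tok = strsep(&rest, delims)) != NULL)
        fields[n++] = tok;
    return n;
}
```

With a writable copy of "a,b,c,,,d,e" this yields seven fields, where the fourth and fifth are empty strings, so the caller can store "-" or any other placeholder at exactly those indexes.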
As mentioned in this answer, you'll want to implement something like strtok yourself. I prefer using strcspn (as opposed to strpbrk), as it allows for fewer if statements:
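/* MAX_FIELD_LEN is an assumed per-field buffer size, not from the question */
char arr_fields[num_of_fields][MAX_FIELD_LEN];
char delim[] = ",\n";
size_t current_token = 0;
size_t token_length;
for (i = 0; i < num_of_fields; i++)
{
    /* length of the run of non-delimiter characters starting here */
    token_length = strcspn(line + current_token, delim);
    if (token_length)
        snprintf(arr_fields[i], sizeof arr_fields[i], "%.*s", (int)token_length, line + current_token);
    else
        strcpy(arr_fields[i], "-");
    current_token += token_length;
    /* step over the delimiter itself, but never past the terminator */
    if (line[current_token] != '\0')
        current_token++;
}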
Parse (for example, strtok)
Sort
Insert
Rinse and repeat as needed :)
You could try using strchr to find the locations of the , symbols. Manually tokenize your string up to the comma you found (using memcpy or strncpy) and then call strchr again. This way you can tell when two or more commas are next to each other (the pointers returned by two consecutive strchr calls will differ by exactly 1), and you can write an if statement to handle that case.
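The strchr idea can be sketched as a function that fetches one field by index. get_field and its parameters are hypothetical; the separator is hard-coded to a comma to match the example string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies comma-separated field number index
   (0-based) of line into out (size cap). An empty field yields "".
   Returns 0 on success, -1 if line has fewer fields or out is too
   small. */
int get_field(const char *line, size_t index, char *out, size_t cap)
{
    const char *start = line;
    const char *comma;

    while (index > 0) {                 /* skip the preceding fields */
        comma = strchr(start, ',');
        if (comma == NULL)
            return -1;                  /* ran out of fields */
        start = comma + 1;
        index--;
    }
    comma = strchr(start, ',');
    size_t len = comma ? (size_t)(comma - start) : strlen(start);
    if (len >= cap)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

This version does not modify line at all, which can be convenient when the input is a string literal or is needed again later.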

C String parsing errors with strtok(),strcasecmp()

So I'm new to C and the whole string manipulation thing, but I can't seem to get strtok() to work. It seems everywhere everyone has the same template for strtok being:
char* tok = strtok(source,delim);
do
{
{code}
tok=strtok(NULL,delim);
}while(tok!=NULL);
So I try to do this with a space as the delimiter, and it seems that strtok() not only returns NULL after the first run (the first entry into the while/do-while) no matter how big the string is, but also wrecks the source, turning the source string into the same thing as tok.
Here is a snippet of my code:
char* str;
scanf("%ms",&str);
char* copy = malloc(sizeof(str));
strcpy(copy,str);
char* tok = strtok(copy," ");
if(strcasecmp(tok,"insert"))
{
printf(str);
printf(copy);
printf(tok);
}
Then, here is some output for the input "insert a b c d e f g"
aaabbbcccdddeeefffggg
"Insert" seems to disappear completely, which I think is the fault of strcasecmp(). Also, I would like to note that I realize strcasecmp() seems to all-lower-case my source string, and I do not mind. Anyhoo, input "insert insert insert" yields absolutely nothing in output. It's as if those functions just eat up the word "insert" no matter how many times it is present. I may* end up just using some of the C functions that read the string char by char but I would like to avoid this if possible. Thanks a million guys, i appreciate the help.
With the second snippet of code you have five problems. The first is that the m in your scanf format is not standard C. It is a POSIX.1-2008 extension that makes scanf allocate the buffer itself and store its address through a char ** argument, so it only works on systems that support it. (See e.g. here for a good reference of the standard function.) Also note that %s stops at the first whitespace character, so each call reads a single word, not the whole line.
The second problem shows up once you drop the m and use a plain %s: you use the address-of operator on a pointer, which means that you pass a pointer to a pointer to a char (i.e. char **) to scanf, while %s expects a char * that points to a writable buffer. Strings (either in pointer-to-character form or in array form) are already passed as pointers, so you don't use the address-of operator for string arguments.
The third problem, once you fix the previous problem, is that the pointer str is uninitialized. You have to remember that uninitialized local variables are truly uninitialized, and their values are indeterminate. In practice, it means that their values will be seemingly random. So str will point to some "random" memory.
The fourth problem is with the malloc call, where you use the sizeof operator on a pointer. This yields the size of the pointer (typically 4 or 8 bytes, depending on whether you're on a 32 or 64 bit platform) and not the length of what it points to. Use strlen(str) + 1 instead.
The fifth problem follows from the fourth: copy is too small for any word that does not fit in those few bytes, so strcpy can write past the end of the allocation before strtok ever sees the data.
So, five problems in only four lines of code. That's pretty good! ;)
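As a hedged sketch of the allocation point, combined with the fact that strcasecmp returns 0 when the strings match, the first-word check could look like this. first_word_is is a hypothetical helper, and strcasecmp is a POSIX function declared in <strings.h>, not part of ISO C.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper: returns 1 if the first space-delimited word of
   line equals cmd ignoring case, 0 if it does not, and -1 if the
   allocation fails. */
int first_word_is(const char *line, const char *cmd)
{
    char *copy = malloc(strlen(line) + 1);   /* length of the text, not sizeof a pointer */
    if (copy == NULL)
        return -1;
    strcpy(copy, line);

    char *tok = strtok(copy, " ");
    /* strcasecmp returns 0 on a match, so compare against 0 explicitly */
    int match = (tok != NULL && strcasecmp(tok, cmd) == 0);

    free(copy);
    return match;
}
```

Writing if(strcasecmp(tok, "insert")) without the == 0 enters the branch only when the word is not "insert", which would explain why that word never shows up in the question's output.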
It looks like you're trying to print space delimited tokens following the word "insert" 3 times. Does this do what you want?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <strings.h> /* strcasecmp (POSIX) */
int main(int argc, char **argv)
{
char str[BUFSIZ] = {0};
char *copy;
char *tok;
int i;
// safely read a string and chop off any trailing newline
if(fgets(str, sizeof(str), stdin)) {
size_t n = strlen(str);
if(n && str[n-1] == '\n')
str[n-1] = '\0';
}
// copy the string so we can trash it with strtok
copy = strdup(str);
// look for the first space-delimited token
tok = strtok(copy, " ");
// check that we found a token and that it is equal to "insert"
if(tok && strcasecmp(tok, "insert") == 0) {
// iterate over all remaining space-delimited tokens
while((tok = strtok(NULL, " "))) {
// print the token 3 times
for(i = 0; i < 3; i++) {
fputs(tok, stdout);
}
}
putchar('\n');
}
free(copy);
return 0;
}
