String split function with delimiters in C - c

I am having trouble writing a string split function with delimiters. I based my function off of the main function featured here: http://www.cplusplus.com/reference/cstring/strtok/.
When I test it via main, I am only able to pass it char[], but not char*. When passing a char*, the program seg faults.
I.e. passing some char str[] through str_split works but not some char* str. Any help would be greatly appreciated.
char** str_split(char* str, const char* delim)
{
char* tmp;
char** t = (char**)malloc(sizeof(char*) * 1024);
char** tokens = t;
tmp = strtok(str, delim);
while(tmp != NULL)
{
*tokens = (char*)malloc(sizeof(char) * strlen(tmp));
*tokens = strdup(tmp);
tokens++;
tmp = strtok(NULL, delim);
}
return t;
}

These two lines give you two different problems:
*tokens = (char*)malloc(sizeof(char) * strlen(tmp));
*tokens = strdup(tmp);
The first line will allocate strlen(tmp) bytes, but the problem is that strings have an extra character to terminate the string, so you really need to allocate strlen(tmp) + 1 bytes.
The second line overwrites the original pointer you got from malloc, causing a memory leak.
Also, in C you should not cast the return of malloc.
Oh, and another note: sizeof(char) is specified to always return 1, no matter the actual bit-size of the char type.
As for your seg-faulting, I'm guessing you are calling your function with a string literal, like e.g.
some_var = str_split("hello world", " ");
Or possibly
char *string = "hello world";
some_var = str_split(string, " ");
This will cause undefined behavior, because a string literal is an array of characters that must not be modified (it is often placed in read-only memory), and strtok modifies the string it is given. Undefined behavior is arguably the most common cause of crashes.
If you enable more warnings when building, you may get a warning about this (GCC, for example, has -Wwrite-strings for exactly this case), or maybe you did get a warning but ignored it, or used casting to get rid of it. Warnings from the compiler are often good indicators that you are doing something you should not do; hiding one by e.g. casting will only silence the warning, not fix the problem.
There are also a couple of other problems with your code. One is that if there is only one "word"/"token" in the "sentence" you pass in to the function, you waste 4092 or 8184 bytes (depending on 32- or 64-bit platform) in that allocation. You might want to do a separate tokenization loop first (on a temporary copy of the string) to find out the exact number of "tokens" or "words" in the input.
Doing this counting will also solve the other problem: What if there are more than 1024 tokens/words? Your loop will blissfully write out of bounds in that case.
Both of these cases are extremes, and your standard use-case may be a better fit for your current code, but it's still something to think about.
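A minimal sketch of how the points above could fit together, assuming you want the input left untouched and the result NULL-terminated so the caller can find the end. The helper names copy_string and str_split_free are made up for this example (copy_string is just a portable stand-in for strdup), and the two-pass counting is one possible design, not the only one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: portable stand-in for strdup(). */
static char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;          /* +1 for the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, src, size);
    return copy;
}

/* Hypothetical helper: frees an array returned by str_split(). */
void str_split_free(char **tokens)
{
    if (tokens == NULL)
        return;
    for (char **t = tokens; *t != NULL; t++)
        free(*t);
    free(tokens);
}

/* Splits str at any character in delim without modifying str.
   Returns a NULL-terminated array of heap-allocated tokens,
   or NULL if an allocation fails. */
char **str_split(const char *str, const char *delim)
{
    /* strtok writes to its input, so work on a scratch copy. */
    char *scratch = copy_string(str);
    if (scratch == NULL)
        return NULL;

    /* First pass: count the tokens. */
    size_t count = 0;
    for (char *tok = strtok(scratch, delim); tok != NULL; tok = strtok(NULL, delim))
        count++;

    /* The first pass replaced delimiters with '\0'; restore the copy. */
    strcpy(scratch, str);

    /* Exactly count pointers plus one slot for the NULL terminator. */
    char **tokens = malloc((count + 1) * sizeof *tokens);
    if (tokens == NULL) {
        free(scratch);
        return NULL;
    }
    tokens[0] = NULL;

    /* Second pass: duplicate each token. */
    size_t i = 0;
    for (char *tok = strtok(scratch, delim); tok != NULL; tok = strtok(NULL, delim)) {
        char *copy = copy_string(tok);
        if (copy == NULL) {
            free(scratch);
            str_split_free(tokens);
            return NULL;
        }
        tokens[i++] = copy;
        tokens[i] = NULL;                   /* keep the array NULL-terminated */
    }

    free(scratch);
    return tokens;
}
```

With this version, str_split("hello world", " ") is fine even though the argument is a string literal, because only the scratch copy is ever written to. The caller releases everything with str_split_free().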

You may be assigning a string literal to the char * at declaration:
char *str="abcdef";
or you may not have allocated memory for the string pointed to by char * str. In both of these cases strtok() can result in a segmentation fault.

When I test it via main, I am only able to pass it char[], but not char*. When passing a char*, the program seg faults.
Going by the above, chances are that you are either not allocating memory for your char * in main or passing a string literal.

Related

How to take string as input using pointers without declaration of variable for string?

char *p="This is anonymous string literal";
Doing this I can access the string easily, but how can I get a string from the user using this pointer p, without declaring another variable for the string?
A string literal is a character array that must not be modified. That means that any attempt to change the characters through the p pointer would invoke Undefined Behaviour (the hell for C programmers...).
To be able to read a string from the user (note, not a literal one), you must either declare a character array, or allocate one with malloc and friends.
Examples:
char str[80];
char *p = str; // Ok, you can do what you want with 79 characters (the last should be the null terminator)
or
char *p = malloc(80); // again 80 characters to play with
...
free(p); // but you are responsible for releasing allocated memory when done
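To actually fill such a buffer from the user, something like the following could work. This is only a sketch: read_line is a name invented for this example, and it assumes line-oriented input read with fgets, which never writes more than the size you give it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf, storing at most
   size - 1 characters plus the '\0', and strips the trailing newline.
   Returns buf, or NULL on end of file or error. */
char *read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';     /* cut at the newline, if there is one */
    return buf;
}
```

With char str[80]; char *p = str; you would call read_line(stdin, p, sizeof str). The pointer p is only a way to reach the array; the storage still has to exist somewhere.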
Changing the size of your string requires memory allocation.
Maybe you could just use a pointer to this string in your program and change the target of the pointer.
That may not answer your question but it can be interesting.
char *str = "My string";
char *str2 = "My string 2";
char **ptr = &str;
printf("%s\n", *ptr);
ptr = &str2;
printf("%s\n", *ptr);
Edit
From gets() documentation:
str − This is the pointer to an array of chars where the C string is stored
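This means that the pointer passed to gets() must already point to allocated memory (note that gets() cannot limit how much it writes and was removed in C11; fgets() is the usual replacement). You could try: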
char p[50];
// instead of
char *p; // No memory allocated to put user input in
That's why you got a segfault (with memory observation tools like valgrind you'll see errors on all compilers).
Also, printf() output to a terminal is usually not flushed until a newline (\n) is written. You could try:
printf("No newline");
fflush(stdout); // Force stdout update

How does memory allocation work with char pointers(string literals, arrays)?

Currently reading K&R and just got stumbled across the char pointers. There's nothing about memory allocation when defining char pointers in the book rn, maybe it'll be explained later. But it just doesn't make sense so I'm seeking help :)
1
// No errors
char *name;
char *altname;
strcpy(altname,name);
2
// No errors, internals of *name have been successfully moved to *altname
char *name = "HI";
char *altname;
strcpy(altname, name);
3
// Segmentation fault, regardless of how I define *altname
char *name = "HI";
char *altname = "randomstring";
strcpy(altname, name);
4
// Segmentation fault, regardless of how I define *altname
char *name;
char *altname = " ";
strcpy(altname, name);
5
// copies internals of *name only if size of char s[] > 8???
char s[9];
char n[] = {'c', 'b'};
char *name = n;
char *altname = s;
strcpy(altname, name);
Why does the first example produce no error, even though there's no memory allocated?
Why does the second one successfully copy name to altname, even though there's no memory allocated for altname;
Why do the third and fourth ones core dump?
Why does the fifth one require size of s as >8?
I'm a bit surprised that the first two examples worked. strcpy takes the character array pointed to by the second argument and copies it to the character array pointed to by the first argument.
In your first example, the pointers name and altname aren't pointing to anything. They're uninitialized pointers. Now they have some value based on either whatever gunk was in memory when the main function was executed or just however the compiler decided to initialize them. That's why I'm surprised that your code isn't crashing.
Anyway, the reason why your third example is segfaulting is you've set both pointers to point to string literals (i.e., "HI" and "randomstring"). String literals are (most likely) stored in a read-only section of memory. So, when you run strcpy with the destination pointer set to altname, you're trying to write to read-only memory. That's an invalid use of memory and hence the crash. The same goes for your fourth example.
In the final example, your problem is that n is not null-terminated. strcpy keeps copying characters until it reaches a terminator (i.e., '\0'). You've put a 'c' and a 'b' in your array but no terminator. Therefore, strcpy is going to keep copying even after it's reached the end of n. The fact that it works when sizeof(s) is greater than 8 probably depends on where a null byte happens to exist on your stack. That is, if you make s too small, you'll write past the end of it and corrupt the memory on your stack. It's possible you're overwriting your return address and thus returning to some invalid memory address when your function is done executing.
What you need to do is, instead of using pointers, use arrays. This will give you the storage space you need in a writable section of memory. You also need to make sure that your arrays are null-terminated. This can be done automatically by initializing your arrays with string literals.
char name[] = "HI";
char altname[] = "randomstring";
strcpy(altname, name);
char s[3]; // Enough space to hold "cb" plus a null-terminator.
char n[] = "cb";
char *name = n;
char *altname = s;
strcpy(altname, name);
All code snippets are undefined behavior. There is no guarantee that you will get an error. As the term says, the behavior is undefined.
In 1 and 2, one or both pointers are uninitialized and may point to memory that may or may not be protected against the programmed access, so you may or may not get an error or program crash.
In 3 and 4 you may get a reproducible segmentation fault because altname points to a string literal "randomstring" or " " which can be read-only, so you are not allowed to overwrite the memory where altname points to. In 4, name is uninitialized, so this might also be the cause of the segmentation fault.
In 5, n is not a valid string that could be used for strcpy. The terminating '\0' is missing. strcpy will (try to) copy everything after the end of n until it finds a '\0'. To fix it use
char n[] = {'c', 'b', '\0'};
or
char n[] = "cb";
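Since strcpy itself checks neither the destination size nor the terminator, one option is to wrap the copy in a small function that does the size check. A sketch, where copy_checked is a made-up name and src is assumed to be a properly null-terminated string:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including
   the terminating '\0'. Returns 0 on success, -1 if dst is too small
   (in which case dst is left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;    /* characters plus the terminator */
    if (needed > dst_size)
        return -1;
    memcpy(dst, src, needed);
    return 0;
}
```

For example 5, with char s[3]; and char n[] = "cb";, the call copy_checked(s, sizeof s, n) succeeds, while a two-byte destination would be rejected instead of being overrun.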

C - printing a char* through strcpy is acting contradictory?

First post here, I've used this site for years but this dilemma is really annoying. So when I run this C code through the VS2015 Developer Command prompt with the C compiler:
#include <stdio.h>
#include <string.h>
int main(){
char* ptr = NULL;
const char* pt2 = "hello";
strcpy(&ptr, pt2);
printf("%s",&ptr);
return 0;
}
I get these warnings:
midterm_c.c(35): warning C4047: 'function': 'char *' differs in levels of indirection from 'char **'
midterm_c.c(35): warning C4024: 'strcpy': different types for formal and actual parameter 1
midterm_c.c(36): warning C4477: 'printf' : format string '%s' requires an argument of type 'char *', but variadic argument 1 has type 'char **'
Yet when I run it, it prints "hello", which shouldn't happen because it doesn't make sense to use the reference operators on the ptr variable. But when I run this code without them:
char* ptr = NULL;
const char* ptr2 = "hello";
strcpy(ptr, ptr2);
printf("%s",ptr);
return 0;
I get no warnings, successfully compiles, yet when I run it the program crashes even though this should work properly. I'm getting really frustrated because this doesn't make any sense to me and one of the reasons why I'm really hating C right now. If anyone has any idea how this is happening I'd really appreciate it. Thank you.
In your working code, you basically just treated part of your stack as a string buffer; it's flagrantly illegal by the standard, but it happens to work because you don't read from anything you stomp on after the stomping.
You're correct that passing a char** to strcpy makes no sense, but what ends up happening is that it writes to the char* variable itself (and some neighboring memory if it's a 32 bit program), treating it as an array of char (after all, an eight byte pointer has enough room for a seven character string plus NUL terminator). Effectively, your code is incorrect code that still happens to behave like correct code, since this:
char* ptr = NULL;
const char* pt2 = "hello";
strcpy(&ptr, pt2);
printf("%s",&ptr);
compiles to code that behaves almost exactly like this (aside from the warnings):
char ptr[sizeof(char*)]; // Your char* behaves like an array
const char* pt2 = "hello";
strcpy(ptr, pt2);
printf("%s", ptr);
When you don't take the address of ptr, you pass a NULL pointer to strcpy, which crashes immediately (because you can't write to a NULL pointer's target on almost all systems).
Of course, the correct solution would be to make ptr point to allocated memory, either global, automatic, or dynamically allocated memory. For example, using a stack buffer, this would be legal:
char buf[8];
char *ptr = buf; // = &buf[0]; would be equivalent
const char* pt2 = "hello";
strcpy(ptr, pt2);
printf("%s", ptr);
although it would be rather pointless, since ptr could just be replaced with buf everywhere and ptr removed:
char buf[8];
const char* pt2 = "hello";
strcpy(buf, pt2);
printf("%s", buf);
First of all, you don't know how to use strcpy. The first parameter needs to point to space available to copy into, it doesn't allocate space for you.
Your first example probably only "works" because it is copying into stack-space. This is undefined behavior and will cause all kinds of problems.
Your second example (the more correct one) will generally cause a segfault because you are trying to copy into (or read from) a null address.
You're writing to random memory, which is undefined behavior. This means that according to the C standard, the compiler is allowed to generate code that does literally anything, including but not limited to:
Printing "hello".
Printing random gibberish.
Crashing.
Corrupting some other variable in your app.
Ripping a hole in the fabric of space-time, causing an invasion by evil space lemurs from the fifth dimension that ruin the market for Kopi Luwak coffee by causing a glut in the supply.
Tl;dr: Don't do that.
You need to allocate memory for the ptr variable to point to (e.g. with malloc). A useful function here is strdup() (a POSIX function, also in C23).
It takes a string, allocates the right amount of memory for a new string, and copies the content of the source into it before returning the new string. Free the result with free() when you are done.
You can do:
ptr = strdup(pt2);
The documentation: http://manpagesfr.free.fr/man/man3/strdup.3.html

Why I cannot free a malloc'd string?

I have this code:
#define ABC "abc"
void main()
{
char *s = malloc(sizeof(char)*3);
printf("%p ", s);
s = ABC;
printf("%p ", s);
free(s);
}
This is the output:
0x8927008 0x8048574 Segmentation fault (core dumped)
As you can see, the address of string s changes after assignment (I think this is why free() gives segfault).
Can anyone explain me why and how this happens?
Thank you!
The line
s = ABC;
changes s to point to a different string which may well be in read-only memory. Attempting to free such memory results in undefined behaviour. A crash is likely.
I think you wanted
strcpy(s, ABC);
instead. This would copy the char array "abc" into s. Note that this exposes a further bug: s is too short and doesn't have space for the nul terminator at the end of ABC. Change your allocation to 4 bytes to fix this:
char *s = malloc(4);
or use
char *s = malloc(sizeof(ABC));
if ABC is the max length you want to store.
void main() UB: main returns int
printf("%p", s) UB: calling a function accepting a variable number of arguments without a prototype in scope; UB: using a value of type char* where a value of type void* is expected.
free(s) UB: s is not the result of a malloc()
UB is Undefined Behaviour.
After
char *s = malloc(sizeof(char)*3);
s points to the memory allocated by malloc.
The line
s = ABC;
will change to
s = "abc";
after preprocessing. This step makes s point to the string literal "abc", which typically lives in a read-only area. Note that this also leaks the malloced memory.
Since s now points to memory that did not come from malloc, freeing it is undefined behavior.
If you wanted to copy the string "abc" into memory pointed to by s you need to use strcpy as:
strcpy(s, ABC);
but note that to accommodate "abc", s must be at least 4 characters long, to accommodate the NUL character as well.
Change the line: char *s = malloc(sizeof(char)*3);
to
char *s = (char *)malloc(sizeof(char)*3);
This silences warnings from compilers that build the code as C++; in C the cast is optional, and it does not fix the crash by itself.
More than that I recommend that you make your code more flexible depending on what you do. If for some reason you change ABC to "ABCD", you will not have allocated enough space for all the characters.
Also, the string "abc" actually occupies 4 characters (since there is a null at the end that terminates the string). You'll see issues without room for that extra terminator.
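Putting the answers together, a corrected version might look roughly like this. The function name make_abc is invented for the example; the key points are that sizeof ABC already includes the terminator and that s keeps pointing at the malloc'd block, so it can be freed.

```c
#include <stdlib.h>
#include <string.h>

#define ABC "abc"

/* Hypothetical helper: returns a heap-allocated copy of ABC,
   or NULL if the allocation fails. The caller frees it. */
char *make_abc(void)
{
    char *s = malloc(sizeof ABC);   /* sizeof counts the '\0': 4 bytes for "abc" */
    if (s == NULL)
        return NULL;
    strcpy(s, ABC);                 /* copies the characters; s itself is unchanged */
    return s;
}
```

Because sizeof ABC is used rather than a hard-coded 3 or 4, changing the macro to "ABCD" later still allocates enough space. Calling free() on the returned pointer is valid, since it is exactly the value malloc returned.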

string manipulations in C

Following are some basic questions that I have with respect to strings in C.
If string literals are stored in read-only data segment and cannot be changed after initialisation, then what is the difference between the following two initialisations.
char *string = "Hello world";
const char *string = "Hello world";
When we dynamically allocate memory for strings, I see that the following allocation is capable of holding a string of arbitrary length. Though this allocation works, I understand/believe that it is always good practice to allocate the actual size of the string rather than the size of the data type. Please guide me on the proper usage of dynamic allocation for strings.
char *str = (char *)malloc(sizeof(char));
scanf("%s",str);
printf("%s\n",str);
1.what is the difference between the following two initialisations.
The difference is whether an attempted modification is caught at compile time or only at run time, as others have already said.
char *string = "Hello world";--->stored in a read-only data segment and
can't be changed, but if you try to change the value the compiler won't
give any error; the failure only shows up at run time.
const char *string = "Hello world";--->This is also stored in the
read-only data segment, but with compile-time checking because it is
declared const, so if you try to change the string you will get an
error at compile time, which is far better than a failure at run time.
2.Please guide on proper usage of dynamic allocation for strings.
char *str = (char *)malloc(sizeof(char));
scanf("%s",str);
printf("%s\n",str);
This code may work some of the time, but not always. It allocates a single byte, so scanf writes past the end of the allocation into memory your program does not own, and you may get a segmentation fault at run time. You should always be very careful with dynamic memory allocation, as mistakes lead to very dangerous errors at run time.
You should always allocate the amount of memory you actually need.
Most errors come up when using strings. Always keep in mind that there is a '\0' character at the end of the string, and that during allocation it is your responsibility to allocate memory for it.
Hope this helps.
what is the difference between the following two initialisations.
String literals have type char[N] (which decays to char*) rather than const char[N], for legacy reasons. It's good practice to only point to them via const char*, because it's not allowed to modify them.
I see the following allocation is capable enough to hold a string of arbitary length.
Wrong. This allocation only allocates memory for one character. If you tried to write more than one byte into string, you'd have a buffer overflow.
The proper way to dynamically allocate memory is simply:
char *string = malloc(your_desired_max_length);
Explicit cast is redundant here and sizeof(char) is 1 by definition.
Also: Remember that the string terminator (0) has to fit into the string too.
In the first case, the const declaration makes the pointer a pointer to const char, meaning you're disallowing changes, at the compiler level, to the characters behind it. In C, it's actually undefined behaviour (at runtime) to try and modify those characters regardless of their const-ness, but a string literal decays to a char *, not a const char *.
In the second case, I see two problems.
The first is that you should never cast the return value from malloc since it can mask certain errors (especially on systems where pointers and integers are different sizes). Specifically, unless there is an active malloc prototype in place, a compiler may assume that it returns an int rather than the correct void *.
So, if you forget to include stdlib.h, you may experience some funny behaviour that the compiler couldn't warn you about, because you told it with an explicit cast that you knew what you were doing.
C is perfectly capable of implicitly converting between the void * returned from malloc and any other object pointer type.
The second problem is that it only allocates space for one character, which will be the terminating null for a string.
It would be better written as:
char *string = malloc (max_str_size + 1);
(and don't ever multiply by sizeof(char), that's a waste of time - it's always 1).
The difference between the two declarations is that the compiler will produce an error (which is much preferable to a runtime failure) if an attempt to modify the string literal is made via the const char* declared pointer. The following code:
const char* s = "hello"; /* 's' is a pointer to 'const char'. */
*s = 'a';
results in VC2010 emitting the following error:
error C2166: l-value specifies const object
An attempt to modify the string literal made via the char* declared pointer won't be detected until runtime (VC2010 emits no error), the behaviour of which is undefined.
When malloc()ing memory for storing of strings you must remember to allocate one extra char for storing the null terminator as all (or nearly all) C string handling functions require the null terminator. For example, to allocate a buffer for storing "hello":
char* p = malloc(6); /* 5 for "hello" and 1 for null terminator. */
sizeof(char) is guaranteed to be 1, so multiplying by it is not required, and it is not necessary to cast the return value of malloc(). When p is no longer required, remember to free() the allocated memory:
free(p);
Difference between the following two initialisations.
First, char *string = "Hello world";
- "Hello world" is stored as a constant string with static storage duration (typically in a read-only data segment, not on the stack) and its address is assigned to the pointer variable 'string'.
"Hello world" is constant, so you must not do string[5]='g'; doing so is undefined behaviour and typically causes a segmentation fault.
The 'string' variable itself, however, is not constant, and you can change its binding:
string= "Some other string"; //this is correct, no segmentation fault
Second, const char *string = "Hello world";
Again "Hello world" is stored as a constant string and its address is assigned to the 'string' variable.
Here string[5]='g' is rejected by the compiler instead of failing at run time.
That is what the const keyword buys you.
Now,
char *string = (char *)malloc(sizeof(char));
The above declaration is like the first one, but this time the memory is allocated dynamically from the heap (not from read-only storage), so it is writable; it only has room for a single char, though.
The code:
char *string = (char *)malloc(sizeof(char));
Will not hold a string of arbitrary length. It will allocate a single character and return a pointer to that character. Note that a pointer to a character and a pointer to what you call a string have the same type.
To allocate space for a string you must do something like this:
char *data="Hello, world";
char *copy=(char*)malloc(strlen(data)+1);
strcpy(copy,data);
You need to tell malloc exactly how many bytes to allocate. The +1 is for the null terminator that needs to go onto the end.
As for literal string being stored in a read-only segment, this is an implementation issue, although is pretty much always the case. Most C compilers are pretty relaxed about const'ing access to these strings, but attempting to modify them is asking for trouble, so you should always declare them const char * to avoid any issues.
That particular allocation may appear to work as there's probably plenty of space in the program heap, but it doesn't. You can verify it by allocating two "arbitrary" strings with the proposed method and memcpy:ing some long enough string to the corresponding addresses. In the best case you see garbage, in the worst case you'll have segmentation fault or assert from malloc or free.
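For the scanf("%s", ...) case from the question, a safer pattern is to decide on a maximum length, allocate that plus one, and put the same limit in the format string. A sketch under those assumptions; WORD_MAX and first_word are names invented here, and sscanf is used on an already-read line only to keep the example self-contained (the width limit works the same way with scanf):

```c
#include <stdio.h>
#include <stdlib.h>

#define WORD_MAX 63     /* hypothetical limit; must match the width in "%63s" */

/* Hypothetical helper: returns a heap-allocated copy of the first
   whitespace-delimited word in line, truncated to WORD_MAX characters,
   or NULL if there is no word or the allocation fails. The caller frees it. */
char *first_word(const char *line)
{
    char *word = malloc(WORD_MAX + 1);      /* +1 for the terminating '\0' */
    if (word == NULL)
        return NULL;
    if (sscanf(line, "%63s", word) != 1) {  /* the width keeps the write in bounds */
        free(word);
        return NULL;
    }
    return word;
}
```

The width in the format counts characters only; the conversion then appends the '\0' itself, which is why the buffer needs WORD_MAX + 1 bytes.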

Resources