What is the difference between
char *key_str="kiwi";
and
char *key_str = strdup("kiwi");
For example:
int strCmp(void *vp1, void *vp2) {
char * s1 = (char *) vp1;
char * s2 = (char *) vp2;
return strcmp(s1, s2);
}
Why do those *key_str behave differently when used in the function strCmp()?
src code : https://github.com/alexparkjw/typing/blob/master/pa2.c
In C all literal strings are really arrays of (effectively read-only) characters.
With
char *str = "kiwi";
you make str point to the first element of such an array.
It is somewhat equivalent to
char internal_array_for_kiwi[5] = { 'k', 'i', 'w', 'i', '\0' };
char *str = &internal_array_for_kiwi[0];
The strdup function dynamically allocates memory and copies the passed string into this memory, creating a copy of the string.
So after
char *str = strdup("kiwi");
you have two arrays containing the same contents.
It's equivalent to
char internal_array_for_kiwi[5] = { 'k', 'i', 'w', 'i', '\0' };
char *str = malloc(strlen(internal_array_for_kiwi) + 1);
strcpy(str, internal_array_for_kiwi);
An important difference between the two needs to be emphasized: Literal strings in C can't be modified. Attempting to modify such a string will lead to undefined behavior. The arrays aren't const, but are effectively read-only.
If you create your own array (as an array or allocated dynamically) then you can modify its contents as much as you want, as long as you don't go out of bounds or overwrite the string's null terminator.
So if we have
char *str1 = "kiwi";
char *str2 = strdup("kiwi");
then
str1[0] = 'l'; // Undefined behavior, attempting to modify a literal string
str2[0] = 'l'; // Valid, strdup returns memory you can modify
Because literal strings can't be modified, it's recommended that you use const char * when pointing to them:
const char *str1 = "kiwi";
Another important thing to remember: Since strdup allocates memory dynamically (using malloc) you need to free that memory once you're done with the string:
free(str2);
If you don't free the memory then you will have a memory leak.
Beyond the above, there's no effective difference between the two variants. Both can be used interchangeably when calling functions, for example.
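To make the points above concrete, here is a minimal sketch. The names dup_string and compare_strings are made up for illustration; dup_string is only a stand-in for strdup() built from malloc and strcpy, and compare_strings mirrors the question's strCmp() with const added.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: a stand-in for strdup(), built from malloc + strcpy.
   Returns NULL if allocation fails; the caller owns the result and must free() it. */
char *dup_string(const char *src)
{
    char *copy = malloc(strlen(src) + 1); /* +1 for the terminating '\0' */
    if (copy != NULL) {
        strcpy(copy, src);
    }
    return copy;
}

/* Comparison callback in the style of the question's strCmp(), with const added.
   It only reads through the pointers, so a literal and a heap copy behave the same. */
int compare_strings(const void *vp1, const void *vp2)
{
    return strcmp((const char *)vp1, (const char *)vp2);
}
```

Calling compare_strings("kiwi", copy) on a fresh copy should return 0, since only the contents are compared. The copy can then be modified and must eventually be passed to free(); the literal can do neither.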
Related
Lately I've been learning all about the C language, and am confused as to when to use
char a[];
over
char *p;
when it comes to string manipulation. For instance, I can assign a string to them both like so:
char a[] = "Hello World!";
char *p = "Hello World!";
and view/access them both like:
printf("%s\n", a);
printf("%s\n", p);
and manipulate them both like:
printf("%c\n", a[6]);
printf("%c\n", p[6]);
So, what am I missing?
char a[] = "Hello World!";
This allocates a modifiable array just big enough to hold the string literal (including the terminating NUL char) and initializes it with the contents of the string literal. If it is a local variable, this effectively means a memcpy at runtime, every time the local variable is created.
Use this when you need to modify the string, but don't need to make it bigger.
Also, if you have char *ap = a;, then ap becomes a dangling pointer when a goes out of scope. For the same reason, you can't do return a; when a is local to that function, because the return value would be a dangling pointer to a now destroyed local variable of that function.
Note that using exactly this is rare. Usually you don't want an array with the contents of a string literal. It's much more common to have something like:
char buf[100]; // contents are undefined
snprintf(buf, sizeof buf, "%s/%s.%d", pathString, nameString, counter);
char *p = "Hello World!";
This defines a pointer and initializes it to point to the string literal. Note that string literals are (normally) non-writable, so you really should have this instead:
const char *p = "Hello World!";
Use this when you need a pointer to a non-modifiable string.
In contrast to a above, if you have const char *p2 = p; or do return p;, these are fine, because the pointer points to the string literal in the program's constant data, which is valid for the whole execution of the program.
The string literals themselves, the text within double quotes, the actual bytes making up the strings, are created at compile time and normally placed with other constant data within the application. A string literal in code then concretely means the address of this constant data blob.
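The lifetime difference can be sketched like this. Both function names are invented for the example: the first returns a pointer to a literal, which stays valid, and the second fills a buffer the caller owns, which is the usual alternative to returning a local array.

```c
#include <stdio.h>

/* Fine: the literal has static storage duration, so the pointer never dangles. */
const char *greeting_literal(void)
{
    return "Hello World!";
}

/* Also fine: the caller supplies the buffer, so nothing local is returned.
   Returns 0 on success, -1 if the buffer is too small for the whole text. */
int greeting_into(char *buf, size_t size)
{
    int needed = snprintf(buf, size, "%s", "Hello World!");
    return (needed >= 0 && (size_t)needed < size) ? 0 : -1;
}
```

A caller would typically declare char buf[32]; and call greeting_into(buf, sizeof buf); the result in buf is writable, unlike what greeting_literal() returns.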
A string literal pointed to by a char * is read-only and must not be modified, while the contents of a char[] array can be.
char *str = "hello";
str[0] = 't'; // Undefined behavior: attempts to modify a string literal
Whereas
char str[] = "hello"; str[0] = 't'; // Legal, string becomes tello
Let's say I have a char pointer called string1 that points to the first character in the word "hahahaha". I want to create a char[] that contains the same string that string1 points to.
How come this does not work?
char string2[] = string1;
"How come this does not work?"
Because that's not how the C language was defined.
You can create a copy using strdup() [Note that strdup() is POSIX, not ANSI C; it only became part of the C standard in C23]
Refs:
C string handling
strdup() - what does it do in C?
1) pointer string2 == pointer string1
change in value of either will change the other
From poster poida
char string1[] = "hahahahaha";
char* string2 = string1;
2) Make a Copy
char string1[] = "hahahahaha";
char string2[11]; /* allocate sufficient memory plus null character */
strcpy(string2, string1);
change in value of one of them will not change the other
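For option 2, the destination must be large enough before strcpy() is called. One possible way to check that is sketched below; copy_string_checked is a hypothetical helper, not a standard function.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including the '\0'.
   Returns 0 on success, -1 if dst_size is too small (dst is left untouched). */
int copy_string_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len + 1 > dst_size) {
        return -1;
    }
    memcpy(dst, src, len + 1); /* len + 1 copies the terminator too */
    return 0;
}
```

With char string2[11]; the call copy_string_checked(string2, sizeof string2, string1) would succeed for "hahahahaha" and fail cleanly for anything longer, instead of writing past the end of the array.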
What you write like this:
char str[] = "hello";
... actually becomes this:
char str[] = {'h', 'e', 'l', 'l', 'o', '\0'};
Here we are implicitly invoking something called the initializer.
The initializer is responsible for making the character array in the above scenario.
Behind the scenes, the initializer does this:
char str[6];
str[0] = 'h';
str[1] = 'e';
str[2] = 'l';
str[3] = 'l';
str[4] = 'o';
str[5] = '\0';
C is a very low level language. Your statement:
char str[] = another_str;
doesn't make sense to C.
It is not possible to assign an entire array, to another in C. You have to copy letter by letter, either manually or using the strcpy() function.
In the above statement, the initializer does not know the length of the another_str array variable. If you hard code the string instead of putting another_str, then it will work.
Some other languages might allow to do such things... but you can't expect a manual car to switch gears automatically. You are in charge of it.
In C you have to reserve memory to hold a string.
This is done automatically when you initialize a char[] with a string literal.
On the other hand, when you write string2 = string1,
what you are actually doing is assigning a memory address, not copying characters. If string2 is declared as char * (pointer to char), then this assignment is valid:
char* string2 = "Hello.";
The variable string2 now holds the address of the first character of the constant array of char "Hello.".
It is also fine to write string2 = string1 when string2 is a char* and string1 is a char[].
However, a char[] has a fixed address in memory; the array itself cannot be reassigned.
So statements like these are not allowed:
char string2[];
string2 = (something...);
However, you are able to modify the individual characters of string2, because it is an array of characters:
string2[0] = 'x'; /* That's ok! */
I want to initialize arbitrary large strings. It is null terminated string of characters, but I cannot print its content.
Can anybody tell me why?
char* b;
char c;
b = &c;
*b = 'm';
*(b+1) = 'o';
*(b+2) = 'j';
*(b+3) = 'a';
*(b+4) = '\0';
printf("%s\n", *b);
Your solution invokes undefined behaviour, because *(b+1) etc. are outside the bounds of the stack variable c. So when you write to them, you're writing all over memory that you don't own, which can cause all sorts of corruption. Also, you need to printf("%s\n", b) (printf expects a pointer for %s).
The solution depends on what you want to do. You can initialize a pointer to a string literal:
const char *str1 = "moja";
You can initialize a character array:
char str2[] = "moja";
This can also be written as:
char str2[] = { 'm', 'o', 'j', 'a', '\0' };
Or you can manually assign the values of your string:
char *str3 = malloc(5);
str3[0] = 'm';
str3[1] = 'o';
str3[2] = 'j';
str3[3] = 'a';
str3[4] = '\0';
...
free(str3);
This might result in a segmentation fault! *(b+1), *(b+2) etc refer to unallocated areas. First allocate memory and then write into it!
b doesn't have enough space to hold all those characters. Allocate enough space using malloc or declare b as a char array.
Your code is not safe at all! You allocate only 1 char on the stack with char c; but write 5 chars into it! This is a buffer overflow on the stack, which can be very dangerous.
Another thing: you mustn't dereference the string when printing it: printf("%s\n", b);
Why not simply write const char *b = "moja";?
You need to allocate memory for it, either with malloc or by using an array. Here, in your code, you take the address of just one character and then store characters at that address and at the addresses following it. This is undefined behavior.
Note, step by step, what you're doing. First, you make the pointer point to a single char in memory. Then, with *b = 'm', you set that char to 'm'. But then you access the next memory position (no memory is reserved for it, so the behavior is undefined) to store another value. This won't work.
How to do it?
You have two options. For example:
char *b;
char c[5];
b = &c[0];
*b = 'm';
... //rest of your code
This will work because you have space for 5 chars in c. The other option is to directly assign memory for b using malloc:
char * b = (char*) malloc(5);
*b = 'm';
... // rest of your code
Finally, maybe not what you want, but you can either initialize a char array or pointer using a string literal:
char c[] = "hello";
const char* b = "abcdef";
The printf does not print because %s expects a char*, so you should pass b, not *b.
To initialize a string from a string constant you can do something like:
char *s1 = "A string";
or
char s2[] = "Another string";
or allocate a buffer with char *b = malloc(5) and then write to this buffer (as you did, or with the string functions).
What you did was take the address of a single char memory location and then write past it, possibly overwriting other variables or instructions and thus possibly leading to data corruption or a crash.
If you write the following instead of your printf, it will print the first character.
printf("%c\n", *b);
In order for you to have arbitrarily large strings, you will need to use a library such as bstring or write one of your own.
This is because, in C one needs to get memory, use it and free it accordingly. b in your case only points to a character unless you allocate memory to it using malloc. And for malloc you have to specify a fixed size.
For arbitrarily large string, you need to encapsulate the actual pointer to character in a data structure of your own, and then manage its size according to the length of the string that is to be set as its value.
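A rough sketch of such a data structure follows. The type dyn_string and its functions are invented for this example (this is not the bstring API); the buffer grows with realloc by doubling, which is just one common growth policy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: data is '\0'-terminated once allocated. */
struct dyn_string {
    char  *data;     /* heap buffer, or NULL before the first append */
    size_t length;   /* characters stored, not counting the '\0' */
    size_t capacity; /* bytes allocated for data */
};

/* Appends one character, doubling the buffer when it is full.
   Returns 0 on success, -1 if realloc fails (the string is left unchanged). */
int dyn_string_push(struct dyn_string *s, char c)
{
    if (s->length + 2 > s->capacity) { /* need room for c and the '\0' */
        size_t new_capacity = s->capacity ? s->capacity * 2 : 16;
        char *grown = realloc(s->data, new_capacity);
        if (grown == NULL) {
            return -1;
        }
        s->data = grown;
        s->capacity = new_capacity;
    }
    s->data[s->length++] = c;
    s->data[s->length] = '\0';
    return 0;
}

/* Releases the buffer and resets the struct to the empty state. */
void dyn_string_free(struct dyn_string *s)
{
    free(s->data);
    s->data = NULL;
    s->length = 0;
    s->capacity = 0;
}
```

Start from a zero-initialized struct (struct dyn_string s = {0};), push characters one at a time, print s.data with %s, and call dyn_string_free(&s) when done.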
printf("%s\n", *b);
why *?
printf("%s\n", b);
is what you want
While coding a simple function to remove a particular character from a string, I fell on this strange issue:
void str_remove_chars( char *str, char to_remove)
{
if(str && to_remove)
{
char *ptr = str;
char *cur = str;
while(*ptr != '\0')
{
if(*ptr != to_remove)
{
if(ptr != cur)
{
cur[0] = ptr[0];
}
cur++;
}
ptr++;
}
cur[0] = '\0';
}
}
int main()
{
setbuf(stdout, NULL);
{
char test[] = "string test"; // stack allocation?
printf("Test: %s\n", test);
str_remove_chars(test, ' '); // works
printf("After: %s\n",test);
}
{
char *test = "string test"; // non-writable?
printf("Test: %s\n", test);
str_remove_chars(test, ' '); // crash!!
printf("After: %s\n",test);
}
return 0;
}
What I don't get is why the second test fails?
To me it looks like the first notation char *ptr = "string"; is equivalent to this one: char ptr[] = "string";.
Isn't it the case?
The two declarations are not the same.
char ptr[] = "string"; declares a char array of size 7 and initializes it with the characters s, t, r, i, n, g and \0. You are allowed to modify the contents of this array.
char *ptr = "string"; declares ptr as a char pointer and initializes it with the address of the string literal "string", which is read-only. Modifying a string literal is undefined behavior. What you saw (a segfault) is one manifestation of that undefined behavior.
Strictly speaking a declaration of char *ptr only guarantees you a pointer to the character type. It is not unusual for the string to form part of the code segment of the compiled application which would be set read-only by some operating systems. The problem lies in the fact that you are making an assumption about the nature of the pre-defined string (that it is writeable) when, in fact, you never explicitly created memory for that string yourself. It is possible that some implementations of compiler and operating system will allow you to do what you've attempted to do.
On the other hand the declaration of char test[], by definition, actually allocates readable-and-writeable memory for the entire array of characters on the stack in this case.
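If the function has to accept string literals as well, one option is to stop modifying the input at all. The sketch below assumes a hypothetical variant named str_without_char that returns a fresh heap copy instead of editing in place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of str_remove_chars(): leaves the input untouched and
   returns a newly allocated string without the given character.
   Returns NULL on allocation failure; the caller must free() the result. */
char *str_without_char(const char *str, char to_remove)
{
    char *result = malloc(strlen(str) + 1); /* the result can never be longer */
    if (result == NULL) {
        return NULL;
    }
    char *out = result;
    for (const char *in = str; *in != '\0'; in++) {
        if (*in != to_remove) {
            *out++ = *in;
        }
    }
    *out = '\0';
    return result;
}
```

Because the parameter is const char *, passing "string test" directly is safe, and the compiler can flag any accidental write through str.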
As far as I remember
char ptr[] = "string";
creates a copy of "string" on the stack, so this one is mutable.
The form
char *ptr = "string";
is just backwards compatibility for
const char *ptr = "string";
and you are not allowed (in terms of undefined behavior) to modify its content.
The compiler may place such strings in a read only section of memory.
char *test = "string test"; is wrong; it should have been const char*. This code compiles only for backward compatibility reasons. The string literal it points to may live in read-only memory, and any attempt to write to it invokes undefined behavior. On the other hand, char test[] = "string test" creates a writable character array on the stack. It is like any other regular local variable to which you can write.
Good answer #codaddict.
Also, sizeof(ptr) will give different results for the different declarations.
The first one, the array declaration, will return the size of the array including the terminating null character.
The second one, char* ptr = "a long text..."; will return the size of a pointer, usually 4 or 8.
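A small sketch of that sizeof difference; the two function names are made up, and each simply wraps one of the declarations so the results can be compared.

```c
#include <stddef.h>

/* sizeof on an array yields the whole array: 6 letters plus the '\0'. */
size_t size_of_array_form(void)
{
    char ptr[] = "string";
    return sizeof ptr;
}

/* sizeof on a pointer yields the size of the pointer itself,
   whatever the length of the string it points to. */
size_t size_of_pointer_form(void)
{
    const char *ptr = "a long text...";
    return sizeof ptr;
}
```

The first returns 7; the second returns sizeof(char *), which depends on the platform. Use strlen() when the length of the text is what you actually need.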
char *str = strdup("test");
str[0] = 'r';
is proper code and creates a mutable string. str is assigned memory on the heap, filled with the value "test". Remember to free(str) once you're done with it.
void reverse(char *str){
int i,j;
char temp;
for(i=0,j=strlen(str)-1; i<j; i++, j--){
temp = *(str + i);
*(str + i) = *(str + j);
*(str + j) = temp;
printf("%c",*(str + j));
}
}
int main (int argc, char const *argv[])
{
char *str = "Shiv";
reverse(str);
printf("%s",str);
return 0;
}
When I use char *str = "Shiv", the lines in the swapping part of my reverse function, i.e. str[i]=str[j], don't seem to work. However, if I declare str as char str[] = "Shiv", the swapping part works. What is the reason for this? I was a bit puzzled by the behavior; I kept getting the message "Bus error" when I tried to run the program.
When you use char *str = "Shiv";, you don't own the memory pointed to, and you're not allowed to write to it. The actual bytes for the string could be a constant inside the program's code.
When you use char str[] = "Shiv";, the 4(+1) char bytes and the array itself are on your stack, and you're allowed to write to them as much as you please.
The char *str = "Shiv" gets a pointer to a string constant, which may be loaded into a protected area of memory (e.g. part of the executable code) that is read only.
char *str = "Shiv";
This should be :
const char *str = "Shiv";
And now you'll have an error ;)
Try
int main (int argc, char const *argv[])
{
char *str = malloc(5*sizeof(char)); //4 chars + '\0'
strcpy(str,"Shiv");
reverse(str);
printf("%s",str);
free(str); //Not needed for such a small example, but to illustrate
return 0;
}
instead. That will get you read/write memory when using pointers. Using [] notation allocates space in the stack directly, but using const pointers doesn't.
String literals are non-modifiable objects in both C and C++. An attempt to modify a string literal always results in undefined behavior. This is exactly what you observe when you get your "Bus error" with
char *str = "Shiv";
variant. In this case your 'reverse' function will make an attempt to modify a string literal. Thus, the behavior is undefined.
The
char str[] = "Shiv";
variant will create a copy of the string literal in a modifiable array 'str', and then 'reverse' will operate on that copy. This will work fine.
P.S. Don't create non-const-qualified pointers to string literals. Your first variant should have been
const char *str = "Shiv";
(note the extra 'const').
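For reference, a sketch of a reverse function meant for writable strings only. The name reverse_in_place is hypothetical; it is not the asker's reverse(), and it uses size_t indices rather than int.

```c
#include <string.h>

/* Reverses a writable, '\0'-terminated string in place.
   Must not be called on a string literal. */
void reverse_in_place(char *str)
{
    size_t len = strlen(str);
    if (len < 2) {
        return; /* nothing to swap; also avoids len - 1 wrapping around at 0 */
    }
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char temp = str[i];
        str[i] = str[j];
        str[j] = temp;
    }
}
```

Called on char str[] = "Shiv"; this leaves "vihS" in the array. Called on a pointer to the literal "Shiv", it would still be undefined behavior, exactly as described above.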
String literals (your "Shiv") are not modifiable.
You assign to a pointer the address of such a string literal, then you try to change the contents of the string literal by dereferencing the pointer value. That's a big NO-NO.
Declare str as an array instead:
char str[] = "Shiv";
This creates str as an array of 5 characters and copies the characters 'S', 'h', 'i', 'v' and '\0' to str[0], str[1], ..., str[4]. The values in each element of str are modifiable.
When I want to use a pointer to a string literal, I usually declare it const. That way, the compiler can help me by issuing a message when my code wants to change the contents of a string literal:
const char *str = "Shiv";
Imagine you could do the same with integers.
/* Just having fun, this is not C! */
int *ptr = &5; /* address of 5 */
*ptr = 42; /* change 5 to 42 */
printf("5 + 1 is %d\n", *(&5) + 1); /* 6? or 43? :) */
Quote from the Standard:
6.4.5 String literals
...
6 ... If the program attempts to modify such an array [a string literal], the behavior is undefined.
char *str is a pointer / reference to a block of characters (the string). But that block is sitting somewhere in memory you don't own, so you cannot just assign to it like that.
Interesting that I've never noticed this. I was able to replicate this condition in VS2008 C++.
Typically, it is a bad idea to do in-place modification of constants.
In any case, this post explains this situation pretty clearly.
The first (char[]) is local data you can edit (since the array is local data).
The second (char *) is a local pointer to global, static (constant) data. You are not allowed to modify constant data.
If you have an older version of GNU C, you can compile with -fwritable-strings to keep the global string from being made constant, but this is not recommended (the option was removed in GCC 4.0).