Confused about a bus error when modifying a string (C) - c

I'm confused about swapping two characters within one string in C.
It works well when I declare it as an array:
char strBase[8] = "acbdefg";
In this case I can swap any characters.
But it triggers a bus error when I declare it as a pointer to a string literal:
char *strBase = "acbdefg";
Thanks a lot to anyone who can explain this or give me a hint!

The difference here is that
char *strBase = "acbdefg";
creates two objects: an array holding acbdefg, typically placed in a read-only part of memory, which makes any write to it illegal. This array has no name and has static storage duration (meaning that it lives for the entire life of the program); and
a variable of type pointer-to-char, called strBase, which is initialised with the location of the first character in that unnamed, read-only array.
While doing:
char strBase[8] = "acbdefg";
creates an array of 8 chars named strBase and initializes it with a copy of the string.
Where this array is allocated, and how long it lives, depends on where the declaration appears. If the declaration is within a function, the array lives until the end of the block that it is declared in, and is almost certainly allocated on the stack; if it's outside a function, it will probably be stored within an "initialized data segment" that is loaded from the executable file into writable memory when the program is run.
This makes
strBase[0] = 'x';
legal.
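As a minimal sketch of the distinction above (the helper names swap_chars and swap_demo are made up for illustration), swapping is fine as long as the pointer passed in refers to writable storage, such as an array initialized from the literal:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: swap the characters at positions i and j.
   The caller must pass a pointer to WRITABLE storage. */
void swap_chars(char *s, size_t i, size_t j)
{
    size_t len = strlen(s);
    assert(i < len && j < len);   /* stay inside the string */

    char tmp = s[i];
    s[i] = s[j];
    s[j] = tmp;
}

/* Builds a writable copy of the literal, swaps two characters in it,
   and copies the result to out (which must hold at least 8 bytes). */
void swap_demo(char out[8])
{
    char strBase[8] = "acbdefg";  /* array: a private, writable copy */
    swap_chars(strBase, 1, 2);    /* swaps the 'c' and the 'b' */
    memcpy(out, strBase, sizeof strBase);

    /* Passing a pointer to the literal itself, e.g.
         swap_chars((char *)"acbdefg", 1, 2);
       would be undefined behavior, and is what produced the bus error. */
}
```

Declaring pointers to literals as const char * lets the compiler reject the bad call instead of leaving it to fail at run time.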

Your problem is one of memory allocation. You need space to store your characters. When you wrote:
char strBase[8] = "acbdefg";
you created automatic storage (often called the stack) and initialized it with a string of characters. But when you wrote:
char *strBase = "acbdefg";
you created a pointer and pointed it at a constant string. The compiler typically puts that string in a part of memory that is marked as read-only. Trying to change it is undefined behavior and usually results in a memory access violation.
Instead you could do something like:
const char* strData = "acbdefg";
int size = 1024;
char *strBase = (char*)malloc(size);
strncpy(strBase, strData, size);
ProcessString(strBase);
free(strBase);
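A slightly tighter variant of the snippet above sizes the buffer from the string itself instead of a fixed 1024 bytes. This is only a sketch: copy_string is a hypothetical helper name, not something from the original answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated, writable copy of src,
   or NULL if the allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    assert(src != NULL);

    size_t n = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *dst = malloc(n);
    if (dst != NULL)
        memcpy(dst, src, n);      /* copies the characters and the '\0' */
    return dst;
}
```

A caller would write char *strBase = copy_string("acbdefg");, check the result for NULL, modify the copy freely, and finally call free(strBase).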

The most likely cause is that
char strBase[8] = "abcdefg";
causes the compiler to reserve memory for an eight-character array, and initializes it with the value "abcdefg\0". In contrast,
char *strBase = "abcdefg";
only reserves memory for a pointer, initialized with the address of the string. "abcdefg" is a constant string, and as a result, the compiler stores it in a section of memory that gets marked read-only. Attempting to modify read-only memory causes a CPU fault.
Your compiler should be giving you a warning about const mismatch in the second case. Alternatively, your compiler may have a setting that changes the read-only-ness of constant strings.

Related

How do strings and char arrays work in C?

No guides I've seen seem to explain this very well.
I mean, you can allocate memory for a char*, or write char[25] instead? What's the difference? And then there are literals, which can't be manipulated? What if you want to assign a fixed string to a variable? Like, stringVariable = "thisIsALiteral", then how do you manipulate it afterwards?
Can someone set the record straight here? And in the last case, with the literal, how do you take care of null-termination? I find this very confusing.
EDIT: The real problem seems to be that, as I understand it, you have to juggle these different constructs in order to accomplish even simple things. For instance, only char * can be passed as an argument or return value, but only char[] can be assigned a literal and modified. I feel like it's obvious that we frequently/always need to be able to do both, and that's where my pitfall is.
What is the difference between an allocated char* and char[25]?
The lifetime of a malloc-ed string is not limited by the scope of its declaration. In plain language, you can return malloc-ed string from a function; you cannot do the same with char[25] allocated in the automatic storage, because its memory will be reclaimed upon return from the function.
Can literals be manipulated?
String literals cannot be manipulated in place, because they are typically allocated in read-only storage. You need to copy them into modifiable storage (static, automatic, or dynamic) in order to manipulate them. This cannot be done:
char *str = "hello";
str[0] = 'H'; // <<== WRONG! This is undefined behavior.
This will work:
char str[] = "hello";
str[0] = 'H'; // <<=== This is OK
This works too:
char *str = malloc(6);
strcpy(str, "hello");
str[0] = 'H'; // <<=== This is OK too
How do you take care of null termination of string literals?
The C compiler takes care of null termination for you: every string literal has an extra character at the end, set to \0.
Your question refers to three different constructs in C: char arrays, char pointers allocated on the heap, and string literals. These are all different in subtle ways.
Char arrays, which you get by declaring char foo[25] inside a function, are allocated on the stack. The array exists only within the scope where you declared it, and exactly 25 bytes are allocated for you. You may store whatever you want in those bytes, but if you want a string, don't forget to leave room for the null terminator after the last character.
Character pointers defined with char *bar hold only an address; until you assign one, they do not point at any usable memory. To make use of them you need to point them at something, either an array as before (bar = foo) or newly allocated space (bar = malloc(sizeof(char) * 25);). If you do the latter, you should eventually free the space.
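The implicit terminator can be observed by comparing sizeof with strlen. The function names below are invented for this sketch:

```c
#include <stddef.h>
#include <string.h>

/* sizeof on a literal counts the implicit '\0'; strlen does not. */
size_t literal_size(void)   { return sizeof "hello"; }    /* 5 letters + '\0' */
size_t literal_length(void) { return strlen("hello"); }   /* 5 letters */

/* An array sized by its initializer gets room for the terminator too. */
size_t array_size(void)
{
    char s[] = "hello";
    return sizeof s;
}
```

This is why the malloc(6) in the example above is the right size for "hello": five visible characters plus one byte for the terminator.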
String literals behave differently depending on how you use them. If you use them to initialize a char array char s[] = "String"; then you're simply declaring an array large enough to exactly hold that string (and the null terminator) and putting that string there. It's the same as declaring a char array and then filling it up.
On the other hand, if you assign a string literal to a char * then the pointer is pointing to memory you are not supposed to modify. Attempting to modify it may or may not crash, and leads to undefined behavior, which means you shouldn't do it.
Since other aspects are answered already, I would only add to the question "what if you want the flexibility of function passing using char * but the modifiability of char []?"
You can allocate an array and pass the same array to a function as char *. This is often loosely called pass by reference: the array decays to a pointer, so only the address of its first element is passed instead of a copy of the whole array. The other effect is that any change made inside the function modifies the original array.
void fun(char *a) {
    a[0] = 'y'; // changes hello to yello
}
int main(void) {
    char arr[6] = "hello"; // note that it's not char *arr
    fun(arr); // arr now contains yello
}
The same could have been done for an array allocated with malloc
char * arr = malloc(6);
strcpy(arr, "hello");
fun(arr); // note that fun remains same.
Later you can free the malloc'ed memory
free(arr);
char * a is just a pointer that can store an address, which might be that of a single variable or of the first element of an array. Beware: we have to assign to this pointer before actually using it.
Contrary to that char arr[SIZE] creates an array on the stack i.e. it also allocates SIZE bytes. So you can directly access arr[3] (assuming 3 is less than SIZE) without any issues.
Now it makes sense to allow assigning any address to a, but not allowing this for arr, since there is no other way except using arr to access its memory.
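The contrast between the two declarations can be sketched as follows. The function name array_vs_pointer and the buffer sizes are arbitrary choices for illustration:

```c
#include <stddef.h>

/* Hypothetical demo: returns 1 if all the checks below hold. */
int array_vs_pointer(void)
{
    char arr[16] = "hello";   /* 16 bytes of storage owned by arr */
    char other[16] = "world";
    char *a = arr;            /* a stores only an address */

    /* sizeof sees the whole array, but only the pointer for a. */
    int ok = sizeof arr == 16 && sizeof a == sizeof(char *);

    a = other;                /* fine: a pointer can be re-pointed */
    a[0] = 'W';               /* writes into other, not into arr */
    ok = ok && other[0] == 'W' && arr[0] == 'h';

    /* arr = other;   would not compile: an array is not assignable */
    return ok;
}
```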

Pointer to C style string isn't allocated

#include <stdio.h>
int main() {
char *t = "hello world";
puts(t);
//printf("%s", t);
t = "goodbye world";
puts(t);
}
The memory for t isn't allocated, so why don't I get a segfault when I run it?
t is a pointer, so you are just making t point to another string.
Because string literals are allocated statically in your program memory - you do not need to allocate memory for them explicitly.
Memory is allocated for t; enough memory is allocated for it to hold a pointer (typically, 4 bytes in a 32-bit program, 8 bytes in a 64-bit program).
Further, the initialization for t ensures that the pointer points somewhere:
char *t = "hello world";
String literals are also allocated space, somewhere. Often, that is in the read-only portion of memory, so you should really be using const char *t = "hello world"; and even if you don't use the explicit const, you should not try to modify the string that t points at. But it is the compiler's problem to ensure that t is pointing somewhere valid.
Similarly, after the assignment:
t = "goodbye world";
the variable is pointing at space allocated by the compiler. As long as you don't abuse it (and your code doesn't), this is fine.
What would get you into trouble is something like this:
char *t;
puts(t); // t is uninitialized; undefined behaviour
t = 0; // equivalently, t = NULL;
puts(t); // t contains the null pointer; undefined behaviour
The uninitialized local variable could contain any value; you cannot predict reliably what will happen. On some machines, it may contain a null pointer and cause a crash, but that is not something you can rely on.
A null pointer doesn't point at anything valid, so dereferencing a null pointer leads to undefined behaviour, and very often that undefined behaviour is a crash. (Classically, on DEC VAX machines, you got a zero byte at address zero instead of a crash. That led (in part) to one of Henry Spencer's Ten Commandments "All the world is not a VAX" — and also "Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end.")
So, in your program, memory is allocated for t and t is initialized and assigned to point to (read-only) string constants, so there is no excuse for the program to crash.
Here t is a pointer to the first character of an anonymous string, which may be in read-only memory. A good idea is to declare the pointer as a pointer to const char:
const char *t = "hello world";
All the memory the compiler needs to allocate for t is the size of one pointer (4 bytes on a typical 32-bit system). Remember that it's just a pointer. In the first couple of lines it's pointing to "hello world", but after that you change it so it points to "goodbye world". The compiler has already set aside enough memory for the strings you have defined and gives you their addresses so you can point to them. You don't need to worry about that. Also remember that these strings are static and read-only, which means you can't safely say t[4] = 'b';.

difference between char* and char[] with strcpy()

I've been having trouble for the past couple of hours with a problem I thought I understood. Here's my trouble:
void cut_str(char* entry, int offset) {
strcpy(entry, entry + offset);
}
char works[128] = "example1\0";
char* doesnt = "example2\0";
printf("output:\n");
cut_str(works, 2);
printf("%s\n", works);
cut_str(doesnt, 2);
printf("%s\n", doesnt);
// output:
// ample1
// Segmentation fault
I feel like there's something important about char*/char[] that I'm not getting here.
The difference is that doesnt points to memory that belongs to a string literal, and is therefore not writable.
When you do this
char works[128] = "example1\0";
the compiler copies the content of a non-writable string into a writable array. The explicit \0 is not required, by the way.
When you do this, however,
char* doesnt = "example2\0";
the compiler leaves the pointer pointing to a non-writable memory region. Again, \0 will be inserted by the compiler.
If you are using gcc, you can have it warn you about initializing writable char * with string literals. The option is -Wwrite-strings. You will get a warning that looks like this:
warning: initialization discards qualifiers from pointer target type
The proper way to declare your doesnt pointer is as follows:
const char* doesnt = "example2\0";
The types char[] and char * are used in quite similar ways, so you are right about that. The difference lies in what happens when objects of the types are initialized. Your object works, of type char[], has 128 bytes of variable storage allocated for it on the stack. Your object doesnt, of type char *, has storage only for the pointer itself; the characters it points to are not on the stack.
Where exactly the string of doesnt is stored is not specified by the C standard, but most likely it is stored in a nonmodifiable data segment loaded when your program is loaded for execution. This isn't variable storage. Thus the segfault when you try to vary it.
This allocates 128 bytes on the stack, and uses the name works to refer to its address:
char works[128];
So works names an array in writable memory (in most expressions it converts to a pointer to its first element).
This creates a string literal, which is in read-only memory, and uses the name doesnt to refer to its address:
char * doesnt = "example2\0";
You can write data to works, because it points to writable memory. You can't write data to doesnt, because it points to read-only memory.
Also, note that you don't have to end your string literals with "\0", since the compiler implicitly appends a zero byte to every string literal.
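Separately from the read-only issue discussed above, the question's cut_str calls strcpy with source and destination that overlap, which the C standard leaves undefined even for a writable buffer. A minimal sketch of a safer version uses memmove; the switch from int to size_t for offset and the bounds check are assumptions of this sketch, not part of the original code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Drop the first `offset` characters of entry, in place.
   entry must point to writable storage (an array or a malloc'ed buffer). */
void cut_str(char *entry, size_t offset)
{
    size_t len = strlen(entry);
    assert(offset <= len);   /* do not skip past the terminator */

    /* Source and destination overlap, so use memmove rather than strcpy.
       The +1 moves the terminating '\0' as well. */
    memmove(entry, entry + offset, len - offset + 1);
}
```

With char works[128] = "example1"; a call to cut_str(works, 2) leaves "ample1" in works. Passing a pointer to a string literal is still undefined behavior.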

Memory allocated in char * var; declaration

In C, declaring a char pointer like this
char* p="Hello";
allocates some memory for a string literal Hello\0. When I do this afterwards
p="FTW";
what happens to the memory allocated to Hello\0? Is the address p points to changed?
There is no dynamic memory allocation in either statement.
Those strings are stored in your executable, loaded in a (likely read-only) section of memory that will live as long as your process does.
The second assignment only changes what p points to. Nothing else happens.
The memory remains occupied by "Hello". You simply lose your way to reach it (unless you have other pointers to it); nothing needs to be freed, because it was never dynamically allocated.
The address p is pointing to (the value of p) is changed of course.
In this case, "Hello" is created at compile time and is part of the binary. In most situations "Hello" is stored in read-only memory. "FTW" is also part of the binary. The second assignment will only change the pointer.
In addition, "Hello" and "FTW" have static storage duration, as Met has pointed out.
It creates a string constant that cannot be modified and should be used as it is.
If you try doing
p[0]='m';
it would most likely give a segmentation fault, since a string literal does not live in writable memory in which you can reassign and read back values.
What if
p = getbuffer();
where
char *getbuffer(void)
{
    return malloc(size); /* size is defined elsewhere */
}
How can I free this memory before allocating new memory to p? Imagine that p should use getbuffer() many times.
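One way to approach the follow-up question above is to free the old buffer just before asking for a new one. The sketch below is an assumption-laden illustration: getbuffer here takes an explicit size, and refresh_buffer is a hypothetical helper name. It only works if p is either NULL or a pointer obtained from malloc; a pointer to a string literal such as "Hello" must never be passed to free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the getbuffer() in the question. */
char *getbuffer(size_t size)
{
    return malloc(size);
}

/* Release whatever *p currently owns, then point it at a fresh buffer.
   *p must be NULL or a pointer previously returned by malloc.
   Returns 1 on success, 0 if the new allocation failed. */
int refresh_buffer(char **p, size_t size)
{
    assert(p != NULL);

    free(*p);               /* free(NULL) is a no-op */
    *p = getbuffer(size);
    return *p != NULL;
}
```

Starting from char *p = NULL;, each call to refresh_buffer(&p, n) replaces the previous buffer without leaking it, and a final free(p) releases the last one.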

C Are string literals created on the stack?

I'm a little bit confused about this expression:
char *s = "abc";
Does the string literal get created on the stack?
I know that this expression
char *s = (char *)malloc(10 * sizeof(char));
allocates memory on the heap and this expression
char s[] = "abc";
allocates memory on the stack, but I'm totally unsure what the first expression does.
Typically, the string literal "abc" is stored in a read-only part of the executable. The pointer s would be created on the stack (or placed in a register, or just optimized away) and point to that string literal, which lives "elsewhere".
"abc"
String literals are stored in the __TEXT,__cstring section of your program (or .rodata, or something else, depending on the object format), if string pooling is enabled. That means the literal is neither on the stack nor in the heap, but sits in a read-only memory region near your code.
char *s = "abc";
This statement assigns the memory location of the string literal "abc" to s, i.e. s points to a read-only memory region.
"Stacks" and "heaps" are implementation details and depend on the platform (all the world is not x86). From the language POV, what matters is storage class and extent.
String literals have static extent; storage for them is allocated at program startup and held until the program terminates. It is also assumed that string literals cannot be modified (attempting to do so invokes undefined behavior). Contrast this with local, block-scope (auto) variables, whose storage is allocated on block entry and released on block exit. Typically, this means that string literals are not stored in the same memory as block-scope variables.
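A practical consequence of static extent is that a function may return a pointer to a string literal, while a block-scope array has to be copied out before the function returns. The function names in this sketch are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Returning a pointer to a literal is fine: the literal has static
   storage duration and outlives the call. */
const char *literal_name(void)
{
    return "abc";
}

/* By contrast, this array lives only until the function returns, so its
   contents are copied out rather than returned by address. */
void array_name(char *out, size_t out_size)
{
    char s[] = "abc";             /* block-scope (auto) array */
    if (out_size >= sizeof s)
        memcpy(out, s, sizeof s); /* includes the terminating '\0' */
}
```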
