difference between char* and char[] with strcpy() - c

I've been having trouble the past couple hours on a problem I though I understood. Here's my trouble:
void cut_str(char* entry, int offset) {
strcpy(entry, entry + offset);
}
char works[128] = "example1\0";
char* doesnt = "example2\0";
printf("output:\n");
cut_str(works, 2);
printf("%s\n", works);
cut_str(doesnt, 2);
printf("%s\n", doesnt);
// output:
// ample1
// Segmentation: fault
I feel like there's something important about char*/char[] that I'm not getting here.

The difference is in that doesnt points to memory that belongs to a string constant, and is therefore not writable.
When you do this
char works[128] = "example1\0";
the compiler copies the content of a non-writable string into a writable array. \0 is not required, by the way.
When you do this, however,
char* doesnt = "example2\0";
the compiler leaves the pointer pointing to a non-writable memory region. Again, \0 will be inserted by compiler.
If you are using gcc, you can have it warn you about initializing writable char * with string literals. The option is -Wwrite-strings. You will get a warning that looks like this:
warning: initialization discards qualifiers from pointer target type
The proper way to declare your doesnt pointer is as follows:
const char* doesnt = "example2\0";

The types char[] and char * are quite similar, so you are right about that. The difference lies in what happens when objects of the types are initialized. Your object works, of type char[], has 128 bytes of variable storage allocated for it on the stack. Your object doesnt, of type char *, has no storage on the stack.
Where exactly the string of doesnt is stored is not specified by the C standard, but most likely it is stored in a nonmodifiable data segment loaded when your program is loaded for execution. This isn't variable storage. Thus the segfault when you try to vary it.

This allocates 128 bytes on the stack, and uses the name works to refer to its address:
char works[128];
So works is a pointer to writable memory.
This creates a string literal, which is in read-only memory, and uses the name doesnt to refer to its address:
char * doesnt = "example2\0";
You can write data to works, because it points to writable memory. You can't write data to doesnt, because it points to read-only memory.
Also, note that you don't have to end your string literals with "\0", since all string literals implicitly add a zero byte to the end of the string.

Related

I feel confusion about bus error in string (C)

I feel confusion about the swap two characters in one string with C.
It works well when I set it as an array:
char strBase[8] = "acbdefg";
in this case I could swap any character.
But it trigger the bus error when I set it as a string:
char *strBase = "acbdefg";
Thanks a lot for anyone could explain it or give me some hint!
The difference here is that
char *strBase = "acbdefg";
will place acbdefg in the read-only parts of the memory and making strBase a pointer to that, making any writing operation on this memory illegal.
It has no name and has static storage duration (meaning that it lives for the entire life of the program); and
a variable of type pointer-to-char, called strBase, which is initialised with the location of the first character in that unnamed, read-only array.
While doing:
char strBase[8] = "acbdefg";
puts the literal string in read-only memory and copies the string to newly allocated memory on the stack.
So this array is allocated in memory, and how long it lives for, depends on where the declaration appears. If the declaration is within a function, it will live until the end of the block that it is declared in, and almost certainly be allocated on the stack; if it's outside a function, it will probably be stored within an "initialized data segment" that is loaded from the executable file into write able memory when the program is run.
Making
strBase[0] = 'x';
legal.
Your problem is one of memory allocation. You need space to store your characters. When you wrote:
char strBase[8] = "acbdefg";
you created automatic storage (often called the stack) and initialized it with a string of characters. But when you wrote:
char *strBase = "acbdefg";
you created a pointer and pointed it at a constant string. The compiler puts that in a part of memory that is marked as read-only. If you try to change that it will result in a memory access violation.
Instead you could do something like:
const char* strData = "acbdefg";
int size = 1024;
char *strBase = (char*)malloc(size);
strncpy(strBase, strData, size);
ProcessString(strBase);
free(strBase);
The most likely cause is that
char strBase[8] = "abcdefg";
causes the compiler to reserve memory for an eight-character array, and initializes it with the value "abcdefg\0". In contrast,
char *strBase = "abcdefg";
only reserves memory for a pointer, initialized with the address of the string. "abcdefg" is a constant string, and as a result, the compiler stores it in a section of memory that gets marked read-only. Attempting to modify read-only memory causes a CPU fault.
Your compiler should be giving you a warning about const mismatch in the second case. Alternatively, your compiler may have a setting that changes the read-only-ness of constant strings.

How memory allocation works for char* without explicitly allocating

In the following (legal) c code, there is no memory explicitly allotted to the pointer p.
AFAIK, i cannot get an int *p to point to a value 5 without explicitly allocating memory.
int main()
{
char *p;
p = "Hello";
return 0;
}
How's char* pointers different
Where's the memory allocated for "Hello"
When you put literal strings such as "Hello" in your program, the compiler creates an array of characters in the data area of the program (called the Data Segment).
When you assign:
p="Hello";
the compiler takes the address of the string literal in the data segment and puts it in the pointer variable p.
Notice that string literals are different to numeric literals. The type of a string literal is const char[] - which you can assign to a char* pointer. An integer literal is just type int.
Also note that a numeric literal does not need to be stored in the data segment - in most cases such literals are placed directly in the machine code instructions. Thus there is no address which you could point p towards.
If you tried to do this (as per your comment):
int *p;
*p = 5;
what you are actually saying is that you want to store the number 5 into the location pointed to by p (which would be undefined in this case, since we never set p to anything). You would probably get a segfault.
If you tried to do this:
int *p;
p = 5;
what you would be telling the compiler to do is to convert the value 5 into a pointer to an integer and store that in p, with the result that the pointer p now points at address 5. And you would probably get a warning like this:
t.c:7: warning: assignment makes pointer from integer without a cast
In other words, you are trying to convert an integer to a pointer - probably not what you thought you were trying to do.
In C, string literal like "Hello" will be exaluated to a char * type value, i.e. its base address, you could use it to initialize a pointer to char.
In the runtime, string literals usually is located in some read-only memory segmentation.
By the way, it looks like you confused pointer variable and the address pointed to by a pointer variable.
By statements like char *p;, you defined a pointer variable, compiler will allocate the necessary memory to store this variable. But in this time, this pointer has not been initialized, so it does not have a determined value, which means it could point to anywhere.
By statements like p = "Hello";, you assigned pointer p a value, now it points to some determined memory address, this memory segmentation could be allocated by compiler, or be allocated by yourself, for example p = malloc(...);.
p is pointing to string literal Hello where memory is allocated from read only data section. And the moment when you modify it you get a Undefined Behavior and may result in to segmentation fault.
C standard says,
String literals - An ordinary string literal has type “array of n const char” and static storage duration (3.7)
The jobs was done for you by the compileer. It allocates space in the data segment and stores the string "Hello" and assigns the base address of this array to the pointer that is p1 in your case.
However this array "Hello" will be a const array. You will not be able to modify the data once it created.
If you try to modify the data then you will get unexpected results.

Is char pointer address initialization necessary in C?

I'm learning C programming in a self-taught fashion. I know that numeric pointer addresses must always be initialized, either statically or dynamically.
However, I haven't read about the compulsory need of initializing char pointer addresses yet.
For example, would this code be correct, or is a pointer address initialization needed?
char *p_message;
*p_message = "Pointer";
I'm not entirely sure what you mean by "numeric pointer" as opposed to "char pointer". In C, a char is an integer type, so it is an arithmetic type. In any case, initialization is not required for a pointer, regardless of whether or not it's a pointer to char.
Your code has the mistake of using *p_message instead of p_message to set the value of the pointer:
*p_message = "Pointer" // Error!
This wrong because given that p_message is a pointer to char, *p_message should be a char, not an entire string. But as far as the need for initializing a char pointer when first declared, it's not a requirement. So this would be fine:
char *p_message;
p_message = "Pointer";
I'm guessing part of your confusion comes from the fact that this would not be legal:
char *p_message;
*p_message = 'A';
But then, that has nothing to do with whether or not the pointer was initialized correctly. Even as an initialization, this would fail:
char *p_message = 'A';
It is wrong for the same reason that int *a = 5; is wrong. So why is that wrong? Why does this work:
char *p_message;
p_message = "Pointer";
but this fail?
char *p_message;
*p_message = 'A';
It's because there is no memory allocated for the 'A'. When you have p_message = "Pointer", you are assigning p_message the address of the first character 'P' of the string literal "Pointer". String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap.
But chars, like ints, need to be allocated either on the stack or the heap. Either you need to declare a char variable so that there is memory on the stack:
char myChar;
char *pChar;
pChar = &myChar;
*pChar = 'A';
Or you need to allocate memory dynamically on the heap:
char* pChar;
pChar = malloc (1); // or pChar = malloc (sizeof (char)), but sizeof(char) is always 1
*pChar = 'A';
So in one sense char pointers are different from int or double pointers, in that they can be used to point to string literals, for which you don't have to allocate memory on the stack (statically) or heap (dynamically). I think this might have been your actual question, having to do with memory allocation rather than initialization.
If you are really asking about initialization and not memory allocation: A pointer variable is no different from any other variable with regard to initialization. Just as an uninitialized int variable will have some garbage value before it is initialized, a pointer too will have some garbage value before it is initialized. As you know, you can declare a variable:
double someVal; // no initialization, will contain garbage value
and later in the code have an assignment that sets its value:
someVal = 3.14;
Similarly, with a pointer variable, you can have something like this:
int ary [] = { 1, 2, 3, 4, 5 };
int *ptr; // no initialization, will contain garbage value
ptr = ary;
Here, ptr is not initialized to anything, but is later assigned the address of the first element of the array.
Some might say that it's always good to initialize pointers, at least to NULL, because you could inadvertently try to dereference the pointer before it gets assigned any actual (non-garbage) value, and dereferencing a garbage address might cause your program to crash, or worse, might corrupt memory. But that's not all that different from the caution to always initialize, say, int variables to zero when you declare them. If your code is mistakenly using a variable before setting its value as intended, I'm not sure it matters all that much whether that value is zero, NULL, or garbage.
Edit. OP asks in a comment: You say that "String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap", so how does allocation occur?
That's just how the language works. In C, a string literal is an element of the language. The C11 standard specifies in §6.4.5 that when the compiler translates the source code into machine language, it should transform any sequence of characters in double quotes to a static array of char (or wchar_t if they are wide characters) and append a NUL character as the last element of the array. This array is then considered immutable. The standard says: If the program attempts to modify such an array, the behavior is undefined.
So basically, when you have a statement like:
char *p_message = "Pointer";
the standard requires that the double-quoted sequence of characters "Pointer" be implemented as a static, immutable, NUL-terminated array of char somewhere in memory. Typically implementations place such string literals in a read-only area of memory such as the text block (along with program instructions). But this is not required. The exact way in which a given implementation handles memory allocation for this array / NUL terminated sequence of char / string literal is up to the particular compiler. However, because this array exists somewhere in memory, you can have a pointer to it, so the above statement does work legally.
An analogy with function pointers might be useful. Just as the code for a function exists somewhere in memory as a sequence of instructions, and you can have a function pointer that points to that code, but you cannot change the function code itself, so also the string literal exists in memory as a sequence of char and you can have a char pointer that points to that string, but you cannot change the string literal itself.
The C standard specifies this behavior only for string literals, not for character constants like 'A' or integer constants like 5. Setting aside memory to hold such constants / non-string literals is the programmer's responsibility. So when the compiler comes across statements like:
char *charPtr = 'A'; // illegal!
int *intPtr = 5; // illegal!
the compiler does not know what to do with them. The programmer has not set aside such memory on the stack or the heap to hold those values. Unlike with string literals, the compiler is not going to set aside any memory for them either. So these statements are illegal.
Hopefully this is clearer. If not, please comment again and I'll try to clarify some more.
Initialisation is not needed, regardless of what type the pointer points to. The only requirement is that you must not attempt to use an uninitialised pointer (that has never been assigned to) for anything.
However, for aesthetic and maintenance reasons, one should always initialise where possible (even if that's just to NULL).
First of all, char is a numeric type, so the distinction in your question doesn't make sense. As written, your example code does not even compile:
char *p_message;
*p_message = "Pointer";
The second line is a constraint violation, since the left-hand side has arithmetic type and the right-hand side has pointer type (actually, originally array type, but it decays to pointer type in this context). If you had written:
char *p_message;
p_message = "Pointer";
then the code is perfectly valid: it makes p_message point to the string literal. However, this may or may not be what you want. If on the other hand you had written:
char *p_message;
*p_message = 'P';
or
char *p_message;
strcpy(p_message, "Pointer");
then the code would be invoking undefined behavior by either (first example) applying the * operator to an invalid pointer, or (second example) passing an invalid pointer to a standard library function which expects a valid pointer to an object able to store the correct number of characters.
not needed, but is still recommended for a clean coding style.
Also the code you posted is completely wrong and won't work, but you know that and only wrote that as a quick example, right?

string manipulations in C

Following are some basic questions that I have with respect to strings in C.
If string literals are stored in read-only data segment and cannot be changed after initialisation, then what is the difference between the following two initialisations.
char *string = "Hello world";
const char *string = "Hello world";
When we dynamically allocate memory for strings, I see the following allocation is capable enough to hold a string of arbitary length.Though this allocation work, I undersand/beleive that it is always good practice to allocate the actual size of actual string rather than the size of data type.Please guide on proper usage of dynamic allocation for strings.
char *str = (char *)malloc(sizeof(char));
scanf("%s",str);
printf("%s\n",str);
1.what is the difference between the following two initialisations.
The difference is the compilation and runtime checking of the error as others already told about this.
char *string = "Hello world";--->stored in read only data segment and
can't be changed,but if you change the value then compiler won't give
any error it only comes at runtime.
const char *string = "Hello world";--->This is also stored in the read
only data segment with a compile time checking as it is declared as
const so if you are changing the value of string then you will get an
error at compile time ,which is far better than a failure at run time.
2.Please guide on proper usage of dynamic allocation for strings.
char *str = (char *)malloc(sizeof(char));
scanf("%s",str);
printf("%s\n",str);
This code may work some time but not always.The problem comes at run-time when you will get a segmentation fault,as you are accessing the area of memory which is not own by your program.You should always very careful in this dynamic memory allocation as it will leads to very dangerous error at run time.
You should always allocate the amount of memory you need correctly.
The most error comes during the use of string.You should always keep in mind that there is a '\0' character present at last of the string and during the allocation its your responsibility to allocate memory for this.
Hope this helps.
what is the difference between the following two initialisations.
String literals have type char* for legacy reasons. It's good practice to only point to them via const char*, because it's not allowed to modify them.
I see the following allocation is capable enough to hold a string of arbitary length.
Wrong. This allocation only allocates memory for one character. If you tried to write more than one byte into string, you'd have a buffer overflow.
The proper way to dynamically allocate memory is simply:
char *string = malloc(your_desired_max_length);
Explicit cast is redundant here and sizeof(char) is 1 by definition.
Also: Remember that the string terminator (0) has to fit into the string too.
In the first case, you're explicitly casting the char* to a const one, meaning you're disallowing changes, at the compiler level, to the characters behind it. In C, it's actually undefined behaviour (at runtime) to try and modify those characters regardless of their const-ness, but a string literal is a char *, not a const char *.
In the second case, I see two problems.
The first is that you should never cast the return value from malloc since it can mask certain errors (especially on systems where pointers and integers are different sizes). Specifically, unless there is an active malloc prototype in place, a compiler may assume that it returns an int rather than the correct void *.
So, if you forget to include stdlib.h, you may experience some funny behaviour that the compiler couldn't warn you about, because you told it with an explicit cast that you knew what you were doing.
C is perfectly capable of implicit casting between the void * returned from malloc and any other pointer type.
The second problem is that it only allocates space for one character, which will be the terminating null for a string.
It would be better written as:
char *string = malloc (max_str_size + 1);
(and don't ever multiply by sizeof(char), that's a waste of time - it's always 1).
The difference between the two declarations is that the compiler will produce an error (which is much preferable to a runtime failure) if an attempt to modify the string literal is made via the const char* declared pointer. The following code:
const char* s = "hello"; /* 's' is a pointer to 'const char'. */
*s = 'a';
results in the VC2010 emitted the following error:
error C2166: l-value specifies const object
An attempt to modify the string literal made via the char* declared pointer won't be detected until runtime (VC2010 emits no error), the behaviour of which is undefined.
When malloc()ing memory for storing of strings you must remember to allocate one extra char for storing the null terminator as all (or nearly all) C string handling functions require the null terminator. For example, to allocate a buffer for storing "hello":
char* p = malloc(6); /* 5 for "hello" and 1 for null terminator. */
sizeof(char) is guaranteed to be 1 so is unrequired and it is not necessary to cast the return value of malloc(). When p is no longer required remember to free() the allocated memory:
free(p);
Difference between the following two initialisations.
first, char *string = "Hello world";
- "Hello world" stored in stack segment as constant string and its address is assigned to pointer'string' variable.
"Hello world" is constant. And you can't do string[5]='g', and doing this will cause a segmentation fault.
Where as 'string' variable itself is not constant. And you can change its binding:
string= "Some other string"; //this is correct, no segmentation fault
const char *string = "Hello world";
Again "Hello world" stored in stack segment as constant string and its address is assigned to 'string' variable.
And string[5]='g', and this cause segmentation fault.
No use of const keyword here!
Now,
char *string = (char *)malloc(sizeof(char));
Above declaration same as first one but this time you are assignment is dynamic from Heap segment (not from stack)
The code:
char *string = (char *)malloc(sizeof(char));
Will not hold a string of arbitrary length. It will allocate a single character and return a pointer to char character. Note that a pointer to a character and a pointer to what you call a string are the same thing.
To allocate space for a string you must do something like this:
char *data="Hello, world";
char *copy=(char*)malloc(strlen(data)+1);
strcpy(copy,data);
You need to tell malloc exactly how many bytes to allocate. The +1 is for the null terminator that needs to go onto the end.
As for literal string being stored in a read-only segment, this is an implementation issue, although is pretty much always the case. Most C compilers are pretty relaxed about const'ing access to these strings, but attempting to modify them is asking for trouble, so you should always declare them const char * to avoid any issues.
That particular allocation may appear to work as there's probably plenty of space in the program heap, but it doesn't. You can verify it by allocating two "arbitrary" strings with the proposed method and memcpy:ing some long enough string to the corresponding addresses. In the best case you see garbage, in the worst case you'll have segmentation fault or assert from malloc or free.

Turbo C strcpy library function

I discovered that the strcpy function simply copied one string to anther. For instance, if a program included the following statements:
char buffer[10];
----------
strcpy(buffer, "Dante");
the string "Dante" would be placed in the array buffer[]. The string would include the terminating null(\0), which means that six characters in all would be copied. I'm just wondering why we can't achieve the same effect more simply by saying?:
buffer = "Dante";
If I'm not mistaken, C treats strings far more like arrays than BASIC does.
Because strings aren't a data type in C. "Strings" are char*s, so when you try to assign them, you are simply copying the memory address and not the characters into the buffer.
Consider this:
char* buffer;
buffer = malloc(20);
buffer = "Dante";
Why should it magically place "Dante" into the buffer?
Because an "array" in C is a chunk of memory. There's no pointer to assign to.
If you're asking why the syntax isn't like that: Well, what would happen if the lengths were different?
Address of array is not changeable. In other sense you can consider,
char buffer[20];
is a compile time equivalent of,
char* const buffer = (char*)malloc(20);
Now, since buffer address cannot be changed, one cannot perform operations like:
buffer = "Dante"; // error 'buffer' address is not modifiable
you can't do buffer = "Dante" because there is no "string" data type in C, only arrays.
Now you CAN however do...
char buffer[10] = "Dante";
but if the length of the string is unknown you can do...
char buffer[] = "Dante123456678";
but only during initialization, meaning you can't do...
char buffer[];
buffer = "Dante";
When you write buffer, it is treated as a pointer to the first element of the array. Of course, *buffer or buffer[0] is the first element. Since buffer is just a pointer, you can't assign a whole bunch of data like "Dante" to it.
If char buffer[128]; was the declaration, then buffer refers the first location of the array, so buffer = "Dante" will try assigning the address of the string onto the address which is stored in the array. Memory address location in the array are read only and statically assigned when compiled. So you cannot do buffer = "Dante" as it attempts to change an address location which points to some other location fixed at compile time. These locations cannot be written.
If char *buffer; was the declaration, then buffer is a pointer to a char type variable which can point to a starring chunk of memory blocks. So when you do buffer = "Dante" the address of the string into buffer. When will show the string when you print it, as it points to a string starting address, which is compiled in and stored in the executable. But this is not a preferred method.
If you do char arr[] = "Dante"; the string "Dante" gets stored in the .text section where you can write, so arr[0] = 'K' something like this , ie modification is possible.
If you do char *arr = "Dante"; then the string "Dante" gets stored in the .rodata or similar location which is not writeable. This is because string literals are not modifiable as per the standards.

Resources