I discovered that the strcpy function simply copied one string to another. For instance, if a program included the following statements:
char buffer[10];
----------
strcpy(buffer, "Dante");
the string "Dante" would be placed in the array buffer[]. The string would include the terminating null character (\0), which means that six characters in all would be copied. I'm just wondering why we can't achieve the same effect more simply by saying:
buffer = "Dante";
If I'm not mistaken, C treats strings far more like arrays than BASIC does.
Because strings aren't a data type in C. "Strings" are char*s, so when you try to assign them, you are simply copying the memory address and not the characters into the buffer.
Consider this:
char* buffer;
buffer = malloc(20);
buffer = "Dante";   /* buffer now points at the literal; the 20 malloc'd bytes are leaked */
Why should it magically place "Dante" into the buffer?
Because an "array" in C is a chunk of memory. There's no pointer to assign to.
If you're asking why the syntax isn't like that: Well, what would happen if the lengths were different?
The address of an array cannot be changed. In a sense, you can consider
char buffer[20];
as a compile-time equivalent of
char* const buffer = (char*)malloc(20);
Now, since the address of buffer cannot be changed, one cannot perform operations like:
buffer = "Dante"; // error: 'buffer' is not modifiable
You can't do buffer = "Dante" because there is no "string" data type in C, only arrays.
You CAN, however, do:
char buffer[10] = "Dante";
and if you don't know the length of the string, you can leave the size out:
char buffer[] = "Dante123456678";
but only during initialization, meaning you can't do:
char buffer[];
buffer = "Dante";
When you write buffer in an expression, it is converted to a pointer to the first element of the array. Of course, *buffer or buffer[0] is the first element. Since that pointer is just a value computed from the array's location, not a variable you can change, you can't assign a whole bunch of data like "Dante" to it.
If char buffer[128]; was the declaration, then buffer refers to the first location of the array, so buffer = "Dante" would try to assign the address of the string literal to the array's own address. The location of an array is fixed (at compile and link time for a global array, when the function is entered for a local one), and the array name is not something you can assign to. So you cannot do buffer = "Dante", as it attempts to change an address that is fixed.
If char *buffer; was the declaration, then buffer is a pointer to char, which can point to the start of a chunk of memory holding a string. So when you do buffer = "Dante", the address of the string is stored in buffer. It will show the string when you print it, as it points to the string's starting address; the string itself is compiled in and stored in the executable. But this is not a preferred method.
If you do char arr[] = "Dante"; the characters of "Dante" are copied into arr, which is writable memory (on the stack for a local array, in the .data section for a global one), so something like arr[0] = 'K' works, i.e. modification is possible.
If you do char *arr = "Dante"; then the string "Dante" gets stored in .rodata or a similar location that is not writable, and arr merely points at it. Modifying a string literal is undefined behavior under the C standard.
Some code I am reviewing uses the following string assignment:
char *str;
str ="";
The coder then uses this str to temporarily hold a string, like this:
str = "This is a message";
fwrite(str, 1 ,strlen(str), fp);
Then str is reused elsewhere and assigned a new string for a similar purpose.
I know that this works; I want to find out exactly how it works.
How can you declare a char pointer and make it point to a string like that?
What could be the maximum string length such a pointer can hold?
Where is this string stored? Is it automatically malloc'd?
A pointer doesn't "hold" a string, it just points to where the original string is located. In this case the string literal is kept as part of the program and the pointer is set to it; when you reassign the pointer, you're not making any copies, just setting the pointer to a different address.
The maximum size of the string is thus the maximum size of a string literal, which will depend on the compiler and the amount of available program space.
If you want to actually make a copy of a string, first you must allocate some storage for it which must be one greater than the number of characters. Then use strcpy to make the copy.
This string is statically contained in the object module. You don't need to malloc memory for such strings, because they already have memory assigned by the compiler. Because of this, you also cannot free such a pointer. If you look at your exe file with a hex editor, you can see that such a string is contained inside it, as opposed to a dynamically allocated string, which only exists in memory while the program runs.
The maximum size of such a string depends on your compiler.
char* is just a pointer to a char (or series of them).
You can have it pointing to any "string" you like. In the examples given, they are just changing the pointer's value (i.e. what str points to).
char *name;
name = "some string"; // name now points to the address where the string is stored.
is not the same as:
char str[];
str = "some string"; // remember, this kind of statement won't work, because str
                     // is meant to store characters, but you are assigning a pointer.
A string literal used in an expression (other than to initialize an array or as the operand of sizeof) is converted to a pointer to its first character.
Possible Duplicate:
What is the difference between char s[] and char *s in C?
Question about pointers and strings in C
I'm reading about strings in C and I'm confused. I can "declare" strings in two ways:
char *str = "This is string";
char str2[20] = "This is string";
What is the difference between the two declarations? When would char str2[20] be preferred over char *str?
In C, strings are represented as sequences of chars terminated by a null character ('\0', value 0). They are stored in memory, and you work with some way of referencing them. You have identified the two ways of referencing them: a char *, which is a pointer to a sequence of chars, and an array, which holds the chars directly as an actual variable. Be aware that the string "abc" is 4 bytes long, as there is an additional null character to mark the end of the string.
In addition, in your examples you are initializing with strings given at compile time (string literals).
So there are two questions: the first is about how you represent strings (char * vs char[]), the second is about compile-time strings.
To come to your examples:
The first one creates a constant string in the text of the program and a pointer to it. Depending on the compiler it may be stored anywhere. It is the equivalent of mallocing a string and storing a pointer to it, except you must not change the contents of the memory. It is a char *, so you can change the pointer to point to somewhere else, like another malloced string or to the start of an array that you defined in example 2.
The second creates a char array (which is a way of representing a string). If declared inside a function, the array is allocated on the stack for the duration of the function, and you may change its contents. Because it is not a pointer, you cannot change it to point to a different string.
char *str = "This is string";
Puts the string in the constant data section of the program (.rodata on ELF systems, .rdata on Windows). This data can't be modified.
char str2[20] = "This is string";
With this kind of declaration, the data is stored on the stack if declared inside a function, and in the data section if declared at global scope. This data can be modified.
So if you need to modify the data, use the second approach.
C has no string type. All there is is char arrays. And in most expressions, an array in C is converted to a pointer to its first element.
The easiest way of doing it is in fact your first variant. Not specifying an explicit length of the array for literals will save you from accidentally doing something like
char str[3] = "abc";   /* no room left for the terminating '\0' */
String literals are constant in memory, so:
char *str = "This is string";
stores "This is string" in memory; it is not mutable, and you can only assign another address to str.
However
char str2[20] = "This is string";
is shorthand for
char str2[20] = {'T','h','i','s',' ','i','s',' ','s','t','r','i','n','g','\0'};
which does not point to a string stored elsewhere in memory; it stores the individual bytes in the array itself.
If you want to use constant strings like messages, use the first form.
If you want to use and manipulate strings, as in a word processor, use the second.
char *str = "This is string"; - This keeps the string in a read-only segment (such as .rodata) and stores its address in the local pointer variable str.
str[0] = 'a'; // Undefined behavior; typically crashes, because the string is in a read-only segment.
printf("%zu", sizeof(str)); // Prints the size of the pointer: 4 on a 32-bit machine, 8 on a 64-bit machine.
char str2[20] = "This is string"; - This keeps the string as a character array on the local stack.
str2[0] = 'a'; // This changes the first character to 'a'.
printf("%zu", sizeof(str2)); // This prints 20.
I've been having trouble for the past couple of hours with a problem I thought I understood. Here's my trouble:
void cut_str(char* entry, int offset) {
strcpy(entry, entry + offset);
}
char works[128] = "example1\0";
char* doesnt = "example2\0";
printf("output:\n");
cut_str(works, 2);
printf("%s\n", works);
cut_str(doesnt, 2);
printf("%s\n", doesnt);
// output:
// ample1
// Segmentation fault
I feel like there's something important about char*/char[] that I'm not getting here.
The difference is in that doesnt points to memory that belongs to a string constant, and is therefore not writable.
When you do this
char works[128] = "example1\0";
the compiler copies the content of a non-writable string into a writable array. \0 is not required, by the way.
When you do this, however,
char* doesnt = "example2\0";
the compiler leaves the pointer pointing to a non-writable memory region. Again, \0 will be inserted by the compiler.
If you are using gcc, you can have it warn you about initializing writable char * with string literals. The option is -Wwrite-strings. You will get a warning that looks like this:
warning: initialization discards qualifiers from pointer target type
The proper way to declare your doesnt pointer is as follows:
const char* doesnt = "example2\0";
The types char[] and char * are quite similar, so you are right about that. The difference lies in what happens when objects of the types are initialized. Your object works, of type char[], has 128 bytes of variable storage allocated for it on the stack. Your object doesnt, of type char *, has only the pointer itself on the stack; there is no stack storage for the characters.
Where exactly the string of doesnt is stored is not specified by the C standard, but most likely it is stored in a nonmodifiable data segment loaded when your program is loaded for execution. This isn't variable storage. Thus the segfault when you try to vary it.
This allocates 128 bytes on the stack, and uses the name works to refer to its address:
char works[128];
So works refers to writable memory.
This creates a string literal, which is in read-only memory, and uses the name doesnt to refer to its address:
char * doesnt = "example2\0";
You can write data to works, because it points to writable memory. You can't write data to doesnt, because it points to read-only memory.
Also, note that you don't have to end your string literals with "\0", since all string literals implicitly add a zero byte to the end of the string.
Why does the following happen:
char s[2] = "a";
strcpy(s,"b");
printf("%s",s);
--> executed without problem
char *s = "a";
strcpy(s,"b");
printf("%s",s);
--> segfault
Shouldn't the second variation also allocate 2 bytes of memory for s and thus have enough memory to copy "b" there?
char *s = "a";
The pointer s is pointing to the string literal "a". Trying to write to this has undefined behaviour, as on many systems string literals live in a read-only part of the program.
It is an accident of history that string literals are of type char[N] rather than const char[N], which would make this much clearer.
Shouldn't the second variation also allocate 2 bytes of memory for s and thus have enough memory to copy "b" there?
No, char *s is pointing to a static memory address containing the string "a" (writing to that location results in the segfault you are experiencing) whereas char s[2]; itself provides the space required for the string.
If you want to manually allocate the space for your string you can use dynamic allocation:
char *s = strdup("a"); /* or malloc(sizeof(char)*2); */
strcpy(s,"b");
printf("%s",s); /* should work fine */
Don't forget to free() your string afterwards.
Altogether a different answer: I think the mistake is that you are not creating a variable for the pointer to point to, hence the segfault.
A rule I follow: declaring a pointer variable does not create a variable of the type it points at; it only creates a pointer variable. So if you want to point to a string buffer, you need to declare both the character array and the pointer, and make the pointer point to the address of the array.
Say I do initialize an array like this:
char a[]="test";
What's the purpose of this? We know that the content might immediately get changed, as it is not allocated, and thus why would someone initialize the array like this?
To clarify, this code is wrong for the reasons stated by the OP:
char* a;
strcpy(a, "test");
As noted by other responses, the syntax "char a[] = "test"" does not actually do this. The actual effect is more like this:
char a[5];
strcpy(a, "test");
The first statement allocates a fixed-size character array on the local stack, and the second initializes the data in it. The size is determined from the length of the string literal. Like all automatic (stack) variables, the array is automatically deallocated on exiting the function scope.
The purpose of this is to allocate five bytes on the stack or the static data segment (depending on where this snippet occurs), then set those bytes to the array {'t','e','s','t','\0'}.
This syntax allocates an array of five characters on the stack, equivalent to this:
char a[5] = "test";
The elements of the array are initialized to the characters in the string given as an initializer. The size of the array is determined to fit the size of the initializer.
It is allocated. That code is equivalent to
char a[5]="test";
When you leave the number out, the compiler simply calculates the length of the character-array for you by counting the characters in the literal string. It then adds 1 to the length in order to include the necessary terminating nul '\0'. Hence, the length of the array is 5 while the length of the string is 4.
The array is allocated; its size is inferred from the string literal being used to initialize it (5 chars total).
Had you written
char *a = "test";
then all that would get allocated would be a pointer variable, not an array (the string literal "test" lives in memory such that it's allocated at program startup and held until the program exits).