What is the difference of these array declarations? [duplicate] - c

This question already has answers here:
Difference between char* and char[]
(8 answers)
String Literals
(3 answers)
Closed 9 years ago.
#include <stdio.h>
#include <string.h>
int main(void){
char s1[30]="abcdefghijklmnopqrstuvwxyz";
printf("%s\n",s1);
printf("%s",memset(s1,'b',7));
getch();
return 0;
}
Above code works but when I create s1 array like this,
char *s1="abcdefghijklmnopqrstuvwxyz";
it does not give any errors in compile time but fails to run in runtime.
I am using Visual Studio 2012.
Do you know why?
I found prototype of memset is:
void *memset( void *s, int c, size_t n );

char s1[30] allocates a writable memory segment to store the contents of the array, char *s1="Sisi is an enemy of Egypt."; doesn't - the latter only sets a pointer to the address of a string constant, which the compiler will typically place in a read-only section of the object code.

String literals gets space in "read-only-data" section which gets mapped into the process space as read-only (So you can't change it).

char s1[30]="abcdefghijklmnopqrstuvwxyz";
This declares s1 as array of type char, and initialized it.
char *s1="abcdefghijklmnopqrstuvwxyz";
Will place "abcdefghijklmnopqrstuvwxyz" in the read-only parts of the memory and making a pointer to that.
However modifying s1 through memset yields an undefined behavior.

An very good question!.
If you make gcc output the assembly, and compare the output, you could find out the answer, and the following is why:
char s1[30]="abcdef";
when defined in a function, it will define an array of char, and s1 is the name of the array. The program will allocate memory in stack.
when define globally, it will define a object in the program, and the object is not an read only data.
char* s2 = "abcdef"; only define a point of char, which point to an const char stored in the .rodata, that is the read only data in the program.
To make program run efficiently and make the progress management easily, the compiler will generate different sections for a given code. Constant chars, like the char* s2 = "abcdef"; and the printf format string will be stored in the .section rodata section. After loading into the main memory by the loader of the OS, this section will be marked as read only. That is why when you use memset to modify the memory which s2 point to, it will complain Segment fault.
Here is an explaination: Difference between char* and char[]

Related

Explanation on how does the memcpy function behaves? [duplicate]

This question already has answers here:
No out of bounds error
(7 answers)
Closed 5 years ago.
#include <stdio.h>
#include <string.h>
char lists[10][25];
char name[10];
void main()
{
scanf("%s" , lists[0]);
memcpy(name , lists[0], 25);
printf("%s\n" , name);
}
In the above code I am predefining the size of character array "name" as 10.
Now when I gave the input as :
Input - abcdefghijklmnopqrstuvwxy
The output I got was the same string : abcdefghijklmnopqrstuvwxy
Should'nt I get the output as : abcdefghij ???
how this is becoming possible even though the size of array is limited to 10?
Because it doesn't know the size of the allocated memory it's writing into, and you got away with where the extra data got written. You might not on another platform, or using a different compiler, or different optimisation settings.
When passing the size parameter to memcpy (), it's a good idea to take the size of the destination memory into account.
When using char arrays, if you want to be safer about not overrunning memory, you can use strncpy (). It'll take care of inserting the trailing NULL in the right place.
To start with, arrays are pointers. In C there are no length checks like on Java for example.
When you write char a[2]; the OS gives you space on the memory for 2 chars.
For example, let the memory be
|1|2|3|4|5|6|7|8|9|10|11|12|
a
a is a pointer to the address 1. The a[0] = 0; is equal with *(a+0) = 0, meaning write 0 to the address a + offset 0.
So if you try to write to an address that you have not allocated, unexpected things can happen.
For example, lets say we have char a[2];char b[2]; and the memory map is
|1|2|3|4|5|6|7|8|9|10|11|12|
a b
Then the a[2] = 0 is equal to b[0] = 0. But if this address is an address of an other program, then a segmentation error will be raised.
Try the program (it may work with no optimizations of the compiler):
#include <stdio.h>
#include <string.h>
char a[4];
char b[4];
void main()
{
scanf("%s" , a); // input "12345678"
printf("%s\n" , b); // print "5678"
}
memcpy just copies from an address to the other the size of data you said.
In your example, you were luky because all the addresses you accessed where assigned to your program (inside your's memory page).
In C/C++ you are responsible to handle the memory correctly. Also, keep in mind that strings end at the char \0 so inside an array char str[10]; we usually have tops 9 chars and the \0.

C program trying to modify a location in text segment

#include<stdio.h>
#include<string.h>
int main()
{
char str[]="somethingisbetterthannothing";
memset(str,'-',6);
puts(str);
return 0;
}
I was expecting a segmentation fault when this program is executed .
But it printed
------ingisbetterthannothing
Does this indicate that the string literal is not stored in read only text segment?
char str[]="somethingisbetterthannothing";
There is no string literal in the above line.
There is only an initializer for a char-array.
char* str = "somethingisbetterthannothing";
That would be a pointer to a string-literal.
And there is no guarantee what happens when you try to modify a string literal.
It is literally and explicitly Undefined Behavior (BTW: The example in the accepted answer is modifying a string-literal).
When strings are declared as character arrays, they are stored like other types of arrays in C.
For eg if str[] is an auto variable then string is stored in stack segment, if it’s a global or static variable then stored in data segment.
Using character pointer strings can be stored in two ways:
---> Read only string in a shared segment.
When string value is directly assigned to a pointer, in most of the compilers, it’s stored in a read only block (generally in data segment) that is shared among functions.
char *str = "vinay";
"vinay" is stored in a shared read only location, but pointer str is stored in a read-write memory
--> dynamic allocation using malloc
If you try to modify string literals or constants segmentation fault will get since change of RO section not allowed. But in your case you changing WR section i.e stack section so obviuolsy no error

Memory allocation for pointers [duplicate]

This question already has answers here:
Where does the compound/string literals get stored in the memory?
(4 answers)
Closed 9 years ago.
I have a following program:
#include<stdio.h>
char * test()
{
char * rt="hello";
return rt;
}
void main()
{
printf("\n %s \n", test());
}
here, it correctly prints hello while if rt is not a constant pointer like char rt[]="hello" it prints garbage. My understanding, in latter stack gets freed when function returns from test but what happens with above case? where does the memory for char *rt is allocated?
Extending above part, If I try to do char rt[]="hello" and if i try rt="hrer" it throws error while with char *rt="hello" it works fine but we can not change particular character in a string with later case. Please help me to understand it. Thanks.
Your string "hello" is what is called a string literal. It resides in what is called the data segment of your program, which is a region of memory. Any other string literals throughout your code are put there as well. This region is loaded once, and never destroyed.
So, your pointer rt is pointing somewhere into that region.
But, if you declare char rt[] = "hello", you are declaring an array named rt[] on the stack and the array is 6 bytes long (hello + null terminator). When the function returns, the stack is freed, so, this memory will be invalid.
Some more information on string literals are here: C String literals: Where do they go?
The string Hello gets set into the read portion of the executable part of the program. The function returns a pointer to that.
The use of an array (in the second case) means that it gets copied onto the stack.
End of the function it gets zapped - hence garbage

C Duration of strings, constants, compound literals, and why not, the code itself

I didn't remember where I read, that If I pass a string to a function like.
char *string;
string = func ("heyapple!");
char *func (char *string) {
char *p
p = string;
return p;
}
printf ("%s\n", string);
The string pointer continue to be valid because the "heyapple!" is in memory, it IS in the code the I wrote, so it never will be take off, right?
And about constants like 1, 2.10, 'a'?
And compound literals?
like If I do it:
func (1, 'a', "string");
Only the string will be all of my program execution, or the constans will be too?
For example I learned that I can take the address of string doing it
&"string";
Can I take the address of the constants literals? like 1, 2.10, 'a'?
I'm passing theses to functions arguments and it need to have static duration like strings without the word static.
Thanks a lot.
This doesn't make a whole lot of sense.
Values that are not pointers cannot be "freed", they are values, they can't go away.
If I do:
int c = 1;
The variable 'c' is not a pointer, it cannot do anything else than contain an integer value, to be more specific it can't NOT contain an integer value. That's all it does, there are no alternatives.
In practice, the literals will be compiled into the generated machine-code, so that somewhere in the code resulting from the above will be something like
load r0, 1
Or whatever the assembler for the underlying instruction set looks like. The '1' is a part of the instruction encoding, it can't go away.
Make sure you distinguish between values and pointers to memory. Pointers are themselves values, but a special kind of value that contains an address to memory.
With char* hello = "hello";, there are two things happening:
the string "hello" and a null-terminator are written somewhere in memory
a variable named hello contains a value which is the address to that memory
With int i = 0; only one thing happens:
a variable named i contains the value 0
When you pass around variables to functions their values are always copied. This is called pass by value and works fine for primitive types like int, double, etc. With pointers this is tricky because only the address is copied; you have to make sure that the contents of that address remain valid.
Short answer: yes. 1 and 'a' stick around due to pass by value semantics and "hello" sticks around due to string literal allocation.
Stuff like 1, 'a', and "heyapple!" are called literals, and they get stored in the compiled code, and in memory for when they have to be used. If they remain or not in memory for the duration of the program depends on where they are declared in the program, their size, and the compiler's characteristics, but you can generally assume that yes, they are stored somewhere in memory, and that they don't go away.
Note that, depending on the compiler and OS, it may be possible to change the value of literals, inadvertently or purposely. Many systems store literals in read-only areas (CONST sections) of memory to avoid nasty and hard-to-debug accidents.
For literals that fit into a memory word, like ints and chars it doesn't matter how they are stored: one repeats the literal throughout the code and lets the compiler decide how to make it available. For larger literals, like strings and structures, it would be bad practice to repeat, so a reference should be kept.
Note that if you use macros (#define HELLO "Hello!") it is up to the compiler to decide how many copies of the literal to store, because macro expansion is exactly that, a substitution of macros for their expansion that happens before the compiler takes a shot at the source code. If you want to make sure that only one copy exists, then you must write something like:
#define HELLO "Hello!"
char* hello = HELLO;
Which is equivalent to:
char* hello = "Hello!";
Also note that a declaration like:
const char* hello = "Hello!";
Keeps hello immutable, but not necessarily the memory it points to, because of:
char h = (char) hello;
h[3] = 'n';
I don't know if this case is defined in the C reference, but I would not rely on it:
char* hello = "Hello!";
char* hello2 = "Hello!"; // is it the same memory?
It is better to think of literals as unique and constant, and treat them accordingly in the code.
If you do want to modify a copy of a literal, use arrays instead of pointers, so it's guaranteed a different copy of the literal (and not an alias) is used each time:
char hello[] = "Hello!";
Back to your original question, the memory for the literal "heyapple!" will be available (will be referenceable) as long as a reference is kept to it in the running code. Keeping a whole module (a loadable library) in memory because of a literal may have consequences on overall memory use, but that's another concern (you could also force the unloading of the module that defines the literal and get all kind of strange results).
First,it IS in the code the I wrote, so it never will be take off, right? my answer is yes. I recommend you to have a look at the structure of ELF or runtime structure of executable. The position that the string literal stored is implementation dependent, in gcc, string literal is store in the .rdata segment. As the name implies, the .rdata is read-only. In your code
char *p
p = string;
the pointer p now point to an address in a readonly segment, so even after the end of function call, that address is still valid. But if you try to return a pointer point to a local variable then it is dangerous and may cause hard-to-find bugs:
int *func () {
int localVal = 100;
int *ptr = localVal;
return p;
}
int val = func ();
printf ("%d\n", val);
after the execution of func, as the stack space of func is retrieve by the c runtime, the memory address where localVal was stored will no longer guarantee to hold the original localVal value. It can be overidden by operation following the func.
Back to your question title
-
string literal have static duration.
As for "And about constants like 1, 2.10, 'a'?"
my answer is NO, your can't get address of a integer literal using &1. You may be confused by the name 'integer constant', but 1,2.10,'a' is not right value ! They do not identify a memory place,thus, they don't have duration, a variable contain their value can have duration
compound literals, well, I am not sure about this.

C's strtok() and read only string literals

char *strtok(char *s1, const char *s2)
repeated calls to this function break string s1 into "tokens"--that is
the string is broken into substrings,
each terminating with a '\0', where
the '\0' replaces any characters
contained in string s2. The first call
uses the string to be tokenized as s1;
subsequent calls use NULL as the first
argument. A pointer to the beginning
of the current token is returned; NULL
is returned if there are no more
tokens.
Hi,
I have been trying to use strtok just now and found out that if I pass in a char* into s1, I get a segmentation fault. If I pass in a char[], strtok works fine.
Why is this?
I googled around and the reason seems to be something about how char* is read only and char[] is writeable. A more thorough explanation would be much appreciated.
What did you initialize the char * to?
If something like
char *text = "foobar";
then you have a pointer to some read-only characters
For
char text[7] = "foobar";
then you have a seven element array of characters that you can do what you like with.
strtok writes into the string you give it - overwriting the separator character with null and keeping a pointer to the rest of the string.
Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.
Also, becasue strtok keeps a reference to the rest of the string, it's not reeentrant - you can use it only on one string at a time. It's best avoided, really - consider strsep(3) instead - see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string so has the same read-only/segfault issue)
An important point that's inferred but not stated explicitly:
Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstanding in underlying mechanisms so I like to make things as plain as possible.
As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:
int x = 0;
The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.
When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.
Note the subtle difference here: the memory location that reference point x holds is constant (and cannot be changed). However, the value that x points can be changed. You do it in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate commands to the compiler.
When you have a statement like:
char *name = "Tom";
The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.
But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and NULL. Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.
When your program is run, the final step occurs: setting the pointer to char's value (which isn't constant) to the memory address of that automatically created string (which is constant).
So a char * is not read-only. Only a const char * is read-only. But your problem in this case isn't that char *s are read-only, it's that your pointer references a read-only regions of memory.
I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.
I hope this was helpful. ;)
I blame the C standard.
char *s = "abc";
could have been defined to give the same error as
const char *cs = "abc";
char *s = cs;
on grounds that string literals are unmodifiable. But it wasn't, it was defined to compile. Go figure. [Edit: Mike B has gone figured - "const" didn't exist at all in K&R C. ISO C, plus every version of C and C++ since, has wanted to be backward-compatible. So it has to be valid.]
If it had been defined to give an error, then you couldn't have got as far as the segfault, because strtok's first parameter is char*, so the compiler would have prevented you passing in the pointer generated from the literal.
It may be of interest that there was at one time a plan in C++ for this to be deprecated (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc). But 12 years later I can't persuade either gcc or g++ to give me any kind of warning for assigning a literal to non-const char*, so it isn't all that loudly deprecated.
[Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]
In brief:
char *s = "HAPPY DAY";
printf("\n %s ", s);
s = "NEW YEAR"; /* Valid */
printf("\n %s ", s);
s[0] = 'c'; /* Invalid */
If you look at your compiler documentation, odds are there is a option you can set to make those strings writable.

Resources