I read that when you declare strings using a pointer, the pointer contains the memory address of the string literal. Therefore I expected to get the memory address from this code but rather I got some random numbers. Please help me understand why it didn't work.
int main()
{
char *hi = "Greeting!";
printf("%p",hi);
return(0);
}
If the pointer hi contains the memory address of the string literal, then why did it not display the memory address?
It did work. It's just that you can consider the address as being arbitrarily chosen by the C runtime. hi is a pointer set to the address of the capital G in your string. You own all the memory from hi up to and including the nul-terminator at the end of that string.
Also, use const char *hi = "Greeting!"; rather than char *: the memory starting at hi is read-only. Don't try to modify the string: the behaviour on attempting to do that is undefined.
The "random numbers" you got are the memory address. It is not constant: each execution of your program may use different memory addresses.
A pointer can be represented in several ways. The format string "%p" "writes an implementation defined character sequence defining a pointer". In most cases, this is the pointed-to object's address interpreted as an appropriately sized unsigned integer, often printed in hexadecimal, which looks like "a bunch of random numbers".
A human-readable pointer representation is generally only useful for debugging. It allows you to compare two different representations (are the pointers the same?) and, in some cases, the relative order and distance between pointers (which pointer comes "first", and how far apart are they?). Interpreting pointers as integers works well for this purpose.
It would be helpful to us if you could clarify what output you expected. Perhaps you expected pointers to be zero-based?
Note that, while some compilers might accept your example, it would be wiser to use const char *. Using a char * would allow you to try to modify your string literal, which is undefined behavior.
In C99 a string is typically initialized by using the char* data type since there is no primitive "string" data type. This effectively creates an array of chars by storing the address of the first char in the variable:
FILE* out = fopen("out.txt", "w");
char* s = argv[1];
fwrite(s, 12, 1, out);
fclose(out);
//successfully prints out 12 characters from argv[1] as a consecutive string.
How does the compiler know that char* s is a string and not just the address of a singular char? If I use int* it will only allow one int, not an array of them. Why the difference?
My main focus is understanding how pointers, referencing and de-referencing work, but the whole char* keeps messing with my head.
How does the compiler know that char* s is a string and not just the address of a singular char?
It doesn't. As far as "the compiler" is concerned, char* s is a pointer to char.
On the other hand, there are many library functions that assume that a char* points to an element of a null-terminated sequence of char (see for example strlen, strcmp etc.).
Note that fwrite does not make this assumption. It requires that you tell it how many bytes you want to write (and that this number doesn't take you beyond the bounds of the buffer pointed at by the first argument.)
If I use int* it will only allow one int, not an array of them. Why the difference?
That is incorrect. The C language does not have a special case for char*. An int* can also point to an element of an array of int. In fact, you could write a library that uses 0 or another sentinel value to indicate the end of a sequence of int, and use it in much the same way as char* is used by convention.
In your code
fwrite(s, 12, 1, out);
means
write 1 element of size 12 bytes, starting from address s, to the file pointed to by out.
Here, a char is exactly one byte, so the 12 bytes written are 12 characters and you get the desired output.
How does the compiler know that char* s is a string and not just the address of a singular char?
Well, it does not (and does not need to). You asked to (read from s and) write 12 bytes, so it will do that. If the memory is inaccessible, that's a programming mistake. fwrite() itself won't handle that.
Beware:
If s does not point to memory that can be accessed up to s[11], the behaviour is undefined. It is up to the programmer to pass valid values as arguments.
In the case of int, the size is usually 4 bytes on common systems, and writing its raw bytes one by one won't give you the desired result.
In that case, you need to make use of fprintf() to print formatted output.
The compiler has no idea beyond the fact that a char* holds the address of a char. We can read the characters that follow by incrementing the address of the first character. The same applies to any pointer type (int*, long*, and so on): the compiler just treats a pointer as something that holds the address of an object of its type.
I'm learning C programming in a self-taught fashion. I know that numeric pointer addresses must always be initialized, either statically or dynamically.
However, I haven't read about the compulsory need of initializing char pointer addresses yet.
For example, would this code be correct, or is a pointer address initialization needed?
char *p_message;
*p_message = "Pointer";
I'm not entirely sure what you mean by "numeric pointer" as opposed to "char pointer". In C, a char is an integer type, so it is an arithmetic type. In any case, initialization is not required for a pointer, regardless of whether or not it's a pointer to char.
Your code has the mistake of using *p_message instead of p_message to set the value of the pointer:
*p_message = "Pointer"; // Error!
This is wrong because, given that p_message is a pointer to char, *p_message should be a char, not an entire string. But as far as the need for initializing a char pointer when first declared, it's not a requirement. So this would be fine:
char *p_message;
p_message = "Pointer";
I'm guessing part of your confusion comes from the fact that this would not be legal:
char *p_message;
*p_message = 'A';
But then, that has nothing to do with whether or not the pointer was initialized correctly. Even as an initialization, this would fail:
char *p_message = 'A';
It is wrong for the same reason that int *a = 5; is wrong. So why is that wrong? Why does this work:
char *p_message;
p_message = "Pointer";
but this fail?
char *p_message;
*p_message = 'A';
It's because there is no memory allocated for the 'A'. When you have p_message = "Pointer", you are assigning p_message the address of the first character 'P' of the string literal "Pointer". String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap.
But chars, like ints, need to be allocated either on the stack or the heap. Either you need to declare a char variable so that there is memory on the stack:
char myChar;
char *pChar;
pChar = &myChar;
*pChar = 'A';
Or you need to allocate memory dynamically on the heap:
char* pChar;
pChar = malloc (1); // or pChar = malloc (sizeof (char)), but sizeof(char) is always 1
*pChar = 'A';
So in one sense char pointers are different from int or double pointers, in that they can be used to point to string literals, for which you don't have to allocate memory on the stack (automatically) or heap (dynamically). I think this might have been your actual question, having to do with memory allocation rather than initialization.
If you are really asking about initialization and not memory allocation: A pointer variable is no different from any other variable with regard to initialization. Just as an uninitialized int variable will have some garbage value before it is initialized, a pointer too will have some garbage value before it is initialized. As you know, you can declare a variable:
double someVal; // no initialization, will contain garbage value
and later in the code have an assignment that sets its value:
someVal = 3.14;
Similarly, with a pointer variable, you can have something like this:
int ary [] = { 1, 2, 3, 4, 5 };
int *ptr; // no initialization, will contain garbage value
ptr = ary;
Here, ptr is not initialized to anything, but is later assigned the address of the first element of the array.
Some might say that it's always good to initialize pointers, at least to NULL, because you could inadvertently try to dereference the pointer before it gets assigned any actual (non-garbage) value, and dereferencing a garbage address might cause your program to crash, or worse, might corrupt memory. But that's not all that different from the caution to always initialize, say, int variables to zero when you declare them. If your code is mistakenly using a variable before setting its value as intended, I'm not sure it matters all that much whether that value is zero, NULL, or garbage.
Edit. OP asks in a comment: You say that "String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap", so how does allocation occur?
That's just how the language works. In C, a string literal is an element of the language. The C11 standard specifies in §6.4.5 that when the compiler translates the source code into machine language, it should transform any sequence of characters in double quotes to a static array of char (or wchar_t if they are wide characters) and append a NUL character as the last element of the array. This array is then considered immutable. The standard says: If the program attempts to modify such an array, the behavior is undefined.
So basically, when you have a statement like:
char *p_message = "Pointer";
the standard requires that the double-quoted sequence of characters "Pointer" be implemented as a static, immutable, NUL-terminated array of char somewhere in memory. Typically implementations place such string literals in a read-only area of memory such as the text block (along with program instructions). But this is not required. The exact way in which a given implementation handles memory allocation for this array / NUL terminated sequence of char / string literal is up to the particular compiler. However, because this array exists somewhere in memory, you can have a pointer to it, so the above statement does work legally.
An analogy with function pointers might be useful. Just as the code for a function exists somewhere in memory as a sequence of instructions, and you can have a function pointer that points to that code, but you cannot change the function code itself, so also the string literal exists in memory as a sequence of char and you can have a char pointer that points to that string, but you cannot change the string literal itself.
The C standard specifies this behavior only for string literals, not for character constants like 'A' or integer constants like 5. Setting aside memory to hold such constants / non-string literals is the programmer's responsibility. So when the compiler comes across statements like:
char *charPtr = 'A'; // illegal!
int *intPtr = 5; // illegal!
the compiler does not know what to do with them. The programmer has not set aside such memory on the stack or the heap to hold those values. Unlike with string literals, the compiler is not going to set aside any memory for them either. So these statements are illegal.
Hopefully this is clearer. If not, please comment again and I'll try to clarify some more.
Initialisation is not needed, regardless of what type the pointer points to. The only requirement is that you must not attempt to use an uninitialised pointer (that has never been assigned to) for anything.
However, for aesthetic and maintenance reasons, one should always initialise where possible (even if that's just to NULL).
First of all, char is a numeric type, so the distinction in your question doesn't make sense. As written, your example code does not even compile:
char *p_message;
*p_message = "Pointer";
The second line is a constraint violation, since the left-hand side has arithmetic type and the right-hand side has pointer type (actually, originally array type, but it decays to pointer type in this context). If you had written:
char *p_message;
p_message = "Pointer";
then the code is perfectly valid: it makes p_message point to the string literal. However, this may or may not be what you want. If on the other hand you had written:
char *p_message;
*p_message = 'P';
or
char *p_message;
strcpy(p_message, "Pointer");
then the code would be invoking undefined behavior by either (first example) applying the * operator to an invalid pointer, or (second example) passing an invalid pointer to a standard library function which expects a valid pointer to an object able to store the correct number of characters.
Not needed, but still recommended for a clean coding style.
Also the code you posted is completely wrong and won't work, but you know that and only wrote that as a quick example, right?
I am a newbie to C programming, and I am confused by the chaotic behavior of pointers, especially when it comes to strings and arrays.
I know that I can't write like,
#include <stdio.h>
int main()
{
int *number=5;
printf("%d",*number);
}
Because clearly it will try to access memory location 5, and that will crash the program. I have to initialize "number".
But when it comes to strings I can write like,
#include <stdio.h>
int main()
{
char *name="xxxxx";
printf(name);
}
And it works too. So that means it implicitly initializes the "name" pointer. I also know that name=&name[0], but I have found that name=&name too. How can that be?
Because, to me, it looks like two variables with the same name. Can anybody tell me how strings are created in memory? (All this time I assumed it creates name[0] ... name[n-1] and another variable (a pointer) called "name", in which we put the location of name[0]. It seems I was wrong.)
PS: My English may not be good. If somebody can give me a link regarding the above matter, I would be grateful.
C, like many programming languages, supports the concept of "literals" - special syntax that, when encountered, causes the compiler to create a value in a special way.
-2, for example, is an integer literal. When encountered, the compiler will treat it as a value of type int with the content of -2. "..." is a string literal - when encountered, the compiler allocates new space in a special memory area, fills it with data that corresponds to the chars you used inside the literal, adds a 0 at the end of that area, and finally uses a pointer to the start of that area, of type char*, as the result of the literal expression (strictly, the literal is an array of char that converts to such a pointer in most expressions). So char* s = "hello" initializes a variable of type char* with something of type char* - completely legal.
You sneaked in another question here - why a == &a[0]. It's best to ask one question at a time, but the gist of it is that a[n] is identical to *((a)+(n)), and so:
&a[0] == &*(a+0) == a+0 == a
char *name="xxxxx";
This creates a char array in memory (one that must be treated as const) and assigns the address of its first element to name. &name[0] is the address of the first element, and name itself holds that same address, because that is what was assigned to it in the first place, so both give the same result.
The notation name[i] is translated as *(name+i), so you actually have a base address, name, to which you add subscripts (calculated according to pointer arithmetic). printf("%s", name) is designed to print from the start address up to the \0 that is appended to the end of every string literal.
That is because those strings are actually arrays of characters. What you do with the line char *name = "xxxxx";: You construct an array of 6 character values (5 of them being 'x' and the last one being '\0'). name is a pointer to the first element of that array.
That's how strings are normally handled in C: you have some kind of sequence of characters, terminated with a '\0' character, to tell functions like printf where to stop processing the string.
Let's consider this piece of code:
char *name="xxxxx";
What happens here is that the string xxxxx is allocated memory, and a pointer to that memory location (the address of that string) is stored in the pointer variable name. In other words, you initialize name with that address.
And printf() is a variadic function (one that takes one fixed argument followed by a variable number of arguments). The first argument to printf() is of type const char*, and the identifier name, when passed as an argument, denotes the base address of that string. Hence you can use
printf(name);
It will simply output xxxxx
"name=&name too. How can it be?" Well, that's a justified question.
Let me explain first with a real-world analogy. Suppose you have the first house in a row of houses in a housing society. Suppose it's plot number 0. The other houses are on plots 1, 2, 3, ... Now suppose there is a pointer to your house, and there is another pointer to the whole housing society's row. Won't the two pointers have the same address, which will be plot 0? This is because a pointer signifies a single memory location. It's the type of the pointer that matters here.
Bringing this analogy to a string stored in an array of characters, say char name[] = "xxxxx";, the identifier name signifies only the base address of the string, the address of its first character (like the address of the first house). It is numerically the same as the address of the whole array, &name, which in my analogy is the ROW of houses. But they are of different types: one is of type char* and the other is of type char (*)[6]. Note that this holds only for arrays. With char *name = "xxxxx";, name is a separate pointer variable, so &name has type char** and is the address of that variable, not the address of the string.
Basically, what happens when the C compiler sees the expression
char *name = "xxxxx";
is that it says: hey, "xxxxx" is a constant string (an array of bytes terminated with a 0 byte), and puts it in the resulting program's binary. Then it substitutes the memory location for the string, sort of like:
char *name = _some_secret_name_the_compiler_only_know;
where _some_secret_name_the_compiler_only_know is a pointer to the memory location where the string will live once the program gets executed. Then it gets on with parsing the file.
In this example it seems that both "jesus" strings are equal (same memory location).
printf("%p\n","jesus");
printf("%p\n","jesus");
Also note that:
printf("%p\n",&"jesus");
printf("%p\n","jesus");
prints the same, but:
char* ptrToString = "jesus";
char* ptrToString = &"jesus"; //ERROR
So I want to know how an unassigned string is stored in memory and how to point to it...
First off, why are "jesus" and &"jesus" the same: "jesus" is an array of type const char[6], and it decays to a pointer to the first element. Taking the address of the array gives you a pointer to an array, whose type is const char (*)[6]. However, the pointer to the array is numerically the same as the pointer to its first element (only the types differ).
This also explains why you have an error in the last line - the type is wrong. You need:
const char (*pj)[6] = &"jesus";
Finally, the question is whether repeated string literals have the same address or not. This is entirely up to the compiler. If it were very naive, it could store a separate copy for each occurrence of a string literal in the source code. If it is slightly cleverer, it'll only store one unique copy for each string literal. String literals are of course stored in memory somewhere, typically in a read-only data segment of the program image. Think of them as statically initialized global variables.
One more thing: Your original code is actually undefined behaviour, since %p expects a void * argument, and not a const char * or a const char (*)[6]. So the correct code is:
printf("%p\n%p\n", (void const *)"jesus", (void const *)&"jesus");
C is a carefully specified language and we can make many observations about your examples that may answer some questions.
Character string literals are stored in memory as initialized data. They have type array of char.
They are not necessarily strings because nul bytes can be embedded with \0.
It is not required that identical character string literals be unique, but it's undefined what happens if a program tries to modify one. This effectively allows them to be distinct or "interned" as the implementation sees fit.
In order to make that last line work, you need:
char (*ptrToString)[] = &"jesus"; // now not an ERROR
I am printing out addresses and strings from the following two declarations and initializations:
char * strPtr = (char *) "This is a string, made on the fly.";
char charArray [] = "Chars in a char array variable.";
When printed, the following output occurs with wildly different addresses for the variables charArray and strPtr. The question is, "Why?"
Printing:
printf( "%10s%40s%20p\n", "strPtr", strPtr, &(*strPtr));
printf( "%10s%40s%20p\n", "charArray", charArray, charArray);
Output:
strPtr This is a string, made on the fly. 0x400880
charArray Chars in a char array variable. 0x7fff12d5ed30
The different addresses, as you see, are: 0x400880 vs. 0x7fff12d5ed30
The rest of the variable declared before this have addresses like that of charArray.
Again, the question is, "Why are the addresses so different?"
Thanks for any assistance.
Because string literals, e.g. "foo bar" get allocated in a "different place" than your char array.
This is implementation dependent, but a typical implementation will put string literals in your .rdata ("read-only data") section of your executable, and your char array is declared locally, and hence goes on the stack.
And different sections of your image will get mapped to vastly different addresses when they get loaded in RAM.
I am guessing the compiler/linker puts the char array on the stack, whereas the other string is put into a static string table.
The text "Chars in a char array variable." and "This is a string, made on the fly." are probably quite near each other. However, char charArray[] = ... requests space on the stack into which the corresponding bit of text is copied. The stack is practically in a different universe from the original hard-coded text, once the OS is done with its virtualization etc.
This is how it goes - I remember reading about this in [Unix: Systems Programming]
In the typical process memory layout, initialized static data is stored in a different region from uninitialized static data, and both are separate from the stack and the heap.
The crucial thing to realise here is that in the case of strPtr, you are dealing with two different objects, whereas in the case of charArray, you are dealing with only one.
charArray is a single array object, filled with the characters of the "Chars in a char array variable." string.
strPtr itself is a single pointer object. Its value is the address of a second anonymous, unmodifiable array object that in turn contains the characters of the "This is a string, made on the fly." string.
When you print out charArray using %p, you are printing the address of charArray[0] (due to a special rule for arrays). When you print out &(*strPtr) (which is exactly the same as just strPtr), you are printing the address of the anonymous, unmodifiable array object mentioned earlier - and this is why it appears so different to the addresses of the other variables involved.
If you print out &strPtr using %p, you will see that the address of the variable strPtr itself is in a similar range to the other local variables.