How can a character pointer without any address specification hold data? - c

The following C program is not supposed to work by my understanding of pointers but it does.
#include <stdio.h>
int main(void) {
    char *p;
    p = "abcdefghijk";
    printf("%s", p);
}
Outputs:
abcdefghijk
The char pointer variable p is pointing to something random as I have not assigned any address to it like p = &i; where i is some char array.
That means if I try to write anything to the memory address held by the pointer p it should give me segmentation fault since it is some random address not assigned to my program by the OS.
But the program compiles and runs successfully. What is happening?

In this expression statement
p="abcdefghijk";
the pointer p is assigned with the address of the first character of the string literal "abcdefghijk" that the compiler stores as a zero-terminated character array in the static memory area.
Thus in this statement there are two things that happen. At first the compiler creates an unnamed character array with the static storage duration to hold the string literal. Then the address of the first character of the array is assigned to the pointer. You can imagine it the following way
char unnamed[] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', '\0' };
p = unnamed;
or
p = &unnamed[0];
Note that although string literals in C have non-constant character array types (unlike C++, where they have constant character array types), you may not change them. Any attempt to change a string literal results in undefined behavior.
So this code snippet is invalid
char *p = "abcdefghijk";
p[0] = 'A';
But you could create your own character array initializing it with the string literal and in this case you can change the array. For example
char s[] = "abcdefghijk";
char *p = s;
p[0] = 'A';
From the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Pay attention to this part of the quote
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values.
It means that, for example, if you write
char *p = "abcdefghijk";
char *q = "abcdefghijk";
then this expression is not guaranteed to yield true (integer value 1)
p == q
and the result depends on the compiler and its options, which decide whether identical string literals are stored as one array or as distinct arrays.

In C a string literal like "abcdefghijk" is actually stored as an (read-only) array of characters. The assignment makes p point to the first character of that array.
I note that you mention p = &i where i would be an array. That is in most cases wrong. Arrays naturally decay to pointers to their first element, i.e. doing p = i would be equal to p = &i[0].
While both &i and &i[0] would result in the same address, they are semantically very different. Let's take an example:
char array[10];
With the above definition doing &array[0] (or just plain array as explained just above) you get a pointer to char, i.e. char *. When doing &array you get a pointer to an array of ten characters, i.e. char (*)[10]. The two types are very different.

"abcdefghijk" is a string constant, and p="abcdefghijk"; gives p the address of this string.
So it's normal that printf("%s",p); displays this string without error.

p="abcdefghijk";
You are creating a string literal (typically placed in a read-only part of the program image, such as the code segment) and assigning the address of its first character to the pointer. As the pointer itself is not constant, you can later assign it a different address.

The string literal "abcdefghijk" is compiled by putting the characters in a block in the program's data (or text) segment. Your assignment then stores the address of that location in the pointer.

Related

Why can I init a char pointer but not an int pointer?

I'm refreshing my C programming skills after 1 year, so I decided to start from scratch. I got stuck with pointers.
Why can I do:
char *string="Hello";
printf ("%s",string);
But I can't do:
int *pt=23;
printf("%d",*pt);
Doesn't the pointer need to be an address? And why does the first example work?
Because using string literals to initialize a character pointer or array is a special allowed syntax, with specific rules. In your case you set a pointer to point at a string literal's address, the string literal itself having type char[] and existing in read-only memory.
For the integer case, yes it needs to be an address, or more specifically another integer pointer. You can't initialize a pointer with an integer. See:
"Pointer from integer/integer from pointer without a cast" issues
The string literal "Hello" is stored in memory as a character array like
char unnamed_literal[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
So in this declaration
char *string="Hello";
the pointer string is assigned the address of the first character of the already existing array. It can be imagined like
char unnamed_literal[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char *string = unnamed_literal;
As for this declaration
int *pt=23;
then the value 23 is not a valid address that points to a valid object defined in your program. The compiler should issue a message that you are trying to assign an integer to a pointer. Thus this call
printf("%d",*pt);
invokes undefined behavior.
To make an analogy with the initialization of a pointer by string literal you could write for example
int a[] = { 1, 2, 3 };
int *pt = a;
This is syntactic sugar from standard C:
What is actually stored in your char pointer is the address at which the string is loaded in memory when the program starts.
For an integer, however, it will try to set the pointer's value to the address 23, which is (and should be) invalid.
For example, with an integer:
int VAL = 10;
int *ptr = &VAL; /* ptr now points to VAL */
*ptr = DESIRED_VALUE; /* the * (asterisk) here means dereference, so the value of VAL changes */
or
*&VAL = DESIRED_VALUE; /* take the address of VAL and dereference it in one expression */
int variable_1 = 23;
int *pointer_to_int = &variable_1;
The first line is an integer variable, initialized to an integer value.
The second line is a pointer-to-integer variable, initialized to the address of an integer variable (the address of variable_1, defined in the previous declaration).
char *p = "Charles Dickens";
This is a char pointer variable, initialized to the address of a string literal, which is a non-modifiable array of characters composed of the characters between the quotes plus an additional \0 character that marks the end of the string literal.
char c = 'a'; /* char variable initialized to value 'a' */
char *pointer_to_c = &c;
You will probably find this last declaration very similar to the one above. The difference is that C lets you use string literals, whereas a single character is similar to an integer, and this is the usual way pointer variables are initialized: with the addresses of variables.

Pointer assignment to different types

We can assign a string in C as follows:
char *string;
string = "Hello";
printf("%s\n", string); // string
printf("%p\n", string); // memory-address
And a number can be done as follows:
int num = 4404;
int *nump = &num;
printf("%d\n", *nump);
printf("%p\n", nump);
My question then is, why can't we assign a pointer to a number in C just like we do with strings? For example, doing:
int *num;
num = 4404;
// and the rest...
What makes a string fundamentally different than other primitive types? I'm quite new to C so any explanation as to the difference between the two would be very helpful.
There is no such type as "string" in C. A string is not a primitive type. A string is just an array of characters, terminated by a NUL byte ('\0').
When you do this:
char *string;
string = "Hello";
What really happens is that the compiler creates a read-only char array and assigns the address of its first element to your variable string. This works because in C an array used in an expression is converted ("decays") to a pointer to its first element.
// This is placed in a different section:
const char hidden_arr[] = {'H', 'e', 'l', 'l', 'o', '\0'};
char *string;
string = hidden_arr;
// Same as:
string = &(hidden_arr[0]);
Here, hidden_arr is an array and string is a char *; in the assignment, hidden_arr decays to a pointer to its first element, as we just said. Of course, all of this is done transparently, and you will not actually see another variable named hidden_arr; that's just an example. In reality the string will be stored in some unnamed location in your executable, and the address of that location will be copied to your string pointer.
When you try to do the same with an integer, it's wrong because int * and int are different types, and you cannot write this (well, you can, but it's meaningless and does not do what you expect it to):
int *ptr;
ptr = 123;
But, you can very well do it with an array of integers:
int arr[] = {1, 2, 3};
int *ptr;
ptr = arr;
// Same as:
ptr = &(arr[0]);
why can't we assign a pointer to a number in C just like we do with strings?
int *num;
num = 4404;
Code can do that if 4404 is a valid address for an int.
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
C11dr §6.3.2.3 5
If the address is not properly aligned --> undefined behavior (UB).
If the address is a trap --> undefined behavior (UB).
Attempting to de-reference the pointer is a problem unless it points to a valid int.
printf("%d\n", *num);
Below, "Hello" is a string literal. It exists someplace. The assignment takes the address of the string literal and assigns it to string.
char *string;
string = "Hello";
The point is that the address assigned is known to be valid for a char *.
In num = 4404;, the address 4404 is not known to be valid (it likely is not).
What makes a string fundamentally different than other primitive types?
In C, a string is a C library specification, not a C language one. Its definition is convenient for explaining various functions therein.
A string is a contiguous sequence of characters terminated by and including the first null character §7.1.1 1
Primitive types are part of the C language.
The language also has string literals like "Hello" in char *string; string = "Hello";. These have some similarity to strings, yet differ.
I recommend searching for "ISO/IEC9899:2017" to find a draft copy of the current C spec. It will answer many of your 10 questions of the last week.
What makes a string fundamentally different than other primitive types?
A string seems like a primitive type in C because the compiler understands "foo" and generates a null-terminated character array: ['f', 'o', 'o', '\0']. But a C string is still just that: an array of characters.
My question then is, why can't we assign a pointer to a number in C just like we do with strings?
You certainly can assign a pointer to a number; it's just that a number isn't a pointer, whereas the value of an array expression is the address of its first element. If you had an array of int, then that would work just like a string. Compare your code:
char *string;
string = "Hello";
printf("%s\n", string); // string
printf("%p\n", string); // memory-address
to the analogous code for an array of integers:
int numbers[] = {1, 2, 3, 4, 5, 0};
int *nump = numbers;
printf("%d\n", nump[0]); // first element
printf("%p\n", nump); // memory-address
The only real difference is that the compiler has some extra syntax for arrays of characters because they're so common, and printf() similarly has a format specifier just for character arrays for the same reason.
The type of a string literal (e.g. "hello world") is char[]. Assigning char *string = "Hello" means that string now points to the start of the array (i.e. the address of the first element in the array, &"Hello"[0]).
You can't assign an integer to a pointer, however, because their types are different: one is an int, the other is a pointer, int *. You could cast it to the correct type:
int *num;
num = (int *) 4404;
But this would be considered quite dangerous (unless you really know what you are doing). I.e. do you know what is at memory address 4404?

How are pointers "variables whose values are an address" when we can do char *string = "string"?

I've always understood pointers as follows:
int x = 5;
int *y = &x;
printf("%d", *y);
I store 5 at some memory location and allow myself to access that value with x.
I create an integer pointer y, setting its memory location to the memory location of x.
I print the value stored at the address that y holds.
However, I can at the same time do char *string = "neato" and it's totally functional. To me this looks like "create a character pointer, holding the memory address 'neato'". How does that make any sense?
Furthermore, if I set it, I would try to do it as *string = "more neat" but that gives an error. I instead need to do string = "more neat". The first attempt intuitively looks like "change the value stored at the memory address held by string to 'more neat'", but it doesn't work. The second looks to me like "for the memory address held by 'string', change it to 'more neat'". And that totally doesn't make sense to me.
What am I confusing? If in order to access the value stored at a pointer I need to do printf("%d", *pointer), how is setting its value not along those lines as well?
The unary * operator has two different (but related) purposes. In a declaration, it denotes that the type is a pointer type, but does not imply that the pointer is being dereferenced.
When used in an expression, it denotes dereferencing the pointer. That's why your example works where it's a declaration, but when you dereference it you can't assign (because the appropriate type to assign would be a char, if you could modify string literals anyhow).
The equivalent way to do it outside the declaration looks like:
const char *s = "hello"; /* initialize pointer value */
s = "goodbye"; /* assign to pointer value */
The above initialization and assignment are equivalent.
In C, "neato" has the type char[6] (in C++ it is const char[6]). Arrays decay to pointers when appropriate, so the assignment is one pointer to another.
Even though the type is not const-qualified in C, your pointer should be to const char, because writing to a memory location occupied by a string literal invokes undefined behavior. This however is valid:
char str[] = "neato";
str[0] = 'p';
Furthermore, if I set it, I would try to do it as *string = "more neat"
Well, you have a pointer to char, so that assignment doesn't make sense (also what I said above about writing to a string literal.)
For example,
char *ptr = "neato";
char arr[] = "neato";
are completely different. ptr is a pointer to the string literal "neato" and the compiler usually stores the string literal in read-only memory. So you can't change the string literal ptr points to, but you can change the value of ptr, namely the address.
*ptr = "more neat"; // error: *ptr is a single char, so it cannot take a string
*ptr = 'b'; // compiles, but undefined behavior: it writes to a string literal
ptr = "more neat"; // ok, you just create another string literal and ptr now points to it
The second one is just an abbreviation of
char arr[] = {'n', 'e', 'a', 't', 'o', '\0'};
In this case, you can change the characters in the array, but you can't change the address of arr (yeah, it's an array)
*arr = 'b'; // ok
arr = "more neat"; // error, the value of arr, namely the address of the array cannot be changed
Initialisation and Assignment are different, even if they look pretty similar, the meaning of the operation is often different.
The element type behind both char * and the string "neato" is char. String literals are simply arrays of characters, often located at read-only addresses. The 'n' must be located at some address (as must the next character 'e' and so on). That address is stored in the variable char *ptr;
The second part of the question is with why *ptr = "more neat"; is invalid.
*var dereferences the address -- in this case the 1-character wide memory address, that contains already the character 'n'. You can't put either the address of the new string literal (probably representable in 4 or 8 bytes) to a single char; nor can you put the 9 characters and the terminating ascii-zero in it.
We may study this by taking a memory dump of a 16-bit (big-endian) machine:
0FFE: .. .. // other variables, return addresses etc.
1000: F0 00 // The pointer "var" is located here
1002: // Top of stack
F000: "n" "e" "a" "t" "o" 00 // Address of constant string is F000
F006: "m" "o" "r" "e" ... // Address of next string is F006
*var accesses the single byte of memory at F000. var itself is located at address 1000 and occupies two bytes on this machine, because addresses here are 16 bits wide. After the new assignment var = "more neat"; the memory dump is:
1000: F0 06 // Pointer holds a new address

C: Behaviour of arrays when assigned to pointers

#include <stdio.h>
int main(void)
{
    char *ptr;
    ptr = "hello";
    printf("%p %s", (void *)"hello", ptr);
    getchar();
}
Hi, I am trying to understand clearly how arrays get assigned to pointers. I notice that when you assign an array of chars to a pointer to char, as in ptr = "hello";, the array decays to a pointer. But in this case I am assigning an array of chars that is not stored in a variable. Does this kind of assignment reserve a memory address specially for "hello" (which is obviously what is happening), and is it possible to modify the value of each element of "hello" at the memory address where this array is stored? As a comparison, is it fine to assign an array of, for example, ints to a pointer with something as vague as this: int_ptr = 5,3,4,3; so that the values 5,3,4,3 get placed at a memory address as "hello" did? And if not, why is it possible only with strings? Thanks in advance.
"hello" is a string literal. It is a nameless non-modifiable object of type char [6]. It is an array, and it behaves the same way any other array does. The fact that it is nameless does not really change anything. You can use it with [] operator for example, as in "hello"[3] and so on. Just like any other array, it can and will decay to pointer in most contexts.
You cannot modify the contents of a string literal because it is non-modifiable by definition. It can be physically stored in read-only memory. It can overlap other string literals, if they contain common sub-sequences of characters.
Similar functionality exists for other array types through compound literal syntax
int *p = (int []) { 1, 2, 3, 4, 5 };
In this case the right-hand side is a nameless object of type int [5], which decays to int * pointer. Compound literals are modifiable though, meaning that you can do p[3] = 8 and thus replace 4 with 8.
You can also use compound literal syntax with char arrays and do
char *p = (char []) { "hello" };
In this case the right-hand side is a modifiable nameless object of type char [6].
The first thing you should do is read section 6 of the comp.lang.c FAQ.
The string literal "hello" is an expression of type char[6] (5 characters for "hello" plus one for the terminating '\0'). It refers to an anonymous array object with static storage duration, initialized at program startup to contain those 6 character values.
In most contexts, an expression of array type is implicitly converted to a pointer to the first element of the array; the exceptions are:
When it's the argument of sizeof (sizeof "hello" yields 6, not the size of a pointer);
When it's the argument of _Alignof (a new feature in C11);
When it's the argument of unary & (&arr yields the address of the entire array, not of its first element; same memory location, different type); and
When it's a string literal in an initializer used to initialize an array object (char s[6] = "hello"; copies the whole array, not just a pointer).
None of these exceptions apply to your code:
char *ptr;
ptr = "hello";
So the expression "hello" is converted to ("decays" to) a pointer to the first element ('h') of that anonymous array object I mentioned above.
So *ptr == 'h', and you can advance ptr through memory to access the other characters: 'e', 'l', 'l', 'o', and '\0'. This is what printf() does when you give it a "%s" format.
That anonymous array object, associated with the string literal, is read-only, but not const. What that means is that any attempt to modify that array, or any of its elements, has undefined behavior (because the standard explicitly says so) -- but the compiler won't necessarily warn you about it. (C++ makes string literals const; doing the same thing in C would have broken existing code that was written before const was added to the language.) So no, you can't modify the elements of "hello" -- or at least you shouldn't try. And to make the compiler warn you if you try, you should declare the pointer as const:
const char *ptr; /* pointer to const char, not const pointer to char */
ptr = "hello";
(gcc has an option, -Wwrite-strings, that causes it to treat string literals as const. This will cause it to warn about some C code that's legal as far as the standard is concerned, but such code should probably be modified to use const.)
#include <stdio.h>
int main(void)
{
    char *ptr;
    ptr = "hello";
    /* instead of the two lines above you can write: char *ptr = "hello"; */
    printf("%p %s", (void *)"hello", ptr);
    getchar();
}
Here you have assigned the string literal "hello" to ptr. The string literal is stored in read-only memory, so you can't modify it. If you declare char ptr[] = "hello"; instead, then you can modify the array.
Say what?
Your code allocates 6 bytes of memory and initializes them with the values 'h', 'e', 'l', 'l', 'o', and '\0'.
It then allocates a pointer (the number of bytes for the pointer depends on the implementation) and sets the pointer's value to the start of the 6 bytes mentioned previously.
You can modify the elements of a modifiable array using syntax such as ptr[1] = 'a', but not the elements of a string literal, which is what ptr points to here.
Syntactically, strings are a special case. Since C doesn't have a specific string type to speak of, it does offer some shortcuts to declaring them and such. But you can easily create the same type of structure as you did for a string using int, even if the syntax must be a bit different.

C Program String Literals

When we do
char *p ="house";
p = 'm';
It's not allowed.
But when we do
char p[] = "house";
p[0] = 'm';
printf(p);
It gives O/P as : mouse
I am not able to understand how and where C does memory allocation for string literals?
char p[] = "house";
"house" is a string literal stored in a read only location, but, p is an array of chars placed on stack in which "house" is copied.
However, in char *p = "house";, p actually points to the read-only location which contains the string literal "house", thus modifying it is UB.
A note from the standard 6.7.8 Initialization
14 An array of character type may be initialized by a character string
literal, optionally enclosed in braces. Successive characters of the
character string literal (including the terminating null character if
there is room or if the array is of unknown size) initialize the
elements of the array.
So you basically have an array of characters. It should not be so difficult or puzzle you in understanding how this array gets modified if you have used arrays of ints, floats etc.
When you use char *p="house", the compiler may collect all the identical "house" literals and put them in one read-only space.
When you use char p[]="house", the compiler creates space for the string as an array in the local scope.
The basic difference is that thousands of pointers could share the first one (which is why you cannot modify it), while the second is local to the scope, so as long as it stays the same size it is modifiable.
char *p = "house"; // const char* p = "house";
String literal "house" resides in read only location and cannot be modified. Now what you are doing is -
*p = 'm' ; // trying to modify read-only location; Missed the dereferencing part
Now,
char p[] = "house";
"house" is copied to the array p, so its contents are modifiable and this actually works.
p[0] = 'm'; // assigning `m` at 0th index.

Resources