Understanding pointers in C

I am trying to learn pointers in C but am getting mixed up with the following concepts:
char *string = "hello";
char *string2;
What is the main difference between:
A.) *string2 = string;
then
B.) string2 = "bye";

Some pictures may help.
Assume the following memory map (addresses are completely arbitrary and don't reflect any known architecture):
Item Address 0x00 0x01 0x02 0x03
---- ------- ---- ---- ---- ----
"hello" 0x00501234 'h' 'e' 'l' 'l'
0x00501238 'o' 0x00
"bye" 0x0050123A 'b' 'y'
0x0050123C 'e' 0x00 0x?? 0x??
...
string 0x80FF0000 0x00 0x50 0x12 0x34
string2 0x80FF0004 0x?? 0x?? 0x?? 0x??
This shows the situation after the declarations. "hello" and "bye" are string literals, stored as arrays of char "somewhere" in memory, such that they are available over the lifetime of the program. Note that attempting to modify the contents of string literals invokes undefined behavior; you don't want to pass string literals (or pointer expressions like string that evaluate to the addresses of string literals) as arguments to functions like scanf, strtok, fgets, etc.
string is a pointer to char, containing the address of the string literal "hello". string2 is also a pointer to char, and its value is indeterminate (0x?? represents an unknown byte value).
When you write
string2 = "bye";
you assign the address of "bye" (0x0050123A) to string2, so our memory map now looks like this:
Item Address 0x00 0x01 0x02 0x03
---- ------- ---- ---- ---- ----
"hello" 0x00501234 'h' 'e' 'l' 'l'
0x00501238 'o' 0x00
"bye" 0x0050123A 'b' 'y'
0x0050123C 'e' 0x00 0x?? 0x??
...
string 0x80FF0000 0x00 0x50 0x12 0x34
string2 0x80FF0004 0x00 0x50 0x12 0x3A
Seems simple enough, right?
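As a rough sketch of the assignment just described: the helper name reassign_demo is made up for illustration, and const is added (an assumption, not in the question) so the compiler would object to accidental writes through either pointer. It shows that string2 = "bye" only changes which address string2 holds.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: mirrors the pointer assignment from the question. */
const char *reassign_demo(const char *string)
{
    assert(string != NULL);

    const char *string2 = string;   /* string2 holds the same address as string */
    string2 = "bye";                /* now it holds the address of "bye" instead */

    return string2;                 /* the characters string points to were never touched */
}
```

Calling reassign_demo with a pointer to "hello" gives back a pointer to "bye", while the caller's pointer still refers to the unchanged "hello".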
Now let's look at the statement
*string2 = string;
There are a couple of problems here.
First, a digression - declarations in C are centered around the types of expressions, not objects. string2 is a pointer to a character; to access the character value, we must dereference string2 with the unary * operator:
char x = *string2;
The type of the expression *string2 is char, so the declaration becomes
char *string2;
By extension, the type of the expression string2 is char *, or pointer to char.
So when you write
*string2 = string;
you're attempting to assign a value of type char * (string) to an expression of type char (*string2). That's not going to work, because char * and char are not compatible types. This error shows up at translation (compile) time. If you had written
*string2 = *string;
then both expressions have type char, and the assignment is legal.
However, if you haven't assigned anything to string2 yet, its value is indeterminate; it contains a random bit string that may or may not correspond to a valid, writable address. Attempting to dereference a random, potentially invalid pointer value invokes undefined behavior; it may appear to work fine, it may crash outright, it may do anything in between. This problem won't show up until runtime. Even better, if you assigned the string literal "bye" to string2, then you run into the problem described above; you're trying to modify the contents of a string literal. Again, that's a problem that's not going to show up until runtime.
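A minimal sketch of the legal form *string2 = *string, assuming string2 has first been pointed at valid, writable storage. The helper name copy_first_char and the buffer mentioned afterwards are illustrative, not from the question.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: performs *string2 = *string on memory the caller owns. */
char copy_first_char(char *string2, const char *string)
{
    assert(string2 != NULL && string != NULL);

    *string2 = *string;     /* char = char: both sides have the same type */
    return *string2;
}
```

Called with a local array such as char buffer[] = "bye"; and the literal "hello", it changes the buffer's contents to hye. Passing an uninitialized pointer, or a pointer to a string literal, as the first argument would bring back the undefined behavior described above.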

There are some subtle inferences being made by other answerers, missing the POV of a newbie.
char *string = "hello";
Declares a pointer variable which is initialized to point at a character array (a good type match traditionally).
The statement
*string = "hello";
dereferences what should be a pointer variable and assigns a value to the pointed location. (It is not a variable declaration; that has to be done above it somewhere.) However, since string has type char *—so *string has type char—and the right side of the assignment is an expression with a pointer value, there is a type mismatch. This can be fixed in two ways, depending on the intent of the statement:
string = "hello"; /* with "char *" expressions on both sides */
or
*string = 'h'; /* with "char" expressions on both sides */
The first reassigns string to point to memory containing a sequence of characters (hello\000). The second assignment changes the character pointed to by string to the char value h.
Admittedly, this is a slightly confusing subject which all C programmers go through a little pain learning to grasp. The pointer declaration syntax has a slightly different (though related) effect than the same text in a statement. Get more practice and experience writing and compiling expressions involving pointers, and eventually my words will make perfect sense.

*string can be read as "whatever string points to", which is a char. Assigning "bye" to it makes no sense.

A C string is just an array of characters terminated by a null character. C string literals like "hello" above could be viewed as "returning" a pointer to the first element of the character array { 'h', 'e', 'l', 'l', 'o', '\0' }.
Thus, char *string = "bye" makes sense while char string = "bye" doesn't.
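To make the array view concrete, here is a small sketch (the function names hello_size and hello_at are invented for this example). It assumes nothing beyond standard C: the literal "hello" is an array of six chars, and a char * initialized from it points at the 'h'.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes the literal "hello" occupies, terminator included. */
size_t hello_size(void)
{
    return sizeof "hello";          /* the literal has array type char[6] */
}

/* Reads element i of the literal through a pointer to its first element. */
char hello_at(size_t i)
{
    const char *string = "hello";   /* pointer to the 'h' */

    assert(i < sizeof "hello");     /* stay inside the array */
    return string[i];
}
```

hello_at(0) yields 'h' and hello_at(5) yields the terminating '\0', which is why the array has six elements rather than five.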

char * is a pointer to a character. A literal such as "hello" yields a pointer to the first character of the string. Therefore, string = "bye" is meaningful: it makes string point to the first character of the string "bye".
*string, on the other hand, is the character pointed to by string. It's not a pointer but a small integer (a char, typically 8 bits). This is why the assignment *string = "bye" is meaningless: the compiler will at least warn about converting a pointer to an integer, and if string points into a string literal, the write will probably lead to a segmentation fault, because the memory where literals are stored is usually read-only.

AFTER EDIT:
The difference is that A) will not compile, and if it did, it's undefined behavior, because you're dereferencing an uninitialized pointer.
Also, please don't change your question drastically after posting it.

Related

What happens when a char array gets initialized from a string literal?

As I understand it, the following code works like so:
char* cptr = "Hello World";
"Hello World" lives in the .rodata section of the program's memory. The string literal "Hello World" returns a pointer to the base address of the string, or the address of the first element in the so-called "array", since the chars are laid out sequentially in memory it would be the 'H'. This is my little diagram as I visualize the string literal getting stored in the memory:
0x4 : 'H'
0x5 : 'e'
0x6 : 'l'
0x7 : 'l'
0x8 : 'o'
0x9 : ' '
0xa : 'W'
0xb : 'o'
0xc : 'r'
0xd : 'l'
0xe : 'd'
0xf : '\0'
So the declaration above becomes:
char* cptr = 0x4;
Now cptr points to the string literal. I'm just making up the addresses.
0xa1 : 0x4
Now how does this code work?
char cString[] = "Hello World";
I am assuming that as in the previous situation "Hello World" also degrades to the address of 'H' and 0x4.
char cString[] = 0x4;
I am reading the = as an overloaded assignment operator when it is used with initialization of a char array. As I understand, at initialization of a C-string only, it copies char-by-char starting at the given base address into the C-string until it hits a '\0' as the last char copied. It also allocates enough memory for all the chars. Because overloaded operators are really just functions, I assume that its internal implementation is similar to strcpy().
I would like one of the more experienced C programmers to confirm my assumptions of how this code works. This is my visualization of the C-string after the chars from the string literal get copied into it:
0xb4 : 'H'
0xb5 : 'e'
0xb6 : 'l'
0xb7 : 'l'
0xb8 : 'o'
0xb9 : ' '
0xba : 'W'
0xbb : 'o'
0xbc : 'r'
0xbd : 'l'
0xbe : 'd'
0xbf : '\0'
Once again, the addresses are arbitrary, the point is that the C-string in the stack is distinct from the string literal in the .rodata section in memory.
What am I trying to do? I am trying to use a char pointer to temporarily hold the base address of the string literal, and use that same char pointer (base address of string literal) to initialize the C-string.
char* cptr = "Hello World";
char cString[] = cptr;
I assume that "Hello World" evaluates to its base address, 0x4. So this code ought to look like this:
char* cptr = 0x4;
char cString[] = 0x4;
I assume that it should be no different from char cString[] = "Hello World"; since "Hello World" evaluates to its base address, and that is what is stored in the char pointer!
However, gcc gives me an error:
error: invalid initializer
char cString[] = cptr;
^
How come you can't use a char pointer as a temporary placeholder to store the base address of a string literal?
How does this code work? Are my assumptions correct?
Does using a string literal in the code return the base address to the "array" where the chars are stored in the memory?
Your understanding of memory layout is more or less correct. But the problem you are having is one of initialization semantics in C.
The = symbol in a declaration here is NOT the assignment operator. Instead, it is syntax that specifies the initializer for a variable being instantiated. In the general case, T x = y; is not the same as T x; x = y;.
There is a language rule that a character array can be initialized from a string literal. (The string literal is not "evaluated to its base address" in this context). There is not a language rule that an array can be initialized from a pointer to the elements intended to be copied into the array.
Why are the rules like this? "Historical reasons".
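Since an array cannot be initialized from a pointer, the copy has to be written out explicitly. The following is only a sketch of one way to do that: the helper name init_from_pointer and its return convention (0 on success, -1 if the text does not fit) are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the string at src, terminator included,
   into dst, which has room for capacity bytes.
   Returns 0 on success and -1 (leaving dst untouched) if it does not fit. */
int init_from_pointer(char *dst, size_t capacity, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t needed = strlen(src) + 1;    /* characters plus the '\0' */
    if (needed > capacity)
        return -1;

    memcpy(dst, src, needed);
    return 0;
}
```

With char* cptr = "Hello World"; and char cString[12];, the call init_from_pointer(cString, sizeof cString, cptr) gives cString its own copy of the text. Unlike the initializer form, the array size has to be stated by hand here.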
I am assuming that as in the previous situation "Hello World" also degrades to the address of 'H' and 0x4.
Not really: cString[] gets a completely new address in memory. The compiler allocates 12 chars for it and initializes them with the contents of the "Hello World" string literal.
I assume that "Hello World" evaluates to its base address, 0x4. Does using a string literal in the code return the base address to the "array" where the chars are stored in the memory?
cString may be converted to char* later on, yielding its base address, but it remains an array in the regular contexts. In particular, if you invoke sizeof(cString) you would get the size of the array, not the size of the pointer.
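The sizeof difference can be checked directly. This is a small self-checking sketch (the function name array_vs_pointer is made up, and const on the pointer is an addition for safety); the assertions state what standard C guarantees for these two declarations.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: compares an array initialized from a literal
   with a pointer initialized from the same text. */
size_t array_vs_pointer(void)
{
    char cString[] = "Hello World";          /* 12-byte array, a private copy */
    const char *cptr = "Hello World";        /* pointer to a literal */

    assert(sizeof cString == 12);            /* 11 characters + '\0' */
    assert(sizeof cptr == sizeof(char *));   /* just the size of an address */
    assert(strlen(cString) == strlen(cptr)); /* same text, 11 characters each */

    cString[0] = 'J';                        /* allowed: the array is ours */
    assert(strcmp(cString, "Jello World") == 0);

    return sizeof cString;
}
```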
How come you can't use a char pointer as a temporary placeholder to store the base address of a string literal?
You can. However, once a string literal is assigned to char *, it stops being a string literal, at least as far as the compiler is concerned. It becomes a char * pointer, no different from other char * pointers.
Note that modern C compilers combine identical string literals as an optimization, so if you write
#define HELLO_WORLD "Hello World"
...
char* cptr = HELLO_WORLD;
char cString[] = HELLO_WORLD;
and turn optimization on, the compiler would eliminate duplicate copies of the string literal.
The second definition char cString[] = "Hello World"; is a shorthand for this equivalent definition:
char cString[12] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
If this definition occurs at global scope or with static storage, cString will be in the .data segment with the initial contents in the executable image. If it occurs in the scope of a function with automatic storage, the compiler will allocate automatic storage for the array (reserving space on the stack frame or equivalent) and generate code to perform the initialization at run-time.
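One observable consequence of that difference, sketched with invented function names: an automatic array is re-initialized every time its declaration is reached, while a static array is initialized once and keeps whatever was written to it. The sketch assumes only that 'a', 'b', 'c' and 'd' have consecutive codes, which holds in ASCII.

```c
#include <assert.h>

/* Automatic storage: buf is filled from the literal on every call. */
char bump_automatic(void)
{
    char buf[] = "a";
    buf[0]++;
    return buf[0];
}

/* Static storage: buf is initialized once, before the program starts running. */
char bump_static(void)
{
    static char buf[] = "a";
    buf[0]++;
    return buf[0];
}

/* Runs both helpers twice and checks the difference. */
void storage_demo(void)
{
    assert(bump_automatic() == 'b');
    assert(bump_automatic() == 'b');   /* re-initialized, so 'b' again */
    assert(bump_static() == 'b');
    assert(bump_static() == 'c');      /* kept its previous value */
}
```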

What happens when actual and extern types are different?

I have two files:
a.c
extern char *s;
int main()
{
puts(s);
}
and b.c:
char s[] = "hello";
I compile both of them at the same time, and there's no error. But the program crashes when run. Why? What part of the C language specification says that this is illegal?
You invoked undefined behavior and the program happened to crash.
Quote from N1256 6.2.7 Compatible type and composite type
1 Two types have compatible type if their types are the same. Additional rules for
determining whether two types are compatible are described in 6.7.2 for type specifiers,
in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. [...]
2 All declarations that refer to the same object or function shall have compatible type;
otherwise, the behavior is undefined.
In a typical environment, when the program runs, a.c reads the storage of s as a pointer, because the declaration there says s is a pointer. What is actually stored there is the beginning of the string (the first 4 bytes, if pointers are 4 bytes wide), which has little chance of being a valid pointer. Therefore, reading from that address has a big chance of causing a segmentation fault.
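For comparison, a declaration that is compatible with the definition would use an array of unknown size. The sketch below puts both lines in one file only to keep it short and compilable on its own; in the question they would live in a.c and b.c respectively. The function length_of_s is invented for the example.

```c
#include <assert.h>
#include <string.h>

/* What a.c could declare instead of "extern char *s;":
   an array of unknown size, compatible with the definition below. */
extern char s[];

/* What b.c defines. */
char s[] = "hello";

/* In an expression, s converts to a pointer to its first element,
   so functions such as puts(s) or strlen(s) read the right bytes. */
size_t length_of_s(void)
{
    assert(s[0] == 'h');
    return strlen(s);
}
```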
In case you want to actually know why it does crash (as opposed to why it shouldn't work):
An array is a sequence of things in memory. So with
char s[] = "hello";
the memory layout for this variable will look like this (let's say it starts at 0x00123400 with 4-byte pointers):
0x00123400: 'h' <- address of s
0x00123401: 'e'
0x00123402: 'l'
0x00123403: 'l'
0x00123404: 'o'
0x00123405: '\0'
To get the address of the string, it just uses the fixed number 0x00123400.
A pointer holds the address of something else. If you had:
char *s = "hello";
then the compiler would place the array "hello" somewhere, and then fill s with its address:
0x00123400: 0x00 <- address of s
0x00123401: 0x56
0x00123402: 0x78
0x00123403: 0x9A
0x0056789A: 'h' <- what s points to
0x0056789B: 'e'
0x0056789C: 'l'
0x0056789D: 'l'
0x0056789E: 'o'
0x0056789F: '\0'
To get the address of the string, it starts from the fixed number 0x00123400, and reads the number at that location.
Now, if your variable is actually a char[] and you told the compiler it was a char*, it's going to treat it as a pointer. That means it's going to start from the address of the variable, read the number there, and use that number as the address of the string.
What number is that? Well, I did say:
0x00123400: 'h' <- address of s
0x00123401: 'e'
0x00123402: 'l'
0x00123403: 'l'
but that's a lie - we all know memory only stores numbers, not letters. It's just shorthand so people don't have to memorize the ASCII table. What's really stored there is:
0x00123400: 0x68 <- address of s
0x00123401: 0x65
0x00123402: 0x6C
0x00123403: 0x6C
So your program will read the value 0x68656C6C, then it will try to print the string starting from address 0x68656C6C, which is most likely an invalid address.
(Note: I'm ignoring endianness in this answer)
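The arithmetic can be reproduced without invoking any undefined behavior by assembling the bytes by hand. This is a sketch under stated assumptions: ASCII character codes, 4-byte pointers as in the answer, and invented function names. It also shows what a little-endian machine would read instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value a big-endian machine with 4-byte pointers would read from s. */
uint32_t read_as_big_endian(const char *s)
{
    assert(s != NULL && strlen(s) >= 4);     /* at least 4 characters */

    const unsigned char *b = (const unsigned char *)s;
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Same four bytes, interpreted with the least significant byte first. */
uint32_t read_as_little_endian(const char *s)
{
    assert(s != NULL && strlen(s) >= 4);

    const unsigned char *b = (const unsigned char *)s;
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

For "hello" the first function gives 0x68656C6C, the number used above, and the second gives 0x6C6C6568. Either way the result is the text reinterpreted as an address, not the address of the text.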

Assignment a value to index of pointer

I tried to assign a value to the second index of a pointer, but it gives me a warning
"[Warning] assignment makes integer from pointer without a cast"
and the program doesn't run. I wonder, where am I making a mistake?
#include<stdio.h>
void main()
{
char *p="John";
*(p+2)="v";
printf("%s",p);
}
Firstly, in your code,
char *p="John";
p points to a string literal, and any attempt to modify a string literal invokes undefined behavior.
Related, C11, chapter §6.4.5
[...] If the program attempts to modify such an array, the behavior is
undefined.
If you want to modify the string, you need an array, like
char p[]="John";
Secondly, *(p+2)="v"; is wrong, as "" denotes a string, whereas you need a char (Hint: check the type of *(p+2)). Change that to
*(p+2)='v';
To elaborate the difference, quoting C11, chapter §6.4.4.4, for Character constants
An integer character constant is a sequence of one or more multibyte characters enclosed
in single-quotes, as in 'x'.
and chapter §6.4.5, string literals
A character string literal is a sequence of zero or more multibyte characters enclosed in
double-quotes, as in "xyz".
Thirdly, as per the C standards, void main() should at least be int main(void).
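The difference between the two kinds of constant shows up in their sizes. A short sketch, with function names invented for illustration; the facts it checks come from standard C, where a character constant has type int and the literal "v" is an array of two chars.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the character constant 'v': in C it has type int. */
size_t char_constant_size(void)
{
    return sizeof 'v';
}

/* Size of the string literal "v": the characters 'v' and '\0'. */
size_t string_literal_size(void)
{
    return sizeof "v";
}

/* Checks both facts in one place. */
void check_constant_types(void)
{
    assert(char_constant_size() == sizeof(int));
    assert(string_literal_size() == 2);
    assert("v"[0] == 'v' && "v"[1] == '\0');
}
```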
You are assigning a pointer to a string literal to an element of another string literal.
First, you need to change the pointer into an array, so that modifying it is legal. For example:
char p[] = "John";
Then you need to replace "v", which is a string literal consisting of the two characters 'v' and '\0', with 'v', which is the character code of the letter v as an integer:
*(p + 2) = 'v';
also, this is the third element and not the second, because the first element is p[0].
You're making two mistakes.
First, you are attempting to modify the contents of a string literal; this invokes undefined behavior, meaning any of the following are possible (and considered equally correct): your code may crash, it may run to completion with no issues, it may run to completion but not alter the string literal, it may leave your system in a bad state, etc. Your compiler may reject the code completely, although I don't know of any compiler that does so.
If you want to be able to modify the contents of a string, then you need to set aside storage for that string, instead of just pointing to a string literal:
char p[] = "John"; // copies the contents of the string literal "John" to p
// p is sized automatically based on the length of the
// string literal
Here's a hypothetical memory map showing the result of that declaration and initialization:
Address Item 0x00 0x01 0x02 0x03
------- ---- ---- ---- ---- ----
0x8000 "John" 'J' 'o' 'h' 'n'
0x8004 0x00 0x?? 0x?? 0x?? // 0x?? represents a random byte value
...
0xfffdc100 p 'J' 'o' 'h' 'n'
0xfffdc104 0x00 0x?? 0x?? 0x??
The string literal that lives at 0x8000 should not be modified; the string that lives in p at 0xfffdc100 may be modified (although the buffer will only have enough space to store up to a 4-character string).
Your second mistake (and the one causing the compiler to complain) is in this line:
*(p+2)="v";
The expression "v" is a string literal and has type "2-element array of char"; since it's not the operand of the sizeof or unary & operators, the type "decays" to "pointer to char", and the value of the expression is the address of the string literal "v".
The expression *(p + 2) has type char (it resolves to a single character value), which is an integral type, not a pointer type. You cannot assign pointer values to non-pointer objects.
You can easily fix that by changing that line to
*(p + 2) = 'v'; // note single quote vs double quote
This time, instead of trying to assign the address of a string literal to p[2], you're assigning the value of a single character.
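Put together, the two fixes might look like the sketch below. The helper name set_char and its bounds check are additions for illustration, not part of the original program.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replaces the character at index i of a writable string. */
void set_char(char *p, size_t i, char c)
{
    assert(p != NULL && i < strlen(p));   /* stay inside the text, keep the '\0' */

    *(p + i) = c;                         /* exactly the same as p[i] = c */
}
```

With char p[] = "John";, the call set_char(p, 2, 'v') changes the array to Jovn. Passing a pointer to a string literal instead of an array would still be undefined behavior, so the array declaration matters as much as the single quotes.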

Why can I store a string in the memory address of a char?

I'm starting to understand pointers and how to dereference them etc. I've been practising with ints but I figured a char would behave similarly. Use the * to dereference, use the & to access the memory address.
But in my example below, the same syntax is used to set the address of a char and to save a string to the same variable. How does this work? I think I'm just generally confused and maybe I'm overthinking it.
int main()
{
char *myCharPointer;
char charMemoryHolder = 'G';
myCharPointer = &charMemoryHolder;
printf("%s\n", myCharPointer);
myCharPointer = "This is a string.";
printf("%s\n", myCharPointer);
return 0;
}
First, you need to understand how "strings" work in C.
"Strings" are stored as an array of characters in memory. Since there is no way of determining how long the string is, a NUL character, '\0', is appended after the string so that we know where it ends.
So for example if you have a string "foo", it may look like this in memory:
--------------------------------------------
| 'f' | 'o' | 'o' | '\0' | 'k' | 'b' | 'x' | ...
--------------------------------------------
The things after '\0' are just stuff that happens to be placed after the string, which may or may not be initialised.
When you assign a "string" to a variable of type char *, what happens is that the variable will point to the beginning of the string, so in the above example it will point to 'f'. (In other words, if you have a string str, then str == &str[0] is always true.) When you assign a string to a variable of type char *, you are actually assigning the address of the zeroth character of the string to the variable.
When you pass this variable to printf(), it starts at the pointed address, then goes through each char one by one until it sees '\0' and stops. For example if we have:
char *str = "foo";
and you pass it to printf(), it will do the following:
Dereference str (which gives 'f')
Dereference (str+1) (which gives 'o')
Dereference (str+2) (which gives another 'o')
Dereference (str+3) (which gives '\0' so the process stops).
This also leads to the conclusion that what you're currently doing is actually wrong. In your code you have:
char charMemoryHolder = 'G';
myCharPointer = &charMemoryHolder;
printf("%s\n", myCharPointer);
When printf() sees the %s specifier, it goes to the address pointed to by myCharPointer, which in this case contains 'G'. It will then try to get the next character after 'G', which is undefined behaviour. It might give you the correct result every now and then (if the next memory location happens to contain '\0'), but in general you should never do this.
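The walk that printf() performs for %s can be imitated with a loop. This is only a sketch of the idea with an invented function name, not how any particular C library implements printf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: steps through str one char at a time, as %s does,
   and returns how many characters come before the '\0'. */
size_t walk_string(const char *str)
{
    assert(str != NULL);

    size_t steps = 0;
    while (*(str + steps) != '\0')   /* dereference str, str+1, str+2, ... */
        steps++;

    return steps;
}
```

For "foo" the loop dereferences str, str+1 and str+2, then stops at str+3. Given the address of a lone char with no terminator after it, the same loop would run past the object, which is the undefined behaviour described above.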
Several comments
Static strings in C are treated as a (char *) to a null-terminated array of characters. E.g. "ab" would essentially be a char * to a block of memory containing 97 98 0. (97 is 'a', 98 is 'b', and 0 is the null termination.)
Your code myCharPointer = &charMemoryHolder; followed by printf("%s\n", myCharPointer) is not safe. printf should be passed a null-terminated string, and there's no guarantee that memory containing the value 0 immediately follows your character charMemoryHolder.
In C, string literals evaluate to pointers to read-only arrays of chars (except when used to initialize char arrays). This is a special case in the C language and does not generalize to other pointer types. A char * variable may hold the address of either a single char variable or the start address of an array of characters. In this case the array is a string of characters which has been stored in a static region of memory.
charMemoryHolder is a variable that has an address in memory.
"This is a string." is a string constant that is stored in memory and also has an address.
Both of these addresses can be stored in myCharPointer and dereferenced to access the first character.
In the case of printf("%s\n", myCharPointer), the pointer will be dereferenced and the character displayed, then the pointer is incremented. It repeats this until it finds a null (value zero) character and stops.
Hopefully you are now wondering what happens when you are pointing to the single 'G' character, which is not null-terminated like a string constant. The answer is "undefined behavior" and will most likely print random garbage until it finds a zero value in memory, but could print exactly the correct value, hence "undefined behavior". Use %c to print the single character.
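Two safe ways to output a single character, sketched with snprintf so the result can be inspected in a buffer. The function names and the buffer-based interface are assumptions for this example; with printf the format strings would be the same.

```c
#include <assert.h>
#include <stdio.h>

/* Formats one character with %c; nothing needs to follow c in memory. */
int format_with_c(char *out, size_t n, char c)
{
    assert(out != NULL && n >= 2);
    return snprintf(out, n, "%c", c);
}

/* Formats one character with %s by first building a real two-byte string. */
int format_with_s(char *out, size_t n, char c)
{
    char tmp[2] = { c, '\0' };      /* now a terminator is guaranteed */

    assert(out != NULL && n >= 2);
    return snprintf(out, n, "%s", tmp);
}
```

Both write the single character followed by '\0' into out and return 1, the number of characters produced.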

Are arrays const datatypes

#include<stdio.h>
#include<string.h>
int main()
{
char a[]="aaa";
char *b="bbb";
strcpy(a,"cc");
printf("%s",a);
strcpy(b,"dd");
printf("%s",b);
return 0;
}
I thought we could not modify the contents of an array, but the above program does not show any compile-time error. When run, it printed cc and terminated. I think the contents of the array get stored in the read-only section of the data segment, so it should not be possible to change the value of the array, as it is const. But in the above program the value got changed to cc and then the program terminated. Why did the value get changed? Please help me understand.
An array is not a constant datatype, but a string literal like "aaa" is. You cannot modify its contents.
Here's a hypothetical memory map showing how the string literals, array, and pointer all relate to each other:
Item Address 0x00 0x01 0x02 0x03
---- ------- ---- ---- ---- ----
"aaa" 0x00040000 'a' 'a' 'a' 0x00
"bbb" 0x00040004 'b' 'b' 'b' 0x00
"cc" 0x00040008 'c' 'c' 0x00 ???
"dd" 0x0004000C 'd' 'd' 0x00 ???
...
a 0x08000000 'a' 'a' 'a' 0x00
b 0x08000004 0x00 0x04 0x00 0x04
This is the situation at line 6 in your code, after a and b have been declared and initialized. The string literals "aaa", "bbb", "cc", and "dd" all reside somewhere in memory such that they exist over the lifetime of the program. They are stored as arrays of char (const char in C++). Attempting to modify the contents of a string literal (in the case of this hypothetical, attempting to write to any memory location starting with 0x0004) invokes undefined behavior. Some platforms store string literals in read-only memory, some store them in writable memory, but in all cases, they should be treated as though they are unwritable.
The object a is an array of char, and it has been initialized with the contents of the string literal "aaa". The object b is a pointer to char, and it has been initialized with the address of the string literal "bbb". In the line
strcpy(a, "cc");
you're copying the contents of the string literal "cc" to a; after the line is executed, your memory map looks like this:
Item Address 0x00 0x01 0x02 0x03
---- ------- ---- ---- ---- ----
"aaa" 0x00040000 'a' 'a' 'a' 0x00
"bbb" 0x00040004 'b' 'b' 'b' 0x00
"cc" 0x00040008 'c' 'c' 0x00 ???
"dd" 0x0004000C 'd' 'd' 0x00 ???
...
a 0x08000000 'c' 'c' 0x00 0x00
b 0x08000004 0x00 0x04 0x00 0x04
So when you print a to standard output, you should see the string cc. Note: printf is buffered, so it's possible that output may not be written to the terminal immediately - either add a newline character to the format string (printf("%s\n", a);) or call fflush(stdout); after the printf to make sure all your output shows up.
In line 9, you attempt to copy the contents of the string literal "dd" to the location pointed to by b; unfortunately, b points to another string literal, and modifying a string literal, as mentioned above, invokes undefined behavior. At this point, your program could literally do anything from run as expected to crash outright to anything in between. This could be part of the reason you only see the output for cc.
You have several arrays in your program. Some of them are modifiable, others are not. So, your original question ("Are arrays const datatypes?") is not really answerable in any meaningful way.
String literals (like "aaa") are arrays, but they are not modifiable. Note that in C language string literals are not really const (For example array "aaa", has type char[4], not const char[4]). However, it is still explicitly prohibited by the language to attempt to modify string literals. The compiler is not required to catch such attempts. Also, no run-time error is guaranteed to happen when you make such an attempt. The behavior is simply undefined. When you do strcpy(b,"dd"), you attempt to modify a non-modifiable array - the behavior is undefined. Anything can happen.
As for the ordinary array a in your code sample, it is declared as modifiable. So, you can modify it as much as you want. When you do strcpy(a, "cc"), you copy the string "cc" into your array a. That is exactly what you observe in your experiment when you print the content of a. Nothing unusual here.
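One way to make both copies well defined is to give b writable storage to point at. The sketch below is a reworking of the question's program under that assumption; the function name fixed_program, the extra array storage, and the output parameters are inventions for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical rework of the question's program: every strcpy writes
   into an array. The results are handed back through two 4-byte buffers. */
void fixed_program(char result_a[4], char result_b[4])
{
    char a[] = "aaa";
    char storage[] = "bbb";   /* writable array instead of the bare literal */
    char *b = storage;        /* b now points at memory the function owns */

    strcpy(a, "cc");          /* fits: 3 bytes into a 4-byte array */
    strcpy(b, "dd");          /* fine now: writes into storage, not a literal */

    assert(strcmp(storage, "dd") == 0);

    memcpy(result_a, a, sizeof a);
    memcpy(result_b, storage, sizeof storage);
}
```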
char* != char[]!
In this case, a is a buffer in a writable area which is first filled with the content of a literal string (itself stored in a read-only area), while b is a pointer which directly points to a read-only area! Here is some sample code to help you understand:
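#include <stdio.h>
#include <string.h>
#define literal "test"
int main() {
char a[] = literal; /* array: its own writable copy of the text */
char b[] = literal; /* array: another, separate copy */
char* c = literal; /* pointer to the literal itself */
char* d = literal; /* pointer to the literal, usually the same one as c */
printf("%s (%p)\n", a, (void *)a); /* %p expects a void * argument */
printf("%s (%p)\n", b, (void *)b);
printf("%s (%p)\n", c, (void *)c);
printf("%s (%p)\n", d, (void *)d);
return 42;
}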
By executing this code, you'll see that even though all 4 printed strings are the same, a and b occupy different areas in memory, while c and d typically point to one and the same third address (compilers are allowed, but not required, to share identical literals). In addition, you should see a big difference in address range between the first two and the last two: here, the location in different areas of memory (read/write and read-only) is made obvious.
Edit: just to insist, a fifth printf (printf("%s (%p)\n", literal, literal);) would typically print the same as the c and d lines.
Arrays are not const datatypes, but literals are. The string literals "aaa" and "bbb" cannot be modified (or they can, but the result is undefined).
As pointed out by Paul R, char a[] = "aaa"; is fine because it will use the string literal to initialize an array (which, as I mentioned, isn't a const datatype). The problem here is char *b = "bbb";, since you later try to modify the contents of the string literal itself, and not a copy.
b points to the string bbb, which is stored in read-only memory and cannot be changed by strcpy() or anything else.
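A common precaution, shown here only as a sketch, is to declare pointers to literals as const char * so that the compiler has to issue a diagnostic for a write attempt. The function name literal_length is made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reads a literal through a pointer to const char. */
size_t literal_length(void)
{
    const char *b = "bbb";     /* reading through b is fine */

    /* strcpy(b, "dd");           the compiler must now issue a diagnostic:
                                  b points to const char, strcpy wants char * */
    assert(b[3] == '\0');
    return strlen(b);
}
```

Reading through b works as before; only the accidental write is caught, at compile time rather than as undefined behavior at run time.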
