Strange behavior of 'char' pointers - c

When I declare and run the following it gives me a segmentation fault.
main()
{
char *p = "boa";
*(p+1) = 'y';
printf("%s",p);
}
I suspect char *p is a constant, etc.
But the following works fine.
main()
{
int i = 300;
char *p = (char*)&i;
*(p+1) = 'y';
printf("%s",p);
}
What is the reason behind this? Doesn't the above rule apply to this as well?

That depends on your definition of "works fine". The assignment doesn't cause a segmentation fault because p was made to point to the address of the variable i, which is clearly not a constant. i was initialized with a constant value, but i itself is not a constant.
For i = 300 (assuming little endian x86):
+--+--+--+--+
i:|2c|01|00|00|
+--+--+--+--+
.
/|\
|
p:&i
After *(p+1) = 'y'
+--+--+--+--+
i:|2c|79|00|00|
+--+--+--+--+
.
/|\
|
p:&i
So the printf call just happens to print ,y for you, but only because of the platform's byte ordering, because 2c is a printable ASCII character (a comma), and because the zero byte that follows acts as the string terminator. Things could have been different on a big-endian machine, and/or if the bytes were not printable ASCII.
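To check the layout on your own machine without depending on printf("%s") finding a zero byte, you can dump the bytes one at a time. This is only a minimal sketch: dump_bytes is a name made up for this example, and it assumes the caller supplies a large enough buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): writes the bytes of an int
   as two-digit hex pairs into out, lowest address first.
   out must have room for 2 * sizeof(int) + 1 characters. */
void dump_bytes(int value, char *out)
{
    unsigned char bytes[sizeof value];

    assert(out != NULL);
    memcpy(bytes, &value, sizeof value);   /* copy the object representation */
    for (size_t k = 0; k < sizeof value; k++)
        sprintf(out + 2 * k, "%02x", (unsigned)bytes[k]);
}
```

With a 4-byte int, dump_bytes(300, buf) should leave 2c010000 in buf on a little-endian machine and 0000012c on a big-endian one.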

Oh, boy...
The first one seg-faults because the string literal typically lives in read-only memory and must not be modified (you've got that right). The second one, however, is a fascinating abuse of the pointer semantics! ;-)
Here's what you are doing in the second example:
1. Have an int variable with some value (in your case, 300)
2. Take the address of that int, which is the location that holds a (32-bit?) int of 300, and cast it to a char*, so that it now addresses individual 8-bit values
3. Take the address of the "first" 8-bit value, increment it by one (that is, move on by 8 bits(!)) and change the value of those 8 bits to the numeric ASCII code of 'y'
4. Print the "resulting string"

The difference is this:
char *p="boa";
p is a pointer. You are making p point at the string literal "boa". Modifying a string literal is undefined behavior, and on most platforms the literal is placed in read-only memory, so when you try to modify it a segfault occurs.
int i=300;
char *p=(char*)&i;
i is a variable of type int. You only use the constant 300 to initialize i, which copies the value 300 into the storage of i; you are never pointing at the constant itself, just using it as an initializer. This is the difference: p in your first example points at a string literal, whereas in your second example it points at a variable of type int. Hence modifying the storage of i later on through the pointer p is fine, because you are modifying a non-constant object i.
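If the goal in the first example was simply to change one letter, the usual fix is to initialize an array from the literal instead of pointing at the literal. A minimal sketch, where patch_copy is a name invented for this example:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: builds a writable copy of "boa", changes one
   letter in the copy, and hands the result to the caller's buffer. */
void patch_copy(char *dst, size_t dst_size)
{
    char local[] = "boa";               /* array initialized FROM the literal: writable */

    assert(dst != NULL && dst_size >= sizeof local);
    local[1] = 'y';                     /* fine: modifies the array, not the literal */
    memcpy(dst, local, sizeof local);   /* includes the terminating '\0' */
}
```

After patch_copy(out, sizeof out) with a sufficiently large out, the buffer holds "bya" and no literal has been touched.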

Related

Difference between ptr, ptr[0] and &ptr[0] in C strings

I have been studying strings in C, and have been faced with the following problem, in the following code:
#include <stdio.h>
int main()
{
char *p = "foo";
printf("%p\t%p\t%p",&p[0],p,p[0]);
return 0;
}
And I get the following output:
00403024 00403024 00000066
Process returned 0 (0x0) execution time : 0.057 s
Press any key to continue.
Since p points to the first element of the string, shouldn't p[0] point to the same address as p (and by consequence, &p[0])?
p[0] is not a pointer, it's a char.
Since you're asking for %p in your format string, the char's value gets treated as a pointer (strictly speaking this is undefined behavior), which gives the bogus pointer value 0x00000066. That is just the ASCII value of f, the first character in the string.
If you turn on all the warnings your compiler offers you may see one that highlights this conversion and how it's a potential error.
p is of type char*. &p[0] is like &(*(p + 0)) which is to say you de-reference it to a char, then turn it back into a pointer with &. The end result is the same as the original.
You got it almost right. A pointer just stores the address of the pointee that it points to. So outputting p outputs the address of the string.
So what does [x] do when applied to such an address p? It does *(p+x); that is, it evaluates to the value that is stored at this address plus x. So in the case of p[0], you get the ASCII value of the char 'f', which is 0x66 (102 in decimal).
Taking the address of that again by prefixing with & gives you the address where it is stored. This is just the address of the original string, of course, because as you observed, the address of the string coincides with the address of its first element.
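For comparison, here is a sketch of how the three values could be printed with matching format specifiers. The function name show_pointer_and_char is made up for this example; the casts to void * are there because %p expects a void *.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints p and &p[0] with %p, and p[0] both as a
   character and as its numeric code.
   Returns 1 if p and &p[0] compare equal, 0 otherwise. */
int show_pointer_and_char(const char *p)
{
    assert(p != NULL);
    printf("%p\t%p\t%c (%d)\n", (void *)p, (void *)&p[0], p[0], p[0]);
    return p == &p[0];
}
```

Called with "foo" on an ASCII system, the last field should read f (102), and 102 is the 0x66 that appeared in the question's output.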

Type casting the character pointer

I come from a Java background. I am learning C, and I came across a code snippet for type conversion from int to char.
int a=5;
int *p;
p=&a;
char *a0;
a0=(char* )p;
My question is: why do we use (char *)p instead of (char)p?
We are only casting the 4-byte memory (integer) to 1 byte (character), and not the value related to it.
You need to consider pointers as variable that contains addresses. Their sole purpose is to show you where to look in the memory.
so consider this:
int a = 65;
void* addr = &a;
Now 'addr' contains the address of the memory where 'a' is located.
What you do with it is up to you.
Here I decided to "see" that part of memory as an ASCII character that you could print, to display the character 'A' (this relies on a little-endian machine, where the low-order byte of 'a' comes first):
char* car_A = (char*)addr;
putchar(*car_A); // print: A (ASCII code for 'A' is 65)
if instead you decide to do what you suggested:
char* a0 = (char)addr;
The right-hand side of the assignment, (char)addr, casts the pointer 'addr' (likely 4 or 8 bytes) to a char (1 byte).
The result, a truncated address, is then stored as the address held by the pointer 'a0' on the left-hand side.
If you don't see why it doesn't make sense, let me clarify with a concrete example.
Say the address of 'a' is 0x002F4A0E (assuming pointers are stored in 4 bytes); then
'a' is equal to 65
'addr' is equal to 0x002F4A0E
When cast like so, (char)addr, this becomes equal to 0x0E.
So the line
char* a0 = (char)addr;
becomes
char* a0 = 0x0E;
So 'a0' will end up pointing to the address 0x0000000E, and we don't know what is at this location.
I hope this clarifies your problem.
First of all, p is not necessarily 4 bytes since it's architecture-dependent. Second, p is a pointer to an integer, a0 is a pointer to a character, not a character. You're taking a pointer pointing to an integer and casting it to a pointer to a character. There are few good reasons to do this. You could also cast the value to a character, but I can't imagine any reason for doing this either.
Pointers do not provide information about whether they point to a single object or to the first object of an array.
Consider
int *p;
int a[5] = { 1, 2, 3, 4, 5 };
int x = 1;
p = a;
p = &x;
So given only the value of the pointer p, you cannot say whether it is the address of the first element of the array a or the address of the single object x.
It is your responsibility to interpret the address correctly.
In this expression-statement
a0=(char* )p;
the address of the extent of memory pointed to by the pointer p and occupied by an object of the type int (it is unknown whether it is a single object or the first object of an array) is interpreted as an address of an extent of memory occupied by an object of the type char. Whether it is a single object of the type char or the first object of a character array with the size equal to sizeof( int ) depends on your intention, that is, on how you are going to deal with the pointer.
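To make the "array of sizeof( int ) bytes" interpretation concrete, here is a small sketch that walks over the bytes of an int through a character pointer. count_bytes_equal is a name invented for this example, and unsigned char is used (rather than plain char) so that byte values are never negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how many bytes of *p have the value v,
   by viewing the int through a pointer to unsigned char. */
size_t count_bytes_equal(const int *p, unsigned char v)
{
    assert(p != NULL);

    /* Same address as p, but now each element is one byte wide. */
    const unsigned char *bytes = (const unsigned char *)p;
    size_t hits = 0;

    for (size_t k = 0; k < sizeof *p; k++)
        if (bytes[k] == v)
            hits++;
    return hits;
}
```

For int a = 5, exactly one byte holds 5 and the others hold 0 on typical implementations; which position the 5 sits at depends on the byte order of the machine.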

In C, why can't an integer value be assigned to an int* the same way a string value can be assigned to a char*?

I've been looking through the site but haven't found an answer to this one yet.
It is easiest (for me at least) to explain this question with an example.
I don't understand why this is valid:
#include <stdio.h>
int main(int argc, char* argv[])
{
char *mystr = "hello";
}
But this produces a compiler warning ("initialization makes pointer from integer without a cast"):
#include <stdio.h>
int main(int argc, char* argv[])
{
int *myint = 5;
}
My understanding of the first program is that creates a variable called mystr of type pointer-to-char, the value of which is the address of the first char ('h') of the string literal "hello". In other words with this initialization you not only get the pointer, but also define the object ("hello" in this case) which the pointer points to.
Why, then, does int *myint = 5; seemingly not achieve something analogous to this, i.e. create a variable called myint of type pointer-to-int, the value of which is the address of the value '5'? Why doesn't this initialization both give me the pointer and also define the object which the pointer points to?
In fact, you can do so using a compound literal, a feature added to the language by the 1999 ISO C standard.
A string literal is of type char[N], where N is the length of the string plus 1. Like any array expression, it's implicitly converted, in most but not all contexts, to a pointer to the array's first element. So this:
char *mystr = "hello";
assigns to the pointer mystr the address of the initial element of an array whose contents are "hello" (followed by a terminating '\0' null character).
Incidentally, it's safer to write:
const char *mystr = "hello";
There are no such implicit conversions for integers -- but you can do this:
int *ptr = &(int){42};
(int){42} is a compound literal, which creates an anonymous int object initialized to 42; & takes the address of that object.
But be careful: The array created by a string literal always has static storage duration, but the object created by a compound literal can have either static or automatic storage duration, depending on where it appears. That means that if the value of ptr is returned from a function, the object with the value 42 will cease to exist while the pointer still points to it.
As for:
int *myint = 5;
that attempts to assign the value 5 to an object of type int*. (Strictly speaking it's an initialization rather than an assignment, but the effect is the same). Since there's no implicit conversion from int to int* (other than the special case of 0 being treated as a null pointer constant), this is invalid.
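A short sketch of the compound literal form in use, assuming a C99 or later compiler. The function name sum_via_compound_literals is made up for this example.

```c
#include <assert.h>

/* Hypothetical demo: takes the addresses of two compound literals.
   Inside a function both unnamed int objects have automatic storage
   duration, so the pointers are valid only until the function returns. */
int sum_via_compound_literals(void)
{
    int *first = &(int){42};    /* pointer to an unnamed int holding 42 */
    int *second = &(int){8};    /* a second, distinct unnamed int */

    *second += 2;               /* the unnamed object is modifiable */
    assert(first != second);
    return *first + *second;    /* safe: both objects are still alive here */
}
```

The function returns the values rather than the pointers on purpose: returning first or second would hand the caller a dangling pointer.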
When you do char* mystr = "foo";, the compiler will create the string "foo" in your executable, typically in a special read-only portion, and effectively rewrite the statement as char* mystr = address_of_that_string;
The same is not implemented for any other type, including integers. If the compiler accepts int* myint = 5; at all (usually with a warning), it will set myint to point to address 5.
I'll split my answer into two parts:
1st, why char* str = "hello"; is valid:
char* str declares space for a pointer (a number that represents a memory address on the current architecture).
When you write "hello", the compiler sets aside 6 bytes of data for it, typically in static, read-only storage rather than on the stack
(don't forget the null terminator), let's say at addresses 0x1000 - 0x1005.
str = "hello" assigns the start address of those 6 bytes (0x1000) to str.
So what we have is:
1. str, which takes 4 bytes in memory on a 32-bit system, holds the number 0x1000 (it points to the first char only!)
2. 6 bytes: 'h' 'e' 'l' 'l' 'o' '\0'
2nd, why int* ptr = 0x105A4DD9; isn't valid:
Well, this is not entirely true!
As said before, a pointer is a number that represents an address,
so why can't I assign that number?
It is not common, because mostly you obtain addresses from existing data rather than entering an address manually.
But you can if you need to.
Because it isn't something that is commonly done,
the compiler wants to make sure you are doing it on purpose and not by mistake, so it makes you cast the value explicitly:
int* ptr = (int*)0x105A4DD9;
(used mostly for memory-mapped hardware resources)
Hope this clears things up.
Cheers
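A hard-coded address such as 0x105A4DD9 cannot be dereferenced safely on an ordinary hosted system, so the sketch below uses the address of a real object instead, only to show that a pointer can be turned into a number and back with explicit casts. It assumes the implementation provides uintptr_t (an optional type in <stdint.h>); the name round_trip is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demo: converts a pointer to an integer and back.
   Both directions need an explicit cast, just like the
   (int*)0x105A4DD9 form in the answer above. */
int *round_trip(int *p)
{
    uintptr_t raw = (uintptr_t)(void *)p;   /* the address as a plain number */
    void *back = (void *)raw;               /* the number as a pointer again */

    assert(back == (void *)p);              /* void * round trips compare equal */
    return back;
}
```

Dereferencing the returned pointer is fine here only because it refers to the same live object that p did.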
"In C, why can't an integer value be assigned to an int* the same way a string value can be assigned to a char*?"
Because it's not even a similar situation, let alone "the same way".
A string literal is an array of chars which – being an array – can be implicitly converted to a pointer to its first element. Said pointer is a char *.
But an int is not either a pointer in itself, nor an array, nor anything else implicitly convertible to a pointer. These two scenarios just don't have anything in common.
The problem is that you are trying to assign the address 5 to the pointer. Here you are not dereferencing the pointer, you are declaring it as a pointer and initializing it to the value 5 (as an address which surely is not what you intend to do). You could do the following.
#include <stdio.h>
int main(int argc, char* argv[])
{
int *myint, b;
b = 5;
myint = &b;
}

Character Pointers in C

#include <stdio.h>
int main(void){
char *p = "Hello";
p = "Bye"; //Why is this valid C code? Why no dereferencing operator?
int *z;
int x;
*z = x;
*z = 2; //Works
z = 2; //Doesn't Work, Why does it work with characters?
char *str[2] = {"Hello","Good Bye"};
printf("%s", str[1]); //Prints Good Bye. WHY no dereferencing operator?
// Why is this valid C code? If I created an array with pointers
// shouldn't the element print the memory address and not the string?
return 0;
}
My questions are outlined in the comments. In general I'm having trouble understanding character arrays and pointers. Specifically, why I can access them without the dereferencing operator.
In general I'm having trouble understanding character arrays and pointers.
This is very common for beginning C programmers. I had the same confusion back about 1985.
p = "Bye";
Since p is declared to be char*, p is simply a variable that contains a memory address of a char. The assignment above sets the value of p to be the address of the first char of the constant string "Bye", in other words the address of the letter "B".
z = 2
z is declared to be int*, so the only thing you can assign to it is the memory address of an int. You can't assign 2 to z, because 2 isn't the address of an int, it's a constant integer value.
printf("%s", str[1]);
In this case, str is defined to be an array of two char* variables. In your print statement, you're printing the second of those, which is the address of the first character in the string "Good Bye".
When you type "Bye", you are actually creating what is called a string literal. It's a special case, but essentially, when you do
p = "Bye";
what you are doing is assigning the address of this string literal to p (the string itself is stored by the compiler in an implementation-dependent way). Technically it is the address of the first element of a char array, as Richard J. Ross III explains.
Since it is a special case, it does not work with other types.
By the way, some compilers can warn about lines like char *p = "Hello"; (GCC does with -Wwrite-strings, for example). It is better to declare them as const char *p = "Hello"; since modifying a string literal is undefined behavior.
As to the printing code.
printf("%s", str[1]);
This doesn't need a dereferencing operation, since %s expects a pointer (specifically a char *) to be passed, and the dereferencing is done inside printf. You can test this by passing a plain value where printf is expecting a pointer. You will most likely get a runtime crash when printf tries to dereference it.
p = "Bye";
Is an assignment of the address of the literal to the pointer.
The
array[n]
operator works like a dereference of the pointer "array" increased by n: array[n] is defined as *(array + n).
Remember that "Hello" and "Bye" are arrays of char that are converted to char * when used like this; they are not single chars.
So the line p = "Bye"; means that the pointer p now points to the first character of "Bye".
But in the next case, with int *,
*z = 2 means that
the `int` pointed to by `z` is assigned a value of 2
while z = 2 tries to make the pointer z itself hold the value 2. But 2 is an int, not a pointer to int, so the compiler flags it.
You're confusing something: It does work with characters just as it works with integers et cetera.
What it doesn't work with are strings, because they are character arrays and arrays can only be stored in a variable using the address of their first element.
Later on, you've created an array of character pointers, or an array of strings. That means very simply that the first element of that array is a string, the second is also a string. When it comes to the printing part, you're using the second element of the array. So, unsurprisingly, the second string is printed.
If you look at it this way, you'll see that the syntax is consistent.
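A small sketch of the array-of-pointers case, to show what each level of indexing yields. The function name nth_string is invented for this example; the table contents are the two strings from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the n-th entry of a table of two string
   pointers. Each element of the table is itself a char pointer, which
   is exactly what printf's %s wants to receive. */
const char *nth_string(size_t n)
{
    static const char *table[2] = { "Hello", "Good Bye" };

    assert(n < 2);
    assert(table[n] == *(table + n));   /* subscripting is pointer arithmetic plus dereference */
    return *(table + n);
}
```

nth_string(1) is a pointer to the 'G' of "Good Bye"; applying one more level of indexing, as in nth_string(1)[0] or *nth_string(1), gives the single char 'G'.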

Dereferencing and typecasting

I've constructed the following sections of code to help myself understand pointer dereferencing and typecasting in C.
char a = 'a';
char * b = &a;
int i = (int) *b;
For the above, I understand that on the 3rd line, I've dereferenced b and got 'a' and (int) will typecast the value of 'a' to its corresponding value of 97 which is stored into i. But for this section of code:
char a = 'a';
char * b = &a;
int i = *(int *)b;
This results in i being some arbitrary large number like 792351. I'm assuming this is a memory address but my question is why? When I typecast b to an integer pointer, does this actually cause b to point to a different area in memory? What is going on?
EDIT: If the above doesn't work, then why would something like this work:
char a = 'a';
void * b = &a;
char c = *(char *)b;
This correctly assigns 'a' to c.
Your int is larger than your char - you get the 'a' value + some random data following it in memory.
E.g, assuming this layout in memory:
'a'
0xFF
0xFF
0xFF
Your char * and int * both point to the 'a'. When you dereference the char *, you get only the first byte, the 'a'. When you dereference the int * (assuming your int is 32-bit) you get the 'a' and the 3 bytes of uninitialized data following it.
EDIT: In response to updated question:
In char c = *(char *)b;, b still points at the 'a' value. You cast it to a char *, and then dereference it, getting the char pointed to by a char *
The last line you're concerned about does a very bad thing. First, it treats b as an int* whereas b is a char*. That is, the memory pointed to by b is assumed to be 4 bytes (typically) instead of 1 byte. So when you dereference it, it goes to the 1 byte pointed to by the actual b, takes the following 3 bytes too, treats those 4 bytes as a single int, and gives you the result. That's why it's garbage.
In general, casting one pointer type to another pointer type must be done with great caution.
You're casting a char pointer to an int pointer. Characters are (usually) stored as 8 bits. ints, on the other hand, are usually 32 bits, even on most 64-bit systems. So when you read an int from that address, you also pick up the other 24 bits of memory next to the 8 bits that b points to, and those bits were never initialized. Even the position of *b within i is architecture dependent. Writing the value of i with the most significant byte on the left:
little-endian: **** ****|**** ****|**** ****|0110 0001
big-endian: 0110 0001|**** ****|**** ****|**** ****
When you read the character's storage through the int pointer, all the asterisks become relevant.
Since a char is 1 byte long and an int is typically 4, when you read an int from the address of a single character, you're reading the character and 3 more bytes. The content of these bytes is just whatever happens to lie in memory (pointers, the value of b), and the memory could even be unallocated (resulting in a segmentation fault).
When you typecast it to an (int *) type, it will refer to a total of 4 bytes (the size of int) in memory.
In the second case, you're treating the same address as if it pointed to an int. Officially, the result is simply undefined behavior.
Realistically, what happens is that whatever happens to be in the four [1] bytes starting at that address gets interpreted as an int.
[1] 4 bytes assuming a 32-bit int -- if your implementation has, for example, a 64-bit int, it'll be 8 bytes.
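To contrast the two operations from the question, here is a sketch with one function that converts the value of a char and one that reads an int from storage that really does contain sizeof(int) bytes. Both function names are invented for this example, and memcpy is used here in place of the *(int *) cast because it sidesteps alignment and aliasing concerns.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: converts the VALUE of the char to int.
   Well defined; 'a' becomes 97 on an ASCII system. */
int widen(const char *b)
{
    assert(b != NULL);
    return (int)*b;
}

/* Hypothetical helper: reads an int out of a buffer.
   The caller must guarantee that buf holds at least sizeof(int) bytes,
   which a single char object does not. */
int read_int(const unsigned char *buf)
{
    int value;

    assert(buf != NULL);
    memcpy(&value, buf, sizeof value);   /* copy exactly sizeof(int) bytes */
    return value;
}
```

Passing the address of a lone char to read_int would be the same mistake as in the question: there is only one byte of valid data there, so the remaining bytes would come from whatever lies next to it.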

Resources