I am doing some serial communication in C on Linux, using file descriptors. After char* s = "Hello world", I can write s to the serial port with the write function, no problem; I am using a serial monitor program to check the other end. However, I cannot send any other sort of data: I get a "Bad address" error from write.
However, I noticed that if I did something very strange, int* x = "5";, I could then send this x. My question is: what in the world does int* x = "5" mean?
int* x = "5";
This is not valid C code. You have to cast the value of the array to int *, and even then dereferencing the pointer can break alignment rules and is undefined behavior.
int *x = (int *) "5";
In this last line, "5" is an unnamed array object of type char [2]. Its value is a pointer to its first element, of type char *. The cast converts that char * to an int *, which is then stored in x.
int* x = "5";
is a constraint violation. That means that any conforming compiler must issue a diagnostic for it. It needn't be treated as a fatal error; a compiler is allowed to issue a warning and then successfully translate the program. But the language does not define the behavior of this declaration.
There is no implicit conversion from char* (the type of "5" after it decays) to int*.
This is as close as C gets to saying that something is illegal.
In practice, compilers that accept this declaration will probably treat it as equivalent to:
int *x = (int*)"5";
i.e., they'll insert a conversion. (This isn't the only possible interpretation, but most compilers will either interpret it this way or reject it.) This takes the char* value that results from the decay of the array expression "5" (i.e., the address of the '5' character at the beginning of the string), and converts to int*.
The resulting int* pointer points to an int object that may or may not be valid. The string "5" is two bytes long ({ '5', '\0' }). If int is two bytes, *x may evaluate to the result of interpreting those two bytes as an int value -- which will depend on the system's endianness. Or, if the string literal isn't correctly aligned for an int object, evaluating *x might terminate your program. And if int is wider than two bytes (as it very commonly is), *x refers to memory past the end of the string literal. In any case, attempting to modify *x has yet another kind of undefined behavior, since attempting to modify a string literal is explicitly undefined.
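To look at the two bytes of the literal without dereferencing a possibly misaligned or too-wide int*, one option is to copy exactly those bytes into an integer of matching width. This is a minimal sketch, assuming ASCII and an implementation that provides uint16_t; literal_bytes_as_u16 is a hypothetical helper name, not something from the question.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the first two bytes of s as a uint16_t.
 * The caller must ensure s points to at least two readable bytes. */
uint16_t literal_bytes_as_u16(const char *s)
{
    uint16_t value;
    /* memcpy has no alignment requirement and reads only sizeof value bytes */
    memcpy(&value, s, sizeof value);
    return value;
}
```

Under those assumptions, literal_bytes_as_u16("5") gives 0x0035 on a little-endian system and 0x3500 on a big-endian one, which is the endianness dependence described above.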
You should have gotten at least a warning when you compiled that declaration. If so, you definitely should not have ignored it. If you didn't get a warning, you should find out how to coax your compiler to produce more warnings.
TL;DR: Don't do that.
int* x = "5" converts "5" (a char array that decays to char* in C) to an int* and stores it in x, on compilers that accept it. Thus, x will point to sizeof(int) bytes of which the first is 0x35 (the character '5' in ASCII), the next is 0, and the rest lie past the end of the string literal, so reading them is undefined behavior.
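For the underlying goal of sending something other than a string, write needs the address of the data and its size in bytes. Below is a minimal sketch, assuming a POSIX system and an already opened descriptor; send_bytes is a hypothetical helper, and it does not retry on EINTR.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes starting at buf to fd.
 * Returns 0 on success, -1 if write() fails or makes no progress. */
int send_bytes(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;          /* on failure, errno is set by write() */
        p += (size_t)n;         /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```

With this helper, sending an int looks like send_bytes(fd, &value, sizeof value): the second argument is the address of the object, not its value. Passing a value where write expects a pointer is one plausible source of a "Bad address" (EFAULT) error. The receiver sees the raw bytes in the sender's byte order.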
I am trying to understand how the conversion of a char array to a struct type works. I have done the following:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
int a;
int b;
int c;
} test;
int main()
{
char data[20];
strcpy(data, "text");
test *ptr = (test*)data;
return 0;
}
To try and understand what is happening I have added the following lines:
If I add the line "printf("%s", ptr)", despite the fact that there is a warning, the program output is "text".
Next, if before that line I initialize a field, say ptr->a = 1, then the output of the previous printf would be some odd characters.
I guess that after the conversion, the memory pointed to by the -data- pointer is expanded to hold the struct fields. My problem is trying to access the data after the conversion.
So, my first question is what is happening in the memory when the above conversion is taking place?
Also how can I retrieve back the original data from the -ptr- pointer?
First, do not do this.
Second, when you access an object, you do so with a type that tells the compiler (or other C implementation) how to interpret the bytes in memory. For example, if x is declared with int x;, then, when you use x in an expression such as 3*x + 4, it tells the compiler to read the bytes of x from memory and interpret them as an int.
In test *ptr = (test *) data;, you tell the compiler to change the pointer to data[0] (because data is automatically converted to the address of its first element, &data[0]) to a pointer to a test structure. If this works (see below), then ptr points to the same bytes in memory, but, when you use ptr in an expression such as *ptr or ptr->a, you are telling the compiler to interpret the bytes as if they were a test structure (and ptr->a tells the compiler to go into the structure, get the bytes for member a, and interpret them as if they were the bytes for an int). The bytes in memory do not change. All that changes is how the compiler interprets them. We will look at how that works below. First, let’s see three reasons why you should not do this.
One, when you convert a pointer of type char * to a pointer of type test *, the C standard only guarantees that will work if the alignment is correct. Alignment is a restriction on the addresses where an object can start in memory. An array of char can start anywhere, so your data array could have any address. But, in many C implementations, an int must start on a multiple of four bytes, and this will force a test structure to have at least that alignment requirement. This means that, if data does not start on a multiple of four bytes, the C standard does not guarantee that (test *) data will produce a meaningful result or that it will not trap.
Two, although C guarantees the conversion will produce a result with some meaning if the alignment is okay, the only thing it guarantees about that result is that it can be converted back to the original type and used to access the data with that original type. It does not guarantee that the resulting pointer, of type test *, will behave like a pointer that points to the same place in memory. (This is the rule for pointer conversions in general. There are some specific conversions that have further guarantees. For example, any pointer to an object can be converted to a pointer to char, and the result is guaranteed to point to the first byte of the object.)
Three, C only guarantees that accessing objects will work if it is done through certain types. If an object is defined with one type, such as an array of char, and is accessed through another type, such as an int, the C standard does not guarantee that the program will work at all. Largely, objects can only be accessed as their original type or related compatible types, but there are some exceptions. One exception is that the bytes of any object can be accessed through a character type. (So you can go from int to char, but not from char to int.)
So, if you want to explore what happens when you reinterpret the bytes of data as if they were a test, how should you do it? A proper way is to copy the bytes into a test object, which can be done like this:
test x;
memcpy(&x, data, sizeof x);
Then you can print x.a, x.b, and x.c and see what the values are.
If your C implementation uses four-byte int, as many do, then x.a will contain the bytes from the string that was copied in. Those will be the bytes with the character codes for “t”, “e”, “x”, and “t”. The value you get for x.a will depend on what those codes are (many C implementations use ASCII codes) and the order the C implementation uses for the bytes in an int.
Assuming your C implementation does not insert any padding between members a and b, which is likely, then the first byte of x.b will be zero. However, the remaining bytes in b and the bytes in c will be indeterminate, because they have been copied from bytes in data that were never given any values. “Indeterminate” is a special word in the C standard that means the bytes might not hold fixed values at all; they might appear to vary each time you access them. In practice, C implementations will commonly use whatever values happened to be in memory at the place that was chosen for the array data. However, aggressive optimization by a compiler can produce other results.
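Here is the memcpy approach as a complete function, as a rough sketch. The struct mirrors the question's test type but is named record here, and bytes_to_record is a hypothetical helper name.

```c
#include <string.h>

/* Mirrors the question's struct; the name record is only for this sketch. */
typedef struct {
    int a;
    int b;
    int c;
} record;

/* Copy the first sizeof(record) bytes of data into a record object.
 * The caller must provide at least sizeof(record) readable bytes. */
record bytes_to_record(const char *data)
{
    record r;
    memcpy(&r, data, sizeof r);   /* copies bytes; no pointer cast involved */
    return r;
}
```

If the array is zero-initialized first, for example char data[20] = {0}; followed by strcpy(data, "text");, no indeterminate bytes are copied. With four-byte int and no padding, r.b and r.c are then 0 and only r.a depends on the character codes and byte order.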
Also how can I retrieve back the original data from the -ptr- pointer?
You can convert the pointer back:
char *p = (char *) ptr;
Then p may be used to access the bytes as their original char type, with p[0], p[1], and so on.
When I run the following code it gives a segmentation fault:
#include <stdio.h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
unsigned int hacky_nonpointer;
hacky_nonpointer = (unsigned int) char_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
printf("[hacky_nonpointer] points to %p, which contains the char '%c'\n",
hacky_nonpointer, *((char *) hacky_nonpointer));
hacky_nonpointer = hacky_nonpointer + sizeof(char);
}
hacky_nonpointer = (unsigned int) int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
printf("[hacky_nonpointer] points to %p, which contains the integer %d\n",
hacky_nonpointer, *((int *) hacky_nonpointer));
hacky_nonpointer = hacky_nonpointer + sizeof(int);
}
}
I was actually trying to do a typecast example. How can I resolve the segmentation fault?
My guess is that you're on a 64-bit machine, where pointers are 64 bits. That will lead to big problems (and undefined behavior) when you do
hacky_nonpointer = (unsigned int) char_array;
as the type int is typically still only 32 bits.
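One way to check this guess on a given machine is to test whether an address survives being squeezed into an unsigned int. A small sketch, assuming uintptr_t is available; survives_unsigned_int is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical check: does the address survive a trip through unsigned int?
 * Returns 1 if no bits are lost, 0 if the value is truncated. */
int survives_unsigned_int(const void *p)
{
    uintptr_t full = (uintptr_t)p;
    unsigned int narrow = (unsigned int)full;   /* may drop the high bits */
    return (uintptr_t)narrow == full;
}
```

On a typical 64-bit Linux system, the address of a local array usually has bits set above the low 32, so this would return 0 for it. Converting the truncated value back to a pointer then gives an address the program does not own.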
Once you've experimented with this, throw it all away and forget all about it as well! This is bad code doing bad things that no real program should ever do.
To expand on Some_programmer_dude’s answer a bit, the safe way to store a pointer in an integral type is
#include <stdint.h>
/* ... */
uintptr_t hacky_nonpointer = (uintptr_t)(void*)p;
To convert back,
const char c = *(char*)(void*)hacky_nonpointer;
On most real-world compilers, a direct cast from any pointer type to uintptr_t will work just fine. However, the standard technically only says that any pointer can be converted to void* and back, and that any void* can be converted to uintptr_t and back.
A round-trip conversion will get you an equivalent pointer back. (See the footnote if you care about the language-lawyering details.) That is, you can convert p to a uintptr_t value and back, and you are guaranteed to get another pointer to the same object. You cannot safely increment the uintptr_t value and convert that back, but you could increment the pointer and convert the incremented pointer to uintptr_t and back. That is how you would safely do what you appear to want.
Converting to an integral type and adding 1 (or equivalently sizeof(char), which is guaranteed to be 1) is not guaranteed to give you anything meaningful. It’s possible to imagine esoteric implementations that will crash if you try to convert that value back to a pointer! However, on mainstream compilers, it will work.
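The safe variant, where the arithmetic happens on the pointer and only the finished pointer is stored in the integer, could be sketched like this. char_at_via_uintptr is a hypothetical helper name; the casts go through void* as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read element i of a char array, storing the
 * pointer in a uintptr_t before use. The arithmetic is done on the
 * pointer, never on the integer. i must be a valid index. */
char char_at_via_uintptr(char *array, size_t i)
{
    char *p = array + i;                        /* pointer arithmetic */
    uintptr_t stored = (uintptr_t)(void *)p;    /* pointer -> integer */
    char *back = (char *)(void *)stored;        /* integer -> pointer */
    return *back;
}
```

To print the address itself, printf("%p", (void *)p) is the portable form; %p expects a void* argument.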
If your compiler didn’t give you a warning about this code, you need to turn on more warnings. If it did, you shouldn’t ignore compiler warnings.
As the Dude said, though, you should never write code like that in the real world. No program should ever do anything like that or will ever need to.
Footnote
There is one extremely pedantic loophole to this: the Standard guarantees that a pointer converted to uintptr_t and back will compare equal to the original pointer, and it forbids two pointers to compare equal unless they can be used the same way. With one exception.
A pointer to the start of an array object might compare equal to a pointer one-past-the-end of a different array object. By my reading of the standard, an implementation that allowed a pointer resulting from a round-trip conversion of either kind of pointer (the beginning of an array object, or one past its end) to be used in only one of those ways could claim to be technically in compliance.
However, any real-world implementation would allow such a pointer to be used in both contexts. That the standard does not spell this out appears to be an oversight.
What is the difference between these two code samples? When I print the variable p, it prints the assigned value like below.
int *p;
p = 51;
printf("%d",p);
Output: 51
When I assign p = 51, am I making memory address "51" in RAM the pointee of the pointer p? When I add int c = 5 + p; it gives 71 as output. Why am I getting 71?
I thought that memory address "51" could hold anything: data belonging to the OS, other programs, etc. But it seems to hold precisely an int. Even if I change the value to p = 150; it still gives an int. How is that possible? What's happening under the hood?! I really don't understand.
Your code is illegal. Formally, it is not C. The C language prohibits assigning integral values to pointer types without an explicit cast (with the exception of the constant 0).
You can do
p = (int *) 51;
(with implementation-defined effects), but you cannot do
p = 51;
If your compiler allows the latter variant, it is a compiler-specific extension that has nothing to do with standard C language.
Typically, such an assignment makes p point to address 51 in memory.
On top of that, it is illegal to print pointer values with %d format specifier in printf. Either use %p or cast pointer value to proper integer type before using integer-specific format specifiers.
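Both options can be sketched briefly. The first is simply printf("%p\n", (void *)p);. For the second, uintptr_t with its matching PRIuPTR format macro is one suitable integer type; format_pointer_decimal below is a hypothetical helper that writes into a buffer rather than printing.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format a pointer as a decimal integer into buf.
 * Returns whatever snprintf returns (the length the text needs). */
int format_pointer_decimal(char *buf, size_t size, const void *p)
{
    /* Cast to uintptr_t and use its matching format macro PRIuPTR */
    return snprintf(buf, size, "%" PRIuPTR, (uintptr_t)p);
}
```

The number produced is whatever the implementation maps the pointer to, which on mainstream systems is the address.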
So you're telling that pointer that it points to address 51. Then, you tell printf to print it as a decimal integer, so it treats it as such.
The reason this appears to work is that on a 32-bit system, a pointer is 4 bytes, which matches the size of an int.
p points to a place in memory. *p is the contents of that space. But you never use the contents, only the pointer.
That pointer can be viewed as just a number, so printf("%d",p) works. When you assign a number to it, it interprets that as an offset into memory (in bytes). However, the pointer is supposed to contain ints, and when you add a number to a pointer, the pointer advances by that many spaces. So p+5 means "point to the int 5 spaces past the one you're pointing at now", which for 4-byte ints means 20 bytes later, hence the 71.
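The scaling can be measured on a real array instead of a made-up address. This is a small sketch; int_step_in_bytes is a hypothetical helper name.

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes past base does base + n lie?
 * base must point into an array with at least n elements
 * (base + n may be one past the end). */
size_t int_step_in_bytes(const int *base, size_t n)
{
    const char *start = (const char *)base;
    const char *end = (const char *)(base + n);   /* scaled by sizeof(int) */
    return (size_t)(end - start);
}
```

For an array int values[5], int_step_in_bytes(values, 5) equals 5 * sizeof(int). With 4-byte ints that is 20, and 51 + 20 is the 71 from the question.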
Otherwise, you've said you have a pointer to an int, but you're actually just doing all your stuff to the pointer, not the int it's pointing to.
If you actually put anything into the place you were pointing, you'd run into all kinds of trouble. You need to allocate some unused memory for it (e.g. with malloc), and then read and write values to that memory using *p.
I'm learning C programming in a self-taught fashion. I know that numeric pointer addresses must always be initialized, either statically or dynamically.
However, I haven't read about the compulsory need of initializing char pointer addresses yet.
For example, would this code be correct, or is a pointer address initialization needed?
char *p_message;
*p_message = "Pointer";
I'm not entirely sure what you mean by "numeric pointer" as opposed to "char pointer". In C, a char is an integer type, so it is an arithmetic type. In any case, initialization is not required for a pointer, regardless of whether or not it's a pointer to char.
Your code has the mistake of using *p_message instead of p_message to set the value of the pointer:
*p_message = "Pointer"; // Error!
This is wrong because, given that p_message is a pointer to char, *p_message should be a char, not an entire string. But as far as the need for initializing a char pointer when first declared, it's not a requirement. So this would be fine:
char *p_message;
p_message = "Pointer";
I'm guessing part of your confusion comes from the fact that this would not be legal:
char *p_message;
*p_message = 'A';
But then, that has nothing to do with whether or not the pointer was initialized correctly. Even as an initialization, this would fail:
char *p_message = 'A';
It is wrong for the same reason that int *a = 5; is wrong. So why is that wrong? Why does this work:
char *p_message;
p_message = "Pointer";
but this fail?
char *p_message;
*p_message = 'A';
It's because there is no memory allocated for the 'A'. When you have p_message = "Pointer", you are assigning p_message the address of the first character 'P' of the string literal "Pointer". String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap.
But chars, like ints, need to be allocated either on the stack or the heap. Either you need to declare a char variable so that there is memory on the stack:
char myChar;
char *pChar;
pChar = &myChar;
*pChar = 'A';
Or you need to allocate memory dynamically on the heap:
char* pChar;
pChar = malloc (1); // or pChar = malloc (sizeof (char)), but sizeof(char) is always 1
*pChar = 'A';
So in one sense char pointers are different from int or double pointers, in that they can be used to point to string literals, for which you don't have to allocate memory on the stack (statically) or heap (dynamically). I think this might have been your actual question, having to do with memory allocation rather than initialization.
If you are really asking about initialization and not memory allocation: A pointer variable is no different from any other variable with regard to initialization. Just as an uninitialized int variable will have some garbage value before it is initialized, a pointer too will have some garbage value before it is initialized. As you know, you can declare a variable:
double someVal; // no initialization, will contain garbage value
and later in the code have an assignment that sets its value:
someVal = 3.14;
Similarly, with a pointer variable, you can have something like this:
int ary [] = { 1, 2, 3, 4, 5 };
int *ptr; // no initialization, will contain garbage value
ptr = ary;
Here, ptr is not initialized to anything, but is later assigned the address of the first element of the array.
Some might say that it's always good to initialize pointers, at least to NULL, because you could inadvertently try to dereference the pointer before it gets assigned any actual (non-garbage) value, and dereferencing a garbage address might cause your program to crash, or worse, might corrupt memory. But that's not all that different from the caution to always initialize, say, int variables to zero when you declare them. If your code is mistakenly using a variable before setting its value as intended, I'm not sure it matters all that much whether that value is zero, NULL, or garbage.
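One thing a NULL initializer does allow, which a garbage value does not, is an explicit check before the dereference. A tiny sketch of that idea, with read_or_default as a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: read *p, or return fallback if p is NULL.
 * A garbage (uninitialized) pointer cannot be detected this way. */
int read_or_default(const int *p, int fallback)
{
    return (p != NULL) ? *p : fallback;
}
```

This only helps if the code actually performs the check; it does not make an unchecked dereference of NULL any safer.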
Edit. OP asks in a comment: You say that "String literals live in a different memory segment, they are considered immutable, and the memory for them doesn't need to be specifically allocated on the stack or the heap", so how does allocation occur?
That's just how the language works. In C, a string literal is an element of the language. The C11 standard specifies in §6.4.5 that when the compiler translates the source code into machine language, it should transform any sequence of characters in double quotes to a static array of char (or wchar_t if they are wide characters) and append a NUL character as the last element of the array. This array is then considered immutable. The standard says: If the program attempts to modify such an array, the behavior is undefined.
So basically, when you have a statement like:
char *p_message = "Pointer";
the standard requires that the double-quoted sequence of characters "Pointer" be implemented as a static, immutable, NUL-terminated array of char somewhere in memory. Typically implementations place such string literals in a read-only area of memory such as the text block (along with program instructions). But this is not required. The exact way in which a given implementation handles memory allocation for this array / NUL terminated sequence of char / string literal is up to the particular compiler. However, because this array exists somewhere in memory, you can have a pointer to it, so the above statement does work legally.
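Because the array behind a string literal has static storage duration, a pointer to it stays valid even after the function that mentions it has returned. Here is a short sketch of that, plus copying the text into caller-provided storage when it needs to be modified; both function names are hypothetical.

```c
#include <string.h>

/* The literal has static storage duration, so the returned pointer stays
 * valid after the function returns. It must not be written through. */
const char *message_literal(void)
{
    return "Pointer";
}

/* Hypothetical helper: copy the literal into caller-provided storage,
 * which the caller may then modify.
 * Returns 0 on success, -1 if dst is too small. */
int message_copy(char *dst, size_t size)
{
    const char *src = message_literal();
    if (strlen(src) + 1 > size)     /* +1 for the terminating NUL */
        return -1;
    strcpy(dst, src);
    return 0;
}
```

Declaring the pointer as const char * lets the compiler reject accidental writes to the literal.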
An analogy with function pointers might be useful. Just as the code for a function exists somewhere in memory as a sequence of instructions, and you can have a function pointer that points to that code, but you cannot change the function code itself, so also the string literal exists in memory as a sequence of char and you can have a char pointer that points to that string, but you cannot change the string literal itself.
The C standard specifies this behavior only for string literals, not for character constants like 'A' or integer constants like 5. Setting aside memory to hold such constants / non-string literals is the programmer's responsibility. So when the compiler comes across statements like:
char *charPtr = 'A'; // illegal!
int *intPtr = 5; // illegal!
the compiler does not know what to do with them. The programmer has not set aside such memory on the stack or the heap to hold those values. Unlike with string literals, the compiler is not going to set aside any memory for them either. So these statements are illegal.
Hopefully this is clearer. If not, please comment again and I'll try to clarify some more.
Initialisation is not needed, regardless of what type the pointer points to. The only requirement is that you must not attempt to use an uninitialised pointer (that has never been assigned to) for anything.
However, for aesthetic and maintenance reasons, one should always initialise where possible (even if that's just to NULL).
First of all, char is a numeric type, so the distinction in your question doesn't make sense. As written, your example code does not even compile:
char *p_message;
*p_message = "Pointer";
The second line is a constraint violation, since the left-hand side has arithmetic type and the right-hand side has pointer type (actually, originally array type, but it decays to pointer type in this context). If you had written:
char *p_message;
p_message = "Pointer";
then the code is perfectly valid: it makes p_message point to the string literal. However, this may or may not be what you want. If on the other hand you had written:
char *p_message;
*p_message = 'P';
or
char *p_message;
strcpy(p_message, "Pointer");
then the code would be invoking undefined behavior by either (first example) applying the * operator to an invalid pointer, or (second example) passing an invalid pointer to a standard library function which expects a valid pointer to an object able to store the correct number of characters.
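For the copy to be valid, p_message must first point to storage large enough for the characters plus the terminator. A minimal sketch using malloc, with duplicate_message as a hypothetical helper name:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated, modifiable copy of text,
 * or NULL if allocation fails. The caller must free() the result. */
char *duplicate_message(const char *text)
{
    size_t size = strlen(text) + 1;     /* +1 for the terminating '\0' */
    char *p_message = malloc(size);
    if (p_message != NULL)
        memcpy(p_message, text, size);  /* destination is now a valid object */
    return p_message;
}
```

Unlike a pointer to the string literal, the result of duplicate_message("Pointer") may be modified, for example with p[0] = 'p';.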
Not needed, but still recommended for a clean coding style.
Also the code you posted is completely wrong and won't work, but you know that and only wrote that as a quick example, right?
The following code :
int *a;
*a = 5;
will most likely result in a segmentation fault and I know why.
The following code :
int a;
*a = 5;
won't even compile.
(gcc says : invalid type argument of unary *).
Now, a pointer is simply an integer, which is used for storing an address.
So, why should it be a problem if I say :
*a = 5;
Ideally, this should also result in a segmentation fault.
A pointer is not an integer. C has data types to
a) prevent certain programming errors, and
b) improve portability of programs
On some systems, pointers may not be integers, because they really consist of two integers (segment and offset). On other systems, the "int" type cannot be used to represent pointers because an int is 32 bits and a pointer is 64 bits. For these reasons, C disallows using ints directly as pointers. If you want to use an integral type that is large enough to hold a pointer, use intptr_t.
When you say
int a;
*a = 5;
you are trying to make the compiler dereference something that is not a pointer. Sure, you could cast it to a pointer and then dereference it, like so,
*((int*)a) = 5;
.. and that tells the compiler that you really, really want to do that. BUT -- It's kind of a risky thing to do. Why? Well, in your example, for instance, you never actually initialized the value of a, so when you use it as a pointer, you are going to have whatever value is already at the location being used for a. Since it looks like it is a local variable, that will be an un-init'd location in the function's stack frame, and could be anything. In essence, you would be trying to write the value 5 to some undetermined location; not really a wise thing to do!
It's said to illustrate that pointers merely store addresses, and that addresses may be thought as numbers, much like integers. But usually addresses have a structure (like, page number, offset within page, etc).
You should not take that by word. An integer literally stores a number, which you can add, subtract etc. But which you cannot use as a pointer. An integer is an integer, and a pointer is a pointer. They serve different purposes.
Sometimes, a cast from a pointer to an integer may be necessary (for whatever purposes - maybe in a OS kernel to do some address arithmetic). Then you may cast the pointer to such an integer type, previously figuring out whether your compiler guarantees correct sizes and preserves values. But if you want to dereference, you have to cast back to a pointer type.
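A common example of such address arithmetic is an alignment check. The sketch below assumes that the integer value of a converted pointer is its address, which the C standard leaves implementation-defined but which holds on mainstream systems; is_aligned is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: is the address in p a multiple of alignment?
 * Assumes the integer value of a pointer is its address (implementation-
 * defined). alignment must be nonzero. */
int is_aligned(const void *p, size_t alignment)
{
    uintptr_t address = (uintptr_t)p;   /* pointer -> integer for arithmetic */
    return address % alignment == 0;
}
```

Note that the integer is only inspected here. Nothing is converted back to a pointer or dereferenced.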
You never actually assign "a" in the first case.
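int *a; /* never assigned, so its value is indeterminate */
*a = 5; /* BAD. What does 'a' point to exactly? */
int a; /* some int, but an int all the same */
*a = 5; /* does not compile: 'a' is not a pointer! */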
If you wish to use the integer as a pointer, you'll have to cast it first. Pointers may be integers, but conceptually they serve different purposes.
The operator * is a unary operator which is not defined for the integer data type. That's why the statement
*a = 5;
won't compile.
Also, an integer and a pointer are not the same thing. They are often the same size in memory on 32-bit systems (4 bytes each), but on typical 64-bit systems a pointer is 8 bytes while an int is still 4.
int* a — is a pointer to int. It points nowhere, you haven't initialized it. Please, read any book about C before asking such questions.