What are the rules by which C converts an address to int?

I'm a beginner in C. Today I was studying pointers, and I found that I can print an address directly; with the right format specifier, I can even print the value stored at that memory address.
Later, I did some experiments:
##### CODE PART ######
#include <stdio.h> // Define several variable types, macros, and functions about standard input/output.
int main () {
char my_string[] = "address test";
printf("%s\n", &my_string);
printf("%p\n", &my_string);
printf("%d\n", &my_string);
printf("%x\n", &my_string);
printf("\n");
char *p = "pointer string test";
printf("%s\n", p);
printf("%p\n", p);
printf("%d\n", p);
printf("\n");
char *p2 = 'p';
printf("%c\n", p2);
printf("%p\n", p2);
printf("%d\n", p2);
return 0;
}
##### OUTPUT #####
address test
0x7fff58778a7b
1484229243
58778a7b
pointer string test
0x107487f87
122191751
p
0x70
112
I didn't quite understand the behavior of the %d output at first, but after more observations and experiments I found that %d converts part of the hex value of the memory address.
For the address of my_string it omitted the 0x7fff part, for the address of p it omitted the 0x10 part, and for p2 it omitted only the 0x part. As I understand it, 0x is the prefix that marks a hex value.
How can I know how many digits C will omit when converting a memory address to int, as it did for my_string and p?
PS: My system version is OSX10.10

The C standard (ISO/IEC 9899:2011) has this to say about converting between pointers and integers:
6.3 Conversions
6.3.2.3 Pointers
¶5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.67)
¶6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
Note that the behaviour of converting between pointers and integers is implementation defined, not undefined. However, unless the integer type used is uintptr_t or intptr_t (from <stdint.h> — or <inttypes.h>), you are likely to see truncation effects if the sizes of the pointer and the integer types do not match. If you move your code between 32-bit and 64-bit systems, you will run into problems somewhere.
In your code, you have 64-bit pointers (because you're on Mac OS X 10.10 and you need to specify -m32 explicitly to get a 32-bit build, but your results are consistent with a 64-bit build anyway). When you pass those pointers to printf() with the %d and %x conversion specifications, you are asking printf() to print a 32-bit quantity, so it formats 32 of the 64 bits you passed. The behaviour is undefined; you aren't getting a conversion, per se: the calling code (in main()) passes a 64-bit pointer and the called code (printf()) reads a 32-bit quantity from where it expects an int argument to be. If you asked a single call to printf() to print several values (e.g. printf("%d %x\n", p, p);), you could get more surprising results.
You should compile with options like:
gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
-Wold-style-definition -Werror …
With those options, your code would not compile; the compiler would complain about mismatches between the format strings and the values passed. When I saved your code into a file noise.c and compiled it with clang (from XCode 7.2, running on Mac OS X 10.10.5), I got:
$ /usr/bin/clang -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
> -Wold-style-definition -Werror noise.c -o noise
noise.c:5:20: error: format specifies type 'char *' but the argument has type 'char (*)[13]'
[-Werror,-Wformat]
printf("%s\n", &my_string);
~~ ^~~~~~~~~~
noise.c:7:20: error: format specifies type 'int' but the argument has type 'char (*)[13]' [-Werror,-Wformat]
printf("%d\n", &my_string);
~~ ^~~~~~~~~~
noise.c:8:20: error: format specifies type 'unsigned int' but the argument has type 'char (*)[13]'
[-Werror,-Wformat]
printf("%x\n", &my_string);
~~ ^~~~~~~~~~
noise.c:14:20: error: format specifies type 'int' but the argument has type 'char *' [-Werror,-Wformat]
printf("%d\n", p);
~~ ^
%s
noise.c:17:11: error: incompatible integer to pointer conversion initializing 'char *' with an expression of
type 'int' [-Werror,-Wint-conversion]
char *p2 = 'p';
^ ~~~
noise.c:18:20: error: format specifies type 'int' but the argument has type 'char *' [-Werror,-Wformat]
printf("%c\n", p2);
~~ ^~
%s
noise.c:20:20: error: format specifies type 'int' but the argument has type 'char *' [-Werror,-Wformat]
printf("%d\n", p2);
~~ ^~
%s
7 errors generated.
$
Compile with stringent warnings, and pay heed to those warnings.

There is no such thing as a "hex value". A number is an amount. Decimal and hex are just representations of the number using different conventions. One can just as well represent the number in Roman numerals, and its value remains the same.
The address of a variable is a concept, not a physical thing. It usually happens to be a (large) number on the current OSes and CPU architectures but this is not set in stone.
Depending on the compiler and the code it compiles, a variable can be stored in memory (it has an address that looks like a large integral number) or not. The compiler can optimize the code and store a temporary variable in a CPU register; in this case it doesn't have an address.
Back to your code. &my_string is the address of variable my_string. It looks like a number. You probably run the code on a 64-bit processor. Memory addresses in this situation are 64-bit unsigned numbers.
printf("%p\n", &my_string); - prints a 64-bit unsigned number (the most appropriate representation of a pointer on the hardware architecture you are using).
printf("%d\n", &my_string); - you pass a 64-bit number to printf() but because of the %d specifier it thinks the value is 32-bit. It grabs only half of the passed value (4 out of the 8 bytes) and represents it as a signed integer. But which half? It depends on the architecture where the code runs. The behaviour of this code is undefined.
printf("%x\n", &my_string); - similar with %d, it prints only (the same) half of the passed value using the hexadecimal notation. The behaviour of this code is, again, undefined.
The 0x prefix is not part of the hex representation; it is just a marker that signals to the C compiler that a number in the hexadecimal representation follows. While the hex representation is universal, different languages use different ways to encode them. Even the C language uses two different markers for them; 0x is used to prefix the numbers and \x is used to prefix the hex representation of a character.

There's no rule. This is not covered by the C standard. Your code causes undefined behaviour. Any results you observe for this entire program are meaningless.
With printf you must convert the arguments to the right type yourself.

printf("%d\n", &my_string);
printf("%x\n", &my_string);
is cause for undefined behavior. The format specifier and the argument type must match for printf to work correctly. For a list of valid format specifiers and data types to which they apply, take a look at http://en.cppreference.com/w/c/io/fprintf.
The following lines suffer from the same problem.
printf("%d\n", p);
printf("%c\n", p2);
printf("%d\n", p2);
The line
char *p2 = 'p';
assigns the integer value that represents the character 'p' to p2. However, that is not a valid address.
The integral types that can be used to hold a pointer are intptr_t and uintptr_t. Hence, you can use:
char my_string[] = "address test";
intptr_t ptr = (intptr_t)&my_string;
However, you cannot use the %d format specifier to print that value. You will need to use the PRIdPTR macro from <inttypes.h>:
printf("%" PRIdPTR "\n", ptr);
to print that.
Take a look at http://en.cppreference.com/w/c/types/integer for more details.

Related

Assigning variable address to pointer. Dereferencing pointer causes segmentation fault. How?

As you can see, I'm just taking a variable's address and passing it back and forth through other variables and pointers. Through printf I can see that address1, value2 and address2 all hold the same value. But when I try to dereference address2 (which holds the same value as address1 and &value), I get an error. Why?
#include <stdio.h>
#include <stdint.h>
int main()
{
uint32_t value = 0;
uint32_t* address1 = NULL;
uint32_t* address2 = NULL;
address1 = &value;
printf("address1: %x\n", address1);
uint32_t value2 = (uint32_t)address1;
printf("value2: %x\n", value2);
address2 = (uint32_t*)value2;
printf("address2: %x\n", address2);
uint32_t value3 = *address2; //fault here
printf("*value2: %u\n", value3);
return 0;
}
On 64-bit systems, a uint32_t isn't big enough to hold a pointer. By putting one in it anyway, the top half of the address gets truncated, so it's pointing at the wrong place when you go to subsequently use it. Change to a uintptr_t to fix it.
Also, as Ingo Leonhardt commented, %x is the wrong format specifier for pointers, so the printed forms of them were truncated too, which is why you thought the values were all equal even though they actually aren't.
Joseph's answer explains why this program fails on your particular setup (32bit vs. 64bit). I'll try to give an answer that is based only on the C standard (C99 mostly) and platform-independent.
The problem: Your program has undefined behaviour because it violates the C standard.
Specifically, the problem is the combination of these lines:
uint32_t* address1 = NULL; // 1
uint32_t value2 = (uint32_t)address1; // 2
address2 = (uint32_t*)value2; // 3
In line 2, you are casting a "pointer to uint32_t" (uint32_t*) to a uint32_t. That by itself is ok - you will get a numeric representation of the pointer, possibly truncated to fit into uint32_t.
However, in line 3 you then proceed to cast value2 back to uint32_t*, and dereference it. This is the undefined behavior, because (according to the C standard) there is no guarantee that the resulting address2 is a valid pointer.
You may in principle cast a valid pointer to an integral type and back, and get the same (valid) pointer back. However, this only works if the integral type you use is sufficiently large for a pointer - to ensure this, you need to use uintptr_t, which is meant specifically for this purpose. If your platform does not have uintptr_t (only introduced in C99), you will need a platform-specific type.
See e.g. When is casting between pointer types not undefined behavior in C? for more details, both on casting between pointer types and between integers and pointers.
As to how to avoid these problems: There is no general solution, but modern compilers will try to warn you in many cases. For example, on my system (64bit, GCC 7.5.0), gcc (even without any warning options) will warn about the problematic pointer cast in line 2 above:
tst.c: In function ‘main’:
tst.c:13:23: warning: cast from pointer to integer of different size
[-Wpointer-to-int-cast]
uint32_t value2 = (uint32_t)address1;
^
It will also warn about the second cast:
tst.c:16:16: warning: cast to pointer from integer of different size
[-Wint-to-pointer-cast]
address2 = (uint32_t*)value2;
^
And finally, it will even notice the wrong printf format string mentioned in Joseph's answer:
tst.c:17:24: warning: format ‘%x’ expects argument of type ‘unsigned int’,
but argument 2 has type ‘uint32_t * {aka unsigned int *}’ [-Wformat=]
printf("address2: %x\n", address2);
~^
%ls
Lesson: Always compile with -Wall -Wextra (or whatever switches on warning in your compiler), and heed the warnings.

Make printf calls from 7 years ago compatible with current compilers

I'm reading the book The Art Of Exploitation, which has a program called memory_segments.c that demonstrates how the memory segments (the heap, the stack, etc.) work. But when I try to compile the program, it seems that its printf calls are no longer accepted as written. I use gcc 10.2.0 to compile my C code.
#include <stdio.h>
#include <stdlib.h>
int global_initialized_var = 5;
int main()
{
printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var);
// ... more prints, removed just to make code shorter
return 0;
}
// warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘int *’
The author of the book's output:
...
global_initialized_var is at address 0x080497ec
...
What is the alternative for 0x%08x? And why does 0x%08x no longer work?
%x prints integer values, not pointers. Passing a pointer to it is undefined behaviour. To print pointers, use printf("%p", (void *)pointer);
The integer types that should be used when converting pointers are uintptr_t or intptr_t. For differences between pointers, use ptrdiff_t.
To print uintptr_t or intptr_t you need to use the PRIdPTR, PRIiPTR, PRIoPTR, PRIuPTR, PRIxPTR, or PRIXPTR printf format macros from <inttypes.h>. Example:
uintptr_t p = SOME_VALUE;
printf("UINTPTR_T printed as hex: %" PRIxPTR "\n", p);
What is the alternative for 0x%08x?
To properly print a pointer value, use %p with void* pointer:
printf("%p\n", (void*)&global_initialized_var);
It will use some implementation-specific form of printing it, most commonly it's the same as %#x format. If you want to get a specific numeric representation of the pointer, the best standard way would be to use the biggest possible integer type:
printf("%#jx\n", (uintmax_t)(uintptr_t)&global_initialized_var);
However, usually you can/want to just cast the pointer value to an integer and print it, as usually it's only for temporary debugging purposes, just like:
// Could truncate pointer value!
printf("%#llx\n", (unsigned long long)&global_initialized_var);
printf("%#lx\n", (unsigned long)&global_initialized_var);
printf("%#x\n", (unsigned)&global_initialized_var);
Note that unsigned has at least 16 bits, long has at least 32 bits, and long long has at least 64 bits. Most platforms are 64-bit nowadays, with 64-bit pointers; in any case, prefer the widest available type when printing a pointer value, to be portable, hence the use of uintmax_t above. It was common on Unix platforms to cast pointers to and from long: on 64-bit Unix long has 64 bits, while on 64-bit Windows it has 32 bits.
why does 0x%08x no longer work?
It is (and was) UB due to mis-matched types.
%x expects an unsigned. &global_initialized_var is a pointer.

How does assigning a string to int and passing that int to printf prints the string properly?

Why does this work? (i.e., how does passing an int to printf() result in printing the string?)
#include<stdio.h>
int main() {
int n="String";
printf("%s",n);
return 0;
}
warning: initialization makes integer from pointer without a cast [enabled by default]
int n="String";
warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘int’ [-Wformat=]
printf("%s",n);
Output:String
compiler:gcc 4.8.5
In your code,
int n="String"; //conversion of pointer to integer
is highly implementation-dependent (see note 1)
and
printf("%s",n); //passing incompatible type of argument
invokes undefined behavior (see note 2).
Don't do that.
Moral of the story: The warnings are there for a reason, pay heed to them.
Note 1:
Quoting C11, chapter §6.3.2.3
Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. [....]
Note 2:
chapter §7.21.6.1
[....] If any argument is
not the correct type for the corresponding conversion specification, the behavior is
undefined.
and for the type of argument for %s format specifier with printf()
s If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. [...]
The behaviour of your program is undefined.
Essentially, you're assigning a char* to an int, and printf converts it back. But do regard that as entirely coincidental: you're not allowed to convert between unrelated types like that.
C gives you the ability to shoot yourself in the foot.
An int is 4 bytes on most of today's computers (typically holding values from -2147483648 to 2147483647),
which means it "can" store SOME addresses as well. The problem is that an address larger than 2147483647 does not fit, so you will not be able to get the address back (which is obviously very bad for your program).
An address is a number referring to a location in memory.
Pointers are made to store addresses; they are usually larger (8 bytes on 64-bit systems, 4 bytes on 32-bit systems), and addresses are not negative.
This means that when you write int n="String";, if the address of "String" is below 2147483647 the value happens to fit and your code may appear to run (DON'T DO THAT).
http://www.tutorialspoint.com/c_standard_library/limits_h.htm
Now, if you think about it, you can guess why there is a 4 GB RAM limit on 32-bit systems.
(sorry about possible english mistakes, I m french)
Compiling with options like -Wall -Wextra -Werror -Wint-to-pointer-cast -pedantic (GCC) will show you very quickly that this behaviour should not be relied upon.

Compiler gives warning when printing the address of a variable

I made a very simple program to print the address of two variables.
#include<stdio.h>
int main()
{
int a,b;
printf("%u\n%u",&a,&b);
return 0;
}
But, the Clang-3.7 compiler gives warning as:
warning: format specifies type 'unsigned int' but the argument has type 'int *' [-Wformat]
But, when I compiled with GCC-5.x, it gave no warnings. Which of them is correct?
One thing I know is that doing unsigned int num=&a; would be wrong as address can only be stored in a pointer. But, is it correct for the compiler to give warning when printing the address?
I compiled my program from gcc.godbolt.org
%p is the correct format specifier to print addresses:
printf("%p\n%p",(void*)&a, (void*)&b);
The C standard requires that the argument corresponding to %p should be of type void*. So the casts are there.
C11, Reference:
p The argument shall be a pointer to void. The value of the pointer
is converted to a sequence of printing characters, in an
implementation-defined manner.
Using an incorrect format specifier is undefined behavior. A compiler is not required to produce any diagnostics for undefined behavior. So both gcc and clang are correct.
GCC 5.1 does produce warnings on my system without any additional options. And GCC on godbolt produces warnings with stricter compiler options: -Wall -Wextra. In general, you should compile with the strictest compiler options.
The correct format specifier for printing an address (pointer) is %p and you need to cast the argument to void *.
Hence, the warning is valid and should be there.
But, when I compiled with GCC-5.x, it gave no warnings
In the case of gcc, please add the -Wall compiler option (see the note below) and compile again. I believe it will give the (same) warning we're expecting.
Note: Actually, it is -Wformat that checks the types of the arguments supplied to printf() and scanf() family calls; -Wall enables -Wformat.
You got this error because format-specifier %u accepts unsigned-integer, whereas what you are supplying in your code is a pointer to a memory location.
Casting the arguments to (void*) alone does not fix this: %u still expects an unsigned int, so the call below still has a mismatched argument type, its behaviour is still undefined, and a compiler may still warn about it.
printf("%u\n%u",(void*)&a,(void*)&b);
The correct way to print a pointer is to use the %p specifier, which prints the address in an implementation-defined format (commonly hexadecimal).
printf("%p\n%p",(void*)&a,(void*)&b);
Hope that helps.

Different behavior of printf() in Turbo C and gcc when trying to print a pointer

The code below gives the output without error in Turbo C compiler, and gives the address of variable and its value both:
int a=5,*fp;
fp=&a;
printf("%d %d\n",fp,*fp);
But when I compile the same code in Linux with GCC compiler, it gives an error:
warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int *’ [-Wformat=]
printf("%d %d\n",fp,*fp);
But the same code with the %p format specifier works with GCC, which I agree with. The question is: how come it works on the Turbo C platform?
P.S. The issue is not that the error goes unreported in Turbo C, but that Turbo C gives a signed integer value that is unchanged on repeated execution of the program; can it be garbage?
P.P.S Turbo C is running on MSDOS platform and GCC on 64-bit Linux, if that helps.
printf() assumes you will pass a signed int for the %d specifier.
On the platform and compiler you are using, ints and pointers are probably the same size and printf() is able to display it correctly. Your pointer is reinterpreted as an int.
Compiling and running this on a different platform where, for example, ints are 32-bit and pointers are 64-bit (such as gcc on x64) would be undefined behaviour. If you are lucky enough, it would crash. If not, you'd get garbage.
The %d conversion specifier requires the corresponding argument to have type [signed] int. If the corresponding argument in fact has a different type then the behavior is explicitly undefined. This is the case regardless of the relative sizes of the expected and actual types, and regardless of whether implicit or explicit conversion between those types is possible.
Neither the behavior of the compiler nor that of any part of any resulting compiled program is defined when the program exhibits undefined behavior. Turbo C is not required to diagnose the problem. On the other hand, gcc is quite permitted to diagnose it, and even to reject the source code with an error instead of merely alerting you with a warning. As far as C is concerned, the whole program's behavior can be absolutely anything if any undefined behavior is triggered -- from what the author intended (whatever that may be) to emitting rude comments from the machine's speaker, and far beyond.
In practice, the undefined behavior is likely (but by no means certain) to manifest in a relatively benign way if the expected and actual types are the same size, and the conversion specifier is not %s. Otherwise, all bets are off. Note in particular that many C implementations for 64-bit platforms feature 32-bit ints and 64-bit pointers.
Each conversion specifier, such as %d, specifies both the type of the required argument and the format used to print it.
%d requires an argument of type int (equivalently signed int), and prints it in decimal.
%u requires an argument of type unsigned int and prints it in decimal.
%x requires an argument of type unsigned int and prints it in hexadecimal.
And so forth.
In your code:
int a=5,*fp;
fp=&a;
printf("%d %d\n",fp,*fp);
the second %d is correct, since the corresponding argument, *fp is of type int. But the first is incorrect, since the corresponding argument fp is of type int*.
If you pass an argument of the wrong type for a conversion specifier, the behavior is undefined. A compiler is not required to warn you if you do this, since it's not possible to detect the error in the most general case (the format string doesn't have to be a string literal). Some compilers, including gcc, will analyze format strings if they're string literals and warn about mismatches. Turbo C apparently does not (that's not surprising, it's a very old compiler).
The correct format for printing a pointer value is %p. This requires an argument of type void*, and prints it in an implementation-defined manner. A pointer of a type other than void* should be converted.
The correct version of your code is:
int a = 5, *fp;
fp = &a;
printf("%p %d\n", (void*)fp, *fp);
It's a warning, not an error. The first compiler could also say the same, but doesn't.
The warning doesn't stop the compilation, so it will work too. They're just different compilers. Also, a compiler accepting your program doesn't mean that the program is correct.

Resources