Printf output of pointer string explanation from an interview - c

I had an interview and I was given this code and asked what is the output for each one of these printf statements.
I have my answers as comments, but I am not sure about the rest.
Can anyone explain the different outputs for statements 1, 3 and 7 and why?
Thank you!
#include <stdio.h>
int main(int argc, const char * argv[]) {
char *s = "12345";
printf("%d\n", s); // 1.Outputs "3999" is this the address of the first pointer?
printf("%d\n", *s); // 2.The decimal value of the first character
printf("%c\n", s); // 3.Outputs "\237" What is this value?
printf("%c\n", *s); // 4.Outputs "1"
printf("%c\n", *(s+1)); // 5.Outputs "2"
printf("%s\n", s); // 6.Outputs "12345"
printf("%s\n", *s); // 7.I get an error, why?
return 0;
}

This call
printf("%d\n", s);
has undefined behavior because an invalid format specifier is used with a pointer.
This call
printf("%d\n", *s);
outputs the internal code (for example ASCII code) of the character '1'.
This call
printf("%c\n", s);
has undefined behavior due to using an invalid format specifier with a pointer.
These calls
printf("%c\n", *s);
printf("%c\n", *(s+1));
are valid. The first one outputs the character '1' and the second one outputs the character '2'.
This call
printf("%s\n", s);
is correct and outputs the string "12345".
This call
printf("%s\n", *s);
has undefined behavior because %s expects a pointer to char, but the argument is an object of type char (promoted to int).

1. This code is undefined behaviour (UB). You are passing a pointer where the function requires an int value. For example, on a typical 64-bit architecture a pointer is 64 bits and an int is 32 bits, so you may be printing a truncated value.
2. You are passing the first char value (automatically promoted to int) and printing it in decimal. You probably got 49 (the ASCII code for '1'). This is legal, but beware of surprises: if plain char is signed on your platform, characters outside the ASCII range print as negative values.
3. You are printing the passed pointer reinterpreted as a char value. This is undefined behaviour, because %c expects an int and you passed a pointer.
4. You are printing the value pointed to by s as a char, so you get the first character of the string "12345" ('1').
5. You are printing the char right after the one s points to, so you get the second character of the string ('2').
6. You are printing the string pointed to by s, so you get the whole string. This is legal and, indeed, the common way to print a string.
7. You are passing the first character of the string to be interpreted as a pointer to a null-terminated string (which it isn't). This is undefined behaviour again: you are reinterpreting a char value as an address. A SIGSEGV is common in this case (but not guaranteed). The signal is sent when the program tries to access unmapped memory before reaching the supposed terminating null character; it could also find a '\0' along the way and just print garbage.

The 7th line fails because %s expects a C-style string (a pointer to char) as its argument, and you are passing a single character instead.
Take a look at:
What does %s and %d mean in printf in the C language
C style strings guide

I used the following online C compiler in order to run your code,
and here are the results:
1. 4195988 - undefined behaviour (UB), manifesting here as the address of the char array, as you stated (with a 64-bit address you may or may not get truncation)
2. 49 - ASCII value of '1'
3. � - undefined behaviour, manifesting here as a byte value outside the ASCII range, obtained by truncating the address of the char array to a single char (assuming a 32-bit system)
4. 1 - obvious
5. 2 - obvious
6. 12345 - obvious
7. Segmentation fault - undefined behaviour: the first char of the array is passed where %s expects a pointer to a string, so printf treats that small value as an address
Note on point 3: we can deduce what took place at run time.
In the specific example provided in the question:
printf("%c\n", s); // 3.Outputs "\237". What is this value?
Because the behavior is undefined, the result depends on the hardware, compiler and OS.
Why truncation? The output "\237" is octal notation for the single byte 0x9F (159), which implies that the pointer was truncated to its lowest byte on the system executing this code.
Please see the explanation below (assumption: 32-bit system):
char *s = "12345"; // a char pointer pointing to a string literal
char c = (char)(uintptr_t)s; // keep only the low byte of the pointer (uintptr_t comes from <stdint.h>)
printf("Pointer to character array: %08lx\n", (unsigned long)(uintptr_t)s); // get the raw bits
printf("Pointer to character: %08x\n", (unsigned)c); // get the raw bits
printf("%c\n", s); // our UB: a pointer passed where %c expects an int
// what is displayed depends on the byte value and on how the
// terminal handles byte values 128-255, which are outside ASCII
The outputs:
Pointer to character array: 004006e4 // Classic 32-bit pointer
Pointer to character: ffffffe4 // Truncation to a signed char
// (note the sign extension to 32 bits in the display)
� // byte 0xE4 = 228 is outside ASCII and is not displayed properly
In practice, the final printf behaves like printf("%c\n", c); with c as computed above.
Why? %c converts its argument to unsigned char, so only the lowest byte of the pointer survives.
An additional example whose low byte is a printable ASCII character:
char *fixedPointer = (char *)0xABCD61; // a char pointer holding a dummy address (never dereferenced)
char c = (char)(uintptr_t)fixedPointer; // keep only the low byte: 0x61
printf("Pointer to 32-bit address: %08lx\n", (unsigned long)(uintptr_t)fixedPointer); // get the raw bits
printf("Pointer to character: %08x\n", (unsigned)c); // get the raw bits
printf("%c\n", fixedPointer); // the same UB, but 0x61 is the ASCII code of 'a'
And the actual outputs:
Pointer to 32-bit address: 00abcd61
Pointer to character: 00000061
a

Related

What's the length of a string in C when I use the "\x00" to interrupt a string?

char buf1[1024] = "771675175\x00AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
char buf2[1024] = "771675175\x00";
char buf3[1024] = "771675175\0AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
char buf4[1024] = "771675175\0";
char buf5[1024] = "771675175";
buf5[9] = 0;
char buf6[1024] = "771675175";
buf6[9] = 0;
buf6[10] = 'A';
printf("%d\n", strlen(buf1));
printf("%d\n", strlen(buf2));
printf("%d\n", strlen(buf3));
printf("%d\n", strlen(buf4));
printf("%d\n", strlen(buf5));
printf("%d\n", strlen(buf6));
if("\0" == "\x00"){
printf("YES!");
}
Output:
10
9
9
9
9
9
YES!
As shown above, I use the "\x00" to interrupt a string.
As far as I know, when strlen() meets "\x00", it returns the number of characters before the terminator, not counting the "\x00" itself.
But here, why is the length of the buf1 equal to 10?
As pointed out in the comments, a hexadecimal escape sequence has no length limit and ends only at the first character that is not a valid hexadecimal digit. All of the following A characters are valid hexadecimal digits, so they are part of the escape sequence. The resulting value does not fit in a char; the compiler is required to diagnose this, and what ends up in the array is up to the compiler. GCC, for example, warns "hex escape sequence out of range" and keeps only the lowest byte, 0xAA, which is not zero, so buf1 ends up with 10 characters.
You should change
char buf1[1024] = "771675175\x00AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
to:
char buf1[1024] = "771675175\x00" "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
Also, strlen returns a value of type size_t. The correct printf format specifier for size_t is %zu, not %d. Even if %d works on your platform, it may fail on other platforms.
The following program will print the desired result of 9:
#include <stdio.h>
#include <string.h>
int main( void )
{
char buf1[1024] = "771675175\x00" "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA";
printf( "%zu\n", strlen(buf1) );
}
Also, it is worth noting that the following line does not make sense:
if("\0" == "\x00")
In that if condition, you are comparing two pointers, i.e. the addresses of two string literals, not their contents. Whether both string literals are stored at the same memory location depends on the compiler: some compilers merge identical string literals, some do not. Normally this is irrelevant to the programmer, so it does not make much sense to compare these memory addresses.
You probably wanted to write the following instead, which will compare the actual character values:
if( '\0' == '\x00' )
There is a big difference between a string literal and a character constant.

What does the 2nd argument in strtoul() function do?

According to this document,
The second argument (char **endptr) seems to be a waste of space! If
it is set to NULL, STRTOL seems to work its way down the string until
it finds an invalid character and then stops. All valid chars read are
then converted if the string starts with an invalid character the
function returns ZERO (0).
It means that the following code should detect 2 as the hex number:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char * string = "p1pp2ppp";
unsigned integer = strtoul(string, NULL, 16);
printf("%u", integer);
return 0;
}
but, it is returning zero.
Why?
The man page says the following about the second argument:
If endptr is not NULL, strtol() stores the address of the first
invalid character in *endptr. If there were no digits at all,
strtol() stores the original value of nptr in *endptr (and
returns 0). In particular, if *nptr is not '\0' but **endptr is
'\0' on return, the entire string is valid.
For example:
char str[] = "123xyz45";
char *p;
long x = strtol(str, &p, 10);
printf("x=%ld\n", x);
printf("p - str = %td\n", p - str);
printf("*p = %c\n", *p);
printf("p (as string) = %s\n", p);
Output:
x=123
p - str = 3
*p = x
p (as string) = xyz45
We can see that when strtol returns, p points to the first character in str that cannot be converted. This can be used to parse a string a piece at a time, or to check whether the entire string was converted or whether there are extra characters.
In your example, the first character of string, namely 'p', is not a valid base-16 digit, so nothing gets converted and the function returns 0.
Why?
It's returning 0 because "p..." does not start with a valid number (after optional whitespace and an optional sign). The second argument is not relevant to your question.
The char **endptr argument in all the strto* functions is intended to receive the address of the first character that isn't part of a valid integer (decimal, hex, or octal) or floating-point number. Far from useless, it's handy for checking invalid input. For example, if I meant to type 1234 but fat-fingered something like 12w4, strtoul will return 12 and set *endptr to point to the w.
Basically, if the character that *endptr points to is neither whitespace nor '\0', the input should most likely be rejected.

Why do I use a star for pointers when printing chars but not strings in C?

Alright I've got two questions, both based on the same snippet of sample code provided by my professor:
char arr[3][3] = {
{ '0', '1', '2' },
{ '3', '4', '5' },
{ '6', '7', '\0' }
};
char* base = &arr[0][0], *a = &arr[0][0];
a = base + 5;
printf("base=%d a=%d", base, a);
printf("*a = %c ", *a);
a = base + 3;
printf("row = %s", a); // HERE!
So in the line I marked, we reference a instead of *a. This is what I don't understand. Isn't a a number? I understand that a string knows to just go ham on an array of characters until it hits a null terminator, but I don't understand why you don't need the star. Is a not just some long ass number? Does formatting it with %s make it know to follow the pointer, and if so, why doesn't %c do the same? Why does using the star when trying to print with %s cause an exception?
The %s format specifier takes a pointer and prints the string of characters it points to stopping at a zero byte. If you dereference a pointer to a character, all you get is a single character, exactly what %c needs.
The different format specifiers tell printf how to interpret the argument:
%d treats it as a normal integer. Passing it a pointer (a) results in undefined behavior, but is likely to print part of the pointer's address (a large number)
To print the address stored in a pointer correctly, use %p instead of %d and cast the pointer to void *.
%c treats it as a character (but still takes an int; char is implicitly converted). Your code passes *a, which is indeed a character: the char that a points to.
%s treats it as a pointer to the beginning of a null-terminated string of chars. Your code passes a, which is that: a pointer-to-char. It will follow the pointer along, printing characters until it hits \0.
So the expression a has type pointer-to-char, while *a has type char. If you pass a char to %s, printf will try to use it as a pointer. That is undefined behavior; in practice the value "points to" a very low address (for example between 0x00 and 0xFF), which is usually not mapped, so you typically get a segmentation fault for invalid memory access.

How do I print what's inside a pointer by using C and the printf function?

I'm trying to follow some steps in a book. This is the exact code that's in the book but I'm getting an error message.
Both of the printf statements are the problem:
printf(pointer);
printf(pointer2);
How do I fix this to actually print what's inside the pointer?
#include <stdio.h>
#include <string.h>
int main(void)
{
char str_a[20]; //A 20-element array
char *pointer; //A pointer, meant for a character array
char *pointer2; //And yet another one
strcpy(str_a, "Hello World\n");
pointer = str_a; //Set the first pointer to the start of the array
printf(pointer);
pointer2 = pointer + 2; //Set the second one 2 bytes further in.
printf(pointer2);
strcpy(pointer2, "y you guys\n"); //Copy into that spot.
printf(pointer);
return 0;
}
Try
printf("%s", str_a);
Now, if you want to print the address of the variable itself, you can try:
int a = 5;
printf("%p\n",(void*)&a);
Use
printf("%s", str_a);
It will print your str_a. But keep in mind that every char* string in C has to be terminated by a \0 character. If it is not terminated, printf keeps reading and printing whatever lies in memory after the string (undefined behavior).
In the best case this results in a SIGSEGV and your program terminates. In the worst case someone can use this to print, for example, plaintext password data stored in RAM right next to the string you tried to print.
Read about "buffer overflow" and "stack overflow".
If you define the string by
const char* str = "Hello World";
then C will automatically add the \0 character for you, so the literal occupies 12 bytes (11 characters plus the terminator).
But if you fill the buffer with strcpy, read it from stdin, or receive it from any untrusted source (like the network), you must make sure it fits in the buffer and ends with \0; otherwise you have a security hole.
But just for testing printf("%s", str_a) is just fine.
Other conversion specifiers for printf are:
Specifier  Output                                              Example
d or i     Signed decimal integer                              392
u          Unsigned decimal integer                            7235
o          Unsigned octal                                      610
x          Unsigned hexadecimal integer                        7fa
X          Unsigned hexadecimal integer (uppercase)            7FA
f          Decimal floating point, lowercase                   392.65
F          Decimal floating point, uppercase                   392.65
e          Scientific notation (mantissa/exponent), lowercase  3.9265e+2
E          Scientific notation (mantissa/exponent), uppercase  3.9265E+2
g          Use the shortest representation: %e or %f          392.65
G          Use the shortest representation: %E or %F          392.65
a          Hexadecimal floating point, lowercase               -0xc.90fep-2
A          Hexadecimal floating point, uppercase               -0XC.90FEP-2
c          Character                                           a
s          String of characters                                sample
p          Pointer address                                     b8000000
n          Nothing printed; the number of characters written so far is stored in the int pointed to by the argument
(source http://www.cplusplus.com/reference/cstdio/printf/)
You use these specifiers like this:
printf("%i: %f and I am a character: [%c]\n", 10, 4.4, 'a');

printing int array as string

I am trying to print int array with %s. But it is not working. Any ideas why?
#include <stdio.h>
int main(void) {
int a[8];
a[0]='a';
a[1]='r';
a[2]='i';
a[3]='g';
a[4]='a';
a[5]='t';
a[6]='o';
a[7] = '\0';
printf("%s", a);
}
It prints just a.
I tried with short as well, but it also does not work.
This is because you are trying to print an int array, where each element is typically 4 bytes wide. printf() interprets it as a char array, so on a little-endian machine such as x86 the first element looks like:
'a' \0 \0 \0
to printf(). As printf() stops at the first \0 it finds, it only prints the 'a'.
Use a char array instead.
Think about the way integers are represented - use a debugger if you must. Looking at the memory you will see plenty of 0 bytes, and %s stops when it reaches a 0 byte.
It prints just a.
That's why it prints just a. Afterwards it encounters a 0 byte and it stops.
Because you declared a as an array of int, the single characters you stored do not form a string: each one occupies a whole int. Change it to a char array. To save typing, you can also initialize it from a string literal in double quotes, e.g. char a[] = "arigato"; or make a a const char * pointing to such a literal.
int a[8] is an array of 8 ints, i.e. 8 * 4 bytes, assuming a 4-byte int (as on typical 32-bit architectures)
a[0] = 'a' stores the bytes 'a' '\0' '\0' '\0' in the first int (on a little-endian machine)
a[1] = 'r' stores 'r' '\0' '\0' '\0', and so on...
%s expects a C-style string, i.e. a sequence of characters terminated by a '\0' character
So
printf("%s", a);
reads bytes until it finds the first '\0', which here is the second byte, so it prints just "a" as if that were the entire string
