What actually happens when I do this?
{
char * str = "%d\n";
str++;
str++;
printf(str-2,300);
return 0;
}
Intuitively, it appears that the number on the screen will be 300, but I want to know, what gets stored in str.
Edit: It will be great if someone can tell me, when do we actually do this?
Thanks!
str is a memory address, initially the address of the % sign of the string literal %d\n. This literal is created because it is in your code.
Two increments make str point to the character \n and when this is the case, str - 2 is the address of the % sign. So printf sees the format string %d\n, and as usual it prints the first argument after the format string as an integer. The fact is, printf does not care about the origins of the format string. It doesn't matter if you can create it on-the-fly, or hard code it.
We don't do this generally. Sometimes you need to fiddle with a character pointer to scan a string, extract something out of a string, or to skip some prefix of a string.
str is a pointer on the stack. It initially points to (ie, holds the address of) the start of the string literal "%d\n" (this is probably stored in a read-only section of your program by the compiler).
Let's say for example the string literal (the "$d\n") is stored at 0x5000. So (assuming UTF-8 or ASCII) the byte at 0x5000 is %, the byte at 0x5001 is d, 0x5002 is \n (the newline) and at 0x5003 it is \0 (the terminating null character)
str is initially holding the address 0x5000. str++ would increase it to 0x5001, meaning it now points to the string "d\n", ie one character into the string literal "%d\n". Likewise, str++ again moves it to 0x5002, ie the string "\n", two characters into the string literal "%d\n". Note that all of these are still terminated by the null character at 0x5003 (how C knows when the string ends).
The printf call has the format string as the first argument. str at this point holds 0x5002, so the call is saying 'Use the format string starting at 0x5002 - 2 = 0x5000', which turns out to be the same string that we started with.
Thus it will be the same as calling
printf("%d\n",300)
and will print out 300.
Well you are declaring a char pointer. This pointer will hold a RAM address from where you will write the following bytes: % (1 byte) d (1 byte) \n (1byte on UNIX, 2 bytes on windows) and \0 the null terminating byte that ends your string.
Then you increment by two your pointer value (which is the address of the first byte) then decrement by two. So basically you do nothing. Thus when calling printf() src-2 will point to %d\n and the null terminating byte will make it exactly pass %d\n.
So at the end of the day what you are doing is:
printf("%d\n", 300); Hence the 300 output.
Related
I tried this chunk of code:
char string_one[8], string_two[8];
printf("&string_one == %p\n", &string_one);
printf("&string_two == %p\n", &string_two);
strcpy(string_one, "Hello!");
strcpy(string_two, "Long string");
printf("string_one == %s\n", string_one);
printf("string_two == %s\n", string_two);
And got this output:
&string_one == 0x7fff3f871524
&string_two == 0x7fff3f87151c
string_one == ing
string_two == Long string
Since the second string length value is greater than the specified size of the respective array, the characters which subscript values are greater than the specified array size are stored in the next bytes, which belong to the first array as the addresses show. Obviously the first string is overwritten.
There is no way the second array can hold the whole string, it is too big. Nevertheless, the output prints the whole string.
I speculated for a while and came to a conclusion that the printf() function keeps outputting characters from the next bytes until it comes across a string terminator '\0'. I did not find any confirmation for my pondering, so the question is are these speculations correct?
From the C Standard (5.2.1 Character sets)
2 In a character constant or string literal, members of the execution
character set shall be represented by corresponding members of the
source character set or by escape sequences consisting of the
backslash \ followed by one or more characters. A byte with all bits
set to 0, called the null character, shall exist in the basic
execution character set; it is used to terminate a character string.
And (7.21.6.1 The fprintf function)
8 The conversion specifiers and their meanings are:
s If no l length modifier is present, the argument shall be a pointer
to the initial element of an array of character type.273) Characters
from the array are written up to (but not including) the terminating
null character.
My compiler(GCC) said:
warning: ‘__builtin_memcpy’ writing 12 bytes into a region of size 8 overflows the destination [-Wstringop-overflow=]
strcpy(string_two, "Long string");
And just to show how optimizations will take everything that you think you know and turn it on its head, here's what happens if you compile this on a 64-bit PowerPC Power-9 (aka not x86) with gcc -O3 -flto
$ ./char-array-overlap
&string_one == 0x7fffc502bef0
&string_two == 0x7fffc502bef8
string_one == Hello!
string_two == Long string
Because if you look at the machine code it never executes strcpy at all.
Consider this code snippet (simplified syntax for clarity).
void simple (char *bar) {
char MyArray[12];
strcpy(MyArray, bar);
}
My instructor says that MyArray can copy at most 12 elements from bar, but from what I've read, MyArray can only store 11 characters because it needs room for the null character at the end. So if the received value of bar is 12 or greater, a buffer overflow would occur. My instructor says that this will only happen if the received value of bar
is 13 or greater. Who's right? I'd appreciate if you could cite a credible source so I can convince him.
The definition char MyArray[12] creates an array of 12 char, which can be used to store a string. Since strings in C are null terminated, one of those characters needs to be able to store the null byte at the end of the string.
So a variable of type char [12] can hold a string of at most 11 characters. Attempting to copy a string of length 12 or longer using strcpy as in your example will overflow the bounds of the array.
If you were to use strncpy as follows:
strncpy(MyArray, bar, 12);
This will not overflow the buffer, as it would copy at most 12 characters. However, if 12 characters are copied, that means the string is not null terminated and is therefore not technically a string. Then attempting to use any other string function on MyArray that expect a null terminated string would read off the end of the array.
The/a proper use of strncpy would be:
void simple(char *bar) {
char MyArray[12];
strncpy(MyArray, bar, sizeof(MyArray)-1);
MyArray[sizeof(MyArray)-1]= '\0';
}
This just puts in a terminating null character, whether strncpy was able to do that or not.
It's hard to tell, because your question is a bit confusingly worded, but I think you're right, and that your instructor is wrong.
Given the code
void simple(char *bar) {
char MyArray[12];
strcpy(MyArray, bar);
}
if the passed-in bar points to a string of 11 or fewer characters, a valid string will be copied to MyArray, with no buffer overflow. But if the string is 12 (or more) characters long, you're right, there'll be a buffer overflow, because strcpy will also copy the 13th, terminating null character.
Earlier you asked about strncpy. Given the code
void simple2(char *bar) {
char MyArray[12];
strncpy(MyArray, bar, 12);
}
if the passed-in bar points to a string of 11 or fewer characters, a valid string will be copied to MyArray. But if the string is 12 characters long, we have a different problem. strncpy will copy 12 characters and stop, meaning that it won't copy the terminating null character. There won't be a buffer overflow, but MyArray still won't end up containing a valid string.
Also, you asked for a credible source. I wrote the C FAQ list -- would you consider that credible? :-)
My instructor says that MyArray can copy at most 12 elements from bar
It will be more correctly to say that the array MyArray may accomodate at most 12 elements of the array bar. Otherwise there will be an attempt to access memory beyond the array.
So in fact your instructor is right.
The array MyArray is declared having only 12 elements
char MyArray[12];
but from what I've read, MyArray can only store 11 characters because
it needs room for the null character at the end
The terminating zero is also a character. And the function strcpy copies all characters from the source string including the terminating zero that is present in the source string.
So if the received value of bar is 12 or greater, a buffer overflow
would occur
What does mean the magic number 12 in this context? is it the number of characters in the array bar or it is the length of the string stored in the array bar (that used as an argument is converted to pointer to its first element)?
If the number 12 means the size of the string stored in the array bar then the function strcpy will try to copy all characters of the array including the terminating zero and in this case the array MyArray has to be declared as having 13 elements.
char MyArray[13];
However if the number 12 means the number of elements in the array bar (used as an argument of the function) and it contains a string then the length of the string is evidently is less than 12. So the array MyArray can accept all characters of the source array including the terminating zero.
So the reason for the confusion is that you did not make a common conclusion what the number 12 means whether it is the length of the source string or it is the size of the source array.
In the first case there will be indeed undefined behavior.
In the second case if the source array contains a string then the code will be well-formed.
A char array and string are similar, but not the same.
In C,
A string is a contiguous sequence of characters terminated by and including the first null character. C11dr §7.1.1 1
void simple (char *bar) {
char MyArray[12];
strcpy(MyArray, bar);
}
My instructor says that MyArray can copy at most 12 elements from bar,
This is correct: MyArray[] can receive up to 12 characters.
strcpy() copies the memory, starting at bar to the array MyArray[] and continues until it copies a null character. If more that 12 characters (the count of 12 includes the null character) are attempted to be copied, the result is undefined behavior (UB).
MyArray can only store 11 characters
Not quite. MyArray[] can store 12 characters. To treat that data as a string, a null character must be one of those 12. When interpreted as a string, the string include all the characters up to the null character. It also include the null chracter. Each element of MyArray[] could be an 'x', but then that memory would not be a string as it lacks a null character.
So if the received value of bar is 12 or greater, a buffer overflow would occur.
Not quite. If the strcpy() attempts to write outside MyArray[], the result is undefined. Buffer overflow may occur. The program may stop, etc. The result is not defined. It is undefined behavior.
My instructor says that this will only happen if the received value of bar is 13 or greater.
bar is a pointer - it likely does not have a "value of 13". bar likely points to memory that is a string. A string includes its terminating null character, so the string may consists of 12 non-null characters and a final null character for a total of 13 characters. MyArray[] is insufficient to store a copy of that string.
Who's right?
I suspect the dis-connect is in the imprecise meaning of "bar is 13"`. I see nothing the reported by the instructor as incorrect.
Here's what the Beez C guide (LINK) tells about the %[] format specifier:
It allows you to specify a set of characters to be stored away (likely in an array of chars). Conversion stops when a character that is not in the set is matched.
I would appreciate if you can clarify some basic questions that arise from this premise:
1) Are the input fetched by those two format specifiers stored in the arguments(of type char*) as a character array or a character array with a \0 terminating character (string)? If not a string, how to make it store as a string , in cases like the program below where we want to fetch a sequence of characters as a string and stop when a particular character (in the negated character set) is encountered?
2) My program seems to suggest that processing stops for the %[^|] specifier when the negated character | is encountered.But when it starts again for the next format specifier,does it start from the negated character where it had stopped earlier?In my program I intend to ignore the | hence I used %*c.But I tested and found that if I use %c and an additional argument of type char,then the character | is indeed stored in that argument.
3) And lastly but crucially for me,what is the difference between passing a character array for a %s format specifier in printf() and a string(NULL terminated character array)?In my other program titled character array vs string,I've passed a character array(not NULL terminated) for a %s format specifier in printf() and it gets printed just as a string would.What is the difference?
//Program to illustrate %[^] specifier
#include<stdio.h>
int main()
{
char *ptr="fruit|apple|lemon",type[10],fruit1[10],fruit2[10];
sscanf(ptr, "%[^|]%*c%[^|]%*c%s", type,fruit1, fruit2);
printf("%s,%s,%s",type,fruit1,fruit2);
}
//character array vs string
#include<stdio.h>
int main()
{
char test[10]={'J','O','N'};
printf("%s",test);
}
Output JON
//Using %c instead of %*c
#include<stdio.h>
int main()
{
char *ptr="fruit|apple|lemon",type[10],fruit1[10],fruit2[10],char_var;
sscanf(ptr, "%[^|]%c%[^|]%*c%s", type,&char_var,fruit1, fruit2);
printf("%s,%s,%s,and the character is %c",type,fruit1,fruit2,char_var);
}
Output fruit,apple,lemon,and the character is |
It is null terminated. From sscanf():
The conversion specifiers s and [ always store the null terminator in addition to the matched characters. The size of the destination array must be at least one greater than the specified field width.
The excluded characters are unconsumed by the scan set and remain to be processed. An alternative format specifier:
if (sscanf(ptr, "%9[^|]|%9[^|]|%9s", type,fruit1, fruit2) == 3)
The array is actually null terminated as remaining elements will be zero initialized:
char test[10]={'J','O','N' /*,0,0,0,0,0,0,0*/ };
If it was not null terminated then it would keep printing until a null character was found somewhere in memory, possibly overruning the end of the array causing undefined behaviour. It is possible to print a non-null terminated array:
char buf[] = { 'a', 'b', 'c' };
printf("%.*s", 3, buf);
1) Are the input fetched by those two format specifiers stored in the
arguments(of type char*) as a character array or a character array
with a \0 terminating character (string)? If not a string, how to make
it store as a string , in cases like the program below where we want
to fetch a sequence of characters as a string and stop when a
particular character (in the negated character set) is encountered?
They're stored in ASCIIZ format - with a NUL/'\0' terminator.
2) My program seems to suggest that processing stops for the %[^|]
specifier when the negated character | is encountered.But when it
starts again for the next format specifier,does it start from the
negated character where it had stopped earlier?In my program I intend
to ignore the | hence I used %*c.But I tested and found that if I use
%c and an additional argument of type char,then the character | is
indeed stored in that argument.
It shouldn't consume the next character. Show us your code or it didn't happen ;-P.
3) And lastly but crucially for me,what is the difference between
passing a character array for a %s format specifier in printf() and a
string(NULL terminated character array)?In my other program titled
character array vs string,I've passed a character array(not NULL
terminated) for a %s format specifier in printf() and it gets printed
just as a string would.What is the difference?
(edit: the following addresses the question above, which talks about array behaviours generally and is broader than the code snippet in the question that specifically posed the case char[10] = "abcd"; and is safe)
%s must be passed a pointer to a ASCIIZ text... even if that text is explicitly in a char array, it's the mandatory presence of the NUL terminator that defines the textual content and not the array length. You must NUL terminate your character array or you have undefined behaviour. You might get away with it sometimes - e.g. strncpy into the array will NUL terminate it if-and-only-if there's room to do so, and static arrays start with all-0 content so if you only overwrite before the final character you'll have a NUL, your char[10] example happens to have elements for which values aren't specified populated with NULs, but you should generally take responsibility for ensuring that something is ensuring NUL termination.
Can I do
write(&'\n', 1);
and is it equivalent to
char a = '\n';
write(&a, 1);
How would you solve this in a fashion way?
I'm tring to write the new-line caracter with a function that only take char array as first argument, and its dimension in second argument (dimension has to be specified because \0 is a valid writable character)
As others already pointed out, you cannot take the address of a character literal.
And even if you could, it would be the wrong type, because a character array usually must be zero-terminated.
What you are looking for is:
write("\n", 1);
The reason that you can't take the address of a character literal, is because the compiler actually replaces it by the characters actual ASCII value, in this case 10. So your code actually is:
write(&10, 1);
And you can't take the address of a number literal, because it's most likely not stored in memory but part of the actual generated code.
You can do this:
write("\n", 1);
What you're asking is for two different things:
'\n' is a character literal, and will be treated as an integer literal.
char a is a character variable.
you can get the address of a variable, not a literal.
char label[8] = "abcdefgh";
char arr[7] = "abcdefg";
printf("%s\n",label);
printf("%s",arr);
====output==========
abcdefgh
abcdefgÅ
Why Å is appended at the end of the string arr?
I am running C code in Turbo C ++.
printf expects NUL-terminated strings. Increase the size of your char arrays by one to make space for the terminating NUL character (it is added automatically by the = "..." initializer).
If you don't NUL-terminate your strings, printf will keep reading until it finds a NUL character, so you will get a more or less random result.
Your variables label and arr are not strings. They are arrays of characters.
To be strings (and for you to be able to pass them to functions declared in <string.h>) they need a NUL terminator in the space reserved for them.
Definition of "string" from the Standard
7.1.1 Definitions of terms
1 A string is a contiguous sequence of characters terminated by and including
the first null character. The term multibyte string is sometimes used
instead to emphasize special processing given to multibyte characters
contained in the string or to avoid confusion with a wide string. A pointer
to a string is a pointer to its initial (lowest addressed) character. The
length of a string is the number of bytes preceding the null character and
the value of a string is the sequence of the values of the contained
characters, in order.
Your string is not null terminated, so printf is running into junk data. You need to use the '\0' at the end of the string.
Using GCC (on Linux), it prints more garbage:
abcdefgh°ÃÕÄÕ¿UTÞÄÕ¿UTÞ·
abcdefgabcdefgh°ÃÕÄÕ¿UTÞÄÕ¿UTÞ·
This is because, you are printing two character arrays as strings (using %s).
This works fine:
char label[9] = "abcdefgh\0"; char arr[8] = "abcdefg\0";
printf("%s\n",label); printf("%s",arr);
However, you need not mention the "\0" explicitly. Just make sure the array size is large enough, i.e 1 more than the number of characters in your strings.