Printf outputs characters beyond the specified length of the array

I tried this chunk of code:
char string_one[8], string_two[8];
printf("&string_one == %p\n", (void *)&string_one);
printf("&string_two == %p\n", (void *)&string_two);
strcpy(string_one, "Hello!");
strcpy(string_two, "Long string");
printf("string_one == %s\n", string_one);
printf("string_two == %s\n", string_two);
And got this output:
&string_one == 0x7fff3f871524
&string_two == 0x7fff3f87151c
string_one == ing
string_two == Long string
Since the second string is longer than the array that holds it ("Long string" needs 12 bytes including the terminator, but the array has 8), the characters whose indices exceed the array's size are stored in the following bytes, which, as the addresses show, belong to the first array. Obviously, the first string is overwritten.
There is no way the second array can hold the whole string; it is too big. Nevertheless, printf prints the whole string.
After speculating for a while, I came to the conclusion that printf() keeps outputting characters from the following bytes until it comes across the string terminator '\0'. I did not find any confirmation for this, so my question is: is my speculation correct?

From the C Standard (5.2.1 Character sets)
2 In a character constant or string literal, members of the execution
character set shall be represented by corresponding members of the
source character set or by escape sequences consisting of the
backslash \ followed by one or more characters. A byte with all bits
set to 0, called the null character, shall exist in the basic
execution character set; it is used to terminate a character string.
And (7.21.6.1 The fprintf function)
8 The conversion specifiers and their meanings are:
s If no l length modifier is present, the argument shall be a pointer
to the initial element of an array of character type.273) Characters
from the array are written up to (but not including) the terminating
null character.
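To make the quoted rule concrete, here is a minimal sketch of what a %s conversion does with its argument. The names emit_string and emit_string_max are hypothetical, and this is not how any real printf is implemented; the point is only that the loop stops at the first '\0' (or after the precision), never at the end of the array the pointer came from.

```c
#include <stddef.h>

/* Hypothetical stand-in for "%s": copy characters from s into out until the
   first '\0'. Nothing here knows how big the array behind s is, which is why
   an overflowed or unterminated array keeps "printing" into whatever bytes
   follow it. out must have room for the result plus a terminator. */
size_t emit_string(char *out, const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        out[n] = s[n];
        n++;
    }
    out[n] = '\0';
    return n; /* number of characters "printed" */
}

/* Hypothetical stand-in for "%.*s": stop after max characters even if no
   '\0' was seen, so s does not need to be null-terminated. */
size_t emit_string_max(char *out, const char *s, size_t max)
{
    size_t n = 0;
    while (n < max && s[n] != '\0') {
        out[n] = s[n];
        n++;
    }
    out[n] = '\0';
    return n;
}
```

In the question's output, string_two's bytes continue into string_one, and the first '\0' is the one strcpy wrote at the end of "Long string", so the whole literal is printed.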

My compiler (GCC) said:
warning: ‘__builtin_memcpy’ writing 12 bytes into a region of size 8 overflows the destination [-Wstringop-overflow=]
strcpy(string_two, "Long string");
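GCC's warning points at the real bug: "Long string" needs 12 bytes and string_two has 8. One way to avoid the overflow, sketched here under the assumption that truncation is acceptable, is to copy with snprintf, which never writes more than the size it is given and always terminates the result when that size is nonzero. copy_bounded is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/* Copy src into dst (dst_size bytes) without overflowing it. snprintf returns
   the length the full result would have had, so a return value >= dst_size
   means the copy was truncated.
   Returns 1 if src was truncated, 0 if it fit, -1 on an encoding error. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0)
        return -1;
    return (size_t)needed >= dst_size ? 1 : 0;
}
```

With char string_two[8], copy_bounded(string_two, sizeof string_two, "Long string") stores "Long st" and reports the truncation instead of writing into string_one.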

And just to show how optimizations will take everything that you think you know and turn it on its head, here's what happens if you compile this on a 64-bit PowerPC Power-9 (aka not x86) with gcc -O3 -flto:
$ ./char-array-overlap
&string_one == 0x7fffc502bef0
&string_two == 0x7fffc502bef8
string_one == Hello!
string_two == Long string
This is because, if you look at the machine code, it never executes strcpy at all.

Related

Why doesn't strlen() return the correct length for a hex string?

I have a hex string, for example \xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF. When I use the strlen() function to get the length of that hex string, it returns 4!
const char string_[] = { "\xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF" };
size_t string_length = strlen(string_);
printf("%zu", string_length); // the result: 4
Is strlen() treating that hex data as a string, or am I misunderstanding something?
For string functions in the C standard library, a character with value zero, also called a null character, marks the end of a string. Your string contains \x00, which designates a null character, so the string ends there. There are four non-null characters before it, so strlen returns four.
C 2018 7.1.1 1 says:
A string is a contiguous sequence of characters terminated by and including the first null character… The length of a string is the number of bytes preceding the null character…
C 2018 7.24.6.3 2 says:
The strlen function computes the length of the string pointed to by s [its first argument].
You could compute the size of your array as sizeof string_ (because it is an array of char) or sizeof string_ / sizeof *string_ (to compute the number of elements regardless of type), but this will include a terminating null character, because defining an array with [] and letting the length be computed from a string literal initializer includes the terminating null character of the string literal. For your array the result is 11, not 10. You may need to hard-code the length of the array, possibly using #define to define a preprocessor macro, and use that length in the array definition and in other places where the length is needed.
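A minimal sketch of that suggestion, assuming the data really is 10 bytes: the length lives in a hypothetical HEX_LEN macro and the bytes are listed in an unsigned char array, so no string semantics are involved. For comparison it keeps the original literal and reports what strlen and sizeof give for it; the helper function names are also hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: the real byte count, stated once and reused. */
#define HEX_LEN 10

/* The bytes listed explicitly: no string semantics, no implicit '\0'. */
static const unsigned char hex_bytes[HEX_LEN] = {
    0xF5, 0x17, 0x30, 0x91, 0x00, 0xA1, 0xC9, 0x00, 0xDF, 0xFF
};

/* The question's literal, kept for comparison; the compiler appends '\0'. */
static const char hex_string[] = "\xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF";

/* strlen() stops at the first zero byte, which is at index 4. */
size_t string_length(void) { return strlen(hex_string); }

/* sizeof counts every element, including the appended '\0': 11. */
size_t literal_array_size(void) { return sizeof hex_string; }

/* The explicit array holds exactly HEX_LEN bytes. */
size_t byte_array_size(void) { return sizeof hex_bytes; }
```

Functions that process such data then take a pointer plus HEX_LEN rather than relying on a terminator.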
It is because you have a zero at index [4]:
string_[0] == 0xF5
string_[1] == 0x17
string_[2] == 0x30
string_[3] == 0x91
string_[4] == 0
...
"\xf5" puts a char with the value 0xf5 at position [0].
To treat it as text (the characters \, x, F, 5 and so on), you need to escape the \ character:
const char string_[] = "\\xF5\\x17\\x30\\x91\\x00\\xA1\\xC9\\x00\\xDF\\xFF";
At compile time, your "string" appears as consecutive hex values expressed in C syntax inside a pair of quotation marks.
strlen() is a run-time function that scans through a series of bytes, looking for the first zero-valued byte.
It's good to understand the difference between "compile time" and "run time".

Simple single char array encryption needs an artificially long array to work?

Running a simple encryption on a single char array. It doesn't seem to work when the array size is less than or equal to 1, even though only a single char is changing.
The code below works when yesCrypto is declared with size 10 (or anything greater than 1):
char noCrypto[] = "H"; //sets an array to hold unencrypted H
char yesCrypto[10]; //sets array to hold encrypted H
yesCrypto[0]=noCrypto[0]+1;
//takes 'H' from noCrypto and turns it into an 'I' and moves it into yesCrypto.
printf("Encrypted string is '%s'\n", yesCrypto);
//prints Encrypted version of 'H', 'I'
The code below does not work when yesCrypto is declared with size 0, and it also does not work with size 1:
char noCrypto[] = "H"; //sets an array to hold unencrypted H
char yesCrypto[1]; //sets array to hold encrypted H
yesCrypto[0]=noCrypto[0]+1;
//takes 'H' from noCrypto and turns it into an 'I' and moves it into yesCrypto.
printf("Encrypted string is '%s'\n", yesCrypto);
//prints 'IH'
Side question: why does it print IH when it is not working properly?
The code attempts to print, using "%s", a character array that is not a string.
yesCrypto[] is not necessarily null-terminated.
char yesCrypto[10];
yesCrypto[0] = noCrypto[0]+1;
printf("Encrypted string is '%s'\n", yesCrypto); // bad
Instead, limit the number of characters printed, or append a null character:
// 1 is the maximum number of characters to print
printf("Encrypted string is '%.*s'\n", 1, yesCrypto);
// or
yesCrypto[1] = '\0';
printf("Encrypted string is '%s'\n", yesCrypto);
OP's second snippet is simply wrong, because an array object of length 0 has no defined behavior.
// bad
char yesCrypto[0];
OP's edited post uses char yesCrypto[1];. In that case use
yesCrypto[0] = noCrypto[0]+1;
printf("Encrypted string is '%.*s'\n", 1, yesCrypto);
// or
printf("Encrypted character is '%c'\n", yesCrypto[0]);
Fundamentally, printing encrypted data as a string is a problem: the encrypted character array may contain null characters in any number of places, while a string requires a null character and ends at the first one.
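Following that advice, here is a minimal sketch that treats the output of the "+1" cipher as raw bytes with an explicit length and prints them as hex rather than with %s. shift_encrypt and to_hex are hypothetical helpers, and the "+1" shift is the question's toy cipher, not real encryption. Note how 0xFF wraps to 0x00, a null byte that would end a %s string early.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy "+1" cipher from the question, applied to n bytes. unsigned char
   arithmetic wraps modulo 256, so 0xFF becomes 0x00: a null byte. */
void shift_encrypt(const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(in[i] + 1);
}

/* Write n bytes as space-separated hex ("49 00") into out, which must hold
   at least 3 * n + 1 chars. The length comes from n, never from a '\0'. */
void to_hex(const unsigned char *data, size_t n, char *out)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        sprintf(out + 3 * i, "%02X ", (unsigned)data[i]);
    if (n > 0)
        out[3 * n - 1] = '\0'; /* drop the trailing space */
}
```

Encrypting the two bytes 'H' and 0xFF and passing the result through to_hex gives the text 49 00 (on an ASCII system), which can then be printed safely with %s.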
In the first case, you're supplying an array (as an argument to %s) which is not null-terminated.
Quoting C11, chapter §7.21.6.1,
s
If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type.280) Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
In this case, yesCrypto is an automatic local array that is left uninitialized, so its contents are indeterminate and there's no guarantee that a null character is present in the array. So the usage causes undefined behavior.
What you're seeing in the second case is undefined behavior, too.
Quoting C11, chapter §6.7.6.2
In addition to optional type qualifiers and the keyword static, the [ and ] may delimit
an expression or *. If they delimit an expression (which specifies the size of an array), the
expression shall have an integer type. If the expression is a constant expression, it shall
have a value greater than zero. [...]
So the later code (containing char yesCrypto[0];) violates a constraint and invokes undefined behavior.
A note on why this might not produce a compilation error:
gcc does have an extension that supports zero-length arrays, but its use case is very specific, and since C99 the standardized "flexible array member" is the preferred choice over this extension.
Finally, for
...also does not work when set to 1....
will lack the space for a null terminator, raising the same issue as in the very first case. In simple terms, for a char array to behave like a string containing n characters, you need:
the array size to be at least n+1, and
index n to contain a null character ('\0').
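Applied to the one-character case from the question, the n+1 rule can be sketched like this. make_string is a hypothetical helper, and the sketch assumes the caller knows how many characters n it wants to keep.

```c
#include <stddef.h>
#include <string.h>

/* Store the first n characters of src in dst as a proper string.
   dst needs room for n + 1 chars: n for the characters and one for '\0'.
   Returns 0 on success, -1 if dst is too small (dst is left untouched). */
int make_string(char *dst, size_t dst_size, const char *src, size_t n)
{
    if (dst_size < n + 1)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}
```

With char yesCrypto[1] the call fails because there is no room for the terminator; with char yesCrypto[2] it succeeds and the result can be printed with %s.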

Storing a format specifier in a pointer

What actually happens when I do this?
#include <stdio.h>
int main(void)
{
const char *str = "%d\n";
str++;
str++;
printf(str-2,300);
return 0;
}
Intuitively, it appears that the number on the screen will be 300, but I want to know what gets stored in str.
Edit: It would be great if someone could tell me when we would actually do this.
Thanks!
str holds a memory address: initially the address of the % sign of the string literal "%d\n". The literal exists because it appears in your code.
Two increments make str point to the character \n, and at that point str - 2 is the address of the % sign. So printf sees the format string %d\n and, as usual, prints the first argument after the format string as an integer. The fact is, printf does not care where the format string came from. It doesn't matter whether you create it on the fly or hard-code it.
We don't generally do this. Sometimes, though, you need to fiddle with a character pointer to scan a string, extract something from a string, or skip some prefix of a string.
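As an illustration of the "skip some prefix" case, here is a minimal sketch that advances a char pointer past a known prefix instead of copying the rest of the string. skip_prefix is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* If s starts with prefix, return a pointer just past the prefix (into the
   same string, no copy is made); otherwise return NULL. */
const char *skip_prefix(const char *s, const char *prefix)
{
    size_t len = strlen(prefix);
    if (strncmp(s, prefix, len) != 0)
        return NULL;
    return s + len; /* pointer arithmetic, like str++ in the question */
}
```

For "key=value" and the prefix "key=", the returned pointer is the address of the 'v', and the string it points to is "value", still terminated by the original '\0'.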
str is a pointer on the stack. It initially points to (i.e., holds the address of) the start of the string literal "%d\n" (this is probably stored in a read-only section of your program by the compiler).
Let's say, for example, the string literal (the "%d\n") is stored at 0x5000. So (assuming UTF-8 or ASCII) the byte at 0x5000 is %, the byte at 0x5001 is d, 0x5002 is \n (the newline) and 0x5003 is \0 (the terminating null character).
str initially holds the address 0x5000. str++ increases it to 0x5001, meaning it now points to the string "d\n", i.e. one character into the string literal "%d\n". Likewise, str++ again moves it to 0x5002, i.e. the string "\n", two characters into the string literal "%d\n". Note that all of these are still terminated by the null character at 0x5003 (which is how C knows where the string ends).
The printf call has the format string as the first argument. str at this point holds 0x5002, so the call is saying 'Use the format string starting at 0x5002 - 2 = 0x5000', which turns out to be the same string that we started with.
Thus it will be the same as calling
printf("%d\n",300)
and will print out 300.
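The walk-through above can be checked with a small sketch that performs the same pointer steps. The function name format_via_offset is hypothetical, and snprintf stands in for printf only so that the output lands in a buffer where it can be inspected.

```c
#include <stddef.h>
#include <stdio.h>

/* Replays the question's steps but formats into out instead of printing.
   Returns what snprintf returns: the number of characters produced. */
int format_via_offset(char *out, size_t out_size, int value)
{
    const char *str = "%d\n"; /* points at '%' (literals must not be modified) */
    str++;                    /* points at 'd':  the string "d\n" */
    str++;                    /* points at '\n': the string "\n"  */
    /* str - 2 is the address of '%' again, so the format is "%d\n". */
    return snprintf(out, out_size, str - 2, value);
}
```

Called with 300, the buffer receives "300\n" and the function returns 4, exactly as printf("%d\n", 300) would print.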
Well, you are declaring a char pointer. This pointer holds the RAM address where the following bytes are stored: % (1 byte), d (1 byte), \n (1 byte in memory on every platform, although text-mode output on Windows translates it to two bytes, CR LF) and \0, the null terminating byte that ends your string.
Then you increment the pointer by two, and in the printf() call you pass str - 2, which subtracts those two again. So, in effect, the call sees the original address. Thus when calling printf(), str - 2 points to %d\n, and the null terminating byte makes the format string exactly %d\n.
So at the end of the day what you are doing is:
printf("%d\n", 300); Hence the 300 output.

C: sizeof() related doubts?

#include <stdio.h>
#include <string.h>
int main(void)
{
printf("%zu \n ",sizeof(' '));
printf("%zu ",sizeof(""));
}
output:
4
1
Why is the output 4 for the first printf? Moreover, if I write '' it shows "error: empty character constant", but empty double quotes, i.e. without any space, are fine with no error. Why?
The ' ' is an example of an integer character constant, which has type int (it's not converted; it simply has that type). The second, "", is a string literal, which contains only one character, i.e. the null character, and since sizeof(char) is guaranteed to be 1, the size of the whole array is 1 as well.
' ' is an integer character constant, so it has type int (hence 4 bytes on your machine); "" is an empty string literal, a character array that still holds 1 byte, the terminating '\0'.
Check the difference below:
#include<stdio.h>
int main()
{
char a= 'b';
printf("%zu %zu %zu", sizeof(a),sizeof('b'), sizeof("a"));
return 0;
}
Here a is defined as a char, whose size is 1 byte.
But 'b' is a character constant. A character constant is an integer: its value is the numeric value of the character in the machine's character set. The size of a character constant is therefore the size of int, which is 4 bytes here.
"a" is a string literal: a character array whose size is the number of characters plus one for the terminating '\0' (the null character). Here it is 2.
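The three cases can also be checked at compile time. This minimal sketch assumes a C11 compiler (for _Static_assert); size_of_char_object is a hypothetical helper.

```c
#include <stddef.h>

/* Compile-time checks of the three cases above. */
_Static_assert(sizeof 'b' == sizeof(int), "in C, 'b' has type int");
_Static_assert(sizeof "a" == 2, "string literal \"a\" is char[2]");
_Static_assert(sizeof "" == 1, "empty string literal is char[1]");

/* A char object, by contrast, always has size 1. */
size_t size_of_char_object(void)
{
    char a = 'b';
    return sizeof a;
}
```

To print such values, use %zu, the conversion for size_t, e.g. printf("%zu\n", sizeof ' '). Compiled as C++, the first assertion fails, because there 'b' has type char.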
This is answered in Size of character ('a') in C/C++
In C, the type of a character constant like 'a' is actually an int, with size of 4 (or some other implementation-dependent value). In C++, the type is char, with size of 1. This is one of many small differences between the two languages.
A character constant such as ' ' (or any single character in single quotes) actually has type int, with a value equal to the character's code (its ASCII value on most systems). So its size is the size of int, 4 bytes here.
Only when you create a char variable and store the character in it is the character stored in 1 byte of memory.
char ch;
ch=' ';
printf("%zu",sizeof(ch));
//outputs 1
For anything to be a string, it must be terminated with a null character represented as '\0'.
If we write the string "hello", it is actually stored as 'h' 'e' 'l' 'l' 'o' '\0', so that the system knows the string ends after the 'o' in "hello"; it stops reading when the null character comes. strlen() still reports a length of 5, but sizeof("hello") is actually 6 bytes.
When we create an empty string, like "", its length is 0 but its size is 1 byte, as it must terminate where it starts, i.e. at the 0th character.
Hence an empty string consists of only one character, that is null character, giving size 1 byte.
From C Traps and Pitfalls
Single and double quotes mean very different things in C.
A character enclosed in single quotes is just another way of writing the integer that corresponds to the given character in an ASCII implementation. Thus ' ' means exactly the same thing as 32.
On the other hand, a string enclosed in double quotes is a short-hand way of writing a pointer to the initial character of a nameless array that has been initialized with the characters between the quotes and an extra character whose binary value is zero. Thus writing "", the empty string, still gives an array holding the '\0' character, whose size is one.
In the first case there is a character constant, which has type int (its value is the character's ASCII code), so sizeof gives the size of int, i.e. 4.
In the second case sizeof operates on a string literal. The string has no characters, only the terminating null character, so it is an empty string of size 1, which is why the answer is 1.

Don't the %[] and %[^] specifiers in scanf(), sscanf() and fscanf() store the input in a null-terminated character array?

Here's what the Beez C guide (LINK) says about the %[] format specifier:
It allows you to specify a set of characters to be stored away (likely in an array of chars). Conversion stops when a character that is not in the set is matched.
I would appreciate if you can clarify some basic questions that arise from this premise:
1) Is the input fetched by those two format specifiers stored in the arguments (of type char*) as a plain character array or as a character array with a \0 terminating character (a string)? If not as a string, how do I make it store a string, in cases like the program below, where we want to fetch a sequence of characters as a string and stop when a particular character (in the negated character set) is encountered?
2) My program seems to suggest that processing for the %[^|] specifier stops when the negated character | is encountered. But when processing resumes for the next format specifier, does it start from the negated character where it stopped earlier? In my program I intend to ignore the |, hence I used %*c. But I tested and found that if I use %c and an additional argument of type char, the character | is indeed stored in that argument.
3) And lastly, but crucially for me, what is the difference between passing a character array for a %s format specifier in printf() and passing a string (a null-terminated character array)? In my other program, titled "character array vs string", I've passed a character array (not null-terminated) for a %s format specifier in printf(), and it gets printed just as a string would. What is the difference?
//Program to illustrate %[^] specifier
#include<stdio.h>
int main()
{
char *ptr="fruit|apple|lemon",type[10],fruit1[10],fruit2[10];
sscanf(ptr, "%[^|]%*c%[^|]%*c%s", type,fruit1, fruit2);
printf("%s,%s,%s",type,fruit1,fruit2);
}
//character array vs string
#include<stdio.h>
int main()
{
char test[10]={'J','O','N'};
printf("%s",test);
}
Output JON
//Using %c instead of %*c
#include<stdio.h>
int main()
{
char *ptr="fruit|apple|lemon",type[10],fruit1[10],fruit2[10],char_var;
sscanf(ptr, "%[^|]%c%[^|]%*c%s", type,&char_var,fruit1, fruit2);
printf("%s,%s,%s,and the character is %c",type,fruit1,fruit2,char_var);
}
Output fruit,apple,lemon,and the character is |
It is null terminated. From sscanf():
The conversion specifiers s and [ always store the null terminator in addition to the matched characters. The size of the destination array must be at least one greater than the specified field width.
Characters excluded by the scan set are not consumed; they remain to be processed by the next directive. An alternative format specifier:
if (sscanf(ptr, "%9[^|]|%9[^|]|%9s", type,fruit1, fruit2) == 3)
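Put together, a minimal sketch of that format with the return-value check. The buffer size 10 matches the question's arrays, and parse_fruits is a hypothetical helper name.

```c
#include <stdio.h>

/* Split "type|fruit1|fruit2" into three buffers of 10 chars each.
   %9[^|] stores at most 9 characters plus the '\0' that sscanf always
   appends; the literal '|' in the format consumes the separator that the
   scan set left unread. Returns 1 only if all three fields were stored. */
int parse_fruits(const char *in, char type[10], char fruit1[10], char fruit2[10])
{
    return sscanf(in, "%9[^|]|%9[^|]|%9s", type, fruit1, fruit2) == 3;
}
```

For "fruit|apple|lemon" each buffer receives a five-letter word followed by '\0' at index 5; for input without a '|', sscanf stops early and the function returns 0.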
The array is actually null terminated as remaining elements will be zero initialized:
char test[10]={'J','O','N' /*,0,0,0,0,0,0,0*/ };
If it were not null-terminated, then printf would keep printing until a null character was found somewhere in memory, possibly overrunning the end of the array and causing undefined behaviour. It is possible to print a non-null-terminated array:
char buf[] = { 'a', 'b', 'c' };
printf("%.*s", 3, buf);
1) Are the input fetched by those two format specifiers stored in the
arguments(of type char*) as a character array or a character array
with a \0 terminating character (string)? If not a string, how to make
it store as a string , in cases like the program below where we want
to fetch a sequence of characters as a string and stop when a
particular character (in the negated character set) is encountered?
They're stored in ASCIIZ format - with a NUL/'\0' terminator.
2) My program seems to suggest that processing stops for the %[^|]
specifier when the negated character | is encountered.But when it
starts again for the next format specifier,does it start from the
negated character where it had stopped earlier?In my program I intend
to ignore the | hence I used %*c.But I tested and found that if I use
%c and an additional argument of type char,then the character | is
indeed stored in that argument.
It shouldn't consume the next character. Show us your code or it didn't happen ;-P.
3) And lastly but crucially for me,what is the difference between
passing a character array for a %s format specifier in printf() and a
string(NULL terminated character array)?In my other program titled
character array vs string,I've passed a character array(not NULL
terminated) for a %s format specifier in printf() and it gets printed
just as a string would.What is the difference?
(edit: the following addresses the question above, which talks about array behaviours generally and is broader than the code snippet in the question that specifically posed the case char[10] = "abcd"; and is safe)
%s must be passed a pointer to ASCIIZ text. Even if that text is explicitly in a char array, it's the mandatory presence of the NUL terminator that defines the textual content, not the array length. You must NUL-terminate your character array or you have undefined behaviour. You might get away with it sometimes: e.g. strncpy into the array will NUL-terminate it if and only if there's room to do so; static arrays start with all-0 content, so if you only overwrite characters before the final one you'll have a NUL; and your char[10] example happens to have the elements whose values aren't specified populated with NULs. But you should generally take responsibility for ensuring that something guarantees NUL termination.
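A minimal sketch of the strncpy point above: strncpy leaves the array unterminated when the source does not fit, so a wrapper (the hypothetical copy_terminated) writes the final '\0' itself.

```c
#include <stddef.h>
#include <string.h>

/* strncpy only writes a '\0' if src is shorter than the count it is given,
   so this wrapper always terminates dst itself. dst_size must be >= 1. */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

By contrast, a plain strncpy(raw, "abcdef", sizeof raw) into char raw[3] fills all three bytes with 'a', 'b', 'c' and leaves no terminator, so passing raw to %s would be undefined behaviour.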

Resources