C: sizeof() related doubts?

#include <stdio.h>
#include <string.h>
int main(void)
{
    printf("%zu\n", sizeof(' '));
    printf("%zu\n", sizeof(""));
    return 0;
}
output:
4
1
Why is the output 4 for the first printf? Moreover, if I write '' the compiler reports "error: empty character constant", but an empty pair of double quotes (with no space inside) compiles without any error. Why?

' ' is an example of an integer character constant, which has type int (it is not converted; it has that type). The second, "", is a string literal that contains only one character, the null character, and since sizeof(char) is guaranteed to be 1, the size of the whole array is 1 as well.

' ' is an integer character constant of type int (hence 4 bytes on your machine); "" is an empty string literal, which is still an array holding the 1-byte terminator '\0'.

Compare the three sizes in the program below:
#include <stdio.h>
int main(void)
{
    char a = 'b';
    printf("%zu %zu %zu", sizeof(a), sizeof('b'), sizeof("a"));
    return 0;
}
Here a is declared as a char, whose size is 1 byte.
But 'b' is a character constant. In C a character constant has type int, and its value is the numeric value of the character in the machine's character set. So sizeof('b') is sizeof(int), which is 4 bytes on most current platforms.
"a" is a string literal: an array of char whose size is the number of characters plus the terminating null character '\0'. Here that is 2.

This is answered in Size of character ('a') in C/C++
In C, the type of a character constant like 'a' is actually an int, with size of 4 (or some other implementation-dependent value). In C++, the type is char, with size of 1. This is one of many small differences between the two languages.

A space constant ' ', or any single-character constant, actually has type int, and its value is the code of that character (its ASCII value on ASCII systems). So its size is sizeof(int), which is 4 bytes on most machines.
Only when you create a char variable and store the character in it does it occupy 1 byte of memory.
char ch;
ch=' ';
printf("%zu", sizeof(ch));
//outputs 1
For anything to be a string, it must be terminated with a null character, written '\0'.
If we write the string "hello", it is actually stored as 'h' 'e' 'l' 'l' 'o' '\0', so that the system knows the string ends after the 'o' and stops reading when it reaches the null character. The length of this string is still 5 if you use the strlen() function, but sizeof("hello") is 6 bytes.
When we create an empty string, "", its length is 0 but its size is 1 byte, because it must terminate where it starts, i.e. at character 0.
Hence an empty string consists of only one character, the null character, giving a size of 1 byte.

From C Traps and Pitfalls
Single and double quotes mean very different things in C.
A character enclosed in single quotes is just another way of writing the integer that corresponds to the given character in an ASCII implementation. Thus ' ' means exactly the same thing as 32.
On the other hand, a string enclosed in double quotes is a shorthand way of writing a pointer to the initial character of a nameless array that has been initialized with the characters between the quotes and an extra character whose binary value is zero. Thus "", the empty string, still contains the '\0' character, so its size is one.

In the first case the operand is a character constant. A character constant has type int (its value is the character's code, e.g. its ASCII value), so sizeof gives 4 on your machine.
In the second case the operand is a string literal with nothing between the quotes. It still contains the terminating null character, so the size of the empty string is 1, which is why the answer is 1.

Related

Why the strlen() function doesn't return the correct length for a hex string?

I have a hex string for example \xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF, when trying to use strlen() function to get the length of that hex string it returns 4!
const char string_[] = { "\xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF" };
size_t string_length = strlen(string_);
printf("%zu", string_length); // the result: 4
Is the strlen() function dealing with that hex as a string, or is something unclear to me?
For string functions in the C standard library, a character with value zero, also called a null character, marks the end of a string. Your string contains \x00, which designates a null character, so the string ends there. There are four non-null characters before it, so strlen returns four.
C 2018 7.1.1 1 says:
A string is a contiguous sequence of characters terminated by and including the first null character… The length of a string is the number of bytes preceding the null character…
C 2018 7.24.6.3 2 says:
The strlen function computes the length of the string pointed to by s [its first argument].
You could compute the size of your array as sizeof string_ (because it is an array of char) or sizeof string_ / sizeof *string_ (to compute the number of elements regardless of type), but this will include a terminating null character because defining an array with [] and letting the length be computed from a string literal initializer includes the terminating null character of the string literal. You may need to hard-code the length of the array, possibly using #define to define a preprocessor macro, and use that length in the array definition and in other places where the length is needed.
It is because you have zero at index [4]
string_[0] == 0xF5
string_[1] == 0x17
string_[2] == 0x30
string_[3] == 0x91
string_[4] == 0
...
"\xf5" puts char having integer value 0xf5 at position [0]
To see it as a string you need to escape the \ character
const char string_[] = "\\xF5\\x17\\x30\\x91\\x00\\xA1\\xC9\\x00\\xDF\\xFF";
At compile time, your "string" appears as consecutive hex values expressed in C syntax inside a pair of quotation marks.
strlen() is a run time function that scans through a series of bytes, looking for the first instance of a zero-value byte.
It's good to understand the difference between "compile time" and "run time".

Printf outputs characters beyond the specified length of the array

I tried this chunk of code:
char string_one[8], string_two[8];
printf("&string_one == %p\n", (void *)&string_one);
printf("&string_two == %p\n", (void *)&string_two);
strcpy(string_one, "Hello!");
strcpy(string_two, "Long string");
printf("string_one == %s\n", string_one);
printf("string_two == %s\n", string_two);
And got this output:
&string_one == 0x7fff3f871524
&string_two == 0x7fff3f87151c
string_one == ing
string_two == Long string
Since the second string is longer than the declared size of its array, the characters whose indexes exceed the array size are stored in the following bytes, which belong to the first array, as the addresses show. So the first string is overwritten.
There is no way the second array can hold the whole string; it is too big. Nevertheless, the output prints the whole string.
I thought about this for a while and concluded that printf() keeps outputting characters from the following bytes until it comes across a string terminator '\0'. I could not find any confirmation of this, so my question is: is this reasoning correct?
From the C Standard (5.2.1 Character sets)
2 In a character constant or string literal, members of the execution
character set shall be represented by corresponding members of the
source character set or by escape sequences consisting of the
backslash \ followed by one or more characters. A byte with all bits
set to 0, called the null character, shall exist in the basic
execution character set; it is used to terminate a character string.
And (7.21.6.1 The fprintf function)
8 The conversion specifiers and their meanings are:
s If no l length modifier is present, the argument shall be a pointer
to the initial element of an array of character type. Characters
from the array are written up to (but not including) the terminating
null character.
My compiler(GCC) said:
warning: ‘__builtin_memcpy’ writing 12 bytes into a region of size 8 overflows the destination [-Wstringop-overflow=]
strcpy(string_two, "Long string");
And just to show how optimizations will take everything that you think you know and turn it on its head, here's what happens if you compile this on a 64-bit PowerPC Power-9 (aka not x86) with gcc -O3 -flto
$ ./char-array-overlap
&string_one == 0x7fffc502bef0
&string_two == 0x7fffc502bef8
string_one == Hello!
string_two == Long string
Because if you look at the machine code it never executes strcpy at all.

Simple single char array encryption needs an artificially long array to work?

I'm running a simple encryption on a single char array. It doesn't seem to work when the array size is less than or equal to 1, even though only a single char is changing.
The code below works because yesCrypto is declared with size 10 (or any size > 1).
char noCrypto[] = "H"; //sets an array to hold unencrypted H
char yesCrypto[10]; //sets array to hold encrypted H
yesCrypto[0]=noCrypto[0]+1;
//takes 'H' from noCrypto and turns it into an 'I' and moves it into yesCrypto.
printf("Encrypted string is '%s'\n", yesCrypto);
//prints Encrypted version of 'H', 'I'
The code below does not work when yesCrypto is declared with size 0, and it also does not work with size 1.
char noCrypto[] = "H"; //sets an array to hold unencrypted H
char yesCrypto[1]; //sets array to hold encrypted H
yesCrypto[0]=noCrypto[0]+1;
//takes 'H' from noCrypto and turns it into an 'I' and moves it into yesCrypto.
printf("Encrypted string is '%s'\n", yesCrypto);
//prints 'IH'
Side question: why is it printing IH when it is not working properly?
The code attempts to print a character array that is not a string using "%s".
yesCrypto[] is not guaranteed to be null-terminated.
char yesCrypto[10];
yesCrypto[0] = noCrypto[0]+1;
printf("Encrypted string is '%s'\n", yesCrypto); // bad
Instead, limit printing or append a null character.
// 1 is the maximum number of characters to print
printf("Encrypted string is '%.*s'\n", 1, yesCrypto);
// or
yesCrypto[1] = '\0';
printf("Encrypted string is '%s'\n", yesCrypto);
OP's 2nd code is just bad as object arrays of length 0 lack defined behavior.
// bad
char yesCrypto[0];
OP's edited post uses char yesCrypto[1];. In that case use
yesCrypto[0] = noCrypto[0]+1;
printf("Encrypted string is '%.*s'\n", 1, yesCrypto);
// or
printf("Encrypted character is '%c'\n", yesCrypto[0]);
Fundamentally, printing encrypted data as a string is a problem as the encrypted character array may contain a null character in numerous places and a string requires a null character and ends with the first one.
In the first case, you're supplying an array (as an argument to %s) which is not null-terminated.
Quoting C11, chapter §7.21.6.1,
s
If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
In this case, yesCrypto being an automatic local array and left uninitialized, the contents are indeterminate, so there's no guarantee of a null being present in the array. So the usage causes undefined behavior.
What you're seeing in the second case is undefined behavior, too.
Quoting C11, chapter §6.7.6.2
In addition to optional type qualifiers and the keyword static, the [ and ] may delimit
an expression or *. If they delimit an expression (which specifies the size of an array), the
expression shall have an integer type. If the expression is a constant expression, it shall
have a value greater than zero. [...]
So the latter code (containing char yesCrypto[0];) is a constraint violation, and its behavior is undefined.
A note on why this might not produce a compilation error:
gcc does have an extension that supports zero-length arrays, but the use case is very specific, and since C99 the "flexible array member" is the standardized alternative to this extension.
Finally, the case described as
...also does not work when set to 1....
lacks the space for a null terminator, raising the same issue as in the very first case. To put it in simple words, to make a char array behave like a string containing n characters, you need
size of the array to be n+1
index n to contain a null character ('\0').

What if a null character is present in the middle of a string?

I understand that the end of a string is indicated by a null character, but I cannot understand the output of the following code.
#include <stdio.h>
#include <string.h>
int
main(void)
{
char s[] = "Hello\0Hi";
printf("%d %d", strlen(s), sizeof(s));
}
OUTPUT: 5 9
If strlen() detects the end of the string right after the o, then why doesn't sizeof() do the same thing? Even if it doesn't, isn't '\0' a null character (i.e., only one character), so shouldn't the answer be 8?
The sizeof operator does not give you the length of a string but instead the size of the type of its operand. Since in your code the operand is an array, sizeof is giving you the size of the array including both null characters.
If it were like this
const char *string = "This is a large text\0This is another string";
printf("%zu %zu\n", strlen(string), sizeof(string));
the result will be very different because string is a pointer and not an array.
Note: Use the "%zu" specifier for size_t which is what strlen() returns, and is the type of the value given by sizeof.
strlen() doesn't care about the actual size of the string. It looks for a null byte and stops when it sees the first null byte.
The sizeof operator, on the other hand, knows the total size. It doesn't care what bytes are in the string literal. The string could consist entirely of null bytes and sizeof would still give the correct size of the array (strlen() would return 0 in that case).
They are not comparable; they do different things.
If strlen() detects the end of the string at the end of o, then why doesn't sizeof() do the same thing?
strlen only works on strings (null-terminated character arrays), whereas sizeof works for every data type. sizeof gives the exact amount of memory occupied by any given object or type, whereas strlen gives the length of a string (NOT including the null terminator \0). So in normal cases, this holds for a typical character array s:
char s[] = "Hello";
strlen( s ) + 1 == sizeof( s ); // +1 for the \0
In your case it's different because you have a null terminator in the middle of the character array s:
char s[] = "Hello\0Hi";
Here, strlen detects the first \0 and gives the length as 5. sizeof, however, gives the total number of bytes needed to hold the whole character array, including both \0 characters, which is why it gives 9 as the second output.
strlen() computes the length of the string. It does this by returning the number of characters before (and not including) the first '\0' character. (See the manual page below.)
sizeof returns the number of bytes occupied by the given variable (or data type). Note that your example "Hello\0Hi" has 9 characters, but from your question it seems unclear where the ninth one comes from. Let me explain the given string first. Your example string is:
"Hello\0Hi"
This can be written as the following array:
['H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0']
Note the last '\0' character. When you use string quotes, the compiler ends the string with a '\0' character. This means "" is also ['\0'] and thus has 1 element.
BEWARE that sizeof does NOT return the number of elements in the array. It returns the number of bytes. A char is 1 byte, and therefore for a char array sizeof happens to equal the number of elements. But with any other data type this no longer holds: for example, calling sizeof on an int array holding [1, 2, 3, 4] would return 16 on a platform where int is 4 bytes, since the array has 4 elements.
BEWARE that passing an array as a parameter only passes a pointer. If you pass s to another function and call sizeof on the parameter there, it will return the size of the pointer, which is normally the same as sizeof(void *). This is a fixed size, independent of the array.
STRLEN(3)              BSD Library Functions Manual              STRLEN(3)
NAME
     strlen, strnlen -- find length of string
LIBRARY
     Standard C Library (libc, -lc)
SYNOPSIS
     #include <string.h>
     size_t
     strlen(const char *s);
     size_t
     strnlen(const char *s, size_t maxlen);
DESCRIPTION
     The strlen() function computes the length of the string s. The strnlen()
     function attempts to compute the length of s, but never scans beyond the
     first maxlen bytes of s.
RETURN VALUES
     The strlen() function returns the number of characters that precede the
     terminating NUL character. The strnlen() function returns either the
     same result as strlen() or maxlen, whichever is smaller.
SEE ALSO
     string(3), wcslen(3), wcswidth(3)
STANDARDS
     The strlen() function conforms to ISO/IEC 9899:1990 (``ISO C90'').
     The strnlen() function conforms to IEEE Std 1003.1-2008 (``POSIX.1'').
BSD                            February 28, 2009                            BSD
As the name itself implies, a string literal is a sequence of characters enclosed in double quotes. A terminating zero is implicitly appended to this sequence of characters.
So any character enclosed in the double quotes is a part of the string literal.
When a string literal is used to initialize a character array all its characters including the terminating zero serve as initializers of the corresponding elements of the character array.
Each string literal in turn has type of a character array.
For example this string literal "Hello\0Hi" in C has type char[9]: 8 characters enclosed in the quotes plus the implicit terminating zero.
So in memory this string literal is stored like
{ 'H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0' }
The sizeof operator returns the number of bytes occupied by an object. So for the string literal above, sizeof returns 9, which is the number of bytes the literal occupies in memory.
If you write "Hello\0Hi", the compiler is not allowed to simply drop the Hi part from the literal. It has to store it in memory along with the other characters enclosed in the quotes.
The sizeof operator returns the size in bytes of any object in C not only of character arrays.
In general, character arrays can store any raw data, for example binary data read from a binary file. In that case neither the user nor the program treats the data as strings, and so it is processed differently from strings.
The standard C function strlen is written specifically to find the length of a string stored in a character array. It does not know what data is stored in the array or how it got there. All it does is search for the first zero character in the character array and return the number of characters before it.
You can store in one character array several strings sequentially. For example
char s[12];
strcpy( s, "Hello" );
strcpy( s + sizeof( "Hello" ), "World" );
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
If you would define a two dimensional array like this
char t[2][6] = { "Hello", "World" };
then in memory it will be stored the same way as the one-dimensional array above. So you can write
char *s = ( char * )t;
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
Another example: the standard C function strtok can split one string stored in a character array into several strings by replacing the delimiters specified by the user with zero bytes. As a result, the character array will contain several strings.
For example
char s[] = "Hello World";
printf( "%zu\n", sizeof( s ) ); // outputs 12
strtok( s, " " );
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
printf( "%zu\n", sizeof( s ) ); // outputs 12
The last printf statement will output the same value equal to 12 because the array occupies the same number of bytes. Simply one byte in the memory allocated for the array was changed from ' ' to '\0'.
Character arrays in C and pointers to character arrays are not the same thing, even though printing their addresses gives the same value.
An array in C carries the following information:
Size of the array
Its address / pointer
Homogeneous type of its elements
Whereas a pointer carries just:
Address
Type information
char s[] = "Hello\0Hi";
printf("%d %d", strlen(s), sizeof(s));
Here you are calculating the size of the array (the variable s) using sizeof, which is 9.
But if you treat this character array as a string, then the array loses its size information and becomes just a pointer to a character. The same thing happens when you print a character array using %s.
So strlen() and %s treat the character array as a string and use only its address. As you can guess, strlen() keeps incrementing the pointer until it reaches the first null character, and the count up to that point is the length.
So strlen() gives you 5 and does not count the null character.
The sizeof operator reports only the size of its operand. If you give it an array variable, it uses the array's size information and reports the size regardless of where null characters appear.
But if you give sizeof a pointer to an array of characters, it has no size information for the array and reports the size of the pointer, which is usually 8 bytes (64 bits) on 64-bit systems or 4 bytes (32 bits) on 32-bit systems.
One more thing: if you initialize a character array using double quotes, as in "Hello", then C adds a null character; it does not do so for {'H','e','l','l','o'}.
This was tried with the gcc compiler. I hope it helps with understanding.

Confused about C string constants

When I came across this C implementation of Porter's stemming algorithm, I found a C-ism I was confused about.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void test( char *s )
{
int len = s[0];
printf("len = %i\n", len );
printf("s[len] = %c\n", s[len] );
}
int main()
{
test("\07" "abcdefg");
return 0;
}
and output:
len = 7
s[len] = g
However, when I input
test("\08" "abcdefgh");
or any string constant longer than 7 characters with the corresponding length in the first pair of quotes (e.g. test("\09" "abcdefghi");), the output is
len = 0
s[len] =
But any input like test("\01" "abcdefgh"); prints the character at that position (if we call the first character position 1 and not 0 for the moment).
It appears that test( char *s ) reads the number in the first pair of quotes (how it does this I am not sure, since I thought s[0] could only read a single char, i.e. the '\') and prints the character at that index + 1 of the string constant in the second pair of quotes.
My question is this: it seems as if we are passing two string constants into test( char *s ). What exactly is happening here? That is, how does the compiler seem to "split" the string over two pairs of quotes? Another question one might have: is a string of the form "blah" "abcdefg" one consecutive block of memory? I may have overlooked something elementary, but even so I would like to know what it is. I know this is a basic concept, but I could not find a clear example on the web that explains it, and in all honesty I don't follow the output. Any helpful comments are welcome.
There are at least three things going on here:
Literal strings juxtaposed against one another are concatenated by the compiler. "a" "b" is exactly the same as "ab".
The backslash is an escape character, which means it is not copied literally into the resulting string. The notation \01 means "the character with ASCII value 1".
The notation \0... means an octal character constant. Octal numbers are base 8, made up from digits that range from 0 through 7 inclusive. 8 is not a valid octal constant, so "\08" does not follow "\07".
The problem is not in the length of the string, but in the \o syntax for specifying non-printable values in string literals. \o, \oo, and \ooo denote octal constants, i.e. a single character whose value is written in base 8. Since 08 in \08 doesn't represent a valid base 8 number, it is interpreted as \0 followed by the ASCII character 8.
To fix the problem, represent 8 as \10 or \010:
test("\007" "abcdefg");
test("\010" "abcdefgh");
...or switch to hexadecimal, where the \x prefix makes the base more explicit to the casual reader:
test("\x07" "abcdefg");
test("\x08" "abcdefgh");
test("\x09" "abcdefghi");
test("\x0a" "abcdefghij");
...
\number in a character or string literal means the character whose code is the value number. number is interpreted in octal, so the first non-octal digit terminates the number. So "\07" is a one-character string containing the character with code 7, but "\08" is a two-character string containing the character with code 0 followed by the digit 8.
Additionally, code 0 is the null terminator that C uses to indicate the end of a string. So that second string ends at the beginning, because its first byte is the terminator. This is why the length of the string in your second example is 0.
When two or more string literals are adjacent (separated only by white-space), the compiler will join them into a single string. Therefore "\07" "abcdefg" is equivalent to "\07abcdefg".
"\07" is an octal escape. An octal escape ends after three digits or at the first non-octal character. So when you write "\08", 8 is not an octal digit, the escape ends there, and 0 is stored at s[0].
Now len is 0, and printing s[len] prints the character at s[0], which is the null character and has no printable representation (the printable ASCII characters are codes 32 through 126).
