When I ran this program it gave an output of
1, 4, 4
Why does sizeof('A') give 4 bytes? Is 'A' treated as an integer? If so, why?
#include <stdio.h>
int main()
{
    char ch = 'A';
    printf("%zu, %zu, %zu", sizeof(ch), sizeof('A'), sizeof(3.14f));
    return 0;
}
Moreover, when I replace
printf("%zu, %zu, %zu", sizeof(ch), sizeof('A'), sizeof(3.14f));
with
printf("%zu, %zu, %zu", sizeof(ch), sizeof("A"), sizeof(3.14f));
It gives the output
1, 2, 4
which is even more confounding.
P.S.: I used compileonline.com to test this code.
In C, the type of 'A' is int, which explains why sizeof('A') is 4 (since evidently your platform has 32-bit int). For more information, see Size of character ('a') in C/C++
When compiled as C++, the first program prints 1, 1, 4.
"A" is a string literal consisting of the letter A followed by the NUL character. Since it's two characters long, sizeof("A") is 2.
1. The sizeof operator gives the size, in bytes, of its operand.
2. The size of a variable depends on the machine and compiler. In your case int is 32 bits.
3. sizeof(ch) = 1 because you declared ch as char.
4. sizeof('A') = 4 because in C a character constant has type int.
5. sizeof("A") = 2 because it is a string of 2 bytes. Even if you write a single character in a string, the compiler inserts a null character at the end, so its size is 2 bytes.
6. sizeof(3.14f) = 4 because the size of float is 4 bytes on your platform.
I generally suggest to use sizeof on types or on variables. Using sizeof on literal constants seems confusing (except perhaps on literal strings, to compute 1 + their string length at compile time).
The literal 'A' is in C an int whose size is 4 on your machine.
The literal string "A" is exactly like
const char literal_A_string[] = {'A', (char)0};
whose size is obviously 2 bytes (because each literal string has a terminal null byte appended).
I tried this chunk of code:
char string_one[8], string_two[8];
printf("&string_one == %p\n", &string_one);
printf("&string_two == %p\n", &string_two);
strcpy(string_one, "Hello!");
strcpy(string_two, "Long string");
printf("string_one == %s\n", string_one);
printf("string_two == %s\n", string_two);
And got this output:
&string_one == 0x7fff3f871524
&string_two == 0x7fff3f87151c
string_one == ing
string_two == Long string
Since the second string is longer than the declared size of its array, the characters whose subscripts exceed the array size are stored in the following bytes, which belong to the first array, as the addresses show. Obviously the first string is overwritten.
There is no way the second array can hold the whole string; it is too big. Nevertheless, the output prints the whole string.
I speculated for a while and concluded that printf() keeps outputting characters from the following bytes until it comes across the string terminator '\0'. I did not find any confirmation of this, so my question is: is this speculation correct?
From the C Standard (5.2.1 Character sets)
2 In a character constant or string literal, members of the execution
character set shall be represented by corresponding members of the
source character set or by escape sequences consisting of the
backslash \ followed by one or more characters. A byte with all bits
set to 0, called the null character, shall exist in the basic
execution character set; it is used to terminate a character string.
And (7.21.6.1 The fprintf function)
8 The conversion specifiers and their meanings are:
s If no l length modifier is present, the argument shall be a pointer
to the initial element of an array of character type. Characters
from the array are written up to (but not including) the terminating
null character.
My compiler (GCC) said:
warning: ‘__builtin_memcpy’ writing 12 bytes into a region of size 8 overflows the destination [-Wstringop-overflow=]
strcpy(string_two, "Long string");
And just to show how optimizations can take everything you think you know and turn it on its head, here's what happens if you compile this on a 64-bit PowerPC POWER9 (i.e. not x86) with gcc -O3 -flto:
$ ./char-array-overlap
&string_one == 0x7fffc502bef0
&string_two == 0x7fffc502bef8
string_one == Hello!
string_two == Long string
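That is because, if you look at the machine code, it never calls strcpy at all. Writing past the end of an array is undefined behavior, so the compiler is free to produce either result.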
#include <stdio.h>
#include <string.h>
int main(void)
{
    printf("%zu \n ", sizeof(' '));
    printf("%zu ", sizeof(""));
    return 0;
}
output:
4
1
Why is the output 4 for the first printf? Moreover, if I write '' (nothing between the single quotes) I get "error: empty character constant", but "" (nothing between the double quotes) compiles without error. Why?
' ' is an example of an integer character constant, which has type int (it is not converted; it has that type to begin with). The second, "", is a string literal that contains only one character, the null character, and since sizeof(char) is guaranteed to be 1, the size of the whole array is 1 as well.
' ' is an integer character constant of type int (hence 4 bytes on your machine); "" is an empty string literal, which is still an array of 1 byte holding the terminating '\0'.
The following program shows the difference:
#include <stdio.h>
int main()
{
    char a = 'b';
    printf("%zu %zu %zu", sizeof(a), sizeof('b'), sizeof("a"));
    return 0;
}
Here a is declared as a char, whose size is 1 byte.
But 'b' is a character constant. A character constant is an integer; its value is the numeric value of the character in the machine's character set. The size of a character constant is therefore the size of int, which is 4 bytes here.
"a" is a string literal, i.e. an array of char whose size is the number of characters plus one for the terminating null character '\0'. Here it is 2.
This is answered in Size of character ('a') in C/C++
In C, the type of a character constant like 'a' is actually an int, with size of 4 (or some other implementation-dependent value). In C++, the type is char, with size of 1. This is one of many small differences between the two languages.
A space, or any other single character in single quotes, actually has type int, and its value is the code of that character (its ASCII value on most systems). So its size will be 4 bytes on a platform with a 4-byte int.
Only when you create a char variable and store the character in it does it occupy 1 byte of memory.
char ch;
ch = ' ';
printf("%zu", sizeof(ch));
// outputs 1
For anything to be a string, it must be terminated with a null character represented as '\0'.
If we write a string "hello", it is actually stored as 'h' 'e' 'l' 'l' 'o' '\0', so that the system knows the string ends after the 'o' and stops reading when it reaches the null character. The length of this string is still 5 if you use the strlen() function, but sizeof("hello") is 6 bytes.
When we create an empty string, like "", its length is 0 but its size is 1 byte, as it must terminate where it starts, i.e. at the 0th character.
Hence an empty string consists of only one character, that is null character, giving size 1 byte.
From C Traps and Pitfalls
Single and double quotes mean very different things in C.
A character enclosed in single quotes is just another way of writing the integer that corresponds to the given character. Thus, in an ASCII implementation, ' ' means exactly the same thing as 32.
On the other hand, a string enclosed in double quotes is a shorthand way of writing a pointer to the initial character of a nameless array that has been initialized with the characters between the quotes and an extra character whose binary value is zero. Thus "", the empty string, still contains the '\0' character, and its size is one.
In the first case the operand is a character constant, which has type int (its value is the character's ASCII code), so sizeof gives 4.
In the second case the operand is a string literal with nothing between the quotes, i.e. an empty string. It still contains the terminating null character, so its size is 1.
When I came across this C implementation of Porter's stemming algorithm, I found a C-ism I was confused about.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void test( char *s )
{
    int len = s[0];
    printf("len = %i\n", len );
    printf("s[len] = %c\n", s[len] );
}
int main()
{
    test("\07" "abcdefg");
    return 0;
}
and output:
len = 7
s[len] = g
However, when I input
test("\08" "abcdefgh");
or any string constant longer than 7 characters with the corresponding length in the first pair of quotation marks (e.g. test("\09" "abcdefghi");), the output is
len = 0
s[len] =
But any input like test("\01" "abcdefgh"); prints the character at that position (if we call the first character position 1 rather than 0 for the moment).
It appears that test( char *s ) reads the number in the first pair of quotation marks (how it does this I am not sure, since I thought s[0] could only read a single char, i.e. the '\') and prints the character at that index of the string constant in the second pair of quotation marks.
My question is this: it seems as if we are passing two string constants into test( char *s ). What exactly is happening here? How does the compiler "split" the string over two pairs of quotation marks? A related question: is a string of the form "blah" "abcdefg" one consecutive block of memory? I may have overlooked something elementary, but I could not find a clear example on the web that explains this, and in all honesty I don't follow the output. Any helpful comments are welcome.
There are at least three things going on here:
Literal strings juxtaposed against one another are concatenated by the compiler. "a" "b" is exactly the same as "ab".
The backslash is an escape character, which means it is not copied literally into the resulting string. The notation \01 means "the character with value 1".
A backslash followed by one to three octal digits is an octal escape sequence. Octal numbers are base 8, made up of digits that range from 0 through 7 inclusive. 8 is not a valid octal digit, so "\08" does not follow "\07".
The problem is not in the length of the string, but in the \o syntax for specifying non-printable values in string literals. \o, \oo, and \ooo denote octal constants, i.e. a single character whose value is written in base 8. Since 08 in \08 doesn't represent a valid base 8 number, it is interpreted as \0 followed by the character '8'.
To fix the problem, represent 8 as \10 or \010:
test("\007" "abcdefg");
test("\010" "abcdefgh");
...or switch to hexadecimal, where the \x prefix makes the base more explicit to the casual reader:
test("\x07" "abcdefg");
test("\x08" "abcdefgh");
test("\x09" "abcdefghi");
test("\x0a" "abcdefghij");
...
\number in a character or string literal means the character whose code is the value number. number is interpreted in octal, so the first non-octal digit terminates the number. So "\07" is a one-character string containing the character with code 7, but "\08" is a two-character string containing the character with code 0 followed by the digit 8.
Additionally, code 0 is the null terminator that's used in C to indicate the end of the string. So that second string ends at the beginning, because its first byte is the terminator. This is why the length of the string in your second example is 0.
When two or more string literals are adjacent (separated only by white-space), the compiler will join them into a single string. Therefore "\07" "abcdefg" is equivalent to "\07abcdefg".
"\07" is an octal escape. An octal escape ends after three digits or at the first non-octal character. So, when you enter "\08", 8 is a non-octal character, therefore the escape ends and 0 is stored at s[0].
Now len is 0, and printing s[len] will try to print the character at s[0], which is the null character and is not printable (the printable ASCII characters are those with codes 32 through 126).
Judging by my code, I assume each Greek character is stored in 2 bytes.
sizeof returns the size of each character as 4 (i.e. the size of int).
How does strlen return 16? That makes me think each character occupies 2 bytes. Shouldn't it be 4*8 = 32, since it counts the number of bytes?
Also, how does printf("%c",bigString[i]); print each character properly? Shouldn't it read 1 byte (a char) and display that because of %c? Why is the Greek character not split in this case?
strcpy(bigString,"ειδικούς");//greek
sLen = strlen(bigString);
printf("Size is %d\n ",sizeof('ε')); //printing for each character similarly
printf("%s is of length %d\n",bigString,sLen);
int k1 = 0 ,k2 = sLen - 2;
for(i=0;i<sLen;i++)
printf("%c",bigString[i]);
Output:
Size is 4
ειδικούς is of length 16
ειδικούς
Character literals in C have type int, so sizeof('ε') is the same as sizeof(int). You're playing with fire in this statement, a bit. 'ε' will be a multicharacter literal, whose value is implementation-defined, so it is not portable and might come back to bite you. Be careful with constructs like this one. Clang, for example, won't accept this program with that literal in it. GCC gives a warning, but will still compile it.
strlen returns 16, since that's the number of bytes in your string before the null terminator. Your Greek characters are all 16 bits long in UTF-8, so your string looks something like:
c0c0 c1c1 c2c2 c3c3 c4c4 c5c5 c6c6 c7c7 0
in memory, where c0c0, for example, is the two bytes of the first character. There is a single null-termination byte in your string.
The printf appears to work because your terminal is UTF-8 aware. You are printing each byte separately, but the terminal is interpreting the first two prints as a single character, and so on. If you change that printf call to:
printf("%d: %02x\n", i, (unsigned char)bigString[i]);
You'll see the byte-by-byte behaviour you're expecting.
Just wondering if someone could explain this to me? I have a program that asks a user to input a sentence. The program then reads the user input into an array and changes all of the vowels to a $ sign. My question is how does the for loop work? When initialising char c = 0; does that not mean that the array element is an int? I can't understand how it functions.
#include <stdio.h>
#include <string.h>
int main(void)
{
char words[50];
char c;
printf("Enter any number of words: \n");
fgets(words, 50, stdin);
for(c = 0; words[c] != '\n'; c++)
{
if(words[c] =='a'||words[c]=='e'||words[c]=='i'||words[c]=='o'||words[c]=='u')
{
words[c] = '$';
}
}
printf("%s", words);
return 0;
}
The code treats c as an integer variable (in C, char is basically a very narrow integer). In my view it would be cleaner to declare it as int (perhaps unsigned int). However, given that words is at most 50 characters long, char c works fine.
As to the loop:
c = 0 initializes c to zero.
words[c] != '\n' checks -- right at the start and also after each iteration -- whether the current character (words[c]) is a newline, and stops if it is.
c++ increments c after each iteration.
An array is like a building: you have several floors, each one with a number.
On floor 1 lives John.
On floor 2 lives Michael.
If you want to go to John's apartment you press 1 in the elevator. If you want to go to Michael's you press 2.
It's the same with arrays. Every position in the array stores a value, in this case a letter.
Every position has an index associated with it. The first position is 0.
When you want to access a position of the array you use array[position], where position is the index in the array that you want to access.
The variable c holds the position to be accessed. When you write words[c] you're actually accessing the c-th position in the array and retrieving its value.
Suppose the word is cool:
word[1] results in o,
word[0] results in c
To find the end of the input, the loop looks for the character '\n': fgets stores the newline that ends the line in the array, followed by the terminating '\0'.
Not really, char and int are implicitly converted.
You can look at a char in this case as a smaller int. sizeof(char) == 1, so it's smaller than an int, which is probably the reason it was used. Programmatically, there's no difference in this case, unless the input string is very long, in which case the char will overflow before an int does.
Number literals (such as 0 in your case) are compatible with variables of type char. In fact, even a character literal enclosed in single quotes (for example '\n') is of type int but is implicitly converted to a char when assigned or compared to another char.
Number literals are interchangeable with character literals, as long as the former do not exceed the range of a character.
The following should result in a compiler warning:
char c = 257;
whereas this will not:
char c = 127;
A char in C is an integral type, as are short, int, long, and long long (and many other types).
sizeof(char) is 1 by definition, and a char is usually 8 bits wide. Whether plain char is signed is implementation-defined: where it is signed it can typically hold values from -128 to 127, and an unsigned char can hold values from 0 to 255.
It works as an index in the loop above since the loop always stops before 50 and a char can hold values up to at least 127. An int can usually hold values up to 2,147,483,647, but typically takes up 4 times the space of an 8-bit char. An int is only guaranteed to be at least 16 bits in C, which with 16 bits means values between −32,768 and 32,767, or 0 to 65,535 for an unsigned int.
So your loop is just accessing the elements of your array one after the other: words[0] at the beginning to look at the first character, then words[1] to look at the next character, and so on. Assuming char is 8 bits and signed on your machine, which is very common, it is enough to store the index for your loop until it gets above 127. If you read in more than 127 characters (instead of just 50) and used a char to iterate, you would run into weird problems, since the char can't hold 128 and will typically wrap around to -128. You would then access words[-128], which is undefined behavior and would most likely result in a segmentation fault.