printing improper number of characters in a string - c

#include <stdio.h>

int main()
{
    char *name = "bob";
    int x = sizeof(name);
    printf("%s is %d characters\n", name, x);
}
I have the above code. I want to print the number of characters in this string. It keeps printing 8 instead of 3. Why?

sizeof returns a size in bytes. Specifically, it gives the number of bytes required to store an object of the operand's type. In this case sizeof is returning the size of a pointer to char, which on your computer is 8 bytes, or 64 bits.
strlen() is what you are looking for:
#include <stdio.h>
#include <string.h> // include string.h header to use strlen()

int main()
{
    char *name = "bob";
    int x = strlen(name); // use strlen() here
    printf("%s is %d characters\n", name, x);
}

Use strlen for finding the length of a string.
Each character is at least 1 byte wide. Your program prints 8 because sizeof is applied to a pointer to "bob", and on your machine a pointer is 8 bytes wide.

strlen gives you the number of characters in a string. sizeof gives you the size of the object in bytes. On your system, an object of type char * is apparently 8 bytes wide.
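To make the distinction concrete, here is a minimal sketch contrasting sizeof applied to a pointer, sizeof applied to an array, and strlen(); the pointer size of 8 assumes a 64-bit system:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *p = "bob"; /* pointer to a string literal */
    char a[] = "bob";      /* array holding 'b','o','b','\0' */

    printf("sizeof p  = %zu\n", sizeof p);   /* size of the pointer itself, e.g. 8 */
    printf("sizeof a  = %zu\n", sizeof a);   /* 4: the whole array, including '\0' */
    printf("strlen(p) = %zu\n", strlen(p));  /* 3: characters before the '\0' */
    return 0;
}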


Function is returning a different value every time?

I'm trying to convert a hexadecimal INT to a char so I could convert it into a binary to count the number of ones in it. Here's my function to convert it into char:
#include <stdio.h>
#include <stdlib.h>

#define shift(a) a=a<<5
#define parity_even(a) a = a+0x11
#define add_msb(a) a = a + 8000

void count_ones(int hex){
    char *s = malloc(2);
    sprintf(s, "0x%x", hex);
    free(s);
    printf("%x", s);
};

int main() {
    int a = 0x01B9;
    shift(a);
    parity_even(a);
    count_ones(a);
    return 0;
}
Every time I run this, I get different outputs, but the last three hex digits are always the same. Example outputs:
8c0ba2a0
fc3b92a0
4500a2a0
d27e82a0
c15d62a0
What exactly is happening here? I allocated 2 bytes for the char since my hex int is 2 bytes.
It's too long for a comment, so here goes:
I'm trying to convert a hexadecimal INT
An int is stored as a group of value bits, padding bits (possibly none) and a sign bit, so there is no such thing as a hexadecimal int, but you can represent (print) a given number in hexadecimal format.
convert a ... INT to a char
That would be a lossy conversion, as an int might have 4 bytes of data that you are trying to cram into 1 byte. char specifically may be signed or unsigned. You probably mean a string (generic term) or char [] (the standard way to represent a string in C).
binary to count the number of ones
That's the real issue you are trying to solve and this is a duplicate of:
How to count the number of set bits in a 32-bit integer?
count number of ones in a given integer using only << >> + | & ^ ~ ! =
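Since counting set bits is the underlying goal, here is a minimal sketch of one common approach (Brian Kernighan's method), assuming an unsigned operand; it is an illustration, not code taken from the linked answers:

#include <stdio.h>

/* Clears the lowest set bit on every iteration, so the loop body
 * runs once per set bit. */
static int popcount_u(unsigned v) {
    int count = 0;
    while (v) {
        v &= v - 1;
        ++count;
    }
    return count;
}

int main(void) {
    printf("%d\n", popcount_u(0x3731u)); /* 0x3731 has 8 set bits */
    return 0;
}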
To address the question you asked:
You need to allocate more than 2 bytes: specifically, ceil(log16(hex)) + 2 (for "0x") + 1 (for the trailing '\0') bytes.
One way to get the size is to ask snprintf(s, 0, ...), then allocate a suitable array via malloc() (see the first implementation below) or use a stack-allocated variable length array (VLA).
You can use INT_MAX instead of hex to get an upper bound: log16(INT_MAX) <= CHAR_BIT * sizeof(int) / 4, and the latter is a compile-time constant. This means you can allocate your string on the stack (see the second implementation below).
It's undefined behaviour to use memory after it has been freed, so move free() to after the last use.
Here is one of the dynamic versions mentioned above:
void count_ones(unsigned hex) {
    char *s = NULL;
    size_t n = snprintf(s, 0, "0x%x", hex) + 1;
    s = malloc(n);
    if (!s) return; // memory could not be allocated
    snprintf(s, n, "0x%x", hex);
    printf("%s (size = %zu)", s, n);
    free(s);
}
Note that I initialized s to NULL, which would cause the first call to snprintf() to return an undefined value on SUSv2 (legacy); it is well defined in C99 and later. The output is:
0x3731 (size = 7)
And the compile-time version using a fixed upper bound:
#include <limits.h>
// compile-time upper bound
void count_ones(unsigned hex) {
    char s[CHAR_BIT * sizeof(int) / 4 + 3]; // all hex digits of an int, plus "0x" and '\0'
    sprintf(s, "0x%x", hex);
    printf("%s (size = %zu)", s, sizeof s);
}
and the output is:
0x3731 (size = 11)
Your biggest problem is that malloc() isn't allocating enough. As Barmar said, you need at least 7 bytes to store the string, or you could calculate the amount needed. Another problem is that you free the buffer and then use it. The use comes only one line after the free, so nothing bad will appear to happen most of the time, but it is still undefined behaviour; only free the buffer once you know you are done using it.

Variable storage from character to integer pointer is not retrieving data properly

In the program below, why does the character's value print properly when it is stored in a normal int variable, but not when it is read through an int pointer?
Case 1
#include <stdio.h>

int main()
{
    int *i = NULL;
    char s = 'A';
    i = (int *)&s; // storing
    printf("i - %d\n", *i);
    return 0;
}
Output :
i - 1837016897
Why 65 value not printed here?
Case 2
#include <stdio.h>

int main()
{
    int *i = NULL;
    char s = 'A';
    i = (int *)&s; // storing
    printf("i - %c\n", *i); // if we display the character stored here,
                            // it is printed properly
    return 0;
}
Output:
i - A
Case 3
#include <stdio.h>

int main()
{
    int i = 0;
    char s = 'A';
    i = s;
    printf("i - %d\n", i); // in this case the data prints properly
    return 0;
}
Output:
i - 65
int is 4 bytes long and char is 1 byte (on your platform). This means you cannot convert like this: you are taking the address of a 1-byte variable and telling the program to interpret it as a 4-byte variable, so whatever happens to lie next to it in memory is read as well.
1837016897 in hexadecimal is 0x6D7EA741, where the last byte (0x41) is decimal 65, so the character does show up in your result. (If you're wondering why it is the last byte and not the first, this is because of endianness; you can read up on that yourself if you like.)
Your programs 1 and 2 exhibit undefined behaviour because they refer to an object of type char via an lvalue of type int. This is not allowed according to section 6.5 paragraph 7 of the standard.
Your program 3 is OK because the char value is implicitly converted to int by the assignment i = s, which is perfectly normal and well-defined.
i = (int *)&s;
This makes i point to the address of s, but s is a char and i is an int *, so when you dereference i with *i the compiler reads an int (4 bytes, assuming a 32-bit int) starting at the address of s instead of just 1 byte, unless you cast it back, e.g. *(char *)i.
An int requires 4 bytes of storage while a char requires only 1 byte, so char s = 'A' stores a single byte with value 65 at address &s.
In case 1, you try to print the 4 bytes at the address &s as an integer. The memory adjacent to &s holds garbage values, hence you get 1837016897 instead of 65.
In case 2, you print with %c, which uses only one byte of the value, so only that byte is shown.
In case 3, i = s stores the value of s, i.e. 65, into i, hence you get 65 as the output.
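If the goal is simply to read the character's numeric value, a plain assignment or access through unsigned char is well defined; a minimal sketch (an illustration, not code from the original answers):

#include <stdio.h>

int main(void)
{
    char s = 'A';

    int i = s; /* ordinary value conversion, as in case 3 */
    printf("assignment : %d\n", i); /* 65 */

    /* Reading an object through an unsigned char pointer is always permitted. */
    unsigned char b = *(unsigned char *)&s;
    printf("byte access: %d\n", b); /* 65 */

    return 0;
}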

printing the char value of each wide character's bytes

when running the following:
char acute_accent[7] = "éclair";
int i;
for (i = 0; i < 7; ++i)
{
    printf("acute_accent[%d]: %c\n", i, acute_accent[i]);
}
I get:
acute_accent[0]:
acute_accent[1]: �
acute_accent[2]: c
acute_accent[3]: l
acute_accent[4]: a
acute_accent[5]: i
acute_accent[6]: r
which makes me think that the multibyte character é is 2-byte wide.
However, when running this (after ignoring the compiler warning me from multi-character character constant):
printf("size: %lu",sizeof('é'));
I get size: 4.
What's the reason for the different sizes?
EDIT: This question differs from this one because it is more about multibyte characters encoding, the different UTFs and their sizes, than the mere understanding of a size of a char.
The reason you're seeing a discrepancy is that in your first example, the character é was encoded by the compiler as the two-byte UTF-8 sequence 0xC3 0xA9 (code point U+00E9).
See here:
http://www.fileformat.info/info/unicode/char/e9/index.htm
And as described by dbush, 'é' in single quotes is an integer character constant of type int, therefore it is represented as four bytes.
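A minimal sketch that prints each byte of the string in hex makes the two-byte UTF-8 encoding visible (this assumes a C11 compiler for the u8 prefix and a source file saved as UTF-8):

#include <stdio.h>

int main(void)
{
    const char *s = u8"é"; /* two bytes in UTF-8 */
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        printf("0x%02X ", *p); /* expected: 0xC3 0xA9 */
    putchar('\n');
    return 0;
}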
Part of your confusion stems from relying on an implementation-defined feature: storing Unicode text without specifying its encoding.
To prevent undefined behavior you should always clearly identify the encoding of your string literals.
For example:
char acute_accent[7] = u8"éclair";
This is very bad form, because you can't know the exact length of the string unless you count it out yourself. And indeed, my compiler (g++) complains: while the string is 7 bytes, it is 8 bytes total with the null character at the end, so you have actually overrun the buffer.
It's much safer to use this instead:
const char* acute_accent = u8"éclair";
Notice how your string is actually 8-bytes:
#include <stdio.h>
#include <string.h> // strlen

int main() {
    const char* a = u8"éclair";
    printf("String length : %lu\n", strlen(a));
    // Add +1 for the null byte
    printf("String size   : %lu\n", strlen(a) + 1);
    return 0;
}
The output is:
String length : 7
String size : 8
Also note that the size of a character constant differs between C and C++!
#include <stdio.h>

int main() {
    printf("%lu\n", sizeof('a'));
    printf("%lu\n", sizeof('é'));
    return 0;
}
In C the output is:
4
4
While in C++ the output is:
1
4
From the C99 standard, section 6.4.4.4:
2 An integer character constant is a sequence of one or more multibyte
characters enclosed in single-quotes, as in 'x'.
...
10 An integer character constant has type int.
sizeof(int) on your machine is probably 4, which is why you're getting that result.
So 'é', 'c', 'l' are all integer character constants, so all are of type int whose size is 4. The fact that some are multibyte and some are not doesn't matter in this regard.

Byte allocation in short unsigned data

Kindly check the following program:
#include <stdio.h>
#include <string.h> // for strlen()
#include <stdlib.h>

int main()
{
    char *Date = NULL;
    unsigned short y = 2013;
    Date = malloc(3);
    sprintf((char *)Date, "%hu", y);
    printf("%d %d %d %d %d \n", Date[0], Date[1], Date[2], Date[3], Date[4]);
    printf("%s %d %d", Date, strlen(Date), sizeof(y));
}
output:
50 48 49 51 0
2013 4 2
Why am I getting a string length of 4 instead of 2? I am putting a short integer value into memory, so it should occupy 2 bytes; why does it take 4 bytes?
Why does each byte hold 2, 0, 1, 3 from the input, instead of 20 in one byte and 13 in another byte?
I want to put 20 in one byte and 13 in another byte. How can I do that? Kindly give some answer.
As indicated by its name, the sprintf function writes a formatted string. So your number 2013 is converted to the text "2013" (4 characters plus the terminating NUL, 5 bytes in total).
You are invoking undefined behaviour.
You have allocated only 3 bytes for Date but are storing 5 bytes:
four bytes for "2013" and 1 NUL byte. So you should allocate at least 5 bytes if you want to store "2013".
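A minimal sketch of the textual approach with a large enough buffer (the size of 5 assumes the value has at most four digits, as in the question):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    unsigned short y = 2013;

    char *date = malloc(5); /* "2013" is 4 characters + 1 for the NUL */
    if (!date)
        return 1;
    sprintf(date, "%hu", y);
    printf("%s\n", date); /* prints: 2013 */
    free(date);
    return 0;
}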
If you want to transfer a stream of raw bytes instead, then I suggest you do it the following way:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
    unsigned char *Date = NULL;
    unsigned short y = 2013;
    unsigned char *p;
    p = (unsigned char *)&y;
    Date = malloc(3);
    Date[0] = *p;
    Date[1] = *(p + 1);
    Date[2] = 0;
    printf("%s %zu %zu", Date, strlen((char *)Date), sizeof(y));
}
This outputs:
� 2 2
The strange character appears because arbitrary byte values are being interpreted as a string. Also note that plain char may be signed or unsigned depending on your implementation, so use unsigned char to avoid misinterpreting the byte values.
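If the aim is really to send the raw 2-byte value rather than text, shifting the bits out yourself avoids endianness surprises; a minimal sketch with a hypothetical pack_u16() helper (not part of the original answer):

#include <stdio.h>

/* Hypothetical helper: pack a 16-bit value into two bytes in a fixed
 * (big-endian) order, independent of the machine's byte order. */
static void pack_u16(unsigned char out[2], unsigned short v)
{
    out[0] = (unsigned char)(v >> 8);   /* high byte: 0x07 for 2013 */
    out[1] = (unsigned char)(v & 0xFF); /* low byte:  0xDD for 2013 */
}

int main(void)
{
    unsigned char buf[2];
    pack_u16(buf, 2013);
    printf("%02X %02X\n", buf[0], buf[1]); /* prints: 07 DD */
    return 0;
}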

Sizeof vs Strlen

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char string[] = "october";   // 7 letters
    strcpy(string, "september"); // 9 letters

    printf("the size of %s is %d and the length is %d\n\n",
           string, sizeof(string), strlen(string));
    return 0;
}
Output:
$ ./a.out
the size of september is 8 and the length is 9
Is there something wrong with my syntax or what?
sizeof and strlen() do different things. In this case, your declaration
char string[] = "october";
is the same as
char string[8] = "october";
so the compiler can tell that the size of string is 8. It does this at compilation time.
However, strlen() counts the number of characters in the string at run time. So, after you call strcpy(), string now contains "september". strlen() counts the characters and finds 9 of them. Note that you have not allocated enough space for string to hold "september". This is undefined behaviour.
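A minimal sketch with an array deliberately larger than the string (the size of 16 is an arbitrary choice for illustration) shows the two results diverging even without any overflow:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[16] = "october"; /* array of 16, string of 7 characters */

    printf("sizeof: %zu\n", sizeof buf);  /* 16: the whole array, fixed at compile time */
    printf("strlen: %zu\n", strlen(buf)); /* 7: characters before the '\0', found at run time */
    return 0;
}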
The output is correct because:
In the first statement, the size of string was fixed by the compiler as 7+1 bytes ("october" is 7 bytes plus 1 byte for the null terminator, determined at compile time).
In the second statement, you copy "september" (9 bytes plus its terminator) into that 8-byte array.
That is why sizeof still reports 8 for "september"; strlen() only happens to find 9 characters because the copy overflowed the array, which is undefined behaviour.
Your destination array is 8 bytes (the length of "october" plus the \0), and you are trying to put 9 characters plus a terminator into that array.
man strcpy says:
If the destination string of a strcpy() is not large enough, then anything might happen.
Please tell me what you really want to do, because this smells bad from a long way off.
You must eliminate the buffer overflow in this example. One way to do this is to use strncpy:
memset(string, 0, sizeof(string));
strncpy(string, "september", sizeof(string)-1);
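Note that strncpy() does not add a terminating '\0' when the source is longer than the given limit; the memset() above (or explicitly setting string[sizeof(string) - 1] = '\0') is what guarantees the result is a properly terminated string.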
