s(n)printf prints more characters than format specifier specifies - c

I am encountering a curious issue with sprintf on an embedded system (Libelium Waspmote, similar to Arduino) where sprintf outputs more characters than the format specifier asks for. In this particular instance I am using %02X to output the hexadecimal value of bytes in an array. However, for some bytes, 4 characters are written instead of 2, with FF prefixed to the actual byte value. snprintf behaves similarly, except that it respects the specified buffer size and so prints only the prefix.
For reference, here is the code snippet printing the array contents:
char *pduChars = (char *) malloc(17*sizeof(char));
pduData.toChar(pduChars);
for (int i = 0; i < 17; i++) {
    char asciiCharsS[5];
    char asciiCharsSN[3];
    int printedS = sprintf(asciiCharsS, "%02X", pduChars[i]);
    int printedSN = snprintf(asciiCharsSN, 3, "%02X", pduChars[i]);
    USB.print(printedS);
    USB.print(" ");
    USB.print(printedSN);
    USB.print(" ");
    USB.print(asciiCharsS);
    USB.print(" ");
    USB.print(asciiCharsSN);
    USB.println(" ");
}
And the output from that snippet (abridged to only the erroneous bytes):
The actual byte sequence should be 0x00 0xFC 0xFF 0xFF 0x48 0xA5 0x33 0x51
sprintf return  snprintf return  sprintf buffer  snprintf buffer
…
2               2                00              00
4               4                FFFC            FF
4               4                FFFF            FF
4               4                FFFF            FF
2               2                48              48
4               4                FFA5            FF
2               2                33              33
2               2                51              51
Am I overlooking something here or might this be a platform-specific issue relating to the implementation of s(n)printf?

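I'm guessing your implementation uses signed chars. The format "%X" expects an unsigned value, but a char argument is first promoted to int, so a byte such as 0xFC stored in a signed char becomes the negative value -4, which %X then displays as FFFC (four digits, consistent with a 16-bit int as on AVR-based boards). Cast to unsigned char, or use unsigned char throughout.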
/* cast */
int printedS = sprintf(asciiCharsS, "%02X", (unsigned char)pduChars[i]);
int printedSN = snprintf(asciiCharsSN, 3, "%02X", (unsigned char)pduChars[i]);
or
/* use unsigned char */
unsigned char *pduChars = malloc(17); /* in C the cast is, at best, redundant (C++, used by Arduino-style sketches, does require it) */
/* sizeof (char) is, by definition, 1 */

The width and flag you are using in %02X only control padding: they set a minimum field width. If the value needs more digits than that, the whole number is printed.

%02X is only for padding; it never truncates.
So if your value needs more than two hex digits, all of them are printed.

Related

How to get unicode value of multibyte character stored under char * in C?

Let's assume I don't use <uchar.h> from C11 and have something like this
char *a = "Ā";
How can I get the Unicode value of this character (it's 256)? Doing something like this:
int *a_value = (int *)a;
printf("%d\n", *a_value);
doesn't work.
How is this character stored in memory?
gdb shows me:
loc a = 0x555555556004 "Ā": -60 '\304'
but I don't quite understand what it means exactly.
I've checked the size of a and it's 2 bytes, which is okay, but doing
printf("%d\n", a[0]);
printf("%d\n", a[1]);
also doesn't work. It gives me -60 and -128.
The value is encoded as UTF-8.
256 in binary is 100000000 (9 bits). That is more than 7 bits (but no more than 11), so it needs 2 bytes to be encoded in UTF-8.
The 1st byte holds the first 5 bits, and the 2nd byte holds the final 6 bits.
So, again, 256 in binary with 11 bits is 00100000000, or 00100 followed by 000000.
Final UTF-8 1st byte: 11000100 ... 110 + 00100
Final UTF-8 2nd byte: 10000000 ... 10 + 000000
11000100 in decimal is 196, or -60 when interpreted as a signed (two's complement) byte.
10000000 in decimal is 128, or -128 when interpreted as a signed (two's complement) byte.
Read more about UTF-8 encoding in the Wikipedia article
Two more things:
(1) You got those weird numbers because plain char on your machine (as on many) is evidently signed. You can see the "real" bytes by casting to unsigned char:
char *a = "Ā";
printf("%u %u\n", ((unsigned char *)a)[0], ((unsigned char *)a)[1]);
printf("%x %x\n", ((unsigned char *)a)[0], ((unsigned char *)a)[1]);
or by using unsigned char all along:
unsigned char *u = (unsigned char *)"Ā";
printf("%x %x\n", u[0], u[1]);
The %u version prints 196 128, and the %x versions print c4 80.
(2) I'm not sure what you meant by "not using <uchar.h> from C11", but if you don't want to do the UTF-8 conversion by hand, you can convert a "multibyte string" (which is just about invariably UTF-8) to a wide or Unicode character by using the library function mbtowc from <stdlib.h>:
wchar_t wc;
mbtowc(&wc, a, strlen(a));
printf("%d %x\n", (int)wc, (unsigned)wc);
This prints 256 100 on my machine, since Ā is U+0100.
Another useful function is mbstowcs, which does this for multiple characters at once:
char *mbs = "Daß ist sehr schön";
printf("%s\n", mbs);
wchar_t wcs[20];
int n = mbstowcs(wcs, mbs, 20);
for(int i = 0; i < n; i++)
    printf("%3d %x %lc\n", (int)wcs[i], (unsigned)wcs[i], (wint_t)wcs[i]);
When using functions like mbtowc and mbstowcs, however, you have to remember that they do not necessarily deal in UTF-8 and Unicode. There are wide character encodings other than Unicode, and there are multibyte representations other than UTF-8. In fact, to get these functions to work "correctly" on my machine I have to first call
setlocale(LC_CTYPE, "");
to tell them that it's okay to use my locale settings (namely, en_US.UTF-8), instead of the default "C" locale which does not assume Unicode.

int to hex conversion not working properly for high values: 225943 is converted to 0x000372ffffff97

My C program takes a random, large int value, converts it to hex and writes it to a file. Everything goes well if the value is 225919 or less,
e.g. 225875 is 00 03 72 53
but if the value is above 225919 it starts writing an extra ffffff in the last byte of the hex value. For example, 225943 is written as 00 03 72 ffffff97, while the right value would have been 00 03 72 97.
The code that writes the value to the file is as follows:
char *temp = NULL;
int cze = 225943;
temp = (char *)(&cze);
for (ii = 3; ii >= 0; ii--) {
    printf(" %02x ", temp[ii]); //just for printing the values
    fprintf(in, "%02x", temp[ii]);
}
Output is: 00 03 72 ffffff97
Expected output: 00 03 72 97
Please help, any pointer is appreciated.
Your temp array contains char values, which on most common platforms means signed char. Any byte greater than 0x7f is therefore a negative value. When that value is passed to printf, it is implicitly promoted to int, and because it is negative, sign extension fills the extra high-order bytes with 1 bits, which %02x prints as ffffff.
Change the data type to unsigned char. An unsigned char is also promoted to int, but its value is always in the range 0 to 255, so there is no sign extension and you get the correct values.
unsigned char *temp=NULL;
int cze=225943;
temp=(unsigned char *)(&cze);
for(ii=3;ii>=0;ii--){
    printf(" %02x ",temp[ii] );//just for printing the values
    fprintf(in,"%02x",temp[ii]);
}
Alternatively, you can use the hh length modifier in printf, which tells it that the argument is a char or unsigned char; with x, the value is converted to unsigned char before printing. This restricts it to printing 1 byte's worth of data.
printf(" %02hhx ",temp[ii] );

Unexpected output in the C code with union

I don't understand the output in the following C code:
#include <stdio.h>
int main()
{
    union U
    {
        int i;
        char s[3];
    } u;
    u.i=0x3132;
    printf("%s", u.s);
    return 0;
}
Initial memory is 32 bits and is the binary value of 0x3132 which is
0000 0000 0000 0000 0011 0001 0011 0010.
If the last three bytes of 0x3132 are the value of s (without leading zeroes), then s[0]=0011,s[1]=0001,s[2]=0011.
This gives the values of s=0011 0001 0011=787.
Question: Why is the output 21 and not 787?
The value 0x3132 is represented in memory as 0x32, 0x31, 0x00, 0x00 (assuming a 4-byte int), because the byte order is little-endian.
The printf call prints the string held in the union member s, byte by byte: first 0x32 and then 0x31, which are the ASCII codes for the characters '2' and '1'. Printing then stops because the third element is the null character 0x00.
Note that the representation of int is implementation-defined: it need not consist of 4 bytes and may contain padding bits. The member s might therefore not hold a null-terminated string, in which case calling printf with the %s specifier causes undefined behavior.
First, look at this code sample:
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
int main()
{
    union {
        int32_t i32;
        uint32_t u32;
        int16_t i16[2];
        uint16_t u16[2];
        int8_t i8[4];
        uint8_t u8[4];
    } u;
    u.u8[3] = 52;
    u.u8[2] = 51;
    u.u8[1] = 50;
    u.u8[0] = 49;
    printf(" %d %d %d %d \n", u.u8[3], u.u8[2], u.u8[1], u.u8[0]); // 52 51 50 49
    printf(" %x %x %x %x \n", u.u8[3], u.u8[2], u.u8[1], u.u8[0]); // 34 33 32 31
    printf(" 0x%" PRIx32 " \n", u.u32); // 0x34333231 on a little-endian machine
    return 0;
}
The union here just lets you access the memory of u in 6 different ways:
you may use u.i32 to read or write it as int32_t, or
you may use u.u32 to read or write it as uint32_t, or
you may use u.i16[0] or u.i16[1] to read or write it as int16_t, or
you may use u.u16[0] or u.u16[1] to read or write it as uint16_t,
or write it as uint8_t like this:
u.u8[3] = 52;
u.u8[2] = 51;
u.u8[1] = 50;
u.u8[0] = 49;
and read it back as uint8_t like this:
printf(" %d %d %d %d \n", u.u8[3], u.u8[2], u.u8[1], u.u8[0]);
then the output is:
52 51 50 49
and read it as uint32_t:
printf(" 0x%" PRIx32 " \n", u.u32);
then the output (on a little-endian machine) is:
0x34333231
So, as this sample shows, all members of a union share one memory location under different names and types.
In your code, u.i=0x3132; writes 0x3132 into the memory of u.i in the byte order of your system, which is little-endian here. Then printf("%s", u.s); is called. u.s is an array of char, which is converted to a pointer to its first element, so printf reads u.s[0] and writes it to stdout, then reads u.s[1] and writes it, and so on, until it reaches an element that is zero.
That is what your code does. If none of the bytes of u were zero, printf would keep reading memory outside your union until it happened to find a zero byte or the program crashed with a memory fault (undefined behavior).
It means that your machine is little-endian, so the bytes are stored in reverse order, like this:
32 31 00 00
So: s[0] = 0x32, s[1] = 0x31, s[2] = 0x00.
Printing an array of chars with "%s" is only well defined if the array contains a null terminator. Here it does, so it prints 0x32 (character '2'), 0x31 (character '1') and then stops at 0x00.
If you write your code like this:
#include <stdio.h>
int main( void )
{
    union U
    {
        int i;
        char s[3];
    } u;
    u.i=0x3132;
    printf("%s", u.s);
    printf( "%8x\n", (unsigned)u.i);
}
Then you would see that u.i contains 0x00003132 (with a 4-byte int), which is actually stored in memory as the bytes 32 31 00 00 due to endianness.
The %8x conversion pads the number to a minimum width of 8 with spaces rather than printing leading zeros, so the second call to printf() prints four spaces followed by 3132, as you would expect.
The ASCII character '1' is 0x31 and '2' is 0x32, and the first 0x00 byte ends the %s conversion, so the first printf() outputs 21. Since it prints no newline, the complete output is 21 followed by four spaces and 3132.

C Printing Hexadecimal

Okay so I am trying to print hexadecimal values of a struct. Now my print function does the following:
int len = sizeof(someStruct);
unsigned char *buffer = (unsigned char*)&someStruct;
int count;
for(count = 0; count < len; count++) {
    fprintf(stderr, "%02x ", buffer[count]);
}
fprintf(stderr, "\n");
Here is the definition of the struct:
struct someStruct {
    unsigned char a;
    short myShort;
} __attribute__((packed)) someStruct;
The size of this struct is printed as expected (console output):
sizeof(someStruct): 3 bytes
Here is the issue I am encountering. There is a short which I set to a value.
someStruct.myShort = 0x08;
This short is 2 bytes long. When it is printed to the console, however, the most significant 0x00 does not show up. Here is the output I get:
stderr: 00 08
I would like the following output however (3 bytes long),
stderr: 00 00 08
If I fill the short with a 0xFFFF, then I do get the 2 byte output, however, whenever there is leading 0x00, it does not output the leading 0x00 to console.
Any ideas on what I am doing wrong? It's probably something small I'm overlooking.
With the additional info you provided, your code works for me. It prints:
00 08 00
The first 00 is from unsigned char a; and the bytes 08 00 are from the short. They are swapped because this platform stores the short in little-endian byte order.
If you want the bytes of the short in most-significant-first order, you can print them from the short's value:
fprintf(stderr, "%02x %02x", (someStruct.myShort >> 8) & 0xFF, someStruct.myShort & 0xFF);
I don't see a problem with your code. However, I get 08 00, which makes sense on my little-endian Intel machine.
The problem is in the printf format:
%02x
%02x means that the value will be printed as hex (x), with a minimum length of 2 (2), padding with 0 (0).
Try with
fprintf(stderr, "%04x ", buffer[count]);
The width specifier in the format string (2 in your case) refers to the minimum number of characters in the text output, not the number of bytes to print. Try using "%04x " as your format string instead.
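As for the digit grouping (00 08 as opposed to 0008): plain ISO C printf doesn't support that. POSIX adds a ' flag for thousands grouping, but it applies only to decimal conversions (d, i, u, f, F, g, G), not to %x. More info here: Digit grouping in C's printf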
Take care not to shift in a sign bit if the value being shifted is signed. Use the "hh" modifier (available since C99) to print only 1 byte's worth of data. See What is the purpose of the h and hh modifiers for printf?
fprintf(stderr, "%02hhx %02hhx", someStruct.myShort >> 8, someStruct.myShort);
[Edit: the OP's latest edit wants to see 3 bytes.] This prints the contents of all fields, each field in the byte order of the machine:
size_t len = sizeof(someStruct);
const unsigned char *buffer = (unsigned char*)&someStruct;
size_t count;
for(count = 0; count < len; count++) {
    fprintf(stderr, "%02x ", buffer[count]);
}
fprintf(stderr, "\n");

Hex to Decimal conversion in C

Here is my code, which converts from hex to decimal. The hex values are stored in an unsigned char array:
int liIndex ;
long hexToDec ;
unsigned char length[4];
for (liIndex = 0; liIndex < 4 ; liIndex++)
{
    length[liIndex]= (unsigned char) *content;
    printf("\n Hex value is %.2x", length[liIndex]);
    content++;
}
hexToDec = strtol(length, NULL, 16);
Each array element contains 1 byte of information, and I have read 4 bytes. When I execute it, here is the output I get:
Hex value is 00
Hex value is 00
Hex value is 00
Hex value is 01
Chunk length is 0
Can anyone please help me understand the error here? The decimal value should have come out as 1 instead of 0.
Regards,
darkie
My guess from your use of %x is that content encodes your hexadecimal number as an array of integers, not as an array of characters. That is, are you representing a 0 digit in content as '\0' or as '0'?
strtol only works in the latter case. If content is indeed an array of integers, the following code should do the trick:
hexToDec = 0;
int place = 1;
for(int i=3; i>=0; --i)
{
    hexToDec += place * (unsigned int)*(content+i);
    place *= 16;
}
content += 4;
strtol expects a null-terminated string of digit characters. Here length[0] == '\0', so strtol stops processing right there and returns 0. It converts things like "0A21", not things like {0,0,0,1}, which is what you have.
What are the contents of content and what are you trying to do, exactly? What you've built seems strange to me on a number of counts.
