C binary file reading problems

Hi, I am reading a binary file using C as shown here: link text
so that all the information read from the binary file is stored in "char *buffer".
I have the format standard, and it says that one of the fields should be
format: unsigned char, size: 1 byte
I am doing the following:
printf("%x\n", buffer[N]);
But what should I do when the format says:
format: unsigned short, size: 2 bytes
If I do it as follows, would this be correct?
printf("%d%d\n", buffer[N], buffer[N+1]);
If not, can you show me the correct way?
Also, can you tell me if the following are the correct ways to print these types:
char %c
unsigned long %ul
unsigned short %d
unsigned char %x
double %f
long %ld
All of the data in the binary file is in little-endian format! Thanks a lot in advance!

Try printf("%d", (short)(buffer[N] + buffer[N+1]<<8)). Now notice that I had to assume that the byte order in the buffer had the least significant byte of the two-byte short stored at the lower address.
I could likely have written *(short *)(&buffer[N]), but that assumes that N has the right alignment to hold a short on your platform, and that the buffer and the platform agree on byte order.
This is actually just the tip of a very large iceberg of a topic. There are many subtle issues lurking, and some really unsubtle ones when you wander into floating point values.
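Here is a minimal sketch of that idea, assuming a little-endian file as stated in the question; the helper name, the offset, and the buffer contents are made up for illustration:
#include <stdio.h>
#include <stdint.h>

/* Assemble a little-endian unsigned short from two bytes of the buffer.
   Going through unsigned char avoids sign extension if char is signed,
   and the result does not depend on the host's own byte order or on
   alignment. */
static unsigned short read_le_u16(const char *buf, size_t n)
{
    unsigned char lo = (unsigned char)buf[n];
    unsigned char hi = (unsigned char)buf[n + 1];
    return (unsigned short)(lo | (hi << 8));
}

int main(void)
{
    char buffer[] = { 0x1F, 0x08 };          /* 0x081F == 2079, little-endian */
    printf("%hu\n", read_le_u16(buffer, 0)); /* prints 2079 */
    return 0;
}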

Related

Writing and reading a binary file in C

I am trying to write an integer to a binary file and then read the same thing. However, my program reads a different number than the one that was written. What am I doing wrong?
unsigned short int numToWrite = 2079;
// Write to output
FILE *write_ptr;
write_ptr = fopen("test.bin","wb"); // w for write, b for binary
printf("numToWrite: %d\n", *(&numToWrite));
fwrite(&numToWrite, sizeof(unsigned short int), 1, write_ptr); // write one unsigned short (2 bytes) from our variable
fclose(write_ptr);
// Read the binary file
FILE *read_ptr = fopen(filename, "rb");
if (!read_ptr) {
perror("fopen");
exit(EXIT_FAILURE);
}
unsigned short int* numToRead = malloc(sizeof (unsigned short int));
fread(numToRead, sizeof(unsigned short int), 1, read_ptr);
printf("numToRead: %d\n", *numToRead);
free(numToRead);
fclose(read_ptr);
The output is this:
numToWrite: 2079
numToRead: 26964
man printf
Length modifier
h - A following integer conversion corresponds to a short or unsigned short argument, ...
Conversion specifiers
d,i - The int argument is converted to signed decimal notation.
Format of the format string
...Each conversion specification is introduced by the character %, and ends with a conversion specifier. In between there may be (in this order) zero or more flags, an optional minimum field width, an optional precision and an optional length modifier.
You're using unsigned short int, but that's not what you're telling printf.
Hence, your expectations are not fulfilled.
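A minimal sketch of matching the format to the type, assuming a 2-byte unsigned short (the value is just the one from the question):
#include <stdio.h>

int main(void)
{
    unsigned short int num = 2079;
    /* h is the length modifier for (unsigned) short, u is the unsigned
       decimal conversion, so %hu matches the argument's type */
    printf("num as %%hu: %hu\n", num);
    /* %d also happens to work here, because the unsigned short is
       promoted to int when passed to a variadic function */
    printf("num as %%d:  %d\n", num);
    return 0;
}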
A few things that are going on:
'printf' does not have any binary format specifier, so you would have to do it manually (a small sketch of doing that follows at the end of this answer).
You need to take a deep dive into data types and their ranges, so I recommend this source: Microsoft Data Type Ranges. It says C++, but that is irrelevant since it gives you a good idea of the ranges.
I know this isn't your entire code, but understand that there are some things in what you posted that are not defined, such as 'filename'.
In case someone mentions atoi() and itoa(): you could theoretically use them, but be mindful that itoa() is a non-standard function that is only supported by some compilers (atoi() is standard).
Lastly, why are you using 'unsigned short int' and '*(&numToWrite)', and is there any more of the program that you can show us?
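On the first point above, a hedged sketch of printing a value in binary by hand (the helper name is made up; some C23 compilers also accept a %b conversion directly):
#include <stdio.h>

/* Print the bits of an unsigned short, most significant bit first. */
static void print_binary_u16(unsigned short v)
{
    for (int bit = 15; bit >= 0; bit--)
        putchar(((v >> bit) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    print_binary_u16(2079); /* prints 0000100000011111 */
    return 0;
}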

Please help me with using memcpy() in C

I have the following code:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char tmp[3] = "AB";
    short k;
    memcpy(&k, tmp, 2);
    printf("%x\n", k);
    return 0;
}
In ASCII, the hex value of char 'A' is 41 and the hex value of char 'B' is 42. Why is the result of this program 4241? I think the correct result is 4142.
You are apparently running this on a "little-endian" machine, where the least significant byte comes first. See http://en.wikipedia.org/wiki/Endianness.
Your platform stores less significant bytes of a number at smaller memory addresses, and more significant bytes at higher memory addresses. Such platforms are called little-endian platforms.
However, when you print a number the more significant digits are printed first while the less significant digits are printed later (which is how our everyday numeric notation works). For this reason the result looks "reversed" compared to the way it is stored in memory on a little-endian platform.
If you compile and run the same program on a big-endian platform, the output should be 4142 (assuming a platform with 2-byte short).
P.S. One can argue that the "problem" in this case is the "weirdness" of our everyday numerical notation: we write numbers so that the significance of their digits increases in right-to-left direction. This appears to be inconsistent in the context of societies that write and read in left-to-right direction. In other words, it is not the little-endian memory that is reversed. It is the way we write numbers that is reversed.
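To make the storage order concrete, here is a small sketch that checks the host's byte order at runtime with the same memcpy idea (the variable names are just for illustration):
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint16_t value = 0x4142;      /* 'A' is 0x41, 'B' is 0x42 in ASCII */
    unsigned char bytes[2];
    memcpy(bytes, &value, 2);     /* copy the object representation */

    if (bytes[0] == 0x42)
        printf("little-endian: least significant byte at the lower address\n");
    else if (bytes[0] == 0x41)
        printf("big-endian: most significant byte at the lower address\n");
    return 0;
}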
Your system is little-endian. That means that a short (16-bit integer) is stored with the least significant byte first, followed by the most significant byte.
The same goes for larger integers. The following code would result in "44434241".
#include <stdio.h>
#include <string.h>

int main(void)
{
    char tmp[5] = "ABCD";
    int k;
    memcpy(&k, tmp, 4);
    printf("%x\n", k);
    return 0;
}

Difference between printing a memory address using %u and %d in C?

I am reading a C book. To print out the memory address of a variable, sometimes the book uses:
printf("%u\n",&n);
Sometimes, the author wrote:
printf("%d\n",&n);
The result is always the same, but I do not understand the difference between the two (I know %u is for unsigned).
Can anyone elaborate on this, please?
Thanks a lot.
%u treats the integer as unsigned, whereas %d treats the integer as signed. If the integer is between 0 and INT_MAX (which is 2^31 - 1 on systems with 32-bit int), then the output is identical in both cases.
It only makes a difference if the integer is negative (for signed inputs) or between INT_MAX+1 and UINT_MAX (i.e. between 2^31 and 2^32 - 1). In that case, if you use the %d specifier, you'll get a negative number, whereas if you use %u, you'll get a large positive number.
Addresses only make sense as unsigned numbers, so there's never any reason to print them out as signed numbers. Furthermore, when they are printed out, they're usually printed in hexadecimal (with the %x format specifier), not decimal.
You should really just use the %p format specifier for addresses, though; it's guaranteed to work for all valid pointers. On a system with 32-bit integers but 64-bit pointers, printing a pointer with %d, %u, or %x without the ll length modifier gives the wrong result for that argument and for anything else printed later (because printf reads only 4 of the pointer's 8 bytes); if you do add the ll length modifier, you're no longer portable to 32-bit systems.
Bottom line: always use %p for printing out pointers/addresses:
printf("The address of n is: %p\n", &n);
// Output (32-bit system): "The address of n is: 0xbffff9ec"
// Output (64-bit system): "The address of n is: 0x7fff5fbff96c"
The exact output format is implementation-defined (C99 §7.19.6.1/8), but it will almost always be printed as an unsigned hexadecimal number, usually with a leading 0x.
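As a side note (my own sketch, not part of the answer above): if you really need the address as an integer value rather than via %p, uintptr_t together with the PRIxPTR macro from <inttypes.h> avoids guessing the pointer width:
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    int n = 42;
    /* Convert the pointer to an unsigned integer type wide enough to
       hold it, then print it with the matching format macro. */
    uintptr_t addr = (uintptr_t)(void *)&n;
    printf("The address of n is: 0x%" PRIxPTR "\n", addr);
    return 0;
}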
%d and %u will print the same results when the most significant bit is not set. However, this isn't portable code at all, and is not good style. I hope your book is better than it seems from this example.
What value did you try? The difference is unsigned vs. signed, just as you said you know. So what did it do, and what did you expect?
Positive signed values look the same as unsigned ones, so can I assume you used a smaller value to test? What about a negative value?
Finally, if you are trying to print the variable's address (as it appears you are), use %p instead.
All addresses are unsigned 32-bit or 64-bit values depending on the machine (you can't write to a negative address). The use of %d isn't appropriate, but will usually work. It is recommended to use %u or %lu.
There is no such difference; just don't get confused if you have just started learning pointers.
%u is for unsigned ones, and %d is for signed ones.

Byte order (endian) of int in NSLog?

The NSLog function accepts printf format specifiers.
My question is about the %x specifier.
Does this print the hex digits in the order the bytes are laid out in memory, or does it have its own printing order?
unsigned int a = 0x000000FF;
NSLog(@"%x", a);
Is the result of the above code the same on little-endian and big-endian processors, or different?
And how about NSString's -initWithFormat method? Does it follow this rule equally?
%x always prints the most significant digits first. Doesn't matter what kind of processor it is running on.
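A small sketch of why, using plain printf instead of NSLog just to keep the example dependency-free: %x formats the numeric value, while a byte-by-byte dump reflects the memory layout.
#include <stdio.h>

int main(void)
{
    unsigned int a = 0x000000FF;
    unsigned char *p = (unsigned char *)&a;

    /* %x works on the value, so this prints "ff" on any CPU */
    printf("%x\n", a);

    /* the byte dump depends on the host's byte order:
       "ff 0 0 0" on little-endian, "0 0 0 ff" on big-endian */
    printf("%x %x %x %x\n", p[0], p[1], p[2], p[3]);
    return 0;
}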

What's a portable way of converting Byte-Order of strings in C

I am trying to write a server that will communicate with any standard client that can make socket connections (e.g. a telnet client).
It started out as an echo server, which of course did not need to worry about network byte ordering.
I am familiar with the ntohs, ntohl, htons, and htonl functions. These would be great by themselves if I were transferring 16-bit or 32-bit ints, or if the characters in the string being sent were 2 or 4 bytes each.
I'd like create a function that operates on strings such as:
str_ntoh(char* net_str, char* host_str, int len)
{
uint32_t* netp, hostp;
netp = (uint32_t*)&net_str;
for(i=0; i < len/4; i++){
hostp[i] = ntoh(netp[i]);
}
}
Or something similar. The above assumes that the word size is 32 bits. We can't be sure that the word size on the sending machine is not 16 bits or 64 bits, right?
Client programs such as telnet must be using hton* before they send and ntoh* after they receive data, correct?
EDIT: For the people who think that because a char is one byte, endianness doesn't matter:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t a = 0x01020304;
    char* c = (char*)&a;
    printf("%x %x %x %x\n", c[0], c[1], c[2], c[3]);
    return 0;
}
Run this snippet of code. The output for me is as follows:
$ ./a.out
4 3 2 1
Those on PowerPC chipsets should get '1 2 3 4', but those of us on Intel chipsets should see what I got above for the most part.
Maybe I'm missing something here, but are you sending strings, that is, sequences of characters? Then you don't need to worry about byte order. That is only for the bit pattern in integers. The characters in a string are always in the "right" order.
EDIT:
Derrick, to address your code example, I've run the following (slightly expanded) version of your program on an Intel i7 (little-endian) and on an old Sun SPARC (big-endian):
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t a = 0x01020304;
    char* c = (char*)&a;
    char d[] = { 1, 2, 3, 4 };
    printf("The integer: %x %x %x %x\n", c[0], c[1], c[2], c[3]);
    printf("The string: %x %x %x %x\n", d[0], d[1], d[2], d[3]);
    return 0;
}
As you can see, I've added a real char array to your print-out of an integer.
The output from the little-endian Intel i7:
The integer: 4 3 2 1
The string: 1 2 3 4
And the output from the big-endian Sun:
The integer: 1 2 3 4
The string: 1 2 3 4
Your multi-byte integer is indeed stored in different byte order on the two machines, but the characters in the char array have the same order.
With your function signature as posted you don't have to worry about byte order. It accepts a char*, which can only handle 8-bit characters. With one byte per character, you cannot have a byte order problem.
You'd only run into a byte order problem if you send Unicode in UTF-16 or UTF-32 encoding and the endianness of the sending machine doesn't match that of the receiving machine. The simple solution is to use UTF-8 encoding, which is what most text is sent as across networks; being byte oriented, it doesn't have a byte order issue either. Or you could send a BOM.
If you'd like to send them as an 8-bit encoding (the fact that you're using char implies this is what you want), there's no need to byte swap. However, for the unrelated issue of non-ASCII characters, so that the same character > 127 appears the same on both ends of the connection, I would suggest that you send the data in something like UTF-8, which can represent all Unicode characters and can be safely treated like ASCII strings. The way to get UTF-8 text based on the default encoding varies by the platform and set of libraries you're using.
If you're sending a 16-bit or 32-bit encoding... you can include one character with the byte order mark, which the other end can use to determine the endianness of the characters. Or, you can assume network byte order and use htons() or htonl() as you suggest. But if you'd like to use char, please see the previous paragraph. :-)
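As a hedged sketch of the "assume network byte order" approach for the parts that really are multi-byte (the framing and function name here are made up for illustration; the UTF-8 text itself is sent byte-for-byte and needs no swapping):
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl() on POSIX systems */

/* Pack a 4-byte network-order length prefix followed by the UTF-8 bytes
   of the message into out; returns the total number of bytes packed. */
static size_t pack_message(unsigned char *out, const char *utf8_msg)
{
    uint32_t len = (uint32_t)strlen(utf8_msg);
    uint32_t be_len = htonl(len);                /* only the integer is swapped */
    memcpy(out, &be_len, sizeof be_len);
    memcpy(out + sizeof be_len, utf8_msg, len);  /* text bytes go as-is */
    return sizeof be_len + len;
}
The receiver would read the four length bytes, apply ntohl(), and then read exactly that many UTF-8 bytes.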
It seems to me that the function prototype doesn't match its behavior. You're passing in a char *, but you're then casting it to uint32_t *. And, looking more closely, you're casting the address of the pointer, rather than the contents, so I'm concerned that you'll get unexpected results. Perhaps the following would work better:
void arr_ntoh(uint32_t* netp, uint32_t* hostp, int len)
{
    for (int i = 0; i < len; i++)
        hostp[i] = ntohl(netp[i]);
}
I'm basing this on the assumption that what you've really got is an array of uint32_t and you want to run ntohl() on all of them.
I hope this is helpful.
