Endianness and integer variables - C

In C I am a little worried about the concept of endianness. Suppose I declare an integer (2 bytes) on a little-endian machine as
int a = 1;
and it is stored as:
Address   Value
1000      1
1001      0
On big endian it should be stored vice-versa. Now if I do &a then I should get 1000 on little endian and 1001 on big endian.
If this is true, then if I store int a = 1, I should get a = 1 on little endian whereas 2^15 on big endian. Is this correct?

It should not matter to you how the data is represented as long as you don't transfer it between platforms (or access it through assembly code).
If you only use standard C, it's hidden from you and you shouldn't bother yourself with it. If you pass data around between unknown machines (for example, a client and a server application that communicate over a network), use the hton and ntoh families of functions, which convert between local and network endianness, to avoid problems.
If you access memory directly, then the problem would not only be endian but also packing, so be careful with it.
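As an illustration, here is a minimal sketch of that round trip, assuming a POSIX system where htonl and ntohl are declared in <arpa/inet.h>:

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t host_value = 0x12345678;        /* value in host byte order */
    uint32_t wire_value = htonl(host_value); /* big-endian, safe to put on the wire */
    /* ... send wire_value through the socket, receive it on the other side ... */
    uint32_t received = ntohl(wire_value);   /* back to the local host's byte order */
    printf("%#x\n", (unsigned)received);     /* prints 0x12345678 on either endianness */
    return 0;
}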

In both little endian and big endian, the address of the "first" byte is returned. By "first" I mean 1000 in your example, not "first" in the sense of the most-significant or least-significant byte.

Now if I do &a then I should get 1000 on little endian and 1001 on big endian.
No; on both machines you will get the address 1000.
If this is true, then if I store int a = 1, I should get a = 1 on little endian whereas 2^15 on big endian. Is this correct?
No, because the premise is false. Also, if you assign 1 to a variable and then read the value back, you get the result 1 back. Anything else would be a major cause of confusion. (There are occasions when reading back the stored value is not guaranteed, for example if multiple threads could be modifying the variable, or if it is an I/O register on an embedded system. However, you'd be aware of these cases if you needed to know about them.)
For the most part, you do not need to worry about endianness. There are a few places where you do. The socket networking functions are one such; writing binary data to a file which will be transferred between machines of different endianness is another. Pretty much all the rest of the time, you don't have to worry about it.

This is correct, but it's not only an issue in C; it affects any program that reads and writes binary data. If the data stays on a single computer, it will be fine.
Also, if you are reading and writing text files this won't be an issue.

Related

big-endian && little-endian?

Can any one tell me what this statement means:
"Specify the endianness of the object files. This only affects disassembly. This can be useful when disassembling a file format which does not describe endianness information, such as S-records. "
This article explains really well what endianness is and how to program in an endian-independent way.
Different environments store numbers in different ways:
Big-endian environments store the most significant byte first.
Little-endian environments store the least significant byte first.
Nowadays most high-level frameworks take care of all this for you; however, if you're programming at a lower level then this will be important.
Have a peek at the Wikipedia entry for it; it's really not as bad as some others.
I don't understand all of the question, but endianness describes the order in which the bytes of a number are stored.
For example, the number 256 is too big for 1 byte, so it is represented in 2 bytes: a 1 in one of the bytes and a 0 in the other, representing 1 * 256 plus 0 * 1.
The order in which these bytes are stored is the endianness.
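To make that concrete, here is a small sketch that prints the two raw bytes of 256, so you can see which order your own machine uses:

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned short n = 256;                 /* 0x0100: one byte holds 1, the other 0 */
    unsigned char bytes[sizeof n];
    memcpy(bytes, &n, sizeof n);            /* copy out the raw object representation */
    printf("%u %u\n", (unsigned)bytes[0], (unsigned)bytes[1]);
                                            /* "0 1" on little endian, "1 0" on big endian */
    return 0;
}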

How do I work with bit data in C

In class I've been tasked with writing a C program that decompresses a text file and prints out the characters it contains. Each character in the file is represented by 2 bits (4 possible characters).
I've recently been informed that a byte is not necessarily 8 bits on all systems, and a char is not necessarily 1 byte. This then makes me wonder how on earth I'm supposed to know how many bits got loaded from a file when I loaded 1 byte. Also, how am I supposed to keep the loaded data in memory when there are no data types that can guarantee a set number of bits?
How do I work with bit data in C?
A byte is not necessarily 8 bits. That much is certainly true. A char, on the other hand, is defined to be a byte - C does not differentiate between the two things.
However, the systems you will write for will almost certainly have 8-bit bytes. Bytes of different sizes are essentially nonexistent outside of really, really old systems or certain embedded systems.
If you have to write your code to work for multiple platforms, and one or more of those have differently sized chars, then you write code specifically to handle that platform - using e.g. CHAR_BIT to determine how many bits each byte contains.
Given that this is for a class, assume 8-bit bytes, unless told otherwise. The point is not going to be extreme platform independence, the point is to teach you something about bit fiddling (or possibly bit fields, but that depends on what you've covered in class).
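For instance, a tiny compile-time guard (my sketch, using CHAR_BIT from <limits.h>) makes the 8-bit-byte assumption explicit and fails the build wherever it doesn't hold:

#include <limits.h>

/* Refuse to compile on any platform where a byte is not 8 bits. */
#if CHAR_BIT != 8
#error "this program assumes 8-bit bytes"
#endif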
This then makes me wonder how on earth I'm supposed to know how many bits got loaded from a file when I loaded 1 byte.
You'll be hard-pressed to find a platform where a byte is not 8 bits (though, as noted above, CHAR_BIT can be used to verify that). Also, clarify the portability requirements with your instructor, or state your assumptions.
Usually bits are extracted using shifts and bitwise operations; e.g. (x & 3) is the rightmost 2 bits of x, and ((x >> 2) & 3) is the next two bits. Pick the right data type for the platforms you are targeting or, as others say, use something like uint8_t if your compiler provides it.
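For instance, here is a sketch of unpacking four 2-bit codes from each byte; the alphabet table is made up for illustration, since the real mapping depends on your assignment's file format:

#include <stdio.h>
#include <stdint.h>

/* Hypothetical 2-bit-code-to-character table. */
static const char alphabet[4] = { 'a', 'b', 'c', 'd' };

static void decode_byte(uint8_t byte)
{
    /* Pull out the four 2-bit fields, most significant pair first. */
    for (int shift = 6; shift >= 0; shift -= 2)
        putchar(alphabet[(byte >> shift) & 3]);
}

int main(void)
{
    decode_byte(0x1B); /* 0x1B = 00 01 10 11 -> prints "abcd" */
    return 0;
}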
Also see:
Type to use to represent a byte in ANSI (C89/90) C?
I would recommend not using bit fields. Also see here:
When is it worthwhile to use bit fields?
You can use bit fields in C. They let you explicitly specify the number of bits in each part of a structure, if you are truly concerned about width. This page gives a discussion: http://msdn.microsoft.com/en-us/library/yszfawxh(v=vs.80).aspx
As an example, check out ieee754.h for usage in the context of implementing IEEE 754 floats.
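For reference, a minimal sketch of the bit-field syntax (field names and widths here are arbitrary illustrations); note that the in-memory layout of bit fields is implementation-defined, which is one reason the previous answer advises against relying on them:

#include <stdio.h>

struct packed {
    unsigned int code : 2;  /* a 2-bit field */
    unsigned int flag : 1;  /* a 1-bit field */
    unsigned int rest : 5;  /* five more bits of the same unit */
};

int main(void)
{
    struct packed p = { 3, 1, 17 };
    printf("%u %u %u\n", (unsigned)p.code, (unsigned)p.flag, (unsigned)p.rest);
                            /* prints "3 1 17" */
    return 0;
}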

Simple bitwise manipulation for a little-endian integer, on a big-endian machine?

For a specific need I am building a four-byte integer out of four one-byte chars, using nothing too special (on my little-endian platform):
return (( v1 << 24) | (v2 << 16) | (v3 << 8) | v4);
I am aware that an integer stored on a big-endian machine would look like AB BC CD DE instead of the DE CD BC AB of little endianness. Would it affect my operation completely, in that I will be shifting incorrectly, or will it just produce a correct result that is stored in reverse and needs to be reversed?
I was wondering whether to create a second version of this function that does (yet unknown) bit manipulation for a big-endian machine, or possibly to use an ntohl-related function, though I am unclear how that would know whether my number is in the correct order or not.
What would be your suggestion to ensure compatibility, keeping in mind I do need to form integers in this manner?
As long as you are working at the value level, there will be absolutely no difference in the results you obtain regardless of whether your machine is little-endian or big-endian. I.e. as long as you are using language-level operators (like | and << in your example), you will get exactly the same arithmetical result from the above expression on any platform. The endianness of the machine is not detectable and not visible at this level.
The only situations when you need to care about endianness are when the data you are working with is examined at the object representation level, i.e. in situations when its raw memory representation is important. What you said above about "AB BC CD DE instead of DE CD BC AB" is specifically about the raw memory layout of the data. That's what functions like ntohl do: they convert one memory layout to another memory layout. So far you have given no indication that the actual raw memory layout is in any way important to you. Is it?
Again, if you only care about the value of the above expression, it is fully and totally endianness-independent. Basically, you are not supposed to care about endianness at all when you write C programs that don't attempt to access and examine the raw memory contents.
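Building on that, here is a hedged sketch (my addition, not from the answer) of the same expression with explicit casts; converting each byte to a fixed-width unsigned type before shifting avoids sign-extension surprises if the inputs are plain chars with the high bit set, while the value-level result stays identical on any endianness:

#include <stdint.h>

/* Build a 32-bit value from four bytes, most significant first.
   The result is the same on any endianness, because shifts and ORs
   operate on values, not on memory layout. */
uint32_t pack4(unsigned char v1, unsigned char v2,
               unsigned char v3, unsigned char v4)
{
    return ((uint32_t)v1 << 24) | ((uint32_t)v2 << 16)
         | ((uint32_t)v3 << 8)  | (uint32_t)v4;
}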
would it affect my operation completely in that I will be shifting incorrectly?
No.
The result will be the same regardless of the endian architecture. Bit shifting and twiddling are just like regular arithmetic operations. Is 2 + 2 the same on little endian and big endian architectures? Of course. 2 << 2 would be the same as well.
Little and big endian problems arise when you are dealing directly with the memory. You will run into problems when you do the following:
char bytes[] = {1, 0, 0, 0};
int n = *(int *)bytes;  /* reinterpret the raw bytes as an int */
On little endian machines, n will equal 0x00000001. On big endian machines, n will equal 0x01000000. This is when you will have to swap the bytes around.
ntohl (and ntohs, etc.) is used primarily for moving data from one machine to another. If you're simply manipulating data on one machine, then it's perfectly fine to do bit-shifting without any further ceremony: bit-shifting (at least in C and C++) is defined in terms of multiplying and dividing by powers of 2, so it works the same whether the machine is big-endian or little-endian.
When/if you need to (at least potentially) move data from one machine to another, it's typically sensible to use htonl before you send it, and ntohl when you receive it. This pair may be two no-ops (BE to BE), two identical transformations that cancel each other out (LE to LE), or actually result in swapping the bytes around (LE to BE or vice versa).
FWIW, I think a lot of what has been said here is correct. However, if the programmer has coded with endianness in mind, say using masks for bitwise inspection and manipulation, then cross-platform results could be unexpected.
You can determine endianness at runtime as follows:

#define LITTLE_ENDIAN 0
#define BIG_ENDIAN 1

int endian(void) {
    int i = 1;
    char *p = (char *)&i;     /* view the int as raw bytes */
    if (p[0] == 1)            /* least significant byte comes first */
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}
... and proceed accordingly.
I borrowed the code snippet from here: http://www.ibm.com/developerworks/aix/library/au-endianc/index.html?ca=drs- where there is also an excellent discussion of these issues.
hth -
Perry

Endian-dependent code in real applications?

I know the following C code is endian-dependent:
short s_endian = 0x4142;
char c_endian = *(char *)&s_endian;
On a big-endian machine, c_endian will be 'A'(0x41); while on a little-endian machine, it will be 'B'(0x42).
But this code seems kind of ugly. So is there endian-dependent code in real applications? Have you come across any application that needed a lot of changes when porting to a target with a different endianness?
Thanks.
Pretty much any code that deals with saving integers with more than 8 bits in binary format, or sends such integers over the network. For one extremely common example, many of the fields in the TCP header fall into this category.
Networking code is endian-dependent (data should always travel across the network in big-endian order, even when sent from a little-endian machine), hence the need for functions like htons(), htonl(), ntohs(), and ntohl() in <arpa/inet.h> that allow easy conversion between host and network byte order.
Hope this helps,
Jason
I once collected data using a specialized DAQ card on a PC, and tried to analyze the file on a PowerPC Mac. Turns out the "file format" the thing used was a raw memory dump...
Little endian on x86, big endian on PowerPC. You figure it out.
The short answer is yes. Anything that reads/writes raw binary to a file or socket needs to keep track of the endianness of the data.
For example, the IP protocol requires big-endian representation.
When manipulating the internal representation of floating-point numbers, you could access the parts (or the full value) using an integer type. For example:
union float_u
{
    float f;
    unsigned short v[2];
};

int get_sign(float f)
{
    union float_u u;
    u.f = f;
    return (u.v[0] & 0x8000) != 0; /* endian-dependent: assumes the sign bit lands in v[0] */
}
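If you want the sign test without the endianness dependence, one common alternative (my sketch, not part of the answer) copies the float into a fixed-width unsigned integer and tests the top bit; this assumes a 32-bit IEEE 754 float whose byte order matches the integer byte order, which holds on mainstream platforms:

#include <stdint.h>
#include <string.h>

int get_sign_portable(float f)
{
    uint32_t bits;                  /* assumes sizeof(float) == sizeof(uint32_t) */
    memcpy(&bits, &f, sizeof bits); /* reuse the object representation without aliasing issues */
    return (bits >> 31) & 1;        /* IEEE 754: the sign is the most significant bit */
}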
If your program sends data to another system (either over a serial or network link, or by saving it to a file for something else to read) or reads data from another system, then you can have endianness issues.
I don't know whether static analysis would be able to detect such constructs, but having your programmers follow a coding standard, in which structure elements and variables are marked up to indicate their endianness, could help.
For example, if all network data structures had _be appended to the names of multi-byte members, you could look for instances where you assigned a non-suffixed (host-byte-order) variable or even a literal value (like 0x1234) to one of those members.
It would be great if we could capture endianness in our data types: uint32_be and uint32_le to go with uint32_t. Then the compiler could disallow assignments or operations between the two, and the signature for htobe32 would be uint32_be htobe32(uint32_t n);.
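A rough sketch of that idea in plain C (my illustration; the wrapper type and function names are hypothetical): hiding the big-endian value inside a one-member struct makes accidental mixing with plain uint32_t a compile-time error:

#include <stdint.h>
#include <arpa/inet.h>

/* Distinct wrapper type for 32-bit values in big-endian order. */
typedef struct { uint32_t raw; } uint32_be;

static inline uint32_be htobe32_typed(uint32_t n)
{
    uint32_be r = { htonl(n) };  /* htonl yields big-endian on any host */
    return r;
}

static inline uint32_t betoh32_typed(uint32_be b)
{
    return ntohl(b.raw);         /* back to host byte order */
}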

Retrieve COBOL S9(2) COMP into a C variable

I need to retrieve data from a COBOL variable of the type "PIC S9(2) COMP" into a C variable of the type "int".
It's stored using two bytes of a string, so I receive it as a couple of chars.
I know COBOL stores decimal data in an "S9(2) COMP" in binary format, so it would be a great help if you could suggest an algorithm or a safe way to convert it.
Any kind of help & suggestion will be welcome.
Update:
Finally we decided to change the picture of the variable to 9(3) in the COBOL part of the implementation, because of the endianness problem.
Thanks to all of you for the answers.
You should be able to treat that as a short, i.e. a 16-bit two's-complement integer. You will need to check for endianness, though, depending upon which platform originated the field.
The format of S9(2) COMP will depend on the COBOL compiler. Most COBOL compilers I know will store it as a 2-byte big-endian integer (high byte first; Intel processors store the low byte first). Exceptions include:
Micro Focus (depends on compile parameters) - normally a 1-byte integer
RM-COBOL: has its own "binary format".
If it is a 2-byte integer:
On a big-endian machine (IBM mainframe, Power, etc.) you can use it as-is.
On a little-endian machine (Intel) you need to swap the bytes around, as in the sketch below.
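A minimal sketch of that conversion, assuming the field arrives as two chars in big-endian order:

#include <stdint.h>

/* Assemble a signed 16-bit value from two bytes received in
   big-endian order. This works on any host because it uses
   arithmetic on values rather than overlaying memory. */
int comp_to_int(const unsigned char *buf)
{
    int16_t v = (int16_t)(((uint16_t)buf[0] << 8) | (uint16_t)buf[1]);
    return v; /* sign-extends to int on return */
}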
S9(2) COMP indicates a left-aligned (bit 0 at the left of the word), not explicitly synchronised, signed numeric field in an internal or pseudo-binary format that can hold 2 digits; a bit in the word (or in a different memory location) may indicate whether the value is positive or negative. A COBOL program can store a COMP data item in a way that is not directly compatible with C. You need to locate and test the sign indicator appropriately, examine each of the two bytes, check whether they are big-endian or little-endian, and assemble them into a signed integer field in C. A lot depends on the architectures and compilers involved in creating and reading the data field, and it can be more complicated if two different computer types are involved.
