embedding chars in int and vice versa - c

I have a smart card on which I can store bytes (in multiples of 16).
If I do Save(byteArray, length), then I can later do Receive(byteArray, length)
and I think I will get the byte array back in the same order I stored it.
Now I have an issue: I realized that if I store an integer on this card
and some other machine (with a different endianness) reads it, it may get the wrong data.
So I thought maybe the solution is to always store data on this card in a little-endian
way, and always retrieve the data in a little-endian way (I will write the apps that read and write, so I am free to interpret the numbers as I like). Is this possible?
Here is something I have come up with:
Embedding an integer in a char array:
int x;
unsigned char buffer[250];
buffer[0] = LSB(x);
buffer[1] = LSB(x>>8);
buffer[2] = LSB(x>>16);
buffer[3] = LSB(x>>24);
What is important, I think, is that the LSB function should return the least significant byte regardless of the endianness of the machine. What would such an LSB function look like?
Now, to reconstruct the integer (something like this):
int x = buffer[0] | (buffer[1]<<8) | (buffer[2]<<16) | (buffer[3]<<24);
As I said, I want this to work regardless of the endianness of the machine that reads and writes it. Will this work?

The 'LSB' function may be implemented via a macro as below:
#define LSB(x) ((x) & 0xFF)
Provided x is unsigned.
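For completeness, a minimal sketch of the whole round trip built on that macro (the function names put_le32/get_le32 are just illustrative):

#include <stdio.h>

#define LSB(x) ((x) & 0xFFu)

/* Store a 32-bit value into 4 bytes, least significant byte first. */
static void put_le32(unsigned char *buffer, unsigned long x)
{
    buffer[0] = LSB(x);
    buffer[1] = LSB(x >> 8);
    buffer[2] = LSB(x >> 16);
    buffer[3] = LSB(x >> 24);
}

/* Rebuild the value from the same 4 bytes, regardless of host endianness. */
static unsigned long get_le32(const unsigned char *buffer)
{
    return (unsigned long)buffer[0]
         | ((unsigned long)buffer[1] << 8)
         | ((unsigned long)buffer[2] << 16)
         | ((unsigned long)buffer[3] << 24);
}

int main(void)
{
    unsigned char buf[4];
    put_le32(buf, 0x11223344UL);
    printf("%lx\n", get_le32(buf));   /* prints 11223344 on any host */
    return 0;
}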

If your C library is POSIX-compliant, then you have standard functions available that do exactly what you are trying to code: ntohl, ntohs, htonl, htons (network to host long, network to host short, ...). That way you don't have to change your code when you compile it for a big-endian or a little-endian architecture. The functions are declared in arpa/inet.h (see http://linux.die.net/man/3/ntohl).
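For instance, a small sketch (note these functions give you network byte order, i.e. big-endian, rather than the little-endian layout you proposed, but the principle of one fixed on-card byte order is the same):

#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char card[4];                         /* stand-in for the card's storage */

    uint32_t wire = htonl(0x11223344u);            /* writer: force a fixed byte order */
    memcpy(card, &wire, sizeof wire);

    memcpy(&wire, card, sizeof wire);              /* reader: fetch and convert back */
    printf("%lx\n", (unsigned long)ntohl(wire));   /* prints 11223344 on any host */
    return 0;
}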

I think the answer to your question is YES, you can write data on a smart card such that it is universally (and correctly) read by readers of both big-endian AND little-endian orientation. With one big caveat: it would be incumbent on the reader to do the interpretation, not your smart card interpreting the reader, would it not? That is, as you know, there are many routines to determine endianness (1, 2, 3). But it is the readers that would have to contain the code to test endianness, not your card.
Your code example works, but I am not sure it would be necessary given the nature of the problem as it is presented.
By the way, HERE is a related post.

Related

Writing number as raw binary to file

So I'm trying to write the value of a long as raw binary to a file with fwrite(), here's a test file:
#include <stdio.h>

int main() {
    FILE* stream;
    stream = fopen("write", "wb");
    long number = 0xfff;
    fwrite(&number, sizeof(long), 1, stream);
    fclose(stream);
}
When opening the written file in a text editor, this is the output:
ff0f 0000 0000 0000
while I was expecting something like 0000 0000 0000 0fff.
How do I write the desired result correctly as raw binary?
Read about endianness, which describes how bytes are arranged in a word (or double/quad word, et cetera) on a computer system.
I'm assuming you've coded and compiled this example on an x86 system, which is little-endian, so the least significant byte comes first. The opposite of that arrangement is called big-endian.
Now, it is clear that your objective in this exercise is to marshal (or pickle, depending on how you prefer your jargon) some bytes to be retrieved later, possibly by another program.
If you develop a program that uses fread() and reads the data in the same way (using sizeof(long) so you don't read too much data) and on a machine with the same endianness, it will magically work, and the number you expect will come back. But if you compile and run the "read" tool on a machine with the opposite endianness, reading the same input file, your number will be garbled.
If your objective is to marshal data, you would be better off with a tool that helps you marshal your bytes in an endianness-agnostic way, that is, a library that helps you get the data in the correct order. There are libraries out there that take care of that for you.
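As a sketch of the byte-by-byte approach, here is a writer that emits the most significant byte first, so the file always matches the 00 ... 0f ff layout you expected, whatever the host (unsigned long long is used to guarantee at least 64 bits):

#include <stdio.h>

int main(void)
{
    FILE *stream = fopen("write", "wb");
    if (stream == NULL)
        return 1;

    unsigned long long number = 0xfff;
    unsigned char bytes[8];

    /* Emit the most significant byte first, so the file reads
       00 00 00 00 00 00 0f ff regardless of the host's endianness. */
    for (int i = 0; i < 8; i++)
        bytes[i] = (unsigned char)(number >> (8 * (7 - i)));

    fwrite(bytes, sizeof bytes, 1, stream);
    fclose(stream);
    return 0;
}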
There's no problem. You're seeing it as ff0f 0000 0000 0000 because of the endianness of the machine!
Try reading it back with fread() instead!
As others have pointed out in the comments, this is an endianness "issue". That is, it is only an issue if you are going to run your software on systems with a different endianness.
This very useful resource is the first result for the "c endian" google search, at least for me. I hope it helps you.
Edit: I will dig a bit more into the details below.
To determine the endianness of the machine you are currently running on, write a known value (for example, 0xAABBCCDD) into memory, then take a pointer to that value and cast it to a char* or another 1-byte data type. Read the same value back, byte by byte, and compare the ordering with what you wrote. If they are the same, you are on a big-endian machine. If not, you could be on a little-endian (more probable) or middle-endian (less probable) machine. In either case you can generate a "swap map" for reordering the bytes, which is the reason why I chose four different byte values in the example above.
But, how to swap those values? An easy way, as indicated here, is to use bit shifts with masks. You can also use tables and swap their values around.
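A minimal sketch of that detection approach (assuming unsigned int is 4 bytes):

#include <stdio.h>

int main(void)
{
    unsigned int probe = 0xAABBCCDDu;              /* four distinct byte values */
    const unsigned char *p = (const unsigned char *)&probe;

    if (p[0] == 0xAA)
        printf("big endian\n");                    /* most significant byte first */
    else if (p[0] == 0xDD)
        printf("little endian\n");                 /* least significant byte first */
    else
        printf("middle/mixed endian\n");           /* rare; the byte order gives you the swap map */
    return 0;
}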

Reading negative values of accelerometer

I'm interfacing an accelerometer with a TivaC and displaying the raw data over UART.
void main(){
    signed int accelerationX;
    accelerationX = getAcceleration_X();
    if (accelerationX >= 0){
        UART_OutString("\r\nX Axl: ");
        UART_OutUDec((unsigned short) accelerationX);
    } else {
        UART_OutString("\r\nX Axl: - ");
        UART_OutUDec((unsigned short) (accelerationX*-1));
    }
}
I got this kind of code from some forum.
I don't understand why "accelerationX*-1" is done when the acceleration is negative.
accelerationX is a signed integer, but it would seem that UART_OutUDec expects an unsigned integer. Therefore they have to print a minus sign followed by the absolute value of accelerationX (sign removed).
It's because the number is being sent as an unsigned short instead of a signed quantity. It would be helpful to see what UART_OutUDec is doing, but it also doesn't really matter because a UART will simply send whatever is dropped in its data register. As an aside, UART_OutUDec is most likely translating the unsigned short into ASCII. The receiver is unlikely to understand the value was supposed to be negative, so the minus sign is transmitted with what's effectively the absolute value of the acceleration.
Something to consider is that not all receivers are equal. A lot of people assume the device on the other end is a computer or something that understands ASCII, but that's not always the case. I've worked on embedded systems that transmitted ASCII characters mixed with non-ASCII characters, which is confusing and hard to maintain, but these systems exist. That's almost certainly not applicable to your situation simply because it's rare, but, in the future, if you give additional details about the receiver it will help clarify how the data should be formatted and transmitted.
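To make the intent explicit, this pattern is sometimes wrapped in a small signed-output helper. A minimal sketch, assuming only the unsigned routine is available (UART_OutString and UART_OutUDec come from the question; UART_OutSDec and the prototypes here are made up for illustration):

/* Prototypes assumed from the question's UART driver. */
void UART_OutString(const char *s);
void UART_OutUDec(unsigned short n);

/* Hypothetical wrapper: print a signed value as a minus sign plus its magnitude. */
void UART_OutSDec(int value)
{
    if (value < 0) {
        UART_OutString("-");
        /* Negate via unsigned arithmetic so the negation is well defined even for
           INT_MIN; the magnitude is then truncated to 16 bits, as in the original. */
        UART_OutUDec((unsigned short)(0u - (unsigned int)value));
    } else {
        UART_OutUDec((unsigned short)value);
    }
}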

serialize data into a consecutive array

I have objects that I'd like to be able to serialize as a consecutive stream of bytes. Two questions:
1) Is an array of char appropriate for this task? If not, what are better options?
2) What is the most efficient way of doing this? So far what I have done is use memcpy. Is it possible to cast/convert a double, for instance, into 8 bytes of chars and vice versa without going through memcpy?
I'm well aware of external libraries for this but I like to learn new stuff.
Yes, char is a great choice for the task.
memcpy is fine if you are storing your result into a file and reading it again on the same architecture. But if you want to pass it through a socket or open it somewhere else, you have to be more careful. With floating point and integral types, representation and endianness are always an issue.
Don't do a simple memcpy of a float/integer, and even more so avoid casting it straight out of a buffer (strict aliasing and undefined behavior).
For floating point, look up the two functions frexp() and ldexp(). There is plenty about them on the web, so there is no point in copying it here.
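A sketch of that idea for doubles (assuming an IEEE-754 double with a 53-bit significand; finite values only, NaN/Inf and long double need extra handling):

#include <math.h>      /* frexp, ldexp */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    double value = -123.456;

    /* value = mantissa * 2^exponent, with 0.5 <= |mantissa| < 1 */
    int exponent;
    double mantissa = frexp(value, &exponent);

    /* Scaling by 2^53 turns the mantissa into an exact integer that can be
       serialized byte-by-byte like any other integer (see below). */
    int64_t scaled = (int64_t)(mantissa * 9007199254740992.0);   /* 2^53 */

    /* ...store 'scaled' and 'exponent' with the integer technique shown below... */

    double rebuilt = ldexp((double)scaled, exponent - 53);
    printf("%.17g -> %.17g\n", value, rebuilt);    /* prints the same number twice */
    return 0;
}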
For integrals, you can do something like this:
buffer[0] = integer >> 24;
buffer[1] = integer >> 16;
buffer[2] = integer >> 8;
buffer[3] = integer;
This guarantees getting the same number back.
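And a minimal sketch of the matching read side (using an unsigned fixed-width type so the shifts are well defined):

#include <stdint.h>

/* Rebuild the value written by the four assignments above (big-endian layout). */
static uint32_t unpack_be32(const unsigned char buffer[4])
{
    return ((uint32_t)buffer[0] << 24)
         | ((uint32_t)buffer[1] << 16)
         | ((uint32_t)buffer[2] << 8)
         |  (uint32_t)buffer[3];
}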
Serialization implies that you are taking an object and giving it a representation that can be used to completely rebuild it using only that representation. Usually, serialization applies to file storage, but it is often used to communicate objects over networks.
So, usually, using char or unsigned char works just fine. The real bear of the problem is ensuring that you are storing everything that object contains. That includes objects that are referenced in the object that you are trying to serialize.
I would start by googling "deep copy." deep copy vs shallow copy
Edit: memcpy is a form of "shallow copy."

Best way to assert compatible architecture when reading and writing a file?

I have a program that reads and writes a binary file. A file is interchangeable between executions of the program on the same platform, but a file produced on one machine may not be valid on another platform due to the sizes of types, endianness, etc.
I want a quick way to be able to assert that a given file is valid for reading on a given architecture. I am not interested in making a file cross-architecture (in fact the file is memory-mapped structs). I only want a way of checking that the file was created on an architecture with the same size types, etc before reading it.
One idea is to write a struct with constant magic numbers into the start of the file. This can be read and verified. Another would be to store the sizeof of various types as single-byte integers.
This is for C but I suppose the question is language-agnostic for languages with the same kinds of issues.
What's the best way to do this?
I welcome amendments for the title of this question!
I like the magic number at the start of the file idea. You get to make these checks with a magic value:
If there are at least two magic bytes and you treat them as a single multi-byte integer, you can detect endianness changes. For instance, if you choose 0xABCD, and your code reads 0xCDAB, you're on a platform with different endianness than the one where the file was written.
If you use a 4- or 8-byte integer, you can detect 32- vs. 64-bit platforms, if you choose your data type so it's a different size on the two platforms.
If there is more than just an integer, or you choose it carefully, you can also rule out, with a high degree of probability, accidentally reading a file written out by another program. See /etc/magic on any Unixy type system for a good list of values to avoid.
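Putting the first two checks together, a reader-side sketch might look like this (the magic value and the return convention are made up for illustration):

#include <stdint.h>
#include <stdio.h>

#define FILE_MAGIC 0xABCD1234u

/* Returns 0 if the file starts with the magic as written on this architecture,
   1 if the bytes come back reversed (endianness mismatch), -1 otherwise. */
static int check_magic(FILE *f)
{
    uint32_t magic;
    if (fread(&magic, sizeof magic, 1, f) != 1)
        return -1;
    if (magic == FILE_MAGIC)
        return 0;
    if (magic == 0x3412CDABu)   /* FILE_MAGIC with its bytes reversed */
        return 1;
    return -1;
}

The writer would simply fwrite its native FILE_MAGIC at offset zero before the rest of the data.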
#include <stdint.h>
union header {
    uint8_t a[8];
    uint64_t u;
};

const union header h = { .u = ((uint64_t)sizeof( short )       << 0 )
                            | ((uint64_t)sizeof( int )         << 8 )
                            | ((uint64_t)sizeof( long )        << 16)
                            | ((uint64_t)sizeof( long long )   << 24)
                            | ((uint64_t)sizeof( float )       << 32)
                            | ((uint64_t)sizeof( double )      << 40)
                            | ((uint64_t)sizeof( long double ) << 48)
                            | 0 };
This should be enough to verify the type sizes and endianness, except that floating point numbers are crazy difficult for this.
If you want to verify that your floating point numbers are stored in the same format on the writer and the reader then you might want to store a couple of constant floating point numbers (more interesting than 0, 1, and -1) in the different sizes after this header, and verify that they are what you think they should be.
It is very likely that storing an actual magic string with version number would also be good as another check that this is the correct file format.
If you don't care about floats or something like that then feel free to delete them. I didn't include char because it is supposed to always be 1 byte.
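A sketch of how the reader might use it: read the stored eight bytes back and compare them with the header this build of the program would have produced; any mismatch means different type sizes or a different byte order.

#include <stdio.h>
#include <string.h>

/* Sketch: returns nonzero if the stored header matches the one this build
   would write.  'h' is the const union header initialized above. */
static int header_matches(FILE *f)
{
    uint8_t stored[8];
    if (fread(stored, sizeof stored, 1, f) != 1)
        return 0;
    return memcmp(stored, h.a, sizeof stored) == 0;
}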
It might be a good idea if you also store the sizeof some struct like:
struct misalligned {
char c;
uint64_t u;
};
This should allow you to easily determine the alignment and padding of the compiler that generated the code that generated the file. If this were done on most 32-bit computers that care about alignment, the size would be 12, because there would be 3 bytes of padding between c and u; but if it were done on a 64-bit machine, the sizeof may be 16, with 7 bytes of padding between c and u. If this were done on an AVR, the sizeof would most likely be 9, because there would be no padding.
NOTE:
this answer relied on the question stating that the files were being memory-mapped and that there was no need for portability beyond recognizing that a file was in the wrong format. If the question were about general file storage and retrieval I would have answered differently. The biggest difference would be packing the data structures.
Call the uname(2) function (or equivalent on non-POSIX platforms) and write the sysname and machine fields from the struct utsname into a header at the start of the file.
(There's more to it than just sizes and endianness - there's also floating point formats and structure padding standards that vary too. So it's really the machine ABI that you want to assert is the same).
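A sketch of that idea on a POSIX system (the fixed-width, zero-padded header layout is just one possible choice):

#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Write a fixed-size identification header at the start of the file:
   the OS name and the machine/architecture string, zero-padded. */
static int write_arch_header(FILE *f)
{
    struct utsname un;
    char header[130] = { 0 };            /* 65 bytes for sysname, 65 for machine */

    if (uname(&un) == -1)
        return -1;
    strncpy(header, un.sysname, 64);
    strncpy(header + 65, un.machine, 64);
    return fwrite(header, sizeof header, 1, f) == 1 ? 0 : -1;
}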
First, I fully agree with the previous answer provided by Warren Young.
This is really a metadata question.
For a file with homogeneous content, I'd prefer having one metadata block, padded to the size of a structure, at the beginning of the binary file. This preserves data structure alignment and simplifies append-writing.
If the content is heterogeneous, I'd prefer using Structure-Value or Structure-Length-Value (also known as Type-Length-Value) in front of each datum or range of data.
For a stream with random joining, you may want some kind of structure sync, something like HDLC (see Wikipedia), plus metadata repeated throughout the constant flow of binary data. If you're familiar with audio/video formats, think of tags inside a data flow that is intrinsically composed of frames.
Nice subject!

Parsing Binary Data in C?

Are there any libraries or guides for how to read and parse binary data in C?
I am looking at some functionality that will receive TCP packets on a network socket and then parse that binary data according to a specification, turning the information into a more usable form for the code.
Are there any libraries out there that do this, or even a primer on performing this type of thing?
I have to disagree with many of the responses here. I strongly suggest you avoid the temptation to cast a struct onto the incoming data. It seems compelling and might even work on your current target, but if the code is ever ported to another target/environment/compiler, you'll run into trouble. A few reasons:
Endianness: The architecture you're using right now might be big-endian, but your next target might be little-endian. Or vice-versa. You can overcome this with macros (ntoh and hton, for example), but it's extra work and you have to make sure you call those macros every time you reference the field.
Alignment: The architecture you're using might be capable of loading a multi-byte word at an odd-addressed offset, but many architectures cannot. If a 4-byte word straddles a 4-byte alignment boundary, the load may pull garbage. Even if the protocol itself doesn't have misaligned words, sometimes the byte stream itself is misaligned. (For example, although the IP header definition puts all 4-byte words on 4-byte boundaries, often the ethernet header pushes the IP header itself onto a 2-byte boundary.)
Padding: Your compiler might choose to pack your struct tightly with no padding, or it might insert padding to deal with the target's alignment constraints. I've seen this change between two versions of the same compiler. You could use #pragmas to force the issue, but #pragmas are, of course, compiler-specific.
Bit Ordering: The ordering of bits inside C bitfields is compiler-specific. Plus, the bits are hard to "get at" for your runtime code. Every time you reference a bitfield inside a struct, the compiler has to use a set of mask/shift operations. Of course, you're going to have to do that masking/shifting at some point, but best not to do it at every reference if speed is a concern. (If space is the overriding concern, then use bitfields, but tread carefully.)
All this is not to say "don't use structs." My favorite approach is to declare a friendly native-endian struct of all the relevant protocol data without any bitfields and without concern for the issues, then write a set of symmetric pack/parse routines that use the struct as a go-between.
typedef struct _MyProtocolData
{
    Bool myBitA;    // Using a "Bool" type wastes a lot of space, but it's fast.
    Bool myBitB;
    Word32 myWord;  // You have a list of base types like Word32, right?
} MyProtocolData;

Void myProtocolParse(const Byte *pProtocol, MyProtocolData *pData)
{
    // Somewhere, your code has to pick out the bits. Best to just do it in one place.
    pData->myBitA = (*(pProtocol + MY_BITS_OFFSET) & MY_BIT_A_MASK) >> MY_BIT_A_SHIFT;
    pData->myBitB = (*(pProtocol + MY_BITS_OFFSET) & MY_BIT_B_MASK) >> MY_BIT_B_SHIFT;

    // Endianness and Alignment issues go away when you fetch byte-at-a-time.
    // Here, I'm assuming the protocol is big-endian.
    // You could also write a library of "word fetchers" for different sizes and endiannesses.
    pData->myWord = *(pProtocol + MY_WORD_OFFSET + 0) << 24;
    pData->myWord += *(pProtocol + MY_WORD_OFFSET + 1) << 16;
    pData->myWord += *(pProtocol + MY_WORD_OFFSET + 2) << 8;
    pData->myWord += *(pProtocol + MY_WORD_OFFSET + 3);

    // You could return something useful, like the end of the protocol or an error code.
}

Void myProtocolPack(const MyProtocolData *pData, Byte *pProtocol)
{
    // Exercise for the reader! :)
}
Now, the rest of your code just manipulates data inside the friendly, fast struct objects and only calls the pack/parse when you have to interface with a byte stream. There's no need for ntoh or hton, and no bitfields to slow down your code.
The standard way to do this in C/C++ is really casting to structs, as 'gwaredd' suggested.
It is not as unsafe as one would think. You first cast to the struct that you expected, as in his/her example, then you test that struct for validity. You have to test for max/min values, termination sequences, etc.
Whatever platform you are on, you must read Unix Network Programming, Volume 1: The Sockets Networking API. Buy it, borrow it, steal it (the victim will understand, it's like stealing food or something...), but do read it.
After reading the Stevens, most of this will make a lot more sense.
Let me restate your question to see if I understood properly. You are
looking for software that will take a formal description of a packet
and then will produce a "decoder" to parse such packets?
If so, the reference in that field is PADS. A good article
introducing it is PADS: A Domain-Specific Language for Processing Ad
Hoc Data. PADS is very complete but unfortunately under a non-free licence.
There are possible alternatives (I did not mention non-C
solutions). Apparently, none can be regarded as completely production-ready:
binpac
PacketTypes
DataScript
If you read French, I summarized these issues in Génération de
décodeurs de formats binaires.
In my experience, the best way is to first write a set of primitives, to read/write a single value of some type from a binary buffer. This gives you high visibility, and a very simple way to handle any endianness-issues: just make the functions do it right.
Then, you can for instance define structs for each of your protocol messages, and write pack/unpack (some people call them serialize/deserialize) functions for each.
As a base case, a primitive to extract a single 8-bit integer could look like this (assuming an 8-bit char on the host machine, you could add a layer of custom types to ensure that too, if needed):
const void * read_uint8(const void *buffer, unsigned char *value)
{
    const unsigned char *vptr = buffer;
    *value = *vptr++;
    return vptr;
}
Here, I chose to return the value by reference, and return an updated pointer. This is a matter of taste, you could of course return the value and update the pointer by reference. It is a crucial part of the design that the read-function updates the pointer, to make these chainable.
Now, we can write a similar function to read a 16-bit unsigned quantity:
const void * read_uint16(const void *buffer, unsigned short *value)
{
    unsigned char lo, hi;
    buffer = read_uint8(buffer, &hi);
    buffer = read_uint8(buffer, &lo);
    *value = (hi << 8) | lo;
    return buffer;
}
Here I assumed the incoming data is big-endian; this is common in networking protocols (mainly for historical reasons). You could of course get clever, do some pointer arithmetic, and remove the need for a temporary, but I find this way clearer and easier to understand. Having maximal transparency in this kind of primitive can be a good thing when debugging.
The next step would be to start defining your protocol-specific messages, and write read/write primitives to match. At that level, think about code generation; if your protocol is described in some general, machine-readable format, you can generate the read/write functions from that, which saves a lot of grief. This is harder if the protocol format is clever enough, but often doable and highly recommended.
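Continuing that pattern, a 32-bit reader and an unpacker for a made-up message might look like this (it reuses the read_uint8/read_uint16 primitives above; the struct and field names are purely illustrative):

#include <stdint.h>

/* Big-endian 32-bit read, built in the same chainable style as above. */
static const void *read_uint32(const void *buffer, uint32_t *value)
{
    unsigned short hi, lo;
    buffer = read_uint16(buffer, &hi);
    buffer = read_uint16(buffer, &lo);
    *value = ((uint32_t)hi << 16) | lo;
    return buffer;
}

/* Hypothetical protocol message: an 8-bit type, a 16-bit length, a 32-bit id. */
struct my_message {
    unsigned char type;
    unsigned short length;
    uint32_t id;
};

static const void *read_my_message(const void *buffer, struct my_message *msg)
{
    buffer = read_uint8(buffer, &msg->type);
    buffer = read_uint16(buffer, &msg->length);
    buffer = read_uint32(buffer, &msg->id);
    return buffer;
}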
You might be interested in Google Protocol Buffers, which is basically a serialization framework. It's primarily for C++/Java/Python (those are the languages supported by Google) but there are ongoing efforts to port it to other languages, including C. (I haven't used the C port at all, but I'm responsible for one of the C# ports.)
You don't really need to parse binary data in C; just cast some pointer to whatever you think it should be.
struct SomeDataFormat
{
....
}
SomeDataFormat* pParsedData = (SomeDataFormat*) pBuffer;
Just be wary of endian issues, type sizes, reading off the end of buffers, etc etc
Parsing/formatting binary structures is one of the very few things that is easier to do in C than in higher-level/managed languages. You simply define a struct that corresponds to the format you want to handle and the struct is the parser/formatter. This works because a struct in C represents a precise memory layout (which is, of course, already binary). See also kervin's and gwaredd's replies.
I don't really understand what kind of library you are looking for. A generic library that will take any binary input and parse it into an unknown format?
I'm not sure such a library can ever exist, in any language.
I think you need to elaborate on your question a little bit.
Edit:
OK, so after reading Jon's answer it seems there is a library, well, kind of a library; it's more of a code-generation tool. But as many have stated, just casting the data to the appropriate data structure, with appropriate care, i.e. using packed structures and taking care of endian issues, gets you there. Using such a tool with C is just overkill.
Basically, the suggestions about casting to a struct work, but please be aware that numbers can be represented differently on different architectures.
To deal with endian issues, network byte order was introduced: common practice is to convert numbers from host byte order to network byte order before sending the data, and to convert back to host order on receipt. See the functions htonl, htons, ntohl and ntohs.
And really consider kervin's advice - read UNP. You won't regret it!
