sprintf or itoa or memcpy for IPC - c

A process, say PA, wants to send the values of 2 integers to PB by populating a char buffer with the values and sending it. Assume PA and PB are on the same machine. PB knows that the buffer it reads contains the values of 2 integers.
uint x=1;
uint y=65534;
Case 1
PA writes into char buf as shown
sprintf(buff,"%d%d",x,y);
Q1 - In this case, how will PB be able to extract the values as 1 and 65534, since it just has an array containing 1,6,5,5,3,4? Is using sprintf the problem?
Case 2
PA uses the itoa function to populate the buffer with the values of the integers.
PB uses atoi to extract the values from the buffer.
Since itoa puts a null terminator after each value, this should be possible.
Q2 - Now consider that PA is running on a 32-bit machine with a 4-byte int and PB is running on a 16-bit machine with a 2-byte int. Will only checking for out-of-range values make my code portable?
Q3 - Is memcpy another way of doing this?
Q4 - How is this USUALLY done?

1) The receiver will read the string values from the network and do its own conversion; in this case it would get "165534", i.e. the string representation of 165,534, with no way to tell where one value ends and the next begins. You need some way of delimiting the values for the receiver.
2) Checking for out of range is a good start, but portability depends on more factors, such as defining a format for the transfer, be it binary or textual.
3) memcpy on its own just copies raw bytes; that only helps if both sides agree on the size and byte order of the integers (see point 4).
4) It's usually done by deciding on a standard for binary representation of the number, i.e., is it a signed/unsigned 16/32/64 bit value, and then converting it into what's commonly referred to as network byte order[1] on the sending side, and converting it to host byte order on the receiving side.
[1] http://en.wikipedia.org/wiki/Network_byte_order#Endianness_in_networking
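To make point 4 concrete, here is a minimal sketch of my own (not from the answer above): the width is fixed at 32-bit unsigned, the values are converted with htonl()/ntohl(), and the connected socket plus error handling are assumed to exist elsewhere.

#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Sender: fix the width (32-bit unsigned here) and convert to network order. */
void send_two_uints(int sock, uint32_t x, uint32_t y)
{
    uint32_t wire[2] = { htonl(x), htonl(y) };
    send(sock, wire, sizeof wire, 0);            /* error handling omitted */
}

/* Receiver: read exactly 8 bytes, then convert back to host order. */
void recv_two_uints(int sock, uint32_t *x, uint32_t *y)
{
    uint32_t wire[2];
    recv(sock, wire, sizeof wire, MSG_WAITALL);  /* error handling omitted */
    *x = ntohl(wire[0]);
    *y = ntohl(wire[1]);
}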

I would suggest that you have a look at the following points.
As you noticed in Case 1, there is no way to extract the values from the buffer if you don't have additional information, so you need some delimiter character.
In Q2 you mention a 16-bit machine. Not only the number of bytes in an int can be a problem, but also the endianness and the sign.
What I would do:
- Define your own protocol for the different number sizes (you can't send a 4-byte int to the 16-bit machine and use the same type without losing information),
or
- Check that the int fits in 2 bytes before writing (see the sketch below).
I hope this helps.
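A minimal sketch of that range check (my own illustration; the function name is made up): the sender refuses to encode anything that would not fit in the 16-bit receiver's int, and writes the value as two bytes in a fixed (big-endian) order.

#include <stdint.h>
#include <stdio.h>

/* Returns 0 on success, -1 if the value would not fit on the 16-bit machine. */
int put_u16_checked(uint8_t out[2], unsigned value)
{
    if (value > UINT16_MAX) {
        fprintf(stderr, "value %u out of range for 16-bit receiver\n", value);
        return -1;
    }
    out[0] = (uint8_t)(value >> 8);     /* high byte first: big-endian on the wire */
    out[1] = (uint8_t)(value & 0xFF);
    return 0;
}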

Q1: Using sprintf is not the problem; the way it is used is. How about:
sprintf(buff,"%d:%d",x,y);
(Note: a comma as separator could cause problems with international number formats.)
Q2: No. Other problems, e.g. regarding endianness, could arise.
Q3: Not if you use different machines. On a single machine, you can (mis)use your buffer as an array of bytes.
Q4: There are different ways, e.g. XDR (http://en.wikipedia.org/wiki/External_Data_Representation).
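For what it's worth, a complete round trip of that separator idea might look like this (a sketch only; the buffer size and the ':' delimiter are arbitrary choices):

#include <stdio.h>

int main(void)
{
    unsigned x = 1, y = 65534;
    char buff[32];

    /* Sender side: ':' delimits the two values. */
    snprintf(buff, sizeof buff, "%u:%u", x, y);

    /* Receiver side: parse the two delimited values back out. */
    unsigned rx, ry;
    if (sscanf(buff, "%u:%u", &rx, &ry) == 2)
        printf("got %u and %u\n", rx, ry);   /* prints: got 1 and 65534 */
    return 0;
}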

You need a protocol and a transport mechanism.
Transport mechanisms include sockets, named pipes, shared memory, SSL etc.
The protocol could be as simple as space-separated strings, as you suggested. It could also be something more "complicated", like an XML-based format, or a binary format.
All these protocol types are in use in various applications. Which protocol to choose depends on your requirements.

Related

How to encode the length of the string at the front of the string

Imagine I want to design a protocol: I want to send a packet from the client side to the server side, so I need to encode my data.
I have a string, I want to add the length of the string at the front of the string, for example:
string: "my"
which length is 2
So what I expect is to create a char[] in C and store | 2 | my | in the buffer.
In this way, after the server receives the packet, it will know how many bytes need to be read for this request (using C programming).
I tried to do it, but I don't know how to handle the gap between the length and the string. I can create a buffer of size 10 and use sprintf() to convert the length of the string and add it to the buffer.
One poor way to do it is to encode the length in ASCII at the front of the string; the downside is that you'll need a variable number of char elements to store the length if you ever want to send anything longer than 9 chars.
A better way to encode the string's length, since you are designing your own protocol, is to allocate a fixed number of bytes at the beginning, say 8 bytes, and cast &array[0] to a pointer to uint64_t. Basically, use array[0..7] to store an 8-byte unsigned integer. Align the address on an 8-byte boundary for (slightly) better performance.
If the sender and receiver machines have different endianness, you'll also have to include a multi-byte "magic number" at the head of the char array. This is necessary for both sides to correctly recover the string length from the multi-byte length field.
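Here is a small sketch of that layout, assuming the sender's native byte order for the length field. Note that I memcpy() the length into the buffer rather than casting &array[0], to sidestep alignment and strict-aliasing issues; that is a deliberate variation on the answer above.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd buffer laid out as | length (8 bytes) | payload |,
 * e.g. | 2 | m | y | for the string "my".  Caller frees it. */
unsigned char *pack_with_length(const char *s, size_t len, size_t *out_size)
{
    uint64_t n = (uint64_t)len;
    unsigned char *buf = malloc(8 + len);
    if (buf == NULL)
        return NULL;
    memcpy(buf, &n, 8);          /* fixed 8-byte length field at the front */
    memcpy(buf + 8, s, len);     /* the string itself, no '\0' needed */
    *out_size = 8 + len;
    return buf;
}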
There are two standards used in C:
str*: char * which is terminated with a '\0'.
mem*, read/write: a void * plus a length of type size_t. It's the same idea for readv() and writev(), but there the two variables are bundled into an array of struct iovec. Note that sizeof(size_t) may differ between sender and receiver.
If you use anything else, it's automatically a learning curve for whoever needs to read or interact with your code. I wouldn't make that trade-off, but you do you.
You can, of course, encode the length into the char *, but now you have to think about how you encode it (big vs. little endian, fixed vs. variable size).
You might be interested in SDS, which hides the length. That way you only have to reimplement the functions that change the length of the string instead of all string functions. Use an existing library.

Send an integer in two bytes

I am going to transfer an integer number in two bytes. As you know, using the sprintf function splits your number into ASCII characters and sends them (in my case over an Ethernet connection). For example:
sprintf(Buffer,"A%02ldB",Virtual);
My Virtual number is between 0 and 3600. By using the sprintf function, ASCII codes are sent (3600 converts to 4 ASCII bytes). However, converting 3600 to binary form, we can see that it can be squeezed into 12 bits (or two bytes, with bits 13-16 unused). But I can't send binary code this way, because sprintf sends ASCII codes for each 1 and 0. If I can convert my Virtual variable into two bytes, I can cut down the number of bytes I transport. So how can I convert a variable into two bytes and send them with the sprintf function?
sprintf() doesn't do this, partly because it's specifically meant to work with strings (not binary data), and partly because there's nothing to convert – an integer variable in C already has bytes representing the number.
For example, if you have the variable declared as a short int or as uint16_t, that'll always hold your number in exactly two bytes, with &Virtual indicating the memory address where those bytes are kept, and I believe you can directly memcpy() the data into your network buffer. (Note: some types have a fixed size, others vary between architectures; be careful with those.)
The only thing you really should do before sending it is ensure that the bytes are in the correct order. That's another thing that varies between CPU architectures, so use htons()/htonl() to get an integer suitable for sending over the network, and upon receiving use ntohs()/ntohl() to convert it back to native CPU format.
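As a concrete sketch of that advice (the buffer and function names are placeholders, not part of any API mentioned above): the 0..3600 value is packed into exactly two bytes in network byte order and unpacked on the other end.

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons, ntohs */

void pack_u16(uint8_t buf[2], uint16_t virtual_value)
{
    uint16_t wire = htons(virtual_value);   /* fix the byte order on the wire */
    memcpy(buf, &wire, 2);                  /* two bytes instead of up to 4 ASCII digits */
}

uint16_t unpack_u16(const uint8_t buf[2])
{
    uint16_t wire;
    memcpy(&wire, buf, 2);
    return ntohs(wire);                     /* back to the CPU's native order */
}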

C convert char* to network byte order before transfer

I'm working on a project where I must send data to a server, and this client will run on different OSs. I know about the problem with endianness on different machines, so I'm converting 'everything' (almost) to network byte order using htonl, and vice versa on the other end.
Also, I know that for a single byte I don't need to convert anything. But what should I do for a char*? E.g.:
send(sock,(char*)text,sizeof(text));
What's the best approach to solve this? Should I create an 'intermediate function' to intercept this send and then really send this char array character by character? If so, do I need to convert every char to network byte order? I think not, since every char is only one byte.
Thinking about this: if I create this intermediate function, I don't have to convert anything else to network byte order, since this function will send char by char and thus needs no endianness conversion.
I'd appreciate any advice on this.
I am presuming from your question that the application layer protocol (more specifically everything above level 4) is under your design control. For single byte-wide (octet-wide in networking parlance) data there is no issue with endian ordering and you need do nothing special to accommodate that. Now if the character data is prepended with a length specifier that is, say 2 octets, then the ordering of the bytes must be treated consistently.
Going with network byte ordering (big-endian) will certainly fill the bill, but so would consistently using little-endian. Consistency of byte ordering on each end of a connection is the crucial issue.
If the protocol is not under your design control, then the protocol specification should offer guidance on the issue of byte ordering for multi-byte integers and you should follow that.
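A sketch of what that consistency looks like in practice (my own illustration; error and partial-send handling omitted): the character octets go over the wire untouched, and only the 2-octet length prefix needs an agreed byte order, network order here.

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int send_string(int sock, const char *text)
{
    uint16_t len = (uint16_t)strlen(text);
    uint16_t wire_len = htons(len);          /* multi-octet field: agree on the order */
    if (send(sock, &wire_len, sizeof wire_len, 0) < 0)
        return -1;
    /* single octets need no conversion at all */
    return send(sock, text, len, 0) < 0 ? -1 : 0;
}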

Best way to receive integer array on c socket

I need to receive a nested integer array on a socket, e.g.
[[1,2,3],[4,5,6],...]
The subarrays are always 3 values long; the length of the main array varies, but is known in advance.
Searching Google has given me a lot of options, from sending each integer separately to just casting the buffer to what I think it should be (which seems kind of unsafe to me), so I am looking for a safe and fast way to do this.
The "subarrays" don't matter; in the end you're going to be transmitting 3·n numbers and have the receiver interpret them as n rows of 3 numbers each.
For any external representation, you're going to have to pick a precision, i.e. how many bits you should use for each integer. The type int is not well-specified, so perhaps pick 32 bits and treat each number as an int32_t.
As soon as an external integer representation has multiple bytes, you're going to have to worry about the order of those bytes. Traditionally network byte ordering ("big endian") is used, but many systems today observe that most hardware is little-endian so they use that. In that case you can write the entire source array into the socket in one go (assuming of course you use a TCP/IP socket), perhaps prepended by either the number of rows or the total number of integers.
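To illustrate, here is a sketch under the assumptions above (TCP socket, int32_t values, network byte order, row count sent first); a real implementation would also loop on partial send() results.

#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int send_rows(int sock, const int32_t rows[][3], uint32_t nrows)
{
    uint32_t wire_n = htonl(nrows);                  /* length of the main array first */
    if (send(sock, &wire_n, sizeof wire_n, 0) < 0)
        return -1;
    for (uint32_t i = 0; i < nrows; i++) {
        int32_t wire[3];
        for (int j = 0; j < 3; j++)                  /* each row is always 3 values */
            wire[j] = (int32_t)htonl((uint32_t)rows[i][j]);
        if (send(sock, wire, sizeof wire, 0) < 0)
            return -1;
    }
    return 0;
}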
Assuming that bandwidth and data size aren't very critical, I would propose that (de-)serializing the array to a string is a safe and platform/architecture-independent way to transfer such an array. This has the following advantages:
No issues with different sizes of the binary representations of integers between the communicating hosts
No issues with differing endiannesses
More flexible if the parameters change (length of the subarrays, etc)
It is easier to debug a text-protocol in contrast to a binary protocol
The drawback is that more bytes have to be transmitted over the channel than the minimum necessary with a good binary encoding.
If you want to go with a ready-to-use library for serializing/deserializing your array, you could take a look at one of the many JSON-libraries available.
http://www.json.org/ provides a list with several implementations.
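A rough sketch of the text-serialization idea (not a full JSON library, and it assumes buf is large enough for the output; a real program would use one of the libraries mentioned above):

#include <stdio.h>

/* Writes the rows as "[[1,2,3],[4,5,6]]" into buf. */
void serialize_rows(char *buf, const int rows[][3], int nrows)
{
    char *p = buf;
    p += sprintf(p, "[");
    for (int i = 0; i < nrows; i++)
        p += sprintf(p, "%s[%d,%d,%d]",
                     i ? "," : "", rows[i][0], rows[i][1], rows[i][2]);
    sprintf(p, "]");
}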
Serialize it the way you want; there are two main possibilities:
encode as strings, with fixed separators, etc.;
encode with NBO (network byte order), and first send the data needed to fix the parameters: the size of your ints, then the length of the array, and then the data itself, everything properly encoded.
In C, you can use the XDR routines to encode your data properly.

Best way to assert compatible architecture when reading and writing a file?

I have a program that reads and writes a binary file. A file is interchangeable between executions of the program on the same platform, but a file produced on one machine may not be valid on another platform due to the sizes of types, endian-ness etc.
I want a quick way to be able to assert that a given file is valid for reading on a given architecture. I am not interested in making a file cross-architecture (in fact the file is memory-mapped structs). I only want a way of checking that the file was created on an architecture with the same size types, etc before reading it.
One idea is to write a struct with constant magic numbers into the start of the file. This can be read and verified. Another would be to store the sizeof various types as single-byte integers.
This is for C but I suppose the question is language-agnostic for languages with the same kinds of issues.
What's the best way to do this?
I welcome amendments for the title of this question!
I like the magic number at the start of the file idea. You get to make these checks with a magic value:
If there are at least two magic bytes and you treat them as a single multi-byte integer, you can detect endianness changes. For instance, if you choose 0xABCD, and your code reads 0xCDAB, you're on a platform with different endianness than the one where the file was written.
If you use a 4- or 8-byte integer, you can detect 32- vs. 64-bit platforms, if you choose your data type so it's a different size on the two platforms.
If there is more than just an integer, or you choose it carefully, you can rule out (to a high degree of probability) the possibility of accidentally reading a file written out by another program. See /etc/magic on any Unixy type system for a good list of values to avoid.
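As a hedged sketch of the magic-value idea (the MAGIC constant and the check function are made up for illustration): reading the bytes back swapped tells you the file came from a machine with the other endianness.

#include <stdint.h>
#include <stdio.h>

#define MAGIC 0xABCD1234u   /* illustrative value; check /etc/magic to avoid collisions */

static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns 0 if the file matches our format and endianness,
 * 1 if it is our format but byte-swapped, -1 otherwise. */
int check_file_magic(FILE *f)
{
    uint32_t magic;
    if (fread(&magic, sizeof magic, 1, f) != 1)
        return -1;
    if (magic == MAGIC)
        return 0;
    if (magic == bswap32(MAGIC))
        return 1;
    return -1;
}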
#include <stdint.h>

union header {
    uint8_t  a[8];
    uint64_t u;
};

/* Pack the sizes of the basic types into one 64-bit value; reading the
 * bytes back through .a also reveals the writer's endianness. */
const union header h = { .u = ((uint64_t)sizeof( short )        <<  0)
                             | ((uint64_t)sizeof( int )         <<  8)
                             | ((uint64_t)sizeof( long )        << 16)
                             | ((uint64_t)sizeof( long long )   << 24)
                             | ((uint64_t)sizeof( float )       << 32)
                             | ((uint64_t)sizeof( double )      << 40)
                             | ((uint64_t)sizeof( long double ) << 48) };
This should be enough to verify the type sizes and endianness, except that floating point numbers are crazy difficult for this.
If you want to verify that your floating point numbers are stored in the same format on the writer and the reader then you might want to store a couple of constant floating point numbers (more interesting than 0, 1, and -1) in the different sizes after this header, and verify that they are what you think they should be.
It is very likely that storing an actual magic string with a version number would also be good, as another check that this is the correct file format.
If you don't care about floats or something like that then feel free to delete them. I didn't include char because it is supposed to always be 1 byte.
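A sketch of the floating-point check suggested above (the constant and function name are my own picks): the writer stores a known constant right after the header, and the reader compares the raw bytes.

#include <string.h>

static const double float_probe = -123456.789e-3;   /* arbitrary, non-trivial constant */

/* file_bytes points at the stored probe in the file.  Same value, same size,
 * same bit pattern => same floating point format (and endianness) as the writer. */
int check_float_format(const unsigned char *file_bytes)
{
    return memcmp(file_bytes, &float_probe, sizeof float_probe) == 0;
}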
It might be a good idea if you also store the sizeof some struct like:
struct misalligned {
    char c;
    uint64_t u;
};
This should allow you to easily determine the alignment and padding of the compiler that generated the code that wrote the file. On most 32-bit computers that care about alignment, the sizeof would be 12, because there would be 3 bytes of padding between c and u; if this were done on a 64-bit machine, the sizeof may be 16, with 7 bytes of padding between c and u. On an AVR, the sizeof would most likely be 9, because there would be no padding.
NOTE
This answer relied on the question stating that the files were being memory-mapped and that there was no need for portability beyond recognizing that a file was in the wrong format. If the question were about general file storage and retrieval, I would have answered differently. The biggest difference would be packing the data structures.
Call the uname(2) function (or equivalent on non-POSIX platforms) and write the sysname and machine fields from the struct utsname into a header at the start of the file.
(There's more to it than just sizes and endianness - there's also floating point formats and structure padding standards that vary too. So it's really the machine ABI that you want to assert is the same).
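A possible sketch of that (POSIX-only; the file_header struct and the 65-byte field sizes are my own choices, not from the answer):

#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

struct file_header {
    char sysname[65];   /* e.g. "Linux"  */
    char machine[65];   /* e.g. "x86_64" */
};

/* Write the identifying header at the start of the file; the reader
 * compares these fields against its own uname() output before trusting
 * the memory-mapped structs that follow. */
int write_arch_header(FILE *f)
{
    struct utsname u;
    struct file_header h = {0};
    if (uname(&u) != 0)
        return -1;
    strncpy(h.sysname, u.sysname, sizeof h.sysname - 1);
    strncpy(h.machine, u.machine, sizeof h.machine - 1);
    return fwrite(&h, sizeof h, 1, f) == 1 ? 0 : -1;
}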
First, I fully agree with the previous answer provided by Warren Young.
This is a meta-data case we're talking about.
On a filesystem with homogeneous content, I'd prefer having one meta-data block, padded to the size of a structure, at the beginning of the binary file. This allows you to preserve data structure alignment and simplifies append-writing.
If the content is heterogeneous, I'd prefer using Structure-Value or Structure-Length-Value (also known as Type-Length-Value) in front of each datum or range of data.
On a stream with random joining, you may wish to have some kind of structure sync, with something like HDLC (see Wikipedia) and meta-data repetition during the constant flow of binary data. If you're familiar with audio/video formats, you may think of the TAGs inside a data flow that is intrinsically composed of frames.
Nice subject!
