Best way to receive integer array on C socket

I need to receive a nested integer array on a socket, e.g.
[[1,2,3],[4,5,6],...]
The subarrays are always 3 values long; the length of the main array varies, but is known in advance.
Searching Google has given me a lot of options, from sending each integer separately to just casting the buffer to what I think it should be (which seems kind of unsafe to me), so I am looking for a safe and fast way to do this.

The "subarrays" don't matter; in the end you're going to be transmitting 3·n numbers and have the receiver interpret them as n rows of 3 numbers each.
For any external representation, you're going to have to pick a precision, i.e. how many bits to use for each integer. The size of plain int is not fixed by the C standard, so perhaps pick 32 bits and treat each number as an int32_t.
As soon as an external integer representation has multiple bytes, you're going to have to worry about the order of those bytes. Traditionally, network byte ordering ("big endian") is used, but many systems today observe that most hardware is little-endian and use that instead. If the chosen wire order matches the host's native order, you can write the entire source array into the socket in one go (assuming, of course, you use a TCP/IP socket), perhaps prepended by either the number of rows or the total number of integers.
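A sketch of that approach, assuming 32-bit values, a row-count prefix, and big-endian wire order (the function name is made up for illustration):

```c
#include <arpa/inet.h>  /* htonl */
#include <stdint.h>
#include <string.h>

/* Serialize n rows of 3 int32_t values into buf in network byte order,
   prefixed by the row count. Returns the number of bytes written.
   buf must hold at least (1 + 3*n) * 4 bytes. */
size_t pack_rows(unsigned char *buf, const int32_t rows[][3], uint32_t n)
{
    size_t off = 0;
    uint32_t be = htonl(n);                 /* row count first */
    memcpy(buf + off, &be, 4); off += 4;
    for (uint32_t i = 0; i < n; i++) {
        for (int j = 0; j < 3; j++) {
            be = htonl((uint32_t)rows[i][j]);
            memcpy(buf + off, &be, 4); off += 4;
        }
    }
    return off;   /* then e.g.: send(sock, buf, off, 0); */
}
```

The receiver reads the 4-byte count, then reads 12 bytes per row and applies ntohl() to each value.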

Assuming that bandwidth and data size aren't critical, I would propose that (de-)serializing the array to a string is a safe and platform/architecture-independent way to transfer such an array. This has the following advantages:
No issues with different sizes of the binary representations of integers between the communicating hosts
No issues with differing endiannesses
More flexible if the parameters change (length of the subarrays, etc)
It is easier to debug a text-protocol in contrast to a binary protocol
The drawback is that more bytes have to be transmitted over the channel than are strictly necessary with a good binary encoding.
If you want to go with a ready-to-use library for serializing/deserializing your array, you could take a look at one of the many JSON-libraries available.
http://www.json.org/ provides a list with several implementations.
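As a minimal, library-free sketch of the text approach (the function name and the exact output format are just one possible choice):

```c
#include <stdio.h>
#include <stddef.h>

/* Serialize n rows of 3 ints as a JSON-style line, e.g. "[[1,2,3],[4,5,6]]".
   Returns the number of characters written (sketch: cap is assumed large
   enough for the whole output). */
int rows_to_text(char *out, size_t cap, const int rows[][3], int n)
{
    int off = snprintf(out, cap, "[");
    for (int i = 0; i < n; i++)
        off += snprintf(out + off, cap - off, "[%d,%d,%d]%s",
                        rows[i][0], rows[i][1], rows[i][2],
                        i + 1 < n ? "," : "");
    off += snprintf(out + off, cap - off, "]");
    return off;
}
```

The receiver can parse this with sscanf in a loop, or hand it to any JSON library.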

Serialize it the way you want, two main possibilities:
encode as strings, with fixed separators, etc.
encode with NBO (network byte order), sending fixed parameters first: the size of your ints, then the length of the array, and then the data, everything properly encoded.
In C, you can use the XDR routines to encode your data properly.


C convert char* to network byte order before transfer

I'm working on a project where I must send data to a server, and this client will run on different OSs. I know the problem with endianness on different machines, so I'm converting 'everything' (almost) to network byte order using htonl, and vice versa on the other end.
Also, I know that for a single byte, I don't need to convert anything. But, what should I do for char*? ex:
send(sock, text, strlen(text), 0);
What's the best approach to solve this? Should I create an 'intermediate function' to intercept this send, then send each char of the array one by one? If so, do I need to convert every char to network byte order? I think not, since every char is only one byte.
Thinking about this: if I create this 'intermediate function', I won't have to convert anything else to network byte order, since the function will send char by char and thus won't need endian conversion.
I'd appreciate any advice on this.
I am presuming from your question that the application-layer protocol (more specifically, everything above layer 4) is under your design control. For single-byte-wide (octet-wide, in networking parlance) data there is no issue with endian ordering, and you need do nothing special to accommodate it. However, if the character data is prepended with a length specifier that is, say, 2 octets, then the ordering of those bytes must be treated consistently.
Going with network byte ordering (big-endian) will certainly fill the bill, but so would consistently using little-endian. Consistency of byte ordering on each end of a connection is the crucial issue.
If the protocol is not under your design control, then the protocol specification should offer guidance on the issue of byte ordering for multi-byte integers and you should follow that.
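A small sketch of the length-prefixed framing described above (the function name is hypothetical; note that only the 2-octet length needs byte-order treatment, not the character data):

```c
#include <arpa/inet.h>  /* htons */
#include <stdint.h>
#include <string.h>

/* Frame a string as [2-byte big-endian length][bytes] into buf.
   buf must hold at least 2 + strlen(text) bytes. */
size_t frame_text(unsigned char *buf, const char *text)
{
    uint16_t len = (uint16_t)strlen(text);
    uint16_t be = htons(len);       /* only the multi-byte length needs ordering */
    memcpy(buf, &be, 2);
    memcpy(buf + 2, text, len);     /* octets go through unchanged */
    return 2 + (size_t)len;         /* then e.g.: send(sock, buf, 2 + len, 0); */
}
```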

sprintf or itoa or memcpy for IPC

A process say PA wants to send values of 2 integers to PB by sending it in a char buf after populating it with values. Assume PA and PB are in same machine. PB knows that the buffer it reads contains values of 2 integers.
uint x=1;
uint y=65534;
Case 1
PA writes into char buf as shown
sprintf(buff,"%d%d",x,y);
Q1 - In this case, how will PB be able to extract the values 1 and 65534, since it just receives an array containing the characters 1,6,5,5,3,4? Is using sprintf the problem?
Case 2
PA use itoa function to populate the value of integers in to buffer.
PB use atoi to extract the values from buffer.
Since itoa puts a null terminator after each value this should be possible.
Q2 - Now consider PA is running on a 32 bit machine with 4 byte int size and PB is running on a 16 bit machine with 2 byte int size. Will only checking for out of range make my code portable?
Q3 - Is memcpy another way of doing this?
Q4 - How is this USUALLY done ?
1) The receiver will read the string values from the network and do its own conversion; in this case it would get the string representation "165534". You need some way of delimiting the values for the receiver.
2) Checking for out of range is a good start, but portability depends on more factors, such as defining a format for the transfer, be it binary or textual.
3) Wha?
4) It's usually done by deciding on a standard for binary representation of the number, i.e., is it a signed/unsigned 16/32/64 bit value, and then converting it into what's commonly referred to as network byte order[1] on the sending side, and converting it to host byte order on the receiving side.
[1] http://en.wikipedia.org/wiki/Network_byte_order#Endianness_in_networking
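A portable way to implement the conversion in 4), without relying on the host's own byte order, is to build the bytes with shifts (a common idiom; the function names are made up):

```c
#include <stdint.h>

/* Encode/decode a 32-bit value as big-endian (network order) bytes.
   Shifts make this correct regardless of the host's endianness. */
void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```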
I would suggest that you have a look at the following points:
As you noticed in Case 1, there is no way to extract the values from the buffer without additional information, so you need some delimiter character.
In Q2 you mention a 16-bit machine. Not only can the number of bytes in an int be a problem, but also the endianness and the sign.
What I would do:
- Define your own protocol for the different numbers (you can't send a 4-byte int to the 16-bit machine and use the same type without losing information)
Or
- Check the int (must fit in 2 bytes) before writing.
I hope this helps.
Q1: Not using sprintf is the problem, but the way of using it. How about:
sprintf(buff,"%d:%d",x,y);
(Note: a comma as separator could cause problems with international number formats)
Q2: No; other problems, e.g. regarding endianness, could arise.
Q3: Not if you use different machines. On a single machine, you can (mis)use your buffer as an array of bytes.
Q4: Different ways, e.g. XDR (http://en.wikipedia.org/wiki/External_Data_Representation)
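The fixed-separator idea from Q1 can be sketched end to end (the helper names are invented for illustration):

```c
#include <stdio.h>

/* PA side: encode two unsigned values into a text buffer with a ':'
   separator (sketch; the buffer is assumed large enough). */
void encode_pair(char *buff, unsigned x, unsigned y)
{
    sprintf(buff, "%u:%u", x, y);
}

/* PB side: recover both values; returns 1 on success, 0 on parse error. */
int decode_pair(const char *buff, unsigned *x, unsigned *y)
{
    return sscanf(buff, "%u:%u", x, y) == 2;
}
```

The separator makes the value boundary unambiguous, which is exactly what the plain "%d%d" format lacked.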
You need a protocol and a transport mechanism.
Transport mechanisms include sockets, named pipes, shared memory, SSL etc.
The protocol could be as simple as space separated strings, as you suggested. It could also be something more "complicated" like an XML-based format. Or binary format.
All these protocol types are in use in various applications. Which protocol to choose depends on your requirements.

Hash a byte string

I'm working on a personal project, a file compression program, and am having trouble with my symbol dictionary. I need to store previously encountered byte strings into a structure in such a way that I can quickly check for their existence and retrieve them. I've been operating under the assumption that a hash table would be best suited for this purpose so my question will be pertaining to hash functions. However, if someone can suggest a better alternative to a hash table, I'm all ears.
All right. So the problem is that I can't come up with a good hashing key for these byte strings. Everything I think of either has a very uneven distribution or takes too long. Here is the situation I'm working with:
All byte strings will be at least two bytes in length.
The hash table will have a maximum size of 3839, and it is very likely it will fill.
Testing has shown that, with any given byte, the highest order bit is significantly less likely to be set, as compared to the lower seven bits.
Otherwise, bytes in the string can be any value from 0 - 255 (I'm working with raw byte-data of any format).
I'm working with the C language in a UNIX environment. I'd prefer to stick with standard libraries, but it doesn't need to be portable to other OSs. (I.E. unistd.h is fine).
Security is of NO concern.
Speed is of a HIGH concern.
The size isn't of intense concern, as it will NOT be written to file. However, considering the potential size of the byte strings being stored, memory space could become an issue during the compression.
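For reference, one common, fast byte-string hash that fits these constraints is FNV-1a, reduced modulo the table size (a sketch, not a claim that it is optimal for this particular data distribution):

```c
#include <stdint.h>
#include <stddef.h>

#define TABLE_SIZE 3839  /* from the question */

/* FNV-1a over an arbitrary byte string, reduced to a table index.
   It mixes every input bit, so the skew in the high bit of each
   byte does not directly skew the bucket distribution. */
uint32_t hash_bytes(const unsigned char *s, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= s[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h % TABLE_SIZE;
}
```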
A trie is better suited to this kind of thing because it lets you store your symbols as a tree and quickly parse it to match values (or reject them).
And as a bonus, you don't need a hash at all. You're storing/retrieving/comparing the entire sequence at once, while still only holding a minimal amount of memory.
Edit: And as an additional bonus, with only a second parse, you can look up sequences that are "close" to your current sequence, so you can get rid of a sequence and use the previous one for both of them, with some internal notation to hold the differences. That will help you compress files better because:
a smaller dictionary means smaller files, since you have to write the dictionary to your file
a smaller number of items can free up space to hold other, rarer sequences, if you add a population cap and hit it with a large file.
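A minimal byte-trie sketch of the store/check operations described above (uncompressed 256-way nodes, so memory-hungry; real implementations usually compress the child arrays):

```c
#include <stdlib.h>

/* One node per byte value; is_end marks a stored string's endpoint. */
typedef struct trie_node {
    struct trie_node *child[256];
    int is_end;
} trie_node;

trie_node *trie_new(void)
{
    return calloc(1, sizeof(trie_node));  /* sketch: allocation unchecked */
}

void trie_insert(trie_node *t, const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!t->child[s[i]])
            t->child[s[i]] = trie_new();
        t = t->child[s[i]];
    }
    t->is_end = 1;
}

int trie_contains(const trie_node *t, const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        t = t->child[s[i]];
        if (!t) return 0;
    }
    return t->is_end;
}
```

Lookup and rejection both cost at most one node hop per input byte, with no hashing at all.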

Hash function for short strings

I want to send function names from a weak embedded system to the host computer for debugging purpose. Since the two are connected by RS232, which is short on bandwidth, I don't want to send the function's name literally. There are some 15 chars long function names, and I sometimes want to send those names at a pretty high rate.
The solution I thought of was to find a hash function that would hash those function names to a single byte, and send only that byte. The host computer would scan all the functions in the source, compute their hashes using the same function, and then translate each hash back to the original string.
The hash function must be
Collision free for short strings.
Simple (since I don't want too much code in my embedded system).
Fit in a single byte.
Obviously, it does not need to be secure in any way, only collision-free, so I don't think a cryptography-related hash function is worth the complexity.
An example code:
int myfunc() {
sendToHost(hash("myfunc"));
}
The host would then be able to present me with list of times where the myfunc function was executed.
Is there some known hash function which holds the above conditions?
Edit:
I assume I will use much less than 256 function-names.
I can use more than a single byte, two bytes would have me pretty covered.
I prefer to use a hash function instead of keeping the same function-to-byte map on the client and the server, because (1) I have no map implementation on the client, and I'm not sure I want to add one for debugging purposes, and (2) it would require another tool in my build chain to inject the function-name table into my embedded system code. A hash is better in this regard, even if it means I'll have a collision once in a while.
Try minimal perfect hashing:
Minimal perfect hashing guarantees that n keys will map to 0..n-1 with no collisions at all.
C code is included.
Hmm, with only 256 possible values, and since you will parse your source code to know all the possible functions anyway, maybe the best way would be to assign a number to each of your functions?
A real hash function probably won't work, because you have only 256 possible hashes but want to map at least 26^15 possible values (assuming letter-only, case-insensitive function names).
Even if you restricted the number of possible strings (by applying some mandatory formatting) you would be hard pressed to get both meaningful names and a valid hash function.
You could use a Huffman tree to abbreviate your function names according to the frequency with which they are used in your program. The most common function could be abbreviated to 1 bit, less common ones to 4-5 bits, very rare functions to 10-15 bits, etc. A Huffman tree is not very hard to implement, but you will have to do something about the bit alignment.
No, there isn't.
You can't make a collision free hash code, or even close to it, with just an eight bit hash. If you allow strings that are longer than one character, you have more possible strings than there are possible hash codes.
Why not just extract the function names and give each function name an id? Then you only need a lookup table on each side of the wire.
(As others have shown you can generate a hash algorithm without collisions if you already have all the function names, but then it's easier to just assign a number to each name to make a lookup table...)
If you have a way to track the functions within your code (i.e. a text file generated at run-time) you can just use the memory locations of each function. Not exactly a byte, but smaller than the entire name and guaranteed to be unique. This has the added benefit of low overhead. All you would need to 'decode' the address is the text file that maps addresses to actual names; this could be sent to the remote location or, as I mentioned, stored on the local machine.
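A sketch of that address-as-id idea (the function names are hypothetical; the host-side map from addresses to names is assumed to come from a tool such as nm run on the firmware image):

```c
#include <stdint.h>

/* Example functions standing in for real code. */
static void myfunc(void) { /* ... */ }
static void setled(void) { /* ... */ }

/* A function's address is unique within one build, so it can serve
   as a compact wire id; the host translates it back to a name. */
uintptr_t func_id(void (*fn)(void))
{
    return (uintptr_t)fn;   /* then e.g.: sendToHost(func_id(myfunc)); */
}
```

Note that addresses change between builds, so the map file must be regenerated whenever the firmware is rebuilt.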
In this case you could just use an enum to identify functions. Declare function IDs in some header file:
typedef enum
{
FUNC_ID_main,
FUNC_ID_myfunc,
FUNC_ID_setled,
FUNC_ID_soundbuzzer
} FUNC_ID_t;
Then in functions:
int myfunc(void)
{
sendFuncIDToHost(FUNC_ID_myfunc);
...
}
If sender and receiver share the same set of function names, they can build identical hash tables from them. You can then use the path taken to reach a hash element, i.e. {starting position + number of hops}, to communicate an entry; this would take 2 bytes of bandwidth. For a fixed-size table (linear probing), only the final index is needed to address an entry.
NOTE: when building the two "synchronous" hash tables, the order of insertion is important ;-)
Described here is a simple way of implementing it yourself: http://www.devcodenote.com/2015/04/collision-free-string-hashing.html
Here is a snippet from the post:
It derives its inspiration from the way binary numbers are decoded and converted to decimal number format. Each binary string representation uniquely maps to a number in the decimal format.
If, say, we have a character set of capital English letters, then the length of the character set is 26, where A could be represented by the number 0, B by 1, C by 2, and so on until Z by 25. Now, whenever we want to map a string of this character set to a unique number, we perform the same conversion as in the binary case.

C sending multiple data types using sendto

In my program I have a few structs and a char array that I want to send as a single entity over UDP.
I am struggling to think of a good way to do this.
My first thought was to create a structure containing everything I want to send, but it would be of the wrong type to pass to sendto().
How would I store the two structs and a char array in another array so that it will be received in the way I intended?
Thanks
Since C allows you to cast to your heart's content, there's no such thing as a wrong type for sendto(). You simply cast the address of your struct to a void * and pass that as the argument to sendto().
However, a lot of people will impress on you that it's not advisable to send structs this way in the first place:
If the programs on either side of the connection are compiled by different compilers or in different environments, chances are your structs will not have the same packing.
If the two hosts involved in the transfer don't have the same endianness, part of your data will end up backwards.
If the host architectures differ (e.g. 32-bit vs. 64-bit), then the sizes of structs may be off as well. There will certainly be size discrepancies if the sizes of your basic data types (int, char, long, double, etc.) differ.
So... Please take the advice of the first paragraph only if you're sure your two hosts are identical twins, or close enough to it.
In other cases, consider converting your data to some kind of neutral text representation, which could be XML but doesn't need to be anything that complicated. Strings are sent as a sequence of bytes, and there's much less that can go wrong. Since you control the format, you should be able to parse that stuff with little trouble on the receiving side.
Update
You mention that you're transferring mostly bit fields. That means that your data essentially consists of a bunch of integers, all of them less than (I'm assuming) 32 bits.
My suggestion for a "clean" solution, therefore, would be to write a function that unpacks all those bit fields and ships the whole works as an array of (perhaps unsigned) integers. Assuming that sizeof(int) is the same across machines, htonl() will work on the elements (each individually!) of those arrays, and you can then wrap them back into a structure on the other side.
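A sketch of that unpacking step (the struct and its field names are invented stand-ins for the poster's actual bit fields):

```c
#include <arpa/inet.h>  /* htonl */
#include <stdint.h>

/* Hypothetical bit fields standing in for the poster's struct. */
struct flags { unsigned mode : 3; unsigned level : 5; unsigned id : 12; };

/* Widen each bit field to a full 32-bit value and convert each one
   individually to network byte order. */
void pack_flags(uint32_t out[3], struct flags f)
{
    out[0] = htonl(f.mode);
    out[1] = htonl(f.level);
    out[2] = htonl(f.id);
    /* then e.g.: sendto(sock, out, 3 * sizeof(uint32_t), 0, ...); */
}
```

The receiver applies ntohl() to each element and reassigns the values into its own struct, so compiler-specific bit-field packing never crosses the wire.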
You can send multiple pieces of data as one with writev. Just create the array of struct iovec that it needs, with one element for each data structure you want to send.
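A sketch of that (the struct types are placeholders; a connected socket or other writable descriptor is assumed, and on a connected UDP socket the gathered buffers go out as a single datagram):

```c
#include <sys/uio.h>    /* struct iovec, writev */
#include <sys/types.h>  /* ssize_t */
#include <stddef.h>

/* Hypothetical stand-ins for the poster's two structs. */
struct header  { int type; int len; };
struct payload { int a; int b; };

/* Gather two structs and a char array into one writev() call. */
ssize_t send_all(int sock, struct header *h, struct payload *p,
                 char *text, size_t textlen)
{
    struct iovec iov[3];
    iov[0].iov_base = h;    iov[0].iov_len = sizeof *h;
    iov[1].iov_base = p;    iov[1].iov_len = sizeof *p;
    iov[2].iov_base = text; iov[2].iov_len = textlen;
    return writev(sock, iov, 3);
}
```

Note that this still has all the struct-portability caveats from the previous answer; writev only saves you from copying everything into one intermediate buffer yourself.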
