Socket data length questions - c

I have a couple of questions related to the following code:
char buffer[256];
memset(buffer,0,256);
read(socket_fd,buffer,255);
The questions:
Why do I read 255 and not 256?
Let's say I want to send the word "Cool" from the client to the server. How many bytes should I write in the client, and how many bytes should I read in the server?
I'm really confused.

You already have good answers here, but I think there's a concept we should explain.
When you send data through streams (that is, something that writes a number of bytes from one end, and those bytes can be read in the same order in the other end), you almost always want to know when to stop reading. This is mandatory if you'll send more than one thing: when does the first message stop, and the second begin? In the stream, things get mixed up.
So, how do we delimit messages? There are three simple ways (and many other not so simple ones, of course):
1 Fixed-length messages:
If you know beforehand that every message is, say, 10-bytes long, then you don't have a problem. You just read 10 bytes, and the 11th one will be part of another message. This is very simple, but also very rigid.
2 Delimiting characters, or strings:
If you are sending human-readable text, you might delimit your messages the same way you delimit strings in your char*: putting a 0 character at the end. That way, when you read a 0, you know the message ended and any remaining data in the stream belongs to another message.
This is okay for ascii text, but when it comes to arbitrary data it's also somewhat rigid: there's a character, or a sequence of characters, that your messages can't contain (or your program will get confused as to where a message ends).
3 Message headers:
This is the best approach for arbitrary-length, arbitrary-content messages. Before sending any actual message data, send a fixed-length header (or use technique 2 to mark the end of the header) specifying metadata about your message. For example, its length.
Say you want to send the message 'Cool', as you said. Well, first send a byte (or a 2-byte short, or a 4-byte integer) containing '4', the length of the message, and receive it on the other end. You know that before any message arrives, you must read 1 byte, store that somewhere and then read the remaining specified bytes.
A simplified example:
struct mheader {
    int length;
};
// (...)
struct mheader in_h;
read(fd, &in_h, sizeof(struct mheader));
if (in_h.length > 0) {
    read(fd, buffer, in_h.length);
}
In actual use, remember that read doesn't always read the exact amount of bytes you request. Check the return value to find out (which could be negative to indicate errors), and read again if necessary.
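To make that last point concrete, here is a minimal sketch of a length-prefixed exchange that loops until the requested byte count has arrived. The helper names (read_full, write_full, send_msg, recv_msg) are made up for illustration, and sending the length as a 4-byte integer in network byte order is my assumption, not something the example above does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>
#include <arpa/inet.h>

/* Hypothetical helper: read exactly len bytes. 0 on success, -1 on error or early EOF. */
int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: just retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* stream closed before the data was complete */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write exactly len bytes. 0 on success, -1 on error. */
int write_full(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a 4-byte length header (network byte order, an assumption) and then the payload. */
int send_msg(int fd, const void *data, uint32_t len)
{
    uint32_t header = htonl(len);
    if (write_full(fd, &header, sizeof header) != 0)
        return -1;
    return write_full(fd, data, len);
}

/* Receive one message into buf. Returns its length, or -1 on error,
 * early EOF, or a message larger than cap. */
ssize_t recv_msg(int fd, void *buf, size_t cap)
{
    uint32_t header;
    if (read_full(fd, &header, sizeof header) != 0)
        return -1;
    uint32_t len = ntohl(header);
    if (len > cap)
        return -1;              /* never trust a length you cannot store */
    if (read_full(fd, buf, len) != 0)
        return -1;
    return (ssize_t)len;
}
```

With this, the client would call send_msg(fd, "Cool", 4) and the server recv_msg(fd, buffer, sizeof buffer), which returns 4.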
Hope this helps. Good luck!

So that the buffer retains the NUL at the end, as extra insurance against string overflows. Reading 256 would allow it to get overwritten.
You would write five bytes. Either write "Cool\0", or write 4 (the length) followed by the 4 characters in "Cool". Read all of it, and figure out the length after.

You look at the return value from read(); it tells you how many bytes were read.
You use the number of bytes read when you want to write the same data.
You don't have to use 255 in the read unless you definitely want to be able to put a NUL at the end - but since you know how many bytes were read, you won't go beyond that anyway. So, the 255 is an insurance policy against carelessness by the programmer.
The memset() is likewise mostly an insurance policy against carelessness by the programmer. It is not really necessary, unless you want to mask out previous sensitive data.
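A small sketch of that idea, assuming you want a C string out of a single read(): use the return value to place the terminator, and then neither the memset() nor a zeroed buffer is needed. The helper name read_text is made up for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one read() into buf, keeping one byte free for the
 * terminator. Returns the number of bytes read (0 means EOF) or -1 on error.
 * cap must be at least 1. */
ssize_t read_text(int fd, char *buf, size_t cap)
{
    ssize_t n = read(fd, buf, cap - 1);   /* 255 for a 256-byte buffer */
    if (n < 0)
        return -1;
    buf[n] = '\0';                        /* terminate right after the data */
    return n;
}
```

Note that one read() may still return only part of what the other side wrote, so this gives you a terminated chunk, not necessarily a whole message.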

Related

Writing a CFSTR to the terminal in Mac OS X

How best would I output the following code
#include <CoreFoundation/CoreFoundation.h> // Needed for CFSTR
int main(int argc, char *argv[])
{
char *c_string = "Hello I am a C String. :-).";
CFStringRef cf_string = CFStringCreateWithCString(0, c_string, kCFStringEncodingUTF8);
// output cf_string
//
}
There's no API to write a CFString directly to any file (including stdout or stderr), because you can only write bytes to a file. Characters are a (somewhat) more ideal concept; they're too high-level to be written to a file. It's like saying “I want to write these pixels”; you must first decide what format to write them in (say, PNG), and then encode them in that format, and then write that data.
So, too, with characters. You must encode them as bytes in some format, then write those bytes.
Encoding the characters as bytes/data
First, you must pick an encoding. For display on a Terminal, you probably want UTF-8, which is kCFStringEncodingUTF8. For writing to a file… you usually want UTF-8. In fact, unless you specifically need something else, you almost always want UTF-8.
Next, you must encode the characters as bytes. Creating a C string is one way; another is to create a CFData object; still another is to extract bytes (not null-terminated) directly.
To create a C string, use the CFStringGetCString function.
To extract bytes, use the CFStringGetBytes function.
You said you want to stick to CF, so we'll skip the C string option (which is less efficient anyway, since whatever calls write is going to have to call strlen)—it's easier, but slower, particularly when you use it on large strings and/or frequently. Instead, we'll create CFData.
Fortunately, CFString provides an API to create a CFData object from the CFString's contents. Unfortunately, this only works for creating an external representation. You probably do not want to write this to stdout; it's only appropriate for writing out as the entire contents of a regular file.
So, we need to drop down a level and get bytes ourselves. This function takes a buffer (region of memory) and the size of that buffer in bytes.
Do not use CFStringGetLength for the size of the buffer. That counts characters, not bytes, and the relationship between number of characters and number of bytes is not always linear. (For example, some characters can be encoded in UTF-8 in a single byte… but not all. Not nearly all. And for the others, the number of bytes required varies.)
The correct way is to call CFStringGetBytes twice: once with no buffer (NULL), whereupon it will simply tell you how many bytes it'll give you (without trying to write into the buffer you haven't given it); then, you create a buffer of that size, and then call it again with the buffer.
You could create a buffer using malloc, but you want to stick to CF stuff, so we'll do it this way instead: create a CFMutableData object whose capacity is the number of bytes you got from your first CFStringGetBytes call, increase its length to that same number of bytes, then get the data's mutable byte pointer. That pointer is the pointer to the buffer you need to write into; it's the pointer you pass to the second call to CFStringGetBytes.
To recap the steps so far:
Call CFStringGetBytes with no buffer to find out how big the buffer needs to be.
Create a CFMutableData object of that capacity and increase its length up to that size.
Get the CFMutableData object's mutable byte pointer, which is your buffer, and call CFStringGetBytes again, this time with the buffer, to encode the characters into bytes in the data object.
Writing it out
To write bytes/data to a file in pure CF, you must use CFWriteStream.
Sadly, there's no CF equivalent to nice Cocoa APIs like [NSFileHandle fileHandleWithStandardOutput]. The only way to create a write stream to stdout is to create it using the path to stdout, wrapped in a URL.
You can create a URL easily enough from a path; the path to the standard output device is /dev/stdout, so to create the URL looks like this:
CFURLRef stdoutURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault, CFSTR("/dev/stdout"), kCFURLPOSIXPathStyle, /*isDirectory*/ false);
(Of course, like everything you Create, you need to Release that.)
Having a URL, you can then create a write stream for the file so referenced. Then, you must open the stream, whereupon you can write the data to it (you will need to get the data's byte pointer and its length), and finally close the stream.
Note that you may have missing/un-displayed text if what you're writing out doesn't end with a newline. NSLog adds a newline for you when it writes to stderr on your behalf; when you write to stderr yourself, you have to do it (or live with the consequences).
So:
Create a URL that refers to the file you want to write to.
Create a stream that can write to that file.
Open the stream.
Write bytes to the stream. (You can do this as many times as you want, or do it asynchronously.)
When you're all done, close the stream.

Serializing strings in C

I'm serializing structs into byte-streams. My method is simple:
pack all ints in little endian order and copy strings including the null terminator. The other side has to statically know how to unpack the byte-stream, there is no additional metadata.
My problem is that I do not know how to handle a NULL pointer.
I need to send something, because there is no additional metadata in the stream.
I considered the following options:
Send a '\0' and make the receiving side interpret it as NULL in any case
Send a '\0' and make the receiving side interpret it as '\0' in any case (alloc a byte)
Send a special character representing char* str == NULL, e.g. ETX, EOT, EM ?
What do you think?
It looks like you are currently trying to tell the receiving end that the end of the serialized string has been reached by passing it a special character. There are a million cases that can screw you over with this:
What if your struct contains a byte that is equal to that special character? You escape it with another special character. What if your struct contains a byte sequence that is equal to your escape character followed by your special character? You have to check for that too.
Yeah it's doable, but I think that's not a very good solution and you'll have to write a parser to look for the escape character and then anyone who takes a look at the code later will spend two hours trying to figure out what's going on.
(tl;dr) Instead... just make the first 32 bits of the serialized string equal to the number of bytes in the string. This only costs 4 bytes per serialization, solves all your problems, you won't have to write a parser or worry about special characters, and will make it a lot easier on the next guy who gets to read through your code!
edit
Thanks to JeremyP I've just realized that I didn't really answer your question. Send one of these guys for every string:
struct s_str
{
bool is_null;
int size;
char* str;
};
If it's null, simply set is_null to true and you don't really have to worry about the other two.
If it's size zero, set is_null to false and size to zero.
If str contains just a '\0', set is_null to false, size to one, and str[0] to '\0'
In my opinion, this might not be the most memory efficient way (you could probably save a byte somewhere somehow) but is definitely quite clear in what you're doing, and again the next guy that comes along will like this a lot more.
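Here is one way this could look on the wire, as a sketch rather than a definitive format. I'm assuming a 1-byte is_null flag, followed (only when the string is not NULL) by a 4-byte little-endian size and then the bytes without a terminator. The function names pack_str and unpack_str are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Write s at out. Returns the number of bytes written.
 * The caller must provide room for 5 + strlen(s) bytes (1 byte if s is NULL). */
size_t pack_str(unsigned char *out, const char *s)
{
    if (s == NULL) {
        out[0] = 1;                         /* is_null = true, nothing follows */
        return 1;
    }
    uint32_t len = (uint32_t)strlen(s);
    out[0] = 0;                             /* is_null = false */
    out[1] = (unsigned char)(len & 0xff);   /* size, little endian */
    out[2] = (unsigned char)((len >> 8) & 0xff);
    out[3] = (unsigned char)((len >> 16) & 0xff);
    out[4] = (unsigned char)((len >> 24) & 0xff);
    memcpy(out + 5, s, len);
    return 5 + (size_t)len;
}

/* Read one string from in (avail bytes available). On success *s is NULL or a
 * malloc'ed, 0-terminated copy, and the number of bytes consumed is returned.
 * Returns 0 if the input is truncated or allocation fails. */
size_t unpack_str(const unsigned char *in, size_t avail, char **s)
{
    if (avail < 1)
        return 0;
    if (in[0] != 0) {
        *s = NULL;
        return 1;
    }
    if (avail < 5)
        return 0;
    uint32_t len = (uint32_t)in[1] | ((uint32_t)in[2] << 8) |
                   ((uint32_t)in[3] << 16) | ((uint32_t)in[4] << 24);
    if (avail - 5 < len)
        return 0;
    char *copy = malloc((size_t)len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, in + 5, len);
    copy[len] = '\0';
    *s = copy;
    return 5 + (size_t)len;
}
```

A NULL pointer costs 1 byte, an empty string 5 bytes, and the receiver can tell the two apart without reserving any special character.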
Do not do this. Use some extra bytes to store the length and concatenate them with your data string. The receiving end can check the length to know how much it should read into its local buffer.
It depends on the significance of the pointer in your protocol.
If the pointer is significant, i.e. it is needed for the recipient to know how to rebuild the struct, then you need to send something. It could be either a byte with 0/non-zero to indicate existence, or an integer that indicates the number of bytes pointed to by the pointer.
Example:
struct Foo {
    int *arr;
    char *text;
};
Struct Foo could be serialized like this:
<arr length><  arr  ><text length><  text >
   4 bytes   n bytes     4 bytes   n bytes
I would advise you to use an existing library for serialization. I can think of two at the moment: tpl and gwlib's gwser.
About tpl:
You can use tpl to store and reload your C data quickly and easily.
Tpl works with files, memory buffers and file descriptors so it's
suitable for use as a file format, IPC message format or any scenario
where you need to store and retrieve your data.
About gwlib, see the link, it's not very verbose, but it provides a few usage examples.

Disadvantages of strlen() in Embedded Domain

What are the disadvantages of using strlen()?
If a NULL character sometimes arrives inside a string during TCP communication, strlen() only finds the length of the string up to that NULL character.
We can't find the actual length of the string.
If we write our own alternative to strlen(), it also stops at the NULL character. So which method can I use to find the length of a string in C?
To read from a "TCP Communication" you are probably using read. The prototype for read is
ssize_t read(int fildes, void *buf, size_t nbyte);
and the return value is the number of bytes read (even if they are 0).
So, let's say you're about to read 10 bytes, all of which are 0. You have an array with more than enough room to hold all the data:
int fildes;
char data[1000];
ssize_t nbytes;
// fildes = TCPConnection
nbytes = read(fildes, data, 1000);
Now, by inspecting nbytes you know you have read 10 bytes. If you check data[0] through data[9] you will find they are all 0.
If the runtime library provides strcpy() and strcat(), then surely it provides strlen().
I suspect you are confusing NULL, an invalid pointer value, with the ASCII code NUL, a zero character value which indicates the end of a string to many C runtime functions.
As such, there is no concern for inserting a NUL value in a string, nor in detecting it.
Response to updated question:
Since you seem to be processing binary data, the string functions are not a good fit—unless you can guarantee there are no NULs in the stream. However, for this reason, most TCP/IP messages use headers with fields containing the number of bytes which follow.
"Embedded" strikes me as a red herring here.
If you're processing binary data where an embedded NUL might be valid, then you can't expect meaningful results from strlen.
If you're processing strings (as that term is defined in C -- a block of non-NUL data terminated by a NUL) then you can use strlen just fine.
A system being "embedded" would affect this only to the degree that it might be less common to process strings and more common to process binary data.
It is safer to use strnlen instead of strlen to avoid the problems with strlen. The strlen problems are present everywhere, not just in embedded systems. Many of the string functions are dangerous because they go on forever or until a zero is hit, or, like scanf or strtok, go on until a pattern is hit.
Remember that TCP is a stream, not a packet: you may have to wait for multiple or many packets and piece together the data before you can attempt to call it a string anyway. That is assuming the payload is an ASCIIZ string; if it is raw data, then don't use string functions, use some other solution.
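A minimal sketch of the strnlen idea applied to a receive buffer that is being pieced together: it never scans past the bytes actually received, and it tells you whether a terminator has arrived yet. The function name complete_string is made up, and strnlen is POSIX rather than ISO C, so I'm assuming a POSIX-like system.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Returns 1 and stores the string length in *len_out if buf[0..nbytes)
 * already holds a complete 0-terminated string. Returns 0 if no terminator
 * has arrived yet, meaning the caller should keep reading. */
int complete_string(const char *buf, size_t nbytes, size_t *len_out)
{
    size_t len = strnlen(buf, nbytes);   /* scans at most nbytes bytes */
    if (len == nbytes)
        return 0;                        /* no 0 byte seen so far */
    *len_out = len;
    return 1;
}
```

The caller keeps appending whatever read() returns to buf and calls this after each read; only once it returns 1 is it safe to hand buf to the ordinary string functions.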
Yes, strlen() uses a terminating character \0, aka NUL. Most str* functions do so. There could be a risk that data coming from files/command line/sockets would not contain this character (usually, they won't: they'll be \n-terminated), but their size will also be provided by the read()/recv() function you've used. If that's a concern, you can always use a buffer slightly larger than the size you pass to those functions, e.g.
char mybuf[256+4];
mybuf[256]=0;
ssize_t reallen=read(STDIN_FILENO, mybuf, 256); // fgets() returns a pointer, not a length
if (reallen>=0) mybuf[reallen]=0;
// we've got a 0-terminated string in mybuf.
If your data must not contain \0, compare strlen(mybuf) with reallen and terminate the session with an error code if they differ.
If your data may contain 0, then it should be processed as a buffer and not as a string. Size must be kept aside, and memcpy / memcmp functions should be used instead of strcpy and strcmp.
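As a sketch of both cases: a check for embedded 0 bytes that uses the received size, and a comparison that treats the data as a sized buffer. The function names is_text and buffers_equal are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* 1 if the nbytes received bytes contain no 0 byte, so that they can be
 * treated as text once a terminator is appended. 0 otherwise. */
int is_text(const char *buf, size_t nbytes)
{
    return memchr(buf, '\0', nbytes) == NULL;
}

/* Compare two sized buffers. Unlike strcmp, this keeps going past 0 bytes. */
int buffers_equal(const void *a, size_t alen, const void *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

For instance, the 3-byte buffers "a\0b" and "a\0c" differ according to buffers_equal, while strcmp sees two identical one-character strings.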
Also, your network protocol should be very explicit on whether strings or binary data is expected in different parts of the communication. HTTP is, for instance, and it provides many ways to tell the actual size of the transmitted payload.
This isn't specific to "embedded" programs, but it has become a major concern in every program to ensure no remote code/script injection can occur. If by "embedded" you mean you're in a non-preemptive environment and have only limited time available to perform some action ... then yeah, you don't want to end up scanning 2GB of incoming bits for a (never-appearing) \0. Either the above trick or strnlen (mentioned in another answer) could be used to ensure this isn't the case.

How can I get how many bytes sscanf_s read in its last operation?

I wrote up a quick memory reader class that emulates the same functions as fread and fscanf.
Basically, I used memcpy and increased an internal pointer to read the data like fread, but I have a fscanf_s call. I used sscanf_s, except that doesn't tell me how many bytes it read out of the data.
Is there a way to tell how many bytes sscanf_s read in the last operation in order to increase the internal pointer of the string reader? Thanks!
EDIT:
An example format I am reading is:
|172|44|40|128|32|28|
fscanf reads that fine, so does sscanf. The only reason is that, if it were to be:
|0|0|0|0|0|0|
The length would be different. What I'm wondering is how fscanf knows where to put the file pointer, but sscanf doesn't.
With scanf and family, use %n in the format string. It won't read anything in, but it will cause the number of characters read so far (by this call) to be stored in the corresponding parameter (expects an int*).
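A minimal sketch of that, using the |172|44|40| format from the question. I'm showing standard sscanf because that is what I can vouch for; whether sscanf_s handles %n identically is an assumption you should check. The function name parse_fields is made up.

```c
#include <stddef.h>
#include <stdio.h>

/* Parse up to max fields of the form "|<int>" from text.
 * Returns the number of values stored; *consumed receives how many
 * characters were used, which is how far to advance the read pointer. */
int parse_fields(const char *text, int *values, int max, size_t *consumed)
{
    size_t pos = 0;
    int count = 0;
    while (count < max) {
        int value;
        int used = 0;
        /* %n stores the characters consumed so far by this call;
         * it is not counted in sscanf's return value. */
        if (sscanf(text + pos, "|%d%n", &value, &used) != 1)
            break;
        values[count++] = value;
        pos += (size_t)used;
    }
    *consumed = pos;
    return count;
}
```

On "|172|44|40|128|32|28|" this stores 6 values and reports 20 characters consumed; on "|0|0|0|0|0|0|" it also stores 6 values but reports 12. In both cases the trailing '|' is left unread.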
Maybe I'm silly, but I'm going to try anyway. It seems from the comment threads that there's still some misconception. You need to know the number of bytes. But the method returns only the number of fields read, or EOF.
To get to the number of bytes, either use something that you can easily count, or use a size specifier in the format string. Otherwise, you won't stand a chance of finding out how many bytes were read, other than going over the fields one by one. Also, what you may mean is that
sscanf_s(source, "%d%d"...)
will succeed on both inputs "123 456" and "10\t30", which have different lengths. In these cases, there's no way to tell the size, unless you convert it back. So: use a fixed-size field, or be left in oblivion....
Important note: remember that when using %c it's the only way to include the field separators (newline, tab and space) in the output. All others will skip the field boundaries, making it harder to find the right amount of bytes.
EDIT:
From "C++ The Complete Reference" I just read that:
%n Receives an integer value equal to
the number of characters read so far
Isn't that precisely what you were after? Just add it in the format string. This is confirmed here, but I haven't tested it with sscanf_s.
From MSDN:
sscanf_s, _sscanf_s_l, swscanf_s, _swscanf_s_l
Each of these functions returns the number of fields successfully converted and assigned; the return value does not include fields that were read but not assigned. A return value of 0 indicates that no fields were assigned. The return value is EOF for an error or if the end of the string is reached before the first conversion.

Passing variable-length structures between MPI processes

I need to MPI_Gatherv() a number of int/string pairs. Let's say each pair looks like this:
struct Pair {
int x;
unsigned s_len;
char s[1]; // variable-length string of s_len chars
};
How to define an appropriate MPI datatype for Pair?
In short, it's theoretically impossible to send one message of variable size and receive it into a buffer of the perfect size. You'll either have to send a first message with the sizes of each string and then a second message with the strings themselves, or encode that metainfo into the payload and use a static receiving buffer.
If you must send only one message, then I'd forgo defining a datatype for Pair: instead, I'd create a datatype for the entire payload and dump all the data into one contiguous, untyped package. Then at the receiving end you could iterate over it, allocating the exact amount of space necessary for each string and filling it up. Let me whip up an ASCII diagram to illustrate. This would be your payload:
|..x1..|..s_len1..|....string1....|..x2..|..s_len2..|.string2.|..x3..|..s_len3..|.......string3.......|...
You send the whole thing as one unit (e.g. an array of MPI_BYTE), then the receiver would unpack it something like this:
while (buffer is not empty)
{
read x;
read s_len;
allocate s_len characters;
move s_len characters from buffer to allocated space;
}
Note however that this solution only works if the data representation of integers and chars is the same on the sending and receiving systems.
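The packing and unpacking loop could be sketched like this in plain C. No MPI calls appear here: buf is simply the byte array you would send or gather as MPI_BYTE. The function names pack_pair and unpack_pair are made up, and as noted above this assumes both sides share the same representation of int and unsigned.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Append one (x, s_len, string) record at buf + off and return the new
 * offset. The caller must make sure buf is large enough. */
size_t pack_pair(unsigned char *buf, size_t off, int x, const char *s)
{
    unsigned s_len = (unsigned)strlen(s);
    memcpy(buf + off, &x, sizeof x);
    off += sizeof x;
    memcpy(buf + off, &s_len, sizeof s_len);
    off += sizeof s_len;
    memcpy(buf + off, s, s_len);             /* string bytes, no terminator */
    return off + s_len;
}

/* Read one record starting at buf + off, where buf holds total bytes and
 * off <= total. Stores x and a malloc'ed, 0-terminated copy of the string.
 * Returns the new offset, or 0 if the data is truncated or malloc fails. */
size_t unpack_pair(const unsigned char *buf, size_t total, size_t off,
                   int *x, char **s)
{
    unsigned s_len;
    if (total - off < sizeof *x + sizeof s_len)
        return 0;
    memcpy(x, buf + off, sizeof *x);
    off += sizeof *x;
    memcpy(&s_len, buf + off, sizeof s_len);
    off += sizeof s_len;
    if (total - off < s_len)
        return 0;
    *s = malloc((size_t)s_len + 1);          /* exactly the space needed */
    if (*s == NULL)
        return 0;
    memcpy(*s, buf + off, s_len);
    (*s)[s_len] = '\0';
    return off + s_len;
}
```

The receiver calls unpack_pair repeatedly, feeding each returned offset back in, until the offset equals the number of bytes received.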
I don't think you can do quite what you want with MPI. I'm a Fortran programmer, so bear with me if my understanding of C is a little shaky. You want, it seems, to pass a data structure consisting of 1 int and 1 string (which you pass by passing the location of the first character in the string) from one process to another? I think that what you are going to have to do is pass a fixed-length string, which would therefore have to be as long as any of the strings you really want to pass. The reception area for the gathering of these strings will have to be large enough to receive all the strings together with their lengths.
You'll probably want to declare a new MPI datatype for your structs; you can then gather these and, since the gathered data includes the length of the string, recover the useful parts of the string at the receiver.
I'm not certain about this, but I've never come across truly variable message lengths as you seem to want to use, and it does sort of feel un-MPI-like. But it may be something implemented in the latest version of MPI that I've just never stumbled across, though looking at the documentation online it doesn't seem so.
MPI implementations do not inspect or interpret the actual contents of a message. Provided that you know the size of the data structure, you can represent that size in some number of char's or int's. The MPI implementation will not know or care about the actual internal details of the data.
There are a few caveats...both the sender and receiver need to agree on the interpretation of the message contents, and the buffer that you provide on the sending and receiving side needs to fit into some definable number of char's or int's.
