Sending stream-terminating bytes to indicate end of TCP stream - C

Say, for example, I wish to stream something over TCP, for instance a file share (the bytes of a file I've read) or the output of a process (stdin, stderr, stdout) via a pipe using CreateProcess or _popen. I wouldn't be able to calculate the size in bytes of the streamed output in advance. If I were to send a specific set of special characters, to be parsed by the server/client, to indicate that the stream is over, would this be practical and secure?

If I were to send a specific set of special characters, to be parsed by the server/client, to indicate that the stream is over, would this be practical and secure?
If the contents of the file can be arbitrary binary data, then, no, this is not by itself a “safe” way of doing it, because the “specific set of special characters” you use could appear somewhere in the file, so they would be sent over the connection, and the receiver would think the file has ended when it has not.
(I say this is not “safe,” meaning it is not a guaranteed way to communicate the file correctly, because it is not clear what you mean by “practical” or “secure.” Secure from what? Is there some attacker giving you files to transmit, so they can craft them in malicious ways? If all your files are known to contain only plain text ASCII characters, you could “safely” use a non-ASCII character as a sentinel.)
There are myriad ways of designing a protocol to send arbitrary data. Some options:
Send the length first. (You then need to decide how many bits to use for the length, or design some scheme for sending arbitrarily large lengths.)
Send the data in blocks of, say, 512 bytes, each block preceded by its length. As long as a block's length is 512 bytes, another block for the same data is coming. Once the length is less than 512 bytes (including 0), it is the last block for the data.
Send the data as a quoted string: start with ", then send the bytes of the data, except that each " in the data is actually sent as \ then " and each \ in the data is sent as \ and \; the end of the data is then indicated by a " not preceded by a \ that escapes it.
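For illustration, here is a minimal sketch of the first (length-prefix) approach on the sending side, assuming POSIX sockets; the 4-byte big-endian header and the helper names send_all and send_message are my own choices, not part of any standard protocol:

#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send all of buf, looping because send() may transfer fewer bytes
   than requested. Returns 0 on success, -1 on error. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Frame one message with a 4-byte big-endian length prefix. */
static int send_message(int fd, const void *data, uint32_t len)
{
    unsigned char hdr[4] = {
        (unsigned char)(len >> 24), (unsigned char)(len >> 16),
        (unsigned char)(len >> 8),  (unsigned char)len
    };
    if (send_all(fd, hdr, sizeof hdr) != 0)
        return -1;
    return send_all(fd, data, len);
}

The receiver reads the 4-byte count first and then reads exactly that many bytes; a matching read-side sketch appears under "Sync read and write" below.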

Related

Best way to read a POST request from a TCP socket?

I am writing a web server in C. I am working with TCP sockets, and I want to know how to read a POST request and put it into a buffer.
What I can do is first read the POST header into a fixed-length buffer, then create a dynamic buffer based on the Content-Length and read the body into that dynamic buffer. But since the size of the POST header can vary, after the read, the buffer containing the header could also contain part of the body of the POST request. I can try to parse it further, but would that be efficient?
Is there a better way to do this?
Well, in your case, I would say that the 'best way' is to separate each step into its own process.
First, in order to parse the message, you continuously have to read raw bytes from the underlying device (file, socket, pipe, etc.) and 'feed' the parser until the message is complete.
Parsing the message could also be divided into different steps:
parsing the start line
parsing the headers
parsing the body (if any)
Since each step involves processing bytes of unknown size (delimited by separators rather than by a fixed size, unless we have already parsed the Content-Length and therefore know the body size), each step has its own buffering needs, e.g.
the start line is composed of <method> <url> <version> \r\n
the header is composed of <key> : <value> \r\n
the composition of the body is determined by the headers
and the complete message has the form <header> \r\n <body>, where the body part is optional (absent when the content length is zero)
So, long story short: you have to divide your message parser into distinct processing steps (highly recommended if you want to avoid spaghetti code) and decide where to use fixed-size buffers, where to use reallocatable ones, and how to reuse them (if they are not strictly separated per step). Once you've done that and have a presentable implementation, we can debate how to reduce memory allocations and improve buffer reuse.
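To make the first stage concrete, here is a rough sketch of 'feeding the parser' until the header block is complete, assuming a blocking POSIX socket; the function name and the caller-supplied cap are illustrative:

#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Feed bytes from the socket into buf until the blank line that
   terminates the HTTP header block has arrived. Returns the number
   of bytes buffered, or -1 on error/overflow. Any body bytes that
   arrived in the same read() remain in buf after the "\r\n\r\n". */
static ssize_t read_header_block(int fd, char *buf, size_t cap)
{
    size_t used = 0;
    while (used + 1 < cap) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n <= 0)
            return -1;
        used += (size_t)n;
        buf[used] = '\0';                 /* keep buf scannable as a string */
        if (strstr(buf, "\r\n\r\n") != NULL)
            return (ssize_t)used;
    }
    return -1;                            /* header block too large */
}

After this returns, you can parse the Content-Length from the buffered headers, note how many body bytes already sit past the \r\n\r\n, and read the remainder of the body into a buffer allocated to the right size; that addresses the original concern about header and body bytes arriving together.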

Stream vs Buffer

Hi there, I'm new to C. I'm currently reading K&R, and I got confused by its definition of text streams: "A text stream is a sequence of characters divided into lines; each line consists of zero or more characters followed by a newline character."
While trying to learn about these streams, I was introduced to a new term, namely buffer.
I just know that:
A continuous flow of data (bytes or characters) between input and output devices is a STREAM.
A temporary storage area in main memory used to hold input or output data temporarily is a BUFFER.
I don't claim that I'm right; that's just my basic idea of those terms.
I want to know what a buffer and a stream actually are, and how these two things work together, at the non-abstract level of a C implementation.
You have three standard streams in C: stdin, stdout, and stderr. You can also think of files you have opened, with fopen for example, as streams. stdin is generally the keyboard, stdout is generally your monitor, and stderr is generally also your monitor. But they don't have to be; they are abstractions for the hardware.
If, for example, you didn't have a keyboard but a keypad on a bank ATM, then stdin would be the keypad; if you didn't have a monitor but instead had a printer, then stdout would be the printer. You change what hardware they use with calls to your operating system. You can also change their behaviour, again through calls to your operating system, but that is beyond the scope of what you're asking.
So, in a way, think of the buffer as memory allocated by the operating system, associated with the stream, to hold the data received from the hardware. When you type at your keyboard, for example, the characters you type aren't captured directly by your program; they move from the keyboard into the buffer, and then you read from the buffer.
That's why, for example, you have to hit Enter before your code starts interacting with whatever you typed: stdin is line buffered. Control passes from your program to the operating system until something sends control back to your program; in the normal situation, that is the newline character.
So, in a way, think of it like this: the stream is the device (keyboard, monitor, or a file on your hard drive), the buffer is where the data is held while the operating system has control, and you interact with the buffer while you are processing the data.
That abstraction allows you to use all of these different things in a common manner regardless of what they are, for example: fgets(str, sizeof(str), stream) ... where stream can be any input stream, be it stdin or a file.
Taking it a step further, that's why new programmers get thrown off by a scanf for an int followed by an fgets: scanf reads the int from the buffer but leaves the \n there, so the subsequent call to fgets reads that leftover \n, and the new programmer is left wondering why they were unable to input any data. So your curiosity about streams and buffers will serve you well as you move forward in your learning about C.
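Here is a minimal program that reproduces that scanf/fgets trap, with the usual fix shown in a comment (a sketch; error checking omitted):

#include <stdio.h>

int main(void)
{
    int age;
    char name[64];

    printf("Age: ");
    scanf("%d", &age);               /* consumes the digits, leaves '\n' behind */

    printf("Name: ");
    fgets(name, sizeof name, stdin); /* reads only that leftover '\n' */

    /* Common fix: drain the rest of the line after scanf:
       int c;
       while ((c = getchar()) != '\n' && c != EOF)
           ;
    */
    return 0;
}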
Those are actually pretty good working definitions.
In practical C terms, a buffer is an array (usually of char or unsigned char type) that's used to store data, either as a result of an input operation, or before sending to output. The array can be declared as a fixed size array, such as
char buffer[SOME_BUFFER_SIZE];
or dynamically, using
char *buffer = malloc( SOME_BUFFER_SIZE * sizeof *buffer );
The advantage of using dynamic memory is that the buffer can be resized if necessary; the disadvantage is you have to manage the lifetime of that memory.
For text input/output you'd typically use arrays of char; for binary input/output you'd typically use arrays of unsigned char.
It's fairly common for systems communicating over a network to send data in fixed-size "chunks", such that you may need several read or write operations to transfer all the data. Think of a Web server and a browser: the server sends the HTML in multiple messages, and the browser stores the intermediate result in a buffer. Only when all the data has been received does the browser render the page:
Received from Web server    Stored in browser's input buffer
------------------------    --------------------------------
HTTP/1.1 200 OK \r\n        <!DOCTYPE HTML><html
Content-length: 20\r\n
<!DOCTYPE HTML><html

HTTP/1.1 200 OK \r\n        <!DOCTYPE HTML><html><head><title>This i
Content-length: 20\r\n
><head><title>This i

HTTP/1.1 200 OK \r\n        <!DOCTYPE HTML><html><head><title>This i
Content-length: 20\r\n      s a test</title></he
s a test</title></he

HTTP/1.1 200 OK \r\n        <!DOCTYPE HTML><html><head><title>This i
Content-length: 20\r\n      s a test</title></head><body><p>Hello, W
ad><body><p>Hello, W

HTTP/1.1 200 OK \r\n        <!DOCTYPE HTML><html><head><title>This i
Content-length: 19\r\n      s a test</title></head><body><p>Hello, W
orld!</body></html>         orld!</body></html>
No sane server sends HTML in chunks of 20 characters, but this should illustrate why and how buffers get used.
The definitions are not bad; actually, they're very good. You could add (from an object-oriented perspective) that a STREAM uses a BUFFER.
The use of a BUFFER may be necessary for performance reasons, since every system call comes with a relatively high cost.
I/O system calls in particular are slow: hard disk or network access takes far longer than memory access, and those costs add up if each read or write transfers only a single byte.
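As a toy illustration of that cost (a sketch, not a rigorous benchmark; redirect the output to /dev/null and time each loop separately, e.g. with time(1), to see the difference):

#include <stdio.h>
#include <unistd.h>

#define N 100000

int main(void)
{
    /* Unbuffered: one system call per byte, roughly N kernel round trips. */
    for (int i = 0; i < N; i++)
        write(STDOUT_FILENO, "x", 1);

    /* Buffered: stdio accumulates the bytes and flushes them in large
       blocks, so only a handful of write() system calls occur. */
    for (int i = 0; i < N; i++)
        fputc('x', stdout);

    return 0;
}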
Two common abstractions of I/O devices are:
Stream - transfers a variable number of bytes as the device becomes ready.
Block - transfers fixed-size records.
A buffer is just an area of memory which holds the data being transferred.

Sync read and write

I'm using the read and write functions to communicate between a client and a server.
If the server calls write twice, in Wireshark I can see that two packets were sent, but my read call concatenates the two payloads into one buffer.
Question:
Is it possible for my read call to read only one payload at a time?
I don't want to reduce the buffer size.
Example of the current situation:
Send(8 bytes), Send(8 bytes)
Read: reads 16 bytes
What I want:
Send(8 bytes), Send(8 bytes)
Read: reads 8 bytes (first packet)
Read: reads 8 bytes (second packet)
TCP/IP gives you an ordered byte stream. Reads and writes are not guaranteed to have the same boundaries, as you have seen.
To see where messages begin and end, you need to add extra information to your protocol to provide this information. A workable simple approach is to have a byte count at the start of each message. Read the byte count, then you know how many more bytes to read to get the complete message and none of the next message.
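Here is a minimal sketch of that byte-count approach on the receive side, assuming POSIX read(); the helper names read_exact and recv_message are illustrative, and the 4-byte big-endian count matches the length-prefix sketch in the first answer above:

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes, looping because read() may return fewer.
   Returns 0 on success, -1 on error or EOF. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive one length-prefixed message (4-byte big-endian count). */
static int recv_message(int fd, char *buf, size_t cap, uint32_t *out_len)
{
    unsigned char hdr[4];
    if (read_exact(fd, hdr, sizeof hdr) != 0)
        return -1;
    uint32_t len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16)
                 | ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
    if (len > cap)
        return -1;          /* message too large for caller's buffer */
    if (read_exact(fd, buf, len) != 0)
        return -1;
    *out_len = len;
    return 0;
}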
If you want to synchronize the server and client, use something like semaphores, or have the receiver acknowledge what it has read so the sender doesn't transmit new data before the client has read the previous message. Alternatively, if you know the exact length of each message, you can split the bytes you have read at message boundaries. Be aware that if you make the buffer exactly the length of one message, any surplus bytes will be lost, so either have the server wait to send until the reader has consumed the previous message, or use a larger buffer and separate the multiple messages yourself.

How to detect a delimiter while reading from a socket file descriptor in C?

In C, while reading into a buffer from a socket file descriptor, how do I make the read stop if a delimiter is detected? Let's assume the delimiter is a '>' character.
read(socket_filedes, buffer, MAXSZ);
/* stop if delimiter '>' is detected */
You have two options here:
Read a single byte at a time until you encounter a delimiter. This is likely to be very inefficient.
Read in a full buffer's worth of data at a time, then look for the delimiter there. When you find it, save off the remaining data in another buffer and process the data you want. When you're ready to read again, put the saved data back in the buffer and call read with the address of the next available byte, as sketched below.
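A sketch of that second option, assuming one blocking connection; the names are illustrative, and the static leftover buffer is a simplification that assumes cap <= MAXSZ:

#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAXSZ 4096

/* Bytes that arrived after the delimiter, kept for the next call.
   A static buffer keeps the sketch short; real code would track this
   per connection. */
static char leftover[MAXSZ];
static size_t leftover_len = 0;

/* Fill out[] until a '>' is seen; returns the message length
   (including the '>'), or -1 on error or if no delimiter fits. */
ssize_t read_until_delim(int fd, char *out, size_t cap)
{
    if (leftover_len > cap)
        return -1;

    /* Start with whatever was saved from the previous call. */
    memcpy(out, leftover, leftover_len);
    size_t used = leftover_len;
    leftover_len = 0;

    for (;;) {
        char *hit = memchr(out, '>', used);
        if (hit != NULL) {
            size_t msg = (size_t)(hit - out) + 1;
            leftover_len = used - msg;            /* save the surplus */
            memcpy(leftover, out + msg, leftover_len);
            return (ssize_t)msg;
        }
        if (used == cap)
            return -1;
        ssize_t n = read(fd, out + used, cap - used);
        if (n <= 0)
            return -1;
        used += (size_t)n;
    }
}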
The read() function does not examine the data as it transfers it from the source to your buffer. You cannot force it to stop transferring data at a specific character or characters if it would not otherwise have stopped there.
On the other hand, it is important to recognize that read() does not necessarily read the full number of bytes specified in any case. On one hand, that means that you need to be prepared to run read() calls in a loop to collect all the data you expect, but on the other hand it means that you can usually expect that if read() has already transferred at least one byte then it will return when no more data are immediately available to transfer. Thus, if the sender stops sending data after the delimiter, then read() will probably stop reading after that delimiter.
If you cannot rely on the sender to break up the transmission as you require, then you have the option of reading one byte at a time. That can be awfully inefficient, however. The more usual solution is to do the job in two stages: (1) perform fast, block-wise read()s from the kernel into a userspace buffer, then (2) parse the buffer contents in your userspace code. This is basically what buffered C streams (stdio) do.

How many bytes should I read/write to a socket?

I'm having some doubts about the number of bytes I should write/read through a socket in C on Unix. I'm used to sending 1024 bytes, but this is really too much sometimes when I send short strings.
I read a string from a file, and I don't know how many bytes long the string is; it varies every time and can be 10, 20, or 1000 bytes. I only know for sure that it's < 1024. So, when I write the code, I don't know how many bytes to read on the client side (on the server side I can use strlen()). So, is the only solution to always read the maximum number of bytes (1024 in this case), regardless of the length of the string I read from the file?
For instance, with this code:
read(socket,stringBuff,SIZE);
wouldn't it be better if SIZE were 10 instead of 1024 when I want to read a 10-byte string?
In the code in your question, if there are only 10 bytes to be read, then it makes no difference whether SIZE is 10 bytes, 1,024 bytes, or 1,000,024 bytes - it'll still just read 10 bytes. The only difference is how much memory you set aside for it, and if it's possible for you to receive a string up to 1,024 bytes, then you're going to have to set aside that much memory anyway.
However, regardless of how many bytes you are trying to read in, you always have to be prepared for the possibility that read() will actually read a different number of them. Particularly on a network, when you can get delays in transmission, even if your server is sending a 1,024 byte string, less than that number of bytes may have arrived by the time your client calls read(), in which case you'll read less than 1,024.
So, you always have to be prepared for the need to get your input in more than one read() call. This means you need to be able to tell when you're done reading input - you can't rely alone on the fact that read() has returned to tell you that you're done. If your server might send more than one message before you've read the first one, then you obviously can't hope to rely on this.
You have three main options:
Always send messages of the same size, padding smaller strings with zeros if necessary, and just read until you've received exactly that number of bytes. This is usually suboptimal for a TCP stream.
Have some kind of sentinel mechanism for telling you when a message is over. This might be a newline character, a CRLF, a blank line, or a single dot on a line followed by a blank line, or whatever works for your protocol. Keep reading until you have received the sentinel. To avoid making inefficient system calls of one character at a time, you need to implement some kind of buffering mechanism to make this work well. If you can be sure that your server is sending you lines terminated with a single '\n' character, then using fdopen() and the standard C I/O library may be an option (see the sketch after this list).
Have your server tell you how big the message is (either in an initial fixed length field, or using the same kind of sentinel mechanism from point 2), and then keep reading until you've got that number of bytes.
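For option 2 with '\n'-terminated messages, the fdopen() route mentioned above might look like this sketch (illustrative names, minimal error handling):

#include <stdio.h>

/* Wrap the socket descriptor in a stdio stream so the C library does
   the buffering; fgets() then returns one '\n'-terminated message per
   call, however the bytes happened to arrive on the wire. */
void read_lines(int sockfd)
{
    char line[1024];
    FILE *in = fdopen(sockfd, "r");
    if (in == NULL)
        return;
    while (fgets(line, sizeof line, in) != NULL) {
        /* process one complete message here */
    }
    fclose(in);   /* also closes sockfd */
}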
The read() system call blocks until it can read one or more bytes, or until an error occurs.
It DOESN'T guarantee that it will read the number of bytes you request! With TCP sockets, it's very common that read() returns less than you request, because it can't return bytes that are still propagating through the network.
So, you'll have to check the return value of read() and call it again to get more data if you didn't get everything you wanted, and again, and again, until you have everything.
