Reading and storing data from a TCP buffer in C

I am reading some HTTP POST data from an HTTP-wrapped TCP socket. My setup works, but there is a strange symptom. I know what the content length is (via the HTTP Content-Length header), yet more often than not I end up building a buffer that is 2-3 bytes longer than expected. I know that I am not setting my buffer size on initialization, but when I do, I get a lot of compile errors. The following code almost works, but often produces more data in the buffer than there should be.
long bytesRead;
unsigned long bytesRemaining;
unsigned long chunkSize;
sbyte *pBuffer;
sbyte *pTmpBuffer;

pBuffer = malloc(contentLength);
memset(pBuffer, 0, contentLength);
pTmpBuffer = pBuffer;
bytesRemaining = contentLength;
while (bytesRemaining > 0) {
    if (maxBuffSize < bytesRemaining) {
        chunkSize = maxBuffSize;
    }
    else {
        chunkSize = bytesRemaining;
    }
    bytesRead = tcpBlockReader(pHttpData, pTmpBuffer, chunkSize);
    bytesRemaining -= bytesRead;
    pTmpBuffer += bytesRead;
}
printf("Data is %s\n", pBuffer);
printf("Length is %zu\n", strlen(pBuffer));
Now sometimes it will be perfect, i.e.
Data is expected+data
Length is 13
And sometimes it will be
Data is expected+data+(weird characters)
Length is 15
So the problem here, I think, is that I don't set a size for the buffer (i.e. pBuffer[contentLength]). When I do that, though, I get errors about incompatible types and whatnot. I am not a well-versed C programmer (I usually stick to chars and ints). What can I do to ensure that the buffer does not end in extra garbage?

I was missing the elusive NULL terminator.
pBuffer = malloc(contentLength + 1);
...
pBuffer[contentLength] = '\0';
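Putting the fix into the loop from the question, a minimal sketch (tcpBlockReader(), pHttpData, and maxBuffSize are the asker's own API and variables, assumed unchanged):

sbyte *pBuffer = malloc(contentLength + 1);  /* +1 for the terminator */
sbyte *pTmpBuffer = pBuffer;
unsigned long bytesRemaining = contentLength;

while (bytesRemaining > 0) {
    unsigned long chunkSize =
        (maxBuffSize < bytesRemaining) ? maxBuffSize : bytesRemaining;
    long bytesRead = tcpBlockReader(pHttpData, pTmpBuffer, chunkSize);
    if (bytesRead <= 0)
        break;  /* error or connection closed */
    bytesRemaining -= bytesRead;
    pTmpBuffer += bytesRead;
}
/* Terminate at the number of bytes actually read, so strlen() and
   printf("%s") stop where the data ends instead of running into garbage. */
pBuffer[contentLength - bytesRemaining] = '\0';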

Related

C. Loop compression + send (gzip) ZLIB

I'm currently building an HTTP server in C.
Please consider this piece of code:
#define CHUNK 0x4000

z_stream strm;
unsigned char out[CHUNK];
int ret;
char buff[200];

strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
int windowsBits = 15;
int GZIP_ENCODING = 16;
ret = deflateInit2(&strm, Z_BEST_SPEED, Z_DEFLATED, windowsBits | GZIP_ENCODING,
                   1, Z_DEFAULT_STRATEGY);

fill(buff); // fill buff with data
do {
    strm.next_in = (z_const unsigned char *)buff;
    strm.avail_in = strlen(buff);
    do {
        strm.avail_out = CHUNK;
        strm.next_out = out;
        ret = deflate(&strm, Z_FINISH);
    } while (strm.avail_out == 0);
    send_to_client(out); // sending a part of the gzip-encoded string
    fill(buff);
} while (strlen(buff) != 0);
The idea is: I'm sending gzip'ed buffers one by one which, when concatenated, make up the whole response body.
BUT: for now, my client (a browser) only gets the data from the first buffer, with no errors at all.
How do I achieve this; how do I gzip a series of buffers inside a loop so I can send each one as it is produced?
First off, you need to do something with the generated deflate data after each deflate() call. Your code discards the compressed data generated in the inner loop. From the zlib usage example (zpipe.c), after the deflate() you would need something like:
have = CHUNK - strm.avail_out;
if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
    (void)deflateEnd(&strm);
    return Z_ERRNO;
}
That's where your send_to_client needs to be, sending have bytes.
In your case, your CHUNK is so much larger than your buff that the inner loop only ever executes once, so you are not actually discarding any data. However, that is only true because of the Z_FINISH; once you make the next fix, your current code will start discarding data.
Second, you are finishing the deflate() stream each time after no more than 199 bytes of input. This will greatly limit how much compression you can get. Furthermore, you are sending individual gzip streams, of which most browsers will only interpret the first. (This is actually a bug in those browsers, but I don't imagine they will be fixed.)
You need to give the compressor at least tens to hundreds of kilobytes to work with in order to get decent compression. You need to use Z_NO_FLUSH instead of Z_FINISH until you get to the last buff you want to send. Then you use Z_FINISH. Again, take a look at the example and read the comments.
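Putting both fixes together, a rough sketch of the corrected loop might look like the following. fill() and send_to_client() are the asker's own helpers; here send_to_client() is assumed to take a length argument, and is_last_buffer() is a hypothetical test for the final piece of input:

unsigned have;
int flush;

do {
    fill(buff);                                   // next piece of input
    strm.next_in = (z_const unsigned char *)buff;
    strm.avail_in = strlen(buff);
    // Keep the single gzip stream open until the last buffer.
    flush = is_last_buffer() ? Z_FINISH : Z_NO_FLUSH;
    do {
        strm.avail_out = CHUNK;
        strm.next_out = out;
        ret = deflate(&strm, flush);
        have = CHUNK - strm.avail_out;            // bytes actually produced
        if (have > 0)
            send_to_client(out, have);            // send every piece, on every pass
    } while (strm.avail_out == 0);
} while (flush != Z_FINISH);
(void)deflateEnd(&strm);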

How can I use a dynamic array for UDP receipt in D?

I would like to create a simple UDP server that can receive messages of varying length. However, it seems as if D's Socket.receiveFrom() expects a buffer of fixed, preallocated length. When the following code runs:
import std.socket;
import std.stdio;

void main() {
    UdpSocket server_s;
    Address client_addr;
    ubyte[] in_buf;
    ptrdiff_t bytesin;

    server_s = new UdpSocket();
    // PORT_NUM is defined elsewhere
    server_s.bind(new InternetAddress(InternetAddress.ADDR_ANY, PORT_NUM));
    bytesin = server_s.receiveFrom(in_buf, client_addr);
    if (bytesin == 0 || bytesin == Socket.ERROR) {
        writeln("Error receiving, bytesin: ", bytesin);
        return;
    }
    // Do stuff
}
receiveFrom() immediately falls through with bytesin == 0. Why is this? Can I even use dynamic arrays for receiving over UDP?
receive and receiveFrom don't do the allocation themselves. You can pass an array of some fixed size big enough to hold any packet you expect, then slice it based on how many bytes you received.
If you preallocated 64 KB, that ought to fit everything you could conceivably get. I tend to just use 4 KB buffers though.
ubyte[4096] in_buf;
bytesin = server_s.receiveFrom(in_buf, client_addr);
// check for error first then
auto message_received = in_buf[0 .. bytesin];
// process it
// keep looping, reusing the buffer, to get more stuff

Sending/Handling partial data writes in TCP

So, I have the following code, which sends out my packet over TCP. It's working pretty well. I just have to test partial writes, so I write 1 byte at a time, either by setting the send buffer size to 1 or by the hack shown below. When I took a tcpdump, everything except the first byte was incorrect. What am I doing wrong?
int tmi_transmit_packet(struct tmi_msg_pdu *tmi_pkt, int len, int *written_len)
{
    int bytes;

    // This works:
    bytes = write(g_tmi_mgr->tmi_conn_fd, (void *)tmi_pkt, len);
    // This doesn't:
    // bytes = write(g_tmi_mgr->tmi_conn_fd, (void *)tmi_pkt, 1);
    if (bytes < 0) {
        if (errno == EAGAIN) {
            return (TMI_SOCK_FULL);
        }
        return (TMI_WRITE_FAILED);
    } else if (bytes < len) {
        *written_len += bytes;
        tmi_pkt += bytes;
        return (tmi_transmit_packet(tmi_pkt, len - bytes, written_len));
    } else {
        *written_len += len;
    }
    return TMI_SUCCESS;
}
This line
    tmi_pkt += bytes;
most probably does not do what you expect: it increments tmi_pkt by sizeof(*tmi_pkt) * bytes, not by bytes. For a nice explanation of pointer arithmetic, have a look at the classic "Binky" pointer video.
To get around this, you might modify your code as follows:
...
else if (bytes < len) {
    void *pv = ((char *)tmi_pkt) + bytes;
    *written_len += bytes;
    return (tmi_transmit_packet(pv, len - bytes, written_len));
}
...
Anyhow, this somehow smells dirty, as the data pointed to by the pointer passed into write() no longer necessarily corresponds to its declared type.
A cleaner solution would be to declare the function parameter as void * or char * instead of struct tmi_msg_pdu *.
Although quite extravagant, the use of recursive calls here is neither necessary nor recommended: for a lot of data and/or a slow connection, it may run out of stack memory. A simple loop would also do, and has the advantage that you can use a temporary pointer to the buffer being written while keeping a typed interface.
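A minimal iterative sketch along those lines, reusing the question's g_tmi_mgr handle and TMI_* return codes (all assumed to come from the asker's code base):

int tmi_transmit_packet(struct tmi_msg_pdu *tmi_pkt, int len, int *written_len)
{
    const char *p = (const char *)tmi_pkt;  /* byte-wise view of the packet */

    while (len > 0) {
        ssize_t bytes = write(g_tmi_mgr->tmi_conn_fd, p, len);
        if (bytes < 0) {
            if (errno == EAGAIN)
                return TMI_SOCK_FULL;
            return TMI_WRITE_FAILED;
        }
        *written_len += bytes;
        p += bytes;      /* plain byte arithmetic on a char pointer */
        len -= bytes;
    }
    return TMI_SUCCESS;
}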

Using bzip2 low-level routines to compress chunks of data

The Overview
I am using the low-level calls in the libbzip2 library: BZ2_bzCompressInit(), BZ2_bzCompress() and BZ2_bzCompressEnd() to compress chunks of data to standard output.
I am migrating working code from higher-level calls, because I have a stream of bytes coming in and I want to compress those bytes in sets of discrete chunks (a discrete chunk is a set of bytes that contains a group of tokens of interest — my input is logically divided into groups of these chunks).
A complete group of chunks might contain, say, 500 chunks, which I want to compress to one bzip2 stream and write to standard output.
Within a set, using the pseudocode I outline below, if my example buffer is able to hold 101 chunks at a time, I would open a new stream, compress 500 chunks in runs of 101, 101, 101, 101, and one final run of 96 chunks that closes the stream.
The Problem
The issue is that my bz_stream structure instance, which keeps track of the number of compressed bytes in a single pass of the BZ2_bzCompress() routine, seems to claim to be writing more compressed bytes than the total number of bytes in the final, compressed file.
For example, the compressed output could be a file with a true size of 1234 bytes, while the number of reported compressed bytes (which I track while debugging) is somewhat higher than 1234 bytes (say 2345 bytes).
My rough pseudocode is in two parts.
The first part is a rough sketch of what I do to compress a subset of chunks (and I know that I have another subset coming after this one):
bz_stream bzStream;
unsigned char bzBuffer[BZIP2_BUFFER_MAX_LENGTH] = {0};
unsigned long bzBytesWritten = 0UL;
unsigned long long cumulativeBytesWritten = 0ULL;
unsigned char myBuffer[UNCOMPRESSED_MAX_LENGTH] = {0};
size_t myBufferLength = 0;

/* initialize bzStream */
bzStream.next_in = NULL;
bzStream.avail_in = 0U;
bzStream.avail_out = 0U;
bzStream.bzalloc = NULL;
bzStream.bzfree = NULL;
bzStream.opaque = NULL;

int bzError = BZ2_bzCompressInit(&bzStream, 9, 0, 0);
/* bzError checking... */

do
{
    /* read some bytes into myBuffer... */

    /* compress bytes in myBuffer */
    bzStream.next_in = myBuffer;
    bzStream.avail_in = myBufferLength;
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
    do
    {
        bzStream.next_out = bzBuffer;
        bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
        bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
        /* error checking... */
        bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
        cumulativeBytesWritten += bzBytesWritten;
        /* write compressed data in bzBuffer to standard output */
        fwrite(bzBuffer, 1, bzBytesWritten, stdout);
        fflush(stdout);
    }
    while (bzError == BZ_OK);
}
while (/* there is a non-final myBuffer full of discrete chunks left to compress... */);
}
while (/* while there is a non-final myBuffer full of discrete chunks left to compress... */);
Now we wrap up the output:
/* read the final batch of bytes into myBuffer (with a total byte size of `myBufferLength`)... */

/* compress remaining myBufferLength bytes in myBuffer */
bzStream.next_in = myBuffer;
bzStream.avail_in = myBufferLength;
bzStream.next_out = bzBuffer;
bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
do
{
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
    bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
    /* bzError error checking... */

    /* increment cumulativeBytesWritten by `bz_stream` struct `total_out_*` members */
    bzBytesWritten = ((unsigned long) bzStream.total_out_hi32 << 32) + bzStream.total_out_lo32;
    cumulativeBytesWritten += bzBytesWritten;

    /* write compressed data in bzBuffer to standard output */
    fwrite(bzBuffer, 1, bzBytesWritten, stdout);
    fflush(stdout);
}
while (bzError != BZ_STREAM_END);

/* close stream */
bzError = BZ2_bzCompressEnd(&bzStream);
/* bzError checking... */
The Questions
Am I calculating cumulativeBytesWritten (or, specifically, bzBytesWritten) incorrectly, and how would I fix that?
I have been tracking these values in a debug build, and I do not seem to be "double counting" the bzBytesWritten value. This value is counted and used once to increment cumulativeBytesWritten after each successful BZ2_bzCompress() pass.
Alternatively, am I not understanding the correct use of the bz_stream state flags?
For example, does the following compress and keep the bzip2 stream open, so long as I keep sending some bytes?
bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
Likewise, can the following statement compress data so long as at least some bytes are available via the bzStream.next_in pointer (BZ_RUN), and then wrap up the stream when there are no more bytes available (BZ_FINISH)?
bzError = BZ2_bzCompress(&bzStream, (bzStream.avail_in) ? BZ_RUN : BZ_FINISH);
Or, am I not using these low-level calls correctly at all? Should I go back to using the higher-level calls to continuously append a grouping of compressed chunks of data to one main file?
There's probably a simple solution to this, but I've been banging my head on the table for a couple days in the course of debugging what could be wrong, and I'm not making much progress. Thank you for any advice.
In answer to my own question, it appears I was miscalculating the number of bytes written: I should not use the total_out_* members, which are cumulative totals for the whole stream rather than per-pass counts. The following correction works properly:
bzBytesWritten = sizeof(bzBuffer) - bzStream.avail_out;
The rest of the calculations follow.
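In context, the inner loop from the first sketch becomes the following (the final-batch loop is analogous, with the BZ_RUN/BZ_FINISH selector); the per-pass byte count is derived from avail_out instead of the cumulative total_out_* counters:

do
{
    bzStream.next_out = bzBuffer;
    bzStream.avail_out = BZIP2_BUFFER_MAX_LENGTH;
    bzError = BZ2_bzCompress(&bzStream, BZ_RUN);
    /* error checking... */

    /* bytes produced by this pass only */
    bzBytesWritten = sizeof(bzBuffer) - bzStream.avail_out;
    cumulativeBytesWritten += bzBytesWritten;

    fwrite(bzBuffer, 1, bzBytesWritten, stdout);
    fflush(stdout);
}
while (bzError == BZ_OK);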

Techniques for handling short reads/writes with scatter-gather?

Scatter-gather - readv()/writev()/preadv()/pwritev() - reads/writes a variable number of iovec buffers in a single system call, processing each buffer sequentially from the 0th iovec to the Nth. However, according to the documentation, these calls can also read/write less than was requested. I was wondering whether there is a standard/best-practice/elegant way to handle that situation.
If we are just handling a bunch of character buffers or similar, this isn't a big deal. But one of the niceties of scatter-gather is using structs and/or discrete variables as the individual iovec items. How do you handle the situation where readv/writev only reads/writes a portion of a struct, or half of a long, or something like that?
Below is some contrived code of what I am getting at:
int fd;                  // opened elsewhere
struct iovec iov[3];
long aLong = 74775767;
int aInt = 949;
char aBuff[100];         // filled from wherever
ssize_t bytesWritten = 0;
ssize_t bytesToWrite = 0;

iov[0].iov_base = &aLong;
iov[0].iov_len = sizeof(aLong);
bytesToWrite += iov[0].iov_len;

iov[1].iov_base = &aInt;
iov[1].iov_len = sizeof(aInt);
bytesToWrite += iov[1].iov_len;

iov[2].iov_base = aBuff;
iov[2].iov_len = sizeof(aBuff);
bytesToWrite += iov[2].iov_len;

bytesWritten = writev(fd, iov, 3);
if (bytesWritten == -1)
{
    // handle error
}
if (bytesWritten < bytesToWrite)
{
    // how to gracefully continue?.........
}
Use a loop like the following to advance the partially-processed iov:
int cur = 0;  /* index of the first iovec not yet fully written */
for (;;) {
    written = writev(fd, iov + cur, count - cur);
    if (written < 0) goto error;
    while (cur < count && (size_t)written >= iov[cur].iov_len)
        written -= iov[cur++].iov_len;
    if (cur == count) break;  /* all buffers fully written */
    iov[cur].iov_base = (char *)iov[cur].iov_base + written;
    iov[cur].iov_len -= written;
}
Note that if you don't check for cur < count, you can read past the end of iov, since it might contain zero-length entries.
AFAICS the vectored read/write functions behave the same with respect to short reads/writes as the normal ones. That is, you get back the number of bytes read/written, but this might well point into the middle of a struct, just like with read()/write(). There is no guarantee that the possible "interruption points" (for lack of a better term) coincide with the vector boundaries. So, unfortunately, the vectored I/O functions offer no more help for dealing with short reads/writes than the normal I/O functions; in fact it's more complicated, since you need to map the byte count to an I/O vector element and an offset within that element.
Also note that the idea of using vectored I/O for individual structs or data items might not work that well; the maximum allowed value for the iovcnt argument (IOV_MAX) is usually quite small, something like 1024 or so. So if your data is contiguous in memory, just pass it as a single element rather than artificially splitting it up.
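As one illustration of that last point, a sketch that packs the question's three items into one contiguous staging buffer, so a short write only ever needs a byte offset (the back-to-back layout is a made-up example, not a defined wire format):

#include <string.h>
#include <unistd.h>

/* Pack the three items back-to-back into one staging buffer. */
char staging[sizeof(long) + sizeof(int) + 100];
size_t off = 0;
memcpy(staging + off, &aLong, sizeof(aLong)); off += sizeof(aLong);
memcpy(staging + off, &aInt, sizeof(aInt));   off += sizeof(aInt);
memcpy(staging + off, aBuff, sizeof(aBuff));  off += sizeof(aBuff);

/* Now a short write is handled with plain byte arithmetic. */
size_t sent = 0;
while (sent < off) {
    ssize_t n = write(fd, staging + sent, off - sent);
    if (n < 0) {
        /* handle error (e.g. retry on EINTR, wait on EAGAIN) */
        break;
    }
    sent += (size_t)n;
}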
A vectored write will write all the data you have provided with one call to the writev function, so bytesWritten will always be equal to the total number of bytes provided as input. That is my understanding, at least.
Please correct me if I am wrong.
