Why does reordering gzip packets damage the output? - c

I'm basing my code on the gzip example code posted with zlib.
For initialization I use deflateInit2(p_strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED, (15+16), 8, Z_DEFAULT_STRATEGY).
I'm compressing a stream packet by packet: each packet is deflated with Z_FULL_FLUSH, except for the last, for which I use Z_FINISH.
After zipping each packet, I'm reordering the packets.
data in packets ---> [zip] ---> [reordering] ---> ...
If I inflate the data right after deflating, I get back exactly the original file.
If I inflate the data after reordering the packets (again: each packet is deflated with Z_FULL_FLUSH, except for the last, which uses Z_FINISH), I get a file that is very similar to the original file, but bytes are missing at the end. That's because when inflating I get an error (Z_DATA_ERROR) on the last packet. If I inflate with, say, 50 KB chunks, the inflated file after reordering matches the input except that up to 50 KB are missing at the end (the whole last packet is gone because of the error). If I decrease the inflate chunk size to 8 bytes, I still get Z_DATA_ERROR, but I lose less data while inflating (in my example I am missing only one byte of the original file).
I'm not reordering the last packet (Z_FINISH).
I also tried sending all of the packets with Z_FULL_FLUSH and then sending another "empty" packet (only Z_FINISH, which is 10 bytes).
Why is this happening?
If I use Z_FULL_FLUSH, why can't the inflater inflate it correctly?
Does it remember the order of the deflated packets?
Any information will help,
Thanks.

Since you are using Z_FULL_FLUSH, which erases the history at each flush, you can reorder the packets, except for the last one. The one you did Z_FINISH on must be the last packet. It doesn't need to contain any data, though. You can feed all of the data from your last packet using Z_FULL_FLUSH, and then do one final packet with no input data and Z_FINISH. That will permit you to reorder the packets before that empty one all you like. Just always keep that last one at the end.
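A minimal sketch of that scheme in C against zlib (not the poster's exact code; it assumes the stream was initialized with the deflateInit2() call from the question, that out_cap is large enough for each compressed piece, and the sending of each piece is only hinted at in comments):

#include <zlib.h>
#include <stddef.h>

static int gzip_packets(z_stream *strm,
                        const unsigned char *pkt[], const size_t pkt_len[],
                        size_t npkt, unsigned char *out, size_t out_cap)
{
    for (size_t i = 0; i < npkt; i++) {
        strm->next_in   = (Bytef *)pkt[i];
        strm->avail_in  = (uInt)pkt_len[i];
        strm->next_out  = out;
        strm->avail_out = (uInt)out_cap;
        if (deflate(strm, Z_FULL_FLUSH) == Z_STREAM_ERROR)  /* standalone, byte-aligned piece */
            return -1;
        /* ... send the (out_cap - strm->avail_out) bytes produced here;
           these pieces may later be reordered ... */
    }
    /* Final "empty" packet: no input, just terminate the stream. */
    strm->next_in   = NULL;
    strm->avail_in  = 0;
    strm->next_out  = out;
    strm->avail_out = (uInt)out_cap;
    if (deflate(strm, Z_FINISH) != Z_STREAM_END)            /* emits the terminator + gzip trailer */
        return -1;
    /* ... send this last piece; it must stay at the end ... */
    return 0;
}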
The reason is that the deflate format is self-terminating, so that last piece marks the end of the stream. If you reorder it into the middle somewhere, then inflation will stop when it hits that packet.
The gzip header and trailer need to be maintained at the beginning and the end, and the CRC in the trailer updated accordingly. The CRC check at the end depends on the order of the data.
Why are you trying to do what you're trying to do? What are you optimizing?

GZip is a streaming protocol. The compression depends on the prior history of the stream. You can't reorder it.

Related

Sending multiple packets very fast in C

I'm trying to write a multiplayer game in C, but when I send multiple packets like "ARV 2\n\0" and "POS 2 0 0\n\0" from the server to the client (with send()) and then try to read them with recv(), the client only finds one packet that appears to be the two packets combined into one.
So I'm asking: is that normal? And if so, how could I force my client to read the packets one by one (or my server to send them one by one, if the problem comes from the send() call)?
Thanks!
Short answer: Yes, this is normal. You are using TCP/IP, I assume. It is a byte stream protocol; there are no "packets". The network and the OS on either end may combine and split the data you send in any way that fits their buffers or parts of the network. The only thing guaranteed is that you get the same bytes in the same order.
You need to do your own packet framing. For a text protocol, separate packets with, for example, '\0' bytes or newlines. Also note that the network or OS may give you a partial packet per single "read", so you need to handle that in your code as well. This is easiest if the packet separator is a single byte.
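A rough sketch of that kind of delimiter-based receive loop (names and buffer sizes are illustrative, not from the question; it assumes a blocking socket and newline-terminated messages):

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* 'buf' and '*used' persist across calls and hold leftover bytes.
   Returns the length of one complete '\n'-terminated message copied
   into 'msg', 0 if the peer closed the connection, or -1 on error
   or if a message does not fit in the buffers. */
static ssize_t read_message(int fd, char *buf, size_t cap, size_t *used,
                            char *msg, size_t msg_cap)
{
    for (;;) {
        char *nl = memchr(buf, '\n', *used);
        if (nl != NULL) {                         /* a full message is buffered */
            size_t len = (size_t)(nl - buf) + 1;
            if (len >= msg_cap)
                return -1;
            memcpy(msg, buf, len);
            msg[len] = '\0';
            memmove(buf, buf + len, *used - len); /* keep bytes of the next message */
            *used -= len;
            return (ssize_t)len;
        }
        if (*used == cap)                         /* message longer than the buffer */
            return -1;
        ssize_t n = recv(fd, buf + *used, cap - *used, 0);
        if (n <= 0)                               /* 0 = peer closed, <0 = error */
            return n;
        *used += (size_t)n;
    }
}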
Especially for a binary protocol, where there are no "unused" byte values to mark packet boundaries, you could write the length of the packet as binary data, then that many data bytes, then again a length, data, and so on. Note that the stream may get split across different "read" calls even in the middle of the length field (unless the length is a single byte), so you need a few more lines of code to handle receiving split packets.
Another option would be to use the UDP protocol, which really does send packets. But UDP packets may get lost or delivered in the wrong order (and have a few other problems), so you need to handle that somehow, and this often results in re-inventing TCP, poorly. So unless you find that TCP/IP just won't cut it, stick with it.

FFmpeg: what does av_parser_parse2 do?

When sending H.264 data for frame decoding, it seems like a common method is to first call av_parser_parse2 from the libav library on the raw data.
I looked for documentation but I couldn't find anything other than some example code. Does it group packets of data so that the resulting data starts with NAL headers and can be treated as a frame?
The following is a link to a sample code that uses av_parser_parse2:
https://github.com/DJI-Mobile-SDK-Tutorials/Android-VideoStreamDecodingSample/blob/master/android-videostreamdecodingsample/jni/dji_video_jni.c
I would appreciate if anyone could explain those library details to me or link me resources for better understanding.
Thank you.
It is as you guessed: av_parser_parse2() for H.264 consumes input data, looks for NAL start codes (0x000001), checks the NAL unit types looking for frame starts, and outputs the input data, but with different framing.
That is, it consumes the input data, ignores its framing by putting all consecutive data into a big buffer, and then restores the framing from the H.264 byte stream alone, which is possible because of the start codes and the NAL unit types. It does not increase or decrease the amount of data given to it: if you get 30 KB out, you put 30 KB in. But maybe you put it in in little pieces of around 1500 bytes, the payload of the network packets you received.
By the way, when a function's declaration is not documented well, it is a good idea to look at the implementation.
Just recovering the framing would not be involved enough to call it parsing, but the H.264 parser in FFmpeg also gathers more information from the stream, e.g. whether it is interlaced, so it really deserves its name.
It does not, however, decode the image data of the H.264 stream.
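The usual calling pattern looks roughly like the following, condensed from FFmpeg's decode_video example (it assumes parser was created with av_parser_init(AV_CODEC_ID_H264), ctx with avcodec_alloc_context3(), and pkt with av_packet_alloc(); error handling is omitted):

#include <stdint.h>
#include <libavcodec/avcodec.h>

static void parse_chunk(AVCodecParserContext *parser, AVCodecContext *ctx,
                        AVPacket *pkt, const uint8_t *data, int data_size)
{
    while (data_size > 0) {
        /* Feed an arbitrary chunk of the byte stream to the parser. */
        int used = av_parser_parse2(parser, ctx, &pkt->data, &pkt->size,
                                    data, data_size,
                                    AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        data      += used;        /* the parser consumed 'used' input bytes */
        data_size -= used;
        if (pkt->size > 0) {
            /* pkt now holds one re-framed unit (e.g. a full access unit);
               hand it to whatever decoder you are using here. */
        }
    }
}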
DJI's video transmission does not guarantee the data in each packet belongs to a single video frame. Mostly a packet contains only part of the data needed for a single frame. It also does not guarantee that a packet contains data from one frame and not two consecutive frames.
Android's MediaCodec needs to be queued with buffers, each holding the full data for a single frame.
This is where av_parser_parse2() comes in. It gathers packets until it can find enough data for a full frame. This frame is then sent to MediaCodec for decoding.

Sending variable sized packets over the network using TCP/IP

I want to send variable-sized packets between two Linux machines over an internal network. The packet is variable-sized, and its length and CRC are indicated in a header that is sent along with the packet. Something roughly like:
struct hdr {
    uint32 crc;
    uint32 dataSize;
    void *data;
};
I'm using a CRC at the application layer to overcome the inherent limitations of TCP checksums.
The problem I have is that there is a chance the dataSize field itself gets corrupted, in which case I don't know where the next packet starts. At the receiver, when I read the socket buffer, I read n such packets next to one another, so dataSize is the only way I can get to the next packet correctly.
Some ideas I have are to:
Restart the connection if a CRC mismatch occurs.
Aggregate X such packets into one big packet of fixed size and discard the big packet if any CRC error is detected. The big packet is there to make sure we lose at most one big packet's worth of data in case of errors.
Any other ideas for these variable-sized packets?
Since TCP is stream-based, a data length is the generally used way to extract one full message for processing at the application. If you believe that the length field itself may be wrong for some reason, there is not much you can do except discard the packet, "flush" the connection, and hope that the sender and receiver re-sync. The best option is to disconnect, unless there is a protocol at the application layer for re-syncing the connection.
Another method, instead of length bytes, is to use markers: Start-of-Message and End-of-Message. When the application encounters Start-of-Message, it starts collecting data until the End-of-Message byte is received, and then processes the message. This requires that the message data escape the markers appropriately.
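A hypothetical sketch of marker framing with byte stuffing (names and byte values are illustrative, not from the answer; the caller must provide an output buffer of at least 2*in_len + 2 bytes for the worst case):

#include <stddef.h>

enum { STX = 0x02, ETX = 0x03, DLE = 0x10 };   /* Start, End, Escape markers */

static size_t frame_message(const unsigned char *in, size_t in_len,
                            unsigned char *out)
{
    size_t o = 0;
    out[o++] = STX;                      /* Start-of-Message marker */
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] == STX || in[i] == ETX || in[i] == DLE)
            out[o++] = DLE;              /* escape special bytes in the payload */
        out[o++] = in[i];
    }
    out[o++] = ETX;                      /* End-of-Message marker */
    return o;                            /* framed length in bytes */
}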
I think that you are dealing with second-order error possibilities, while the major risk is somewhere else.
When we used serial line transmissions, errors were frequent (one or two every few kilobytes). We used good old Kermit with a CRC and a packet size of about 100 bytes, and that was enough: many times I saw a transfer fail because the line dropped, but I never saw a completed transfer with a bad file.
With current networks, unless you have very, very poor lines, the hardware level is not that bad, and in any case the layer-2 data link already has a checksum to verify that each frame was not modified between two nodes. HDLC is commonly used at that level, and it normally uses a CRC-16 or CRC-32, which is a very solid checksum.
So the checksum at the TCP level is not meant to detect random errors in the byte stream; it is simply a last line of defense against unexpected errors, for example a router going mad because of an electrical fault and sending pure garbage. I do not have any statistical data on it, but I am pretty sure that the number of errors reaching the TCP level is already very, very low. Said differently, do not worry about that: unless you are dealing with highly sensitive data (and in that case I would prefer to have two different channels, the former for data, the latter for a global checksum), TCP/IP is enough.
That being said, adding a check at the application level as an ultimate defense is perfectly acceptable. It will only catch the errors that went undetected at the data link and TCP levels, or, more probably, errors in the peer application (who wrote it, and how was it tested?). So the probability of getting an error is low enough to use a very rough recovery procedure:
close the connection
open a new one
restart after the last correctly exchanged packet (if that makes sense), or simply continue sending new packets if you can
But the risk of a physical disconnection, or of a power outage anywhere in the network, is much higher, not to speak of a flaw in the application-level implementations...
And do not forget to fully specify the byte order and the sizes of the crc and dataSize fields...
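For example, a hypothetical explicit serialization of such a header (fixed 8 bytes, both fields big-endian, with the payload following immediately after) might look like this; the field layout mirrors the struct from the question, but the helper itself is illustrative:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Pack the header into a flat buffer so the wire layout does not depend
   on struct padding or host endianness. */
static void pack_hdr(unsigned char out[8], uint32_t crc, uint32_t dataSize)
{
    uint32_t crc_n  = htonl(crc);       /* network byte order (big-endian) */
    uint32_t size_n = htonl(dataSize);
    memcpy(out,     &crc_n,  sizeof crc_n);
    memcpy(out + 4, &size_n, sizeof size_n);
}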

Heartbleed bug: Why is it even possible to process the heartbeat request before the payload is delivered?

First, I am no C programmer and the OpenSSL codebase is huge, so forgive me for asking a question that I could probably find the answer to, given I had the time and skill to dig through the code.
TLS runs over TCP from what I can tell. TCP is stream oriented, so there is no way to know when a message has been delivered. You must know in advance how long the incoming message should be or have a delimiter to scan for.
With that in mind, how is it possible for OpenSSL to process a heartbeat request before the full payload has been received?
If OpenSSL just starts processing the first chunk of data it reads from the TCP socket after the payload length is received, then OpenSSL would appear to be not just insecure, but broken under normal operation. Since the default maximum segment size of TCP is 536 bytes, any payload larger than that would span multiple TCP segments and therefore potentially span multiple socket reads.
So the question is: How/Why can OpenSSL start processing a message that is yet to be delivered?
This is the definition of a heartbeat packet.
struct {
    HeartbeatMessageType type;
    uint16 payload_length;
    opaque payload[HeartbeatMessage.payload_length];
    opaque padding[padding_length];
} HeartbeatMessage;
Incorrect handling of the payload_length field is what caused the heartbleed bug.
However, this whole packet is itself encapsulated within another record that has its own payload length, looking roughly like this:
struct {
    ContentType type;
    ProtocolVersion version;
    uint16 length;
    opaque fragment[TLSPlaintext.length];
} TLSPlaintext;
The struct HeartbeatMessage is placed inside the above fragment.
So one whole TLS "packet" can be processed once the amount of data indicated by the length field here has arrived, but for the inner Heartbeat message, OpenSSL failed to validate its payload_length.
In a packet capture of the exploit you can see that the outer length of 3 specifies the length of a "packet", while the inner (wrong) payload length of 16384 is what caused the exploit, as OpenSSL failed to validate it against the actual received length of the packet.
Of course, similar care must be taken when processing the length field of this outer record: you really want to make sure you have actually received length bytes of data before beginning to process/parse the content of the packet.
Note also that there is no particular correlation between socket reads and TCP segments: one socket read can return many segments, or just part of one. To the application, TCP is just a byte stream, and a single socket read could return only half the length field of one TLSPlaintext record, or several whole TLSPlaintext records.
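As an illustration of "wait until you actually have length bytes", a minimal blocking read loop might look like this (a sketch, not OpenSSL's actual code):

#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly 'len' bytes from a blocking TCP socket, looping because a
   single recv() may return only part of a record (or parts of several).
   Returns 0 on success, -1 on error or if the peer closes early. */
static int recv_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)                /* 0 = connection closed, <0 = error */
            return -1;
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}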
The Heartbleed Wikipedia article explains the exploit quite well. To paraphrase, RFC 6520 is an extension to the TLS protocol for a "Heartbeat Request" message (a kind of keep-alive mechanism). The request consists of a 16-bit length field and a message to match, and the response is supposed to echo the provided message. OpenSSL's implementation has a bug: it does not perform bounds checking. It accepts the length field at face value without checking whether it is reading into something it shouldn't be (i.e., beyond the boundary indicated by the SSL record). The exploit occurs when the "Heartbeat Request" is malformed with a small message but a large value in the length field. This allows a malicious client to read information out of the server that would otherwise not have been read (this information comes back in the response). What is actually in that information depends on how things were stored in memory on the server, but the potential for reading sensitive information is considered catastrophic for OpenSSL, which is supposed to provide a secure platform.

Padding data over TCP

I am working on a client-server project and need to implement logic to check whether I have received the last of the data over a TCP socket connection before I proceed.
To make sure that I have received all the data, I am planning to append a flag to the last packet sent. I had two options in mind, described below, along with their related problems.
i. Use a struct as below, populate vst_pad for the last packet sent, and check for its presence on the receive side. The advantage over option two is that I don't have to remove the flag from the actual data before writing it to a file; I just check the first member of the struct.
typedef struct
{
    /* String holding padding for last packet when socket is changed */
    char vst_pad[10];
    /* Pointer to data being transmitted */
    char *vst_data;
    //unsigned char vst_data[1];
} st_packetData;
The problem is that I have to serialize the struct on every send call. Also, I am not sure whether I will receive the entire struct over TCP in one recv call, so I have to add logic/overhead to check this every time. I have implemented this so far, but figured out later that stream-based TCP does not guarantee that the entire struct arrives in one call.
ii. Use a function like strncat to append that flag to the end of the last data being sent.
The problem is that on every receive call I have to check for the presence of that flag, either using regex functions or a function like strstr, and if it is present, remove it from the data.
This application is going to be used for large data transfers, so I want to add minimal overhead on every send/recv/read/write call. I would really appreciate knowing whether there is a better option than the above two, or any other way to check for receipt of the last packet. The program is multithreaded.
Edit: I do not know the total size of the file I am going to send, but I am sending a fixed amount of data at a time. That is, fgets reads until the specified size minus 1, or until a newline is encountered.
Do you know the size of the data in advance, and is it a requirement that you implement an end-of-message flag?
Because I would simplify the design: add a 4-byte header (assuming you're not sending more than 4 GB of data per message) that contains the expected size of the message.
Thus you parse out the first 4 bytes, calculate the size, then continue calling recv until you get that much data.
You'll need to handle the case where your recv call gets data from the next message, and obviously add error handling.
Another issue, not raised above, with your 10-byte pad solution: what happens if the actual message itself contains 10 zero bytes (assuming you're padding with zeros)? You'd need to escape the 10 bytes of zeros, otherwise you may mistakenly truncate the message.
Using a fixed-size header and a known size value will alleviate this problem.
For a message (data packet), first send a short (in network byte order) containing the size, followed by the data. This can be achieved in one write system call.
On the receiving end, just read the short and convert it back into host byte order (this will let you use different processor architectures at a later stage). You can then read the rest of the data.
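A sketch of the sending side of that idea, using writev() to put the 2-byte big-endian length and the payload on the wire in one system call (the function name and the blocking-socket assumption are illustrative):

#include <stdint.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/uio.h>

static int send_msg(int fd, const void *data, uint16_t len)
{
    uint16_t len_n = htons(len);                 /* size in network byte order */
    struct iovec iov[2] = {
        { .iov_base = &len_n,       .iov_len = sizeof len_n },
        { .iov_base = (void *)data, .iov_len = len          },
    };
    /* One system call writes the length prefix followed by the payload. */
    return writev(fd, iov, 2) == (ssize_t)(sizeof len_n + len) ? 0 : -1;
}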
In such cases, it's common to break the data up into chunks and provide a chunk header as well as a trailer. The header contains the length of the data in the chunk, so the peer knows when to expect the trailer: all it has to do is count received bytes and then check for a valid trailer. The chunks allow large data transfers without huge buffers at both ends.
It's no great hassle to add a 'status' byte in the header that can identify the last chunk.
An alternative is to open another data connection, stream the entire serialization and then close this data connection, (like FTP does).
Could you make use of an open-source network communication library written in C#? If so, check out networkComms.net.
If this is truly the last data sent by your application, use shutdown(socket, SHUT_WR); on the sender side.
This will send the TCP FIN flag, which signals that the sender-to-receiver stream is over. The receiver will know this because its recv() will return 0 (just like an EOF condition) once everything has been received. The receiver can still send data afterward, and the sender can still receive it, but the sender cannot send any more data on this connection.
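In sketch form (illustrative helper names, assuming a blocking socket):

#include <sys/types.h>
#include <sys/socket.h>

/* Sender side: call after the last send(). This sends FIN, so the
   sender-to-receiver direction is finished, but replies can still be read. */
static void finish_sending(int sock_fd)
{
    shutdown(sock_fd, SHUT_WR);
}

/* Receiver side: drain the socket until recv() returns 0, which marks
   the end of the peer's stream (n < 0 would be an error). */
static void drain_until_eof(int sock_fd)
{
    char buf[4096];
    ssize_t n;
    while ((n = recv(sock_fd, buf, sizeof buf, 0)) > 0) {
        /* ... process the final bytes ... */
    }
}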
