I have a few questions about frame boundaries in RTP packets.
First, if the marker bit is set, does that mean a new frame has begun (this is what I understand from RFC 3551)?
Second, from what I have read, a stream starts with an I-frame, followed by P and B frames. Which field indicates the frame type? And does the I-frame have the marker bit set?
Third, if I need to find the start and end of a frame, would checking the marker bit suffice?
Thanks!
The RTP entry on the Wireshark Wiki provides a lot of information, including sample captures. You could explore it, and it might answer some of your questions. If you're going to write code to work with RTP, Wireshark is useful for monitoring and debugging.
Edit: For your first question about the marker bit, this FAQ might help. Also, finding the frame types (I, P, B) depends on the payload format. There's another question here with an answer showing how I, P and B frames are identified for MPEG. The h263-over-rtp.pcap capture has examples with I and P frames for H.263.
This is an old question, but I think it is a good one.
As you mention I, P and B frames, in 2012 you are most likely referring to H.264 over RTP.
According to RFC 6184, the marker bit is set on the last packet of a frame, so it can indeed be used as an indicator of the end of one frame; the next packet in sequence will then be the start of the next frame.
According to the same RFC, all packets of a frame also carry the same RTP timestamp, so a change in the timestamp is another indicator of the end of the previous frame and the start of a new one.
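To make that concrete, here is a minimal C sketch that pulls the marker bit, sequence number and timestamp out of the fixed RTP header (RFC 3550 layout) and uses them as frame-boundary hints; it only reads fields from the 12-byte fixed header:

#include <stddef.h>
#include <stdint.h>

struct rtp_info {
    int      marker;     /* 1 = last packet of a frame (RFC 6184)   */
    uint16_t seq;        /* sequence number                         */
    uint32_t timestamp;  /* same for every packet of the same frame */
};

static int parse_rtp(const uint8_t *pkt, size_t len, struct rtp_info *out)
{
    if (len < 12)                         /* fixed header is 12 bytes */
        return -1;
    out->marker    = (pkt[1] & 0x80) != 0;
    out->seq       = (uint16_t)((pkt[2] << 8) | pkt[3]);
    out->timestamp = ((uint32_t)pkt[4] << 24) | ((uint32_t)pkt[5] << 16) |
                     ((uint32_t)pkt[6] << 8)  |  (uint32_t)pkt[7];
    return 0;
}

/* A frame ends at a packet whose marker bit is set; a new frame also
 * begins whenever the timestamp differs from the previous packet's. */
static int starts_new_frame(uint32_t prev_ts, const struct rtp_info *cur)
{
    return cur->timestamp != prev_ts;
}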
Things get trickier when you lose packets. For example, say you lose packets 5 and 6, and these were the last packet of frame 1 and the first packet of frame 2. You know to discard frame 1 because you never got a packet with the marker bit for that frame, but how can you know whether frame 2 is whole? Maybe the two lost packets were both part of frame 1, or maybe the second one was part of frame 2.
RFC 6184 defines a start bit that is present in the first packet of a fragmented NAL unit. If the NAL unit is not fragmented then, by definition, we got the whole NAL unit if we got the packet. This means we can tell whether we received a full NAL unit. Unfortunately, that still does not guarantee we have the full frame, since a frame can contain multiple NAL units (e.g. multiple slices) and we may have lost the first one. I don't have a solution for this problem, but maybe someone will provide one sometime in the next 10 years.
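A rough sketch of that check, assuming p points at the first byte of the H.264 RTP payload (NAL unit type in the low 5 bits of the first byte, FU start bit as defined in RFC 6184):

#include <stddef.h>
#include <stdint.h>

/* Returns 1 if this packet begins at a NAL unit boundary. */
static int h264_payload_starts_nal(const uint8_t *p, size_t len)
{
    if (len < 2)
        return 0;
    switch (p[0] & 0x1F) {            /* NAL unit type                  */
    case 28:                          /* FU-A: fragmented NAL unit      */
    case 29:                          /* FU-B                           */
        return (p[1] & 0x80) != 0;    /* S (start) bit in the FU header */
    case 24:                          /* STAP-A aggregation packet      */
        return 1;                     /* carries whole NAL units        */
    default:                          /* single NAL unit packet         */
        return 1;
    }
}

As the answer says, knowing that a packet starts a NAL unit still does not prove the whole frame arrived when the frame is split across several NAL units.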
When sending h264 data for frame decoding, it seems like a common method is to first call av_parser_parse2 from the libav library on the raw data.
I looked for documentation but couldn't find anything other than some example code. Does it group packets of data so that the resulting buffer starts with NAL headers and can be treated as a frame?
The following is a link to a sample code that uses av_parser_parse2:
https://github.com/DJI-Mobile-SDK-Tutorials/Android-VideoStreamDecodingSample/blob/master/android-videostreamdecodingsample/jni/dji_video_jni.c
I would appreciate if anyone could explain those library details to me or link me resources for better understanding.
Thank you.
It is as you guessed: av_parser_parse2() for H.264 consumes input data, looks for NAL start codes (0x000001), checks the NAL unit types looking for frame starts, and outputs the input data with a different framing.
That is, it consumes the input data, ignores its original framing by putting all consecutive data into a big buffer, and then restores the framing from the H.264 byte stream alone, which is possible because of the start codes and the NAL unit types. It does not increase or decrease the amount of data given to it: if you get 30k out, you put 30k in. But maybe you put it in in little pieces of around 1500 bytes, the payload of the network packets you received.
Btw, when the function declaration is not documented well, it is a good idea to look at the implementation.
Just recovering the framing would not be involved enough to call it parsing, but the H.264 parser in ffmpeg also gathers more information from the H.264 stream, e.g. whether it is interlaced, so it really deserves its name.
It does not, however, decode the image data of the H.264 stream.
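For reference, a minimal sketch of how av_parser_parse2() is typically driven (modeled loosely on ffmpeg's doc/examples/decode_video.c; error handling trimmed, and parser/ctx are assumed to come from av_parser_init(AV_CODEC_ID_H264) and avcodec_alloc_context3() for the H.264 decoder):

#include <stdint.h>
#include <libavcodec/avcodec.h>

static void feed_chunk(AVCodecParserContext *parser, AVCodecContext *ctx,
                       const uint8_t *data, int size)
{
    uint8_t *out;
    int out_size;

    while (size > 0) {
        int used = av_parser_parse2(parser, ctx, &out, &out_size,
                                    data, size,
                                    AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        if (used < 0)
            break;                    /* parse error */
        data += used;
        size -= used;
        if (out_size > 0) {
            /* 'out' now holds one frame's worth of H.264 byte-stream
             * data, however the input was chunked; hand it to the
             * decoder (or, in the DJI sample, to MediaCodec) here. */
        }
    }
}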
DJI's video transmission does not guarantee that the data in each packet belongs to a single video frame. Usually a packet contains only part of the data needed for one frame, and there is also no guarantee that a packet contains data from one frame rather than two consecutive frames.
Android's MediaCodec needs to be queued with buffers, each holding the full data for a single frame.
This is where av_parser_parse2() comes in. It gathers packets until it can find enough data for a full frame. This frame is then sent to MediaCodec for decoding.
I want to send variable-sized packets between two Linux machines over an internal network. Each packet's length and CRC are indicated in a header, which is sent along with the packet. Something roughly like:
struct hdr {
    uint32_t crc;       /* CRC over the payload            */
    uint32_t dataSize;  /* payload length in bytes         */
    void    *data;      /* payload, sent as dataSize bytes */
};
I'm using a CRC at the application layer to overcome the inherent weakness of the TCP checksum.
The problem I have is that there is a chance the dataSize field itself gets corrupted, in which case I don't know where the next packet starts. At the receiver, when I read the socket buffer, I read n such packets back to back, so dataSize is the only way I can find the start of the next packet.
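To make the failure mode concrete, here is a rough sketch of the receive path; read_exact() and crc32_of() are hypothetical helpers, MAX_DATA_SIZE is an assumed protocol limit, and the header byte order is assumed to match the sender:

#include <stddef.h>
#include <stdint.h>

int      read_exact(int fd, void *p, size_t n);  /* hypothetical: loops over read() */
uint32_t crc32_of(const void *p, size_t n);      /* hypothetical: the sender's CRC  */

#define MAX_DATA_SIZE (64 * 1024)                /* assumed protocol limit          */

/* Returns the payload length, or a negative code on error. */
int read_one_message(int fd, uint8_t *buf)
{
    uint32_t crc, dataSize;

    if (read_exact(fd, &crc, 4) < 0 || read_exact(fd, &dataSize, 4) < 0)
        return -1;                   /* connection problem                   */
    if (dataSize > MAX_DATA_SIZE)
        return -2;                   /* length looks corrupted: the stream   */
                                     /* is now unframed, resync or reconnect */
    if (read_exact(fd, buf, dataSize) < 0)
        return -1;
    if (crc32_of(buf, dataSize) != crc)
        return -3;                   /* payload corrupted                    */
    return (int)dataSize;
}

A sanity limit on dataSize at least catches wildly corrupted lengths before they desynchronize the read, but a plausible-looking corrupted length still slips through, which is exactly the problem described above.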
Some ideas I have are:
Restart the connection if a CRC mismatch occurs.
Aggregate X such packets into one big packet of fixed size and discard the big packet if any CRC error is detected. The fixed-size big packet is there to make sure we lose no more than one big packet's worth of data in case of errors.
Any other ideas for these variable sized packets?
Since TCP is stream-based, a length field is the usual way to extract one full message for processing at the application layer. If you believe that the length field itself can be wrong for some reason, there is not much you can do except discard the packet, "flush" the connection and hope that sender and receiver re-sync. The safest option is to drop the connection, unless there is an application-layer protocol for re-syncing it.
Another method, instead of length fields, is to use markers: Start-of-Message and End-of-Message. When the application encounters Start-of-Message it starts collecting data until the End-of-Message byte is received, and then processes the message. This requires that the message body escapes the marker bytes appropriately.
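A minimal sketch of that marker approach with byte stuffing; the STX/ETX/ESC values are arbitrary choices for illustration:

#include <stddef.h>
#include <stdint.h>

enum { STX = 0x02, ETX = 0x03, ESC = 0x10 };

/* Encode src into dst and return the encoded length.
 * dst must hold up to 2 * len + 2 bytes in the worst case. */
size_t frame_encode(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t o = 0;

    dst[o++] = STX;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == STX || src[i] == ETX || src[i] == ESC)
            dst[o++] = ESC;           /* escape any marker byte in the body */
        dst[o++] = src[i];
    }
    dst[o++] = ETX;
    return o;
}

On the receive side you discard bytes until an unescaped Start-of-Message, collect (removing escapes) until an unescaped End-of-Message, and then run the CRC check; a corrupted length field can no longer throw the framing off permanently.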
I think you are dealing with a second-order error possibility, while the major risk is elsewhere.
When we used serial-line transmission, errors were frequent (one or two every few kilobytes). We used good old Kermit with a CRC and a packet size of about 100 bytes, and that was enough: I saw many transfers fail because the line went down, but never a "successful" transfer that produced a corrupt file.
With current networks, unless your lines are very, very poor, the hardware level is not that bad, and in any case the layer-2 data link already has a checksum to verify that each frame was not modified between two nodes. HDLC is commonly used at that level, and it normally uses a CRC16 or CRC32, which is a very good checksum.
So the checksum at the TCP level is not meant to detect random errors in the byte stream, but is simply a last line of defense against unexpected errors, for example a router going mad after an electrical shock and sending pure garbage. I have no statistics on it, but I am pretty sure the number of errors reaching the TCP level is already very, very low. Put differently, do not worry about it: unless you are dealing with highly sensitive data (in which case I would prefer two different channels, one for the data and one for a global checksum), TCP/IP is enough.
That being said, adding a check at the application level as an ultimate defense is perfectly acceptable. It will only have to handle errors that went undetected at the data link and TCP levels, or, more likely, errors in the peer application (who wrote it, and how was it tested?). So the probability of hitting an error is low enough to use a very crude recovery procedure:
close the connection
open a new one
restart after the last correctly exchanged packet (if that makes sense), or simply continue sending new packets if you can
But the risk of a physical disconnection, or of a power outage anywhere in the network, is much higher, not to mention flaws in the application-level implementations...
And do not forget to fully specify the byte order and the size of the crc and dataSize fields...
I am starting work on a project that transmits G.711 audio over Ethernet, written in C (not C++) and running on Fedora 15. Rather than doing the smart thing and using RTP, I am using plain UDP to transfer the audio data.
To somewhat overcome the problem of packet re-ordering, I have decided to use a struct in the body of each packet that looks a bit like this:
struct payload {
    char cc;              /* continuity counter, 0-15       */
    char audio_data[160]; /* 20 ms of G.711 (8 kHz, 8 bits) */
};
The variable "cc" is a continuity counter that runs from 0-15, and when the packet arrives at the recipient it is put into an array of these structs based upon the value of the cc. The audio output routine then loops through this array and plays the data.
My question is, is this the best way to package the audio? The output array will end up being two-dimensional and surely it will be slow reading through each member and writing that to the output? Is there a way to define a type that is 160 bytes wide that i can just write to the audio interface at the other end?
Any suggestions would be greatly appreciated, as would links to helpful resources on ALSA (which seem to be very rare!)
Josh
Are you worried about cache optimization? I hope you profiled the simpler approach before complicating it. If cache misses are a real problem, I would suggest using a ring (circular) buffer. It can be your jitter buffer too, and it gives you a fixed memory footprint and contiguous memory for faster access.
Since G.711 is a constant-bitrate codec, you can choose the buffer size in time units (e.g. 200 ms for conversation). You always play the last packet in the buffer. For example, the last packet you received has cc = n, then you receive cc = m (> n): you mark all packets between n and m as missing and replace them if they arrive later.
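A rough sketch of that idea, assuming 20 ms G.711 packets (160 bytes) and the 4-bit continuity counter from the question; missing slots are played as u-law silence:

#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 160            /* 20 ms of G.711 at 8 kHz, 8 bits */
#define RING_SLOTS  16             /* one slot per cc value (0-15)    */

struct slot {
    int     present;               /* 0 = missing (lost or late)      */
    uint8_t audio[FRAME_BYTES];
};

static struct slot ring[RING_SLOTS];

/* Network side: store every received packet in its cc slot. */
void jitter_put(uint8_t cc, const uint8_t *audio_data)
{
    struct slot *s = &ring[cc % RING_SLOTS];

    memcpy(s->audio, audio_data, FRAME_BYTES);
    s->present = 1;
}

/* Playout side: called once per 20 ms tick, in cc order.
 * Returns 160 contiguous bytes that can go straight to the ALSA write. */
const uint8_t *jitter_get(uint8_t expected_cc)
{
    static uint8_t silence[FRAME_BYTES];
    struct slot *s = &ring[expected_cc % RING_SLOTS];

    memset(silence, 0xFF, sizeof silence);   /* u-law silence; 0xD5 for A-law */
    if (!s->present)
        return silence;                      /* packet never arrived          */
    s->present = 0;                          /* consume the slot              */
    return s->audio;
}

Because each slot's audio array is already 160 contiguous bytes, there is no need for a special 160-byte-wide type: the pointer returned by jitter_get() can be handed to the ALSA write call as-is.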
I'm building an app in which I create a video.
The problem is, sometimes (well... most of the time) the frame acquisition process isn't quick enough.
What I'm currently doing is skipping the current frame acquisition if I'm running late. However, FFmpeg/libavcodec considers every frame I pass to it as the next frame in line, so if I drop 1 out of 2 frames, a 20-second video will only last 10. More problems appear as soon as I add sound, since sound processing is much faster...
What I'd like is to tell FFmpeg "the last frame should last twice as long as originally intended", or anything else that would let me process in real time.
I tried stacking up the frames at one point, but this ends up exhausting all my memory (I also tried 'stacking' the frames on the hard drive, which was way too slow, as I expected).
I guess I'll have to work with the PTS manually, but all my attempts have failed, and reading the code of other apps that use ffmpeg, such as VLC, wasn't much help... so any advice would be much appreciated!
Thanks a lot in advance!
Your output will probably be considered variable frame rate (VFR), but you can simply generate a timestamp using wallclock time when a frame arrives and apply it to your AVFrame before encoding it. The frame will then be displayed at the correct time on playback.
For an example of how to do this (at least the part about specifying your own timestamp), see doc/examples/muxing.c in the ffmpeg distribution (line 491 in my current git pull):
frame->pts += av_rescale_q(1, video_st->codec->time_base, video_st->time_base);
Here the author is incrementing the frame timestamp by 1 in the video codec's timebase, rescaled to the video stream's timebase. In your case, simply rescale the number of seconds since you started capturing frames from an arbitrary timebase to your output video stream's timebase (as in the example above). For example, if your arbitrary timebase is 1/1000 and you receive a frame 0.25 seconds after you started capturing, do this:
AVRational my_timebase = {1, 1000};
frame->pts = av_rescale_q(250, my_timebase, avstream->time_base);
Then encode the frame as usual.
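Putting the wallclock idea together, here is a hedged sketch of stamping a captured frame from elapsed real time, assuming libavutil's av_gettime() (which returns microseconds) is available:

#include <libavformat/avformat.h>     /* AVStream, AVFrame, AVRational */
#include <libavutil/mathematics.h>    /* av_rescale_q()                */
#include <libavutil/time.h>           /* av_gettime()                  */

static int64_t capture_start_us;      /* set to av_gettime() when capture starts */

/* Stamp a frame with its real capture time, in the output stream's timebase. */
static void stamp_frame(AVFrame *frame, const AVStream *st)
{
    int64_t elapsed_us = av_gettime() - capture_start_us;

    frame->pts = av_rescale_q(elapsed_us, (AVRational){1, 1000000},
                              st->time_base);
}

Dropped frames then simply leave a longer gap between consecutive pts values instead of shortening the video.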
Many (most?) video formats don't permit leaving out frames. Instead, try reusing old video frames when you can't get a fresh one in time.
Just an idea: when it's lagging with the processing, have you tried passing it the same frame again (and dropping the current one)? Maybe it can process the duplicated frame quickly.
There's the ffmpeg command-line switch -threads ... for multicore processing, so you should be able to do something similar with the API (though I have no idea how). This might solve your problem.
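To my knowledge, the API-side equivalent of -threads is the thread_count/thread_type fields on the codec context, set before avcodec_open2(); a minimal sketch:

#include <libavcodec/avcodec.h>

/* 0 lets libavcodec pick the number of threads automatically. */
static void enable_threading(AVCodecContext *codec_ctx)
{
    codec_ctx->thread_count = 0;
    codec_ctx->thread_type  = FF_THREAD_FRAME | FF_THREAD_SLICE;
}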
How do you identify missing UDP frames in a custom Wireshark dissector?
I have written a custom dissector for the CQS feed (reference page). One of our servers sees gaps when receiving this feed. According to Wireshark, some UDP frames are never received. I know that the frames were sent, because all of our other servers are gap-free.
A CQS frame consists of multiple messages, each having its own sequence number. My custom dissector provides the following data to Wireshark:
cqs.frame_gaps - the number of gaps within a UDP frame (always zero)
cqs.frame_first_seq - the first sequence number in a UDP frame
cqs.frame_expected_seq - the first sequence number expected in the next UDP frame
cqs.frame_msg_count - the number of messages in this UDP frame
And I am displaying each of these values in custom columns, as shown in this screenshot:
I tried adding code to my dissector that simply saves the last-processed sequence number (as a local static) and flags a gap when the dissector processes a frame where current_sequence != (previous_sequence + 1). This did not work, because the dissector can be called in random-access order depending on where you click in the GUI: you could process frame 10, then frame 15, then frame 11, etc.
Is there any way for my dissector to know if the frame that came before it (or the frame that follows) is missing?
The dissector is written in C.
(See also a companion post on serverfault.com)
Keep in mind that Wireshark dissects packets multiple times. The first time, it dissects packets in strict order as it loads the file. After that, it calls dissectors again when you scroll the packet tree view or select a packet, in order to build its tree.
You can check whether the current packet has already been visited, i.e. whether this is not the first pass:
if (PINFO_IS_VISITED(pinfo)) { ... };
Your dissector should behave differently on the first pass and on later dissections.
On the first pass you have to store some information for each packet (in a hash table, for example), such as its sequence number and whether it is out of order. You will need it to build the packet tree properly when you are called again later.
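A rough sketch of that two-pass bookkeeping, written against the wmem API of recent Wireshark versions (names and headers have moved around between releases, so treat this as an outline rather than drop-in code):

#include <epan/packet.h>   /* plus the wmem headers for your Wireshark version */

typedef struct {
    guint32  first_seq;    /* first sequence number in this UDP frame          */
    gboolean gap_before;   /* sequence gap between this and the previous frame */
} cqs_frame_state_t;

/* Created once per capture file, e.g. in the protocol's init routine:
 *   frame_states = wmem_map_new(wmem_file_scope(), g_direct_hash, g_direct_equal);
 */
static wmem_map_t *frame_states;
static guint32     next_expected;   /* running state, first pass only */

static void track_cqs_sequence(packet_info *pinfo, guint32 first_seq,
                               guint32 expected_next)
{
    cqs_frame_state_t *st;

    if (!PINFO_FD_VISITED(pinfo)) {            /* first, in-order pass        */
        st = wmem_new0(wmem_file_scope(), cqs_frame_state_t);
        st->first_seq  = first_seq;
        st->gap_before = (next_expected != 0 && first_seq != next_expected);
        next_expected  = expected_next;
        wmem_map_insert(frame_states, GUINT_TO_POINTER(pinfo->num), st);
    } else {                                   /* later, random-access passes */
        st = wmem_map_lookup(frame_states, GUINT_TO_POINTER(pinfo->num));
    }
    /* st (if non-NULL) can now drive a "gap before this frame" field. */
}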
I don't know if you can peek into previous or following frames, but when Wireshark loads a capture it calls your dissector on each frame in order. So you could add a static local variable that is an array or hash table and simply store your values in there. Then your dissector can check that array for previous and following frames and do its analysis.
You should look at the pinfo variable, one of the function arguments, for information such as the frame number, IP addresses, etc.