Silverlight MediaElement requires many H.264 frames to render one image

I am working on a Silverlight application that implements a custom MediaStreamSource to feed a MediaElement with elementary H.264 NAL units. My stream starts with a key frame, but I noticed that on average about 20 frames are consumed before the first image is rendered. My H.264 encoder's GOP size is set to 8. The video comes from a security camera and is viewed in a live-stream application.
The main issue is that this induces a noticeable latency between when events happen in real life and when the image is actually rendered. A small latency is expected, but it turns out to be about 3 seconds from receiving the first frame until the first image is rendered. Shouldn't the first key frame theoretically contain enough information to decode an image? For testing, I have a sample Silverlight application that loads a captured H.264 stream from a file and buffers the entire file internally; this way, once the MediaStreamSource opens, the MediaElement immediately consumes ~20 frames and renders the first image with virtually no latency.
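For illustration, a minimal sketch of such a MediaStreamSource (not the actual application code; the resolution, frame rate and queueing here are placeholder assumptions, with SPS/PPS assumed in-band):

using System.Collections.Generic;
using System.IO;
using System.Windows.Media;

public class H264StreamSource : MediaStreamSource
{
    private MediaStreamDescription _videoDesc;
    private readonly Queue<MemoryStream> _nalQueue = new Queue<MemoryStream>();
    private long _timestampHns;                    // 100-ns units
    private const long FrameDurationHns = 400000;  // 25 fps, an assumption

    protected override void OpenMediaAsync()
    {
        var streamAttrs = new Dictionary<MediaStreamAttributeKeys, string>
        {
            { MediaStreamAttributeKeys.VideoFourCC, "H264" },
            { MediaStreamAttributeKeys.Width, "1280" },   // placeholder size
            { MediaStreamAttributeKeys.Height, "720" },
        };
        _videoDesc = new MediaStreamDescription(MediaStreamType.Video, streamAttrs);

        var sourceAttrs = new Dictionary<MediaSourceAttributesKeys, string>
        {
            { MediaSourceAttributesKeys.Duration, "0" },   // live stream
            { MediaSourceAttributesKeys.CanSeek, "False" },
        };
        ReportOpenMediaCompleted(sourceAttrs, new[] { _videoDesc });
    }

    protected override void GetSampleAsync(MediaStreamType mediaStreamType)
    {
        // The pipeline calls this many times before rendering anything, so a
        // live source must answer as fast as NAL units are available; real
        // code must defer (not block) when the queue is empty.
        MemoryStream nal = _nalQueue.Dequeue();
        var sample = new MediaStreamSample(_videoDesc, nal, 0, nal.Length,
            _timestampHns, new Dictionary<MediaSampleAttributeKeys, string>());
        _timestampHns += FrameDurationHns;
        ReportGetSampleCompleted(sample);
    }

    protected override void SeekAsync(long seekToTime)
    {
        ReportSeekCompleted(seekToTime);
    }

    protected override void CloseMedia() { }
    protected override void GetDiagnosticAsync(MediaStreamSourceDiagnosticKind diagnosticKind) { }
    protected override void SwitchMediaStreamAsync(MediaStreamDescription mediaStreamDescription) { }
}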
Any Microsoft / Silverlight / H.264 experts care to elaborate on why this might be happening?

Related

How do I capture the audio of a wpf window or cscore output in c#?

I made a music player in WPF using cscore. Now I want to add a feature to stream the output in real time (like a radio) to another instance of the music player over the internet. I can work out the streaming part later, but first I need to know how to get the bytes of the audio output. I'm asking for help because I'm lost: my research turned up nothing but ways to stream the desktop audio. That's not a solution, because I want to listen to the same music with some friends while hanging out on Discord, and if I stream the desktop audio they will hear themselves in addition to the music. Any help is welcome. Thanks in advance!
I have not used cscore; I mainly use NAudio, a similar library that facilitates getting audio to and from the sound card. So I will try to answer in a way that lets you find what you are looking for in cscore.
In your player code you will be pulling data from the audio file. In NAudio this is done with an audio file reader (I think cscore calls it a WaveFileReader). The reader translates the audio file into a stream of audio samples in the form of byte arrays, and those byte arrays are then used to feed the WASAPI output so the audio can play on the sound card.
The ideal place to tap in for your streaming system is between those two steps: rather than just passing the audio samples to the sound card, take a copy of the byte array containing the samples. That copy is the data you will stream to your friends.
From there you will need to look at compressing the audio and at streaming protocols like RTP; all of it can be done in C#. The issue will be, as it always is in audio, having your data stream keep pace with the sound card: every time WasapiOut asks for more samples, you need to have them ready, otherwise the audio will be choppy.
I hope this points you in the right direction; others with cscore experience may have code examples to help you more directly.
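To make the tap point concrete, here is a minimal NAudio-flavoured sketch (cscore's type names will differ; EnqueueForStreaming is a hypothetical stand-in for your network code):

using System;
using NAudio.Wave;

// Pass-through provider: hands samples to the sound card unchanged while
// giving a copy of every buffer to the streaming side.
class TeeWaveProvider : IWaveProvider
{
    private readonly IWaveProvider _source;
    private readonly Action<byte[]> _onSamples;

    public TeeWaveProvider(IWaveProvider source, Action<byte[]> onSamples)
    {
        _source = source;
        _onSamples = onSamples;
    }

    public WaveFormat WaveFormat
    {
        get { return _source.WaveFormat; }
    }

    public int Read(byte[] buffer, int offset, int count)
    {
        int read = _source.Read(buffer, offset, count);
        if (read > 0)
        {
            byte[] copy = new byte[read];
            Buffer.BlockCopy(buffer, offset, copy, 0, read);
            _onSamples(copy);            // feed the copy to your streamer
        }
        return read;
    }
}

// Usage:
//   var reader = new AudioFileReader("song.mp3");
//   var tee = new TeeWaveProvider(reader, EnqueueForStreaming); // hypothetical sink
//   var output = new WasapiOut();
//   output.Init(tee);
//   output.Play();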

How to create a video stream from a series of bitmaps and send it over IP network?

I have a bare-metal application running on a tiny 16 bit microcontroller (ST10) with 10BASE-T Ethernet (CS8900) and a Tcp/IP implementation based upon the EasyWeb project.
The application's main job is to control an LED matrix display for public traffic passenger information. It generates display information at about 41 fps, with a configurable display size of e.g. 160 × 32 pixels and 1-bit color depth (each LED is simply either on or off).
There is a tiny web server implemented which provides the current frame buffer content (equal to the LED matrix display content) as PNG or BMP for download (both uncompressed, because of CPU load and the 1-bit color depth). So I can fetch snapshots with e.g.:
wget http://$IP/content.png
or
wget http://$IP/content.bmp
or put appropriate html code into the controller's index.html to view that in a web browser.
I could also write HTML/JavaScript code to update that picture periodically, e.g. every second, so that the user can see changes to the display content.
Now, as the next step, I want to provide the display content as some kind of video stream, and then either put appropriate HTML code into my index.html or just open that "streaming URI" with e.g. VLC.
As my framebuffer bitmaps are built uncompressed, I expect a constant bitrate.
I'm not sure what's the best way to start with this.
(1) Which video format is the easiest to generate if I already have a PNG for each frame (but each PNG exists only for a couple of milliseconds and cannot be buffered for longer)?
Note that my target system is very resource restricted in both memory and computing power.
(2) Which way for distribution over IP?
I already have some TCP sockets listening on port 80. I could stream the video over HTTP (once a request has been received) using chunked transfer encoding (each frame as its own chunk).
(Maybe HTTP Live Streaming works like this?)
I've also read about things like SCTP, RTP and RTSP, but those look like more work to implement on my target, and given the potential firewall drawback, I think I prefer HTTP for transport.
Please note that the application is coded in plain C, without an operating system or powerful libraries; everything is coded from scratch, even the web server and the PNG generation.
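For what it's worth, the chunk framing itself is trivial: each chunk is the payload length in hex, CRLF, the payload bytes, CRLF. A sketch in C# (my target is plain C, but the wire format is the same; grabFrame is a hypothetical frame source). Note that the client never sees the chunk boundaries, so the payload itself must be a format that can be decoded progressively:

using System;
using System.Net.Sockets;
using System.Text;

static class ChunkedStreamer
{
    // One HTTP response that never ends; each video frame is sent as its
    // own chunk: "<size in hex>\r\n<payload>\r\n".
    public static void StreamFrames(NetworkStream net, Func<byte[]> grabFrame)
    {
        byte[] header = Encoding.ASCII.GetBytes(
            "HTTP/1.1 200 OK\r\n" +
            "Content-Type: image/apng\r\n" +    // whatever the payload format is
            "Transfer-Encoding: chunked\r\n\r\n");
        net.Write(header, 0, header.Length);

        byte[] crlf = new byte[] { 13, 10 };
        while (true)
        {
            byte[] frame = grabFrame();
            byte[] size = Encoding.ASCII.GetBytes(frame.Length.ToString("x") + "\r\n");
            net.Write(size, 0, size.Length);
            net.Write(frame, 0, frame.Length);
            net.Write(crlf, 0, 2);
            // a zero-length chunk ("0\r\n\r\n") would end the response
        }
    }
}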
Edit 2017-09-14, tryout with APNG
As suggested by Nominal Animal, I gave APNG a try.
I extended my code to produce the appropriate fcTL and fdAT chunks for each frame and to deliver the resulting bla.apng with HTTP Content-Type image/apng.
After downloading, such a bla.apng looks fine when opened in e.g. Firefox or Chrome (but not in Konqueror, VLC, Dragon Player or Gwenview).
Streaming the APNG works nicely, but only in Firefox; Chrome first wants to download the file completely.
So APNG might be a solution, with the disadvantage that it currently only works in Firefox. After further testing I found that 32-bit versions of Firefox (55.0.2) crash after about 1 h of APNG playback, by which time about 100 MiB of data had been transferred. It looks like they don't discard old / obsolete frames.
Further restriction: since APNG stores a 32-bit "sequence number" in each animation chunk (and each frame needs two of them), there may be a limit on the maximum playback duration. With 2^31 frames available and my frame period of 24 ms, that works out to roughly 2^31 x 0.024 s, i.e. about 600 days, which I can live with.
Note that the APNG MIME type was specified by mozilla.org as image/apng. But in my tests I found it is a bit better supported when my HTTP server delivers the APNG with Content-Type image/png instead. E.g. Chromium and Safari on iOS will then play my APNG files after download (but still not as a stream). Even the Wikipedia server delivers e.g. this beach ball APNG with Content-Type image/png.
Edit 2017-09-17, tryout with animated GIF
As also suggested by Nominal Animal, I now tried animated GIF.
It looks OK in some browsers and viewers after a complete download (of e.g. 100 or 1000 frames).
Live streaming looks OK in Firefox, Chrome, Opera, Rekonq and Safari (on macOS Sierra).
Not working: Safari (on OS X El Capitan and iOS 10.3.1), Konqueror, VLC, Dragon Player, Gwenview.
E.g. Safari (tested on iOS 10.3.3 and OS X El Capitan) first wants to download the GIF completely before displaying / playing it.
Drawback of using GIF: for CPU-usage reasons I don't want to implement data compression for the generated frame pictures. For PNG, I use uncompressed data in the IDAT chunk, and for a 160x32 PNG with 1-bit color depth I get about 740 bytes per frame. But GIF without compression, especially for 1-bit black/white bitmaps, blows up the pixel data by a factor of 3-4 (GIF's LZW codes cost at least 3 bits per pixel for a 1-bit image, versus 1 bit per pixel in the raw PNG scanlines).
First of all, low-level embedded devices are not very friendly with complex modern web browsers; it is a bad idea to "connect" two such worlds. But if you have a spec with these strong requirements...
MJPEG is well known for streaming video, but in your case it is a poor choice: it needs a lot of CPU, gives a bad compression ratio, and hurts image quality. That is the nature of JPEG compression: it is best with photographs (images with many gradients) but bad with pixel art (images with sharp lines).
"It looks like they don't discard old / obsolete frames."
And this is correct behavior, since APNG is not a video format but an animation format, which can be repeated (looped). Exactly the same goes for GIF. MJPEG may do better here, as it is established as a video stream.
If I were doing this project, I would do something like this:
No browser at all. Write a very simple native player with WinAPI or some low-level library that just creates a window, receives UDP packets and displays the binary data (a sketch follows this list). On the controller side you just fill UDP packets and send them to the client. UDP is the better protocol for real-time streaming: it drops packets (frames) under latency and is very simple to handle.
Stream over TCP, but raw data (1 bit per pixel). TCP always introduces some latency and buffering; you can't avoid that. Otherwise the same as before, except you don't need a handshaking mechanism to start the video stream. You could also write your client in old technologies like Flash or Java applets, read a raw socket, and place the app in a web page.
You can try to stream AVI files with raw data over TCP (HTTP). Without indexes it will be unplayable almost everywhere except VLC. A strange solution, but if you can't write client code and want VLC, it will work.
You can run a transcoder on an intermediate server: your controller sends UDP packets to the server, the server transcodes them to H.264 and streams via RTMP to YouTube... Your clients can then play it in browsers or VLC, and the stream will be in good quality at up to a few Mbit/s. But you need a server.
And finally, what I think is the best solution: send the client only text, coordinates, animations and so on - everything your controller renders. With Emscripten you can convert your sources to JS and run the exact same renderer in the browser. As transport you can use WebSockets, or tricks with a long-lived HTML page containing multiple <script> elements, like we did in the old days.
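A sketch of the "no browser" player from the first option, in C#/WPF rather than raw WinAPI to keep it short (the port and the one-frame-per-datagram layout are assumptions):

using System;
using System.Net;
using System.Net.Sockets;
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Threading;

public class RawFramePlayer
{
    private const int W = 160, H = 32;   // the display size from the question

    [STAThread]
    public static void Main()
    {
        var bitmap = new WriteableBitmap(W, H, 96, 96, PixelFormats.BlackWhite, null);
        var window = new Window
        {
            Title = "LED matrix viewer",
            Width = 640,
            Height = 160,
            Content = new System.Windows.Controls.Image { Source = bitmap },
        };

        var udp = new UdpClient(5000);   // port is an assumption
        var sender = new IPEndPoint(IPAddress.Any, 0);

        // Each datagram carries one complete 1-bit frame: 20 bytes per row,
        // 32 rows = 640 bytes. Poll from the UI thread so we may touch the bitmap.
        var timer = new DispatcherTimer { Interval = TimeSpan.FromMilliseconds(10) };
        timer.Tick += delegate
        {
            while (udp.Available > 0)
            {
                byte[] frame = udp.Receive(ref sender);
                if (frame.Length == W * H / 8)
                    bitmap.WritePixels(new Int32Rect(0, 0, W, H), frame, W / 8, 0);
            }
        };
        timer.Start();
        new Application().Run(window);
    }
}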
Please tell me: which country/city has this public traffic passenger information display? It looks very cool. In my city every bus already has an LED panel, but it just shows static text; it's awful that the huge potential of these devices goes unused.
Have you tried just piping this through a websocket and handling the binary data in javascript?
Every websocket frame sent would match a frame of your animation.
You would then take this data and draw it into an HTML canvas. This would work in every browser with WebSocket support - which is quite a lot - and gives you all the flexibility you need (and the player could be more high-end than the "encoder" in the embedded device).
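For prototyping the JavaScript side before the C implementation exists, a mock sender is handy. A sketch in C# (the URL, port and test pattern are assumptions; HttpListener WebSockets require .NET 4.5+ on Windows 8 or later):

using System;
using System.Net;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

class FrameSocketServer
{
    static async Task Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8080/stream/");
        listener.Start();

        while (true)
        {
            HttpListenerContext ctx = await listener.GetContextAsync();
            if (!ctx.Request.IsWebSocketRequest) { ctx.Response.Close(); continue; }

            WebSocket ws = (await ctx.AcceptWebSocketAsync(null)).WebSocket;
            byte[] frame = new byte[160 * 32 / 8];   // one 1-bpp frame per message
            var rng = new Random();

            while (ws.State == WebSocketState.Open)
            {
                rng.NextBytes(frame);                // noise stands in for real content
                await ws.SendAsync(new ArraySegment<byte>(frame),
                    WebSocketMessageType.Binary, true, CancellationToken.None);
                await Task.Delay(24);                // ~41 fps frame period
            }
        }
    }
}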

measure video playback delay on local computer

We need to determine the delay imposed by the hardware and software components when a video is shown on a local computer.
By delay I mean the time spent decoding a frame, sending it over the wire (HDMI) to the monitor, plus the time the monitor takes to show the frame (i.e. display lag).
We are measuring user reactions to visual stimuli in the video and hence need to know the delay caused by the hardware/software toolchain when playing a video. At the end of the day, this delay should be minimized.
We are currently playing the video with .net WPF MediaElement.
The video itself:
codec: H264 - MPEG-4 AVC (part 10)(avc1)
resolution: 720*578
25 FPS
decoded format: Planar 4:2:0 YUV
I'd be especially interested in:
how to minimize the delay
the penalty imposed by the managed environment
how ffmpeg or Microsoft Media Foundation could reduce the delay
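One common way to get a number for the whole chain (an assumed approach, not a tested recipe): overlay a millisecond counter next to the MediaElement, play a video with a timecode burned in at encode time, and photograph the screen; the difference between the two readings approximates the decode-to-display delay. A WPF sketch (stimulus.avi is a placeholder):

using System;
using System.Diagnostics;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Media;

public class LatencyProbeWindow : Window
{
    [STAThread]
    public static void Main()
    {
        new Application().Run(new LatencyProbeWindow());
    }

    public LatencyProbeWindow()
    {
        var clock = new TextBlock { FontSize = 48 };
        var video = new MediaElement
        {
            Source = new Uri("stimulus.avi", UriKind.Relative),  // placeholder
            LoadedBehavior = MediaState.Play,
        };
        var panel = new StackPanel();
        panel.Children.Add(clock);
        panel.Children.Add(video);
        Content = panel;

        var sw = Stopwatch.StartNew();
        // Fires once per WPF composition pass, i.e. at the UI refresh rate;
        // photographing this counter next to the burned-in timecode gives an
        // estimate of the playback pipeline delay.
        CompositionTarget.Rendering += delegate
        {
            clock.Text = sw.ElapsedMilliseconds.ToString();
        };
    }
}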

Timestamp for v4l2 image capture

I have a Linux application that processes camera images. Currently I provide buffers to the v4l2 kernel subsystem that are filled with image data.
However I need to know, as exactly as possible, when this frame was captured (by the camera). With buffers, I may not know precisely when this happened, as I may not be able to process all frames in a timely manner (i.e. I may request an image at a time when it has already been available for a few milliseconds).
What I am looking for is a way to determine (or estimate) the time an image was captured (or the age of it), e.g. by having the kernel record it somehow, or in worst case by not having images streamed to me but rather only sent upon my explicit request.
Environment: UVC web camera, Linux kernel 2.6.3x, V4L2 API
The v4l2_buffer structure has a timestamp field. But see also this question: Where does v4l2_buffer->timestamp value starts counting?

Difference in CPU & memory utilization while using VLC Mozilla plugin and VLC player for playback of RTSP streams

For one of our ongoing projects we are planning to use a multimedia framework like VLC or GStreamer to capture and play back / render H.264-encoded RTSP streams. To that end, we have been observing the performance (CPU and memory utilization) of VLC using two demo applications we built. One demo uses the Mozilla VLC plugin to embed up to four H.264-encoded RTSP streams in a single HTML page, while the other simply invokes the VLC player and plays a single H.264-encoded RTSP stream.
I was surprised to observe the following results (tests were conducted on Ubuntu 11.04):
Demo 1 (Mozilla VLC plugin - 4 parallel streams)
CPU utilization: 16%
Memory utilization: ~61MB
Demo 2 (VLC player - 1 stream)
CPU utilization: 16%
Memory utilization: ~17MB
My question is: why is the CPU utilization no higher for the Mozilla VLC plugin, even though it is decoding four times as many video streams?
I'm also using the VLC Mozilla plugin for my project, and I have a problem with H.264 streams. The only way I could handle such streams was to use --ffmpeg-hw (for VA-API use), which, because of Xlib, works only in the standalone VLC app (the --no-xlib flag in vlcplugin_base.cpp). So I removed that flag and added XInitThreads(), and it works now, BUT far from the performance level you had; besides, the no-xlib flag was there for a reason (removing it might lead to unwanted behavior).
So the main question is HOW you arrived at those results, and whether you could share your configuration flags with me and the rest of us.
The system I'm using has a 4-core CPU and NVIDIA ION graphics. The CPU cores stay at a moderate level, but a fullscreen stream doesn't play smoothly. If the same stream is run in cvlc, it works perfectly. The ffmpeg-hw flag is used in both cases without any warning messages (VA-API successfully returns).
If you have hardware acceleration of some sort, then the CPU only takes care of routing the data.
