FFmpeg filter for selecting video/audio streams (C)

I am trying to create a node (a collection of nodes is fine too), that takes in many streams and an index, and outputs one stream specified by the index. Basically, I want to create a mux node, something like:
Node : Stream ... Number -> Stream
FFmpeg's filter graph API seems to have two filters for doing that: streamselect (for video) and astreamselect (for audio). And for the most part, they seem to do what I want:
[in0][in1][in2]streamselect=inputs=3:map=1[out]
This filter will take in three video streams and output the second one, in1.
You can use a similar filter for audio streams:
[in0][in1]astreamselect=inputs=2:map=0[out]
Which will take in two audio streams and output the first one, in0.
The question is, can I create a filter that takes in a list of both audio and video streams and outputs the stream based only on the stream index? So something like:
[v0][v1][a0][a1][a2]avstreamselect=inputs=5:map=3[out]
Which maps a1 to out?
If it helps I am using the libavfilter C API rather than the command line interface.

While it may not be possible with a single filter [1], it is certainly possible by combining multiple filters: one streamselect or astreamselect for the type of stream you are selecting (video or audio), and a nullsink or anullsink filter for each of the remaining streams.
For example, the hypothetical filter:
[v0][v1][a0][a1]avstreamselect=inputs=4:map=2[out]
which takes in two video streams and two audio streams, and returns the third stream (the first audio stream), can be written as:
[a0][a1]astreamselect=inputs=2:map=0[out];
[v0]nullsink;[v1]nullsink
Here, we run astreamselect over the audio streams and route all of the remaining streams to null sinks. This idea can be generalized to use only nullsink, anullsink, copy, and acopy; for example, we could also have written it with 4 nodes:
[a0]acopy[out];
[a1]anullsink;
[v0]nullsink;
[v1]nullsink
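Since the question mentions the libavfilter C API, here is a small sketch of how such a graph description could be generated programmatically before handing it to avfilter_graph_parse_ptr(). The flat [in0]..[inN-1] labels and the isVideo layout vector are assumptions for illustration, not part of any FFmpeg API:

// Sketch: build a "copy stream k, null-sink the rest" filtergraph string.
// isVideo[i] says whether flat input i is a video stream; the [inN] labels
// are assumed to match the names bound to your buffer/abuffer sources.
#include <string>
#include <vector>

std::string buildSelectGraph(const std::vector<bool>& isVideo, size_t selected)
{
    std::string desc = "[in" + std::to_string(selected) + "]";
    desc += isVideo[selected] ? "copy" : "acopy";
    desc += "[out]";
    for (size_t i = 0; i < isVideo.size(); ++i)
    {
        if (i == selected) continue;
        desc += ";[in" + std::to_string(i) + "]";
        desc += isVideo[i] ? "nullsink" : "anullsink";
    }
    return desc;
}

For instance, buildSelectGraph({true, true, false, false}, 2) produces "[in2]acopy[out];[in0]nullsink;[in1]nullsink;[in3]anullsink", which matches the 4-node graph above up to the label names.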
[1] I still don't know whether it is or not. Feel free to remove this footnote if it actually is possible.

Related

Akka.Net Stream Ordering (Stream Ordering Over the Network)

Does Akka.net Streams preserve input order of elements? If so, does this hold true when working with reactive streams over the network (when using stream refs)?
Current Version of Akka.Streams being used is 1.4.39
Unfortunately, I was unable to find a definitive answer in the Akka.net Documentation.
After further reading I found my answer.
https://getakka.net/articles/streams/basics.html#stream-ordering
In Akka Streams almost all computation stages preserve input order of elements. This means that if inputs {IA1,IA2,...,IAn} "cause" outputs {OA1,OA2,...,OAk} and inputs {IB1,IB2,...,IBm} "cause" outputs {OB1,OB2,...,OBl} and all of IAi happened before all IBi then OAi happens before OBi.
This property is even upheld by asynchronous operations such as SelectAsync; however, an unordered version called SelectAsyncUnordered exists, which does not preserve this ordering.
...

Extract motion vectors from versatile video coding

How do I go about extracting motion vectors into a .txt or .xml file from the VVC VTM reference software? I managed to extract the motion vectors to a text file, but I don't have a proper index indicating which motion vector belongs where. If anyone could guide me on getting a proper index along with the motion vectors, that would be very helpful.
Are you doing it at the encoder side?
If so, I suggest that you move to the decoder side and do this:
Encode the sequence from which you want to extract MVs.
Modify the decoder so that it prints the MV of each coding unit that has one (i.e. is not intra). To do so, you can go to the CABACReader.cpp file, somewhere inside the coding_unit() function, and find the place where the MV is parsed. There, in addition to the parsed MV, you have access to the coordinates of the current CU.
Decode your encoded bitstream with the modified VTM decoder and print what you wanted to be printed.
As in Mosen's answer, I recommend extracting any information (including MVs) from the decoder.
If you just want to extract MVs to a file, you may utilize traverseCU().
VTM's picture class has a CodingStructure class, which traverses all CUs in the picture (even a CTU or CU can be treated as a CodingStructure, so you can use traverseCU() at block level too).
So I suggest that you:
Access the picture class at the decoder side (its name might be different, e.g., m_pcPic in DecLib.cpp); insert your code before/after the loop filters are executed.
Iterate over each CU in the picture by using traverseCU().
Extract MVs from every CU you access, and save that information (MVs, indices, etc.); a rough sketch of this is given below.
Although there might be better ways to answer your question, I hope this answer helps you.
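As an illustration of that decoder-side approach, here is a rough sketch of a dump function that could be called from DecLib around the loop-filter stage. The names used here (cs.traverseCUs(), CU::traversePUs(), pu.mv / pu.refIdx, getHor()/getVer()) follow my reading of recent VTM sources and may well differ in your VTM version, so treat this as a starting point rather than drop-in code:

// Hypothetical helper: dump the MVs of all inter CUs of a decoded picture.
// Adjust member/helper names to match your VTM version.
#include <cstdio>
#include "CommonLib/CodingStructure.h"
#include "CommonLib/UnitTools.h"

static void dumpMotionVectors(const CodingStructure& cs, int poc, FILE* out)
{
  for (const CodingUnit& cu : cs.traverseCUs(cs.area, CH_L))
  {
    if (cu.predMode == MODE_INTRA)
      continue;                                    // intra CUs carry no MVs

    for (const PredictionUnit& pu : CU::traversePUs(cu))
    {
      for (int list = 0; list < 2; list++)         // reference lists L0 and L1
      {
        if (pu.refIdx[list] < 0)
          continue;                                // this list is not used
        fprintf(out, "poc=%d x=%d y=%d w=%d h=%d list=%d ref=%d mv=(%d,%d)\n",
                poc,
                pu.lumaPos().x, pu.lumaPos().y,
                pu.lumaSize().width, pu.lumaSize().height,
                list, (int)pu.refIdx[list],
                pu.mv[list].getHor(), pu.mv[list].getVer());
      }
    }
  }
}

The printed x/y/w/h values give you the missing index: they locate each MV within the picture, and poc identifies the picture itself.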

How can I get current microphone input level with C WinAPI?

Using the Windows API, I want to implement something like the microphone level meter in the Windows Settings, i.e. get the current microphone input level.
I am not allowed to use external audio libraries, but I can use Windows libraries. So I tried using waveIn functions, but I do not know how to process audio input data in real time.
This is the method I am currently using:
Record for 100 milliseconds
Select highest value from the recorded data buffer
Repeat forever
But I think this is way too hacky, and not a recommended way. How can I do this properly?
Having built a tuning wizard for a very dated, but well known, A/V conferencing application, I can say that what you describe is nearly identical to what I did.
A few considerations:
Enqueue 5 to 10 of those 100 ms buffers into the audio device via waveInAddBuffer. IIRC, when the waveIn queue goes empty, weird things happen. Then, as the waveInProc callbacks occur, search the completed buffer for the sample with the highest absolute value, as you describe. Plot that onto your visualization and requeue the completed buffer.
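For reference, here is a bare-bones sketch of that buffering scheme. It assumes 16-bit mono PCM at 44.1 kHz and, to keep it self-contained, polls WHDR_DONE instead of using a waveInProc callback; buffer counts and sizes are arbitrary, and error handling is omitted:

// Keep several ~100 ms buffers queued, take the peak of each completed one,
// then requeue it. Link against winmm.
#include <windows.h>
#include <mmsystem.h>
#include <cstdlib>
#pragma comment(lib, "winmm.lib")

int main()
{
    WAVEFORMATEX wfx = { WAVE_FORMAT_PCM, 1, 44100, 44100 * 2, 2, 16, 0 };
    HWAVEIN hIn = NULL;
    waveInOpen(&hIn, WAVE_MAPPER, &wfx, 0, 0, CALLBACK_NULL);

    const int kBuffers = 8;
    static short data[kBuffers][4410];            // ~100 ms of mono samples each
    static WAVEHDR hdrs[kBuffers] = {};
    for (int i = 0; i < kBuffers; ++i)
    {
        hdrs[i].lpData = (LPSTR)data[i];
        hdrs[i].dwBufferLength = sizeof(data[i]);
        waveInPrepareHeader(hIn, &hdrs[i], sizeof(WAVEHDR));
        waveInAddBuffer(hIn, &hdrs[i], sizeof(WAVEHDR));
    }
    waveInStart(hIn);

    for (;;)                                      // "repeat forever"
    {
        for (int i = 0; i < kBuffers; ++i)
        {
            if (!(hdrs[i].dwFlags & WHDR_DONE))
                continue;                         // buffer still recording
            int samples = (int)(hdrs[i].dwBytesRecorded / sizeof(short));
            int peak = 0;
            for (int s = 0; s < samples; ++s)
                peak = max(peak, abs((int)data[i][s]));
            // ...map peak (0..32768) onto the meter here...
            hdrs[i].dwFlags &= ~WHDR_DONE;
            waveInAddBuffer(hIn, &hdrs[i], sizeof(WAVEHDR));   // requeue
        }
        Sleep(10);
    }
}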
It might seem obvious to map the sample value linearly onto your visualization. For example, to plot a 16-bit sample:
// convert sample magnitude from 0..32768 to 0..N
length = (sample * N) / 32768;
DrawLine(length);
But then when you speak into the microphone, that visualization won't seem as "active" or "vibrant".
A better approach is to give more weight to the lower-energy samples. An easy way to do this is to remap along a μ-law-style (logarithmic) curve, or use a table lookup:
// remap linearly to 0..N, then compress with a log curve
length = (sample * N) / 32768;
length = (N * log(1.0 + length)) / log(1.0 + N);
length = min(length, N);   // clamp to the meter length
DrawLine(length);
You can tweak the above approach to whatever looks good.
Instead of computing the values yourself, you can rely on values from Windows. These are actually the values displayed in your screenshot from the Windows Settings.
See the following sample for the IAudioMeterInformation interface:
https://learn.microsoft.com/en-us/windows/win32/coreaudio/peak-meters.
The sample is written for playback, but you can use it for capture as well.
One remark: if you open an IAudioMeterInformation for a microphone but no application has opened a stream on that microphone, the level will always be 0.
This means that to display your microphone peak meter, you still need to open a microphone stream, as you already did.
Also read the documentation for IAudioMeterInformation; it may not be exactly what you need, since it reports the peak value. It depends on what you want to do with it.
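For completeness, a minimal C++ sketch of that approach against the default capture endpoint (remember the caveat above: the meter stays at 0 unless some stream is open on the device); error handling is omitted:

// Poll the capture peak level via Core Audio's IAudioMeterInformation.
#include <windows.h>
#include <mmdeviceapi.h>
#include <endpointvolume.h>
#include <cstdio>
#pragma comment(lib, "ole32.lib")

int main()
{
    CoInitialize(NULL);

    IMMDeviceEnumerator* enumerator = NULL;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    IMMDevice* mic = NULL;
    enumerator->GetDefaultAudioEndpoint(eCapture, eConsole, &mic);

    IAudioMeterInformation* meter = NULL;
    mic->Activate(__uuidof(IAudioMeterInformation), CLSCTX_ALL, NULL,
                  (void**)&meter);

    for (int i = 0; i < 100; ++i)          // poll for ~10 seconds
    {
        float peak = 0.0f;                 // normalized 0.0 .. 1.0
        meter->GetPeakValue(&peak);
        printf("peak: %.3f\n", peak);
        Sleep(100);
    }

    meter->Release();
    mic->Release();
    enumerator->Release();
    CoUninitialize();
}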

0xC00D4A44 MF_E_SINK_NO_SAMPLES_PROCESSED with MPEG 4 sink

I am running out of ideas on why I am getting this HRESULT.
I have a pipeline in Media Foundation. A file is loaded through the source resolver. I am using the media session.
Here is my general pipeline:
Source Reader -> Decoder -> Color Converter (to RGB24) -> Custom MFT -> Color Converter (To YUY2) -> H264 Encoder -> Mpeg 4 Sink
In my custom MFT I do some editing to the frames. One of the tasks of the MFT is to filter samples and drop the undesired ones.
This pipeline is used to trim video and output an MP4 file.
For example if the user wants to trim 3 seconds from the 10 second marker, my MFT will read the uncompressed sample time and discard it by asking for more samples. If a sample is in range, it will be passed to the next color converter. My MFT handles frames in RGB24, hence the reason for the initial color converter. The second color converter transforms the color space for the H264 encoder. I am using the High Profile Level 4.1 encoder.
The pipeline gets setup properly. All of the frames get passed to the sink and I have a wrapper for the MPEG4 sink. I see that the BeginFinalize and EndFinalize gets called.
However, on some of my trim operations, EndFinalize will spit out MF_E_SINK_NO_SAMPLES_PROCESSED. It seems random, but it usually happens when a range not close to the beginning is selected.
It might be due to sample times. I am rebasing the sample times and duration.
For example, if the adjusted frame duration is 50ms (selected by user), I will grab the first acceptable sample (let's say 1500ms) and rebase it to 0. The next one will be 1550ms in my MFT and then set to 50ms and so on. So frame times are set in 50ms increments.
Is this approach correct? Could it be that the sink is not receiving enough samples to write the headers and finalize the file?
As mentioned, it works in some cases but fails in most. I am running my code on Windows 10.
I tried to implement the same task using IMFMediaSession/IMFTopology, but had the same problems you faced. I think that IMFMediaSession either modifies the timestamps outside your MFT, or expects them not to be modified by your MFT.
So in order to make this work, I took the IMFSourceReader->IMFSinkWriter approach.
This way I could modify the timestamps of the samples read from the reader and pass to the writer only those that fall into the given range.
Furthermore, you can take a look at the old MFCopy example. It does exactly the kind of file trimming you described. You can download it from here: https://sourceforge.net/projects/mfnode/
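To make the reader/writer idea concrete, here is a stripped-down sketch of such a trim loop for a single video stream. The reader, writer, output stream index, and trim range are assumed to be set up elsewhere; stream/media-type configuration and error handling are omitted, and times are in 100 ns units:

// Trim [startTime, endTime) with IMFSourceReader -> IMFSinkWriter,
// rebasing sample times so the output starts at 0.
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>

HRESULT TrimLoop(IMFSourceReader* reader, IMFSinkWriter* writer,
                 DWORD outStreamIndex, LONGLONG startTime, LONGLONG endTime)
{
    writer->BeginWriting();
    for (;;)
    {
        DWORD streamIndex = 0, flags = 0;
        LONGLONG timestamp = 0;
        IMFSample* sample = NULL;
        HRESULT hr = reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                        0, &streamIndex, &flags, &timestamp,
                                        &sample);
        if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM))
            break;
        if (!sample)
            continue;                                   // no data this call

        if (timestamp >= startTime && timestamp < endTime)
        {
            sample->SetSampleTime(timestamp - startTime);   // rebase to 0
            writer->WriteSample(outStreamIndex, sample);
        }
        sample->Release();
        if (timestamp >= endTime)
            break;                                      // past the range, stop
    }
    return writer->Finalize();
}

Durations can be carried over or adjusted with IMFSample::SetSampleDuration in the same place if needed.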

.obj file format - alternates between different data types

I'm writing a method to parse the data in wavefront obj files and I understand the format for the most part, however some things are still a bit confusing to me. For instance, I would have expected most files to list all the vertices first, followed by the texture and normal map coordinates and then the face indices. However, some files that I have opened alternate between these different sections. For instance, one .obj file I have of the Venus de Milo (obtained here: http://graphics.im.ntu.edu.tw/~robin/courses/cg03/model/ ) starts off with the vertices (v), then does normal coordinates (vn), then faces (f), then defines more vertices, normals and faces again. Why is the file broken up into two sections like this? Why not list all the vertices up front? Is this meant to signify that there are multiple segments to the mesh? If so, how do I deal with this?
Because this is how the file format was designed. There is no requirement for a specific ordering of the data inside the OBJ, so each modelling package writes it in its own way. Here is one brief summary of the file format, if you haven't read this one yet.
That said, the OBJ format is quite outdated and doesn't support animation by default. It is useful for exchanging static meshes between modelling tools, but not much else. If you need a more robust and modern file format, I'd suggest taking a look at the Collada or FBX formats.
Not a direct answer, but it would be unreadable as a comment.
I do not use this file format, but mesh segmentation is usually done for these reasons:
easier management of the model for editing
separation of parts of the model with different material or texture properties
mainly to speed up rendering by cutting down unnecessary material or texture switching
if the mesh has dynamically moving parts, then they must be separated
Most 3D mesh file formats also contain a transform matrix for each mesh part, and some even a skeleton hierarchy.
Now how to handle segmented meshes:
if your engine supports only unsegmented models, then merge all parts together
This will lose all the advantages of a segmented mesh. Do not forget to apply the transform matrices of the sub-segments before merging.
or implement mesh segmentation in your model class
by adding a model hierarchy, transform matrices, ...
Now how to handle a mixed model file format:
scan the file for all necessary chunks of data
remember whether they are present
also store their size and start address in the file
and do not forget that there may be more than one chunk of the same data type
preallocate space for all the data you need
load/merge all the data you need
load the chunks of data into your model classes, or merge them into a single model
of course, check that all needed data is present, e.g. that the number of points matches the number of normals or texture coordinates ... (a small parsing sketch follows below)
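To illustrate the "chunks can come in any order" point from both answers: a minimal OBJ reader can simply keep appending v/vn/vt records to global arrays wherever they appear, because face indices refer to those global 1-based positions (negative indices are relative to the current end of the array). A rough sketch, ignoring materials, groups, and polygon triangulation:

// Minimal OBJ reader: v/vt/vn lines may be interleaved with f lines in any
// order; f indices are 1-based into the global arrays, negative = relative.
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <array>

struct ObjData {
    std::vector<std::array<float, 3>> positions, normals;
    std::vector<std::array<float, 2>> texcoords;
    std::vector<std::array<int, 3>>   corners;   // {v, vt, vn} per face corner, -1 = missing
};

static int resolve(int idx, size_t count)        // 1-based or negative (relative) index
{
    return idx > 0 ? idx - 1 : (int)count + idx;
}

ObjData loadObj(const std::string& path)
{
    ObjData obj;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string tag;
        ls >> tag;
        if (tag == "v") {
            std::array<float, 3> p{}; ls >> p[0] >> p[1] >> p[2];
            obj.positions.push_back(p);
        } else if (tag == "vn") {
            std::array<float, 3> n{}; ls >> n[0] >> n[1] >> n[2];
            obj.normals.push_back(n);
        } else if (tag == "vt") {
            std::array<float, 2> t{}; ls >> t[0] >> t[1];
            obj.texcoords.push_back(t);
        } else if (tag == "f") {
            std::string c;                       // "v", "v/vt", "v//vn" or "v/vt/vn"
            while (ls >> c) {
                int v = 0, vt = 0, vn = 0;
                if (sscanf(c.c_str(), "%d/%d/%d", &v, &vt, &vn) == 3 ||
                    sscanf(c.c_str(), "%d//%d", &v, &vn) == 2 ||
                    sscanf(c.c_str(), "%d/%d", &v, &vt) == 2 ||
                    sscanf(c.c_str(), "%d", &v) == 1)
                {
                    obj.corners.push_back({
                        resolve(v, obj.positions.size()),
                        vt ? resolve(vt, obj.texcoords.size()) : -1,
                        vn ? resolve(vn, obj.normals.size())  : -1 });
                }
            }
        }
        // other tags (o, g, usemtl, s, ...) mark the segments discussed above
        // and can be used to split the data into parts instead of merging it.
    }
    return obj;
}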
