How do I detect whether the sample supplied by VideoSink.OnSample() is right-side up? - silverlight

We're currently using the Silverlight VideoSink to capture video from users' local webcams, kinda like so:
protected override void OnSample(long sampleTime, long frameDuration, byte[] sampleData)
{
if (FrameShouldBeSubmitted())
{
byte[] resampledData = ResizeFrame(sampleData);
mediaController.SetVideoFrame(resampledData);
}
}
Now, on most of the machines that we've tested, the video sample provided in the byte[] sampleData parameter is upside-down, i.e., if you try to take the RGBA data and turn it into, say, a WriteableBitmap, the bitmap will be upside-down. That's odd, but fairly easy to correct, of course -- you just have to reverse the array as you encode it.
The problem is that at least on some machines (e.g., the single Macintosh in our test environment), the video sample provided is no longer upside-down, but right-side up, and hence, flipping the image actually results in an image that's received upside-down on the far side.
I reported this to MS as a bug, but their (terse) response was that it was "As Designed". Further attempts at clarification have so far been ignored.
Now, I'll grant that it's kinda entertaining to imagine the discussions behind this design decision: "OK, just to make it interesting, let's play the video rightside up on a Mac, but let's turn it upside down for Windows!" "Great idea!" "Yeah, that'll keep those developers guessing!" But beyond that, I can't find this, umm, "feature" documented anywhere, nor can I find any documentation on how one is supposed to be able to tell that a given video sample is upside down or rightside up. Any thoughts on how to tell this?
EDIT 3/29/10 4:50 pm - I got a response from MS which said that the appropriate way to tell was through the Stride property on the VideoFormat object, i.e., if the stride value is negative, the image will be upside-down. However, my own testing indicates that unless I'm doing something wrong, this isn't the case. At least on my own machine, whether the stride value is zero or negative (the only options I see), the sampled image is still upside-down.

I was going to suggest looking at VideoFormat.Stride provided at VideoSink.OnFormatChange but then I noticed your edit. I went ahead and tested it at my dev machine, image is upside down and stride is negative as expected. Have you checked again recently?
Even though stride made perfect sense for native applications (using stride at pointer operations), I agree that current behavior is not what you expect from a modern API. However performance wise, it is better not to make changes on data received from native API.
Yet at this point, while we are talking about performance, why not provide samples in formats other than PixelFormatType.Format32bppArgb so that we can avoid color space conversion? BTW, there is a VideoCaptureDevice.DesiredFormat property which only works for resolution as there is no alternative to PixelFormatType.Format32bppArgb.

Related

How can I get current microphone input level with C WinAPI?

Using Windows API, I want to implement something like following:
i.e. Getting current microphone input level.
I am not allowed to use external audio libraries, but I can use Windows libraries. So I tried using waveIn functions, but I do not know how to process audio input data in real time.
This is the method I am currently using:
Record for 100 milliseconds
Select highest value from the recorded data buffer
Repeat forever
But I think this is way too hacky, and not a recommended way. How can I do this properly?
Having built a tuning wizard for a very dated, but well known, A/V conferencing applicaiton, what you describe is nearly identical to what I did.
A few considerations:
Enqueue 5 to 10 of those 100ms buffers into the audio device via waveInAddBuffer. IIRC, when the waveIn queue goes empty, weird things happen. Then as the waveInProc callbacks occurs, search for the sample with the highest absolute value in the completed buffer as you describe. Then plot that onto your visualization. Requeue the completed buffers.
It might seem obvious to map the sample value as follows onto your visualization linearly.
For example, to plot a 16-bit sample
// convert sample magnitude from 0..32768 to 0..N
length = (sample * N) / 32768;
DrawLine(length);
But then when you speak into the microphone, that visualization won't seem as "active" or "vibrant".
But a better approach would be to give more strength to those lower energy samples. Easy way to do this is to replot along the μ-law curve (or use a table lookup).
length = (sample * N) / 32768;
length = log(1+length)/log(N);
length = max(length,N)
DrawLine(length);
You can tweak the above approach to whatever looks good.
Instead of computing the values yourself, you can rely on values from Windows. This is actually the values displayed in your screenshot from the Windows Settings.
See the following sample for the IAudioMeterInformation interface:
https://learn.microsoft.com/en-us/windows/win32/coreaudio/peak-meters.
It is made for the playback but you can use it for capture also.
Some remarks, if you open the IAudioMeterInformation for a microphone but no application opened a stream from this microphone, then the level will be 0.
It means that while you want to display your microphone peak meter, you will need to open a microphone stream, like you already did.
Also read the documentation about IAudioMeterInformation it may not be what you need as it is the peak value. It depends on what you want to do with it.

changing HM reference software to display some information about the bitstream

I am very new to the HM HEVC (and the JEM) reference software, and I am currently trying to understand the source code. I want to add some lines to display for each component: name of Algo (i.e. inter/intra Algos) + length of the bitstream+ position in output bin file.
To know which component cost more bits to code and how codec is working. I want to do same thing for the JEM also after that.
my problem first is that I am unable of understanding a lot of function there, the comment is not sufficient, so is there any references to understand the code??!! (I already read the Manuel ,doesn’t help).
2nd I don’t know where & how exactly to add these lines; is it in TEncGOP, TEncSlice or TEncCU. Ps: I don’t think in TEncGOP.compressGOP so maybe in the 2 other classes.
(I put the answer to comment that #Mourad put four hours ago here, becuase it will be long)
I assume that you could manage to find where the actual encoding after the RDO loop is implemented. As you correctly mentioned, xEncodeCU is the function you need to refer to make sure you are not still in the RDO.
Now you need to find the exact function in xEncodeCU that is responsible for your target codec tool.
For instance, if you want to count the number of bits for coefficient coding, you should be looking into the m_pcEntropyCoder->encodeCoeff() (It's a JEM function and may have different name in the HM). Once you find this line in the xEncodeCU, you may do this and get the number of bits written inside encodeCoeff() function:
UInt b_before = m_pcEntropyCoder->getNumberOfWrittenBits();
m_pcEntropyCoder->encodeCoeff( ... );
UInt b_after = m_pcEntropyCoder->getNumberOfWrittenBits();
UInt writtenBitsCoeff = b_after - b_before;
One important point: as you cas see, the function getNumberOfWrittenBits() gives you integer rates, which is obtained by rounding sum of fractional rates corresponding to all syntax elements coded inside the function encodeCoeff. This error might or might not be acceptable, depending on your problem. For example, if instead of coefficient coding rate, you wanted to know the rate of CBF, then this error would not be acceptable at all. Because, CBF rate is mostly less than one bit. If this is your case, then you would need to calculate the fractional bits one-by-one. It would be totally different and relatively more complicated than this.
Point 1: There is one rule of tumb that logging coding decisions (e.g. pred mode, MV, IPM, block size) is much easier at the decoder side than encoder. This is because of the fact that you have super complicated RDO process at the encoder side that can easily make you get lost in the loops. But at the decoder side, everything appears only once. However, if you insist on doing it at the encoder side, you may find some tips here: Get some information from HEVC reference software
Point 2: Unlike coding decisions, logging rate (i.e. number of written bits for different syntax elements) is more complicated at the decoder side than encoder. This is particularly true for fractional bits associated to anything that is encoded in non-EP mode (i.e. with CABAC contexts). So you may do this part at the ecoder side. But I am afraid it is not easy.
Point 3: I think the best way to understand the code is to read it line-by-line. It's very time-consuming but if you theoritically know the standard(s), you will probably be able to distiguish important parts and ignore the rest.
PS: I think there are too many questions, mostly too general, in your post. It makes it a bit difficult for me to answer them all together. So you I'll wait for you to take your next step and ask more precise questions.

Making OpenGL polygonOffset and DirectX9 depthBias behave the same

For our multiplatform engine that supports both OpenGL and DirectX9 I am adding support for decals. In OpenGL I can set glPolygonOffset(-1.0f, -1.0f) to fix z-fighting between the wall and the decals. I want the DirectX version to behave exactly the same, so I call this:
float offsetFloat = -1.0f;
DWORD offsetDWord = *((DWORD*)&offsetFloat);
device->SetRenderState(D3DRS_DEPTHBIAS, offsetDWord);
device->SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, offsetDWord);
However, this gives me an extremely large depth bias. It seems I need to use extremely small values in DirectX9. However, I can't seem to find how small.
I noticed that in the OGRE engine's source they're dividing by 250000, but despite the comment I don't quite see where that number comes from. Also, they only divide the constant by that for some reason?
// D3D also expresses the constant bias as an absolute value, rather than
// relative to minimum depth unit, so scale to fit
constantBias = -constantBias / 250000.0f;
__SetRenderState(D3DRS_DEPTHBIAS, FLOAT2DWORD(constantBias));
slopeScaleBias = -slopeScaleBias;
__SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, FLOAT2DWORD(slopeScaleBias));
So my question: what do I need to pass to DirectX9 to get the exact same result as glPolygonOffset?
I haven't found an exact number anywhere, but by experimenting I have figured out that to get roughly the same effect in OpenGL and DirectX, I need to divide by 3500000, instead of the 250000 mentioned above.
If anyone knows the exact number or why it's this, I'd love to hear that, but for practical purposes I think this conclusion will do for me.
If you look at the equations for OpenGL polygon offset and the equivalent Direct3D 9 renderstates, you'll find that they're identical, other than OpenGL has a term for an implementation-dependent constant value. If we assume that value is 1, the equations do become identical.
The obvious problem here is: what happens when the value is not 1?
Unfortunately OpenGL doesn't seem to provide a way of querying it, so the only thing you can do is twiddle parameters until you find something that works, then twiddle them some more as edge cases arise, and never assume that what works for you will work for anyone else.
Direct3D's assumption that the value is always 1 may not necessarily hold good all the time either.
Bottom line is that polygon offset is not a 100% robust method for fixing z-fighting under any circumstances. Have you tried other methods, such as pushing your decals out an epsilon along the surface normal?

Measure difference between two files

I have a question that's very specific, yet very general at the same time. (Also, I don't know if this is quite the right site for this.)
The Scenario
Let's say I have an uncompressed video vid.avi. It is then run through [Some compression algorithm], which is lossy. I want to compare vid.avi and the new, compressed file to determine just how much data was lost in the compression. How can I compare the files and how can I measure the difference between the two, using the original as the reference point? Is it possible at all? I would prefer a generic answer that will work with any language, but I would also gladly accept an answer that's specific to a language.
EDIT: Let me be more specific. I want something that compares two video files in a similar way that the Notepad++ Compare plugin compares text files. I just want to find out how close each individual pixel's colour is to the original file's colour for that pixel.
Thanks in advance, and thank you for taking the time to read this question.
It is generally the change in video quality that people want to measure when comparing compression methods, rather than a loss of data.
If you did want to measure somehow the data loss, you would have to define what you mean by 'data' and how you wanted to measure it. Video compression is quite complex and the approach may even differ frame by frame within a video. Data could mean the colour depth for each pixel, the number of frames per second, whether a frame is encoded based on a delay to other frames etc.
Video quality is subjective so the reduction in quality after compression will not be an absolute value. The usual way to measure the quality is similar to the technique used for audio - Mean Opinion Score: https://en.wikipedia.org/wiki/Mean_opinion_score. Its essentially uses a well defined process to try to apply some objectivity to a test audiences subjective experience.

How could we get a variable value from GLSL?

I'm doing a project with a lot of calculation and i got an idea is throw pieces of work to GPU, but i wonder whether could we retrieve results from GLSL, if it is posible, how?
GLSL does not provide outputs besides what is placed in the frame buffer.
To program a GPU and get results more conveniently, use CUDA (NVidia only) or OpenCL (cross-platform).
In general, what you want to do is use OpenCL for general-purpose GPU tasks. However, if you are insistent about pretending that OpenGL is not a rendering API...
Framebuffer Objects make it relatively easy to render to multiple outputs. This of course means that you have to structure your processing such that what gets rendered matches what you want. You can render to 32-bit floating-point "images", so you have access to plenty of precision. The biggest difficulty is what I stated: figuring out how to structure your task to match rendering.
It's a bit easier when using transform feedback. This is the ability to write the output of the vertex (or geometry) shader processing to a buffer object. This still requires structuring your tasks into something like rendering, but it's easier because vertex shaders have a strict one-vertex-to-one-vertex mapping. For every input vertex, there is exactly one output. And if you draw GL_POINTS, it's not too difficult to use attributes to pass the data that changes.
Both easier and harder is the use of shader_image_load_store. This is effectively the ability to read/write from/to arbitrary images "whenever you want". I put that last part in quotes because there are lots of esoteric rules about data race conditions: reading from a value written by another shader invocation and so forth. These are not trivial to deal with. You can try to structure your code to avoid them, by not writing to the same image location in the same shader. But in many cases, if you could do that, you could just render to the framebuffer.
Ultimately, it's pretty much impossible to answer this question in the general case, without knowing what exactly you're trying to actually do. How you approach GPGPU through a rendering API depends greatly on exactly what you're trying to compute.

Resources