Set webcam grabbing resolution with ImageIO in Python - python-imageio

Using simple Windows/Python code to read from a webcam:
import imageio as iio
camera = iio.get_reader("<video0>")
screenshot = camera.get_data(0)
camera.close()
I'm getting a default resolution of 1980x1920. The webcam has other, larger resolutions available. How do I set that up?
Also: how do I set the exposure time? The image comes out pretty dark.
Thanks

You can set the resolution via the size kwarg, e.g. size=(1280, 720)
My webcam is my third device and the resolution defaults to 640x360 but has 1280x720 available, so I would do something like:
import imageio.v3 as iio
frame = iio.imread("<video2>", index=0, size=(1280, 720))
On a tangent, I'd also suggest switching to the easier iio.imiter for stream reading. It tends to produce cleaner code than the old iio.get_reader syntax.
import imageio.v3 as iio

for idx, frame in enumerate(iio.imiter("<video0>", size=(1980, 1920))):
    ...  # do something with the frame
    if idx == 9:
        # read 10 frames
        break
Response to your edit:
Setting webcam exposure is a question that actually hasn't come up yet. Webcams typically feature automatic brightness adjustment, but that might take a few frames depending on the webcam's quality.
Manual adjustment might already be possible and I just don't know about it (never looked into it). This is a separate question though and is probably better tracked as a new issue over at the ImageIO repo.
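In the meantime, a workaround that follows from the auto-adjustment behaviour described above is to read and discard the first few frames so the camera's automatic brightness has time to settle, and keep a later frame. This won't give you manual exposure control, but it often helps with dark first frames. A rough sketch (the frame count and size are arbitrary example values, not imageio requirements):

import imageio.v3 as iio

# read and discard the first few frames so the camera's auto-exposure can settle
for idx, frame in enumerate(iio.imiter("<video0>", size=(1280, 720))):
    if idx == 9:
        break

# `frame` now holds the 10th frame, captured after the brightness has (hopefully) adjusted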

Related

How can I get current microphone input level with C WinAPI?

Using the Windows API, I want to implement something like the following, i.e. getting the current microphone input level.
I am not allowed to use external audio libraries, but I can use Windows libraries. So I tried using waveIn functions, but I do not know how to process audio input data in real time.
This is the method I am currently using:
Record for 100 milliseconds
Select highest value from the recorded data buffer
Repeat forever
But I think this is way too hacky, and not a recommended way. How can I do this properly?
Having built a tuning wizard for a very dated, but well known, A/V conferencing application, I can say that what you describe is nearly identical to what I did.
A few considerations:
Enqueue 5 to 10 of those 100 ms buffers into the audio device via waveInAddBuffer. IIRC, when the waveIn queue goes empty, weird things happen. Then, as the waveInProc callbacks occur, search for the sample with the highest absolute value in the completed buffer, as you describe (a small sketch of this peak search appears after these notes). Then plot that onto your visualization. Requeue the completed buffers.
It might seem obvious to map the sample value onto your visualization linearly. For example, to plot a 16-bit sample:
// convert sample magnitude from 0..32768 to 0..N
length = (sample * N) / 32768;
DrawLine(length);
But then, when you speak into the microphone, that visualization won't seem as "active" or "vibrant" as you might expect.
A better approach is to give more weight to the lower-energy samples. An easy way to do this is to re-plot along the μ-law curve (or use a table lookup):
length = (sample * N) / 32768;              // linear 0..N
length = N * log(1 + length) / log(1 + N);  // compress onto a log curve, still 0..N
length = min(length, N);                    // clamp
DrawLine(length);
You can tweak the above approach to whatever looks good.
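To make the mapping concrete, here is a small Python sketch of the same peak search and log-curve mapping described above. The buffer format (signed 16-bit PCM) and the meter length N are assumptions on my part:

import array
import math

N = 100  # meter length in pixels (example value)

def buffer_peak(pcm_bytes):
    # largest absolute sample value in a buffer of signed 16-bit PCM
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    return max((abs(s) for s in samples), default=0)

def meter_length(peak):
    # map the 0..32768 peak onto 0..N along a log curve to favor quieter samples
    linear = peak * N / 32768.0
    compressed = N * math.log(1 + linear) / math.log(1 + N)
    return min(int(compressed), N)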
Instead of computing the values yourself, you can rely on values from Windows. These are actually the values displayed in your screenshot from the Windows Settings.
See the following sample for the IAudioMeterInformation interface:
https://learn.microsoft.com/en-us/windows/win32/coreaudio/peak-meters.
It is written for playback, but you can use it for capture as well.
One remark: if you open IAudioMeterInformation for a microphone but no application has opened a stream from that microphone, the level will be 0.
This means that, for as long as you want to display your microphone peak meter, you will need to keep a microphone stream open, as you already do.
Also read the documentation for IAudioMeterInformation; it reports the peak value, which may or may not be what you need, depending on what you want to do with it.

Measure difference between two files

I have a question that's very specific, yet very general at the same time. (Also, I don't know if this is quite the right site for this.)
The Scenario
Let's say I have an uncompressed video vid.avi. It is then run through [Some compression algorithm], which is lossy. I want to compare vid.avi and the new, compressed file to determine just how much data was lost in the compression. How can I compare the files and how can I measure the difference between the two, using the original as the reference point? Is it possible at all? I would prefer a generic answer that will work with any language, but I would also gladly accept an answer that's specific to a language.
EDIT: Let me be more specific. I want something that compares two video files in a similar way that the Notepad++ Compare plugin compares text files. I just want to find out how close each individual pixel's colour is to the original file's colour for that pixel.
Thanks in advance, and thank you for taking the time to read this question.
It is generally the change in video quality that people want to measure when comparing compression methods, rather than a loss of data.
If you did want to measure the data loss somehow, you would have to define what you mean by 'data' and how you want to measure it. Video compression is quite complex, and the approach may even differ frame by frame within a video. Data could mean the colour depth of each pixel, the number of frames per second, whether a frame is encoded as a delta relative to other frames, etc.
Video quality is subjective, so the reduction in quality after compression will not be an absolute value. The usual way to measure the quality is similar to the technique used for audio, the Mean Opinion Score: https://en.wikipedia.org/wiki/Mean_opinion_score. It essentially uses a well-defined process to apply some objectivity to a test audience's subjective experience.
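That said, if what you are after is the raw per-pixel comparison from your edit rather than a perceptual score, you can decode both files and diff them frame by frame; objective metrics such as PSNR and SSIM are built on exactly this kind of per-pixel difference. Here is a sketch using imageio and numpy (it assumes an ffmpeg-capable imageio backend is installed and that both files decode to the same resolution and frame count; file names are placeholders):

import numpy as np
import imageio.v3 as iio

def frame_differences(reference_path, compressed_path):
    # yield the mean absolute per-pixel difference for each pair of frames
    for ref, comp in zip(iio.imiter(reference_path), iio.imiter(compressed_path)):
        yield np.abs(ref.astype(np.int16) - comp.astype(np.int16)).mean()

# for i, d in enumerate(frame_differences("vid.avi", "vid_compressed.avi")):
#     print(f"frame {i}: mean per-pixel difference {d:.2f}")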

How should the result of getDeviceDensity() method from Codename One be used?

Depending on which skin I use in the simulator, the result of the following method differs:
Display.getInstance().getDeviceDensity();
The results have nothing to do with the real device density, since for a Xoom skin it outputs 30 (149 ppi in reality) and for an iPhone 6 it outputs 50 (329 ppi in reality).
I noticed this because I need to translate a character height measured in GIMP (at 72 dpi) into device units so that it looks the same as on the image.
Any help on that topic would be appreciated!
Cheers
The JavaDocs for getDeviceDensity state:
Returns one of the density variables appropriate for this device,
notice that density doesn't always correspond to resolution and an
implementation might decide to change the density based on DPI
constraints.
Returns:
one of the DENSITY constants of Display
The DENSITY constants refer to one of these.
Notice you can also use convertToPixels which is probably a far better API to use. The density API is mostly used to pick the right multi image and should rarely be used in user code.

Which video encoding algorithm should I use for a video with just one static image and sound?

I'm doing video processing tasks and one of the problems I need to solve is choosing the appropriate encoding algorithm for a video that has just one static image throughout the entire video.
So far I have tried several algorithms, such as DivX and XviD, but they produce a 3 MB file for a 1-minute-long video. The audio is 64 kbit/s MP3, so it takes just 480 KB, which means the video alone is 2.5 MB!
As the image in the video never changes, it could be compressed very efficiently, since there is no motion. The image itself (a JPEG) is just 50 KB.
So ideally I'd expect this video to be about 550KB - 600KB and not 3MB.
Any ideas about how I could optimize the video so it's not that huge?
I hope this is the right stackexchange forum to ask this question.
Set the frames-per-second to be very low. Lower than 1fps if you can. Your goal would be to get as close to two keyframes (one at the start, and one at the end) as possible.
Whether you can do this depends on the scheme/codec you are using, and also the encoder.
Many codecs will have keyframe-related options. For example, here are some open-source encoders:
lavc (libavcodec):
keyint=<0-300> - maximum interval between keyframes in frames (default: 250 or one keyframe every ten seconds in a 25fps movie.
This is the recommended default for MPEG-4). Most codecs require regular keyframes in order to limit the accumulation of mismatch error. Keyframes are also needed for seeking, as seeking is only possible to a keyframe - but keyframes need more space than other frames, so larger numbers here mean slightly smaller files but less precise seeking. 0 is equivalent to 1, which makes every frame a keyframe. Values >300 are not recommended as the quality might be bad depending upon decoder, encoder and luck. It is common for MPEG-1/2 to use values <=30.
xvidenc:
max_key_interval= - maximum interval between keyframes (default: 10*fps)
Interestingly, this solution may reduce the ability to seek in the file, so you will want to test that.
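As a concrete illustration of the low-frame-rate, few-keyframes idea, here is a sketch that drives ffmpeg (a different tool than the encoders quoted above, and assumed to be installed) from Python; the file names are placeholders:

import subprocess

# loop a single JPEG at 1 fps, encode with x264 tuned for still images,
# and stop when the (re-encoded) audio track ends
subprocess.run([
    "ffmpeg",
    "-loop", "1", "-framerate", "1", "-i", "image.jpg",
    "-i", "audio.mp3",
    "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
    "-c:a", "aac", "-b:a", "64k",
    "-shortest",
    "out.mp4",
], check=True)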
I think this problem is related to the implementation of the video encoder, not the video encoding standard itself.
Most video encoder implementations are not designed for videos of a single static image, so they will not produce the near-ideal bitstream you might imagine when such a video is fed in. They are designed for processing "natural" video.
If you really need a better encoding result for a static-image video, you could hack an open-source video encoder so that, from the second frame on, all macroblocks (MBs) are marked as "skip"...

How do I detect whether the sample supplied by VideoSink.OnSample() is right-side up?

We're currently using the Silverlight VideoSink to capture video from users' local webcams, kinda like so:
protected override void OnSample(long sampleTime, long frameDuration, byte[] sampleData)
{
    if (FrameShouldBeSubmitted())
    {
        byte[] resampledData = ResizeFrame(sampleData);
        mediaController.SetVideoFrame(resampledData);
    }
}
Now, on most of the machines that we've tested, the video sample provided in the byte[] sampleData parameter is upside-down, i.e., if you try to take the RGBA data and turn it into, say, a WriteableBitmap, the bitmap will be upside-down. That's odd, but fairly easy to correct, of course -- you just have to reverse the array as you encode it.
The problem is that at least on some machines (e.g., the single Macintosh in our test environment), the video sample provided is no longer upside-down, but right-side up, and hence, flipping the image actually results in an image that's received upside-down on the far side.
I reported this to MS as a bug, but their (terse) response was that it was "As Designed". Further attempts at clarification have so far been ignored.
Now, I'll grant that it's kinda entertaining to imagine the discussions behind this design decision: "OK, just to make it interesting, let's play the video rightside up on a Mac, but let's turn it upside down for Windows!" "Great idea!" "Yeah, that'll keep those developers guessing!" But beyond that, I can't find this, umm, "feature" documented anywhere, nor can I find any documentation on how one is supposed to be able to tell that a given video sample is upside down or rightside up. Any thoughts on how to tell this?
EDIT 3/29/10 4:50 pm - I got a response from MS which said that the appropriate way to tell was through the Stride property on the VideoFormat object, i.e., if the stride value is negative, the image will be upside-down. However, my own testing indicates that unless I'm doing something wrong, this isn't the case. At least on my own machine, whether the stride value is zero or negative (the only options I see), the sampled image is still upside-down.
I was going to suggest looking at the VideoFormat.Stride provided in VideoSink.OnFormatChange, but then I noticed your edit. I went ahead and tested it on my dev machine: the image is upside down and the stride is negative, as expected. Have you checked again recently?
Even though stride made perfect sense for native applications (which use the stride in pointer arithmetic), I agree that the current behavior is not what you would expect from a modern API. Performance-wise, however, it is better not to modify the data received from the native API.
Then again, while we are talking about performance, why not provide samples in formats other than PixelFormatType.Format32bppArgb so that we could avoid the color-space conversion? By the way, there is a VideoCaptureDevice.DesiredFormat property, but it only works for the resolution, as there is no alternative to PixelFormatType.Format32bppArgb.
