DirectShow data copy is TOO slow - c

Have USB 3.0 HDMI Capture device. It uses YUY2 format (2 bytes per pixel) and 1920x1080 resolution.
Video capture Output Pin connects directly to Video Render input Pin.
And all works good. It shows me 1920x1080 without any freezes.
But I need to make screenshot every second. So this is what I do:
void CaptureInterface::ScreenShoot() {
IMemInputPin* p_MemoryInputPin = nullptr;
hr = p_RenderInputPin->QueryInterface(IID_IMemInputPin, (void**)&p_MemoryInputPin);
IMemAllocator* p_MemoryAllocator = nullptr;
hr = p_MemoryInputPin->GetAllocator(&p_MemoryAllocator);
IMediaSample* p_MediaSample = nullptr;
hr = p_MemoryAllocator->GetBuffer(&p_MediaSample, 0, 0, 0);
long buff_size = p_MediaSample->GetSize(); //buff_size = 4147200 Bytes
BYTE* buff = nullptr;
hr = p_MediaSample->GetPointer(&buff);
//BYTE CaptureInterface::ScreenBuff[1920*1080*2]; defined in header
//--------- TOO SLOW (1.5 seconds for 4 MBytes) ----------
std::memcpy(ScreenBuff, buff, buff_size);
//--------------------------------------------
p_MediaSample->Release();
p_MemoryAllocator->Release();
p_MemoryInputPin->Release();
return;
}
Any other operations with this buffer is very slow too.
But If I use memcpy on other data (2 arrays in my class for example same size 4MB) It is very fast. <0.01sec

Video memory is (might be) slow to read back by its nature (e.g. VMR9 IBasicVideo->GetCurrentImage very slow and you can find other references). You normally want to grab the data before it actually reaches video memory.
Additionally, the way you read data is not quite reliable. You don't know what frame you are actually copying and it might so happen that you even read blackness or garbage, or vice versa your acquiring access to buffer freezes the main video streaming. This is because you are grabbing an unused buffer from pool of available buffers rather than a buffer that corresponds to specific video frame. Your getting an image from such buffer happen in a fragile assumption that unused data from previously streamed frame was initialized and is not yet overwritten by anything else.

Related

is there any way to crop an jpg image captured by esp cam?

//I am trying to crop an image captured by espcam the image is in a jpg format I would like to crop it. As the image is stored as a single-dimensional array I tried to rearrange the elements in the array but no changes occurred //
I have cropped the image in RGB565 but I am struggling to understand the single-dimensional array(image buffer)
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
config.pin_d1 = Y3_GPIO_NUM;
config.pin_d2 = Y4_GPIO_NUM;
config.pin_d3 = Y5_GPIO_NUM;
config.pin_d4 = Y6_GPIO_NUM;
config.pin_d5 = Y7_GPIO_NUM;
config.pin_d6 = Y8_GPIO_NUM;
config.pin_d7 = Y9_GPIO_NUM;
config.pin_xclk = XCLK_GPIO_NUM;
config.pin_pclk = PCLK_GPIO_NUM;
config.pin_vsync = VSYNC_GPIO_NUM;
config.pin_href = HREF_GPIO_NUM;
config.pin_sscb_sda = SIOD_GPIO_NUM;
config.pin_sscb_scl = SIOC_GPIO_NUM;
config.pin_pwdn = PWDN_GPIO_NUM;
config.pin_reset = RESET_GPIO_NUM;
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_RGB565;
config.frame_size = FRAMESIZE_SVGA;
// config.jpeg_quality = 10;
config.fb_count = 2;
esp_err_t result = esp_camera_init(&config);
if (result != ESP_OK) {
return false;
}
camera_fb_t * fb = NULL;
fb = esp_camera_fb_get();
if (!fb)
{
Serial.println("Camera capture failed");
}
the Fb buffer is a single-dimensional array I want to extract each individual RGB value.
JPG is a compressed format, meaning that your rows and columns are not corresponding to what you would see by displaying a 1:1 grid on the screen. You need to convert it to the plain RGB (or equivalents) format and then copy it.
JPG achieves compression by splitting the image into YCbCR components, using a mathematical transformation and then filtering. For additional information I refer to this page.
Luckily you can follow this tutorial to do the inverse JPEG transformation on an Arduino (tip: forget to do this in real time, unless your time constraints are very relaxed).
The idea is to use a library that converts the JPEG image into an array of data:
Using the library is fairly simple: we give it the JPEG file, and the library will start generating arrays of pixels – so called Minimum Coded Units, or MCUs for short. The MCU is a block of 16 by 8 pixels. The functions in the library will return the color value for each pixel as 16-bit color value. The upper 5 bits are the red value, the middle 6 are green and the lower 5 are blue. Now we can send these values by any sort of communication channel we like.
For your use case you won't send the data through the communication channel, but rather store it in a local array by pushing the blocks into adjacent tiles, then do the crop.
That depends on what kind of hardware (camera and board) you are using.
I'm basing this on the OV2640 camera module because it's the one I've been working with. It delivers the image to the frame buffer already encoded, so I'm guessing this might be what you are facing.
Trying to crop the image after it has been encoded can be tricky, but you might be able to instruct the camera chip to only deliver a certain part of the sensor output in the first place using a window function.
The easiest way to access this setting is to define a function to access this:
void setWindow(int resolution , int xOffset, int yOffset, int xLength, int yLength) {
sensor_t * s = esp_camera_sensor_get();
resolution = 0;
s->set_res_raw(s, resolution, 0, 0, 0, xOffset, yOffset, xLength, yLength, xLength, yLength, true, true);
}
/*
* resolution = 0 \\ 1600 x 1200
* resolution = 1 \\ 800 x 600
* resolution = 2 \\ 400 x 296
*/
where (xOffset,yOffset) is the origin of the window in pixels and (xLength,yLength) is the size of the window in pixels. Be aware that changing the resolution will effectively overwrite these settings. Otherwise this works great for me, although for some reason only if the aspect ratio of 4:3 is preserved in the window size.
Looking at the output format table for the ESP32 Camera Driver one can see that most output formats are non-jpeg. If you can handle a RAW format instead (it will be slower to save/transfer, and be MUCH larger) then that would allow you to more easily crop the image by make a copy with a couple of loops. JPEG is compressed and not easily cropped. The page linked also mentions this:
Using YUV or RGB puts a lot of strain on the chip because writing to PSRAM is not particularly fast. The result is that image data might be missing. This is particularly true if WiFi is enabled. If you need RGB data, it is recommended that JPEG is captured and then turned into RGB using fmt2rgb888 or fmt2bmp/frame2bmp
If you are using PIXFORMAT_RGB565 (which means each pixel value will be kept in TWO bytes, and the image is not jpeg compressed) and FRAMESIZE_SVGA (800x600 pixels), you should be able to access the framebuffer as a two-dimensional array if you want:
uint16_t *buffer = fb->buf;
uint16_t pxl = buffer[row * 800 + column]; // 800 is the SVGA width
// pxl now contains 5 R-bits, 6 G-bits, 5 B-bits

OpenGL - Is it efficient to call glBufferSubData (nearly) each frame?

I have a spritesheet that contains a simple sprite animation. I can successfully extract each animation frame sprite and display it as a texture. However, I managed to do that by calling glBufferSubData to change the texture coordinates (before the game loop, at initialization). I want to play a simple sprite animation using these extracted textures, and I guess I will do it by changing the texture coordinates each frame (except if animation is triggered by user input). Anyways, this results in calling glBufferSubData almost every frame (to change texture data), and my question is that is this approach efficient? If not, how can I solve the issue? (In Reddit, I saw a comment saying that the goal must be to minimize the traffic between CPU and GPU memory in modern OpenGL, and I guess my approach violates this goal.) Thanks in advance.
For anyone interested, here is my approach:
void set_sprite_texture_from_spritesheet(Sprite* sprite, const char* path, int x_offset, int y_offset, int sprite_width, int sprite_height)
{
float* uv_coords = get_uv_coords_from_spritesheet(path, x_offset, y_offset, sprite_width, sprite_height);
for (int i = 0; i < 8; i++)
{
/* 8 means that I am changing a total of 8 texture coordinates (2 for each 4 vertices) */
edit_vertex_data_by_index(sprite, &uv_coords[i], (i / 2) * 5 + 3 + (i % 2 != 0));
/*
the last argument in this function gives the index of the desired texture coordinate
(5 is for stride, 3 for offset of texture coordinates in each row)
*/
}
free(uv_coords);
sprite->texture = load_texture(path); /* loads the texture -
since the UV coordinate is
adjusted based on the spritesheet
I am loading the entire spritesheet as
a texture.
*/
}
void edit_vertex_data_by_index(Sprite *sprite, float *data, unsigned int start_index)
{
glBindBuffer(GL_ARRAY_BUFFER, sprite->vbo);
glBufferSubData(GL_ARRAY_BUFFER, start_index * sizeof(float), sizeof(data), data);
glBindBuffer(GL_ARRAY_BUFFER, 0);
/*
My concern is that if I call this almost every frame, it could be not efficient, but I am not sure.
*/
}
Editing buffers is fine. Literally every game has buffers that change every frame. Buffers are how you get the data to the GPU so it can render it! (And uniforms. Your driver is likely to secretly put uniforms in buffers though!)
Yes, you should minimize the amount of buffer updates. You should minimize everything, really. The less stuff the computer does, the faster it can do it! That doesn't mean you should avoid doing stuff entirely. It means you should only do as much stuff as you need to, instead of doing wasteful stuff that you don't need.
Every time you call an OpenGL function, the driver takes some time to check how to process your request, which buffer is bound, that it's big enough, that the GPU isn't using it at the same time, etc. You want to do as few calls as possible, because that way, the driver has to check all this stuff less often.
You are doing 8 separate glBufferSubData calls in this function. If you put the UV coordinates all next to each other in the buffer, you could update them all at once with 1 call. And if you have lots of animated sprites, you should try to put all of their UV coordinates in one big array, and update the whole array in one call - all the sprites at once.
And loading textures from paths is really slow. Maybe your program can load 100 textures per second but that still means you blew half your frame time budget on texture loading. The texture hasn't changed anyway so why would you load it again?

concurrent I/O - buffers corruption, block device driver

I developing block layered device driver. So, I intercept WRITE request and encrypt data, and decrypt data in the end_bio() routine (during processing and READ request).
So all works fine in single stream. But I getting buffers content corruption if have tried to performs I/O from two and more processes simultaneously. I have not any local storage for buffers.
Do I'm need to count a BIO merging in my driver?
Is the Linux I/O subsystem have some requirements related to the a number of concurrent I/O request?
Is there some tips and tricks related stack using or compilation?
This is under kernel 4.15.
At the time I use next constriction to run over disk sectors:
/*
* A portion of the bio_copy_data() ...
*/
for (vcnt = 0, src_iter = src->bi_iter; ; vcnt++)
{
if ( !src_iter.bi_size)
{
if ( !(src = src->bi_next) )
break;
src_iter = src->bi_iter;
}
src_bv = bio_iter_iovec(src, src_iter);
src_p = bv_page = kmap_atomic(src_bv.bv_page);
src_p += src_bv.bv_offset;
nlbn = src_bv.bv_len512;
for ( ; nlbn--; lbn++ , src_p += 512 )
{
{
/* Simulate a processing of data in the I/O buffer */
char *srcp = src_p, *dstp = src_p;
int count = DUDRV$K_SECTORSZ;
while ( count--)
{
*(dstp++) = ~ (*(srcp++));
}
}
}
kunmap_atomic(bv_page);
**bio_advance_iter**(src, &src_iter, src_bv.bv_len);
}
Is this correct ? Or I'm need to use something like **bio_for_each_segment(bvl, bio, iter) ** ?
The root of the problem is a "feature" of the Block I/O methods. In particularly (see description at Linex site reference )
** Biovecs can be shared between multiple bios - a bvec iter can represent an
arbitrary range of an existing biovec, both starting and ending midway
through biovecs. This is what enables efficient splitting of arbitrary
bios. Note that this means we only use bi_size to determine when we've
reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
bi_size into account when constructing biovecs.*
So, in my case this is a cause of the buffer with disk sector overrun.
The trick is set REQ_NOMERGE_FLAGS in the .bi_opf before sending BIO to backed device driver.
The second reason is the non-actual .bi_iter is returned by backed device driver. So, we need to save it (before submiting BIO request to backend) and restore it in the our "bio_endio()" routine.
Have you considered the use of vmap with global synchronization, instead?
The use of kmap_atomic has some restrictions:
Since the mapping is restricted to the CPU that issued it, it
performs well, but the issuing task is therefore required to stay on that
CPU until it has finished, lest some other task displace its mappings.
kmap_atomic() may also be used by interrupt contexts, since it is does not
sleep and the caller may not sleep until after kunmap_atomic() is called.
Reference: https://www.kernel.org/doc/Documentation/vm/highmem.txt

Getting individual frames using CV_CAP_PROP_POS_FRAMES in cvSetCaptureProperty

I am trying to jump to a specific frame by setting the CV_CAP_PROP_POS_FRAMES property and then reading the frame like this:
cvSetCaptureProperty( input_video, CV_CAP_PROP_POS_FRAMES, current_frame );
frame = cvQueryFrame( input_video );
The problem I am facing is that, OpenCV 2.1 returns the same frame for the 12 consecutive values of current_frame whereas I want to read each individual frame, not just the key frames. Can anyone please tell me what's wrong?
I did some research and found out that the problem is caused by the decompression algorithm.
The MPEG-like algorithms (including HD, et all) do not compress each frame separately, but save a keyframe from time to time, and then only the differences between the last frame and subsequent frames.
The problem you reported is caused by the fact that, when you select a frame, the decoder (ffmpeg, likely) automatically advances to the next keyframe.
So, is there a way around this? I don't want only key frames but each individual frame.
I don't know whether or not this would be precise enough for your purpose, but I've had success getting to a particular point in an MPEG video by grabbing the frame rate, converting the frame number to a time, then advancing to the time. Like so:
cv::VideoCapture sourceVideo("/some/file/name.mpg");
double frameRate = sourceVideo.get(CV_CAP_PROP_FPS);
double frameTime = 1000.0 * frameNumber / frameRate;
sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
Due to this limitation in OpenCV, it may be wise to to use FFMPEG instead. Moviepy is a nice wrapper library.
# Get nth frame from a video
from moviepy.video.io.ffmpeg_reader import FFMPEG_VideoReader
cap = FFMPEG_VideoReader("movie.mov",True)
cap.initialize()
cap.get_frame(n/FPS)
Performance is great too. Seeking to the nth frame with get_frame is O(1), and a speed-up is used if (nearly) consecutive frames are requested. I've gotten better-than-realtime results loading three 720p videos simultaneously.
CV_CAP_PROP_POS_FRAMES jumps to a key frame. I had the same issue and worked around it using this (python-)code. It's probably not totally efficient, but get's the job done:
def seekTo(cap, position):
positiontoset = position
pos = -1
cap.set(cv.CV_CAP_PROP_POS_FRAMES, position)
while pos < position:
ret, image = cap.read()
pos = cap.get(cv.CV_CAP_PROP_POS_FRAMES)
if pos == position:
return image
elif pos > position:
positiontoset -= 1
cap.set(cv.CV_CAP_PROP_POS_FRAMES, positiontoset)
pos = -1
I've successfully used the following on OpenCV 3 / Python 3:
# Skip to 150 frame then read the 151th frame
cap.set(cv2.CAP_PROP_POS_FRAMES, 150))
ret, frame = cap.read()
After some years assuming this as a unsavable bug, I think I've figured out a way to use with a good balance between speed and correctness.
A previous solution suggested to use the CV_CAP_PROP_POS_MSEC property before reading the frame:
cv::VideoCapture sourceVideo("/some/file/name.mpg");
const auto frameRate = sourceVideo.get(CV_CAP_PROP_FPS);
void readFrame(int frameNumber, cv::Mat& image) {
const double frameTime = 1000.0 * frameNumber / frameRate;
sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
sourceVideo.read(image);
}
It does return the expected frame, but the problem is that using CV_CAP_PROP_POS_MSEC may be very slow, for example for a video conversion.
Note: using global variables for simplicity.
On the other hand, if you just want to read the video sequentially, it is enough to read frame without seeking at all.
for (int frameNumber = 0; frameNumber < nFrames; ++frameNumber) {
sourceVideo.read(image);
}
The solution comes from combining both: using a variable to remember the last queried frame, lastFrameNumber, and only seek when requested frame is not the next one. In this way it is possible to increase the speed in a sequential reading while allowing random seek if necessary.
cv::VideoCapture sourceVideo("/some/file/name.mpg");
const auto frameRate = sourceVideo.get(CV_CAP_PROP_FPS);
const int lastFrameNumber = -2; // guarantee seeking the first time
void readFrame(int frameNumber, cv::Mat& image) {
if (lastFrameNumber + 1 != frameNumber) { // not the next frame? seek
const double frameTime = 1000.0 * frameNumber / frameRate;
sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
}
sourceVideo.read(image);
lastFrameNumber = frameNumber;
}

Programatically determining file "size on disk" in advance

I need to know how big a given in-memory buffer will be as an on-disk (usb stick) file before I write it. I know that unless the size falls on the block size boundary, its likely to get rounded up, e.g. a 1 byte file takes up 4096 bytes on-disk. I'm currently doing this using GetDiskFreeSpace() to work out the disk block size, then using this to calculate the on-disk size like this:
GetDiskFreeSpace(szDrive, &dwSectorsPerCluster,
&dwBytesPerSector, NULL, NULL);
dwBlockSize = dwSectorsPerCuster * dwBytesPerSector;
if (dwInMemorySize % dwBlockSize != 0)
{
dwSizeOnDisk = ((dwInMemorySize / dwBlockSize) * dwBlockSize) + dwBlockSize;
}
else
{
dwSizeOnDisk = dwInMemorySize;
}
Which seems to work fine, BUT GetDiskFreeSpace() only works on disks up to 2GB according to MSDN. GetDiskFreeSpaceEx() doesn't return the same information, so my question is, how else can I calculate this information for drives >2GB? Is there an API call I've missed? Can I assume some hard values depending on the overall disk size?
MSDN only states that the GetDiskFreeSpace() function cannot report volume sizes greater than 2GB. It works fine for retrieving sectors per cluster and bytes per sector, I've used it myself for very similar-looking code ;-)
But if you want disk capacity too, you'll need an additional call to GetDiskFreeSpaceEx().
The size of a file on disk is a fuzzy concept. In NTFS, a file consists of a set of data elements. You're primarilty thinking of the "unnamed data stream". That's an attribute of a file that, if small, can be packed with the other attributes in the directory entry. Apparently, you can store a data stream of up to 700-800 bytes in the directory entry itself. Hence, your hypothetical 1 byte file would be as big as a 0 byte or 700 byte file.
Another influence is file compression. This will make the on-disk size potentially smaller than the in-memory size.
You should be able to obtain this information using the DeviceIoControl function and
DISK_GEOMETRY_EX. It will return a structure that contains the information you are looking for I think
http://msdn.microsoft.com/en-us/library/aa363216(VS.85).aspx
http://msdn.microsoft.com/en-us/library/ms809010.aspx
In actionscript!
var size:Number = 19912;
var sizeOnDisk:Number = size;
var reminder:Number = size % (1024 * 4);
if(reminder>0){
sizeOnDisk = size + ((1024 * 4)- reminder)
}
trace(size)
trace(sizeOnDisk)

Resources