GStreamer zero-copy dmabuf encoding - c

I want to encode a framebuffer I got via dmabuf into a video stream.
I have a dmabuf file descriptor that contains the framebuffer; I got the file descriptor from the Intel i915 driver via the ioctl VFIO_DEVICE_QUERY_GFX_PLANE.
Now I want to encode it zero-copy in GStreamer into a video stream (H.264, H.265, etc.). I push the individual frames into the GStreamer pipeline via appsrc. Since I am on Intel hardware, I thought it made sense to use VAAPI.
The problem is that the sink pads of the VAAPI elements only support video/x-raw and video/x-raw(memory:VASurface), while what I have is video/x-raw(memory:DMABuf).
Is there any way to convert video/x-raw(memory:DMABuf) to video/x-raw(memory:VASurface) zero-copy, or to import the DMABuf directly as video/x-raw(memory:VASurface)?
Alternatively, is there a framework that is better suited than VAAPI?
My code to push the frames into GStreamer currently looks like this:
vfio_encode_dpy *vedpy = container_of(dcl, vfio_encode_dpy, dcl);

/* Wrap the dmabuf fd in a GstMemory. Note: gst_dmabuf_allocator_alloc()
 * takes ownership of the fd, so dup() it first if it is needed elsewhere. */
GstMemory *mem = gst_dmabuf_allocator_alloc(vedpy->gdata.allocator, dmabuf->fd,
                                            dmabuf->stride * dmabuf->height);
vedpy->gdata.buffer = gst_buffer_new();
gst_buffer_append_memory(vedpy->gdata.buffer, mem);

/* Describe the plane layout so downstream elements can interpret the memory. */
gsize offset[GST_VIDEO_MAX_PLANES] = { 0, 0, 0, 0 };
gint stride[GST_VIDEO_MAX_PLANES] = { dmabuf->stride, 0, 0, 0 };
gst_buffer_add_video_meta_full(vedpy->gdata.buffer, GST_VIDEO_FRAME_FLAG_NONE,
                               GST_VIDEO_FORMAT_ENCODED,
                               dmabuf->width, dmabuf->height, 1, offset, stride);

GstFlowReturn ret;
g_signal_emit_by_name(vedpy->gdata.source, "push-buffer", vedpy->gdata.buffer, &ret);
And my pipeline:
char launch_stream[] = "appsrc name=source ! "
    " video/x-raw(memory:DMABuf),width=1024,height=768,framerate=0/1,format={BGRx,BGRx:0x0100000000000001} ! "
    " vaapipostproc ! "
    " vaapih265enc ! "
...
which obviously does not work, because vaapipostproc cannot be linked with the caps filter.
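One thing worth checking first: caps with a (memory:DMABuf) feature are awkward to express in a gst_parse_launch() string, so it may help to build them programmatically and set them on the appsrc before pushing. A minimal sketch, assuming a single-plane BGRx framebuffer and a vaapipostproc new enough to attempt DMABuf import (behavior varies across GStreamer and driver versions):
GstCaps *caps = gst_caps_new_simple("video/x-raw",
                                    "format", G_TYPE_STRING, "BGRx",
                                    "width", G_TYPE_INT, 1024,
                                    "height", G_TYPE_INT, 768,
                                    "framerate", GST_TYPE_FRACTION, 0, 1,
                                    NULL);
/* Mark the memory as dmabuf-backed so vaapipostproc can try zero-copy import. */
gst_caps_set_features(caps, 0, gst_caps_features_from_string("memory:DMABuf"));
g_object_set(vedpy->gdata.source, "caps", caps, NULL);
gst_caps_unref(caps);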

Related

DirectShow data copy is TOO slow

I have a USB 3.0 HDMI capture device. It uses the YUY2 format (2 bytes per pixel) at 1920x1080 resolution.
The video capture output pin connects directly to the video renderer input pin, and that all works well: it shows 1920x1080 without any freezes.
But I need to take a screenshot every second. So this is what I do:
void CaptureInterface::ScreenShoot() {
    HRESULT hr;

    IMemInputPin* p_MemoryInputPin = nullptr;
    hr = p_RenderInputPin->QueryInterface(IID_IMemInputPin, (void**)&p_MemoryInputPin);

    IMemAllocator* p_MemoryAllocator = nullptr;
    hr = p_MemoryInputPin->GetAllocator(&p_MemoryAllocator);

    IMediaSample* p_MediaSample = nullptr;
    hr = p_MemoryAllocator->GetBuffer(&p_MediaSample, 0, 0, 0);

    long buff_size = p_MediaSample->GetSize(); // buff_size = 4147200 bytes

    BYTE* buff = nullptr;
    hr = p_MediaSample->GetPointer(&buff);

    // BYTE CaptureInterface::ScreenBuff[1920*1080*2]; defined in header
    //--------- TOO SLOW (1.5 seconds for 4 MBytes) ----------
    std::memcpy(ScreenBuff, buff, buff_size);
    //--------------------------------------------

    p_MediaSample->Release();
    p_MemoryAllocator->Release();
    p_MemoryInputPin->Release();
}
Any other operation on this buffer is very slow too.
But if I use memcpy on other data (for example, two 4 MB arrays in my class), it is very fast: <0.01 s.
Video memory is (or can be) slow to read back by its nature (e.g. VMR9 IBasicVideo->GetCurrentImage is very slow, and you can find other references). You normally want to grab the data before it actually reaches video memory.
Additionally, the way you read the data is not reliable. You don't know which frame you are actually copying, and it may happen that you read blackness or garbage, or, vice versa, that your acquiring access to the buffer freezes the main video stream. This is because you are grabbing an unused buffer from the pool of available buffers rather than the buffer that corresponds to a specific video frame. Getting an image from such a buffer rests on the fragile assumption that the stale data from a previously streamed frame was initialized and has not yet been overwritten.
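A common way to grab frames before they reach video memory is to insert a Sample Grabber filter between the capture pin and the renderer and copy from its CPU-side buffer. A minimal sketch, assuming the deprecated qedit.h ISampleGrabber interface from older SDKs is available (graph building and error handling omitted):
#define COBJMACROS
#include <dshow.h>
#include <qedit.h>   /* ISampleGrabber; ships with older Windows SDKs */

IBaseFilter *grabber_filter = NULL;
ISampleGrabber *grabber = NULL;
CoCreateInstance(&CLSID_SampleGrabber, NULL, CLSCTX_INPROC_SERVER,
                 &IID_IBaseFilter, (void **)&grabber_filter);
IBaseFilter_QueryInterface(grabber_filter, &IID_ISampleGrabber, (void **)&grabber);

/* Accept YUY2 video and keep a copy of the latest sample in an internal buffer. */
AM_MEDIA_TYPE mt;
ZeroMemory(&mt, sizeof(mt));
mt.majortype = MEDIATYPE_Video;
mt.subtype = MEDIASUBTYPE_YUY2;
ISampleGrabber_SetMediaType(grabber, &mt);
ISampleGrabber_SetBufferSamples(grabber, TRUE);
/* ... add grabber_filter to the graph between capture and renderer ... */

/* Once per second: ask for the size, then fetch the most recent frame. */
long size = 0;
ISampleGrabber_GetCurrentBuffer(grabber, &size, NULL);
long *frame = (long *)malloc(size);
ISampleGrabber_GetCurrentBuffer(grabber, &size, frame);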

convert AVPicture to array<unsigned char>

I use FFmpeg to extract frames of a video in C++. I want to get an array<unsigned char> for the frame, but I get an AVFrame from this line of code:
avcodec_decode_video2(codecContext, DecodedFrame, &gotPicture, Packet);
So I used sws_scale to convert the AVFrame to an AVPicture, but I still cannot get an array<unsigned char> from the frame:
sws_scale(convertContext, DecodedFrame->data, DecodedFrame->linesize, 0, codecContext->height, convertedFrame->data, convertedFrame->linesize);
So can anyone help me convert the AVFrame or AVPicture to an array<unsigned char>?
AVPicture is deprecated. Converting to it is pointless, since AVFrame is its replacement.
If I understand the question correctly, you're trying to get the raw pixel values into a contiguous byte array. If so, just copy from the data fields of the AVFrame:
avcodec_decode_video2(codecContext, DecodedFrame, &gotPicture, Packet);
// If you need RGB, create a SwsContext to convert from the video's pixel format.
SwsContext *sws_ctx = sws_getContext(DecodedFrame->width, DecodedFrame->height, codecContext->pix_fmt,
                                     DecodedFrame->width, DecodedFrame->height, AV_PIX_FMT_RGB24,
                                     0, 0, 0, 0);
// Alignment 1 keeps the buffer tightly packed (linesize == width * 3),
// so the whole image can be copied in one go below.
uint8_t *rgb_data[4];
int rgb_linesize[4];
av_image_alloc(rgb_data, rgb_linesize, DecodedFrame->width, DecodedFrame->height, AV_PIX_FMT_RGB24, 1);
sws_scale(sws_ctx, DecodedFrame->data, DecodedFrame->linesize, 0, DecodedFrame->height, rgb_data, rgb_linesize);
// RGB24 is a packed format: a single plane holds all the data. A std::array
// needs a compile-time size, so use std::vector for a size known at runtime.
size_t rgb_size = DecodedFrame->width * DecodedFrame->height * 3;
std::vector<uint8_t> rgb_arr(rgb_size);
std::copy_n(rgb_data[0], rgb_size, rgb_arr.begin());
av_freep(&rgb_data[0]);    // free the image buffer from av_image_alloc()
sws_freeContext(sws_ctx);

How to decode and extract metadata from the last frame with FFmpeg?

I am decoding with FFmpeg, using C code. The videos I am decoding are H.264 or MPEG-4, and I am using the 32-bit libs. I have successfully decoded and extracted the metadata of the first frame. I would now like to decode the last frame. I have a defined duration of the video, and felt it was a safe assumption that isLastFrame = duration. Here's what I have; any suggestions?
AVFormatContext* pFormatCtx = avformat_alloc_context();
avformat_open_input(&pFormatCtx, filename, NULL, NULL);
int64_t duration = pFormatCtx->duration;

int gotFrame = 0;
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    /* Is this a packet from the video stream? */
    if (packet.stream_index == videoStream) {
        /* Decode the video frame; the third argument must be an int
           "got picture" flag, not the int64_t duration. */
        avcodec_decode_video2(pCodecCtx, pFrame, &gotFrame, &packet);
    }
}
Any help is much appreciated! :)
Thanks everyone for your help, but I found that the reason the av_seek_frame duration wasn't working was that you must multiply it by 1000 for it to apply to the subsequent read. Also, note that I call decode_video instead of the stock decode call because I was on 32-bit and wrote my own wrapper; if you plug in the stock function (I believe it's avcodec_decode_video2) it works just as well. Hopefully this helps fellow decoders in the future.
AVFormatContext* Format;    /* opened earlier with avformat_open_input() */
int64_t duration = Format->duration;
duration = duration * 1000;

if (av_seek_frame(Format, Packet.stream_index, duration, AVSEEK_FLAG_ANY) >= 0)
{
    /* read the frame and decode the packet */
    if (av_read_frame(Format, &Packet) >= 0)
    {
        /* decode the video frame (decode_video is my own 32-bit wrapper) */
        decode_video(CodecContext, Frame, &duration, &Packet);
    }
}
This might be what you're looking for:
Codecs which have the CODEC_CAP_DELAY capability set have a delay
between input and output, these need to be fed with avpkt->data=NULL,
avpkt->size=0 at the end to return the remaining frames.
Link to FFmpeg documentation
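With the avcodec_decode_video2() API used above, draining those delayed frames looks roughly like this (a sketch reusing the question's pCodecCtx and pFrame; newer FFmpeg versions replace this with avcodec_send_packet(ctx, NULL)):
AVPacket pkt;
av_init_packet(&pkt);
pkt.data = NULL;   /* a flush packet tells the decoder to emit buffered frames */
pkt.size = 0;

int got = 0;
do {
    avcodec_decode_video2(pCodecCtx, pFrame, &got, &pkt);
    /* each iteration that sets got yields one remaining frame;
       the final such frame is the last frame of the stream */
} while (got);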

Setting arguments in a kernel in OpenCL causes error

I am a beginner in OpenCL and am thus writing a simple program to double the elements of an array.
The kernel code is:-
__kernel void dataParallel(__global int* A, __global int* B)
{
    int base = get_local_id(0);
    B[base] = A[base] + A[base];
}
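(Side note: get_local_id(0) only matches the element index here because the whole array fits in a single work-group; with more than one group you would index by the global ID instead, e.g.:)
__kernel void dataParallel(__global int* A, __global int* B)
{
    int i = get_global_id(0);   /* unique across all work-groups */
    B[i] = A[i] + A[i];
}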
The local_work_size is 32, as I am doubling 32 elements.
In my program I have declared an integer array which holds the elements to be doubled:
int *A;
A=(int*)malloc(sizeof(int)*64);
for (i=0; i < 32; i++) { A[i] = i; }
platforms[i] stores the platform id, devices[j] stores the corresponding device id. Their types:
cl_platform_id* platforms;
cl_device_id* devices;
Creating context
cl_context context=clCreateContext(NULL,1,&devices[j],NULL,NULL,NULL);
Next comes the command queue
cl_command_queue cmdqueue = clCreateCommandQueue(context, devices[j], 0, &err);
Next I created 2 memory buffers, one to hold the input data and the other to hold the result.
cl_mem Abuffer,Bbuffer;
Abuffer=clCreateBuffer(context, CL_MEM_READ_WRITE ,32*sizeof(int),NULL,&err);
Bbuffer=clCreateBuffer(context, CL_MEM_READ_WRITE ,32*sizeof(int),NULL,&err);
Then I copied the data of array A to Abuffer
ret=clEnqueueWriteBuffer(cmdqueue, Abuffer, CL_TRUE, 0, 32*sizeof(int), A, 0, NULL, NULL);
printf("%d",ret);//output is 0 thus data written successfully into the buffer
The kernel code was then read into a character string source_str and the program was created.
kernelprgrm=clCreateProgramWithSource(context,1,(const char **)&source_str,(const size_t *)&source_size,&err);
if(!err)
{
printf("\nKernel program created successfully\n");
}//Outputs -Kernel program created successfully
I then built the program using:
ret=clBuildProgram(kernelprgrm,1,&devices[j],NULL,NULL,NULL);//returns CL_SUCCESS
Getting buildinfo next
ret=clGetProgramBuildInfo(kernelprgrm,devices[j], CL_PROGRAM_BUILD_STATUS ,0,NULL,&size);//Returns success
Creating kernel
kernel = clCreateKernel(kernelprgrm, "dataParallel", &ret);
printf("\nReturn kernel program=%d",ret);
if(!ret)
{
printf("\nProgram created successfully!\n");
}
//Outputs -Program created successfully!
Now comes the devil:-
ret=clSetKernelArg(kernel,0,sizeof(cl_mem),(void *) Abuffer);
printf("\nKernel argument 1 ret=%d",ret);
ret=clSetKernelArg(kernel,1,sizeof(cl_mem),(void *) Bbuffer);
printf("\nKernel argument 2 ret=%d",ret);
Both return -38 meaning CL_INVALID_MEM_OBJECT.
P.S.: As per the errors pointed out below, i.e. using &Abuffer instead of Abuffer in the argument, and after making the necessary changes, both return 0.
size_t global_item_size = 32;
size_t local_item_size = 32;
Also ret = clEnqueueNDRangeKernel(cmdqueue, kernel, 1, NULL,&global_item_size, &local_item_size, 0, NULL, NULL); returns 0.
Trying to get the result
ret = clEnqueueReadBuffer(cmdqueue, Bbuffer, CL_TRUE, 0, 32*sizeof(int), B, 0, NULL, NULL);
printf("\nB:-\n");
for (t=0; t < 32; t++) {
printf("%d\t ", B[t]);
}
This returns buildstatus=0, with the core getting dumped, on my AMD GPU (running the AMD Accelerated Parallel Processing platform) and on my NVIDIA GPU, whereas it works perfectly fine if the selected device is the CPU on the Intel(R) OpenCL platform.
Also, I tried getting the build log using:
cl_build_status *status = (cl_build_status *)malloc(sizeof(cl_build_status) * size);
clGetProgramBuildInfo(kernelprgrm, devices[j], CL_PROGRAM_BUILD_STATUS, size, status, NULL);
printf("\nBuild status=%d\n", *status);
// Getting the build log if not successful
clGetProgramBuildInfo(kernelprgrm, devices[j], CL_PROGRAM_BUILD_LOG, 0, NULL, &size);
char *buildlog = (char *)malloc(size);
clGetProgramBuildInfo(kernelprgrm, devices[j], CL_PROGRAM_BUILD_LOG, size, buildlog, NULL);
printf("\n!!!!!!!!!!!!!!!!!!!!!Program ended!!!!!!!!!!!\n");
printf("\n\nBuildlog: %s\n\n", buildlog);
But it returns Buildlog: Compilation started
Compilation done
Linking started
Linking done
Device build started
Device build done
Kernel <dataParallel> was successfully vectorized (4)
Done.
Here's what the OpenCL 1.2 spec has to say about setting buffers as kernel arguments:
If the argument is a memory object (buffer, image or image array), the arg_value entry will be a pointer to the appropriate buffer, image or image array object.
So, you need to pass a pointer to the cl_mem objects:
ret=clSetKernelArg(kernel,0,sizeof(cl_mem),(void *) &Abuffer);
Why are you using clEnqueueTask? I think you should use clEnqueueNDRangeKernel if you have parallel work to do. Also, just set the global work size, and pass NULL for the local work-group size; a hard-coded local size can be larger than some devices support.
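For instance, reusing the question's cmdqueue and kernel:
size_t global_item_size = 32;
/* Pass NULL for the local size: the runtime then picks a
   work-group size that the device actually supports. */
cl_int err = clEnqueueNDRangeKernel(cmdqueue, kernel, 1, NULL,
                                    &global_item_size, NULL, 0, NULL, NULL);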

Gamepad force feedback (vibration) on Windows using raw input

I'm currently writing a cross-platform library in C which includes gamepad support. Gamepad communication on Windows is handled by both raw input and XInput, depending on the specific gamepad.
While XInput facilitates force feedback on Xbox 360 controllers, I have not found a way to do this using raw input. I have some gamepads that can give force feedback, but I cannot find a way to trigger it through raw input. Is there a way to do this?
I would prefer not to use the DirectInput API, since it's deprecated and discouraged by Microsoft.
Edit:
Since I've now implemented gamepads for the most part, maybe I can narrow the question down a bit. I suspect the number of rumble motors in a gamepad can be found by reading the NumberOutputValueCaps of a HIDP_CAPS structure; this gives the correct result for all my test gamepads.
I'm using the function HidP_GetUsageValue to read axis values, which works fine. Now I suspect calling HidP_SetUsageValue should allow me to change such an output value, turning on a rumble motor. Calling this function fails, however. Should this function be able to access the rumble motors?
HidP_SetUsageValue() only modifies a report buffer -- you need to first prepare an appropriately-sized buffer (which may be why the function was failing; input reports and output reports won't necessarily be the same size) then send it to the device before it will have any effect. MSDN suggests you can use HidD_SetOutputReport() for that purpose, but I had better luck with WriteFile(), following the sample code at: https://code.msdn.microsoft.com/windowshardware/HClient-HID-Sample-4ec99697/sourcecode?fileId=51262&pathId=340791466
This snippet (based on the Linux driver) lets me control the motors and LED on a DualShock 4:
const char *path = /* from GetRawInputDeviceInfo(RIDI_DEVICENAME) */;
HANDLE hid_device = CreateFile(path, GENERIC_READ | GENERIC_WRITE,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                               OPEN_EXISTING, 0, NULL);
assert(hid_device != INVALID_HANDLE_VALUE);

uint8_t buf[32];
memset(buf, 0, sizeof(buf));
buf[0] = 0x05;                  // output report ID
buf[1] = 0xFF;                  // flags byte (per the Linux driver)
buf[4] = right_motor_strength;  // 0-255
buf[5] = left_motor_strength;   // 0-255
buf[6] = led_red_level;         // 0-255
buf[7] = led_green_level;       // 0-255
buf[8] = led_blue_level;        // 0-255

DWORD bytes_written;
/* Don't wrap WriteFile in assert(): the call would disappear in NDEBUG builds. */
BOOL ok = WriteFile(hid_device, buf, sizeof(buf), &bytes_written, NULL);
assert(ok && bytes_written == 32);
(EDIT: fixed buffer offsets)
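To size the output buffer correctly for other devices, the report length can be read from the HID capabilities. A sketch reusing hid_device from above (hidsdi.h/hidpi.h come from the Windows SDK/WDK; link against hid.lib):
#include <windows.h>
#include <hidsdi.h>
#include <hidpi.h>

PHIDP_PREPARSED_DATA preparsed = NULL;
HIDP_CAPS caps;
if (HidD_GetPreparsedData(hid_device, &preparsed)) {
    HidP_GetCaps(preparsed, &caps);
    /* caps.OutputReportByteLength includes the leading report-ID byte:
       allocate exactly this many bytes, put the report ID in byte 0,
       fill in the payload, then WriteFile() the whole buffer. */
    HidD_FreePreparsedData(preparsed);
}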
