How to stream Apache Arrow RecordBatches in C?

I read some data from a PostgreSQL database, convert it into RecordBatches, and try to send the data to a client, but I do not properly understand how to use Apache Arrow C/GLib.
My information sources are the C++ docs, the Apache Arrow C/GLib reference manual, and the C/GLib GitHub files.
By following the usage description of Apache Arrow C++ and experimenting with the wrapper classes in C, I built this minimal example, which writes a RecordBatch into a buffer and (after theoretically sending and receiving the buffer) tries to read that buffer back into a RecordBatch. It fails, and I would be glad if you could point out my mistakes.
I omitted the error handling for readability. The code errors out when creating the GArrowRecordBatchStreamReader. If I use arrowbuffer or the buffer from the top to create the InputStream, the error reads [record-batch-stream-reader][open]: IOError: Expected IPC message of type schema but got record batch. If I use testBuffer, the error complains about an invalid IPC stream, so the data is just corrupt.
void testRecordbatchStream(GArrowRecordBatch *rb){
GError *error = NULL;
// Write Recordbatch
GArrowResizableBuffer *buffer = garrow_resizable_buffer_new(300, &error);
GArrowBufferOutputStream *bufferStream = garrow_buffer_output_stream_new(buffer);
long written = garrow_output_stream_write_record_batch(GARROW_OUTPUT_STREAM(bufferStream), rb, NULL, &error);
// Use buffer as plain bytes
void *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
size_t length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
// Read plain bytes and test serialize function
GArrowBuffer *testBuffer = garrow_buffer_new(data, length);
GArrowBuffer *arrowbuffer = garrow_record_batch_serialize(rb, NULL, &error);
// Read RecordBatch from buffer
GArrowBufferInputStream *inputStream = garrow_buffer_input_stream_new(arrowbuffer);
GArrowRecordBatchStreamReader *sr = garrow_record_batch_stream_reader_new(GARROW_INPUT_STREAM(inputStream), &error);
GArrowRecordBatch *rb2 = garrow_record_batch_reader_read_next(sr, &error);
printf("Received RB: \n%s\n", garrow_record_batch_to_string(rb2, &error));
}

So my solution was to use the classes GArrowRecordBatchStreamWriter and GArrowRecordBatchStreamReader instead of the function garrow_output_stream_write_record_batch(), because the latter writes only a record batch, without the stream header and schema that the stream reader expects first. Furthermore, one has to access the data of the GArrowBuffer properly after writing: garrow_buffer_get_data() returns a GBytes, not a raw pointer. (Again, error handling omitted.)
GError *error = NULL;
GArrowResizableBuffer *buffer = garrow_resizable_buffer_new(4096, &error);
GArrowBufferOutputStream *bufferStream = garrow_buffer_output_stream_new(buffer);
GArrowSchema *schema = garrow_record_batch_get_schema(recordbatch);
GArrowRecordBatchStreamWriter *sw = garrow_record_batch_stream_writer_new(GARROW_OUTPUT_STREAM(bufferStream), schema, &error);
g_object_unref(bufferStream);
g_object_unref(schema);
gboolean test = garrow_record_batch_writer_write_record_batch(GARROW_RECORD_BATCH_WRITER(sw), recordbatch, &error);
GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
gsize datasize;
gconstpointer datap = g_bytes_get_data(data, &datasize);
GArrowBuffer *receivingBuffer = garrow_buffer_new(datap, datasize);
GArrowBufferInputStream *inputStream = garrow_buffer_input_stream_new(GARROW_BUFFER(receivingBuffer));
GArrowRecordBatchStreamReader *sr = garrow_record_batch_stream_reader_new(GARROW_INPUT_STREAM(inputStream), &error);
printf("Reading RecordBatch:\n");
GArrowRecordBatch *recordbatch2 = garrow_record_batch_reader_read_next(GARROW_RECORD_BATCH_READER(sr), &error);
printf("%s\n", garrow_record_batch_to_string(recordbatch2, &error));

Related

FFMPEG remux sample without writing to file

Let's consider this very nice and easy-to-use remux sample by horgh.
I'd like to achieve the same task: convert an RTSP H264-encoded stream to a fragmented MP4 stream.
This code does exactly that.
However, I don't want to write the MP4 to disk at all; I need a byte buffer or array in C with the contents that would normally be written to disk.
How is that achievable?
The sample uses vs_open_output to define the output format, and this function needs an output URL.
How should I modify the code to avoid writing the contents to disk?
Better alternatives are also welcome.
Update:
As szatmary recommended, I have checked his example link.
However, as I stated in the question, I need the output as a buffer instead of a file.
The example demonstrates nicely how I can read from a custom source and hand the data to ffmpeg.
What I need is a way to open the input in the standard way (with avformat_open_input), apply my custom modifications to the packets, and then write to a buffer instead of a file.
What have I tried?
Based on szatmary's example I created some buffers and initialization:
uint8_t *buffer;
buffer = (uint8_t *)av_malloc(4096);
format_ctx = avformat_alloc_context();
format_ctx->pb = avio_alloc_context(
buffer, 4096, // internal buffer and its size
1, // write flag (1=true, 0=false)
opaque, // user data, will be passed to our callback functions
0, // no read
&IOWriteFunc,
&IOSeekFunc
);
format_ctx->flags |= AVFMT_FLAG_CUSTOM_IO;
AVOutputFormat * const output_format = av_guess_format("mp4", NULL, NULL);
format_ctx->oformat = output_format;
avformat_alloc_output_context2(&format_ctx, output_format,
NULL, NULL)
Then of course I have created 'IOWriteFunc' and 'IOSeekFunc':
static int IOWriteFunc(void *opaque, uint8_t *buf, int buf_size) {
printf("Bytes read: %d\n", buf_size);
int len = buf_size;
return (int)len;
}
static int64_t IOSeekFunc (void *opaque, int64_t offset, int whence) {
switch(whence){
case SEEK_SET:
return 1;
break;
case SEEK_CUR:
return 1;
break;
case SEEK_END:
return 1;
break;
case AVSEEK_SIZE:
return 4096;
break;
default:
return -1;
}
return 1;
}
Then I need to write the header to the output buffer, and the expected behaviour here is to print "Bytes read: x":
AVDictionary * opts = NULL;
av_dict_set(&opts, "movflags", "frag_keyframe+empty_moov", 0);
av_dict_set_int(&opts, "flush_packets", 1, 0);
avformat_write_header(output->format_ctx, &opts)
In the last line during execution, it always runs into segfault, here is the backtrace:
#0 0x00007ffff7a6ee30 in () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
#1 0x00007ffff7a98189 in avformat_init_output () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
#2 0x00007ffff7a98ca5 in avformat_write_header () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
...
The hard thing for me with the example is that it uses avformat_open_input.
However, there is no such thing for the output (no avformat_open_output).
Update2:
I have found another example for reading: doc/examples/avio_reading.c.
There are mentions of a similar example for writing (avio_writing.c), but ffmpeg does not seem to ship it (at least I could not find it with Google).
Is this task really this hard to solve? Standard RTSP input to custom AVIO?
Fortunately ffmpeg.org is down. Great.

It was a silly mistake:
In the initialization part I called this:
avformat_alloc_output_context2(&format_ctx, output_format,
NULL, NULL)
However before this I already put the avio buffers into format_ctx:
format_ctx->pb = ...
Also, this line is unnecessary:
format_ctx = avformat_alloc_context();
Correct order:
AVOutputFormat * const output_format = av_guess_format("mp4", NULL, NULL);
avformat_alloc_output_context2(&format_ctx, output_format,
NULL, NULL);
format_ctx->pb = avio_alloc_context(
buffer, 4096, // internal buffer and its size
1, // write flag (1=true, 0=false)
opaque, // user data, will be passed to our callback functions
0, // no read
&IOWriteFunc,
&IOSeekFunc
);
format_ctx->flags |= AVFMT_FLAG_CUSTOM_IO;
format_ctx->oformat = output_format; // might be unnecessary too
Segfault is gone now.

You need to write an AVIOContext implementation.
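The write callback of such a custom AVIOContext is where the muxed bytes can be collected in memory instead of going to disk. Below is a minimal sketch of that idea using only the C standard library. The MemorySink type and the memory_sink_* names are hypothetical, not part of FFmpeg; the callback only mirrors the shape of the IOWriteFunc shown in the question (newer FFmpeg versions may declare the buffer parameter as const), and a real callback would probably return an FFmpeg error code rather than -1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte sink. A pointer to it would be passed as the
   `opaque` argument when creating the AVIOContext. */
typedef struct {
    uint8_t *data;
    size_t size;      /* bytes stored so far */
    size_t capacity;  /* bytes allocated */
} MemorySink;

/* Same shape as IOWriteFunc in the question: append buf_size bytes to the
   sink and report how many bytes were consumed, or -1 on failure. */
int memory_sink_write(void *opaque, uint8_t *buf, int buf_size) {
    MemorySink *sink = opaque;
    assert(sink != NULL);
    if (buf_size < 0) return -1;
    if (buf_size == 0) return 0;

    size_t needed = sink->size + (size_t)buf_size;
    if (needed > sink->capacity) {
        /* Grow geometrically so repeated small writes stay cheap. */
        size_t new_capacity = sink->capacity ? sink->capacity : 4096;
        while (new_capacity < needed) new_capacity *= 2;
        uint8_t *grown = realloc(sink->data, new_capacity);
        if (grown == NULL) return -1;
        sink->data = grown;
        sink->capacity = new_capacity;
    }
    memcpy(sink->data + sink->size, buf, (size_t)buf_size);
    sink->size += (size_t)buf_size;
    return buf_size;
}

/* Release the collected bytes and reset the sink. */
void memory_sink_free(MemorySink *sink) {
    free(sink->data);
    sink->data = NULL;
    sink->size = 0;
    sink->capacity = 0;
}
```

With this approach, the sink would start zero-initialized (MemorySink sink = {0};), &sink would take the place of opaque in the avio_alloc_context call, and after each flushed fragment sink.data and sink.size describe the bytes that would otherwise have been written to the file.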

Nanopb without callbacks

I'm using Nanopb to try and send protobuf messages from a VxWorks based National Instruments Compact RIO (9025). My cross compilation works great, and I can even send a complete message with data types that don't require extra encoding. What's getting me is the callbacks. My code is cross compiled and called from LabVIEW and the callback based structure of Nanopb seems to break (error out, crash, target reboots, whatever) on the target machine. If I run it without any callbacks it works great.
Here is the code in question:
bool encode_string(pb_ostream_t *stream, const pb_field_t *field, void * const *arg)
{
char *str = "Woo hoo!";
if (!pb_encode_tag_for_field(stream, field))
return false;
return pb_encode_string(stream, (uint8_t*)str, strlen(str));
}
extern "C" uint16_t getPacket(uint8_t* packet)
{
uint8_t buffer[256];
uint16_t packetSize;
ExampleMsg msg = {};
pb_ostream_t stream = pb_ostream_from_buffer(buffer, sizeof(buffer));
msg.name.funcs.encode = &encode_string;
msg.value = 17;
msg.number = 18;
pb_encode(&stream, ExampleMsg_fields, &msg);
packetSize = stream.bytes_written;
memcpy(packet, buffer, 256);
return packetSize;
}
And here's the proto file:
syntax = "proto2";
message ExampleMsg {
required int32 value = 1;
required int32 number = 2;
required string name = 3;
}
I have tried making the callback an extern "C" as well and it didn't change anything. I've also tried adding a nanopb options file with a max length and either didn't understand it correctly or it didn't work either.
If I remove the string from the proto message and remove the callback, it works great. It seems like the callback structure is not going to work in this LabVIEW -> C library environment. Is there another way I can encode the message without the callback structure? Or somehow embed the callback into the getPacket() function?
Updated code:
extern "C" uint16_t getPacket(uint8_t* packet)
{
uint8_t buffer[256];
for (unsigned int i = 0; i < 256; ++i)
buffer[i] = 0;
uint16_t packetSize;
ExampleMsg msg = {};
pb_ostream_t stream = pb_ostream_from_buffer(buffer, sizeof(buffer));
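msg.value = 17;
msg.number = 18;
char name[] = "Woo hoo!";
// With (nanopb).max_size set, msg.name is a plain char array, so no callback is assigned.
// msg is zero-initialized, so copying at most sizeof(msg.name) - 1 bytes keeps it terminated.
strncpy(msg.name, name, sizeof(msg.name) - 1);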
pb_encode(&stream, ExampleMsg_fields, &msg);
packetSize = stream.bytes_written;
memcpy(packet, buffer, sizeof(buffer));
return packetSize;
}
Updated proto file:
syntax = "proto2";
import "nanopb.proto";
message ExampleMsg {
required int32 value = 1;
required int32 number = 2;
required string name = 3 [(nanopb).max_size = 40];
}

You can avoid callbacks by giving a maximum size for the string field using the option (nanopb).max_size = 123 in the .proto file. Then nanopb can generate a simple char array in the structure (relevant part of documentation).
Regarding why callbacks don't work: just a guess, but try adding extern "C" also to the callback function. I assume you are using C++ there, so perhaps on that platform the C and C++ calling conventions differ and that causes the crash.
Does the VxWorks serial console give any more information about the crash? I don't remember whether it does that for functions called from LabVIEW, so running some test code directly from the VxWorks shell may also be worth a try.
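To illustrate the max_size approach: once the string field is a fixed-size char array, filling it is an ordinary bounded copy. The sketch below uses only the C standard library. ExampleMsgSketch is a hypothetical stand-in for the struct nanopb would generate for a max_size of 40 (the real generated header may differ), and example_msg_set_name is an illustrative helper, not a nanopb function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the generated message struct, assuming
   (nanopb).max_size = 40 turns the string field into a char array. */
typedef struct {
    int32_t value;
    int32_t number;
    char name[40];
} ExampleMsgSketch;

/* Copy text into the fixed-size field, including its terminator.
   Returns false and leaves the field untouched if the text does not fit. */
bool example_msg_set_name(ExampleMsgSketch *msg, const char *text) {
    assert(msg != NULL && text != NULL);
    size_t length = strlen(text);
    if (length >= sizeof msg->name) return false;
    memcpy(msg->name, text, length + 1);
    return true;
}
```

Rejecting oversized input, rather than silently truncating it, keeps the encoded message from carrying a cut-off name; whether truncation would be acceptable depends on the application.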

Perhaps the first hurdle is how the code handles strings.
LabVIEW's native string representation is not null-terminated like C, but you can configure LabVIEW to use a different representation or update your code to handle LabVIEW's native format.
LabVIEW stores a string in a special format in which the first four bytes of the array of characters form a 32-bit signed integer that stores how many characters appear in the string. Thus, a string with n characters requires n + 4 bytes to store in memory.
LabVIEW Help: Using Arrays and Strings in the Call Library Function Node
http://zone.ni.com/reference/en-XX/help/371361L-01/lvexcodeconcepts/array_and_string_options/
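If the library is to accept LabVIEW's native layout, the conversion to a C string could look roughly like the sketch below. It assumes the layout described above (a 32-bit signed length followed directly by the characters) and additionally assumes that the length is stored in the host's native byte order. The function name lv_string_to_cstring is hypothetical and not part of any NI header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Convert a length-prefixed string (4-byte signed count, then the characters,
   no terminator) into a newly allocated, null-terminated C string.
   Assumption: the count is in the host's native byte order.
   Returns NULL on a negative count or allocation failure; caller frees. */
char *lv_string_to_cstring(const uint8_t *lv_string) {
    int32_t count;
    assert(lv_string != NULL);
    /* memcpy avoids unaligned access if the buffer is not 4-byte aligned. */
    memcpy(&count, lv_string, sizeof count);
    if (count < 0) return NULL;

    char *out = malloc((size_t)count + 1);
    if (out == NULL) return NULL;
    memcpy(out, lv_string + sizeof count, (size_t)count);
    out[count] = '\0';
    return out;
}
```

The alternative mentioned above, configuring the Call Library Function Node to pass a C string pointer, avoids this conversion entirely.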

Redland RDF libraries: why does parsing model from Turtle without base URI cause error?

Why does the following test produce an error? Does Redland's turtle parser insist on a base URI even if all actual URIs are absolute? (Apache Jena apparently does not.) And how could I find out more about what actually went wrong (i.e. what API call would return an error description, or similar)?
librdf_world *world = librdf_new_world();
librdf_world_open(world);
librdf_storage *storage = librdf_new_storage(world, "memory", NULL, NULL);
librdf_model *model = librdf_new_model(world, storage, NULL);
librdf_parser* parser = librdf_new_parser(world, NULL, "text/turtle", NULL);
librdf_uri *baseUri = NULL;
const char *turtle = "<http://example.com/SomeSubject> <http://example.com/SomePredicate> <http://example.com/SomeObject> .";
int error = librdf_parser_parse_string_into_model(parser, (const unsigned char *)turtle, baseUri, model);

A base URI is required because the parser says so using RAPTOR_SYNTAX_NEED_BASE_URI flag. It produces the error before even looking at the content in raptor_parser_parse_start().
If you know a real base URI is not needed, you can supply a dummy URI such as . instead:
librdf_uri *baseUri = librdf_new_uri(world, (const unsigned char *)".");
To enable better error reports, you should register a logger with librdf_world_set_logger(); the default logger just writes to stderr. Return non-zero from the logger function to signal that you handled the message yourself. Example:
#include <stdio.h>
#include <librdf.h>
int customlogger(void *user_data, librdf_log_message *message) {
fputs("mad custom logger: ", stderr);
fputs(message->message, stderr);
fputs("\n", stderr);
return 1;
}
int main() {
librdf_world *world = librdf_new_world();
librdf_world_set_logger(world, /*user_data=*/ 0, customlogger);
librdf_world_open(world);
librdf_storage *storage = librdf_new_storage(world, "memory", NULL, NULL);
librdf_model *model = librdf_new_model(world, storage, NULL);
librdf_parser* parser = librdf_new_parser(world, NULL, "text/turtle", NULL);
librdf_uri *baseUri = NULL;
const char *turtle = "<http://example.com/SomeSubject> <http://example.com/SomePredicate> <http://example.com/SomeObject> .";
int error = librdf_parser_parse_string_into_model(parser, (const unsigned char *)turtle, baseUri, model);
}
Running this will result in
mad custom logger: Missing base URI for turtle parser
(For a real program, add some cleanup etc.)

libxml2 fails to parse from buffer but parses successfully from file

I have a function that writes an XML document to a buffer using the libxml2 writer, but when I try to parse the document from memory using xmlParseMemory, it only returns parser errors. I have also tried writing the document to a file and parsing it using xmlParseFile and it parses successfully.
This is how I initialize the writer and buffer for the xml document.
int rc, i = 0;
xmlTextWriterPtr writer;
xmlBufferPtr buf;
// Create a new XML buffer, to which the XML document will be written
buf = xmlBufferCreate();
if (buf == NULL)
{
printf("testXmlwriterMemory: Error creating the xml buffer\n");
return;
}
// Create a new XmlWriter for memory, with no compression.
// Remark: there is no compression for this kind of xmlTextWriter
writer = xmlNewTextWriterMemory(buf, 0);
if (writer == NULL)
{
printf("testXmlwriterMemory: Error creating the xml writer\n");
return;
}
// Start the document with the xml default for the version,
// encoding UTF-8 and the default for the standalone
// declaration.
rc = xmlTextWriterStartDocument(writer, NULL, ENCODING, NULL);
if (rc < 0)
{
printf
("testXmlwriterMemory: Error at xmlTextWriterStartDocument\n");
return;
}
I pass the xml document to another function to be validated using
int ret = validateXML(buf->content);
Here is the first part of validateXML
int validateXML(char *buffer)
{
xmlDocPtr doc;
xmlSchemaPtr schema = NULL;
xmlSchemaParserCtxtPtr ctxt;
char *XSDFileName = XSDFILE;
char *XMLFile = buffer;
int ret = 1;
doc = xmlReadMemory(XMLFile, sizeof(XMLFile), "noname.xml", NULL, 0);
doc is always NULL after calling this function, which means that it failed to parse the document.
Here are the errors that running the program returns
Entity: line 1: parser error : ParsePI: PI xm space expected
<?xm
^
Entity: line 1: parser error : ParsePI: PI xm never end ...
<?xm
^
Entity: line 1: parser error : Start tag expected, '<' not found
<?xm
^
I have been unable to figure this out for quite a while now and I am out of ideas. If anyone has any, I would be grateful if you would share it.

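You are using sizeof to determine the size of the XML data. For a char pointer, sizeof returns the size of the pointer itself (4 bytes on a 32-bit target, 8 on a 64-bit one), not the length of the data, which is why the parser only ever sees "<?xm". What you probably need is strlen.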
doc = xmlReadMemory(XMLFile, strlen(XMLFile), "noname.xml", NULL, 0);
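The difference can be shown without libxml2. The two helper functions below are purely illustrative (their names are made up for this sketch); both receive the same pointer, as validateXML does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof applied to a pointer parameter yields the size of the pointer,
   regardless of how much data it points to. */
size_t size_seen_by_sizeof(const char *text) {
    return sizeof(text);
}

/* strlen walks the data and counts the characters before the terminator. */
size_t size_seen_by_strlen(const char *text) {
    assert(text != NULL);
    return strlen(text);
}
```

For any XML document longer than a pointer, the first function reports too few bytes, so a parser given that length reads only a truncated prefix of the document.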

Load data to GdkPixbufLoader from g_input_stream_read

I load some data from file:
GInputStream* input_stream;
GFile *file = g_file_new_for_path(file_path);
input_stream = g_file_read(file,generator_cancellable ,NULL);
g_input_stream_read(input_stream, buffer, sizeof (buffer),generator_cancellable,error);
How can I load the result of g_input_stream_read into a GdkPixbufLoader object?
Thank you.

You need to create a new GdkPixbufLoader and pass the data you read from GInputStream to it:
GdkPixbufLoader *loader = gdk_pixbuf_loader_new ();
gint num_bytes = g_input_stream_read (input_stream, buffer, ...);
gdk_pixbuf_loader_write (loader, buffer, num_bytes, error);
However, this only makes sense if you read asynchronously or in chunks (e.g. to show a progressively loaded JPEG or PNG). If you just read all the data at once in a blocking manner, use the simpler gdk_pixbuf_new_from_stream().
