libxml2 fails to parse from buffer but parses successfully from file - c

I have a function that writes an XML document to a buffer using the libxml2 writer, but when I try to parse the document from memory using xmlParseMemory, it only returns parser errors. I have also tried writing the document to a file and parsing it using xmlParseFile and it parses successfully.
This is how I initialize the writer and buffer for the xml document.
int rc, i = 0;
xmlTextWriterPtr writer;
xmlBufferPtr buf;
// Create a new XML buffer, to which the XML document will be written
buf = xmlBufferCreate();
if (buf == NULL)
{
printf("testXmlwriterMemory: Error creating the xml buffer\n");
return;
}
// Create a new XmlWriter for memory, with no compression.
// Remark: there is no compression for this kind of xmlTextWriter
writer = xmlNewTextWriterMemory(buf, 0);
if (writer == NULL)
{
printf("testXmlwriterMemory: Error creating the xml writer\n");
return;
}
// Start the document with the xml default for the version,
// encoding UTF-8 and the default for the standalone
// declaration.
rc = xmlTextWriterStartDocument(writer, NULL, ENCODING, NULL);
if (rc < 0)
{
printf
("testXmlwriterMemory: Error at xmlTextWriterStartDocument\n");
return;
}
I pass the xml document to another function to be validated using
int ret = validateXML(buf->content);
Here is the first part of validateXML
int validateXML(char *buffer)
{
xmlDocPtr doc;
xmlSchemaPtr schema = NULL;
xmlSchemaParserCtxtPtr ctxt;
char *XSDFileName = XSDFILE;
char *XMLFile = buffer;
int ret = 1;
doc = xmlReadMemory(XMLFile, sizeof(XMLFile), "noname.xml", NULL, 0);
doc is always NULL after calling this function, which means that it failed to parse the document.
Here are the errors that running the program returns
Entity: line 1: parser error : ParsePI: PI xm space expected
<?xm
^
Entity: line 1: parser error : ParsePI: PI xm never end ...
<?xm
^
Entity: line 1: parser error : Start tag expected, '<' not found
<?xm
^
I have been unable to figure this out for quite a while now and I am out of ideas. If anyone has any, I would be grateful if you would share it.

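You are using sizeof to determine the size of the XML data. Applied to a char pointer, sizeof returns the size of the pointer itself (4 bytes on a 32-bit build, typically 8 on a 64-bit one), not the length of the data it points to. That is why the parser only ever sees "<?xm". What you probably need is strlen.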
doc = xmlReadMemory(XMLFile, strlen(XMLFile), "noname.xml", NULL, 0);
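The difference between the two size calculations can be seen without libxml2 at all. The sketch below is only an illustration; the helper names size_via_sizeof and size_via_strlen are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof on a pointer parameter measures the pointer itself,
   not the text it points to (this is the bug in the question). */
size_t size_via_sizeof(const char *xml)
{
    return sizeof(xml);
}

/* strlen counts the bytes up to the terminating NUL, which is the
   length a memory parser needs for a NUL-terminated document. */
size_t size_via_strlen(const char *xml)
{
    assert(xml != NULL);
    return strlen(xml);
}
```

For the string "<?xml version=\"1.0\"?>", size_via_strlen returns 21, while size_via_sizeof returns whatever sizeof(char *) is on the platform, regardless of how long the document is. A result of 4 would be consistent with the four bytes "<?xm" shown in the error output.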

How to stream Apache Arrow RecordBatches in C?

I read some data from a PostgreSQL database, convert it into RecordBatches and try to send the data to a client. But I fail to properly understand the usage of Apache Arrow C/GLib.
My information sources are the C++ docs, the Apache Arrow C/GLib reference manual and the C/GLib Github files.
By following the usage description of Apache Arrow C++ and experimenting with the wrapper classes in C, I built this minimal example that writes a RecordBatch into a buffer and (after theoretically sending and receiving the buffer) tries to read that buffer back into a RecordBatch. But it fails, and I would be glad if you could point out my mistakes!
I omitted the error handling for readability. The code errors out when creating the GArrowRecordBatchStreamReader. If I use the arrowbuffer or the buffer from the top when creating the InputStream, the error reads [record-batch-stream-reader][open]: IOError: Expected IPC message of type schema but got record batch. If I use the testBuffer, the error complains about an invalid IPC stream, so the data is just corrupt.
void testRecordbatchStream(GArrowRecordBatch *rb){
GError *error = NULL;
// Write Recordbatch
GArrowResizableBuffer *buffer = garrow_resizable_buffer_new(300, &error);
GArrowBufferOutputStream *bufferStream = garrow_buffer_output_stream_new(buffer);
long written = garrow_output_stream_write_record_batch(GARROW_OUTPUT_STREAM(bufferStream), rb, NULL, &error);
// Use buffer as plain bytes
void *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
size_t length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
// Read plain bytes and test serialize function
GArrowBuffer *testBuffer = garrow_buffer_new(data, length);
GArrowBuffer *arrowbuffer = garrow_record_batch_serialize(rb, NULL, &error);
// Read RecordBatch from buffer
GArrowBufferInputStream *inputStream = garrow_buffer_input_stream_new(arrowbuffer);
GArrowRecordBatchStreamReader *sr = garrow_record_batch_stream_reader_new(GARROW_INPUT_STREAM(inputStream), &error);
GArrowRecordBatch *rb2 = garrow_record_batch_reader_read_next(sr, &error);
printf("Received RB: \n%s\n", garrow_record_batch_to_string(rb2, &error));
}
So my solution was to use the class GArrowRecordBatchStreamWriter and Reader, instead of the function garrow_output_stream_write_record_batch(), because the latter only writes a record batch without a stream header and schema. Furthermore one has to properly access the data of the GArrowBuffer after writing. (Again, error handling omitted)
GError *error = NULL;
GArrowResizableBuffer *buffer = garrow_resizable_buffer_new(4096, &error);
GArrowBufferOutputStream *bufferStream = garrow_buffer_output_stream_new(buffer);
GArrowSchema *schema = garrow_record_batch_get_schema(recordbatch);
GArrowRecordBatchStreamWriter *sw = garrow_record_batch_stream_writer_new(GARROW_OUTPUT_STREAM(bufferStream), schema, &error);
g_object_unref(bufferStream);
g_object_unref(schema);
gboolean test = garrow_record_batch_writer_write_record_batch(GARROW_RECORD_BATCH_WRITER(sw), recordbatch, &error);
GBytes *data = garrow_buffer_get_data(GARROW_BUFFER(buffer));
gint64 length = garrow_buffer_get_size(GARROW_BUFFER(buffer));
gsize datasize;
gconstpointer datap = g_bytes_get_data(data, &datasize);
GArrowBuffer *receivingBuffer = garrow_buffer_new(datap, datasize);
GArrowBufferInputStream *inputStream = garrow_buffer_input_stream_new(GARROW_BUFFER(receivingBuffer));
GArrowRecordBatchStreamReader *sr = garrow_record_batch_stream_reader_new(GARROW_INPUT_STREAM(inputStream), &error);
printf("Reading RecordBatch:\n");
GArrowRecordBatch *recordbatch2 = garrow_record_batch_reader_read_next(GARROW_RECORD_BATCH_READER(sr), &error);
printf("%s\n", garrow_record_batch_to_string(recordbatch2, &error));

FFMPEG remux sample without writing to file

Let's consider this very nice and easy to use remux sample by horgh.
I'd like to achieve the same task: convert an RTSP H264 encoded stream to a fragmented MP4 stream.
This code does exactly this task.
However, I don't want to write the mp4 to disk at all; I need to get a byte buffer or array in C with the contents that would normally be written to disk.
How is that achievable?
This sample uses vs_open_output to define the output format, and this function needs an output url.
If I want to get rid of outputting the contents to disk, how should I modify this code?
There might be better alternatives as well; those are also welcome.
Update:
As szatmary recommended, I have checked his example link.
However, as I stated in the question, I need the output as a buffer instead of a file.
This example demonstrates nicely how I can read my custom source and give it to ffmpeg.
What I need is how to open the input in the standard way (with avformat_open_input), then do my custom modification with the packets, and then write to a buffer instead of writing to a file.
What have I tried?
Based on szatmary's example I created some buffers and initialization:
uint8_t *buffer;
buffer = (uint8_t *)av_malloc(4096);
format_ctx = avformat_alloc_context();
format_ctx->pb = avio_alloc_context(
buffer, 4096, // internal buffer and its size
1, // write flag (1=true, 0=false)
opaque, // user data, will be passed to our callback functions
0, // no read
&IOWriteFunc,
&IOSeekFunc
);
format_ctx->flags |= AVFMT_FLAG_CUSTOM_IO;
AVOutputFormat * const output_format = av_guess_format("mp4", NULL, NULL);
format_ctx->oformat = output_format;
avformat_alloc_output_context2(&format_ctx, output_format,
NULL, NULL)
Then of course I have created 'IOWriteFunc' and 'IOSeekFunc':
static int IOWriteFunc(void *opaque, uint8_t *buf, int buf_size) {
printf("Bytes read: %d\n", buf_size);
int len = buf_size;
return (int)len;
}
static int64_t IOSeekFunc (void *opaque, int64_t offset, int whence) {
switch(whence){
case SEEK_SET:
return 1;
break;
case SEEK_CUR:
return 1;
break;
case SEEK_END:
return 1;
break;
case AVSEEK_SIZE:
return 4096;
break;
default:
return -1;
}
return 1;
}
Then I need to write the header to the output buffer, and the expected behaviour here is to print "Bytes read: x":
AVDictionary * opts = NULL;
av_dict_set(&opts, "movflags", "frag_keyframe+empty_moov", 0);
av_dict_set_int(&opts, "flush_packets", 1, 0);
avformat_write_header(output->format_ctx, &opts)
In the last line during execution, it always runs into segfault, here is the backtrace:
#0 0x00007ffff7a6ee30 in () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
#1 0x00007ffff7a98189 in avformat_init_output () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
#2 0x00007ffff7a98ca5 in avformat_write_header () at /usr/lib/x86_64-linux-gnu/libavformat.so.57
...
The hard thing for me with the example is that it uses avformat_open_input.
However, there is no such thing for the output (no avformat_open_output).
Update2:
I have found another example for reading: doc/examples/avio_reading.c.
There are mentions of a similar example for writing (avio_writing.c), but ffmpeg does not have this available (at least in my google search).
Is this task really this hard to solve? standard rtsp input to custom avio?
Fortunately ffmpeg.org is down. Great.
It was a silly mistake:
In the initialization part I called this:
avformat_alloc_output_context2(&format_ctx, output_format,
NULL, NULL)
However before this I already put the avio buffers into format_ctx:
format_ctx->pb = ...
Also, this line is unnecessary:
format_ctx = avformat_alloc_context();
Correct order:
AVOutputFormat * const output_format = av_guess_format("mp4", NULL, NULL);
avformat_alloc_output_context2(&format_ctx, output_format,
    NULL, NULL);
format_ctx->pb = avio_alloc_context(
    buffer, 4096, // internal buffer and its size
    1, // write flag (1=true, 0=false)
    opaque, // user data, will be passed to our callback functions
    0, // no read
    &IOWriteFunc,
    &IOSeekFunc
);
format_ctx->flags |= AVFMT_FLAG_CUSTOM_IO;
format_ctx->oformat = output_format; // might be unnecessary too
Segfault is gone now.
You need to write an AVIOContext implementation.
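The IOWriteFunc in the question only prints the byte count and discards the data. A write callback that actually keeps the bytes could look like the sketch below. It uses only the C standard library; MemSink, mem_sink_write and mem_sink_free are hypothetical names introduced for illustration, and the callback signature is copied from the question's IOWriteFunc (newer FFmpeg releases may declare the buffer parameter const, so check the headers you build against).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte sink; a pointer to it would be passed
   as the `opaque` argument of avio_alloc_context. */
typedef struct MemSink {
    uint8_t *data;
    size_t   size;
    size_t   capacity;
} MemSink;

/* Same shape as IOWriteFunc in the question: append buf_size bytes and
   return how many were consumed, or a negative value on failure. */
int mem_sink_write(void *opaque, uint8_t *buf, int buf_size)
{
    MemSink *sink = opaque;
    assert(sink != NULL && buf_size >= 0);

    size_t needed = sink->size + (size_t)buf_size;
    if (needed > sink->capacity) {
        /* Grow geometrically so repeated small writes stay cheap. */
        size_t new_cap = sink->capacity ? sink->capacity : 4096;
        while (new_cap < needed)
            new_cap *= 2;
        uint8_t *grown = realloc(sink->data, new_cap);
        if (grown == NULL)
            return -1; /* real code would return an AVERROR value here */
        sink->data = grown;
        sink->capacity = new_cap;
    }
    if (buf_size > 0)
        memcpy(sink->data + sink->size, buf, (size_t)buf_size);
    sink->size = needed;
    return buf_size;
}

/* Release the collected bytes and reset the sink to its empty state. */
void mem_sink_free(MemSink *sink)
{
    assert(sink != NULL);
    free(sink->data);
    sink->data = NULL;
    sink->size = 0;
    sink->capacity = 0;
}
```

After muxing, sink.data and sink.size would hold the bytes that would otherwise have gone to the file. Note that this sink is append-only, so it assumes the muxer does not seek backwards; the frag_keyframe+empty_moov flags from the question are the usual way to get such a stream, but that assumption is worth verifying for your setup.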

Why is zlib deflate() hanging?

My issue is that my program hangs on use of zlib's deflate() function.
I first initialize my z_stream, as follows:
int setupGzipOutputStream(z_stream zStream) {
int zError;
zStream.zalloc = Z_NULL;
zStream.zfree = Z_NULL;
zStream.opaque = Z_NULL;
zError = deflateInit(&zStream, Z_COMPRESSION_LEVEL);
/* error handling code to test if zError != Z_OK... */
return EXIT_SUCCESS;
}
I attempt to write data to my z-stream with the following function:
int compressDataToGzipOutputStream(unsigned char *myData, z_stream zStream, Boolean flushZStreamFlag) {
int zError;
int zOutHave;
FILE *outFp = stdout;
unsigned char zBuffer[Z_BUFFER_MAX_LENGTH] = {0};
zStream.next_in = myData;
zStream.avail_in = strlen(myData); /* myData is a null-terminated string */
do {
zStream.avail_out = Z_BUFFER_MAX_LENGTH;
zStream.next_out = zBuffer;
zError = deflate(&zStream, (flushZStreamFlag == kFalse) ? Z_NO_FLUSH : Z_FINISH);
/* error handling code to test if zError != Z_OK... */
zOutHave = Z_BUFFER_MAX_LENGTH - zStream.avail_out;
fwrite(zBuffer, sizeof(unsigned char), zOutHave, outFp);
fflush(outFp);
} while (zStream.avail_out == 0);
return EXIT_SUCCESS;
}
I call these two functions (with simplifications for the purpose of asking this question) as follows:
z_stream zOutStream;
setupGzipOutputStream(zOutStream);
compressDataToGzipOutputStream(data, zOutStream, kFalse);
compressDataToGzipOutputStream(data, zOutStream, kFalse);
...
compressDataToGzipOutputStream(data, zOutStream, kTrue);
I then break down the zOutStream struct with deflateEnd().
The kTrue value on the last compression step sends the Z_FINISH flag to deflate(), instead of Z_NO_FLUSH.
It hangs on the following line:
zError = deflate(&zStream, (flushZStreamFlag == kFalse) ? Z_NO_FLUSH : Z_FINISH);
I then tried using gdb. I set a break at this line, the line where the program hangs.
At this breakpoint, I can see the values of the variables zStream, flushZStreamFlag and others. The zStream variable is not NULL, which I can verify with print zStream, print zStream.next_in, etc. which are populated with my data of interest.
If I type next in gdb, then this line of code is processed and the entire process hangs, which I verify with log statements before and after this line of code. The "before" log statement shows up, but the "after" statement does not.
My question is: Why is deflate() hanging here? Am I not initializing the output stream correctly? Not using deflate() correctly? I've been banging my head on the wall trying to solve this, but no luck. Thanks for any advice you might have.
Your functions should take a pointer to a z_stream, rather than passing the struct in. Your init function is initialising what is effectively a local copy, which will be discarded. Then your compression function will have a garbage z_stream passed to it.
e.g:
int setupGzipOutputStream(z_stream *zStream) {
int zError;
zStream->zalloc = Z_NULL;
...
}
... etc.
It also looks like your compression function is not taking into account the null on the end of the string, so that might cause you problems when you try to re-inflate your data.
zStream.avail_in = strlen(myData);
Might want to be:
zStream.avail_in = strlen(myData) + 1;
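The pass-by-value problem described at the start of this answer can be reproduced without zlib. The sketch below uses a made-up DemoStream struct as a stand-in for z_stream (it is not the real type), purely to show that a function receiving the struct by value cannot initialise the caller's copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for z_stream, just large enough to show the effect. */
typedef struct DemoStream {
    int         initialised;
    const char *next_in;
} DemoStream;

/* Receives a copy: the assignment is lost when the function returns,
   so the caller's struct is left untouched. */
void setup_by_value(DemoStream stream)
{
    stream.initialised = 1;
    (void)stream;
}

/* Receives the address: the caller's struct is updated in place. */
void setup_by_pointer(DemoStream *stream)
{
    assert(stream != NULL);
    stream->initialised = 1;
    stream->next_in = NULL;
}
```

Calling setup_by_value(s) leaves s.initialised at its old value, while setup_by_pointer(&s) sets it. With the real z_stream the same applies to deflateInit and to every later deflate call: all of them need to operate on one and the same struct.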

How to get rid of the libxml2 error messages

I want to use the libxml2 library to parse my xml files.
Now, when I have a bad xml file, the library itself prints large error messages.
Below is some sample code:
reader = xmlReaderForFile(filename, NULL, 0);
if (reader != NULL) {
ret = xmlTextReaderRead(reader);
while (ret == 1) {
printf("_________________________________\n");
processNode(reader);
ret = xmlTextReaderRead(reader);
printf("_________________________________\n");
}
xmlFreeTextReader(reader);
if (ret != 0) {
fprintf(stderr, "%s : failed to parse\n", filename);
}
}
In the above example, if I have a bad xml file, I get errors like this:
my.xml:4: parser error : attributes construct error
include type="text"this is text. this might be excluded in the next occurrence
my.xml:4: parser error : Couldn't find end of Start Tag include
include type="text"this is text. this might be excluded in the next occurrence
my.xml : failed to parse
Instead, I just want to return an error number and get rid of these ugly library messages.
What do I do?
The last parameter to xmlReaderForFile(filename, NULL, 0); is a set of option flags. Reading the documentation for these flags, I see there are two options you might want to set: XML_PARSE_NOERROR and XML_PARSE_NOWARNING. Note that I haven't tried any of this, I just Googled libxml2 and xmlReaderForFile.
You will need to OR the flags together like this:
reader = xmlReaderForFile(filename, NULL, XML_PARSE_NOERROR | XML_PARSE_NOWARNING);

parsing for xml values

I have a simple xml string defined in the following way in a c code:
char xmlstr[] = "<root><str1>Welcome</str1><str2>to</str2><str3>wonderland</str3></root>";
I want to parse the xmlstr to fetch all the values assigned to str1,str2,str3 tags.
I am using the libxml2 library. As I am less experienced in xml handling, I am unable to get the values of the required tags. I tried some sources from the net, but I end up with wrong outputs.
Using the libxml2 library parsing your string would look something like this:
char xmlstr[] = ...;
char *str1, *str2, *str3;
xmlDocPtr doc = xmlReadDoc(BAD_CAST xmlstr, "http://someurl", NULL, 0);
xmlNodePtr root, child;
if(!doc)
{ /* error */ }
root = xmlDocGetRootElement(doc);
Now that we have parsed a DOM structure out of your xml string, we can extract the values by iterating over all child nodes of your root tag:
for(child = root->children; child != NULL; child = child->next)
{
if(xmlStrcmp(child->name, BAD_CAST "str1") == 0)
{
str1 = (char *)xmlNodeGetContent(child);
}
/* repeat for str2 and str3 */
...
}
I usually do xml parsing using the minixml library.
I hope this will help you:
http://www.minixml.org/documentation.php/basics.html
