Is there any book or tutorial that can teach me how to read binary files with a complex structure? I have made many attempts to write a program that reads a complex file format into a struct, but they always failed with crashes from heap overruns and the like.
Probably your best bet is to look for information on binary network protocols rather than file formats. The main issues (byte order, structure packing, serializing and unserializing pointers, ...) are the same but networking people tend to be more aware of the issues and more explicit in how they are handled. Reading and writing a blob of binary to or from a wire really isn't much different than dealing with binary blobs on disk.
You could also find a lot of existing examples in open source graphics packages (such as netpbm or The Gimp). An open source office package (such as LibreOffice) would also give you lots of example code that deals with complex and convoluted binary formats.
There might even be something of use for you in Google's Protocol Buffers or old-school ONC RPC and XDR.
I don't know any books or manuals on such things but maybe a bunch of real life working examples will be more useful to you than a HOWTO guide.
One of the best tools to debug memory access problems is valgrind. I'd give that a try next time. As for books, you'd need to be more specific about what formats you want to parse. There are lots of formats and many of them are radically different from each other.
Check out Flavor. It allows you to specify the format using C-like structure and will auto-generate the parser for the data in C++ or Java.
I've been searching for hours and my google-fu has failed me so thought I'd just ask. Is there an easy and efficient way of breaking data into small chunks in C?
For example: say I collect a bunch of info from somewhere (database, file, user input, whatever), then use a serialization library or something to create a single large object in memory, and I have a pointer to this object. Say the object ends up being around 500 kB. If your goal was to break this down into 128-byte sections, what would you do? I would like a general answer, whether you wanted to send these chunks over a network, store them in a bunch of little files, or pass them through a looped process. If there is no single process that covers all of these, but there are approaches for specific use cases, that would be good to know too.
What has brought this question about: I've been learning network sockets and protocols. I often see discussion about packet fragmentation and the like. Lots of talk about chunking things and sending them in smaller parts. But I can never seem to find what they use to do this before they move on to how they send it over the network, which seems like the easy part... So I started wondering how large data would manually be broken up into chunks to send small bits over the socket at a time. And here we are.
Thanks for any help!
Is there an easy and efficient way of breaking data into small chunks in C?
Data is practically a consecutive sequence of bytes.
You could use memmove to copy or move it and slice it into smaller chunks (e.g. of 1024 bytes each); for non-overlapping data, memcpy is enough. In practice, a byte is often a char (perhaps an unsigned char or a signed char), but see also the standard uint8_t type and its relatives. In C, a void* converts to and from char* freely, so you can address the data byte by byte.
Beware of undefined behavior.
In practice I would recommend organizing data at a higher level.
If your operating system is Linux, Windows, MacOSX, or Android, you could consider using a database library such as sqlite (or indexed files à la Tokyo Cabinet). It is open source software and does such slicing at the disk level for you.
If you have no operating system and your C code is freestanding (read the C11 standard n1570 for the terminology), things become different. For example, a typical computer mouse contains a micro-controller whose code is mostly in C. Look into Arduino for inspiration (and also the Raspberry Pi). You'll then have to handle data at the bit level.
But I can never seem to find what they use to do this before they move on to how they send it over the network, which seems like the easy part...
You'll find lots of open source network code.
The Linux kernel has some. FreeRTOS has some. FreeBSD has some. Xorg has some. Contiki has some. OSdev links to more resources (notably on github or gitlab). You could download such source code and study it.
You'll find many HTTP (libonion, libcurl, etc.) or SMTP (postfix, vmime, etc.) related open source networking programs on Linux, and other network programs too (PostgreSQL, etc.). Study their source code.
I'm encoding images as video with FFmpeg using custom C code rather than Linux commands, because I am developing the code for an embedded system.
I am currently following through the first dranger tutorial and the code provided in the following question.
How to encode a video from several images generated in a C++ program without writing the separate frame images to disk?
I have found some "less abstract" code in the following github location.
https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/encode_video.c
And I plan to use it as well.
My end goal is simply to save video on an embedded system using embedded C source code, and I am coming up the curve too slowly. So in summary my question is: does it seem like I am following the correct path here? I know that my system does not come with hardware for video codec conversion, which means I need to do it in software, but I am unsure whether FFmpeg is even a feasible option for embedded work because I have yet to compile it.
The biggest red flag for me thus far is that FFmpeg uses dynamic memory allocation. I am unfamiliar with how to assess the amount of dynamic memory that it uses. This is very important information to me, and if anyone is familiar with the amount of memory used or how to assess it before compiling, I would greatly appreciate the input.
After further research, it seems to me that encoding video is often a hardware-intensive task that can use multiple processors and gigabytes of RAM. To avoid this, I am performing a minimal amount of compression by using the AVI format.
I have found that FFmpeg can't readily be used for bare-metal embedded systems because the initial "make" of the library sets up configuration settings specific to the machine doing the compiling, which conflicts with the need to cross-compile. I can see that there are cross-compilation flags available, but I have not found documentation describing how to use them. Either way, I want to avoid big heaps and multi-threading, so I moved on.
I decided to look for more basic source code elsewhere. mikekohn.net/file_formats/libkohn_avi.php is a great resource for very basic encoding without any complicated library dependencies or multi-threading. I have yet to implement it, so no guarantees, but best of luck. It is one of the only understandable encoding sources I have found for image-to-video applications, other than https://www.jonolick.com/home/mpeg-video-writer. However, Jon Olick's source code uses lossy encoding and a minimum framerate (inherent to MPEG), both of which I am trying to avoid.
The longer I work as a C developer, the more I find myself lacking a source of middle-sized code chunks.
I have sources of code snippets and of libraries, but I can't find a good source for code sized in between: something that is a header, or a header plus implementation file, that isn't a library but is included directly into the project.
Stuff like a dynamic array, or linked list or some debugging or logging helpers.
I know that it's partially due to C developers' DIY mentality, but I just don't believe that people don't share stuff like this.
You might want to check out http://nothings.org for some single file (moderately sized) projects that include (image) decompression, font rasterization and other useful things.
You may also want to look at CCAN.
http://www.koders.com/ is worth checking; you might find something useful now and then.
You can also sort the results by license, which is a pretty handy feature.
There's a handful of utility libraries that spring to mind quickly; glib provides a wide variety of useful little utilities, including:
doubly- and singly-linked lists, hash tables, dynamic strings and string utilities, such as a lexical scanner, string chunks (groups of strings), dynamic arrays, balanced binary trees, N-ary trees
(And yes, glib is useful even in non-graphical environments; don't let its GNOME-background fool you. :)
The Apache portable runtime is a library that helps abstract away platform-specific knowledge; I've seen a handful of programs use it. It feels like enough programmers are content with "It runs on Linux" to not really worry about platform differences, and forgo learning Yet Another Library as a result. It feels more like a systems-level toolkit:
Memory allocation and memory pool functionality, Atomic operations, Dynamic library handling, File I/O, Command argument parsing, Locking, Hash tables and arrays, Mmap functionality, Network sockets and protocols, Thread, process and mutex functionality, Shared memory functionality, Time routines, User and group ID services
I always look at the Python (C) source code first when I am looking for the "best" way to code something up in C. Guido van Rossum's C coding style is concise and clear, and given the number of functions and features supported in the standard Python libraries, there is nearly always a useful and relevant snippet of code in there.
Is there a standard way of reading configuration files, like INI files, on Linux using C?
I am working on a Linux based handheld and writing code in C.
Otherwise, I would like to know about any alternatives.
Final update:
I have explored and even used libconfig, but its footprint is high and my usage is too simple. So, to reduce the footprint, I rolled my own implementation. The implementation is not generic; in fact it is quite coupled as of now. The configuration file is parsed once at application startup and the values are stored in some global variables.
Try libconfig:
a simple library for processing structured configuration files, like this one: test.cfg. This file format is more compact and more readable than XML. And unlike XML, it is type-aware, so it is not necessary to do string parsing in application code.
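For illustration, a hypothetical test.cfg in libconfig's syntax might look like this (the setting names here are invented):

```
# A hypothetical test.cfg: values carry their type in the syntax,
# so the application never has to parse strings into numbers.
name = "handheld";

window:
{
  width      = 320;
  height     = 240;
  fullscreen = false;
};

log_level = 2;
```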
Libconfig is very compact — a fraction of the size of the expat XML parser library. This makes it well-suited for memory-constrained systems like handheld devices.
The library includes bindings for both the C and C++ languages. It works on POSIX-compliant UNIX and UNIX-like systems (GNU/Linux, Mac OS X, Solaris, FreeBSD), Android, and Windows (2000, XP and later)...
No, there isn't one standard way. I'm sorry, but that is probably the most precise answer :)
You could look at this list of Linux configuration file libraries, though. That might be helpful.
Here are four options:
Iniparser
libini
sdl-cfg
RWini
If you can use the (excellent, in any C-based application) glib, it has a key-value file parser that is suitable for .ini-style files. Of course, you'd also get access to the various (very nice) data structures in glib, "for free".
There is an updated fork of iniparser at CCAN; the original author has not been able to give it much attention over the years. Disclaimer: I maintain it.
Additionally, iniparser contains a dictionary that is very useful on its own.
If you need fast and small code just for reading config files, I suggest inih.
It loads the config file content just once, parses it, and calls a callback function for each key/value pair.
It is really small and can be used on embedded systems too.
I hate to suggest something entirely different by proposing XML, but libexpat is pretty minimal and does XML.
I came to this conclusion because I had the same question as you did, but then I realized the project already had libexpat linked in, and I should probably just use that.
I'm looking into a mechanism for serializing data to be passed over a socket or shared memory in a language-independent way. I'm reluctant to use XML since this data is going to be very structured and encoding/decoding speed is vital. Having a good C API that's liberally licensed is important, but ideally there should be support for a ton of other languages. I've looked at Google's protocol buffers and ASN.1. Am I on the right track? Is there something better? Should I just implement my own packed structure and not look for some standard?
Given your requirements, I would go with Google Protocol Buffers. It sounds like it's ideally suited to your application.
You could consider XDR. It has an RFC. I've used it and never had any performance problems with it. It was used in ONC RPC and comes with a tool called rpcgen. It is also easy to write a generator yourself when you just want to serialize data (which is what I ended up doing for portability reasons; it took me half a day).
There is an open source C implementation, but it can already be in a system library, so you wouldn't need the sources.
ASN.1 always seemed a bit baroque to me, but depending on your actual needs might be more appropriate, since there are some limitations to XDR.
Just wanted to throw in ASN.1 into this mix. ASN.1 is a format standard, but there's libraries for most languages, and the C interface via asn1c is much cleaner than the C interface for protocol buffers.
JSON is really my favorite for this kind of stuff. I have no prior experience with binary stuff in it though. Please post your results if you are planning on using JSON!
Thrift is a binary format created by Facebook. Here's a comparison with google protocol buffers.
Check out Hessian
There is also Binary XML but it seems not stabilized yet. The article I link to gives a bunch of links which might be of interest.
Another option is SNAC/TLV, which is used by AOL in its OSCAR/AIM protocol.
Also check out Muscle. While it does quite a bit, it serializes to a binary format.
A few things you need to consider:
1. Storage
2. Encoding style (1-byte vs. 2-byte)
3. TLV standards
An ASN.1 parser is good for binary representations. The best part is that ASN.1 is a well-established technology that is widely used both within the ITU-T and outside it, and the notation is supported by a number of software vendors.