What is an efficient way of breaking data into chunks in C? - c

I've been searching for hours and my google-fu has failed me so thought I'd just ask. Is there an easy and efficient way of breaking data into small chunks in C?
For example. If I collect a bunch of info from somewhere; database, file, user input, whatever. Then maybe use a serialization library or something to create a single large object in memory. I have the pointer to this object. Let's say somehow this object ends up being like... 500 kb or something. If your goal was to break this down into 128 byte sections. What would you do? I would like a kind of general answer, whether you wanted to send these chunks over network, store them in a bunch of little files, or pass them through a looped process or something. If there is not a simple process for all, but if there does exist some for specific use cases, that'd be cool to know too.
What has brought this question about: I've been learning network sockets and protocols. I often see discussion about packet fragmentation and the like. Lots of talk about chunking things and sending them in smaller parts. But I can never seem to find what they use to do this before they move on to how they send it over the network, which seems like the easy part... So I started wondering how large data would manually be broken up into chunks to send small bits over the socket at a time. And here we are.
Thanks for any help!

Is there an easy and efficient way of breaking data into small chunks in C?
Data is practically a consecutive sequence of bytes.
You could use memmove to copy or move it and slice it in smaller chunks (e.g. of 1024 bytes each). For non-overlapping data, consider memcpy. In practice, a byte is often a char (perhaps an unsigned char or a signed char) but see also the standard uint8_t type and related types. In practice, you can cast void* from or to char* on Von Neumann architectures (like x86 or RISC-V).
Beware of undefined behavior.
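As a minimal sketch of the memcpy approach, the helper below copies one 128-byte chunk at a time out of a larger buffer. The names chunk_count and get_chunk and the CHUNK_SIZE macro are hypothetical, chosen only for this illustration. The last chunk is usually shorter than the others, so the helper returns the number of bytes actually copied rather than reading past the end of the data (which would be undefined behavior).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 128  /* chunk size used in the question; any positive value works */

/* Number of chunks needed to cover len bytes (the last one may be partial). */
size_t chunk_count(size_t len)
{
    return len / CHUNK_SIZE + (len % CHUNK_SIZE != 0);
}

/*
 * Copy chunk number `index` of `data` into `out`, which must have room for
 * CHUNK_SIZE bytes. Returns the number of bytes copied: CHUNK_SIZE for a
 * full chunk, fewer for the final chunk, 0 if `index` is past the end.
 */
size_t get_chunk(const void *data, size_t len, size_t index, unsigned char *out)
{
    assert(data != NULL && out != NULL);

    if (index >= chunk_count(len))
        return 0;

    size_t offset = index * CHUNK_SIZE;
    size_t n = len - offset;          /* bytes remaining from this offset */
    if (n > CHUNK_SIZE)
        n = CHUNK_SIZE;

    /* Source and destination do not overlap, so memcpy is sufficient. */
    memcpy(out, (const unsigned char *)data + offset, n);
    return n;
}
```

A caller would loop index from 0 up to chunk_count(len) and hand each chunk to whatever consumes it (a socket, a file, another function). If the consumer can read directly from the original buffer, the copy is not needed at all: a pointer to data + offset plus a length describes the chunk just as well.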
In practice I would recommend organizing data at a higher level.
If your operating system is Linux or Windows or MacOSX or Android, you could consider using a database library such as sqlite (or indexed files à la Tokyo Cabinet). It is open source software, and it does such slicing at the disk level.
If you have no operating system and your C code is freestanding (read the C11 standard n1570 for the terminology), things become different. For example, a typical computer mouse contains a micro-controller whose code is mostly in C. Look into Arduino for inspiration (and also Raspberry Pi). You'll then have to handle data at the bit level.
But I can never seem to find what they use to do this before they move on to how they send it over the network, which seems like the easy part...
You'll find lots of open source network code.
The Linux kernel has some. FreeRTOS has some. FreeBSD has some. Xorg has some. Contiki has some. OSdev links to more resources (notably on github or gitlab). You could download such source code and study it.
You'll find many HTTP (libonion, libcurl, etc...) or SMTP (postfix, vmime, etc...) related networking open source programs on Linux etc... And other network programs (PostgreSQL, etc...). Study their source code.
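For the step the question asks about, sending a large buffer in small pieces, no special library is needed. Here is a hedged sketch using POSIX write() on a file descriptor; the function name write_chunked is hypothetical, and the loop assumes a blocking descriptor. write() may accept fewer bytes than requested, so the loop advances by whatever was actually written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to fd, passing at most chunk_size bytes to each
 * write() call. Returns 0 on success, -1 on error (errno is set by write()).
 */
int write_chunked(int fd, const void *buf, size_t len, size_t chunk_size)
{
    assert(chunk_size > 0);

    const unsigned char *p = buf;
    while (len > 0) {
        size_t want = len < chunk_size ? len : chunk_size;
        ssize_t n = write(fd, p, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return -1;
        }
        /* A short write is normal: advance by what was actually written. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The same loop works on a connected stream socket descriptor. On a TCP socket the chunk boundaries are not preserved for the receiver, so a protocol that needs them usually sends a length before each chunk.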

Related

Coding C libraries for an Operating System

I am trying to create a DOS-like OS. I have read multiple articles and multiple books (I even paid for a lifetime subscription to O'Reilly Media), but to no avail: I found nothing useful. I want to learn how to make operating system libraries, which raises the question: are libraries which you code for a program the same if you are compiling them for an operating system?
I know Operating Systems are very challenging to make and the very few programmers that do attempt to make one never produce a functioning one, which is why it's described as "the great pinnacle of programming". But still, I'm going to make an attempt at making one (just for fun, and maybe to learn a few pointers on the way).
All I need to do this is basically learning how to make the libraries, C (which I already know and love), assembly (which I kind-of know how to use along with C) and a compiler (I use the GNU toolchain). The only thing I am having trouble with is coding the libraries. I'm like wow right now, who knew that coding libraries is so hard, like point me to a book or something! But no. I'm not asking for a book right here, all I'm asking for is some advice on how to do this like:
How do you start making some basic I/O libraries
Is it the same as making a regular C library
And finally, is it going to be hard? (JK I know already that this is going to be extremely hard which is why I prepared so much)
In summary, the main question is, how I can make this work or is there a pre-built library that would most likely speed up the process?
Are libraries which you code for a program the same if you are compiling it for an operating system?
Absolutely not. A user-space C library at its lowest level makes system calls to an operating system to interact with hardware via device drivers; it is the device driver and interaction with hardware you will be writing.
From my experience doing embedded system bringups, the way you start is with a development board with a legacy RS-232 port. It's about the easiest possible device to write a driver for - you write bytes to a memory mapped IO address, wait a bit then write some more. This is where your first debug output goes too.
You might find yourself waggling IO pins and probing them with a logic analyser or DSO on the route to this though - hence why you want a development board where the signals are accessible.
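To illustrate the "write bytes to a memory mapped IO address, wait a bit then write some more" idea, here is a rough sketch of a polled serial output routine. The register layout (struct uart_regs), the ready bit (UART_TX_READY) and the function names are all hypothetical; a real UART's registers and base address come from the board's datasheet.

```c
#include <stdint.h>

/* Hypothetical register block; real layouts come from the device datasheet. */
struct uart_regs {
    volatile uint32_t data;    /* writing a byte here transmits it */
    volatile uint32_t status;  /* bit 0 set: transmitter can accept a byte */
};

#define UART_TX_READY 0x1u     /* hypothetical status bit */

/* Send one character, busy-waiting until the transmitter is ready. */
void uart_putc(struct uart_regs *uart, char c)
{
    while ((uart->status & UART_TX_READY) == 0) {
        /* volatile forces the status register to be re-read each time */
    }
    uart->data = (uint8_t)c;
}

/* Send a NUL-terminated string: the first "debug print" of a bringup. */
void uart_puts(struct uart_regs *uart, const char *s)
{
    while (*s != '\0')
        uart_putc(uart, *s++);
}
```

On real hardware the pointer would be a fixed address cast to struct uart_regs *. On a development host the same functions can be exercised against an ordinary struct in RAM with the ready bit set, which is a convenient way to test the logic before the board arrives.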
None of the standard C-library will be available to you - so you'll need equivalents of some of the things it provides - but in kernel space - including type definitions, memory management, and intrinsics the compiler expects - particularly those for memory barriers. The C-library provides very little in the way of data structures or algorithms anyway, but you'll definitely be wanting to write some early on.

Is there a way to get battery info (status, plugged in, etc) without reading a proc/sys file on linux?

I want to get information about the battery in C on linux. I don't want to read or parse any file! Is there any low-level interface to acpi/the kernel or any other module to get the information I want to have?
I already searched the web, but every question results in the answer "parse /proc/foo/bar". I really don't want to do this because I think, low-level interfaces won't change as fast as Files do.
best regards.
The /proc filesystem does not exist on a disk. Instead, the kernel creates it in memory, and its files are generated on demand by the kernel when they are accessed. As such, your concerns are invalid -- the /proc files will change as quickly as the kernel becomes aware of changes.
Check this for more info about /proc file system.
In any case, I don't believe there's any alternative interface.
You might be looking for UPower: http://upower.freedesktop.org/
This is a common need for both desktop environments and mobile devices, so there have been many solutions over time. For example, one of the oldest ones was acpid, which is pretty much obsolete now.
While I'd recommend using a light-weight abstraction like UPower for code clarity reasons, the files in /proc and (to some extent) /sys are considered part of the Linux kernel ABI, which means that changing them is generally frowned upon.
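If you do end up reading the kernel's files directly, the parsing is small. Below is a sketch of a helper that reads the first line of a one-value text file; the name read_first_line is hypothetical. Battery attributes are typically exposed under /sys/class/power_supply/, but the device directory name (BAT0 in the usage note below) is an assumption that varies between machines.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a small text file into buf and strip the trailing
 * newline. Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
int read_first_line(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL && size > 0);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(buf, (int)size, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if present */
    return 0;
}
```

For example, read_first_line("/sys/class/power_supply/BAT0/capacity", buf, sizeof buf) would leave a percentage such as "87" in buf on a machine that has a battery registered under that name, and the status attribute in the same directory holds a word such as "Charging" or "Discharging".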

File in both KLM and user space

I remember reading about this concept somewhere. I do not remember where though.
I have a file, say file.c, which I compile along with some other files as a library for use by applications.
Now suppose I compile the same file and build it with a kernel module. Hence now the same file object is in both user space and kernel space, and it allows me to access kernel data structures without invoking a system call. I mean I can have APIs in the library by which applications can access kernel data structures without system calls. I am not sure if I can write anything into the kernel (which I think is impossible in this manner), but reading some data structures from the kernel this way would be fine?
Can anyone give me more details about this approach. I could not find anything in google regarding this.
I believe this is a conceptually flawed approach, unless I misunderstand what you're talking about.
If I understand you correctly, you want to take the same file and compile it twice: once as a module and once as a userspace program. Then you want to run both of them, so that they can share memory.
So, the obvious problem with that is that even though the programs come from the same source code, they would still exist as separate executables. The module won't be its own process: it would only get invoked when the kernel gets going (i.e. system calls). So by itself, it doesn't let you escape the system call nonsense.
A better solution depends on what your goal is: (1) do you simply want to access kernel data structures because you need something that you can't normally get at? Or (2) are you concerned about performance and want to access these structures faster than a system call?
For (1), you can create a character device or a procfs file. Both of these allow your userspace programs to reach their dirty little fingers into the kernel.
For (2), you are in a tough spot, and the problem gets a lot nastier (and more interesting). How to solve the speed issue depends a lot on what exact data you're trying to extract.
Does this help?
There are two ways to do this, the most common being what's called a Character Device, and the other being a Block Device (i.e. something "disk-like").
Here's a guide on how to create drivers that register chardevs.

Reading complex binary file formats

Is there any book or tutorial that can teach me how to read binary files with a complex structure? I made a lot of attempts to write a program that reads a complex file format and saves it in a struct. But it always failed because of heap overruns etc. that made the program crash.
Probably your best bet is to look for information on binary network protocols rather than file formats. The main issues (byte order, structure packing, serializing and unserializing pointers, ...) are the same but networking people tend to be more aware of the issues and more explicit in how they are handled. Reading and writing a blob of binary to or from a wire really isn't much different than dealing with binary blobs on disk.
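As a small sketch of how the byte-order and structure-packing issues are usually handled, the code below decodes fields one byte at a time instead of casting the buffer to a struct, and checks lengths before trusting them. The 8-byte header layout (a 4-byte magic followed by a 4-byte little-endian payload length) and all names are hypothetical, invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value, independent of host byte order. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Hypothetical on-disk header: magic, then payload length, both 32-bit LE. */
struct blob_header {
    uint32_t magic;
    uint32_t payload_len;
};

#define HEADER_BYTES 8   /* size of the header as stored, not sizeof(struct) */

/*
 * Parse the header from buf. Returns 0 on success, -1 if the buffer is too
 * short for the header or for the payload length the header declares.
 */
int parse_header(const unsigned char *buf, size_t len, struct blob_header *out)
{
    assert(buf != NULL && out != NULL);

    if (len < HEADER_BYTES)
        return -1;

    out->magic = read_u32_le(buf);
    out->payload_len = read_u32_le(buf + 4);

    /* Never trust a length field: check it against what was actually read. */
    if (out->payload_len > len - HEADER_BYTES)
        return -1;
    return 0;
}
```

Checking every length and offset from the file against the size of the buffer before using it is what prevents the heap overruns described in the question.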
You could also find a lot of existing examples in open source graphics packages (such as netpbm or The Gimp). An open source office package (such as LibreOffice) would also give you lots of example code that deals with complex and convoluted binary formats.
There might even be something of use for you in Google's Protocol Buffers or old-school ONC RPC and XDR.
I don't know any books or manuals on such things but maybe a bunch of real life working examples will be more useful to you than a HOWTO guide.
One of the best tools to debug memory access problems is valgrind. I'd give that a try next time. As for books, you'd need to be more specific about what formats you want to parse. There are lots of formats and many of them are radically different from each other.
Check out Flavor. It allows you to specify the format using C-like structure and will auto-generate the parser for the data in C++ or Java.

How to implement a very simple filesystem?

I am wondering how the OS is reading/writing to the hard drive.
I would like as an exercise to implement a simple filesystem with no directories that can read and write files.
Where do I start?
Will C/C++ do the trick or do I have to go with a more low level approach?
Is it too much for one person to handle?
Take a look at FUSE: http://fuse.sourceforge.net/
This will allow you to write a filesystem without having to actually write a device driver. From there, I'd start with a single file. Basically create a file that's (for example) 100MB in length, then write your routines to read and write from that file.
Once you're happy with the results, then you can look into writing a device driver, and making your driver run against a physical disk.
The nice thing is you can use almost any language with FUSE, not just C/C++.
I found it quite easy to understand a simple filesystem while using the fat filesystem on the avr microcontroller.
http://elm-chan.org/fsw/ff/00index_e.html
Take a look at the code and you will figure out how FAT works.
For learning the ideas of a file system it's not really necessary to use a disk, I think. Just create an array of 512-byte byte-arrays. Just imagine this is your hard disk and start to experiment a bit.
Also you may want to have a look at some of the standard OS textbooks like http://codex.cs.yale.edu/avi/os-book/OS8/os8c/index.html
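The "array of 512-byte byte-arrays" idea can be sketched in a few lines. The names disk, block_read and block_write and the block count are arbitrary choices for this illustration; a filesystem experiment would build its allocation table and file metadata on top of these two calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE  512
#define BLOCK_COUNT 64    /* arbitrary: a 32 KiB pretend disk */

/* The pretend hard disk: BLOCK_COUNT blocks of BLOCK_SIZE bytes each. */
static unsigned char disk[BLOCK_COUNT][BLOCK_SIZE];

/* Copy block number `block` into out. Returns 0, or -1 if out of range. */
int block_read(size_t block, unsigned char *out)
{
    assert(out != NULL);
    if (block >= BLOCK_COUNT)
        return -1;
    memcpy(out, disk[block], BLOCK_SIZE);
    return 0;
}

/* Overwrite block number `block` with in. Returns 0, or -1 if out of range. */
int block_write(size_t block, const unsigned char *in)
{
    assert(in != NULL);
    if (block >= BLOCK_COUNT)
        return -1;
    memcpy(disk[block], in, BLOCK_SIZE);
    return 0;
}
```

Because the rest of the code only ever goes through block_read and block_write, the array can later be swapped for a large regular file or a real device without touching the filesystem logic.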
The answer to your first question is that, besides FUSE as someone else told you, you can also use Dokan, which does the same for Windows, and from there it is just a question of doing reads and writes to a physical partition (http://msdn.microsoft.com/en-us/library/aa363858%28v=vs.85%29.aspx (read particularly the section on Physical Disks and Volumes)).
Of course, on Linux or Unix, besides using something like FUSE, you only have to issue a read or write call to the wanted device in /dev/xxx (if you are root), and in these terms the Unices are more friendly or more insecure, depending on your point of view.
From there, try to implement a simple filesystem like FAT, or something more exotic like a tar filesystem, or even some simple filesystem based on Unix concepts like UFS or Minix, or just something that only logs the calls that are made and their arguments to a log file (this will help you understand the calls that are made to the filesystem driver during the regular use of your computer).
Now your second question (which is much simpler to answer): yes, C/C++ will do the trick, since they are the lingua franca of system development. Also, a lot of your example code will be in C/C++, so you will at least read C/C++ in your development.
Now for your third question: yes, this is doable by one person. For example, the original ext filesystem (widely known in the Linux world through its successors ext2 and ext3) was written largely by a single developer, Rémy Card, so don't think that these things aren't doable by a single person.
Now the final notes: remember that a real filesystem interacts with a lot of other subsystems in a regular kernel. For example, if you have a laptop and hibernate it, the filesystem has to flush all changes made to the open files. If you have a pagefile on the partition, or even if the pagefile has its own filesystem, that will affect your filesystem, particularly the block sizes: they will tend to be equal to the page size or a multiple of it, because it's easy to place a filesystem block in memory when it matches the page size (that's just one transfer).
And also security, since you will want to control the users and what files they read/write, and that usually means that before opening a file you will have to know what user is logged on and what permissions they have for that file. And obviously, without a filesystem, users can't run any program or interact with the machine. Modern filesystem layers also interact with the network subsystem, due to the fact that there are network and distributed filesystems.
So if you want to go and learn about doing kernel filesystems, those are some of the things you will have to worry about (besides knowing a VFS interface).
P.S.: If you want to make Unix permissions work on Windows, you can use something like what MS uses for NFS on the server versions of windows (http://support.microsoft.com/kb/262965)
