Fast 'C' library to transparently manage very large files - c

I need to save very large amounts of data (>500GB) which is being streamed (800Mb/s) from another device connected to my PC. The speed rules out use of a database, e.g. MySQL/ISAM, and I am looking for a fast, light library which sits on top of the 'C' stdio file library (i.e. fopen/fclose/fwrite) and which will allow me to write/read a very large file (up to the available disk space).
Behind-the-scenes, the large file can be broken up into smaller files e.g. 1GB and I want the API to take care of these details.
The data arrives at the PC in a compressed binary format and no further processing is needed before writing it to the hard-disk.
The library should work on Windows and Linux.
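To make the request concrete, the behaviour described above might look roughly like the following minimal sketch. Everything here is an assumption for illustration: the names seg_writer, seg_open, seg_write and seg_close are hypothetical and do not come from any existing library, and the segment naming scheme (base.000000, base.000001, ...) is arbitrary. It uses only stdio, so it should build on both Windows and Linux, but it has not been tuned for the data rate in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical API: one logical stream split over fixed-size segment files. */
typedef struct {
    char     base[200];  /* path prefix; segments are base.000000, base.000001, ... */
    size_t   seg_size;   /* maximum number of bytes per segment file */
    size_t   used;       /* bytes already written to the current segment */
    unsigned index;      /* number of the next segment to create */
    FILE    *fp;         /* current segment, or NULL if none is open yet */
} seg_writer;

static int seg_open(seg_writer *w, const char *base, size_t seg_size)
{
    assert(seg_size > 0);
    int n = snprintf(w->base, sizeof w->base, "%s", base);
    if (n < 0 || (size_t)n >= sizeof w->base)
        return -1;                       /* prefix too long */
    w->seg_size = seg_size;
    w->used = 0;
    w->index = 0;
    w->fp = NULL;
    return 0;
}

/* Close the current segment (if any) and start the next one. */
static int seg_next(seg_writer *w)
{
    char name[256];
    if (w->fp != NULL && fclose(w->fp) != 0) {
        w->fp = NULL;
        return -1;
    }
    snprintf(name, sizeof name, "%s.%06u", w->base, w->index++);
    w->fp = fopen(name, "wb");
    w->used = 0;
    return w->fp != NULL ? 0 : -1;
}

/* Write len bytes, rolling over to a new segment whenever one fills up. */
static int seg_write(seg_writer *w, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len > 0) {
        if (w->fp == NULL || w->used == w->seg_size) {
            if (seg_next(w) != 0)
                return -1;
        }
        size_t room = w->seg_size - w->used;
        size_t n = len < room ? len : room;
        if (fwrite(p, 1, n, w->fp) != n)
            return -1;
        w->used += n;
        p += n;
        len -= n;
    }
    return 0;
}

static int seg_close(seg_writer *w)
{
    int rc = 0;
    if (w->fp != NULL) {
        rc = fclose(w->fp);
        w->fp = NULL;
    }
    return rc == 0 ? 0 : -1;
}
```

A caller would call seg_open once with a prefix and a segment size such as 1GB, pass each received block to seg_write, and call seg_close at the end. Reading back would walk the same sequence of file names in order.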

If you need random access into the data, take a look at memory-mapped files.
They let you map a file (or a section of a file) into memory transparently, without having to explicitly allocate memory and read the data. This works on Windows and Linux (there is a Boost library that wraps the differences).
On Windows you can handle files much larger than 4GB on a 32-bit OS by using multiple windows into the file.
Edit: Sorry, 800Mb/s!! I don't know of any disks that can cope with that. You might be looking at a RAID array of SSD drives.
There used to be image capture cards that used an attached drive as a simple series of bytes, with no filesystem, to get very high-speed sustained writes. I don't know if you are going to need something like that.
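As a rough illustration of mapping a window into a large file, here is a minimal POSIX-only sketch (Linux, not Windows, where CreateFileMapping/MapViewOfFile would play the same role). The names file_window, window_map and window_unmap are hypothetical. The one detail worth showing is that mmap requires a page-aligned file offset, so the sketch rounds the offset down and remembers the slack.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* open(), used by the caller to obtain fd */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: one read-only view onto part of a file. */
typedef struct {
    void          *base;    /* page-aligned address returned by mmap */
    size_t         maplen;  /* number of bytes actually mapped */
    unsigned char *data;    /* points at the requested file offset */
} file_window;

/* Map len bytes of fd starting at offset; returns 0 on success, -1 on error. */
static int window_map(file_window *w, int fd, off_t offset, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len > 0 && offset >= 0);

    off_t  aligned = offset - (offset % page);   /* round down to a page boundary */
    size_t slack   = (size_t)(offset - aligned); /* extra bytes mapped before offset */

    void *p = mmap(NULL, len + slack, PROT_READ, MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    w->base   = p;
    w->maplen = len + slack;
    w->data   = (unsigned char *)p + slack;
    return 0;
}

/* Release the view; the file itself stays open. */
static int window_unmap(file_window *w)
{
    return munmap(w->base, w->maplen);
}
```

To walk a file that is larger than the address space, a program would map one window, process w.data, unmap it, and map the next window at a higher offset.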

For ultimate speed, I suggest you go highly platform specific.
The objective is to get as close as you can to connecting the input device directly to hard drive. One method is to write a driver for the input device that writes directly to the hard drive.
The generic algorithm is to use either a very large circular byte buffer or use multiple buffers. You need extra space to compensate for the speed difference between the input device and the output device; provided the input device is non-stop.
If you can pause the input device, the issue becomes easier.
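The circular byte buffer mentioned above could be sketched as follows. This is a minimal single-threaded version under stated assumptions: the names ring, ring_init, ring_put and ring_get are hypothetical, and a real capture program would run the producer (input device) and consumer (disk writer) in separate threads with proper synchronisation, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical circular byte buffer between an input device and a disk writer. */
typedef struct {
    unsigned char *buf;
    size_t cap;    /* capacity in bytes */
    size_t head;   /* next position to write to */
    size_t tail;   /* next position to read from */
    size_t count;  /* bytes currently stored */
} ring;

static void ring_init(ring *r, unsigned char *storage, size_t cap)
{
    assert(storage != NULL && cap > 0);
    r->buf = storage;
    r->cap = cap;
    r->head = r->tail = r->count = 0;
}

/* Producer side: stores up to len bytes and returns how many were accepted.
 * A return value smaller than len means the buffer is full, i.e. the input
 * is outrunning the output and data would have to be dropped or paused. */
static size_t ring_put(ring *r, const unsigned char *src, size_t len)
{
    size_t free_bytes = r->cap - r->count;
    if (len > free_bytes)
        len = free_bytes;
    size_t first = r->cap - r->head;          /* contiguous room before wrap */
    if (first > len)
        first = len;
    memcpy(r->buf + r->head, src, first);
    memcpy(r->buf, src + first, len - first); /* wrapped remainder, may be 0 */
    r->head = (r->head + len) % r->cap;
    r->count += len;
    return len;
}

/* Consumer side: removes up to len bytes and returns how many were copied. */
static size_t ring_get(ring *r, unsigned char *dst, size_t len)
{
    if (len > r->count)
        len = r->count;
    size_t first = r->cap - r->tail;          /* contiguous data before wrap */
    if (first > len)
        first = len;
    memcpy(dst, r->buf + r->tail, first);
    memcpy(dst + first, r->buf, len - first);
    r->tail = (r->tail + len) % r->cap;
    r->count -= len;
    return len;
}
```

The buffer capacity is the "extra space" from the answer: it has to cover the longest period during which the disk writes more slowly than the device delivers data.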

Related

How does GNU Radio File Sink work?

I want to know how the file sink in GNU Radio works. Does it receive a signal and then write it to the file, and while it's being written signal receiving is not done?
I just want to make sure if some portion of the signal is lost without being written to the file because of the time taken for writing.
Any help or reading material regarding this would be very much appreciated.
Depending on the sampling rate of the device, writing samples to a file without discontinuities may be impossible.
Instead of writing to disk, you can write the samples to a ramdisk. A ramdisk is an abstraction of file storage that uses RAM as the storage medium. Its great advantage is very fast read/write transfers. However, the file size is limited by the amount of RAM the host has.
Here is a good article that will help you to create a ramdisk under Linux. I am sure that you will easily find a guide for Windows too.
A file sink won't normally block your radio source as long as the average write speed exceeds the radio blocks output speed. There are internal buffers that can smooth things out a little bit, but if your disk fills up then the rest of your flowgraph will stall.
If you're not seeing "O" messages in the output console, you're not dropping samples.

Is there a Win32 API to copy a fragment of a file on another file?

I would like to programmatically copy a section of a file to another file. Is there any Win32 API I could use without moving bytes through my program? Or should I just read from the source file and write to the target?
I know how to do this by reading and writing chunks of bytes, I just wanted to avoid doing it myself if the OS already offers that.
What you're asking for can be achieved, but not easily. Device drivers routinely transfer data without CPU involvement, but doing that requires kernel-mode code. Basically, you would have to write a device driver. The benefits would have to be huge to justify the difficulties associated with developing, testing, and distributing a kernel-mode driver. So unless you think there is a huge benefit at stake here, I'm afraid that ReadFile/WriteFile are the best you can do.
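For completeness, the read-and-write loop that this answer falls back on might look like the sketch below. Note the assumptions: it uses portable C stdio (fseek/fread/fwrite) rather than the Win32 ReadFile/WriteFile calls named above, the function name copy_range is hypothetical, and the long offsets limit it to files under 2GB on platforms where long is 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copy up to count bytes from src, starting at byte offset src_off, to the
 * current position of dst. Returns the number of bytes copied (which is
 * smaller than count if src ends early), or -1 on a seek or write error. */
static long copy_range(FILE *src, long src_off, FILE *dst, long count)
{
    unsigned char buf[64 * 1024];
    long done = 0;

    assert(src != NULL && dst != NULL && src_off >= 0 && count >= 0);
    if (fseek(src, src_off, SEEK_SET) != 0)
        return -1;

    while (done < count) {
        size_t want = sizeof buf;
        if ((long)want > count - done)
            want = (size_t)(count - done);   /* last, partial chunk */
        size_t got = fread(buf, 1, want, src);
        if (got == 0)
            break;                           /* end of file or read error */
        if (fwrite(buf, 1, got, dst) != got)
            return -1;
        done += (long)got;
    }
    return done;
}
```

The chunk size of 64KB is an arbitrary choice; larger chunks reduce the number of calls at the cost of stack or heap space.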

Is it worth to implement small filesystem for an EEPROM

I have bought an I2C EEPROM. I want to store sensor and voltage data. I'm assuming that a value can be bigger than one byte, and there can be a lot of data. Is it worth implementing a filesystem with a small file allocation table in such a case? It would make it easier for me to look through the EEPROM, for example.
I see two reasons for a FAT on an EEPROM:
If there is a requirement for the flexibility of having different files, such as for data logging or configurations. It allows multiple such configuration/log files to be independent and easily added in the future. This can be a very successful building block for future projects.
For ease of access by other devices or libraries. This is typically only an option if the memory device is directly accessible through another interface, whereas in this case it is an EEPROM. If your device were directly USB capable, such as an ATmega32u4 (Leonardo), then you could use the LUFA tools to have it show up over USB as mass storage, making FAT an ideal solution. The same might apply if the device has an Ethernet Shield.
With all that being said, if this is simply a datalogger, then KISS (Keep It Simple Solution) may be a good way to go, so that you can keep your focus on the original goal of collecting the data itself.
It is worth noting that SD cards can be added cheaply, using either the well-established SD library (stock in the IDE) or the SdFat library (on GitHub, with more features), which gives almost unlimited logging capacity on FAT32. The only trade-off is that they consume a fair chunk of code space.
I think mpflaga is on the right path.
Some options you should consider include:
Is the device/microcontroller that is writing the data going to be the same as the one that is reading the data?
How many records are you hoping to fit into your storage device?
How robust/recoverable do you want your storage format to be to events such as reboots/power outages/etc?
My opinion regarding these points is that:
It is going to be the same device reading and writing, so you can probably get away with a very specific/custom format rather than a full blown file system.
You probably want to extract as many bytes as possible for use as storage, so a format well-designed for your application will probably help.
This is tricky. You could use self-describing structures, such as a TLV, which would pack your bytes tightly but be harder to search; OR you could use a fixed-length structure, which wastes a lot of bytes but allows easy access. Also, you could just assume the storage will always remain valid, but what happens if power is removed half-way through a write!
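To illustrate the TLV option from the last point, here is a minimal sketch that operates on a plain byte array standing in for the EEPROM. All names (tlv_append, tlv_find, TLV_END) are hypothetical, and the layout is an assumption: one type byte, one length byte, then the value, with 0xFF (the usual erased state of an EEPROM) marking the end of the log. It does not detect a write that was interrupted by power loss; that would need something extra such as a checksum per record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_END 0xFF  /* assumed end marker: erased EEPROM cells read as 0xFF */

/* Append one record [type][len][value...] at offset *used.
 * Returns 0 on success, -1 if the record does not fit. */
static int tlv_append(uint8_t *mem, size_t cap, size_t *used,
                      uint8_t type, const void *value, uint8_t len)
{
    assert(type != TLV_END);
    if (*used > cap || cap - *used < (size_t)len + 2)
        return -1;
    mem[*used] = type;
    mem[*used + 1] = len;
    memcpy(mem + *used + 2, value, len);
    *used += (size_t)len + 2;
    return 0;
}

/* Linear scan for the nth (0-based) record of the given type.
 * Returns a pointer to its value and stores its length, or NULL if absent. */
static const uint8_t *tlv_find(const uint8_t *mem, size_t cap,
                               uint8_t type, unsigned nth, uint8_t *len_out)
{
    size_t pos = 0;
    while (cap - pos >= 2 && mem[pos] != TLV_END) {
        uint8_t len = mem[pos + 1];
        if (cap - pos - 2 < len)
            break;                 /* length runs past the storage: corrupt */
        if (mem[pos] == type && nth-- == 0) {
            *len_out = len;
            return mem + pos + 2;
        }
        pos += (size_t)len + 2;    /* skip to the next record header */
    }
    return NULL;
}
```

The packing is tight (two bytes of overhead per record), but finding a record means walking every record before it, which is the search cost mentioned above. With fixed-length records the nth record would simply sit at offset n times the record size.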
Overall, my recommendation would be:
Use an existing library
Use an application-specific format first, but ensure you abstract the storage of the data from the data itself.
If you find you need a filesystem, rewrite the storage layer to use a filesystem.
Having a small standard file system, like FAT16, is worth implementing because you can map this file system over the USB or Network to other devices/computers.
Standardization in your design is a big compliance advantage.
You can find ready-made sources/libraries or, since FAT16 is really simple and well documented, try implementing it yourself.

reading and writing hard disk directly in win32api like the biosdisk or absread in ms-dos

I had been playing with the disk drives 12bit FAT (FAT12) and 16bit FAT(FAT16) in C language (Turbo C) which runs under the 16 bit OS MS-DOS.
I was able to manipulate sectors directly.
FAT32 was a little complicated because the sectors are stored like a linked list, unlike the FAT versions below FAT32.
I want to read and write hard disks and USB disks directly using 32-bit C (Win32 API).
I saw some code that was using /device/ to access a disk, whereas in biosdisk the disks were numbered from 0 onwards, I think. I was manipulating things like heads, sectors, cylinders ...
Please advise on how to read and write hard disks directly, sector by sector, or how to read and write an HDD at a low level.
Do I have to go for assembly language?
EDIT
One scenario where I need to manipulate the hard disk directly is that I want to write a file while maintaining my own FAT, hiding it from the filesystem but marking those sectors as used. So it is just hiding a file from other users, the operating system and even me, except for the program I write, which is the only thing that can access those files. This is just one point, and the others would be just playing around. :)
If you use the WinAPI, you open the raw disk device using the CreateFile() API (see the Physical Disks and Volumes section of its documentation) and then use the ReadFile() and WriteFile() functions to read and write data.
Note, however, that recent versions of Windows (Vista, Windows 7) restrict your access even if you are an administrator. Our RawDisk product lets you bypass these restrictions. Free non-commercial licenses are available for RawDisk.

Accessing same resource across restarts in Windows

I will write something to a file/memory just before a system shutdown or a service shutdown. On the next restart of the system, is it possible to access the same file or the same memory on the disk before the filesystem loads? The actual requirement is this: we have a driver that sits between the volume-level drivers and the filesystem driver, and in that part of the driver code I want to access some memory or file.
Thanks & Regards,
calvin
The logical thing here is to read/write this into the registry if it is not too big. Is there a reason you do not want to use the registry?
If you need to access large data, you are writing a volume or device filter, and you cannot rely on the ZwOpen/Read/Write/Close functions in the kernel, one approach would be to create the file in user mode, get its device name and cluster chain, and store them in the registry. On the next boot, you can get the device and clusters from the registry and do direct I/O on them.
Since you want to access this before the filesystem loads, my first thought is to allocate and use a block of storage space on the hard drive outside of the filesystem. You can create a hidden mini-partition on the drive and use low-level I/O commands to read and write your data.
This is a common task in the world of embedded systems, and we often implement it by adding some sort of non-volatile memory device into the system (flash, battery-backed DRAM, etc) and reading and writing to that device. Since you likely don't have the same level of control over the available hardware as embedded developers do, the closest analogue I can think of would be to reserve a chunk of space on a physical disk that you can read from without having to mount as a filesystem. A dedicated mini-partition might work the best because if you know the size of it, you can treat it as one big raw-access buffer and can avoid having to hassle with filenames, filesystems, etc.

Resources