How does GNU Radio File Sink work?

I want to know how the file sink in GNU Radio works. Does it receive a signal and then write it to the file, and is signal reception paused while the write is in progress?
I want to make sure that no portion of the signal is lost, unwritten to the file, because of the time taken for writing.
Any help or reading material regarding this would be very much appreciated.

Depending on the sampling rate of the device, writing samples to a file without discontinuities may be impossible.
Instead of writing to disk, you can write the samples to a ramdisk. A ramdisk is a file-storage abstraction that uses RAM as the storage medium. Its great advantage is very fast read/write transfers; however, the file size is limited by the amount of RAM the host has.
Here is a good article that will help you to create a ramdisk under Linux. I am sure that you will easily find a guide for Windows too.
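For illustration, a minimal sketch in C, assuming a Linux host where the default tmpfs mount at /dev/shm is available (the path and buffer contents are placeholders):

    /* Minimal sketch: any file created under /dev/shm (mounted as tmpfs
     * by default on most Linux distributions) lives in RAM, not on disk.
     * The path and buffer contents are placeholders. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *f = fopen("/dev/shm/capture.bin", "wb");
        if (!f) { perror("fopen"); return EXIT_FAILURE; }

        char buf[4096] = {0};  /* stand-in for a block of received samples */
        if (fwrite(buf, 1, sizeof buf, f) != sizeof buf)
            perror("fwrite");

        fclose(f);
        return EXIT_SUCCESS;
    }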

A file sink won't normally block your radio source as long as the average write speed exceeds the output rate of the radio block. There are internal buffers that can smooth things out a little, but if your disk fills up, the rest of your flowgraph will stall.
If you're not seeing "O" (overflow) messages in the output console, you're not dropping samples.

Related

Is it wrong to log the inner workings of an IoT device in case of failure?

I'm currently working on an IoT project and I want to log the execution of my software and hardware.
I want to log them and then send them to some DB in case I need to have a look at my device remotely.
The work-in-progress IoT device has to be as minimal as possible, so having to write to a flash memory module very often seems questionable to me.
I know that it will run the Nucleus RTOS on a Cortex-M4 with some modules connected through SPI.
Can someone with more expertise enlighten me?
Thanks.
You will have to estimate your hourly/daily/whatever data volume that needs to go into the log and extrapolate to the expected lifetime of your product. Microcontroller flash usually isn't made for logging: it features neither enduring flash cells (usually some 10K-100K write cycles, compared to 1M or more for dedicated data-flash chips; look it up in the MCU data sheet) nor wear leveling. Wear leveling is any method that prevents software from writing to the same physical cell too frequently (as, for example, the directory of a simple file system would be; a sketch of such a scheme follows below).
For your log you will have to create a quite clever or complex method to circumvent any flash lifetime problems.
But the problems don't stop there: usually the MCU cannot read from flash memory while writing to it. Here "writing" means a prolonged sequence of instructions (several microseconds up to milliseconds, depending on the chip) that drives the internal flash state machine (programming voltage, settling times, etc.) until the new values have reliably settled in the memory. And, as you may have guessed, "reading" in this context also covers instruction fetches: you have to make sure that any code or interrupt that can run during the flash write executes only from RAM, cache, or other memories, not from the normal instruction memory. It is doable, but the more complex the software system running above the hardware layer, the less likely it is to work reliably.
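To make the wear-leveling idea concrete, here is a minimal sketch of a round-robin ring log in C. The flash_erase_sector/flash_write calls are hypothetical HAL hooks standing in for whatever your board support package provides, and the sizes are placeholders:

    /* Minimal sketch of a wear-levelled ring log. flash_erase_sector()
     * and flash_write() are hypothetical HAL hooks, standing in for the
     * board support package; sector 0 is assumed erased at boot. */
    #include <stdint.h>

    #define SECTOR_SIZE   4096u
    #define SECTOR_COUNT  16u
    #define RECORD_SIZE   64u

    extern void flash_erase_sector(uint32_t sector);
    extern void flash_write(uint32_t sector, uint32_t offset,
                            const void *data, uint32_t len);

    static uint32_t cur_sector;
    static uint32_t cur_offset;

    /* Append one fixed-size record; sectors are recycled round-robin so
     * that erase/write wear is spread evenly across all of them. */
    void log_append(const uint8_t record[RECORD_SIZE])
    {
        if (cur_offset + RECORD_SIZE > SECTOR_SIZE) {
            cur_sector = (cur_sector + 1) % SECTOR_COUNT;
            flash_erase_sector(cur_sector);  /* oldest data is discarded */
            cur_offset = 0;
        }
        flash_write(cur_sector, cur_offset, record, RECORD_SIZE);
        cur_offset += RECORD_SIZE;
    }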

Kafka writes data directly on disk?

I’m looking at the Kafka documentation, in particular at the Persistence section:
kafka doc - persistence section
If I understood correctly, the last lines say that Kafka writes data to disk as it arrives instead of using RAM. That sounds really strange to me (aren't disk writes heavy operations?), but clearly I trust the Kafka developers. First of all, I would like confirmation of that.
Then, to verify it, I ran a simple task with a data stream of 500 kb/s for some minutes on a machine with 4 GB of RAM and 200 GB of disk, and I produced graphs of RAM usage (%) and disk-space usage (MB). You can find the pictures here:
RAM : https://ibb.co/mzYD5m
DISK SPACE: https://ibb.co/coAMrR
(The stream starts being ingested at second 125 and finishes around second 870.)
According to what I understood, I expected the disk-space graph to change linearly (as space is gradually occupied while data arrives). Instead, I cannot explain the flat regions, which indicate that no additional space is being occupied during those seconds.
Moreover, continuing in the doc, there is the section:
linux flush behaviour
which seems to describe the opposite behaviour from the "Persistence" section. It says Linux uses a pagecache (stored in RAM, I suppose) to provide a disk cache. This could explain the flat regions in the second graph, but it seems to go against Kafka's stated principle of avoiding writes to volatile memory.
I'm really confused.
Thank you,
Andrea
Kafka always writes directly to disk, but remember one thing: the I/O operations are actually carried out by the operating system. In the case of Linux, the data is written to the page cache until it can be written to the disk. Kafka has done its job once it hands the operating system the data to be written; it is the operating system that decides when and how to write it.
Hope that answers your question.
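A minimal sketch in C of exactly this division of labor on Linux: write() returns once the kernel has copied the data into the page cache, and only fsync() blocks until the dirty pages have been pushed to the device (the file name is a placeholder):

    /* Minimal sketch (Linux/POSIX): write() completes once the data is
     * in the page cache; fsync() blocks until the kernel has handed the
     * dirty pages to the device. File name is a placeholder. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("demo.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char msg[] = "record\n";
        /* Returns quickly: data is only in the page cache here. */
        if (write(fd, msg, strlen(msg)) < 0)
            perror("write");

        /* Blocks until the kernel has flushed the dirty pages. */
        if (fsync(fd) < 0)
            perror("fsync");

        close(fd);
        return 0;
    }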

Is there a Win32 API to copy a fragment of a file to another file?

I would like to programmatically copy a section of one file to another file. Is there any Win32 API I could use without moving the bytes through my program? Or should I just read from the source file and write to the target?
I know how to do this by reading and writing chunks of bytes, I just wanted to avoid doing it myself if the OS already offers that.
What you're asking for can be achieved, but not easily. Device drivers routinely transfer data without CPU involvement, but doing that requires kernel-mode code; basically, you would have to write a device driver. The benefits would have to be huge to justify the difficulties of developing, testing, and distributing a kernel-mode driver. So unless you think there is a huge benefit at stake here, I'm afraid that ReadFile/WriteFile are the best you can do.
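For completeness, a minimal sketch of that read/write loop using only documented Win32 calls; the 64 KB chunk size and the offset/length handling are illustrative choices:

    /* Minimal sketch of the chunked copy described above. hSrc/hDst are
     * assumed to come from CreateFile elsewhere. */
    #include <windows.h>

    BOOL CopySection(HANDLE hSrc, HANDLE hDst, LONGLONG offset, LONGLONG length)
    {
        LARGE_INTEGER pos;
        pos.QuadPart = offset;
        if (!SetFilePointerEx(hSrc, pos, NULL, FILE_BEGIN))
            return FALSE;

        BYTE buf[64 * 1024];
        while (length > 0) {
            DWORD want = (DWORD)(length < (LONGLONG)sizeof buf
                                     ? length : (LONGLONG)sizeof buf);
            DWORD got = 0, put = 0;
            if (!ReadFile(hSrc, buf, want, &got, NULL) || got == 0)
                return FALSE;   /* read error or premature EOF */
            if (!WriteFile(hDst, buf, got, &put, NULL) || put != got)
                return FALSE;   /* write error or short write  */
            length -= got;
        }
        return TRUE;
    }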

After how many seconds are file system write buffers typically flushed?

Before overwriting data in a file, I would like to be pretty sure the old data is stored on disk. It's potentially a very big file (multiple GB), so in-place updates are needed. Usually writes will be 2 MB or larger (my plan is to use a block size of 4 KB).
Instead of (or in addition to) calling fsync(), I would like to retain (not overwrite) the old data on disk until the file system has written the new data. The main reason I don't want to rely on fsync() is that most hard disks lie to you about having done one.
So what I'm looking for is the typical maximum delay, across file system, operating system (for example Windows), and hard drive, until data is written to disk without using fsync or similar methods. I would like real-world numbers if possible. I'm not looking for advice to use fsync.
I know there is no 100% reliable way to do it, but I would like to better understand how operating systems and file systems work in this regard.
What I found so far: 30 seconds is (or was) the default for /proc/sys/vm/dirty_expire_centisecs. Dirty pages are then flushed (written) to disk when too much time has elapsed since a page became dirty (but I couldn't find the default for that interval). So for Linux, 40 seconds seems to be on the safe side. But is this true for all file systems and disks? What about Windows, Android, and so on? I would like an answer that applies to all common operating systems, file systems, and disk types, including Windows, Android, regular hard disks, SSDs, and so on.
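For reference, a minimal C sketch, assuming a Linux host, that reads the writeback tunables mentioned above so a program can see the flush deadlines configured on the machine it runs on:

    /* Minimal sketch (Linux): read the writeback tunables from procfs. */
    #include <stdio.h>

    static long read_centisecs(const char *path)
    {
        long v = -1;
        FILE *f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%ld", &v) != 1)
                v = -1;
            fclose(f);
        }
        return v;
    }

    int main(void)
    {
        /* 3000 centiseconds = 30 s is a common default for expiry. */
        printf("dirty_expire: %ld cs\n",
               read_centisecs("/proc/sys/vm/dirty_expire_centisecs"));
        printf("writeback interval: %ld cs\n",
               read_centisecs("/proc/sys/vm/dirty_writeback_centisecs"));
        return 0;
    }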
Let me restate your problem in only slightly uncharitable terms: you're trying to control the behavior of a physical device in ways that even its driver in the operating system cannot. What you're trying to do seems impossible if what you want is an actual guarantee rather than a pretty good guess. If all you want is a pretty good guess, fine, but be aware of this and document it accordingly.
You might be able to solve this with the right device driver. The SCSI protocol, for example, has a Force Unit Access (FUA) bit in its READ and WRITE commands that instructs the device to bypass any internal cache. Even if the data were originally written buffered, reading unbuffered should be able to verify that it was actually there.
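As a hedged user-mode sketch on Windows: FILE_FLAG_WRITE_THROUGH asks the storage stack to write through the cache (on some driver stacks this is translated into FUA on the wire), and FILE_FLAG_NO_BUFFERING bypasses the system cache; whether the device honors it remains device-dependent:

    /* Hedged sketch: FILE_FLAG_WRITE_THROUGH requests write-through
     * behavior, FILE_FLAG_NO_BUFFERING bypasses the system cache (and
     * requires sector-aligned buffers and transfer sizes). Whether the
     * data reaches the platter is still up to the device. */
    #include <windows.h>

    HANDLE OpenUnbuffered(const wchar_t *path)
    {
        return CreateFileW(path,
                           GENERIC_READ | GENERIC_WRITE,
                           0, NULL, OPEN_EXISTING,
                           FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING,
                           NULL);
    }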
The only way to reliably make sure that data has been synced is to use the OS-specific syncing mechanism. As PostgreSQL's reliability documentation puts it:
When the operating system sends a write request to the storage hardware, there is little it can do to make sure the data has arrived at a truly non-volatile storage area. Rather, it is the administrator's responsibility to make certain that all storage components ensure data integrity.
So no, there are no truly portable solutions, but it is possible (though hard) to write portable wrappers and deploy a reliable solution.
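A minimal sketch of such a wrapper in C, assuming only the documented fsync() (POSIX) and _commit() (Windows CRT) calls; error handling is left to the caller:

    /* Minimal sketch of a portable "flush to disk" wrapper: fflush()
     * moves user-space buffers to the OS, then the OS-specific call
     * pushes OS buffers toward the device. */
    #include <stdio.h>

    #ifdef _WIN32
    #include <io.h>
    static int flush_to_disk(FILE *f)
    {
        if (fflush(f) != 0)
            return -1;
        return _commit(_fileno(f));   /* Windows CRT: flush to device */
    }
    #else
    #include <unistd.h>
    static int flush_to_disk(FILE *f)
    {
        if (fflush(f) != 0)
            return -1;
        return fsync(fileno(f));      /* POSIX: flush to device */
    }
    #endif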
First of all, thanks for the information that hard disks lie about flushing data; that was new to me.
Now to your problem: you want to be sure that all the data you write has actually reached the disk (the lowest level). You are saying that there are two stages that need to be controlled: when the OS writes to the hard drive, and when the hard drive writes to the platters.
Your only solution is to use a fuzzy logic timer to estimate when the data will have been written.
In my opinion this is the wrong way. You have control over when the OS writes to the hard drive, so use that possibility and control it! Then only the lying hard drive is your problem. That problem can't be solved reliably; I think you should tell the user/admin that they must take care when choosing the right hard drive. Of course, it might be a good idea to implement the additional timer you proposed.
I believe it's up to you to run a series of tests with different hard drives and Brad Fitzgerald's tool to get a good estimate of when hard drives will have written all the data. But of course, if the hard drive wants to lie, you can never be sure that the data has really been written to the disk.
There are a lot of caches involved in giving users a responsive system.
There is the CPU cache, the kernel/filesystem memory cache, the disk drive's memory cache, etc. What you are asking is: how long does it take to flush all the caches?
Or, another way to look at it is, what happens if the disk drive goes bad? All the flushing is not going to guarantee a successful read or write operation.
Disk drives do go bad eventually. What you are really looking for is a redundant CPU/disk system that survives a component failure and keeps working.
You could improve the likelihood that the system keeps working with the aid of hardware such as RAID arrays and other high-availability configurations.
As far as software solutions go, I think the answer is: trust the OS to do the optimal thing. Most of them flush buffers routinely.
This is an old question but still relevant in 2019. For Windows, the answer appears to be "at least once per second", based on this:
To ensure that the right amount of flushing occurs, the cache manager spawns a process every second called a lazy writer. The lazy writer process queues one-eighth of the pages that have not been flushed recently to be written to disk. It constantly reevaluates the amount of data being flushed for optimal system performance, and if more data needs to be written it queues more data.
To be clear, the above says the lazy writer is spawned every second, which is not the same as writing out data every second, but it's the best I can find so far in my own search for an answer to a similar question. (In my case, I have an Android app that lazily writes data back to disk, and I noticed some data loss when using an interval of 3 seconds, so I am going to reduce it to 1 second and see if that helps. It may hurt performance, but losing data hurts performance a whole lot more if you consider the hours it takes to recover.)

Fast 'C' library to transparently manage very large files

I need to save very large amounts of data (>500 GB) that is being streamed (at 800 Mb/s) from another device connected to my PC. The speed rules out using a database, e.g. MySQL/ISAM, and I am looking for a fast, light library that sits on top of the C stdio file functions (i.e. fopen/fclose/fwrite) and will let me write and read a very large file (up to the available disk space).
Behind the scenes, the large file can be broken up into smaller files, e.g. 1 GB each, and I want the API to take care of those details (a minimal sketch of such a wrapper follows below).
The data arrives at the PC in a compressed binary format, and no further processing is needed before writing it to the hard disk.
The library should work on both Windows and Linux.
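A minimal sketch in C of the kind of wrapper being asked for, rolling over to a new segment file every 1 GB; the seg_writer type and function names are illustrative, not an existing library:

    /* Minimal sketch of a segmented writer: seg_write() behaves like one
     * endless file but transparently rolls over to a new 1 GB file on
     * disk. Caller zero-initializes the struct and fills in base. */
    #include <stdint.h>
    #include <stdio.h>

    #define SEGMENT_BYTES (1ULL << 30)  /* 1 GB per underlying file */

    typedef struct {
        FILE    *fp;        /* current segment, NULL before first write */
        uint64_t written;   /* bytes written into the current segment */
        unsigned index;     /* next segment number */
        char     base[256]; /* base path, e.g. "capture" */
    } seg_writer;

    static int seg_open_next(seg_writer *w)
    {
        char name[300];
        if (w->fp)
            fclose(w->fp);
        snprintf(name, sizeof name, "%s.%04u", w->base, w->index++);
        w->fp = fopen(name, "wb");
        w->written = 0;
        return w->fp ? 0 : -1;
    }

    /* Append len bytes, starting a new segment file when the current one
     * would exceed SEGMENT_BYTES. Returns 0 on success, -1 on error. */
    int seg_write(seg_writer *w, const void *buf, size_t len)
    {
        if (!w->fp || w->written + len > SEGMENT_BYTES) {
            if (seg_open_next(w) != 0)
                return -1;
        }
        if (fwrite(buf, 1, len, w->fp) != len)
            return -1;
        w->written += len;
        return 0;
    }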
If you need random access into the data, take a look at memory-mapped files.
They let you map a file (or a section of a file) into memory transparently, without having to explicitly allocate memory and read the data. They work on Windows and Linux (there is a Boost library that wraps the differences).
On Windows you can handle files much larger than 4 GB on a 32-bit OS by using multiple views into the file.
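A minimal sketch of the Linux side using mmap(2); the Boost wrapper mentioned above hides the platform differences (CreateFileMapping/MapViewOfFile on Windows versus mmap on Linux), and the file name is a placeholder:

    /* Minimal sketch: map a whole file read-only; the kernel faults the
     * pages in on demand, so no explicit read() calls are needed. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("capture.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* ... access the file contents through p like ordinary memory ... */

        munmap(p, (size_t)st.st_size);
        close(fd);
        return 0;
    }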
edit: Sorry, 800 Mb/s!! I don't know of any disks that can cope with that; you might be looking at a RAID array of SSDs.
There used to be image-capture cards that used an attached drive as a simple series of bytes, with no filesystem, to achieve very high sustained write speeds. I don't know whether you are going to need something like that.
For ultimate speed, I suggest you go highly platform specific.
The objective is to get as close as you can to connecting the input device directly to hard drive. One method is to write a driver for the input device that writes directly to the hard drive.
The generic algorithm is to use either a very large circular byte buffer or multiple buffers (a minimal sketch follows below). You need extra space to compensate for the speed difference between the input device and the output device, provided the input device is non-stop.
If you can pause the input device, the issue becomes easier.
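A minimal sketch in C of that circular byte buffer, assuming a single producer (the input-device callback) and a single consumer (the disk-writer loop); the platform-specific device and disk I/O is left to the caller, and real code would need atomics or a lock for head/tail:

    /* Minimal sketch of a circular byte buffer between a fast producer
     * and a slower disk writer. Single-producer/single-consumer only. */
    #include <stddef.h>
    #include <stdint.h>

    #define RING_BYTES (64u * 1024u * 1024u)  /* slack for speed mismatch */

    static uint8_t ring[RING_BYTES];
    static volatile size_t head;  /* next byte the producer writes */
    static volatile size_t tail;  /* next byte the consumer reads  */

    /* Producer side: called with each chunk arriving from the device.
     * Returns the number of bytes stored; less than len means an
     * overrun (the disk fell behind and data would be lost). */
    size_t ring_put(const uint8_t *src, size_t len)
    {
        size_t n = 0;
        while (n < len && (head + 1) % RING_BYTES != tail) {
            ring[head] = src[n++];
            head = (head + 1) % RING_BYTES;
        }
        return n;
    }

    /* Consumer side: drains up to len bytes for the next disk write. */
    size_t ring_get(uint8_t *dst, size_t len)
    {
        size_t n = 0;
        while (n < len && tail != head) {
            dst[n++] = ring[tail];
            tail = (tail + 1) % RING_BYTES;
        }
        return n;
    }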
