Robust file system for embedded use - filesystems

I'm looking for an implementation of an embedded file system that stores pieces of data, addressable by name, in NAND flash memory. The target processor is a ColdFire v2 running uC/OS-II, which is why the large FlashFX from datalight.com doesn't fit.
The implementation must be robust against occasional power loss and NAND storage errors.
Thank you

If you haven't looked at uC/FS with journaling enabled, then I would start there.
http://micrium.com/page/products/rtos/fs
Journaling file systems are designed to maintain the integrity of the file system across system failures.
http://en.wikipedia.org/wiki/Journaling_file_system

Related

What file system to use for an embedded Linux with an eMMC NAND flash

I'm in charge of choosing a file system for an embedded Linux device.
The device is a Freescale iMX6 with an eMMC NAND flash memory, running kernel v3.10.17.
I plan to partition the flash as described below:
Partition #1: kernel - not mounted
Partition #2: rootfs - mounted at "/" in read-only mode
Partition #3: userdata - mounted at "/home" in read-write mode
"/var" and "/tmp" directories will be mounted as tmpfs.
In some previous embedded linux projects, I used to use UBIFS on NAND flashes that were not eMMC NAND flash.
Since eMMC NAND flashes include a wear leveling feature, UBIFS should not be used with them as UBIFS' wear leveling feature may interfere with the one used by the eMMC NAND flashes.
I was planning to use ext2 or ext3 for Partition #2 (rootfs) and ext3 for Partition #3. I was wondering whether ext3 is robust enough that my data won't easily get corrupted after a power failure or a hard-reset reboot.
Does anyone have a strong background in all of this who could help me figure out which file system would be best?
Thanks.
I use ext4 file-system on an eMMC device that contains user data in read/write mode on an embedded-linux system.
The system has been shut down by hard reset several times a day for months now. I have not witnessed any problems with data consistency yet.
cramfs and squashfs are popular for read-only embedded filesystems, because they are highly compressed in storage.
For read-write filesystems, the "normal" ones you might find on a standard Linux desktop install work well (ext3, ext4, etc.). Read about them and pick one that has a balance of overhead and error-correction, depending on what you need for your device.
For the most part the popularity of these filesystems is independent of the hardware you're using as storage -- drivers are used to actually write to the hardware; the filesystem is an abstraction layer above this.
Your comment about ubifs being inappropriate since the driver already does wear-levelling sounds correct to me. UBIFS is weird in that way. Other filesystems are pretty storage-agnostic.

Which operating systems, and how, can pin pages in a database buffer pool?

Most relational database construction textbooks talk about the concept of being able to pin a page, i.e. prevent the operating system from swapping it out of memory. The idea is that the database software can then use its own buffer replacement algorithm, which might be a better fit than whatever the OS virtual memory policy provides.
It is unclear to me whether typical desktop operating systems actually provide the programmer with the capability to pin pages. The best I can find on OS X, for example, refers to wired pages, but these seem to be only usable by the superuser.
Is the concept of pinning pages, and of defining appropriate buffer replacement strategies that supersede that of the OS, only of theoretical interest and not really implemented by real relational database systems? Or is it the case that typical desktop OSes (Linux, Windows, OS X) do include hooks for pinning, and typical relational DB software (Oracle, SQL Server, PostgreSQL, MySQL, etc.) uses them?
In PostgreSQL, the database server copies the pages from the file (or from the OS, really) into a shared memory segment which PostgreSQL controls. The OS doesn't know what the mapping is between the file system blocks and the shared memory blocks, so the OS couldn't write those pages back out to their disk locations even if it wanted to, until PostgreSQL tells it to do so by issuing a seek and a write.
The OS could decide to swap parts of shared memory out to disk into a swap partition (for example, if it were under severe memory stress), but it can't write them back to their native location on disk since it doesn't know what that location is.
There are ways to tell the OS not to page out certain parts of memory, such as shmctl(shmid,SHM_LOCK,NULL). But these are mostly intended for security purposes, not performance purposes. For example, you use it to prevent very sensitive information (like the decrypted copy of a private key) from accidentally getting written to swap partitions, from which it might be recovered by the bad guys.
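As a rough illustration of the shmctl() call mentioned above, here is a minimal sketch for Linux. The helper names create_locked_segment and destroy_segment are made up for this example, SHM_LOCK is a Linux-specific command, and whether the lock succeeds depends on the process's RLIMIT_MEMLOCK limit or privileges, so the code reports failure rather than assuming success.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes SHM_LOCK on glibc */
#endif
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: creates a private System V shared memory segment of
 * `size` bytes and asks the kernel to keep it resident in RAM.
 * Returns the segment id, or -1 if the segment could not be created.
 * *locked is set to 1 if SHM_LOCK succeeded and 0 if the OS refused. */
int create_locked_segment(size_t size, int *locked)
{
    *locked = 0;

    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid == -1) {
        perror("shmget");
        return -1;
    }

    /* The call quoted in the answer: prevent the segment from being swapped. */
    if (shmctl(shmid, SHM_LOCK, NULL) == 0)
        *locked = 1;
    else
        perror("shmctl(SHM_LOCK)");

    return shmid;
}

/* Hypothetical helper: marks the segment for removal once it is detached. */
void destroy_segment(int shmid)
{
    shmctl(shmid, IPC_RMID, NULL);
}
```

A caller would attach the segment with shmat(), place the sensitive data in it, and call destroy_segment() when done.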
@jjanes is correct to say that the OS can't really write out Pg's shared memory buffer, and can't control what PostgreSQL reads into it, so it doesn't make sense to "pin" it. But that's only half the story.
PostgreSQL does not offer any feature for pinning pages from tables in its shared memory segment. It could do so, and it might arguably be useful, but nobody has implemented it. In most cases the buffer replacement algorithm does a pretty good job by itself.
Partly this is because PostgreSQL relies heavily on the operating system's buffer caches, rather than trying to implement its own. Data might be evicted from shared_buffers, but it's usually still cached in the OS. It's not unreasonable to think of shared_buffers as a first-level cache, and the OS disk cache as the second-level cache.
The features available to control what's kept in the operating system's disk cache are whatever the OS provides. In general, that's not much, because again modern OSes tend to do a better job if you leave them alone and let them manage things themselves.
The idea of manual buffer management, etc, is IMO largely a relic of times when systems had simpler and less effective algorithms for managing caches and buffers automatically.
The main time that automation falls down is if you have something that's used only intermittently, but you want to ensure is available with extremely good response times when it is used; i.e. you wish to degrade the overall system's throughput to make one part of it more responsive. PostgreSQL doesn't offer much control over that; most people simply ensure that they have something regularly querying the data of interest to keep it warm in the cache.
You could write a relatively simple extension to mmap() a file and mlock() its range, but it'd be pretty wasteful and you'd have to fiddle with the default OS limits designed to stop you from locking too much memory.
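To make the mmap()/mlock() idea concrete, here is a minimal standalone sketch on POSIX/Linux. It is not a PostgreSQL extension, and the function names map_and_lock and unmap_locked are invented for this example. mlock() can fail when the file is larger than the process's RLIMIT_MEMLOCK limit (the default limit the answer alludes to), so the sketch reports whether locking worked instead of treating failure as fatal.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and tries to lock the whole
 * mapping into RAM. On success returns the mapping and stores its length
 * in *len; *locked says whether mlock() succeeded. Returns NULL if the
 * file cannot be opened, is empty, or cannot be mapped. */
void *map_and_lock(const char *path, size_t *len, int *locked)
{
    *len = 0;
    *locked = 0;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    /* Ask the kernel to fault the pages in and keep them resident. */
    *locked = (mlock(p, *len) == 0);
    return p;
}

/* Hypothetical helper: releases the lock and the mapping. */
void unmap_locked(void *p, size_t len)
{
    munlock(p, len);
    munmap(p, len);
}
```

Locked pages count against the limit for as long as the mapping exists, which is the wastefulness the answer warns about.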
(FWIW, I think Oracle offers quite a bit of control over pinning relations, indexes, etc, in tune with its "manually control everything whether you want to or not" philosophy, and it bypasses much of the operating system in the process.)
Speaking for SQL Server (on Windows, obviously), there's an OS setting that allows the SQL engine to ignore requests from the OS in response to memory pressure. That setting is called Lock Pages in Memory (LPIM). The permission is granted on a per-account basis and must be granted to the account running your SQL service before the service is started.
Keep in mind that this isn't always a good idea. For example, in a virtualized environment, the hypervisor communicates its memory needs via a balloon driver process in the guest. If the hypervisor needs more memory, it inflates the memory needs of the balloon in the guest. If your SQL process has LPIM turned on, it won't respond and the hypervisor can start flagging as a result. And if the hypervisor isn't happy, ain't nobody happy.

How does a file-system block get translated to an LBA?

I understand a file-system can choose the size of blocks it uses on the disk.
On the other hand, I understand that the disk is divided into LBAs.
An LBA is the address of a sector on the disk.
So what's the connection between the blocks used by the file system and the disk sectors (LBAs)?
Is there some kind of translation from a file-system block to an LBA?
Does it differ from one file system to another?
Where can I read more about this?
thanks
Yes. A file system usually sees a contiguous logical address space, with no knowledge of the spindles underneath, so it doesn't know the disk LBAs either. The translation is usually done in a layer called the volume, whose job is to hide the disk details and present the file system with a logically contiguous space. On Linux, for example, LVM (Logical Volume Manager) plays this role.
The volume exposed to the file system need not be a single disk. It can be built on top of other volumes, which is how a volume can appear as one very large disk.
The volume can also provide RAID functionality, which combines several disks to give some protection against disk failure at the expense of performance and space efficiency.
Some file systems can manage disks directly and operate on raw disks, with no volume layer at all. As far as I know, NetApp's WAFL works that way.
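For the simplest case, where the file system sits directly on one partition with no volume manager or RAID in between, the translation reduces to arithmetic. The sketch below assumes that block 0 starts at the first sector of the partition and that the block size is a whole multiple of the sector size; the struct and function names are illustrative and not taken from any particular file system.

```c
#include <stdint.h>

/* Hypothetical description of where a file system lives on a disk. */
struct fs_geometry {
    uint64_t partition_start_lba; /* first sector of the partition */
    uint32_t sector_size;         /* bytes per sector, e.g. 512 */
    uint32_t block_size;          /* bytes per file-system block, e.g. 4096 */
};

/* Number of consecutive sectors that make up one file-system block. */
uint32_t sectors_per_block(const struct fs_geometry *g)
{
    return g->block_size / g->sector_size;
}

/* LBA of the first sector backing file-system block `block`. */
uint64_t fs_block_to_lba(const struct fs_geometry *g, uint64_t block)
{
    return g->partition_start_lba + block * sectors_per_block(g);
}
```

With a partition starting at LBA 2048, 512-byte sectors and 4096-byte blocks, block 10 maps to LBA 2048 + 10 * 8 = 2128. When a volume layer is present, the fixed partition_start_lba is replaced by whatever mapping that layer maintains.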

Reading and writing a hard disk directly with the Win32 API, like biosdisk or absread in MS-DOS

I have been playing with disk drives formatted as 12-bit FAT (FAT12) and 16-bit FAT (FAT16) in C (Turbo C), running under the 16-bit OS MS-DOS.
I was able to manipulate sectors directly.
FAT32 was a little more complicated because the sectors are stored like a linked list, unlike the FAT variants before FAT32.
I want to read and write hard disks and USB disks directly from 32-bit C (the Win32 API).
I saw some code that used /device/ to access a disk, whereas with biosdisk the disks were numbered from 0 onwards, I think, and I addressed them by heads, sectors, cylinders, and so on.
Please advise on how to read and write hard disks directly, sector by sector, or how to access a hard disk at a low level.
Do I have to go for assembly language?
EDIT
One scenario where I need to manipulate the hard disk directly: I want to write a file while maintaining my own FAT, hiding it from the file system but marking those sectors as used. So it is just hiding a file from other users, the operating system, and even me; only the program I write can access those files. This is just one use, and the rest would be just playing around. :)
If you use the WinAPI, you open the raw disk device using the CreateFile() API (see the Physical Disks and Volumes section there) and then use the ReadFile() and WriteFile() functions to read and write data.
Note, however, that recent versions of Windows (Vista, Windows 7) restrict your access even if you are an administrator. Our RawDisk product lets you bypass these restrictions. Free non-commercial licenses are available for RawDisk.
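One practical difference from biosdisk is that a raw device opened this way is addressed as a flat sequence of sectors (by byte offset) rather than by cylinder, head and sector. If existing code thinks in CHS terms, the standard conversion to a linear sector number can be sketched as follows. The struct and function names are illustrative, and the geometry values have to come from the caller.

```c
#include <stdint.h>

/* Disk geometry as used by CHS addressing; supplied by the caller. */
struct chs_geometry {
    uint32_t heads_per_cylinder;
    uint32_t sectors_per_track;
};

/* Converts a cylinder/head/sector triple to a linear sector number.
 * Cylinders and heads count from 0; sectors count from 1, so `sector`
 * must be at least 1. */
uint64_t chs_to_lba(const struct chs_geometry *g,
                    uint32_t cylinder, uint32_t head, uint32_t sector)
{
    return ((uint64_t)cylinder * g->heads_per_cylinder + head)
               * g->sectors_per_track
           + (sector - 1);
}

/* Byte offset of a sector from the start of the device. */
uint64_t lba_to_byte_offset(uint64_t lba, uint32_t bytes_per_sector)
{
    return lba * bytes_per_sector;
}
```

For a 1.44 MB floppy geometry (2 heads, 18 sectors per track), the last sector, cylinder 79, head 1, sector 18, comes out as sector number 2879. Reads and writes on a raw device generally need to start on a sector boundary and cover whole sectors.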

Accessing same resource across restarts in Windows

I will write something to a file or to memory just before a system shutdown or a service shutdown. On the next restart of the system, is it possible to access the same file or the same memory on the disk before the file system loads? The actual requirement is this: we have a driver that sits between the volume-level drivers and the file system driver, and in that part of the driver code I want to access some memory or a file.
Thanks & Regards,
calvin
The logical thing here is to read/write this into the registry if it is not too big. Is there a reason you do not want to use the registry?
If you need to access a large amount of data, and you are writing a volume or device filter and cannot rely on the ZwOpen/Read/Write/Close functions in the kernel, one approach would be to create the file in user mode, get its device name and cluster chain, and store them in the registry. On the next boot, you can read the device and clusters back from the registry and do direct I/O on them.
Since you want to access this before the filesystem loads, my first thought is to allocate and use a block of storage space on the hard drive outside of the filesystem. You can create a hidden mini-partition on the drive and use low-level I/O commands to read and write your data.
This is a common task in the world of embedded systems, and we often implement it by adding some sort of non-volatile memory device into the system (flash, battery-backed DRAM, etc) and reading and writing to that device. Since you likely don't have the same level of control over the available hardware as embedded developers do, the closest analogue I can think of would be to reserve a chunk of space on a physical disk that you can read from without having to mount as a filesystem. A dedicated mini-partition might work the best because if you know the size of it, you can treat it as one big raw-access buffer and can avoid having to hassle with filenames, filesystems, etc.
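If the reserved region is treated as one raw buffer, the reader needs some way to tell valid data from a blank or half-written region after a restart. One common approach, sketched here with an in-memory array standing in for the raw region, is a small header holding a magic number, a length and a checksum. Everything in this sketch is an assumption made for illustration: the layout, the names, and the simple multiplicative checksum (a real design would more likely use a CRC). The actual low-level I/O to the partition is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_MAGIC 0x52415731u   /* arbitrary marker chosen for this sketch */

/* Hypothetical on-disk header placed at the start of the raw region. */
struct rec_header {
    uint32_t magic;
    uint32_t length;     /* payload bytes following the header */
    uint32_t checksum;   /* checksum of the payload */
};

/* Simple placeholder checksum; stands in for a proper CRC. */
static uint32_t checksum32(const uint8_t *p, uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++)
        sum = sum * 31u + p[i];
    return sum;
}

/* Writes payload then header into `region`. Returns 0, or -1 if it won't fit. */
int rec_store(uint8_t *region, size_t region_size,
              const void *data, uint32_t len)
{
    struct rec_header h;
    if (region_size < sizeof h || len > region_size - sizeof h)
        return -1;
    h.magic = REC_MAGIC;
    h.length = len;
    h.checksum = checksum32(data, len);
    memcpy(region + sizeof h, data, len);  /* payload first */
    memcpy(region, &h, sizeof h);          /* header last */
    return 0;
}

/* Validates the region and copies the payload to `out`.
 * Returns the payload length, or -1 if the region holds no valid record. */
long rec_load(const uint8_t *region, size_t region_size,
              void *out, uint32_t out_cap)
{
    struct rec_header h;
    if (region_size < sizeof h)
        return -1;
    memcpy(&h, region, sizeof h);
    if (h.magic != REC_MAGIC)
        return -1;
    if (h.length > region_size - sizeof h || h.length > out_cap)
        return -1;
    if (checksum32(region + sizeof h, h.length) != h.checksum)
        return -1;
    memcpy(out, region + sizeof h, h.length);
    return (long)h.length;
}
```

On the next boot, rec_load() returning -1 means the region is blank or the last write was interrupted, so the caller can fall back to defaults.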

Resources