My full text search engine store indexed data on NFS store.
Due to the frequently read/write ioes,I want to preallocate huge continuous disk space for each table file and so resort to posix_fallocate.
On an NFS volume,My little demo failed with "EOPNOTSUPP" responsed to posix_fallocate.
Does NFS protocal/specification include the posix_fallocate scenario?
(While the title of the question mentions posix_fallocate the definition of posix_fallocate means EOPNOTSUPP is not an error code it will return. It seems highly likely the questioner was actually using fallocate on Linux as that CAN return EOPNOTSUPP)
NFS can support fallocate on Linux (http://wiki.linux-nfs.org/wiki/index.php/Fallocate ) but:
Your client and server have to be using NFS 4.2 or later.
An NFS 4.2+ server doesn't HAVE to implement fallocate - it's optional (see the posting Re: [PATCH 4/4] Remove broken posix_fallocate, posix_falllocate64 fallback code [BZ#15661] in a libc-alpha mailing list thread]).
Re posix_fallocate, you can never know whether a Linux glibc posix_fallocate call was real or emulated without careful observation or manual checking beforehand (e.g. by making a platform native fallocate call and checking if it failed or wasn't supported) and if you've done the native call why did you need the posix_fallocate call?. Further, different platforms have a different call for a native fallocate (or completely lack it) so if portability to non-Linux platforms is a concern you will need to write the appropriate native fallocate wrappers for each platform. If you have to do pre-allocation even when what's below you doesn't have a supported call for it, you will have to fall back to doing it by hand (e.g. via nice chunky writes or hacks like glibc's posix_fallocate...).
Note: When it is possible to do a real pre-allocation (e.g. via fallocate) it can be DRAMATICALLY faster than having to do full writes because the filesystem can fulfill it by just setting appropriate metadata.
Related
I'm working on a LKM which needs to retrieve and write a certain set of information to files. I looked up common ways to do so, but could not find a working one for Linux 4.x. I also found out that it is possible to retrieve system calls from memory and effectively call them.
As I found currently no better way I'd be interested if it'd be feasible to find the system call table, and call open, read/write and close this way.
This is strongly discouraged in most situations.
https://www.linuxjournal.com/article/8110 was a really good read for me the first time I thought I had to do this as well.
From within the Linux kernel, however, reading data out of a file for configuration information is considered to be forbidden. This is due to a vast array of different problems that could result if a developer tries to do this.
Indeed this is possible to do using system calls from within the kernel, but the practice of calling system calls from within the kernel is also generally discouraged. They're designed as interfaces for userspace applications to ask things of the kernel, not for the kernel to get itself to do work.
What kind of files do you want to manipulate from within the kernel? If the kind of file you'd like to manipulate is provided by the proc filesystem or the sysfs filesystem or the dev filesystem, you can modify the contents of the file from within the kernel (since the kernel provides these to userspace itself) -- this should be done NOT with file manipulation calls. If it's a normal userspace file, almost never do you want the kernel to be able to modify it.
If you provide more specifics I'd be interested to hear them, but this is usually a bad idea.
I want to get information about the battery in C on linux. I don't want to read or parse any file! Is there any low-level interface to acpi/the kernel or any other module to get the information I want to have?
I already searched the web, but every question results in the answer "parse /proc/foo/bar". I really don't want to do this because I think, low-level interfaces won't change as fast as Files do.
best regards.
The /proc filesystem does not exist on a disk. Instead, the kernel creates it in memory. They are generated on-demand by the kernel when accessed. As such, your concerns are invalid -- the /proc files will change as quickly as the kernel becomes aware of changes.
Check this for more info about /proc file system.
In any case, I don't believe there's any alternative interface.
You might be looking for UPower: http://upower.freedesktop.org/
This is a common need for both desktop environments and mobile devices, so there have been many solutions over time. For example, one of the oldest ones was acpid, which is pretty much obsolete now.
While I'd recommend using a light-weight abstraction like UPower for code clarity reasons, the files in /proc and (to some extent) /sys are considered part of the Linux kernel ABI, which means that changing them is generally frowned upon.
I'm implementing a pthread replacement library which is extremely lightweight. There are several reasons why I want to disable __thread completely.
It's a waste of memory. If I'm creating a thousand threads which has nothing to do with the context that declares a variable with __thread they will still allocate the program will still have allocated 1000*the size of that data bytes and never use it. It's simply not memory compatible with the mass-concurrency model. If we need extremely lightweight fibers with only 8K of stack, a TLS block of just 4K would be an overhead of 50% of the memory used by each thread. In some scenarios the TLS overhead would be enormous.
TLS is a complex standard and I simply don't have the time/resources to support it. It's to expensive. Personally I think the standard is poorly designed. It should have defined standard functions that had to be provided by the linker so thread libraries can take control over where TLS allocation takes place and insert relevant offsets and addresses it requires. Also the standard ELF implementation has been infected with pthread, expecting pthread sized structs to calculate offsets making it really hard to adapt to something else.
It's just a bad pattern. It encourages using globals and creating functions with static functions/side effects. This is not a territory we want to be in if we're creating correct programs that are easy to analyze.
If we really need "thread context" for some magic that tracks thread state behind the scenes (like for allocation or cancellation tracking) why not just expose the magic that TLS uses to understand that context in the first place? Personally I'm just going to use the %fs register directly. This would not be possible in libraries for obvious reasons but why should they be thread aware to begin with? Why not just design them correctly so they get the context related data they need in the first place right in the argument list?
My question is simply: What is the simplest way to disable __thread support and make clang emit errors if you accidentally used it? How can I get errors if I load a dynamic library which happens to require TLS?
I believe the simplest way is adding something like this unconditionally to your CFLAGS (perhaps from clang's equivalent of the gcc specfile if you want it to be system-global):
-D__thread='^-^'
where the righthand side can be anything that's syntactically invalid (a constraint violation) at any point in a C program.
As for preventing loading of libraries with TLS, you'd have to patch the linker and/or dynamic linker to reject them. If you're just talking about dlopen, your program could first read the file and parse the ELF headers for TLS relocations, then reject the library (without passing it to dlopen) if it has any. This might even be possible with an LD_PRELOAD wrapper.
I agree with you that, especially in its current implementation, TLS is something whose use should generally be avoided, but may I ask if you've measured the costs? I think stamping it out completely will be fairly difficult on a system that's designed to use it, and there's much lower-hanging fruit for cutting bloat. Which libc are you using? If it's glibc, I'm pretty sure glibc has lots of TLS it uses internally itself these days... Of course if you're writing your own threads implementation, that's going to entail a lot of interaction with the rest of the standard library, so perhaps you're already patching it out...?
By the way (shameless plug follows), we have an extremely light-weight threads implementation in musl libc which presently does not have TLS. I don't think it would be easy to integrate with another libc (and I'm sure if you're writing your own you will find it difficult to integrate with glibc, especially glibc's dynamic linker, which expects TLS to be supported) but if you can use the whole library as-is, it may meet your needs for specific projects, or have useful code you can borrow (license is MIT).
Disclaimer: This is probably a research question as I cannot find what I am looking for, and it is rather specific.
Problem: I have a custom search application that needs to read between 100K and 10M files that are between 0.01MB to about 10.0MB each. Each file contains one array that could be directly loaded as an array via mmap. I am looking for a solution to prefetch files into RAM before they are needed and if the system memory is full, eject ones that were already processed.
I know this sounds a lot like a combination of OS memory management and something like memcached. What I am actually looking for is something like memcached that doesn't return strings or values for a key, but rather the address for the start of a chosen array. In addition, (this is a different topic) I would like to be able to have the shared memory managed such that the distance between the CPU core and the RAM is the shortest on NUMA machines.
My question is: "does a tool/library like this already exist?"
Your question is related to this one
I'm not sure you need to find a library. You just need to understand how to efficiently use system calls.
I believe the readahead system call could help you.
Indeed you have many many files (and perhaps too much of them). I hope that your filesystem is good enough, or that they are in many directories. Having millions of files may become a concern if not tuned appropriately (but I won't dare help on this).
I don't know if it is your application who writes & reads that many files. Perhaps you might consider switching to a fast DBMS like PostGresQL or MySQL, or perhaps you could use GDBM.
I have once done this for a search-engine kind of application. It used an LRU chain, which was also addressable (via a hash table) by file-id, and memory-address IIRC. On every access, the hot items were repositioned to the head of the LRU chain. When memory got tight (mmap can fail ...) the tail of the LRU-chain was unmapped.
The pitfall of this scheme is that the program can get blocked on pagefaults. And since it was single threaded, it was really blocked. Altering this to a multithreaded architecture would involve protecting the hash and LRU structures by locks and semaphores.
After that, I realised that I was doing double buffering: the OS itself has a perfect LRU diskbuffer mechanism, which is probably smarter then mine. Just open()ing or mmap()ing every single file on every request is only one sytemcall away, and (given recent activity) just as fast, or even faster than the buffering layer.
wrt DBMS: using a DBMS is a clean design, but you have the overhead of minimal 3 systemcalls just to get the first block of data. And it will certainly (always) block. But it lends itself reasonably for a multi-threaded design, and relieves you from the pain of locks and buffer management.
I am wondering how the OS is reading/writing to the hard drive.
I would like as an exercise to implement a simple filesystem with no directories that can read and write files.
Where do I start?
Will C/C++ do the trick or do I have to go with a more low level approach?
Is it too much for one person to handle?
Take a look at FUSE: http://fuse.sourceforge.net/
This will allow you to write a filesystem without having to actually write a device driver. From there, I'd start with a single file. Basically create a file that's (for example) 100MB in length, then write your routines to read and write from that file.
Once you're happy with the results, then you can look into writing a device driver, and making your driver run against a physical disk.
The nice thing is you can use almost any language with FUSE, not just C/C++.
I found it quite easy to understand a simple filesystem while using the fat filesystem on the avr microcontroller.
http://elm-chan.org/fsw/ff/00index_e.html
Take look at the code you will figure out how fat works.
For learning the ideas of a file system it's not really necessary to use a disk i think. Just create an array of 512 byte byte-arrays. Just imagine this a your Harddisk an start to experiment a bit.
Also you may want to hava a look at some of the standard OS textbooks like http://codex.cs.yale.edu/avi/os-book/OS8/os8c/index.html
The answer to your first question, is that besides Fuse as someone else told you, you can also use Dokan that does the same for Windows, and from there is just a question of doing Reads and Writes to a physical partition (http://msdn.microsoft.com/en-us/library/aa363858%28v=vs.85%29.aspx (read particularly the section on Physical Disks and Volumes)).
Of course that in Linux or Unix besides using something like Fuse you only have to issue, a read or write call to the wanted device in /dev/xxx (if you are root), and in these terms the Unices are more friendly or more insecure depending on your point of view.
From there try to implement a simple filesystem like Fat, or something more exoteric like an tar filesystem, or even some simple filesystem based on Unix concepts like UFS or Minux, or just something that only logs the calls that are made and their arguments to a log file (and this will help you understand, the calls that are made to the filesystem driver during the regular use of your computer).
Now your second question (that is much more simple to answer), yes C/C++ will do the trick, since they are the lingua franca of system development, also a lot of your example code will be in C/C++ so you will at least read C/C++ in your development.
Now for your third question, yes, this is doable by one person, for example the ext filesystem (widely known in Linux world by it's successors as ext2 or ext3) was made by a single developer Theodore Ts'o, so don't think that these things aren't doable by a single person.
Now the final notes, remember that a real filesystem interacts with a lot of other subsystems in a regular kernel, for example, if you have a laptop and hibernate it the filesystem has to flush all changes made to the open files, if you have a pagefile on the partition or even if the pagefile has it's own filesystem, that will affect your filesystem, particularly the block sizes, since they will tend to be equal or powers of the page block size, because it's easy to just place a block from the filesystem on memory that by coincidence is equal to the page size (because that's just one transfer).
And also, security, since you will want to control the users and what files they read/write and that usually means that before opening a file, you will have to know what user is logged on, and what permissions he has for that file. And obviously without filesystem, users can't run any program or interact with the machine. Modern filesystem layers, also interact with the network subsystem due to the fact that there are network and distributed filesystems.
So if you want to go and learn about doing kernel filesystems, those are some of the things you will have to worry about (besides knowing a VFS interface)
P.S.: If you want to make Unix permissions work on Windows, you can use something like what MS uses for NFS on the server versions of windows (http://support.microsoft.com/kb/262965)