Simulating file system access


I am designing a file system in user space and need to test it. I do not want to use the available benchmarking tools because my requirements are different, so I want to simulate file access operations instead. To do this, I first use the ftw() function to walk through one of my existing (experimental) file systems and list all the files and directories in a file.
Then I invoke a simulator to simulate file access by a number of processes. The simulator randomly starts a "process", i.e. it spawns a thread that does what a real process would have done. The thread randomly selects a file operation (read, write, rename, etc.) and selects arguments to this operation from the list generated by ftw(). The thread performs a number of such file operations and then exits, marking the end of a process. The simulator continues to spawn threads; thread execution can overlap just as real processes do. As the threads perform operations, files get inserted, deleted, and renamed, and the list of files is updated accordingly.
I have not yet started coding. Does the plan seem sane? I am also not sure how to code the simulator: how should it spawn threads over a period of time? Should I use some random delay to do this?
Thanks
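For the first step, a minimal sketch of the directory walk might look like the following. It uses nftw() rather than ftw(), since POSIX.1-2008 marks ftw() as obsolescent; the names list_tree and record_entry are made up for illustration and are not part of the plan above.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static FILE *g_out;   /* where the listing goes; NULL means "count only" */
static long g_count;  /* number of files and directories seen so far */

/* Callback invoked by nftw() for every entry under the root. */
static int record_entry(const char *path, const struct stat *sb,
                        int type, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (type == FTW_F || type == FTW_D) {
        if (g_out)
            fprintf(g_out, "%c %s\n", type == FTW_D ? 'd' : 'f', path);
        g_count++;
    }
    return 0; /* returning non-zero would stop the walk */
}

/* Walk `root`, optionally write one line per entry to `out`,
 * and return the number of entries, or -1 on error. */
long list_tree(const char *root, FILE *out)
{
    g_out = out;
    g_count = 0;
    if (nftw(root, record_entry, 16, FTW_PHYS) == -1)
        return -1;
    return g_count;
}
```

Calling list_tree("/mnt/test", fp) with an open FILE * would produce the list file; the path is only a placeholder. The globals make this sketch unsuitable for concurrent walks.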

Yep, that seems fairly reasonable to me. I would consider attempting to impose a statistical distribution over your file operations (and accesses to particular files) that is somehow matched to your expected workload. You might be able to find some statistics about typical filesystem workloads as a starting point.
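One way to impose such a distribution is to draw a uniform random number and walk a cumulative table of weights. The sketch below assumes a made-up workload mix (60% reads, 30% writes, 5% renames, 5% deletes); the enum, the weights, and the name pick_op are all illustrative and should be replaced with numbers from your expected workload.

```c
#include <stddef.h>

enum fs_op { OP_READ, OP_WRITE, OP_RENAME, OP_DELETE, OP_COUNT };

/* Hypothetical workload mix; replace with measured numbers. */
static const double op_weights[OP_COUNT] = { 0.60, 0.30, 0.05, 0.05 };

/* Map u in [0, 1) to an operation by walking the cumulative distribution. */
enum fs_op pick_op(double u)
{
    double cumulative = 0.0;
    for (int i = 0; i < OP_COUNT; i++) {
        cumulative += op_weights[i];
        if (u < cumulative)
            return (enum fs_op)i;
    }
    return OP_DELETE; /* guards against rounding when u is close to 1 */
}
```

A simulator thread could call pick_op(rand_r(&seed) / (RAND_MAX + 1.0)) to choose its next operation; the same approach works for skewing which files get picked.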

That sounds about right for a decent test case just to make sure it's working. You could use sleep() to wait between spawning threads, or just spawn them all at once and have each one do an operation, wait a bit, do another operation, and so on. IMO, if you hit it hard with a lot of requests and it works, there's a good chance your filesystem will do just fine. Take an example from PostMark, which just appends like crazy to different files, and from other benchmarks that do random-access reads/writes in different locations to make sure that the page has to be read from disk.
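The spawning loop with a random delay could be sketched roughly as follows, using pthreads and nanosleep(). This is only an outline: the struct and function names are invented, the delays (up to 10 ms between spawns, up to 5 ms between operations) are arbitrary, and the actual file operations are left as a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

struct sim_proc {
    unsigned seed;   /* per-thread RNG state for rand_r() */
    int ops_to_run;  /* how many operations this "process" performs */
    int ops_done;
};

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *sim_proc_main(void *arg)
{
    struct sim_proc *p = arg;
    for (int i = 0; i < p->ops_to_run; i++) {
        /* A real simulator would pick an operation and a path here and
         * call read/write/rename on the file system under test. */
        sleep_ms(rand_r(&p->seed) % 5); /* think time between operations */
        p->ops_done++;
    }
    return NULL;
}

/* Start nprocs simulated processes with random gaps between them and
 * return the total number of operations performed, or -1 on error. */
int run_simulation(int nprocs, int ops_per_proc, unsigned seed)
{
    pthread_t *tids = calloc(nprocs, sizeof *tids);
    struct sim_proc *procs = calloc(nprocs, sizeof *procs);
    int started = 0, total = 0;

    if (!tids || !procs) {
        free(tids);
        free(procs);
        return -1;
    }
    for (int i = 0; i < nprocs; i++) {
        procs[i].seed = seed + i;
        procs[i].ops_to_run = ops_per_proc;
        if (pthread_create(&tids[i], NULL, sim_proc_main, &procs[i]) != 0)
            break;
        started++;
        sleep_ms(rand_r(&seed) % 10); /* random gap before the next spawn */
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += procs[i].ops_done;
    }
    free(tids);
    free(procs);
    return total;
}
```

Because earlier threads are still running while later ones are started, their lifetimes overlap the way real processes would. The shared list of files would need a mutex once the threads start modifying it.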

Related

How does the fio benchmark tool perform sequential disk reads?

I use fio to test the read/write bandwidth of my disks.
Even for the sequential read test, I can have it run multiple threads.
What does it mean to run multiple threads in a sequential read test?
Does it perform multiple sequential reads? (Each thread is assigned a file offset to start the sequential scan from.)
Do the multiple threads share a file offset? (Each thread issues sequential reads using a single file offset that is shared by all the threads.)
I tried to read the fio source code, but I couldn't really figure it out.
Can anyone give me an idea?

Sadly you didn't include a jobfile with your question and didn't say what platform you're running on. Here's a stab at answers:
Yes, it does multiple sequential reads, though wouldn't it have to do this even with a single thread?
No, each thread has its own offset, but unless you use the offset and size options they will all work inside the same "region".
On Linux, fio actually defaults to using separate processes per job, and each process has its own file descriptor (for ioengines that use files) for each file used. Further, some ioengines (e.g. libaio and pvsync, but there are many others) use syscalls that take the offset you want to do the I/O at with the request itself, so even if they do share a descriptor their offset is not impacted by others using the same descriptor.
There may be problems if you use the sync ioengine, ask fio to use threads rather than processes, and have those threads work on the same file. That ioengine has to call lseek before doing its I/O, so there is a chance for another thread's lseek to sneak in before the I/O is submitted. Note that the sync I/O engine is not the default one used with recent fio versions.
Perhaps the fio mailing list can say more?
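To illustrate the difference between passing the offset with the request and relying on the descriptor's shared offset, here is a small sketch using pread(2). It is not taken from fio; the function name and the temporary file are made up for the demonstration.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write "0123456789" to a temporary file, rewind, read `len` bytes at
 * offset `where` with pread(), and return the descriptor's own offset
 * afterwards (or -1 on error). */
off_t offset_after_pread(char *out, size_t len, off_t where)
{
    char path[] = "/tmp/preadXXXXXX";
    const char data[] = "0123456789";
    off_t result = -1;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path); /* the file disappears once fd is closed */

    if (write(fd, data, 10) == 10 &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        pread(fd, out, len, where) == (ssize_t)len)
        result = lseek(fd, 0, SEEK_CUR); /* pread() did not move it */

    close(fd);
    return result;
}
```

The returned offset stays at 0 even though the read happened at `where`, which is why threads sharing a descriptor do not disturb each other with this kind of call. An lseek() followed by read() gives no such guarantee.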

Which is the better way to edit the RLIMIT_NPROC value?

My application creates a thread per connection. The application runs under a non-zero user ID, and sometimes the number of threads surpasses the default limit of 1024. I want to raise this limit, so I have a few options:
Run as root [a very bad idea that also compromises security, so I am dropping it].
Run as an unprivileged user, use setcap to give the program the CAP_SYS_RESOURCE capability, and then add code to my program:
/* RLIMIT_NPROC: the maximum number of processes (or, more precisely on
 * Linux, threads) that can be created for the real user ID of the calling
 * process. Upon encountering this limit, fork(2) fails with the error
 * EAGAIN. */
struct rlimit rlp;
rlp.rlim_cur = nprocs; /* nprocs: maximum number of desired threads */
rlp.rlim_max = nprocs;
if (setrlimit(RLIMIT_NPROC, &rlp) == -1)
    perror("setrlimit");
The other option is editing /etc/security/limits.conf, where I can simply add entries for the development user, e.g.
#devuser hard nproc 20000
#devuser soft nproc 10000
where 10k is enough. Being a little reluctant to change the source code, should I proceed with the last option? I am also curious to know which approach is more robust and standard.
Seeking your opinions, and thank you in advance :)
PS: What will happen if a single process is served by more than 1k threads? Of course, I also have 32 GB of RAM.

First, I believe you are wrong in having nearly a thousand threads. Threads are quite costly, and it is usually not reasonable to have so many of them. I would suggest having a few dozen threads at most (unless you run on a very costly supercomputer).
You could have some event loop around a multiplexing syscall like poll(2). Then a single thread can deal with many thousands of connections. Read about the C10K problem and epoll. Consider using some event libraries like libevent or libev etc...
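The core of such an event loop is a single poll(2) call over all the descriptors. A minimal sketch, with an invented helper name and with the actual connection handling left out:

```c
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Wait up to timeout_ms for input on any of the n descriptors and return
 * how many of them are readable, or -1 on error. A real server would loop
 * over this and service each ready connection instead of just counting. */
int count_readable(const int *fds, int n, int timeout_ms)
{
    struct pollfd *pfds = calloc(n, sizeof *pfds);
    int ready = 0;
    int rc;

    if (!pfds)
        return -1;
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN; /* interested in "data available to read" */
    }
    rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc > 0) {
        for (int i = 0; i < n; i++)
            if (pfds[i].revents & POLLIN)
                ready++;
    }
    free(pfds);
    return rc < 0 ? -1 : ready;
}
```

One thread blocked in poll() can watch thousands of sockets this way. With very large descriptor sets, epoll or one of the event libraries mentioned above avoids rebuilding and rescanning the array on every call.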
You could start your application as root (perhaps by using setuid techniques), set up the required resources (in particular, opening privileged TCP/IP ports), and then change the user with setreuid(2).
Read Advanced Linux Programming...
You could also wrap your application in a tiny setuid C program which increases the limits using setrlimit(2), changes the user with setreuid(2), and finally calls execve(2) on your real program.
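As a sketch of the setrlimit(2) step: an unprivileged process may raise its soft limit up to the current hard limit, while raising the hard limit itself needs privilege (CAP_SYS_RESOURCE on Linux). The helper below only does the unprivileged part; its name is made up, and it is not a complete wrapper program.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NPROC to `wanted`, capped at the current hard
 * limit. Returns 0 on success (or if the limit is already high enough)
 * and -1 on error. */
int raise_nproc_soft(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) == -1)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        wanted = rl.rlim_max; /* cannot exceed the hard limit unprivileged */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted)
        return 0;             /* nothing to do */
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

A privileged wrapper would set rlim_max as well before dropping privileges and calling execve(2), since resource limits are preserved across exec.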

User CPU time of specific child process after first output to stdout

I'm working on a program which may spawn multiple child processes, and I need to get precise information about the CPU time used by each child process, even if there are several child processes running simultaneously. I'm doing this using wait4(2) on a separate thread of the parent process, which works quite well.
However, this approach provides the total time spent by a specific child process, and I'm only interested in the amount of time spent after a particular event, namely the child process' first output to stdout. I've looked into other ways of getting the CPU time of child processes, such as getrusage(2) and times(3), but these don't seem to be able to distinguish between multiple child processes' times, and instead provide the sum of all child processes' times.
I'm working on a text editor application that lets users run scripts and code in a variety of different languages, and the app has a built-in code timing feature. The app relies on bash scripts to run the user's code, and the first thing my bash scripts do is output a start-of-heading byte (0x02). After this the bash script does whatever it needs to do to run the user's code, and that is the thing I want to time. Bash may do a bit of initialization (to set up PATH variables etc.) which may take 30 or 40 ms to complete, and I don't want that initialization to be timed along with the rest. If the user's code is, for instance, a simple Hello World type program in C, the timing feature might display something like 41 ms instead of the actual 1 ms which it took to run their code.
Any ideas on how this might be done?
Thanks :)

A couple of possible solutions come to mind. They don't get CPU time after first output exactly, but they may avoid the problem you're dealing with.
The first is to get rid of the bash scripts and just do the equivalent work in your program before running the user's code (between fork() and exec(), for example). That way the child process' CPU time from wait4() doesn't include your extra setup.
Another possibility is to write a simple application that does nothing but run the user's application and report its CPU time back to your main application. That runner application can then be called from your scripts to run the user's program, rather than calling the user's program directly. The runner application might itself use fork()/exec()/wait4() to run the user's program, and could report the information from wait4() to your main program through any of a variety of means such as a named pipe, message queue, socket, or even just writing the information to a file your main program can open afterward. That way your bash scripts can do work both before and after running the user's program that won't be included in the CPU time reported by the runner application. You'd probably want the runner to accept an argument like the name of a pipe or an output file in addition to the user's program's path and arguments, so that you can control how the information is reported. That way you could run more than one instance of the runner application and still keep the information they report separate.
If you do want to include the work done by the script, but not the time taken to load bash, then you could signal the main program by echoing something to a pipe from the bash script before and after the parts you want to time. The main program can then measure the time between the start and stop signals, which will at least get you wall-clock time (though not actual CPU time). Otherwise I'm not sure there's a way to perfectly measure the CPU time for just part of the script without using a modified bash (which I'd avoid if possible).
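The heart of the runner application could look something like this sketch. The function name run_and_time is invented, and reporting the result back over a pipe or file is left out; it only shows the fork()/exec()/wait4() part.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for wait4() on glibc */
#endif
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0], wait for it, and store its user CPU
 * time in seconds in *user_secs. Returns the child's exit status, or -1
 * if it could not be started or did not exit normally. */
int run_and_time(char *const argv[], double *user_secs)
{
    struct rusage ru;
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }
    if (wait4(pid, &status, 0, &ru) == -1)
        return -1;
    *user_secs = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Since the rusage comes from wait4() on this one child, it covers only the user's program and not the bash setup around it. ru.ru_stime holds the system CPU time if that is wanted as well.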

How to speed up consecutive program startup under Linux?

I've written two relatively small programs in C. They communicate with each other using textual data. Program A generates some problems from given input; B evaluates them and creates input for another iteration of A.
Here's a bash script that I currently use:
for i in {1..1000}
do
./A data > data2;
./B data2 > data;
done
The problem is that since what A and B do is not very time consuming, most of the time is spent (I suppose) in starting the apps up. When I measure how long the script takes to run I get:
$ time ./bash.sh
real 0m10.304s
user 0m4.010s
sys 0m0.113s
So my main question is: is there any way to communicate data between those two apps faster? I don't want to integrate them into one application, because I'm trying to build a toolset of independent, easily communicating tools (as was suggested in "The Art of Unix Programming", from which I'm learning how to write reusable software).
PS. The data and data2 files contain sets of data that those applications need in whole at once (so communicating, e.g., one line of data at a time is impossible).
Thanks for any suggestions.
cheers,
kajman

Can you create named pipes?
mkfifo data1
mkfifo data2
./A data1 > data2 &
./B data2 > data1
If your application is reading and writing in a loop, this could work :)

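If you used a pipe to transfer the stdout of program A to the stdin of program B, you would remove the need to write the file "data2" on each iteration. Note that redirecting B's output straight back onto A's input file can truncate that file before A has finished reading it, so write to a temporary file and rename it afterwards:
./A data1 | ./B > data1.new && mv data1.new data1
Program B would need to be able to read its input from stdin rather than from a named file.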

If you want to make a program run faster, you need to understand what is making the program run slowly. The field of computer science dedicated to measuring the performance of a running program is called profiling.
Once you discover which internal portion of your program is running slow, you can generally speed it up. How you go about speeding up that item depends heavily on what "the slow part" is doing and how it is "being done".
Several people have recommended pipes for moving the data directly from the output of one program into the input of another program. Assuming you rewrite your tools to handle input and output in a piped manner, this might improve performance. Again, it depends on what you are doing and how you are doing it.
For example, if your tool just fixes windows style end-of-lines into unix style end-of-lines, the program might read in one line, waiting for it to be available, check the end-of-line and write out the line with the desired end-of-line. Or the tool might read in all of the data, do a replacement call on each "wrong" end-of-line in memory, and then write out all of the data. With the first solution, piping speeds things up. With the second solution piping doesn't speed up anything.
The reason it is truly so hard to answer such a question is that the fix you need really depends on the code you have, the problem you are trying to solve, and the means by which you are solving it now. In the end, there isn't always a 100% guarantee that the code can be sped up; however, virtually every piece of code has opportunities to be sped up. Use profiling to speed up the parts that are slow, instead of wasting your time working on a part of your program that is only called once and represents 0.001% of the program's runtime.
Remember if you speed up something that is 0.001% of your program's runtime by 50%, you actually only sped up your entire program by 0.0005%. Use profiling to determine the block of code that's taking up 90% of your runtime and concentrate on it.
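The arithmetic behind that remark is just a product: the share of the runtime a part accounts for, times the fraction by which that part's time is cut. A tiny helper (the name is made up) makes it easy to plug in numbers from a profile:

```c
/* Fraction of the total runtime saved when a part that accounts for
 * `share` of the runtime has its own time cut by `reduction`
 * (both given as fractions between 0 and 1). */
double total_saving(double share, double reduction)
{
    return share * reduction;
}
```

With share = 0.00001 (0.001%) and reduction = 0.5, the result is 0.000005, i.e. 0.0005% of the total. With share = 0.9 and the same reduction, it is 0.45, i.e. 45% of the total.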

I do have to wonder why, if A and B depend on each other to run, you want them to be part of an independent toolset.
One solution is a compromise between the two:
Create a library that contains A.
Create a library that contains B.
Create a program that spawns two threads, 1 containing A and 2 containing B.
Create a semaphore that tells A to run and another that tells B to run.
After the function that calls A in 1, increment B's semaphore.
After the function that calls B in 2, increment A's semaphore.
Another possibility is to use file locking in your programs:
Make both A and B execute in infinite loops (or however many times you're processing data)
Add code to attempt to lock both files at the beginning of the infinite loop in A and B (if not, sleep and try again so that you don't do anything until you have the lock).
Add code to unlock and sleep for longer than the sleep in step 2 at the end of each loop.
Either of these solves the problem of having the overhead of launching the program between runs.
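The semaphore hand-off in the first option could be sketched as below, using POSIX semaphores. The bodies of A and B are stand-ins (step_a and step_b are hypothetical and just manipulate a number), and error checking on the pthread and semaphore calls is omitted for brevity.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t sem_a, sem_b; /* "A may run" and "B may run" */
static int rounds;
static long data;          /* stands in for the data A and B exchange */

static void step_a(void) { data += 1; } /* hypothetical body of A */
static void step_b(void) { data *= 2; } /* hypothetical body of B */

static void *thread_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < rounds; i++) {
        sem_wait(&sem_a);  /* wait until it is A's turn */
        step_a();
        sem_post(&sem_b);  /* hand over to B */
    }
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < rounds; i++) {
        sem_wait(&sem_b);  /* wait until it is B's turn */
        step_b();
        sem_post(&sem_a);  /* hand back to A */
    }
    return NULL;
}

/* Run A and B alternately n times each, starting with A. */
long run_alternating(int n)
{
    pthread_t ta, tb;

    rounds = n;
    data = 0;
    sem_init(&sem_a, 0, 1); /* A goes first */
    sem_init(&sem_b, 0, 0);
    pthread_create(&ta, NULL, thread_a, NULL);
    pthread_create(&tb, NULL, thread_b, NULL);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    sem_destroy(&sem_a);
    sem_destroy(&sem_b);
    return data;
}
```

Both threads live for the whole run, so the per-iteration cost is a semaphore hand-off rather than a process launch.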

It's almost certainly not application startup which is the bottleneck. Linux will end up caching large portions of your programs, which means that launching will progressively get faster (to a point) the more times you start your program.
You need to look elsewhere for your bottleneck.

Is there a posix-way to ensure two files are flushed in sequence without blocking?

In my program, I hold two files open for writing: a content-file, containing chunks of data, and an index-file, containing a map of which chunks of data have been written so far.
I would like to flush them both to disc with the best possible performance, with the only constraint that the blocks in the content-file must be written before the corresponding blocks in the index-file (naturally).
The catch is that I would like to avoid blocking, i.e. doing an fsync, both for latency and throughput reasons.
Any ideas?

I don't think you can do this easily in a single execution path. You need fsync to have the write to disk guaranteed - and this is going to have to wait for the write.
I suspect it is possible (but not easy) to do this by delegating the writing task to a separate thread or process. Generate the data in your existing program and 'write' it to the second thread/process using any method that looks sensible. This can be non-blocking. The second thread would then write any new data to your content-file, then fsync, then write the index-file, then check for new data again. Key design decisions relate to how you separate the two execution paths, how you communicate between them, and whether you need to report the write back to the main program. This could still have latency and throughput issues, but that's part of the cost of choosing to have the index-file and content-file in sync. At least there would be a chance of getting work done while waiting on the disk.
It could be worth looking at the source of any of the transactional databases to see whether this is encapsulated well enough to be useful to you. You could also investigate the sync option when you mount the file system for the content-file.
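The ordering the writer thread has to enforce (content-file, fsync, then index-file) might be sketched like this. The function names and the idea of storing a 64-bit chunk number as the index entry are assumptions made for the example; the queue between the main program and the writer thread is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Meant to be called from the dedicated writer thread, so the blocking
 * fsync() calls never stall the thread that produces the data.
 * Returns 0 on success, -1 on error. */
int commit_chunk(int content_fd, int index_fd,
                 const void *chunk, size_t len, uint64_t chunk_no)
{
    if (write_all(content_fd, chunk, len) == -1)
        return -1;
    if (fsync(content_fd) == -1) /* chunk is on disk before it is indexed */
        return -1;
    if (write_all(index_fd, &chunk_no, sizeof chunk_no) == -1)
        return -1;
    return fsync(index_fd);      /* makes the index entry durable too */
}
```

Only the first fsync() is needed for the ordering constraint; the second one matters if the index itself must survive a crash. Batching several chunks before each fsync() would trade some latency for throughput.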