Benchmark the performance of file systems

I'm taking a Computer Organization class this semester.
My professor gave us a homework assignment. The description is as follows:
Write a program to benchmark the two file systems, Windows and Linux. Discuss which performs better.
I want to use C to implement this homework assignment, but I have no idea how to start.
What functions do I need? What can I do?
Please give me some hints or examples.

You could download the sources of bonnie++ and look at how they do this.
But I think the best way is to write to your HDD and measure how long it takes to write or read a defined amount of data.

Some interesting data points just for file reads/writes:
cold cache vs hot cache
single thread vs concurrent threads
POSIX AIO vs Windows overlapped I/O (single thread? multiple threads?)
You might also measure speed of directory enumeration and traversal.
Keep in mind that both Linux and Windows support many filesystems; ext4 and NTFS are the most widely used for Linux and Windows respectively.
What functions should you use? For Unix, there are the basic read(2) and write(2) calls (for normal, blocking I/O). Windows has ReadFile and WriteFile.
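As a starting point for the Linux side, here is a minimal sketch of timing sequential writes and reads with write(2) and read(2). The function names (bench_write, bench_read, now_seconds) and the choice of fsync(2) and CLOCK_MONOTONIC are my own assumptions, not part of any benchmark suite; a Windows version would need WriteFile/ReadFile and a different timer.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in seconds (hypothetical helper). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Write `total` bytes to `path` in `chunk`-sized write(2) calls, then
 * fsync so the data actually reaches the device before the clock stops.
 * Returns elapsed seconds, or -1.0 on error. */
double bench_write(const char *path, size_t total, size_t chunk)
{
    if (chunk == 0)
        return -1.0;
    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1.0;
    memset(buf, 'x', chunk);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    double start = now_seconds();
    size_t done = 0;
    int ok = 1;
    while (done < total) {
        size_t want = (total - done < chunk) ? total - done : chunk;
        ssize_t n = write(fd, buf, want);
        if (n <= 0) {
            ok = 0;
            break;
        }
        done += (size_t)n;
    }
    if (ok && fsync(fd) != 0)
        ok = 0;
    double elapsed = now_seconds() - start;

    close(fd);
    free(buf);
    return ok ? elapsed : -1.0;
}

/* Read `path` to the end in `chunk`-sized read(2) calls.
 * Stores the byte count in *bytes_read and returns elapsed seconds,
 * or -1.0 on error. */
double bench_read(const char *path, size_t chunk, size_t *bytes_read)
{
    if (chunk == 0)
        return -1.0;
    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1.0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    double start = now_seconds();
    size_t total = 0;
    ssize_t n;
    while ((n = read(fd, buf, chunk)) > 0)
        total += (size_t)n;
    double elapsed = now_seconds() - start;

    close(fd);
    free(buf);
    if (n < 0)
        return -1.0;
    *bytes_read = total;
    return elapsed;
}
```

Throughput is then bytes divided by seconds. Note that calling bench_read right after bench_write mostly measures the hot-cache case, since the data is likely still in the page cache.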

Related

How to test a file system implementation?

I am going to implement a file system in C, and I'm wondering how I can test it without installing it in the kernel or using the FUSE API. Ideally, I'd like to use the dd command to create a virtual hard drive and interact with it using Linux system calls like write and read (the idea is to avoid writing drivers). Is that possible?
(I'm sorry if I misspelled words; English isn't my first language. I'm also sorry if this is off-topic; it's my first question.)
Thanks.

If you are really implementing a file system, you can test it in a virtual machine.
Otherwise, you can implement a file system in a file that exists in a real file system, and implement some functions like read/write/etc.
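A minimal sketch of that second approach, assuming a fixed 512-byte block size and an ordinary file as the backing store. The names (struct disk, disk_open, disk_read, disk_write) are hypothetical; the file system logic itself would sit on top of these block routines.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512

/* A "virtual disk": a regular file treated as an array of blocks. */
struct disk {
    int      fd;
    uint32_t nblocks;
};

/* Create (or reuse) the backing file and size it to nblocks blocks.
 * Returns 0 on success, -1 on error. */
int disk_open(struct disk *d, const char *path, uint32_t nblocks)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)nblocks * BLOCK_SIZE) != 0) {
        close(fd);
        return -1;
    }
    d->fd = fd;
    d->nblocks = nblocks;
    return 0;
}

/* Read block `blockno` into buf (must hold BLOCK_SIZE bytes). */
int disk_read(const struct disk *d, uint32_t blockno, void *buf)
{
    if (blockno >= d->nblocks)
        return -1;
    ssize_t n = pread(d->fd, buf, BLOCK_SIZE, (off_t)blockno * BLOCK_SIZE);
    return n == BLOCK_SIZE ? 0 : -1;
}

/* Write BLOCK_SIZE bytes from buf to block `blockno`. */
int disk_write(const struct disk *d, uint32_t blockno, const void *buf)
{
    if (blockno >= d->nblocks)
        return -1;
    ssize_t n = pwrite(d->fd, buf, BLOCK_SIZE, (off_t)blockno * BLOCK_SIZE);
    return n == BLOCK_SIZE ? 0 : -1;
}
```

Because everything goes through ordinary pread/pwrite on a regular file, the code can be unit-tested as a normal user-space program, with no kernel module or FUSE mount involved.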

A virtual hard drive and a virtual filesystem are somewhat different things: you write different functions and handle different requests when implementing each. Given that you are implementing a filesystem, your best bet on Linux is to expose it via FUSE for testing. Then write tests that access your FUSE-based filesystem to perform various tasks.
Unfortunately testing a filesystem is hard and requires writing many tests. Manual testing with different software (file managers) is also required.

Is there a way to get battery info (status, plugged in, etc) without reading a proc/sys file on linux?

I want to get information about the battery in C on Linux. I don't want to read or parse any file! Is there any low-level interface to ACPI, the kernel, or any other module to get the information I want?
I already searched the web, but every question results in the answer "parse /proc/foo/bar". I really don't want to do this because I think low-level interfaces won't change as fast as files do.
Best regards.

The /proc filesystem does not exist on a disk. Instead, the kernel creates it in memory, and its files are generated on demand by the kernel when they are accessed. As such, your concerns are invalid -- the /proc files will change as quickly as the kernel becomes aware of changes.
Check this for more info about the /proc file system.
In any case, I don't believe there's any alternative interface.

You might be looking for UPower: http://upower.freedesktop.org/
This is a common need for both desktop environments and mobile devices, so there have been many solutions over time. For example, one of the oldest ones was acpid, which is pretty much obsolete now.
While I'd recommend using a light-weight abstraction like UPower for code clarity reasons, the files in /proc and (to some extent) /sys are considered part of the Linux kernel ABI, which means that changing them is generally frowned upon.
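For comparison, reading a single value from /sys takes very little code. This is only a sketch: read_sysfs_long is a hypothetical helper, and the path /sys/class/power_supply/BAT0/capacity is an assumption, since the battery's name (BAT0, BAT1, ...) varies between machines.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one integer from a sysfs-style file that contains a single
 * number followed by a newline. Returns 0 on success, -1 on error. */
int read_sysfs_long(const char *path, long *out)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    if (line == NULL)
        return -1;

    errno = 0;
    char *end;
    long value = strtol(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -1;          /* not a number */
    *out = value;
    return 0;
}
```

A call such as read_sysfs_long("/sys/class/power_supply/BAT0/capacity", &percent) would then yield the charge percentage on machines where that file exists.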

How to determine if the application is already running? C portable Linux/Win

Is there a way to write C code that lets us determine whether a previous instance of an application is already running? I need to check this in a portable way for Linux and Windows, both using the latest version of GCC available.
Any examples of portable code would be of enormous help. I see two options now:
Check the process list. Here Linux has good tools, but I don't think the same functions apply to Windows. Maybe some GNU libraries for both OSes? What libraries, or functions?
Save and lock a file. Now, how to do that in a way that both systems can understand? One problem is where to save the file, since directory trees differ between the systems. Also, if a relative path is chosen, two instances can still run with different lock files in different directories.
Thanks!
Beco.
PS. The OSes have different requirements, so if you know one and not the other, please answer. After all, if there is no portable "single" way, I may still be able to use #ifdef and the code proposed in the answers.
C language (not c++), console application, gcc, linux and windows

Unfortunately, if you limit yourself to C, you may have difficulty doing this portably. With C++, there's Boost.Interprocess's named_mutex, but in C you will have to do one of the following, depending on the platform:
UNIXes (including Mac OS): Open and flock a file somewhere. Traditionally you will also write your current PID into this file. NOTE: This may not be safe on NFS; but your options are extremely limited there anyway. On Linux you can use a /dev/shm path if you want to ensure it's local and safe to lock.
Windows: Open and lock a named mutex
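A minimal sketch of the Unix variant, assuming flock(2) is available (Linux, the BSDs, macOS). The function name acquire_instance_lock and the lock file path passed to it are hypothetical; the Windows named-mutex branch is not shown.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to become the single running instance.
 * Returns an open file descriptor that holds the lock (keep it open for
 * the life of the process), or -1 if another instance holds the lock or
 * an error occurred. */
int acquire_instance_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* LOCK_NB: fail immediately instead of waiting for the holder. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }

    /* Traditionally the PID is recorded in the lock file. */
    char pid[32];
    int len = snprintf(pid, sizeof pid, "%ld\n", (long)getpid());
    if (ftruncate(fd, 0) != 0 || write(fd, pid, (size_t)len) != len) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The kernel drops the lock automatically when the process exits or the descriptor is closed, so a crash does not leave a stale lock behind, even though the file itself remains.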

For Windows, a mutex works well.
http://msdn.microsoft.com/en-us/library/ms682411(v=vs.85).aspx
The article also mentions an alternative to a mutex:
To limit your application to one instance per user, create a locked file in the user's profile directory.

The sort of canonical method in Unixland is to have the process write its own PID to a file in a known location. If this file already exists, the program can compare its own PID (available via a system call) with the one in that file; if they differ, it knows that another process is running.

C has no built-in facility for checking whether an application is already running, so making this cross-platform is difficult, if not impossible. However, on Linux, one can use IPC. And on Windows (I'm not very experienced in this area), you may find this helpful.

How to implement a very simple filesystem?

I am wondering how the OS is reading/writing to the hard drive.
I would like as an exercise to implement a simple filesystem with no directories that can read and write files.
Where do I start?
Will C/C++ do the trick or do I have to go with a more low level approach?
Is it too much for one person to handle?

Take a look at FUSE: http://fuse.sourceforge.net/
This will allow you to write a filesystem without having to actually write a device driver. From there, I'd start with a single file. Basically create a file that's (for example) 100MB in length, then write your routines to read and write from that file.
Once you're happy with the results, then you can look into writing a device driver, and making your driver run against a physical disk.
The nice thing is you can use almost any language with FUSE, not just C/C++.

I found it quite easy to understand a simple filesystem by using the FAT filesystem on an AVR microcontroller.
http://elm-chan.org/fsw/ff/00index_e.html
Take a look at the code and you will figure out how FAT works.

For learning the ideas behind a file system, I don't think it's really necessary to use a disk. Just create an array of 512-byte byte arrays, imagine that this is your hard disk, and start experimenting a bit.
You may also want to have a look at some of the standard OS textbooks like http://codex.cs.yale.edu/avi/os-book/OS8/os8c/index.html
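To make the array-of-blocks idea concrete, here is a minimal sketch of a flat file system (no directories) on an in-memory "disk". The layout is entirely my own assumption for illustration: block 0 holds a fixed table of file entries, files are stored contiguously after it, and there is no delete or overwrite. The names fs_format, fs_write, and fs_read are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NUM_BLOCKS 64
#define MAX_FILES  16
#define NAME_LEN   16

static uint8_t disk[NUM_BLOCKS][BLOCK_SIZE];   /* the pretend hard disk */

struct entry {
    char     name[NAME_LEN];  /* empty name marks a free slot */
    uint32_t start;           /* first data block */
    uint32_t size;            /* file length in bytes */
};

/* The whole file table must fit in block 0. */
_Static_assert(sizeof(struct entry) * MAX_FILES <= BLOCK_SIZE, "table too big");

static uint32_t blocks_for(uint32_t size)
{
    return (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Wipe the disk; an all-zero block 0 is an empty file table. */
void fs_format(void)
{
    memset(disk, 0, sizeof disk);
}

/* Store a new file. Returns 0 on success, -1 if the name is invalid or
 * already used, or if there is no free slot or not enough space. */
int fs_write(const char *name, const void *data, uint32_t size)
{
    struct entry table[MAX_FILES];
    size_t len = strlen(name);
    if (len == 0 || len >= NAME_LEN || size > (NUM_BLOCKS - 1) * BLOCK_SIZE)
        return -1;
    memcpy(table, disk[0], sizeof table);

    int slot = -1;
    uint32_t next = 1;                      /* block 0 is the table */
    for (int i = 0; i < MAX_FILES; i++) {
        if (table[i].name[0] == '\0') {
            if (slot < 0)
                slot = i;
            continue;
        }
        if (strcmp(table[i].name, name) == 0)
            return -1;
        uint32_t end = table[i].start + blocks_for(table[i].size);
        if (end > next)
            next = end;
    }
    if (slot < 0 || next + blocks_for(size) > NUM_BLOCKS)
        return -1;

    const uint8_t *src = data;
    uint32_t b = next;
    for (uint32_t off = 0; off < size; off += BLOCK_SIZE, b++) {
        uint32_t n = (size - off < BLOCK_SIZE) ? size - off : BLOCK_SIZE;
        memcpy(disk[b], src + off, n);
    }

    memset(&table[slot], 0, sizeof table[slot]);
    memcpy(table[slot].name, name, len);
    table[slot].start = next;
    table[slot].size = size;
    memcpy(disk[0], table, sizeof table);   /* write the table back */
    return 0;
}

/* Copy a file into buf (capacity cap). Returns its size, or -1. */
int fs_read(const char *name, void *buf, uint32_t cap)
{
    struct entry table[MAX_FILES];
    memcpy(table, disk[0], sizeof table);
    for (int i = 0; i < MAX_FILES; i++) {
        if (table[i].name[0] == '\0' || strcmp(table[i].name, name) != 0)
            continue;
        if (table[i].size > cap)
            return -1;
        uint8_t *dst = buf;
        uint32_t b = table[i].start;
        for (uint32_t off = 0; off < table[i].size; off += BLOCK_SIZE, b++) {
            uint32_t left = table[i].size - off;
            memcpy(dst + off, disk[b], left < BLOCK_SIZE ? left : BLOCK_SIZE);
        }
        return (int)table[i].size;
    }
    return -1;
}
```

Contiguous allocation keeps the sketch short; real designs such as FAT chain blocks together so that files can grow and be deleted without leaving unusable gaps.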

The answer to your first question is that, besides FUSE as someone else told you, you can also use Dokan, which does the same for Windows. From there it is just a question of doing reads and writes to a physical partition (http://msdn.microsoft.com/en-us/library/aa363858%28v=vs.85%29.aspx (read particularly the section on Physical Disks and Volumes)).
Of course, on Linux or Unix, besides using something like FUSE, you only have to issue a read or write call to the wanted device in /dev/xxx (if you are root). In these terms the Unices are more friendly or more insecure, depending on your point of view.
From there, try to implement a simple filesystem like FAT, or something more exotic like a tar filesystem, or even a simple filesystem based on Unix concepts like UFS or Minix, or just something that only logs the calls that are made and their arguments to a log file (this will help you understand the calls that are made to the filesystem driver during regular use of your computer).
Now your second question (which is much simpler to answer): yes, C/C++ will do the trick, since they are the lingua franca of systems development. A lot of your example code will also be in C/C++, so you will at least have to read C/C++ during your development.
Now for your third question: yes, this is doable by one person. For example, the ext filesystem (widely known in the Linux world through its successors ext2 and ext3) was written by a single developer, Rémy Card, so don't think that these things aren't doable by a single person.
Now the final notes. Remember that a real filesystem interacts with a lot of other subsystems in a regular kernel. For example, if you hibernate a laptop, the filesystem has to flush all changes made to open files. If you have a pagefile on the partition, or even if the pagefile has its own filesystem, that will affect your filesystem, particularly the block sizes: they tend to be equal to, or a power-of-two multiple of, the page size, because a filesystem block that matches the page size can be placed in memory with a single transfer.
There is also security, since you will want to control users and what files they read and write. That usually means that before opening a file, you have to know which user is logged on and what permissions they have for that file. And obviously, without a filesystem, users can't run any program or interact with the machine. Modern filesystem layers also interact with the network subsystem, because there are network and distributed filesystems.
So if you want to go and learn about writing kernel filesystems, those are some of the things you will have to worry about (besides knowing a VFS interface).
P.S.: If you want to make Unix permissions work on Windows, you can use something like what MS uses for NFS on the server versions of windows (http://support.microsoft.com/kb/262965)

Content for Linux Operating Systems Class

I will be a TA for an operating systems class this upcoming semester. The labs will deal specifically with the Linux kernel.
What concepts/components of the Linux kernel do you think are the most important to cover in the class?
What do you wish was covered in your studies that was left out?
Any suggestions regarding the Linux kernel or overall operating systems design would be much appreciated.

My list:
What an operating system's concerns are: Abstraction and extension of the physical machine and resource management.
How the build process works, i.e. how architecture-specific/machine code is incorporated
How system calls work and how modules can link up
Memory management / Virtual Memory / Paging and all the rest
How processes are born, live and die in POSIX and other systems
userspace vs kernel threads and what the difference is between process/threads
Why the monolithic Kernel design is growing tiresome and what are the alternatives
Scheduling (and some of the alternative / domain specific schedulers)
I/O, Driver development and how they are dynamically loaded
The early stages of booting and what the kernel does to setup the environment
Problems with clocks, mmu-less systems etc
... I could go on ...
I almost forgot IPC and Unix 'everything is a file' design decisions
POSIX, why it exists, why it shouldn't
In the end, just get them to go through Tanenbaum's Modern Operating Systems and also do case studies on some other kernels like Mach/Hurd's microkernel setup and maybe some distributed and exokernel stuff.
Give a broad view past Linux too, I reckon
For those who are super geeky, the history of operating systems and why they are the way they are.

The Virtual File System layer is an absolute must for any Linux Operating System class.
I took a similar class in college. The most frustrating but, at the same time, helpful project was writing a small file system for the Linux operating system. Getting this to work takes ~2-3 weeks for a group of 4 people and really teaches you the ins and outs of the Kernel.

I recently took an operating systems class, and I found the projects to be challenging, but essential in understanding the concepts in class. The projects were also fun, in that they involved us actually working with the Linux source code (version 2.6.12, or thereabouts).
Here's a list of some pretty good projects/concepts that I think should be covered in any operating systems class:
The difference between user space and kernel space
Process management (i.e. fork(), exec(), etc.)
Write a small shell that demonstrates knowledge of fork() and exec()
How system calls work, i.e. how do we switch from user to kernel mode
Add a simple system call to the Linux kernel, write a test application that calls the system call to demonstrate it works.
Synchronization in and out of the kernel
Implement synchronization primitives in user space
Understand how synchronization primitives work in kernel space
Understand how synchronization primitives differ between single-CPU architectures and SMP
Add a simple system call to the Linux kernel that demonstrates knowledge of how to use synchronization primitives in the Linux kernel (i.e. something that has to acquire, say, the tasklist lock, etc. but also make it something where you have to kmalloc, which can't be done while holding a lock (unless you GFP_ATOMIC, but you shouldn't, really))
Scheduling algorithms, and how scheduling takes place in the Linux kernel
Modify the Linux task scheduler by adding your own scheduling policy
What is paging? How does it work? Why do we have paging? How does it work in the Linux kernel?
Add a system call to the Linux kernel which, given an address, will tell you if that address is present or if it's been swapped out (or some other assignment involving paging).
File systems - what are they? Why do they exist? How do they work in the Linux kernel?
Disk scheduling algorithms - why do they exist? What are they?
Add a VFS to the Linux kernel
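The small-shell exercise in the list above boils down to a fork/exec/wait cycle. A minimal sketch follows; the function name run_command is hypothetical, and a real shell would add parsing, built-ins, and redirection around it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given NULL-terminated argument vector and wait
 * for it to finish. Returns the child's exit status, 127 if the program
 * could not be executed, or -1 on fork/wait failure or abnormal exit. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execvp(argv[0], argv);
        _exit(127);             /* only reached if execvp failed */
    }

    /* Parent: block until the child terminates. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A shell's main loop would read a line, split it into an argv array, call something like run_command, and repeat.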

Well, I just finished my OS course this semester so I thought I'd chime in.
I was kind of upset that we didn't actually play around with the actual OS itself, rather we just did system programming. I'd recommend having the labs be on something that is in the OS itself, which is what it sounds like what you want to do.
One lab that I did enjoy and found useful however was writing our own malloc/free routines. It was difficult, but pretty entertaining as well.
Maybe also cover loading programs into memory and/or setting up the memory manager (such as paging).
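To give a feel for the malloc/free lab, here is a minimal sketch of a first-fit allocator over a fixed static arena. This is an assumption about what such a lab might look like, not the actual assignment; the names my_malloc and my_free and the 4096-byte arena are hypothetical, and the code is not thread-safe.

```c
#include <stddef.h>

#define ARENA_SIZE 4096
#define ALIGNMENT  _Alignof(max_align_t)

/* Each block starts with a header; blocks form a list in address order. */
struct header {
    size_t         size;   /* payload bytes following the header */
    int            free;
    struct header *next;
};

/* Header size rounded up so that payloads stay aligned. */
#define HDR ((sizeof(struct header) + ALIGNMENT - 1) & ~(ALIGNMENT - 1))

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static struct header *head = NULL;

void *my_malloc(size_t n)
{
    if (n == 0 || n > ARENA_SIZE)
        return NULL;
    n = (n + ALIGNMENT - 1) & ~(ALIGNMENT - 1);

    if (head == NULL) {                 /* first call: one big free block */
        head = (struct header *)arena;
        head->size = ARENA_SIZE - HDR;
        head->free = 1;
        head->next = NULL;
    }

    for (struct header *b = head; b != NULL; b = b->next) {
        if (!b->free || b->size < n)
            continue;
        /* Split if the remainder can hold a header plus some payload. */
        if (b->size >= n + HDR + ALIGNMENT) {
            struct header *rest =
                (struct header *)((unsigned char *)b + HDR + n);
            rest->size = b->size - n - HDR;
            rest->free = 1;
            rest->next = b->next;
            b->size = n;
            b->next = rest;
        }
        b->free = 0;
        return (unsigned char *)b + HDR;
    }
    return NULL;                        /* no block large enough */
}

void my_free(void *p)
{
    if (p == NULL)
        return;
    struct header *b = (struct header *)((unsigned char *)p - HDR);
    b->free = 1;

    /* Merge runs of adjacent free blocks to limit fragmentation. */
    for (struct header *c = head; c != NULL; c = c->next) {
        while (c->free && c->next != NULL && c->next->free) {
            c->size += HDR + c->next->size;
            c->next = c->next->next;
        }
    }
}
```

First fit with forward coalescing is about the simplest policy that still reuses freed memory; a course lab would typically go on to compare it with best fit or segregated free lists.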

For labs, one thing that may be cool is to show them actual code and discuss it, asking why they think things are done that way and not another, etc.
If I were at university again, I would certainly appreciate more in-depth lessons about synchronization primitives, concurrency and so on... those are hard matters that are more difficult to approach without proper guidance. I remember I went to a talk by Paul "Rusty" Russell about spinlocks and other synchronization primitives that was absolutely rad; maybe you could find it on YouTube and borrow some ideas.

Another good topic (or possibly exercise for the students) would be looking at virtualisation. Especially Rusty Russell's "lguest", which is designed as a simple introduction to what is required to virtualise an operating system. The docs are good reading too.

I actually just took a class that perfectly fits your description (OS Design using linux) in the spring. I was actually very frustrated with it because I felt like the teacher focused too narrowly for the projects rather than give a broader understanding. For instance, our last project revolved around futexes. My partner and I barely learned what they were, got it working (kinda) and then turned it in. I came away with no general knowledge of anything really from that project. I wish one of the projects had been to write a simple device driver or something like that.
In other words, I think it's good to make sure a good broad overview is presented, with as much detail as you can afford, but ultimately broad. I felt like my teacher nitpicked these tiny areas and made us intensely focus on those, while in the end I did NOT come away with that great of a general understanding of the inner-workings of Linux.
Another thing I'd like to note is a lot of why I didn't retain knowledge from the class was lack of organization. Topics came out of nowhere any given week, and there was no roadmap. Give the material a logical flow. Mental organization is the key to retaining the knowledge.

The networking sub-system is also quite interesting. You could follow a packet as it goes from the socket system call to the wire and the other way around.
Fun assignments could be:
create a stateful firewall by using netfilter
create an HTTP load balancer
design and implement a simple tunneling protocol

Memory-mapped I/O and the 1G/3G vs 2G/2G split between kernel address space and user-addressable space in 32-bit operating systems.
Limitations of 32-bit architecture on hard drive size and what this means for the design of file systems.
Actually, just all the pros and cons of going to 64-bit, what it means and why, as well as the history and why we aren't there yet.