I will be TA for an operating systems class this upcoming semester. The labs will deal specifically with the Linux Kernel.
What concepts/components of the Linux kernel do you think are the most important to cover in the class?
What do you wish was covered in your studies that was left out?
Any suggestions regarding the Linux kernel or overall operating systems design would be much appreciated.
My list:
What an operating system's concerns are: Abstraction and extension of the physical machine and resource management.
How the build process works ie, how architecture specific/machine code stuff is implanted
How system calls work and how modules can link up
Memory management / Virtual Memory / Paging and all the rest
How processes are born, live and die in POSIX and other systems
userspace vs kernel threads and what the difference is between process/threads
Why the monolithic Kernel design is growing tiresome and what are the alternatives
Scheduling (and some of the alternative / domain specific schedulers)
I/O, Driver development and how they are dynamically loaded
The early stages of booting and what the kernel does to setup the environment
Problems with clocks, mmu-less systems etc
... I could go on ...
I almost forgot IPC and Unix 'eveything is a file' design decisions
POSIX, why it exists, why it shouldn't
In the end just get them to go through tanenbaum's modern operating systems and also do case studies on some other kernels like Mach/Hurd's microkernel setup and maybe some distributed and exokernel stuff.
Give a broad view past Linux too, I recon
For those who are super geeky, the history of operating systems and why they are the way they are.
The Virtual File System layer is an absolute must for any Linux Operating System class.
I took a similar class in college. The most frustrating but, at the same time, helpful project was writing a small file system for the Linux operating system. Getting this to work takes ~2-3 weeks for a group of 4 people and really teaches you the ins and outs of the Kernel.
I recently took an operating systems class, and I found the projects to be challenging, but essential in understanding the concepts in class. The projects were also fun, in that they involved us actually working with the Linux source code (version 2.6.12, or thereabouts).
Here's a list of some pretty good projects/concepts that I think should be covered in any operating systems class:
The difference between user space and kernel space
Process management (i.e. fork(), exec(), etc.)
Write a small shell that demonstrates knowledge of fork() and exec()
How system calls work, i.e. how do we switch from user to kernel mode
Add a simple system call to the Linux kernel, write a test application that calls the system call to demonstrate it works.
Synchronization in and out of the kernel
Implement synchronization primitives in user space
Understand how synchronization primitives work in kernel space
Understand how synchronization primitives differ between single-CPU architectures and SMP
Add a simple system call to the Linux kernel that demonstrates knowledge of how to use synchronization primitives in the Linux kernel (i.e. something that has to acquire, say, the tasklist lock, etc. but also make it something where you have to kmalloc, which can't be done while holding a lock (unless you GFP_ATOMIC, but you shouldn't, really))
Scheduling algorithms, and how scheduling takes place in the Linux kernel
Modify the Linux task scheduler by adding your own scheduling policy
What is paging? How does it work? Why do we have paging? How does it work in the Linux kernel?
Add a system call to the Linux kernel which, given an address, will tell you if that address is present or if it's been swapped out (or some other assignment involving paging).
File systems - what are they? Why do they exist? How do they work in the Linux kernel?
Disk scheduling algorithms - why do they exist? What are they?
Add a VFS to the Linux kernel
Well, I just finished my OS course this semester so I thought I'd chime in.
I was kind of upset that we didn't actually play around with the actual OS itself, rather we just did system programming. I'd recommend having the labs be on something that is in the OS itself, which is what it sounds like what you want to do.
One lab that I did enjoy and found useful however was writing our own malloc/free routines. It was difficult, but pretty entertaining as well.
Maybe also cover loading programs into memory and/or setting up the memory manager (such as paging).
For labs, one thing that may be cool is to show them actual code and discuss about it, ask questions about what do they think things are done that way and not another, etc.
If I were again in the University I would certainly appreciate more in depth lessons about synchronization primitives, concurrency and so on... those are hard matters that are more difficult to approach without proper guidance. I remember I went to a speech by Paul "Rusty" Russell about spinlocks and other synchronization primitives that was absolutely rad, maybe you could find it in youtube and borrow some ideas.
Another good topic (or possibly exercise for the students) would be looking at virtualisation. Especially Rusty Russel's "lguest" which is designed as a simple introduction to what is required to virtualise an operating system. The docs are good reading too.
I actually just took a class that perfectly fits your description (OS Design using linux) in the spring. I was actually very frustrated with it because I felt like the teacher focused too narrowly for the projects rather than give a broader understanding. For instance, our last project revolved around futexes. My partner and I barely learned what they were, got it working (kinda) and then turned it in. I came away with no general knowledge of anything really from that project. I wish one of the projects had been to write a simple device driver or something like that.
In other words, I think it's good to make sure a good broad overview is presented, with as much detail as you can afford, but ultimately broad. I felt like my teacher nitpicked these tiny areas and made us intensely focus on those, while in the end I did NOT come away with that great of a general understanding of the inner-workings of Linux.
Another thing I'd like to note is a lot of why I didn't retain knowledge from the class was lack of organization. Topics came out of nowhere any given week, and there was no roadmap. Give the material a logical flow. Mental organization is the key to retaining the knowledge.
The networking sub-system is also quite interesting. You could follow a packet as it goes from the socket system call to the wire and the other way around.
Fun assignments could be:
create a state-full firewall by using netfilter
create an HTTP load balancer
design and implement a simple tunneling protocol
Memory mapped I/O and the 1g/3g vs 2g/2g split between kernel address space and user addressable space in 32bit operating systems.
Limitations of 32 bit architecture on hard drive size and what this means for the design of file systems.
Actually just all the pros and cons of going to 64 bit, what it means and why as well as the history and why are aren't there yet.
Related
I have been taking online courses on Operating systems, and I heard them say, that XV6 operating system can be used learn implementation of operating systems, thats all.But after I searched on the internet there aren't enough resources, which would get me started with it.
My question is, why should I use it,and how will it help me in understanding operating system.
(Please be gentle with information you throw at me, I am a newbie :( )
Any effort is appreciated
There are only 2 possibilities:
too complex to be useful for teaching
too simple to be useful for teaching
Things like fancy features/enhanced functionality, mitigating security issues, dealing with hardware bugs/errata, performance, scalability and supporting a very wide range of different hardware all increase the complexity of the code; and if you look at a real commercial OS (e.g. Linux maybe) that has to care about all of these things it's hard to learn about one thing (e.g. memory management) without all the complexity getting in the way and making it significantly harder to learn.
If you have a simple OS that does none of those things (no fancy features, no mitigation of security issues, ...) then it's much much easier to learn basic principles from it; but it also becomes impossible to use it to learn about fancy features, mitigating security issues, dealing with hardware bugs/errata, ...
The solution is to start with a simple OS (e.g. XV6) to learn the basics, then switch to a real OS later to learn everything else.
However; most OS courses at Universities are not intended to teach you about writing an OS. Instead they're intended to give you basic information about operating systems so that you can use that knowledge when writing application programs for existing operating systems. For that reason (and because there's time constraints) they only do the first part (with a simple OS like XV6) and then the course finishes.
1.XV6 is used for teaching in many universities.
2.It's also a tool OS for many program
I am trying to create a DOS-like OS. I have read multiple articles, read multiple books (I even paid a lifetime subscription for O'Reilly Media), but to no avail, I found nothing useful. I want to learn how to make operating system libraries, which rises the question which is: are libraries which you code for a program the same if you are compiling it for an operating system?
I know Operating Systems are very challenging to make and the very few programmers that do attempt to make one never produce a functioning one which is why it's described as "the great pinnacle of programming.". But still, I'm going to make an attempt at making one (just for fun, and maybe learn a few pointers on the way).
All I need to do this is basically learning how to make the libraries, C (which I already know and love), assembly (which I kind-of know how to use along with C) and a compiler (I use the GNU toolchain). The only thing I am having trouble with are coding the libraries. I'm like wow right now, who knew that coding libraries are so hard, like point me to a book or something! But no. I'm not asking for a book right here, all I'm asking for is some advice on how to do this like:
How do you start making some basic I/O libraries
Is it the same as making a regular C library
And finally, is it going to be hard? (JK I know already that this is going to be extremely hard which is why I prepared so much)
In summary, the main question is, how I can make this work or is there a pre-built library that would most likely speed up the process?
Are libraries which you code for a program the same if you are compiling it for an operating system?
Absolutely not. A user-space C library at its lowest level makes system calls to an operating system to interact with hardware via device drivers; it is the device driver and interaction with hardware you will be writing.
From my experience doing embedded system bringups, the way you start is with a development board with a legacy RS-232 port. It's about the easiest possible device to write a driver for - you write bytes to a memory mapped IO address, wait a bit then write some more. This is where your first debug output goes too.
You might find yourself waggling IO pins and probing them with a logic analyser or DSO on the route to this though - hence why you want a development board where the signals are accessible.
None of the standard C-library will be available to you - so you'll need to equivalents of some of things it provides - but in kernel space - including type definitions, memory management, and intrinsics the compiler expects - particularly those for memory barriers. The C-library doesn't provide any data structures or algorithms anyway, but you'll definitely be wanting to write some early on.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
Does anyone know where I might find sample solutions written in C for low level / systems level applications? A really good website or book recommendation would be cool too.
I've learned some of the basics, but would like to see some code within the context of a real solution written in C, and specifically for a lower-level problem. Id' be interested in how C is used within the context of OS programming, for example. What are some areas where C is used for lower-level programming?
Thanks.
I would suggest you to study MINIX3 from Tanenbaum: http://www.minix3.org/
Its a microkernel architecture, and with his book ( http://vig.prenhall.com/catalog/academic/product/0,1144,0131429388,00.html ) it is really enlightning.
As of my opinion, studying the linux kernel is a bit hardcore for a start ;), and out of a academical point of view the microkernel architecture is superior to the monolithic kernel.
Furthermore, with only a few thousands lines of code, unlike the Linux Kernel, its consumable in a realistic timetable.
And its a real serious project, the European Union sponsored some Millions towards it as far as i am aware of. I think i remind him saying that in one of his talks.
And you have a X-Server running there, a gcc-toolchain etcpp.
Have fun :)
EDIT: As i read the comments, someone mentions the Ruby interpreter. Its written in a mixture of C and Ruby, and as far as it was mentioned in one episode of se-radio.net, it is really nice sourcecode. Though i have to admit, i havent looked into it myself. Might be worth the dig into it if you have some interest in Ruby too.
I'd suggest looking at some (for you) interesting open source projects written in C. For example, there's busybox, a piece of software that runs on embedded devices and has lots of smaller programs to study. You could, for example, take the source for the telnet client on one side and the corresponding RFC on the other. Or, for a steeper learning curve, you could also try studying the open source OSes, like the Linux kernel (here's the tree for browsing) or the BSDs. It's a lot more involved than busybox, but you can still find some parts that are fairly easy to understand if you're familiar with the context.
Studying the Linux kernel, maybe in conjunction with one of the several books on the kernel or device drivers would provide a wealth of material. Much of this is available free.
any or all of the books by W. Richard Stevens that walk though the implementation (TCP/IP Illustrated) or use (UNIX Network Programming) of the networking stack or his Advanced Programming in the UNIX Environment book.
If you have a leaning toward Windows there are several good books, even if they're quite old, including:
Programming Server-Side Applications for Microsoft Windows 2000 by Richter and Clark
Programming Applications for Microsoft Windows by Richter
I would suggest the following sources might be interesting r.e. Operating Systems from a learning perspective. Be aware there have been many advancements actually present in modern kernels:
The original linux code.
xv6. This is a simple unix OS that goes along with MIT's excellent OpenCourseWare course on Operating Systems.
Other ideas:
The current grub stage 1 bootloader isn't that complicated - it's pretty hard to be complicated with 512 bytes to play with.
The Linux kernel module guide gives you an introduction to building kernel modules. You could experiment with building custom, yet pointless, drivers that add say character devices to /dev/ or proc devices to /proc and work towards implementing something interesting. People have implemented web servers in kernel space...
If you want to experiment with Windows kernels, have a go with Native NT applications. I'd start with printing a pointless boot message, then move up to drivers.
Beyond that, it's hard to suggest where you might want to go. Systems level is a wide space.
In the context of low level programming, C and C++ are portable assembler. In many of the above spaces the standard library is either partially or totally missing and extra functionality may be implemented by existing parts of the system-level code you're modifying, so you have to be aware of the API functions available to you in any given space and what you need to implement yourself, as well as what your memory and processing requirements must be. For example, a bootloader written to the MBR has to use bios interrupts and starts in real (16-bit) mode. Those are the constraints of the hardware design. Likewise, functions like fopen() aren't available in kernel space since they wrap system calls - you'd need to use kernel specific constructs to achieve this if it really made sense to write a file from kernel space.
I'm looking to implement 'branch and bound' over a cluster (like Amazon's say), as I want it to be horizontally scalable, not limited to a single CPU. There's a paper "Task Pool Teams: A Hybrid Programming Environment for Irregular Algorithms on SMP Clusters" by Judith Hippold and Gudula Runger. It's basically a bottom-up, task-stealing framework like Intel's TBB, except for ad-hoc networks instead of shared memory. If this library was available I'd use it (replacing the local, threaded part with TBB). Unfortunately they don't seem to have made it available for download anywhere that I could find, so I wonder are there other implementations, or similar libraries out there?
It doesn't look like Microsoft's Task Parallel Library has the equivalent, either, to steal from.
(I tried to make a tag 'taskpool' after 'threadpool', the most-used variant (before 'thread-pool') but, didn't have enough points. Anyone heavy enough think it's worth adding?)
edit:
I haven't tried it yet, but it PEBBL (under here: software.sandia.gov/trac/acro/wiki/Packages) claims to scale really high. The paper that the answerer mentions from the Wiley book 'Parallel Branch-and-Bound Algorithms', Crainic, Le Cun and Roucairol, 2006, from "Parallel Combinatorial Optimization", 2006 edited by El-Ghazali Talbi was where I found it, and there are other libraries listed; some may be better, I reserve the right to update this :). Funny that Google didn't find these libs, either my Googling was weak or Google itself fails to be magic sometimes.
When you say "over a cluster" it sounds like you mean distributed memory, and parallelizing branch and bound is a notoriously difficult problem for distributed memory - at least in a way that guarantees scalability. The seminal paper on this topic is available here, and there's an excerpt from a Wiley book on the topic here.
Shared memory branch is bound is an easier problem because you can implement a global task queue. A good high level description of how to do both shared memory and message passing implementations is available here. If nothing else, the references section is worth purusing for ideas and existing implementations.
One thing you might consider is investigating shared message queues like RabbitMQ. It is a AMQP server (a messaging protocol developed so distributed applications can send messages to each other).
you basically need some kind of distributed synchronization/queue
I suggest looking into armci as a low-level distributed memory interface with synchronization and build on top of that.
Alternative is to allocate mpi process as Master to distribute work allocation.
http://www.cs.utk.edu/~dongarra/ccgsc2008/talks/Talk10-Lusk.pdf
Once upon a time, a team of guys sat down and wrote an application in C, running on VMS on a VAX. It was a rather important undertaking and runs a reasonably important back-end operation at LargeCo. This whole shebang works so well that twenty-five years later it's still chugging along and doing it's thing.
Time passes and people retire and it so happens that the Last Man Standing has turned over the keys to a new generation who - we might imagine - are less than thrilled to find themselves caretakers of a system old enough to be their younger brother. Yet, as underwhelmed as they are by the idea of dealing with Ultra Legacy Systems, they can't justify the cost of replacing the venerable application.
LMS discovered that I habla unix and put this question to me. And since I habla unix but don't speak the C I shall summarize and put it to you. Long Story Short:
LMS wants to port LegacyApp, written in C. from VMS to unix. Resources? Any books he can read? People he can talk to?
The first question I'd need to ask is why, and I'd be leading the conversation in the direction of "Do you really need to port it off of VMS". There are a number of things worth mentioning about VMS:
-> VMS is still actively developed and maintained by HP. They just release V8.4 for Field Test last week (see http://h71000.www7.hp.com/openvmsft/).
-> VMS is available on new hardware; specifically HP's Integrity servers based on the Itanium processor.
-> VMS is also available on virtual platforms via the Charon Emulation products.
-> Popular estimates are that there are about 300,000 VMS systems still in active use today. LMS may be the last man at LargeCo, but he's far from the last man standing worldwide.
-> Lots of information out there, see openvms.org for example, to see lots of current information on VMS, all from current users.
OK - you still want to port off of VMS. How do you do it? Well, it depends on lots of stuff.
-> As others have said, how standard is the code? Chances are, not very. The more VMS-isms, the more difficult the job. 'nuff said.
-> What is the database? If it's Oracle, probably not too tough to move to Oracle on some other platform. If it's some sort of custom DB based on RMS index files, then you've got more work to do, you'll need to re-create that pseudo DB, or, understand it enough to replace it with some relational DB.
-> Besides C, what else is used to create the application? What's on the front end? DECforms? FMS? Is there a transaction engine, e.g. ACMS? RTR? These things will have a huge impact on the feasibility and effort required to port to UNIX.
-> What other products are involved? Are there any 3rd party libraries being used? Are there 3rd party products in use that are critical to the application or functionality?
-> Is this system clustered? If so why? You'll need to meet those same goals with the UNIX box.
-> There are companies out there that will help you do it, and claim to have tools to make it easier, but my experience is that these companies tend to be selling you more services than products (i.e. you need to hire them to use the tools. It'll be expensive).
The book UNIX for OpenVMS Users will give the VMS novice some help in understanding VMS, but, as the title says, the book is really intended for the opposite purpose.
Everything written on VMS uses lots of VMS specific stuff it was just so convenient.
There are a few companies that sell compatibility libs to make the port easier - they wont be cheap though, VMS tended to be used where reliability mattered more than cost.
The other option is to run openVMS on some modern hardware, possibly in a VM.
I am sure Brian has made his decision by now, but for my sins of working for many years in DEC OpenVMS language support (yes, some people had this dubious honour) the real question I would have asked a customer such as Brian is: is it a real-time application or not? If it is the former, then it would be heavily dependent on many VMS system services which would rule out a 'port' and indicate a re-write. If it were the latter then the frequency of VMS system services should (possibly) be limited and make a port viable.
The acid test for me, would be to SEARCH *.c "SYS$", "LIB$" i.e. to search all of the C source files for "SYS$" and "LIB$" tags which prefix VMS system services. If the count for these are in the 10s then a port is probably likely, between 10 and 100 makes it possibly likely, but over a 100 makes a successful port highly unlikely.
Hope this helps
You have several choices.
Get the OpenVMS source, and continue to maintain Open VMS as if it were a Linux distribution. Some folks don't mind keeping up with Linux distributions and OpenVMS distributions. It can be done.
Try to recompile the VMS C into Linux. This can be trivial if the C used only standard libraries. This can be very, very difficult if the C used a lot of VMS libraries.
Once you have facts at your fingertips, you can reevaluate this course of action. Since you didn't list a bunch of VMS library methods this program uses, it's impossible to tell how entangled it is with the OS.
This may be trivial or impossible. It's difficult to tell without analysis of the source.
Write bridge libraries from VMS to Linux. If your program only does a few VMS things, this isn't very difficult. If your program does extensive VMS things, this is craziness.
The bridge -- in the long run -- is a terrible idea. Managers love it, however.
An alternative is to replace the VMS library calls with proper, portable Linux calls rather than write bridges. This is better in the long run, because it excises the non-portable features of the program.
Rewrite it from scratch in Python. That is usually simpler than trying to port the C code. It will be shorter, cleaner, simpler, and portable.
If you're willing to keep running VMS in a VM, you can look into CHARON-VAX ( http://www.charon-vax.com/ ). As previously mentioned, the ease of porting really depends a lot on how much of the VMS extensions were used; searching the source code for $ characters embedded in strings (usually with a 3-character leading substring, such as lib$gettime or dsc$descriptor or sys$foobar etc) will give you at least a basic idea of what VMS system functions are called and how likely they are to be portable, if the name is reasonably obvious.
If it ain't broke, don't fix it! Why port it or migrate the app if you don't have to? Why not run it on a current install of OpenVMS running on an HP Itanium server; that is assuming you wish to upgrade the hardware, which may not even be necessary if your VAX hardware is still running strong.
To learn C, you might as well drag it from the horse's mouth: "The C Programming Language" by its inventors, Kernighan and Ritchie.
I can recommend "The UNIX programming environment" by (again) Brian Kernighan; a more authoritative source you'll hardly find, and it teaches you both Unix/C idioms and a bit of C programming at the same time.
For more depth and detail on C, I heartily enjoyed a book by Peter van der Linden: "Expert C Programming - Deep C Secrets".
You'll also want to wrestle LMS for a library documentation of VMS-specific C functions with (of course) special emphasis on those actually used in the app. That's where your porting effort will be.
The job could be easy or difficult, depending on how much machine-specific cleverness and bit-twiddling is done, and how many VMS-specific system calls are used. It would be very good if word size was equal (in other words, if your VMS box has a word size of 32 bits, don't run the code on a 64 bit version of Unix!)
Brian, I'm not sure if LMS specified/cared to port C-code or the WHOLE process. As too often people think of languages out of scope of systems.
If there're was a process built on VMS, most likely it used at least scheduling/batch facilities, which are often scripted in DCL (rather simple and clear language, unlike shell or perl scripting).
So the cost of porting the whole process may be higher than originally perceived by your LMS. Add here the reliability aspect, given your crunches with C, which is nothing impossible, of course, with enthusiasm and determination.
If you want simply give the C-code a try, as previously posted, search it for the "$" hits. Or just cc it with all headers present, the basics of compile-link command should be enough.
Alternatively, this looks like a consultant's call, as indeed such jobs were abundant at the "exodus" time. All said VMS remains quite a robust platform (24x7 is a norm!), unless the harware dies, then there're still tons of "exodus" spares. GOOD LUCK!
About a year and a half later, maybe you've already figured out what to do. My organization has recently decided to stick with OpenVMS instead of switching to Linux even though the old guard recently left. We just couldn't argue with what we felt was a very stable and reliable system. We are currently switching from Alpha servers to Integrity servers for end of life reasons. HP has been very helpful with our transition.
For that matter, there may be Linux vendors out there who can help with the transition. Ask your new hardware vendor if they have any recommendations.
Depending on what languages you already know, C is not that hard to learn. I taught myself C in the course of learning C++ after finally prying myself loose from Pascal.
(VAX Pascal, plus Rdb/VMS, plus DCL formed a combination that was hard to beat.)
If the software is typical C, you'll spend more time learning the library functions than learning the language.
It's pretty lightweight stuff, but I went through the online tutorials for C++ that Microsoft makes available in conjunction with the express edition of Visual Studio for C++.
Here's the beginner's tutorial:
http://msdn.microsoft.com/en-us/beginner/cc305129.aspx
It's probably worth making the effort to ask why LMS wants to port the application to Unix. The answer may seem obvious, but properly exploring the reasons has its benefits. I would assume:
OpenVMS is an "ultra legacy platform", and for that reason alone is something that is not worth running an application on anymore;
It's tough to find anyone who is willing to maintain an application that runs on OpenVMS these days;
The hardware on-which OpenVMS runs is threatening to become moribund.
We have a similar challenge, but in our case the application in question not only runs on OpenVMS but is also written in COBOL. I would have to say that your situation is rosy in comparison given that your application is written in a cross-platform language.
In any case, I think if you're about to make a big decision like moving from OpenVMS to Unix it would be prudent to do a little due diligence. In your case, try to assess just how portable the code is--only then will you know what the scale of the effort is (worst case could quite easily be a multiple of best case). In C, code portability is mostly a function of the dependencies--are they "standard" or are they VMS-specific?
Our enquiries revealed that HP would be supporting OpenVMS on Itanium until at least 2022. There isn't necessarily a need to rush to another platform--perhaps you could keep things on OpenVMS whilst embarking on an effort to prepare the application for porting (make it less dependent on OpenVMS specifics).
VMS has a surprisingly healthy community and if it's the lack of Unix that's the issue, then maybe GNV could help bridge the gap?
Well u have a few options. if this code needs to be ported rather quickly, i would write a bridge library to emulate the vms libs. whener you get it back up and running on a *nix, then go through replacing the vms library calls with native/portable calls for *nix.
Also if there is a lot of optimizations in the code ie inline assembly and bit twiddling. then you will have to rewrite thi code, which will take an understanding of the VAX arch. also. be sure to check word size differences and endian differences