Operating systems vs. database management systems - database

Context: I'm a student who just finished an operating systems course and is currently taking a databases course.
I'm confused about how the OS and the DBMS interact with one another.
For example, what happens when a user program tries to access a file? Does a system call get invoked that is then handled by the OS to find the correct file and data? Or is the call handled by the DBMS, which can then more efficiently find the data (tuple/record) using a B+ tree for example? And then the DBMS makes a call to the OS to actually get the data?
Is the database only accessed if using a programming language like SQL? If I just write a simple C program that writes a file to disk, is the data really stored in a "database" or just in some block on disk where the information for the file is stored within the inode for that file?
I apologize if this isn't the correct forum to ask this question and also if this question is too simple. I tried looking online, but surprisingly didn't find much info (maybe I was searching for the wrong keywords?)

For example, what happens when a user program tries to access a file?
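That depends on how the user program is accessing the file. If it calls fopen/read/write on an ordinary file, those requests go to the filesystem managed by the OS (fopen is a C library wrapper that ends up in the open system call); no DBMS is involved.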
Does a system call get invoked that is then handled by the OS to find the correct file and data?
If a database is used, the database manages its own set of data files and the indexes into those data files. The database engine issues I/O requests (system calls such as read and write) that are handled by the underlying OS. Additionally, the database will most likely cache data in memory to reduce file I/O.
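To make that division of labor concrete, here is a minimal sketch of the caching layer, assuming a data file split into fixed 4096-byte pages and a tiny direct-mapped cache. The names (buffer_pool, pool_get_page) are hypothetical; real engines use much larger buffer pools with smarter eviction such as LRU or clock. The point is that only a cache miss turns into a system call handled by the OS.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DB_PAGE_SIZE 4096  /* assumed page size; real engines make this configurable */
#define CACHE_SLOTS 8      /* tiny direct-mapped cache, for illustration only */

struct cache_slot {
    long page_no;              /* -1 marks an empty slot */
    char data[DB_PAGE_SIZE];
};

struct buffer_pool {
    int fd;                    /* data file owned by the database engine */
    struct cache_slot slots[CACHE_SLOTS];
    long hits, misses;         /* misses are the requests that reach the OS */
};

void pool_init(struct buffer_pool *bp, int fd) {
    bp->fd = fd;
    bp->hits = bp->misses = 0;
    for (int i = 0; i < CACHE_SLOTS; i++)
        bp->slots[i].page_no = -1;
}

/* Return the cached copy of page_no (which must be >= 0). Only on a miss
 * does the engine ask the OS for data, via the pread() system call. */
const char *pool_get_page(struct buffer_pool *bp, long page_no) {
    struct cache_slot *s = &bp->slots[page_no % CACHE_SLOTS];
    if (s->page_no == page_no) {
        bp->hits++;
        return s->data;
    }
    bp->misses++;
    s->page_no = -1;                       /* invalidate before overwriting */
    memset(s->data, 0, DB_PAGE_SIZE);      /* bytes past end of file read as zeros */
    if (pread(bp->fd, s->data, DB_PAGE_SIZE, (off_t)page_no * DB_PAGE_SIZE) < 0)
        return NULL;
    s->page_no = page_no;
    return s->data;
}
```

Reading page 0, then page 1, then page 0 again costs two pread() calls and one cache hit.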
Or is the call handled by the DBMS, which can then more efficiently find the data (tuple/record) using a B+ tree for example? And then the DBMS makes a call to the OS to actually get the data?
Depending on the database query, the data can be read sequentially, or accessed randomly via an index lookup.
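A rough illustration of the two access paths, with both the table and the index kept in memory for brevity. The sorted array searched with binary search is a stand-in for a B+ tree (a real B+ tree stores its keys in disk-page-sized nodes); all names here are hypothetical. The visited counters show why an index lookup touches far fewer entries than a sequential scan.

```c
#include <stddef.h>
#include <stdlib.h>

struct record { int key; char name[16]; };
struct index_entry { int key; size_t pos; };   /* key -> position in the table */

/* Sequential scan: examine records one by one until a match (O(n)). */
long seq_scan(const struct record *table, size_t n, int key, size_t *visited) {
    for (size_t i = 0; i < n; i++) {
        (*visited)++;
        if (table[i].key == key) return (long)i;
    }
    return -1;
}

static int cmp_entry(const void *a, const void *b) {
    int ka = ((const struct index_entry *)a)->key;
    int kb = ((const struct index_entry *)b)->key;
    return (ka > kb) - (ka < kb);
}

/* Build the "index": (key, position) pairs sorted by key. */
void build_index(const struct record *table, size_t n, struct index_entry *idx) {
    for (size_t i = 0; i < n; i++) {
        idx[i].key = table[i].key;
        idx[i].pos = i;
    }
    qsort(idx, n, sizeof idx[0], cmp_entry);
}

/* Index lookup: binary search over the sorted entries (O(log n)). */
long index_lookup(const struct index_entry *idx, size_t n, int key, size_t *visited) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        (*visited)++;
        if (idx[mid].key < key) lo = mid + 1;
        else hi = mid;
    }
    if (lo < n && idx[lo].key == key) return (long)idx[lo].pos;
    return -1;
}
```

With 1000 records the binary search needs at most 10 probes, while a scan may need all 1000; a query planner chooses between the two based on how many rows the query is expected to touch.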
Is the database only accessed if using a programming language like SQL? If I just write a simple C program that writes a file to disk, is the data really stored in a "database" or just in some block on disk where the information for the file is stored within the inode for that file?
It's stored in the filesystem, which is likely to use a block-based design. To store data in a database, a simple C program should use the database's client library (SDK) to connect to it and then execute SQL statements.
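For the plain C case, a minimal sketch: the program writes bytes with stdio, and the only bookkeeping involved is the filesystem's own, which stat() exposes as the inode number and file size. The function names and the path used below are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Write text with plain stdio. No DBMS is involved: the bytes end up in
 * ordinary filesystem blocks that the file's inode points to. */
int save_note(const char *path, const char *text) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    size_t len = strlen(text);
    int ok = fwrite(text, 1, len, f) == len;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Ask the OS which inode holds the file and how large the file is. */
int describe_file(const char *path, unsigned long *inode, long *size) {
    struct stat st;
    if (stat(path, &st) != 0) return -1;
    *inode = (unsigned long)st.st_ino;
    *size = (long)st.st_size;
    return 0;
}
```

Calling save_note("/tmp/note_example.txt", "hello, filesystem\n") and then describe_file() reports an 18-byte file and whatever inode number the filesystem assigned; there is no table, schema or query involved.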
Hope this helps!

The answer really depends upon the operating system and the database system. Industrial-strength operating systems have support for multiple file structures and record-level locking. A database system can make use of the facilities provided by the operating system for locking and indices. In that case, much of the database works by invoking system service calls to the operating system.
With the rise of brain-damaged operating systems that do not support records at all, let alone record or column locking, the database system does nearly all of the work beyond low-level I/O calls. Some operating systems are so brain-damaged that database systems have to create their own partitions and effectively manage their own file systems within the partition. There are no files at all from the perspective of the operating system.

Related

Is there a way to prevent a file from being completely loaded by a program?

Is there a way to prevent a hard drive from reading a certain file? For example: Program A is told to open a .txt file, while Program B hammers the same .txt file by opening it hundreds of times a second, so Program A is unable to open it.
So I'm trying to stress test a game engine that extracts all used textures from a single file at once. I think this extraction method is causing some core problems for the engine's overall development experience. My theory is that the problem is caused by the slow read times of some hard drives, but I'm not sure I'm right, and I need a way to test this.
Most operating systems support file locking and file sharing so that you can establish rules for processes that share access to a file.
.NET, for example (which runs on Windows, Linux, and MacOS), provides the facility to open a file in a variety of sharing modes.
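A rough POSIX analogue of those sharing modes is flock(). This minimal sketch (the function name is hypothetical) opens a file and takes an exclusive lock without blocking, so a second opener is refused. Note that flock() locks are advisory: they only stop programs that also call flock(), unlike Windows share modes, which the OS enforces on every open.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open (creating if needed) and take an exclusive lock without waiting.
 * Returns the fd, or -1 with errno EWOULDBLOCK if someone else holds it. */
int open_exclusive(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int saved = errno;   /* keep the lock error, not close()'s */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;               /* closing this fd releases the lock */
}
```

Even within one process a second open_exclusive() on the same path fails, because flock() locks belong to the open file description rather than to the process.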
For very rapid access like you describe, you may want to consider a memory-mapped file. They are supported on many operating systems and via various programming languages. .NET also provides support.
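Since the engine in question reads all textures from one file, mapping that file is one option to experiment with. A minimal POSIX sketch (struct and function names are hypothetical; in .NET the corresponding class is System.IO.MemoryMappedFiles.MemoryMappedFile): the kernel loads pages lazily as they are touched instead of copying the whole file up front.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct mapped_file {
    const unsigned char *data;
    size_t size;
};

/* Map a whole (non-empty) file read-only. Returns 0 on success. */
int map_file(const char *path, struct mapped_file *mf) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    /* mmap() rejects zero-length mappings, so empty files are refused. */
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;
    mf->data = p;
    mf->size = (size_t)st.st_size;
    return 0;
}

void unmap_file(struct mapped_file *mf) {
    munmap((void *)mf->data, mf->size);
    mf->data = NULL;
    mf->size = 0;
}
```

After map_file() succeeds, mf.data[i] can be read like an ordinary array; the first touch of each page may block on disk I/O, which is exactly the cost the stress test is trying to expose.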

Securely remove file from ext3 linux

This question has been asked with varying degrees of success in the past...
Are there tools, or C/C++ unix functions to call that would enable me to retrieve the location on disk of a file? Not some virtual address of the file, but the disk/sector/block the file resides in?
The goal here is to enable overwriting of the actual bits that exist on disk. I would probably need a way to bypass the kernel's superimposition of addresses. I am willing to consider an x86 asm based solution...
However, I feel there are tools that do this quite well already.
Thanks for any input on this.
Removing files securely is only possible under very specific circumstances:
There are no uncontrolled layers of indirection between the OS and the actual storage medium.
On modern systems that can no longer be assumed. SSD drives with firmware wear-leveling code do not work like this; they may move or copy data at will with no logging or possibility of outside control. Even magnetic disk drives will routinely leave existing data in sectors that have been remapped after a failure. Hybrid drives do both...
The ATA specification does support a SECURE ERASE command which erases a whole drive, but I do not know how thorough the existing implementations are.
The filesystem driver has a stable and unique mapping of files to physical blocks at all times.
I believe that ext2fs does have this feature. I also think that ext3fs and ext4fs work like this in the default journaling mode, but not when mounted with the data=journal option, which allows file data to be stored in the journal rather than just metadata.
On the other hand, reiserfs definitely works differently, since it stores small amounts of data along with the metadata, unless mounted with the notail option.
If these two conditions are met, then a program such as shred may be able to securely remove the content of a file by overwriting its content multiple times.
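To show what "overwriting its content" means at the system-call level, here is a simplified sketch of the idea behind shred, not a replacement for it: it rewrites every byte of the file in place with a few fixed patterns (an arbitrary choice made here; shred uses its own mix of patterns and random data) and calls fsync() after each pass. The function name is hypothetical, and it is only meaningful when both conditions above hold.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite an existing file in place `passes` times without truncating it,
 * forcing each pass out with fsync(). This only reaches the blocks the
 * filesystem currently maps to the file; remapped or copied blocks survive. */
int overwrite_file(const char *path, int passes) {
    static const unsigned char patterns[] = { 0x00, 0xFF, 0x55, 0xAA };
    unsigned char buf[4096];
    int fd = open(path, O_WRONLY);   /* no O_TRUNC: keep the same blocks */
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    for (int p = 0; p < passes; p++) {
        memset(buf, patterns[p % 4], sizeof buf);
        off_t off = 0;
        while (off < st.st_size) {
            size_t chunk = sizeof buf;
            if ((off_t)chunk > st.st_size - off) chunk = (size_t)(st.st_size - off);
            ssize_t n = pwrite(fd, buf, chunk, off);
            if (n <= 0) { close(fd); return -1; }
            off += n;
        }
        if (fsync(fd) != 0) { close(fd); return -1; }
    }
    return close(fd);
}
```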
This method still does not take into account:
Backups
Virtualized storage
Leftover data in the swap space
...
Bottom line:
You can no longer assume that secure deletion is possible. Better assume that it is impossible and use encryption; you should probably be using it anyway if you are handling sensitive data.
There is a reason that protocols regarding sensitive data mandate the physical destruction of the storage medium. There are companies that actually demagnetize their hard disk drives and then shred them before incinerating the remains...

Possible to build support for a filesystem directly into an application?

I am wondering if it's possible to write an application that will access a foreign filesystem, but without needing support for that filesystem from the operating system. For example, I'd like to write an app in C that runs on Mac OS X that can browse / copy files from an ext2/ext3 formatted disk. Of course, you'd have to do all the transfers through the application (not through the system using cp or the Finder), but that would be OK for my purpose. Is this possible?
There are user space libraries that allow you to access file systems.
The Linux-NTFS library (libntfs) allows you to access NTFS file systems and there are user space programs like ntfsfix to do things to the file system.
E2fsprogs does the same for ext2, ext3 and ext4 filesystems.
As Basile mentioned, Mtools is another one that provides access to FAT partitions.
There was even a program that does exactly what you're looking for on Windows. It's called ext2explore and allows you to access ext2 partitions from Windows.
It is possible. For example, the GNU mtools utilities do that for MS-DOS FAT file systems (assuming a way to access the raw device or partition).
However, file systems inside the kernel are usually very well tested and optimized.
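As a taste of what such user-space code does, a minimal sketch that reads the ext2/ext3/ext4 superblock straight from a disk image or raw device, with no kernel filesystem support involved. It assumes the standard on-disk layout (superblock at byte offset 1024, little-endian fields, magic number 0xEF53 at offset 56, block size = 1024 << s_log_block_size). The struct and function names are hypothetical; e2fsprogs' libext2fs is the real library for this job.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define EXT2_SUPERBLOCK_OFFSET 1024
#define EXT2_MAGIC 0xEF53

struct ext2_info {
    uint32_t inodes_count;
    uint32_t blocks_count;
    uint32_t block_size;
};

/* On-disk fields are little-endian regardless of the host byte order. */
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint16_t le16(const unsigned char *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Read the superblock from an image file or raw device node.
 * Returns 0 if the magic number matches. */
int read_ext2_superblock(const char *path, struct ext2_info *out) {
    unsigned char sb[1024];
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = pread(fd, sb, sizeof sb, EXT2_SUPERBLOCK_OFFSET);
    close(fd);
    if (n != (ssize_t)sizeof sb || le16(sb + 56) != EXT2_MAGIC) return -1;
    out->inodes_count = le32(sb + 0);          /* s_inodes_count */
    out->blocks_count = le32(sb + 4);          /* s_blocks_count */
    out->block_size = 1024u << le32(sb + 24);  /* s_log_block_size */
    return 0;
}
```

Pointing it at an image file needs no privileges; pointing it at a raw device node such as /dev/diskN normally requires root.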
Yes and no. For an application run by a regular user it is usually not possible, because access to block devices is restricted to root. The needed block device would have to grant read/write permission to that user. At best this requires a client/server approach, where a service started on the machine is configured to grant permissions on a per-block-device basis.
A somewhat easier alternative would be to use the MacFUSE implementation.
Look here:
http://code.google.com/p/macfuse/
http://groups.google.com/group/macfuse?pli=1
The MacFUSE project seems to be no longer maintained, but it can give you a starting point for your project.
The quick and dirty approach is to run, as root: chmod 666 /dev/diskN
You can hijack syscalls and library calls from your application and then redirect reads/writes to anything like a KV store or a distributed DB layer (using the regular calls for the "virtual devices" that you do not support).
Then, the possibilities are boundless because you don't have to reach the physical/virtual devices when someone asks for them (resolving privilege issues).

Are there operating systems that aren't based off of or don't use a file/directory system?

It seems like there isn't anything inherent in an operating system that would necessarily require that sort of abstraction/metaphor.
If so, what are they? Are they still used anywhere? I'd be especially interested in knowing about examples that can be run/experimented with on a standard desktop computer.
Examples are Persistent Haskell, Squeak Smalltalk, and KeyKOS and its descendants.
It seems like there isn't anything inherent in an operating system
that would necessarily require that sort of abstraction/metaphor.
There isn't any necessity, it's completely bogus. In fact, forcing everything to be accessible via a human readable name is fundamentally flawed, and precludes security due to Zooko's triangle.
Examples of hierarchies similar to this also appear in DNS, URLs, programming language module systems (Python and Java are two good examples), torrents, and X.509 PKI.
One system that fixes some of the problems caused by DNS/URLs/X.509 PKI is Waterken's YURL.
All these systems exhibit ridiculous problems because the system is designed around some fancy hierarchy instead of for something that actually matters.
I've been planning to write some blog posts explaining why these types of systems are bad; I'll update with links to them when I get around to it.
I found this http://pages.stern.nyu.edu/~marriaga/papers/beyond-the-hfs.pdf but it's from 2003. Is something like that what you are looking for?
About 1995, I started to design an object-oriented operating system (SOOOS) that has no file system. Almost everything is an object that exists in virtual memory, which is mapped/paged directly to the disk (either local or networked, i.e. rudimentary cloud computing).
There is a lot of overhead in programs to read and write data in specific formats. Imagine never reading and writing files. In SOOOS there are no such things as files and directories. Autonomous objects, which would essentially replace files, can be organized to suit your needs rather than in a restrictive hierarchical file system. There are no low-level drive format structures (i.e. clusters) that add a level of abstraction and translation overhead. SOOOS data storage overhead is limited to page tables that can be quickly indexed, as with basic virtual memory paging.
Autonomous objects each have their own dynamic virtual memory space, which serves as the persistent data store. When active, they are given a task context, added to the active process task list, and then exist as processes. A lot of complexity is eliminated in my design: simply instantiate objects in a program and let the memory manager and virtual memory system handle everything consistently with minimal overhead.
Booting the operating system is simply a matter of loading the basic kernel, setting up the virtual memory page tables for the key OS objects, and (re)starting the OS object tasks. When the computer is turned off, shutdown is essentially analogous to hibernation, so the OS is nearly instant-on. The parts (pages) of data and code are loaded only as needed. For example, to edit a document, instead of starting a program by loading the entire executable into memory, simply load the task control structure of the autonomous object and set the instruction pointer to the function to be performed. The code is paged in only as the instruction pointer traverses its virtual memory. Data is always immediately ready to be used and is paged in only as accessed, with no need to parse files and manage data structures that often have a distinct representation in memory from the one in secondary storage. Simply use the program's native memory allocation mechanism and abstract data types, without disparate and/or redundant data structures.
Object Linking and Embedding type program interaction, memory-mapped I/O, and interprocess communication come practically for free, since one would implement memory sharing using the facilities of the processor's Memory Management Unit.
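On a conventional OS, the closest everyday approximation of this idea is a file-backed shared memory mapping: a plain C struct whose storage the kernel pages to and from disk, with no parsing or serialization step. A minimal POSIX sketch, with hypothetical names; unlike SOOOS, a file still exists underneath, and the struct must not contain pointers, since the mapping may land at a different address next time.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* An "object" that lives directly in a file-backed mapping. */
struct counter_object {
    long uses;
    double last_value;
};

/* Map the object's backing store; a new file starts zero-filled. */
struct counter_object *attach_object(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return NULL;
    /* Ensure the backing file is exactly large enough for the object. */
    if (ftruncate(fd, sizeof(struct counter_object)) != 0) { close(fd); return NULL; }
    void *p = mmap(NULL, sizeof(struct counter_object),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping outlives the descriptor */
    return p == MAP_FAILED ? NULL : p;
}

void detach_object(struct counter_object *obj) {
    msync(obj, sizeof *obj, MS_SYNC);   /* write dirty pages back to the file */
    munmap(obj, sizeof *obj);
}
```

A program can increment obj->uses, detach, and on its next run attach again and find the updated value, without ever calling read() or write() or defining a file format.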

How to implement a very simple filesystem?

I am wondering how the OS is reading/writing to the hard drive.
I would like as an exercise to implement a simple filesystem with no directories that can read and write files.
Where do I start?
Will C/C++ do the trick or do I have to go with a more low level approach?
Is it too much for one person to handle?
Take a look at FUSE: http://fuse.sourceforge.net/
This will allow you to write a filesystem without having to actually write a device driver. From there, I'd start with a single file. Basically create a file that's (for example) 100MB in length, then write your routines to read and write from that file.
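A minimal sketch of that first step: treat an ordinary file as the disk and give the filesystem code block-granular read and write routines, so the rest of the code never deals with byte offsets. The 512-byte block size and the function names are assumptions for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define DISK_BLOCK_SIZE 512  /* assumed block size of the pretend disk */

/* Create (or reuse) an image file sized to nblocks blocks. Returns an fd. */
int disk_open(const char *path, long nblocks) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)nblocks * DISK_BLOCK_SIZE) != 0) { close(fd); return -1; }
    return fd;
}

/* Read exactly one block into buf; returns 0 on success. */
int disk_read(int fd, long block, void *buf) {
    return pread(fd, buf, DISK_BLOCK_SIZE, (off_t)block * DISK_BLOCK_SIZE)
           == DISK_BLOCK_SIZE ? 0 : -1;
}

/* Write exactly one block from buf; returns 0 on success. */
int disk_write(int fd, long block, const void *buf) {
    return pwrite(fd, buf, DISK_BLOCK_SIZE, (off_t)block * DISK_BLOCK_SIZE)
           == DISK_BLOCK_SIZE ? 0 : -1;
}
```

For the 100MB image mentioned above, disk_open("disk.img", 204800) gives 204800 * 512 = 104,857,600 bytes. Blocks that were never written read back as zeros, and swapping the image file for a device node later leaves the rest of the filesystem code unchanged.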
Once you're happy with the results, then you can look into writing a device driver, and making your driver run against a physical disk.
The nice thing is you can use almost any language with FUSE, not just C/C++.
I found it quite easy to understand a simple filesystem while using the FAT filesystem on the AVR microcontroller.
http://elm-chan.org/fsw/ff/00index_e.html
Take a look at the code and you will figure out how FAT works.
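To give a flavor of what that code does first, here is a minimal sketch that parses the BIOS Parameter Block of a FAT12/FAT16 boot sector and locates the FATs, the root directory and the data area. It follows the standard FAT12/16 layout (FAT32 differs and is not handled); the struct and function names are hypothetical.

```c
#include <stdint.h>

struct fat_layout {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint32_t fat_start;        /* first sector of the first FAT */
    uint32_t root_dir_start;   /* first sector of the fixed-size root directory */
    uint32_t data_start;       /* first sector of cluster #2 */
};

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Parse sector 0 of a FAT12/16 volume. Returns 0 on success. */
int parse_fat_boot_sector(const uint8_t bs[512], struct fat_layout *out) {
    if (bs[510] != 0x55 || bs[511] != 0xAA) return -1;   /* boot signature */
    uint16_t bps = rd16(bs + 11);            /* bytes per sector */
    uint8_t spc = bs[13];                    /* sectors per cluster */
    uint16_t reserved = rd16(bs + 14);       /* reserved sectors before FAT #1 */
    uint8_t num_fats = bs[16];
    uint16_t root_entries = rd16(bs + 17);   /* 32-byte directory entries */
    uint16_t fat_size = rd16(bs + 22);       /* sectors per FAT; 0 on FAT32 */
    if (bps == 0 || spc == 0 || fat_size == 0) return -1;
    uint32_t root_dir_sectors = ((uint32_t)root_entries * 32 + bps - 1) / bps;
    out->bytes_per_sector = bps;
    out->sectors_per_cluster = spc;
    out->fat_start = reserved;
    out->root_dir_start = reserved + (uint32_t)num_fats * fat_size;
    out->data_start = out->root_dir_start + root_dir_sectors;
    return 0;
}

/* Cluster numbering starts at 2, right after the root directory. */
uint32_t cluster_to_sector(const struct fat_layout *l, uint32_t cluster) {
    return l->data_start + (cluster - 2) * l->sectors_per_cluster;
}
```

For a classic 1.44 MB floppy (512-byte sectors, 1 sector per cluster, 1 reserved sector, 2 FATs of 9 sectors, 224 root entries) this places the root directory at sector 19 and cluster 2 at sector 33. Following a file from there means reading its cluster chain out of the FAT.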
For learning the ideas of a file system, I think it's not really necessary to use a disk. Just create an array of 512-byte byte arrays, imagine it is your hard disk, and start experimenting a bit.
Also, you may want to have a look at some of the standard OS textbooks, like http://codex.cs.yale.edu/avi/os-book/OS8/os8c/index.html
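Following the array-of-blocks suggestion, a minimal in-memory sketch of a filesystem with no directories, as the question asks: a flat table of file names, each file stored in contiguous blocks. Contiguous allocation and a bump allocator are simplifications chosen here (real filesystems track free space with a bitmap or a FAT); all names are hypothetical.

```c
#include <string.h>

#define TOY_BLOCK_SIZE 512
#define NUM_BLOCKS 128       /* a 64 KiB pretend disk */
#define MAX_FILES 16
#define NAME_LEN 16

/* The pretend hard disk: NUM_BLOCKS blocks of 512 bytes, back to back. */
static unsigned char disk[NUM_BLOCKS * TOY_BLOCK_SIZE];

/* Flat file table (no directories); each file occupies contiguous blocks. */
struct file_entry {
    char name[NAME_LEN];     /* empty name marks an unused entry */
    int first_block;
    int size;                /* in bytes */
};
static struct file_entry table[MAX_FILES];
static int next_free_block = 0;  /* bump allocator: freed space is never reused */

static unsigned char *block_ptr(int block) {
    return disk + (size_t)block * TOY_BLOCK_SIZE;
}

static struct file_entry *find(const char *name) {
    for (int i = 0; i < MAX_FILES; i++)
        if (table[i].name[0] != '\0' && strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Create a file holding len bytes of data. Returns 0 on success, -1 if the
 * name is empty, too long or taken, or if the disk or table is full. */
int fs_write(const char *name, const void *data, int len) {
    if (len < 0 || name[0] == '\0' || strlen(name) >= NAME_LEN || find(name))
        return -1;
    int nblocks = (len + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;
    if (next_free_block + nblocks > NUM_BLOCKS)
        return -1;
    for (int i = 0; i < MAX_FILES; i++) {
        if (table[i].name[0] == '\0') {
            strcpy(table[i].name, name);
            table[i].first_block = next_free_block;
            table[i].size = len;
            memcpy(block_ptr(next_free_block), data, (size_t)len);
            next_free_block += nblocks;
            return 0;
        }
    }
    return -1;
}

/* Copy up to cap bytes of the named file into buf; returns the count or -1. */
int fs_read(const char *name, void *buf, int cap) {
    struct file_entry *f = find(name);
    if (f == NULL)
        return -1;
    int n = f->size < cap ? f->size : cap;
    memcpy(buf, block_ptr(f->first_block), (size_t)n);
    return n;
}
```

Natural next steps are a delete operation with a free-block bitmap, and then storing the file table itself in block 0 so the "disk" can be saved and reloaded.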
The answer to your first question is that, besides FUSE as someone else told you, you can also use Dokan, which does the same for Windows; from there it is just a question of doing reads and writes to a physical partition (http://msdn.microsoft.com/en-us/library/aa363858%28v=vs.85%29.aspx (read particularly the section on Physical Disks and Volumes)).
On Linux or Unix, besides using something like FUSE, you only have to issue a read or write call on the wanted device in /dev/xxx (if you are root); in this respect Unix systems are either friendlier or less secure, depending on your point of view.
From there, try to implement a simple filesystem like FAT, or something more esoteric like a tar filesystem, or even a simple filesystem based on Unix concepts like UFS or Minix, or just something that only logs the calls that are made and their arguments to a log file (this will help you understand the calls that are made to the filesystem driver during regular use of your computer).
As for your second question (which is much simpler to answer): yes, C/C++ will do the trick, since they are the lingua franca of system development. Also, a lot of the example code you find will be in C/C++, so you will at least be reading C/C++ during development.
Now for your third question: yes, this is doable by one person. For example, the original ext filesystem (better known in the Linux world through its successors ext2 and ext3) was designed by a single developer, Rémy Card, so don't think these things are beyond a single person.
Finally, remember that a real filesystem interacts with a lot of other subsystems in a regular kernel. For example, if you have a laptop and hibernate it, the filesystem has to flush all changes made to open files. If you have a pagefile on the partition, or even if the pagefile has its own filesystem, that will affect your filesystem, particularly the block sizes: they tend to be equal to the page size or related to it by a power of two, because a filesystem block that matches the page size can be placed in memory in a single transfer.
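A quick way to check the page-size relationship on your own machine is to compare the VM page size with the block size the filesystem reports. A minimal POSIX sketch (the function name is hypothetical); on many Linux systems both values come out as 4096, but that is typical rather than guaranteed.

```c
#include <sys/statvfs.h>
#include <unistd.h>

/* Fetch the VM page size and the block size reported for the filesystem
 * that contains `path`, so the two can be compared. Returns 0 on success. */
int page_and_block_size(const char *path, long *page_size, unsigned long *fs_block) {
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return -1;
    *page_size = sysconf(_SC_PAGESIZE);
    *fs_block = sv.f_bsize;
    return 0;
}
```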
And also security: you will want to control which users read/write which files, which usually means that before opening a file you have to know which user is logged on and what permissions they have for that file. And obviously, without a filesystem, users can't run any program or interact with the machine. Modern filesystem layers also interact with the network subsystem, because there are network and distributed filesystems.
So if you want to go and learn about doing kernel filesystems, those are some of the things you will have to worry about (besides knowing a VFS interface)
P.S.: If you want to make Unix permissions work on Windows, you can use something like what MS uses for NFS on the server versions of Windows (http://support.microsoft.com/kb/262965)
