File based storage system - database

Anyone know of a commercially available file based storage system that meets the following requirements:
Should not require installation
Should provide APIs to read and write onto the storage system, preferably .net APIs
Paid/Free (either way it should be supported)
Should be fast and efficient
Basically I am looking for something with database like functionality with the least footprint.

Take a look at Sqlite. It has become the standard solution for a file based database solution - it's even built in to the iPhone, Firefox and many other high profile software/devices.
My Google-fu gave me this simple tutorial of using Sqlite with .net: sqlite-on-dotnet-in-3-mins

Try MongoDB it's a file based document database. Installing it is done by copying it's files and it has a C# driver to read/write data from it.

Here are some thoughts about your question.
The "file based storage system" means "data base" in this context.
Some comments by requirements.
2.1. The first requirement "Should not require installation" means "Embedded database".
2.2. The second requirement "Should provide APIs to read and write ..." is natural for all databases. They all have such API.
2.3. The third requirement "Should be fast and efficient" is a really interesting thing. Here is one of the links by this issue with a lot of useful information Comparison of relational database management systems.
And, finally if you are looking for "something with database like functionality with the least footprint" the basic choice will be SQLite.
It is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine. There is no set up procedure to initialize it before using it. Databases need minimal or no administration. There is no need to maintain a separate server process dedicated to SQLite. It stores an entire database in a single, ordinary native file that can reside anywhere in a directory of the native file system. Any user who has a permission to read the file can read anything from the database.

Related

Database access in C

I am developing an application completely written in C. I have to save data permanently somewhere. I tried file storage but I feel its really a primitive manner to do the job and I don't want to save my sensitive data in a simple text file. How can i save my data and access it back in an easy manner? I come from JavaScript background and would prefer something like jsons. I will be happy with something like postgreSQL also. Give me some suggestions. I am using gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3.
sqlite seems to meet your requirements.
SQLite is an embedded SQL database engine. Unlike most other SQL
databases, SQLite does not have a separate server process. SQLite
reads and writes directly to ordinary disk files. A complete SQL
database with multiple tables, indices, triggers, and views, is
contained in a single disk file. The database file format is
cross-platform - you can freely copy a database between 32-bit and
64-bit systems or between big-endian and little-endian architectures.
These features make SQLite a popular choice as an Application File
Format. Think of SQLite not as a replacement for Oracle but as a
replacement for fopen()
Check out the quickstart
http://www.postgresql.org/docs/8.1/static/libpq.html
libpq is the C application programmer's interface to PostgreSQL. libpq is a set of library functions that allow client programs to pass queries to the PostgreSQL backend server and to receive the results of these queries.
I would recommend SQLite. I think it is a great way of storing local data.
There are C library bindings, and its API is quite simple.
Its main advantage is that all you need is the library. You don't need a complex database server setup (as you would with PostgreSQL). Also, its footprint is quite small (it's also used a lot in mobile development world {iOS, android, others}).
Its drawback is that it doesn't handle concurrency that well. But if it is a local, simple, single-threaded application, then I guess it won't be a problem.
MySQL embedded or BerkeleyDB are other options you might want to take a look at.
SQLite is a lightweight database. This page describes the C language interface:
http://www.sqlite.org/capi3ref.html
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite is in the public domain.
SQLite is a popular choice because it's light-weight and speedy. It also offers a C/C++ interface (including a bunch of other languages).
Everyone else has already mentioned SQLite, so I'll counter with dbm:
http://linux.die.net/man/3/dbm_open
It's not quite as fancy as SQLite (e.g, it's not a full SQL database), but it's often easier to work with from C, as it requires less setup.

Databases in offline software?

I'm primarily a web developer, currently learning C and planning on going into C++ in a year or so when I feel absolutely confident with C (Note: I'm not saying I'll be a master at C, just that I'll understand it in a fair amount of depth and will retain it properly rather than forgetting it when I see a new language).
My question is, how are offline/networked applications written with database functionality? I've built many-a database driven website in PHP and MySQL and would like to know how to use databases with my C projects - a lot of the applications I have the desire to write rely more on content management rather than processing data as such. What database formats are available to me? What should I be looking at to build a simple contact database for example?
Thanks in advance.
I'd suggest SQLite for file-based database. Mongo is pretty awesome too if you run it locally but it is still networked.
For a small application SQLLite might be a good option for you - it is part of your application and not dependant on other software but as a database is fairly weak (No triggers, no stored procedures afaik).
If you are looking for something more substantial (especially when it involves multiple users) you should be looking for MySQL or SQLServer. These can be accessed directly from their respective API's or via some kindof common mediator such as ODBC.
Your question is really very open, much application software depends on relational database technology at some level but the OS and the required task ussually dictate the best choices.
Going the SQL route with offline applications in C is not straightforward. Whereas the database storage brings in advantages, in terms of reliability e.g., it adds conversion steps during the save/load of your data, simply by using SQL.
The question is why would you want to create SQL commands as character strings to load/save the data that is treated as binary in your program, and that you can store as binary directly in your system local storage? It costs!
On the other side, if you already know SQL well, then you'll only have to learn about an (there are several) API to access a database (SQLite, MySQL ...) from C to get started.

Which embedded database capable of 100 million records has an efficient C or C++ API

I'm looking for a cross-platform database engine that can handle databases up hundreds of millions of records without severe degradation in query performance. It needs to have a C or C++ API which will allow easy, fast construction of records and parsing returned data.
Highly discouraged are products where data has to be translated to and from strings just to get it into the database. The technical users storing things like IP addresses don't want or need this overhead. This is a very important criteria so if you're going to refer to products, please be explicit about how they offer such a direct API. Not wishing to be rude, but I can use Google - please assume I've found most mainstream products and I'm asking because it's often hard to work out just what direct API they offer, rather than just a C wrapper around SQL.
It does not need to be an RDBMS - a simple ISAM record-oriented approach would be sufficient.
Whilst the primary need is for a single-user database, expansion to some kind of shared file or server operations is likely for future use.
Access to source code, either open source or via licensing, is highly desirable if the database comes from a small company. It must not be GPL or LGPL.
you might consider C-Tree by FairCom - tell 'em I sent you ;-)
i'm the author of hamsterdb.
tokyo cabinet and berkeleydb should work fine. hamsterdb definitely will work. It's a plain C API, open source, platform independent, very fast and tested with databases up to several hundreds of GB and hundreds of million items.
If you are willing to evaluate and need support then drop me a mail (contact form on hamsterdb.com) - i will help as good as i can!
bye
Christoph
You didn't mention what platform you are on, but if Windows only is OK, take a look at the Extensible Storage Engine (previously known as Jet Blue), the embedded ISAM table engine included in Windows 2000 and later. It's used for Active Directory, Exchange, and other internal components, optimized for a small number of large tables.
It has a C interface and supports binary data types natively. It supports indexes, transactions and uses a log to ensure atomicity and durability. There is no query language; you have to work with the tables and indexes directly yourself.
ESE doesn't like to open files over a network, and doesn't support sharing a database through file sharing. You're going to be hard pressed to find any database engine that supports sharing through file sharing. The Access Jet database engine (AKA Jet Red, totally separate code base) is the only one I know of, and it's notorious for corrupting files over the network, especially if they're large (>100 MB).
Whatever engine you use, you'll most likely have to implement the shared usage functions yourself in your own network server process or use a discrete database engine.
For anyone finding this page a few years later, I'm now using LevelDB with some scaffolding on top to add the multiple indexing necessary. In particular, it's a nice fit for embedded databases on iOS. I ended up writing a book about it! (Getting Started with LevelDB, from Packt in late 2013).
One option could be Firebird. It offers both a server based product, as well as an embedded product.
It is also open source and there are a large number of providers for all types of languages.
I believe what you are looking for is BerkeleyDB:
http://www.oracle.com/technology/products/berkeley-db/db/index.html
Never mind that it's Oracle, the license is free, and it's open-source -- the only catch is that if you redistribute your software that uses BerkeleyDB, you must make your source available as well -- or buy a license.
It does not provide SQL support, but rather direct lookups (via b-tree or hash-table structure, whichever makes more sense for your needs). It's extremely reliable, fast, ACID, has built-in replication support, and so on.
Here is a small quote from the page I refer to above, that lists a few features:
Data Storage
Berkeley DB stores data quickly and
easily without the overhead found in
other databases. Berkeley DB is a C
library that runs in the same process
as your application, avoiding the
interprocess communication delays of
using a remote database server. Shared
caches keep the most active data in
memory, avoiding costly disk access.
Local, in-process data storage
Schema-neutral, application native data format
Indexed and sequential retrieval (Btree, Queue, Recno, Hash)
Multiple processes per application and multiple threads per process
Fine grained and configurable locking for highly concurrent systems
Multi-version concurrency control (MVCC)
Support for secondary indexes
In-memory, on disk or both
Online Btree compaction
Online Btree disk space reclamation
Online abandoned lock removal
On disk data encryption (AES)
Records up to 4GB and tables up to 256TB
Update: Just ran across this project and thought of the question you posted:
http://tokyocabinet.sourceforge.net/index.html . It is under LGPL, so not compatible with your restrictions, but an interesting project to check out, nonetheless.
SQLite would meet those criteria, except for the eventual shared file scenario in the future (and actually it could probably do that to if the network file system implements file locks correctly).
Many good solutions (such as SQLite) have been mentioned. Let me add two, since you don't require SQL:
HamsterDB fast, simple to use, can store arbitrary binary data. No provision for shared databases.
Glib HashTable module seems quite interesting too and is very
common so you won't risk going into a dead end. On the other end,
I'm not sure there is and easy way to store the database on the
disk, it's mostly for in-memory stuff
I've tested both on multi-million records projects.
As you are familiar with Fairtree, then you are probably also familiar with Raima RDM.
It went open source a few years ago, then dbstar claimed that they had somehow acquired the copyright. This seems debatable though. From reading the original Raima license, this does not seem possible. Of course it is possible to stay with the original code release. It is rather rare, but I have a copy archived away.
SQLite tends to be the first option. It doesn't store data as strings but I think you have to build a SQL command to do the insertion and that command will have some string building.
BerkeleyDB is a well engineered product if you don't need a relationDB. I have no idea what Oracle charges for it and if you would need a license for your application.
Personally I would consider why you have some of your requirements . Have you done testing to verify the requirement that you need to do direct insertion into the database? Seems like you could take a couple of hours to write up a wrapper that converts from whatever API you want to SQL and then see if SQLite, MySql,... meet your speed requirements.
There used to be a product called b-trieve but I'm not sure if source code was included. I think it has been discontinued. The only database engine I know of with an ISAM orientation is c-tree.

How would you build a database filesystem (DBFS)?

A database file system is a file system that is a database instead of a hierarchy. Not too complex an idea initially but I thought I'd ask if anyone has thought about how they might do something like this? What are the issues that a simple plan is likely to miss? My first guess at an implementation would be something like a filesystem to for a Linux platform (probably atop an existing file system) but I really don't know much about how that would be started. Its a passing thought that I doubt I'd ever follow through on but I'm hoping to at least satisfy my curiosity.
DBFS is a really nice PoC implementation for KDE. Instead of implementing it as a file system directly, it is based on indexing on a traditional file system, and building a new user interface to make the results accessible to users.
The easiest way would be to build it using fuse, with a database back-end.
A more difficult thing to do is to have it as a kernel module (VFS).
On Windows, you could use IFS.
I'm not really sure what you mean with "A database file system is a file system that is a database instead of a hierarchy".
Probably, using "Filesystem in Userspace" (FUSE), as mentioned by Osama ALASSIRY, is a good idea. The FUSE wiki lists a lot of existing projects about databased-backed filesystems as well as filesystems in which you can search by SQL-like queries.
Maybe this is a good starting point for getting an idea how it could work.
It's a basic overview of the Firebird architecture.
Firebird is an opensource RDBMS, so you can have a real deep insight look, too, if you're interested.
Its been a while since you asked this. I'm surprised no one suggested the obvious. Look at mainframes and minis, especially iSeries-OS (now called IBM-i used to be called iOS or OS/400).
How to do an relational database as a mass data store is relatively easy. Oracle and MySQL both have these. The catch is it must be essentially ubiquitous for end user applications.
So the steps for an app conversion are:
1) Everything in a normal hierarchical filesystem
2) Data in BLOBs with light metadata in the database. File with some catalogue information.
3) Large data in BLOBs with extensive metadata and complex structures in the database. File with substantial metadata associated with it that can be essentially to understanding the structure.
4) Internal structures of the BLOB exposed in an object <--> Relational map with extensive meta-data. While there may be an exportable form, the application naturally works with the database, the notion of the file as the repository is lost.

Shared File Database Suggestion

I would like to build and deploy a database application for Windows based systems, but need to live within the following constraints:
Cannot run as a server (i.e., have open ports);
Must be able to share database files with other instances of the program (running on other machines);
Must not require a DBA for maintenance;
No additional cost for run-time license.
In addition, the following are nice to have "features":
Zero-install (e.g., no registry entries, no need to put files in \Windows\..., etc.);
"Reasonable" performance (yes, that's vague);
"Reasonable" file size limitations (at least 1GB per table/file--just in case).
I've seen this question
Embedded Database for .net that can run off a network
but it doesn't quite answer it for me.
I have seen the VistaDB site, but while it looks promising, I have no personal experience with it.
I have also looked at SQLite, and while it seems good enough for Goggle, I (again) have no personal experience with it.
I would love to use a Java based solution because it's cross-platform (even though my main target is Windows, I'd like to be flexible) and WebStart is a really nice way to distribute software, but the most commonly used DBs (Derby and hsqldb) won't support shared access.
I know that I'm not the only one who's trying/tried to do this, so I'm hoping I could get some advice.
I'd go with SQLite. There are SQLite bindings for everything, and it's very widely used as a embedded database for a large number of applications.
I use SQLite at work and one thing that you should keep in mind is that its file based and uses a file lock for managing concurrent connections. It is not a great solution when you have multiple users trying to use the database at the same time. SQLite is however a great database for one user application, its fast, has a small foot print and has a thriving community built around it.
If you've got VStudio sitting around, how about SQL Server 3.5 Compact edition? MSSQL running in-proc.
http://www.microsoft.com/sql/editions/compact/downloads.mspx

Resources