Creating a custom SQL data source using a standard protocol

Problem
I have a custom data source (an in-house system) which I would like to access as a standard data source. I am looking for a solution that provides standard SQL-like accessors, so that the data source can be used from different report engines, Excel, MS Access, perhaps standard web frontends, and off-the-shelf data management tools. In other words, I would like off-the-shelf support for ODBC, JDBC and so on, without having to implement support for all these drivers myself.
What I have been doing so far
I have successfully used the SQLite virtual table mechanism to provide access to the data source through a standard SQLite driver. SQLite takes care of SQL query parsing and table metadata translation (provided by my extension), and handles the SQL features that my data source does not support (aggregates, complex joins, updates, etc.).
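For readers who have not used the mechanism: a virtual table is registered as a module of callbacks that SQLite invokes during query execution. A minimal read-only sketch looks roughly like this (the module name "inhouse" and the fixed data are placeholders for my real extension):

/* Hedged sketch of a minimal read-only SQLite virtual table: exposes a
 * fixed in-memory array as a one-column table. Error handling omitted. */
#include <sqlite3.h>
#include <string.h>

static const int rows[] = {10, 20, 30};
enum { NROWS = 3 };

typedef struct { sqlite3_vtab base; } Vtab;
typedef struct { sqlite3_vtab_cursor base; int i; } Cursor;

static int xConnect(sqlite3 *db, void *pAux, int argc,
                    const char *const *argv, sqlite3_vtab **ppVtab,
                    char **pzErr) {
  sqlite3_declare_vtab(db, "CREATE TABLE x(value INTEGER)");
  *ppVtab = sqlite3_malloc(sizeof(Vtab));
  memset(*ppVtab, 0, sizeof(Vtab));
  return SQLITE_OK;
}
static int xDisconnect(sqlite3_vtab *p) { sqlite3_free(p); return SQLITE_OK; }
static int xOpen(sqlite3_vtab *p, sqlite3_vtab_cursor **ppCur) {
  Cursor *c = sqlite3_malloc(sizeof(Cursor));
  memset(c, 0, sizeof(Cursor));
  *ppCur = &c->base;
  return SQLITE_OK;
}
static int xClose(sqlite3_vtab_cursor *cur) { sqlite3_free(cur); return SQLITE_OK; }
static int xFilter(sqlite3_vtab_cursor *cur, int idxNum, const char *idxStr,
                   int argc, sqlite3_value **argv) {
  ((Cursor*)cur)->i = 0;
  return SQLITE_OK;
}
static int xNext(sqlite3_vtab_cursor *cur) { ((Cursor*)cur)->i++; return SQLITE_OK; }
static int xEof(sqlite3_vtab_cursor *cur) { return ((Cursor*)cur)->i >= NROWS; }
static int xColumn(sqlite3_vtab_cursor *cur, sqlite3_context *ctx, int col) {
  sqlite3_result_int(ctx, rows[((Cursor*)cur)->i]);
  return SQLITE_OK;
}
static int xRowid(sqlite3_vtab_cursor *cur, sqlite3_int64 *pRowid) {
  *pRowid = ((Cursor*)cur)->i;
  return SQLITE_OK;
}
static int xBestIndex(sqlite3_vtab *p, sqlite3_index_info *pIdxInfo) {
  return SQLITE_OK;  /* full scan only; SQLite evaluates WHERE/joins above us */
}

static sqlite3_module module = {
  0,                        /* iVersion */
  xConnect, xConnect,       /* xCreate, xConnect */
  xBestIndex,
  xDisconnect, xDisconnect, /* xDisconnect, xDestroy */
  xOpen, xClose, xFilter, xNext, xEof, xColumn, xRowid
  /* xUpdate and later members stay NULL: the table is read-only */
};

int register_vtab(sqlite3 *db) {
  /* Afterwards: CREATE VIRTUAL TABLE t USING inhouse; SELECT * FROM t; */
  return sqlite3_create_module(db, "inhouse", &module, 0);
}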
However, what I don't get with SQLite is network support. SQLite is an embedded database engine that works very well with my data source, but although it has ODBC and JDBC support, it has no wire protocol. Embedding my custom data source in the client process is not an option, since the data source has very strict runtime restrictions (among other constraints).
What I am considering
1. Networked SQLite
The obvious solution is to investigate whether the SQLite database can be put behind a network server. However, the networking options do not seem well supported, especially not by client drivers (i.e. not at all).
2. MySQL storage engine
I have been looking at replacing the SQLite virtual table driver with a MySQL storage engine (30 minutes of reading the API specs gives me the gut feeling that the APIs are quite similar). I have three concerns:
Process control. My data source is a system which wants to manage its own processes. I would prefer to be the one responsible for service provisioning.
Running the whole MySQL server looks like overkill from an IT administration point of view. An embedded networked server would suffice, and I already have a network server (it's already a web service process).
Licensing. MySQL looks like either GPL or expensive. I did not find anything conclusive on what license requirements this setup would force me into.
3. Mimicking a known network protocol
I have been looking into mimicking a "known" protocol such as the MySQL wire protocol or MSSQL's TDS (FreeTDS is a good source). However, readily available implementations look scarce, and I might have to roll my own if I go down this path, which is probably a lot of work.
I am looking for other options on how to do this. Right now, I am investigating whether it is possible to choose #2 and use an interface between my data source and the storage engine (e.g. 0mq or some network protocol). I believe it is doable, but I am very interested in easier solutions. Has anyone out there done something similar (with success)?
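For the record, the bridge I have in mind would look roughly like this on the data-source side; a sketch only, with an invented endpoint and message format, not a working storage-engine integration:

/* Hypothetical 0mq REP endpoint for the data source: the storage engine
 * would connect a REQ socket, send a serialized row request, and get a
 * serialized row back. Endpoint and payloads are made up. */
#include <zmq.h>
#include <string.h>

int main(void) {
    void *ctx = zmq_ctx_new();
    void *sock = zmq_socket(ctx, ZMQ_REP);
    zmq_bind(sock, "tcp://*:5555");              /* invented endpoint */

    char req[256];
    for (;;) {
        int n = zmq_recv(sock, req, sizeof req - 1, 0);
        if (n < 0) break;
        if (n > (int)sizeof req - 1) n = sizeof req - 1;  /* truncated */
        req[n] = '\0';
        /* ...look the request up in the in-house data source here... */
        const char *reply = "row:42";            /* placeholder answer */
        zmq_send(sock, reply, strlen(reply), 0);
    }
    zmq_close(sock);
    zmq_ctx_destroy(ctx);
    return 0;
}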

Related

How is data stored in a non-web based application?

The main scope of the question is to identify what non-web-based applications (i.e., executables) use to store information, and how.
Surely, web-based applications use databases.
Does a non-web-based app store information in a compiled copy of a database module?
For example, if we have Postgres, would we compile the Postgres source and use it in some way with a driver to store info locally?
If not, how is information stored? Are databases only for web-based apps? Why would someone compile/build/make the source of a DB?
TL;DR: an example situation:
We have a non-web-based game. Where exactly do you store character stats, progress, encounters, etc.? Do you use a database for this? If not, how?
There are lots of options:
Static files (JSON, YAML, XML, .ini, custom text formats, ...) read into memory, modified, and written out periodically and/or on exit. Care is taken to write a new file, then rename it to overwrite the old one (see the sketch after this list).
Embedded SQL databases like SQLite, Firebird, Microsoft JET, HSQLDB, Derby, etc.
Embedded key/value stores like Berkeley DB
Standalone SQL databases, bundled with the app installer, like PostgreSQL, MySQL, MS-SQL, etc. Or the same DBs installed separately by the user, where the app is then configured to use an existing DB.
System configuration databases like the Windows Registry. Not suitable for data that changes or is updated frequently, nor for storing large amounts of data. Please don't do this.
Platform and language specific facilities like Java or Swift object serialization. Best avoided, but they have their place.
Various wacky custom formats and schemes
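The write-then-rename trick from the first bullet, sketched in C (POSIX rename() semantics assumed; names invented):

/* Atomic-ish save: write a temporary file, flush it, then rename it over
 * the old file. On POSIX, rename() replaces the target atomically, so a
 * crash leaves either the old or the new file, never a half-written one. */
#include <stdio.h>

int save_state(const char *path, const char *data) {
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);

    FILE *f = fopen(tmp, "w");
    if (!f) return -1;
    if (fputs(data, f) == EOF || fflush(f) != 0) { fclose(f); return -1; }
    fclose(f);                 /* production code would also fsync() here */

    return rename(tmp, path);  /* atomically replaces the old save file */
}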
It's completely unrelated to whether you compile sources, etc. Most embedded databases are available as a shared library (DLL, dylib, etc) with headers. You might link them to your program at compile time, but only in the same way you link the database drivers of some other DBs (or driver frameworks like ODBC) into your app.
No matter what you actually use, in most cases the data gets stored in the desktop user's profile, or in a mobile app's sandboxed data store. The main exception is DB servers.

Database access in C

I am developing an application completely written in C. I have to save data permanently somewhere. I tried file storage, but it feels like a really primitive way to do the job, and I don't want to save my sensitive data in a simple text file. How can I save my data and access it back in an easy manner? I come from a JavaScript background and would prefer something like JSON. I would also be happy with something like PostgreSQL. Give me some suggestions. I am using gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3.
SQLite seems to meet your requirements.
SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views is contained in a single disk file. The database file format is cross-platform - you can freely copy a database between 32-bit and 64-bit systems or between big-endian and little-endian architectures. These features make SQLite a popular choice as an Application File Format. Think of SQLite not as a replacement for Oracle but as a replacement for fopen().
Check out the quickstart
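In case that link moves, the quickstart boils down to roughly this (file and table names are just examples; compile with gcc app.c -lsqlite3):

/* Minimal SQLite round trip: open a database file, create a table,
 * insert a row, and print it back via the callback. */
#include <sqlite3.h>
#include <stdio.h>

static int print_row(void *unused, int ncols, char **vals, char **names) {
    int i;
    for (i = 0; i < ncols; i++)
        printf("%s = %s\n", names[i], vals[i] ? vals[i] : "NULL");
    return 0;
}

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("app.db", &db) != SQLITE_OK) return 1;

    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS kv(k TEXT, v TEXT)", 0, 0, 0);
    sqlite3_exec(db, "INSERT INTO kv VALUES('hello','world')", 0, 0, 0);
    sqlite3_exec(db, "SELECT * FROM kv", print_row, 0, 0);

    sqlite3_close(db);
    return 0;
}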
http://www.postgresql.org/docs/8.1/static/libpq.html
libpq is the C application programmer's interface to PostgreSQL. libpq is a set of library functions that allow client programs to pass queries to the PostgreSQL backend server and to receive the results of these queries.
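A minimal libpq round trip, to give a feel for the API (the connection string is a placeholder; link with -lpq):

/* Connect to PostgreSQL, run a query, print the first field of the
 * first row. Connection parameters are placeholders. */
#include <libpq-fe.h>
#include <stdio.h>

int main(void) {
    PGconn *conn = PQconnectdb("dbname=mydb user=me");  /* example DSN */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    PGresult *res = PQexec(conn, "SELECT version()");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("%s\n", PQgetvalue(res, 0, 0));          /* row 0, col 0 */

    PQclear(res);
    PQfinish(conn);
    return 0;
}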
I would recommend SQLite. I think it is a great way of storing local data.
There are C library bindings, and its API is quite simple.
Its main advantage is that all you need is the library. You don't need a complex database server setup (as you would with PostgreSQL). Its footprint is also quite small; it's used a lot in the mobile development world (iOS, Android, and others).
Its drawback is that it doesn't handle concurrency that well. But if it is a local, simple, single-threaded application, then I guess it won't be a problem.
MySQL embedded or BerkeleyDB are other options you might want to take a look at.
SQLite is a lightweight database. This page describes the C language interface:
http://www.sqlite.org/capi3ref.html
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite is in the public domain.
SQLite is a popular choice because it's lightweight and speedy. It also offers a C/C++ interface (along with bindings for a bunch of other languages).
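Beyond one-shot sqlite3_exec() calls, the prepared-statement part of that interface is the piece most C programs end up using, since values are bound as native types rather than spliced into SQL strings. A sketch (table and column names are examples):

/* Insert a row using SQLite prepared statements: the int goes in as an
 * int, with no string formatting and no SQL-injection risk. */
#include <sqlite3.h>

int insert_score(sqlite3 *db, const char *name, int score) {
    sqlite3_stmt *stmt;
    int rc = sqlite3_prepare_v2(db,
        "INSERT INTO scores(name, score) VALUES(?1, ?2)", -1, &stmt, 0);
    if (rc != SQLITE_OK) return rc;

    sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT);
    sqlite3_bind_int(stmt, 2, score);

    rc = sqlite3_step(stmt);             /* returns SQLITE_DONE on success */
    sqlite3_finalize(stmt);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}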
Everyone else has already mentioned SQLite, so I'll counter with dbm:
http://linux.die.net/man/3/dbm_open
It's not quite as fancy as SQLite (e.g., it's not a full SQL database), but it's often easier to work with from C, as it requires less setup.
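To show how little setup that means in practice, a sketch (the database name is an example; the library to link against varies by platform):

/* Minimal ndbm usage: open a key/value store, insert a pair, fetch it. */
#include <ndbm.h>
#include <fcntl.h>
#include <stdio.h>

int main(void) {
    DBM *db = dbm_open("mydata", O_RDWR | O_CREAT, 0644);  /* example name */
    if (!db) return 1;

    datum key = { "user:1", 6 };
    datum val = { "alice", 5 };
    dbm_store(db, key, val, DBM_REPLACE);   /* overwrite if present */

    datum out = dbm_fetch(db, key);
    if (out.dptr)
        printf("%.*s\n", (int)out.dsize, (char *)out.dptr);

    dbm_close(db);
    return 0;
}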

File based storage system

Does anyone know of a commercially available file-based storage system that meets the following requirements:
Should not require installation
Should provide APIs to read and write onto the storage system, preferably .net APIs
Paid/Free (either way it should be supported)
Should be fast and efficient
Basically, I am looking for something with database-like functionality with the smallest footprint.
Take a look at SQLite. It has become the standard solution for a file-based database; it's even built into the iPhone, Firefox and many other high-profile software products/devices.
My Google-fu gave me this simple tutorial on using SQLite with .NET: sqlite-on-dotnet-in-3-mins
Try MongoDB; it's a file-based document database. Installing it is done by copying its files, and it has a C# driver to read/write data from it.
Here are some thoughts about your question.
The "file based storage system" means "data base" in this context.
Some comments by requirements.
2.1. The first requirement "Should not require installation" means "Embedded database".
2.2. The second requirement "Should provide APIs to read and write ..." is natural for all databases. They all have such API.
2.3. The third requirement "Should be fast and efficient" is a really interesting thing. Here is one of the links by this issue with a lot of useful information Comparison of relational database management systems.
And, finally if you are looking for "something with database like functionality with the least footprint" the basic choice will be SQLite.
It is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine. There is no set up procedure to initialize it before using it. Databases need minimal or no administration. There is no need to maintain a separate server process dedicated to SQLite. It stores an entire database in a single, ordinary native file that can reside anywhere in a directory of the native file system. Any user who has a permission to read the file can read anything from the database.

ADO or DBX using Delphi

Which is better (and for what reasons) to use to connect to MS SQL, Oracle or Firebird from a Delphi Win32 application -- ADO or DBX (Database Express)?
Both allow you to connect to the major databases. I like the way ADO does it all with just a connection string change, and the fact that ADO and the drivers are included with Windows, so there is nothing extra to deploy (it seems - correct me if I'm wrong).
DBX is also flexible, and I can compile the drivers into my app, can I not?
I really am keen to have a single source if possible, with the ability to vary databases depending on the customer's IT department/preferences.
But which is easier to program, performs better, uses memory most efficiently? Any other things to differentiate them on?
Thanks, Richard
ADO is simple to use and is already there; you only have to make sure the corresponding client driver is installed on the client side.
I found DBX more flexible, and it is better integrated with the IDE and with other technologies like DataSnap.
For the same purpose as you, I have used DBX with third-party drivers from DevArt.
You can compile the drivers into your application if you buy the driver sources.
In the early days of Delphi, people praised its multi-DBMS support. Everyone loved the BDE (because that was the only way to do it).
But looking at customers over more than the past decade, I have seen a steady decrease of multi-DBMS support in their applications.
The cost of supporting multiple DBMS from one application is high.
Not only because you have to have knowledge of each DBMS, but also because each DBMS has its own set of peculiarities that you have to adapt to in your data access layer. These include not only differences in syntax and underlying data types, but also optimization strategies.
Also, some DBMS work better with ADO, some better with a direct connection (like skipping your Oracle client altogether).
Finally, testing all the combinations of your software with multiple DBMS systems is very labor-intensive.
I've been involved in a few projects where we had to change the DBMS back end and/or the data access technology (e.g. from BDE to DBX, or from DBX to a direct connection). Changing the back end was always much more painful than changing the data access technology. Multi-tier approaches made it somewhat easier, but increased the degrees of freedom and therefore the testing effort.
Most of the products I do see supporting multiple DBMS are vertical-market applications where the final customer already has their own DBMS infrastructure and the application needs to adapt to it. For instance, in Dutch governmental areas, Oracle has been really strong, but SQL Server has established quite a user base as well.
So you need to think about what combinations of DBMS you want to support, not only in terms of functionality, but also in terms of cost.
If you stick to one DBMS, then it makes no sense to go for a generic data access layer like BDE, DBX or ADO: it pays off to make the connection as direct as possible.
My experience has taught me that these combinations do work well:
Interbase or Firebird with FIBPlus, AnyDAC, IBO or IBX*
Oracle with AnyDAC, DOA or ODAC
Microsoft SQL Server with ADO
IBX does not like Firebird very much.
Hope this gives you some insight in the possibilities and limitations of supporting multiple DBMS from your Delphi applications.
--jeroen
General rule: every layer of components will possibly add an additional layer of bugs. Both ADO and DBX are component wrappers around standard database functionality, thus they're both equally strong.
So the proper choice should be based on other factors, like the databases that you want to use. If you want to connect to MS-Access or SQL Server, ADO would be the better choice since it's more native for these databases. But Firebird and Oracle are more native for the DBX components.
I personally tend to use the raw ADO API's, though. Then again, I don't use data-aware components in my projects. It's less RAD, I know. But I often need to work this way because I generally write client/server applications with several layers between the database and the GUI, thus making things more complicated.
My two cents: DBX is significantly faster (on both Oracle and SQL Server), and significantly more finicky and harder to deploy.
If performance is a factor, I'd go with DBX. Otherwise, I'd just use ADO for simplicity's sake.
As others have said, DBX may have the edge in raw performance in certain cases or under specific circumstances, but ADO is the basis for a very large number of applications in the world; so although ADO's performance may be relatively poorer, that clearly does not mean "unacceptably" poor.
For myself, informed by major projects I have worked on, the biggest "problem" with DBX is that, no matter how good it may be, it is a key infrastructure technology provided by a language/tools company.
Anyone who built applications on the earlier BDE technology will testify to the disruption caused when that technology was deprecated and no longer supported. Whilst no technology is immune from deprecation by its provider, ADO clearly has the edge when it comes to industry support beyond the technology provider itself.
For that reason I myself now always use ADO. Changing the connection string isn't always the only thing to worry about when switching from one database type to another, however. Stored procedure call syntax can vary from one ADO provider to another, and you still have to watch the SQL syntax you use if you intend to deploy against multiple different SQL engines, where SQL support may vary from one to another.
To mitigate these issues I use my own encapsulation of the ADO object model. This encapsulation does not attempt to mutate the object model into something that no longer resembles ADO; it simply exposes the parts of ADO that I need in a more Object Pascal-friendly (and type-safe) form (e.g. enum types and sets for constants and flags, rather than scores if not hundreds of integer constants).
My encapsulation also takes care of some of the minor variations in different provider behaviours/requirements, such as the previously mentioned differences in stored procedure call syntax.
I should also say that, like another poster, I long ago stopped using "data-aware controls", which is what opens up this approach. If you need or wish to use data-aware controls with ADO, then you cannot use ADO directly and must instead find some encapsulation that exposes ADO through the VCL dataset model.
ADO is the Microsoft world. DBX was created at the beginning (Delphi 6) for cross-platform development and Kylix.

Which embedded database capable of 100 million records has an efficient C or C++ API

I'm looking for a cross-platform database engine that can handle databases of up to hundreds of millions of records without severe degradation in query performance. It needs to have a C or C++ API that allows easy, fast construction of records and parsing of returned data.
Products where data has to be translated to and from strings just to get it into the database are highly discouraged. The technical users storing things like IP addresses don't want or need this overhead. This is a very important criterion, so if you're going to refer to products, please be explicit about how they offer such a direct API. Not wishing to be rude, but I can use Google - please assume I've found most mainstream products; I'm asking because it's often hard to work out just what direct API they offer, rather than just a C wrapper around SQL.
It does not need to be an RDBMS - a simple ISAM record-oriented approach would be sufficient.
Whilst the primary need is for a single-user database, expansion to some kind of shared file or server operations is likely for future use.
Access to source code, either open source or via licensing, is highly desirable if the database comes from a small company. It must not be GPL or LGPL.
You might consider C-Tree by FairCom - tell 'em I sent you ;-)
I'm the author of hamsterdb.
Tokyo Cabinet and Berkeley DB should work fine. hamsterdb definitely will: it has a plain C API, is open source and platform independent, is very fast, and has been tested with databases up to several hundred GB and hundreds of millions of items.
If you are willing to evaluate it and need support, then drop me a mail (contact form on hamsterdb.com) - I will help as well as I can!
Bye,
Christoph
You didn't mention what platform you are on, but if Windows only is OK, take a look at the Extensible Storage Engine (previously known as Jet Blue), the embedded ISAM table engine included in Windows 2000 and later. It's used for Active Directory, Exchange, and other internal components, optimized for a small number of large tables.
It has a C interface and supports binary data types natively. It supports indexes, transactions and uses a log to ensure atomicity and durability. There is no query language; you have to work with the tables and indexes directly yourself.
ESE doesn't like to open files over a network and doesn't support sharing a database through file sharing. You're going to be hard-pressed to find any database engine that supports sharing through file sharing. The Access Jet database engine (AKA Jet Red, a totally separate code base) is the only one I know of, and it's notorious for corrupting files over the network, especially if they're large (>100 MB).
Whatever engine you use, you'll most likely have to implement the shared usage functions yourself in your own network server process or use a discrete database engine.
For anyone finding this page a few years later, I'm now using LevelDB with some scaffolding on top to add the multiple indexing necessary. In particular, it's a nice fit for embedded databases on iOS. I ended up writing a book about it! (Getting Started with LevelDB, from Packt in late 2013).
One option could be Firebird. It offers both a server-based product and an embedded product.
It is also open source and there are a large number of providers for all types of languages.
I believe what you are looking for is BerkeleyDB:
http://www.oracle.com/technology/products/berkeley-db/db/index.html
Never mind that it's Oracle, the license is free, and it's open-source -- the only catch is that if you redistribute your software that uses BerkeleyDB, you must make your source available as well -- or buy a license.
It does not provide SQL support, but rather direct lookups (via a B-tree or hash-table structure, whichever makes more sense for your needs). It's extremely reliable, fast, ACID-compliant, has built-in replication support, and so on.
Here is a small quote from the page I referred to above, which lists a few features:
Data Storage
Berkeley DB stores data quickly and easily without the overhead found in other databases. Berkeley DB is a C library that runs in the same process as your application, avoiding the interprocess communication delays of using a remote database server. Shared caches keep the most active data in memory, avoiding costly disk access.
Local, in-process data storage
Schema-neutral, application native data format
Indexed and sequential retrieval (Btree, Queue, Recno, Hash)
Multiple processes per application and multiple threads per process
Fine grained and configurable locking for highly concurrent systems
Multi-version concurrency control (MVCC)
Support for secondary indexes
In-memory, on disk or both
Online Btree compaction
Online Btree disk space reclamation
Online abandoned lock removal
On disk data encryption (AES)
Records up to 4GB and tables up to 256TB
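To make the "direct API" point concrete, here is roughly what those lookups feel like from C; a sketch only (the file name is an example; link with -ldb):

/* Berkeley DB: open a B-tree file, store and fetch a binary value. Keys
 * and values are passed as (pointer, length) pairs, so an IP address can
 * be stored raw, with no string conversion. Error handling omitted. */
#include <db.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    DB *dbp;
    if (db_create(&dbp, NULL, 0) != 0) return 1;
    if (dbp->open(dbp, NULL, "data.db", NULL, DB_BTREE, DB_CREATE, 0664) != 0)
        return 1;

    unsigned int ip = 0x7f000001;          /* 127.0.0.1 stored as raw bits */
    DBT key, val;
    memset(&key, 0, sizeof key);
    memset(&val, 0, sizeof val);
    key.data = "host:a"; key.size = 6;
    val.data = &ip;      val.size = sizeof ip;
    dbp->put(dbp, NULL, &key, &val, 0);

    DBT out;
    memset(&out, 0, sizeof out);
    if (dbp->get(dbp, NULL, &key, &out, 0) == 0)
        printf("%u\n", *(unsigned int *)out.data);

    dbp->close(dbp, 0);
    return 0;
}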
Update: Just ran across this project and thought of the question you posted:
http://tokyocabinet.sourceforge.net/index.html. It is under the LGPL, so not compatible with your restrictions, but an interesting project to check out nonetheless.
SQLite would meet those criteria, except for the eventual shared-file scenario in the future (and it could actually probably handle that too, if the network file system implements file locks correctly).
Many good solutions (such as SQLite) have been mentioned. Let me add two, since you don't require SQL:
HamsterDB: fast, simple to use, can store arbitrary binary data. No provision for shared databases.
GLib's HashTable module seems quite interesting too, and it is very common, so you won't risk going into a dead end. On the other hand, I'm not sure there is an easy way to store the table on disk; it's mostly for in-memory use.
I've tested both on multi-million-record projects.
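For the GLib option, the core usage is about this small (in-memory only, as noted; compile against glib-2.0):

/* GHashTable: string keys to string values, held in memory. Persistence
 * would have to be layered on top. */
#include <glib.h>
#include <stdio.h>

int main(void) {
    GHashTable *h = g_hash_table_new(g_str_hash, g_str_equal);

    g_hash_table_insert(h, "user:1", "alice");
    g_hash_table_insert(h, "user:2", "bob");

    printf("%s\n", (char *)g_hash_table_lookup(h, "user:1"));  /* alice */

    g_hash_table_destroy(h);
    return 0;
}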
As you are familiar with FairCom's c-tree, you are probably also familiar with Raima RDM.
It went open source a few years ago; then dbstar claimed that they had somehow acquired the copyright. This seems debatable, though: from reading the original Raima license, it does not seem possible. Of course, it is possible to stay with the original code release. It is rather rare, but I have a copy archived away.
SQLite tends to be the first option. It doesn't store data as strings, but you do have to build an SQL command to do the insertion, and that command will involve some string building (although with parameter binding the values themselves need not be converted to strings).
BerkeleyDB is a well-engineered product if you don't need a relational DB. I have no idea what Oracle charges for it, or whether you would need a license for your application.
Personally, I would reconsider why you have some of these requirements. Have you done testing to verify the requirement that you need direct insertion into the database? It seems like you could take a couple of hours to write a wrapper that converts from whatever API you want to SQL, and then see if SQLite, MySQL, ... meet your speed requirements.
There used to be a product called Btrieve, but I'm not sure if source code was included. I think it has been discontinued. The only database engine I know of with an ISAM orientation is c-tree.
