I don't have experience in database development, so I need your suggestions for choosing a database that can be used with FireMonkey.
I need to store HTML files (without media for now, though that may change); their total size is around 20 GB of uncompressed text. The main requirement is the fastest possible text search, and it must be possible to implement human-style searching (like Google). Compression would be a plus, since 20 GB is a lot to store, but if compression makes searching slow it's not required.
What kinds of databases are appropriate for my needs?
Thanks a lot for your suggestions!
Edited
Requirements:
Price: Free
Location: local or remote
Operating system support: Windows
System requirements: a database with a large footprint (hopefully in exchange for better performance)
Performance: fast text searching
Concurrent users: 20
Full text indexing and searching: human (Google-like) fast text searching is required
Manageability: doesn't matter much
I know an on-line web legal database that can search words through 100 GB of information in milliseconds. I need the same performance, and Google-like searching is required.
Delphi's database access layer is separate from FireMonkey; it's the same one used by the VCL (although FM, AFAIK, relies only on LiveBindings to access data, but that's not an issue in your case).
Today 20 GB is really not much data. Almost any database will handle it without much effort if properly configured. Which engine to choose depends on:
Price: how much are you going to spend for it?
Location: do you need a local database (same machine) or a remote one (LAN or WAN)?
Operating system support: which OS should it run on?
System requirements: do you need a database with a small footprint, or can you use one with a larger one (hopefully in exchange for better performance)?
Performance: what performance is required?
Concurrent users: how many users will connect to the database concurrently?
Full text indexing and searching: not all databases offer it out of the box
Manageability: some databases may require more management than others.
There is no "one database fits all" yet.
I'm no DBA so I can't say directly, and honestly I'm not sure that any one person could give a direct answer to this question, as it's one of those "it just depends" scenarios.
http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
That's a good starting point to compare features and platform compatibility. I think the major thing to consider here is what hardware will be running it and how can you best utilize that to accomplish the task at hand.
If you have a server farm, be sure your DB supports distribution and some sort of load balancing (most do to some degree, from what I understand).
To speed up searching, I think you're going to want to keep the data uncompressed, unless you code up a custom algorithm that searches the compressed version somehow. That said, searching compressed data might actually be faster if you keep an index alongside it: compare your plain-text search terms against the index, and only if keys are matched there do you look for them within the compressed data. Without tons of custom code I haven't heard of any DB that supports this idea of searching compressed text (though I could easily be wrong on this point).
If the entire data set needs to be decompressed before doing the search it will very likely be much slower (memory is relatively cheap compared to CPU time). It looks like Firemonkey has a limited selection of DBs to use so that will help to narrow your choices down as well.
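The index-first idea can be sketched roughly as follows. This is only an illustration in Python using the standard library (zlib, re), not something taken from any database product; the class name CompressedStore and its methods are made up for the example. The word index stays uncompressed, and only documents that match in the index are ever decompressed.

```python
import re
import zlib
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")

class CompressedStore:
    """Hypothetical store: compressed documents plus an uncompressed word index."""

    def __init__(self):
        self.blobs = {}                 # doc_id -> zlib-compressed bytes
        self.index = defaultdict(set)   # word -> set of doc_ids containing it

    def add(self, doc_id, text):
        self.blobs[doc_id] = zlib.compress(text.encode("utf-8"))
        for word in WORD.findall(text.lower()):
            self.index[word].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents that contain every query word."""
        words = WORD.findall(query.lower())
        if not words:
            return []
        # Only the index is consulted here; nothing is decompressed.
        hits = set.intersection(*(self.index.get(w, set()) for w in words))
        return sorted(hits)

    def fetch(self, doc_id):
        """Decompress a single document on demand."""
        return zlib.decompress(self.blobs[doc_id]).decode("utf-8")

store = CompressedStore()
store.add(1, "Contract law and tenancy agreements")
store.add(2, "Tax law for small companies")
store.add(3, "Gardening tips")

matches = store.search("law tax")
print(matches, [store.fetch(i) for i in matches])
```

A real full text engine does much more (ranking, stemming, an on-disk index), but the division of work is the same: search the index, decompress only the hits.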
What I would suggest, based on your edited question, is to write (or find) a parser or regular expression to extract all the important elements from the HTML that you would like to be searchable. Then store those in a database along with a reference to where they were found in the HTML. As for Google-like searching, if you mean how it can correct misspellings and use synonyms, you probably need some sort of custom code to do dictionary lookups for spelling and thesaurus lookups for synonyms. I believe full text searching in any modern DB will cover what you would otherwise query with LIKE or similar statements in the WHERE clause.
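A minimal sketch of that extract-then-store step, assuming Python with its standard html.parser and sqlite3 modules as stand-ins for whatever parser and database you end up with. The table name page_text, the file names and the sample HTML are invented for the example, and the LIKE query is only a placeholder for a real full text index.

```python
import sqlite3
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

db = sqlite3.connect(":memory:")
# Each row keeps a reference back to the HTML file the text came from.
db.execute("CREATE TABLE page_text (source_file TEXT, body TEXT)")

pages = {
    "a.html": "<html><style>p{}</style><body><h1>Lease</h1><p>Tenant duties</p></body></html>",
    "b.html": "<html><body><p>Company tax rules</p><script>var x;</script></body></html>",
}
for name, html in pages.items():
    db.execute("INSERT INTO page_text VALUES (?, ?)", (name, extract_text(html)))

rows = db.execute(
    "SELECT source_file FROM page_text WHERE body LIKE ?", ("%tax%",)
).fetchall()
print(rows)
```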
Looks like ldsandon's answer covers most of this anyhow. TLDR; if not thanks for reading.
I would recommend PostgreSQL for this task. It has good performance, and built in full text search capability for Google-like searching. And it's free and open source.
Unfortunately Delphi doesn't come with Postgres data access components out of the box. You can connect by ODBC, or you can purchase components available from, for example, Devart, DA-Soft or microOLAP.
Have you considered NoSQL databases? The Wikipedia article explains how they differ from SQL databases and also mentions that they are well suited to use as a document store.
http://en.wikipedia.org/wiki/NoSQL
The article lists around twelve implementations in the document store category, many are open source. (Jackrabbit, CouchDB, MongoDB).
This question on Stackoverflow contains some pointers to Delphi clients:
Delphi and NoSQL
I would also consider caching on the application server, to speed up search. And of course a text indexing solution like Apache Lucene.
I would take Microsoft SQL Server Express Edition. I think 2008 R2 is the latest stable version, but there is also Denali (2011). It matches all the criteria you have.
You can use ADO to work with it.
Try the Advantage Database Server.
It's easy to manage and configure.
Both dbase-like and SQL data management languages.
Fast indexed full text search capabilities.
Plus, unparalleled support from the developers themselves.
The local server (stand-alone version, as opposed to the network based server) is free.
devzone.advantagedatabase.com
There is a Firebird version with full text search according to its documentation - http://www.red-soft.biz/en/document_21 - it uses Apache Lucene, a popular search engine
I need a suggestion regarding the maximum size for a highly volatile Lotus Notes database, i.e. an application based on a database of 8+ GB accessed by 20 users on average, who insert attachments and run scripts.
Thanks!
There are limits to the size of a Notes Database (sorry Ken). See the Notes Help "Table of Notes and Domino known limits" and Technote #1308379.
The most important ones are:
Database size: The maximum OS file size limit -- (up to 64GB)
Fields in a database: ~ 3000 (limited to ~ 64K total length for all field names). You can enable the database property "Allow more fields in database" to get up to 22,893 uniquely-named fields in the database.
Views in a database: No limit; however, as the number of views increases, the length of time to display other views also increases
Documents in a view: Up to the maximum size of the database
Usually the "limiting" factors for an application are view rebuild and full text index times, as Ken suggested.
You may want to check out Andre Guirard's postings on the topic of performance, as well as his white paper "Performance basics for IBM Lotus Notes developers" and the Domino Wiki.
I'm not sure if this answers your question, but there's theoretically no limit to the size of a Notes database. I remember hearing years ago at Lotusphere that they had tested a database at 64 GB and it worked.
That said, there will likely be some issues with view indexes growing large, and long waits for refreshing views.
In the link below you can find the limits that apply to Lotus Notes databases, as published by IBM and also stated by leyrer.
Limits of Lotus Notes
However, in our company we work heavily with Lotus Notes databases, and our databases are growing very fast, mostly due to attachments, which can be documents, spreadsheets, images, etc. The solution we implemented to avoid reaching the size limits was to keep the application in its main database and attach separate databases to it for the attachments. This way the attachments are not stored in the main application but in the attached dbs. You can create as many as you need. For example, you can have the following:
MyApp.nsf - Main application
MyAppAttach1.nsf - Attached
MyAppAttach2.nsf - Attached
In this way we can also control how much the dbs will grow. I hope that this can help you in your implementation.
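The placement logic behind this can be sketched as below. This is only an illustration in Python and does not touch the Notes API; the function name, the size cap and the sample sizes are assumptions made for the example, and only the MyAppAttachN.nsf naming comes from the list above.

```python
def assign_attachment_dbs(attachment_sizes, cap_bytes, prefix="MyAppAttach"):
    """Return one database name per attachment, filling each attachment
    database up to cap_bytes before moving on to the next one."""
    placements = []
    db_number = 1
    used = 0
    for size in attachment_sizes:
        # Start a new attachment database when this one would overflow.
        # An attachment larger than the cap still gets a database of its own.
        if used and used + size > cap_bytes:
            db_number += 1
            used = 0
        used += size
        placements.append(f"{prefix}{db_number}.nsf")
    return placements

print(assign_attachment_dbs([4, 4, 4, 10, 1], cap_bytes=10))
```

The main application would then store only a reference (database name plus document id) to each attachment.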
For large databases, it may be important to think about a strategy for archiving documents once they are no longer being actively processed. You don't mention how many documents are created/edited/deleted every day, or how large the average document is; but if it is 8GB now, how large will it be next month, or next year? Depending on the factors martin listed in his answer, this could become a concern long before you reach the 64 GB supported limit, and it is better to be prepared in advance.
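To put a rough number on that, here is a back-of-the-envelope sketch in Python. The monthly growth figures are placeholders, since the question does not give any; only the 8 GB starting size and the 64 GB limit come from this thread, and linear growth is assumed.

```python
import math

def months_until_limit(current_gb, monthly_growth_gb, limit_gb=64.0):
    """Whole months until the database reaches limit_gb, assuming linear growth.
    Returns 0 if already at the limit and None if the database is not growing."""
    if current_gb >= limit_gb:
        return 0
    if monthly_growth_gb <= 0:
        return None
    return math.ceil((limit_gb - current_gb) / monthly_growth_gb)

# Hypothetical growth rates for an 8 GB database
for growth in (0.5, 2.0, 8.0):
    print(growth, "GB/month ->", months_until_limit(8.0, growth), "months")
```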
My first suggestion is to create an archive database and reduce the size of the main database. As a next step, you should write a process that archives documents weekly or monthly, according to your needs.
I am creating a desktop app in Delphi and plan to use an embedded database. I've started the project using SQLite3 with the DISQLite3 library. It works, but the documentation seems a bit light. I recently found Firebird (yes, I've been out of Windows for a while) and it seems to have some compelling features and support.
What are some pros and cons of each embedded db? Size is important as well as support and resources. What have you used and why?
I'm using Firebird 2.1 Embedded and I'm quite happy with it. I like the fact that the database size is practically unlimited (tested with > 4 GB databases and it works) and that the database file is compatible with the Firebird Server, so I can use standard tools for database management and inspection. Distribution consists of dropping a few files in your exe folder.
Simultaneous access from multiple programs is not supported but simultaneous access from multiple threads is (as long as you ensure that only one 'connect' operation is in progress at any given moment).
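One way to enforce the "one connect at a time" rule is a process-wide lock around the connect call. The sketch below is in Python and uses a fake connect function rather than a real Firebird driver; guarded_connect, fake_connect and the file name app.fdb are all made up for the illustration.

```python
import threading
import time

_connect_lock = threading.Lock()

def guarded_connect(connect, *args, **kwargs):
    """Run connect(...) while holding a global lock so connects never overlap."""
    with _connect_lock:
        return connect(*args, **kwargs)

# Stand-in for a real driver's connect call; it records how many
# connect operations are in progress at the same moment.
stats = {"active": 0, "max": 0}
_stats_lock = threading.Lock()

def fake_connect(name):
    with _stats_lock:
        stats["active"] += 1
        stats["max"] = max(stats["max"], stats["active"])
    time.sleep(0.01)  # pretend that connecting takes a while
    with _stats_lock:
        stats["active"] -= 1
    return "connection to " + name

results = []

def worker():
    # Every thread gets its own connection, but connects are serialized.
    results.append(guarded_connect(fake_connect, "app.fdb"))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), "connections, max simultaneous connects:", stats["max"])
```

Once connected, the threads can work in parallel; only the connect step is serialized.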
I have used SQLite3 for a lot of projects (but from C/C++ and Objective-C). It's extremely small, with no dependencies whatsoever, and the database is in a single file.
It's the db of choice for Mac developers because it's directly supported by CoreData and on the iPhone -- so there is a big user base (not to mention all of the other users).
I've been using SQLite (via DISQLite3) in FeedDemon for several months, and I highly recommend it - it has been extremely fast and stable. As Javier said, the docs for the library may be thin, but the docs for SQLite itself are very good.
I've used DBISAM on a number of projects. It is completely embedded, without even a need for an external DLL. Unlike the others you listed, it is commercial. It has a lot of great features though, and is very well documented and supported. They have a successor to it that I haven't tried yet.
Let's see, quick comparison:
SQLite:
dynamic typing in the database
cross-platform files
runs on Windows, Linux, Mac, etc.
public domain
supports transactions
relies on file system security, does not include own security
Firebird embedded:
strong typing in the database
not all SQL datatypes are supported
cross-platform files
Firebird embedded only runs on Windows
Files from Firebird embedded are in the same format as the full server version
Files from Firebird embedded can be copied to a non-Windows server for use
available under a modified MPL ("what's ours is ours and must remain free, what's yours is yours and you don't have to release it")
supports transactions, triggers, etc.
MySQL embedded:
support for SQL features depends on file format
(IIRC) cross-platform files
GPL unless you pay royalties
runs on Windows, Linux, Mac
incredibly popular with the open source crowd
Even embedded databases have their strengths and weaknesses. You'll need to weigh those strengths and weaknesses against what you're doing to decide.
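The "dynamic typing" point for SQLite is easy to see in a few lines. The sketch below uses Python's built-in sqlite3 module purely as a convenient way to talk to SQLite; the table and column names are arbitrary. A column declared INTEGER still accepts text: values that look like integers are converted, anything else is stored as text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (n INTEGER)")

# An integer, a numeric-looking string and a plain string all go
# into the same INTEGER column without an error.
db.executemany("INSERT INTO t VALUES (?)", [(42,), ("123",), ("abc",)])

rows = db.execute("SELECT n, typeof(n) FROM t ORDER BY rowid").fetchall()
for value, storage_type in rows:
    print(repr(value), storage_type)
```

With a strongly typed engine such as Firebird, the third insert would be rejected instead.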
Firebird embedded is our #1 choice because, with no code changes, a single-user Delphi app with an embedded database can be migrated to a multi-user server-based deployment without sacrificing any of the high-end features (such as stored procedures, triggers, views, etc.). And it's a TRUE free database and doesn't GPL your code in the process.
I strongly recommend using AnyDAC when working with databases in Delphi; then you can target FB or SQLite seamlessly.
My preference would be for FB for embedded apps.
Tom
I use Sybase's Advantage Database Server, but I'm also the R&D Manager, so this post is biased. :)
We have native Delphi TTable and TQuery components for both WIN32 VCL and VCL.NET. Direct table access in addition to SQL support makes Advantage unique among many of the other Delphi offerings. Advantage supports large tables (only limited by the number of records, 2 billion) and has a free local engine, which is nice for development PCs and for small customer sites that don't require client/server functionality. Switch to client/server with a single connection property, no other changes.
We have a ton of clients so accessing the data outside of Delphi is also very easy (.NET data provider, ODBC, OLE DB, PHP, Perl, JDBC, etc).
Main Product Web Site: http://www.advantagedatabase.com
Developer's Web Site: http://devzone.advantagedatabase.com
It really depends on what you need. For single-user applications, Firebird Embedded or SQLite are probably the best choices (and the price is right). At the other end, if you need support for a large number of users, you should probably use regular Firebird instead of the Embedded version (the server is simple to install, so you won't have many problems here).
And if you need something in between, for a moderate multi-user application, one of the flat databases would be better. I found ComponentAce's Absolute Database a better choice for my needs than DBISAM, NexusDB or VistaDB.
It leaves a relatively small footprint (no DLLs), it's a single-file db (a must for me), it supports Unicode, BLOB compression and encryption, and its technical limits seem impressive for a flat database. Moreover, support was good on the few occasions when I needed it.
As for cons, I have noticed it doesn't support nested transactions, but other than that I had no problems.
As for size, nothing beats SQLite.
When you refer to the lack of documentation, I guess you mean the docs for DISQLite3. The SQLite docs are quite complete.
Take a look at NexusDB. I have used it very successfully in the past.
The problem with (embedded) Firebird is that the database cannot reside on a network drive. Also, it is difficult to have a database on a read-only drive (CD/DVD).
For some hacks around these limitations see the Delphi Wiki:
http://delphi.wikia.com/wiki/Firebird_tipps
NexusDB offers the full range from embedded, to full client/server / remote. Also SQL2003 compliant, I believe. I'm using it on a few projects, and am very pleased so far, and the fact that it can work in such a wide range of "scales" is a big plus (not having to learn another DB for scaled-up apps, etc).
Look at this embedded database comparison: http://sql-db.cz.cc/. It may be helpful. Most of the above-mentioned products are presented there (Advantage, DBISAM, Firebird, MS SQL Server) and many more: Accuracer, Apollo, ElevateDB, NexusDB, TurboDB.
I am partial to Component Ace's Absolute DB. Although a commercial product ($), it is solid, easy to use, small footprint and well documented. If you are looking for a huge multi-user application, this is not the way to go, but if your multi-user needs are light (or non-existent) this is a solid option.
I'm using SQL Server Express and the ADO components. Works great. You can run the SQL Server Express install with commandline to hide the complexities from the users. You can also distribute a database that you load by filename. There are millions of SQL server users so solutions to any problems are easily found in the intertubes :-)
I did a websearch to find a fast database package for my Delphi Application. I wanted it to be completely contained in the executable with no external DLLs or libraries required. I originally found Accuracer by AidAim. They had posted how fast their database was and even gave comparisons with other similar packages to “prove” their point.
I wanted to believe their claims but I thought I’d search the web a bit more to find timings of other packages. I was very surprised to find a post at the Delphi discussion forums where a person asked what database to use, and there were 14 different suggestions. One of the responders had done his own timing comparisons and had found Accuracer to be quite slow compared to several others, which Accuracer had (conveniently) left out of their own comparison page.
The post, plus additional followup web research by me, led me to lean toward DISQLite3, a product based on the Open Source SQLite program, but with enhancements to work in Delphi very quickly, with very small overhead, and with command-based calls - which I like. It is actively under development and will soon have an official Delphi 2009 version, although apparently the current version will work under D2009.
Addendum: DISQLite3 Version 2.0.0, released Nov 17, supports D2009.
I know MS Access is a comparatively crap db (and I expect to be shot down in flames here), but if only a small amount of data is needed it may have advantages if MS Office is used anyway. For me it was a way to store program data with more flexibility than CSV files, which are a common approach for scientific code.
You can create an Access db from Delphi code without having MS Office installed, using ADO and the ODBC driver. It might be necessary to have an initial .accdb file without tables to copy and then populate; I can't remember this detail. I'm also not sure about the licensing situation when doing this.
The .accdb extension can be changed to something else and the file password protected (to a limited degree), so it's not immediately obvious to users that it's Access, if that's desired.
I know a few commercial developers use this method, and I copied it myself. I found it easier to set up than SQLite, but maybe that's because I'd already used ADO and Access in the past.
I have used ScimoreDB. They give it away royalty free, but it has its quirks in data types and some installation issues. This was on a C# project.
If embedded is an absolute must, look at DBISAM.
kbMemTable is a good candidate. Runs in memory, fast, multi-threaded. Used to be free.
Components4Developers
I have used DBISAM and kbMemTable on different occasions.
What I like about DBISAM is that it has great features, and is usually very reliable. I have used it in large databases, full-text search, read-only mode, CGIs and many other situations.
It is fairly large compared to kbMemTable or SQLite based components, though. And you can't have a single file per database (or even table) - depending on the situation, that is a major disadvantage.
kbMemTable is tiny and it's great for small amounts of data. Since it runs in memory, it has to be a small amount of data, of course.
One other option I've taken on a couple of my desktop apps is dumping the data directly from/to my object hierarchy using TWriter/TReader. This is by far the smallest option, and is absurdly fast compared to using a database. The data files are tiny, too.
It has all kinds of drawbacks, though - you have to code versioning in if you might want to ever add/change fields, unless it's in-memory it is even more complicated, no multi-user support at all, etc.
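The versioning drawback is usually handled by writing a version number at the start of the file and branching on it when reading. The sketch below shows the idea in Python with the struct module rather than Delphi's TWriter/TReader; the file layout, the magic bytes and the record fields (id, name, score) are invented for the example, with "score" standing in for a field added in version 2.

```python
import struct

MAGIC = b"APPD"          # hypothetical file signature
CURRENT_VERSION = 2      # version 2 added the "score" field

def save(records):
    """Serialize a list of (id, name, score) tuples into bytes."""
    out = [MAGIC, struct.pack("<HI", CURRENT_VERSION, len(records))]
    for rec_id, name, score in records:
        raw = name.encode("utf-8")
        out.append(struct.pack("<iH", rec_id, len(raw)))
        out.append(raw)
        out.append(struct.pack("<d", score))
    return b"".join(out)

def load(data):
    """Read records written by this or an older version of save()."""
    if data[:4] != MAGIC:
        raise ValueError("not an application data file")
    version, count = struct.unpack_from("<HI", data, 4)
    if version > CURRENT_VERSION:
        raise ValueError("file was written by a newer version")
    pos = 4 + struct.calcsize("<HI")
    records = []
    for _ in range(count):
        rec_id, name_len = struct.unpack_from("<iH", data, pos)
        pos += struct.calcsize("<iH")
        name = data[pos:pos + name_len].decode("utf-8")
        pos += name_len
        if version >= 2:
            (score,) = struct.unpack_from("<d", data, pos)
            pos += struct.calcsize("<d")
        else:
            score = 0.0   # default for files that predate the field
        records.append((rec_id, name, score))
    return records

blob = save([(1, "alpha", 2.5), (2, "beta", 0.0)])
print(len(blob), load(blob))
```

Every later field change needs another branch like the "version >= 2" one, which is exactly the maintenance cost described above.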
Firebird embedded is our #1 choice as well. And the suite Unified Interbase v2.0 with it. A great and stable solution!
I have a database in which I have to record 5 fields of data every 20 seconds for 10 days. 3 fields are integers, 1 field is a double (time) and 1 field is a string[5].
I am still using Delphi 6 srv2 because of my components. Newer Delphi versions handle my components so badly that I would have to spend thousands of dollars to rebuild my component library. Therefore Delphi 6 is still the best for real commercial applications in my view; newer versions of Delphi give many problems at many points, such as USB or COM port reading and so on. They release newer versions before the previous ones have settled on the market.
I have written test code in Delphi 6 that appends 43200 records to a table, because the table will hold 43200 records when I deploy it in the application. I will show all the data on a DBChart.
The test results are below: the time each database took to fill the table with 43200 records using INSERT commands.
DBISAM = 34 sec
ElevateDB = 11 sec
AbsoluteDB = 45 sec
SQLite = 32 min
Firebird = 12 min
MSSQL12 LocalDB = 28 min
Easy Table = 8 min
BDE = blocked
I haven't tested Oracle, Blackfish, Sybase, NexusDB etc., but it seems they would also be very slow. I connected each table to a DBChart, and only ElevateDB and AbsoluteDB loaded the 43200 records into the DBChart in an acceptable time, around 7~10 seconds. All the others took minutes. So slower databases always need coding tricks to succeed in some real jobs.
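For reference, the 43200 figure is simply 10 days of 20-second samples, and one common "coding trick" for bulk inserts is to wrap them all in a single transaction. The test code is not shown here, so whether per-row commits explain the slow SQLite and server timings is only an assumption. The sketch below uses Python's sqlite3 module with an in-memory database; the table and column names are made up.

```python
import sqlite3

SAMPLE_INTERVAL_S = 20
DAYS = 10
record_count = DAYS * 24 * 3600 // SAMPLE_INTERVAL_S   # 864000 s / 20 s

db = sqlite3.connect(":memory:")
# 3 integer fields, 1 double (time) and 1 short string, as in the post
db.execute(
    "CREATE TABLE samples (a INTEGER, b INTEGER, c INTEGER, t REAL, tag TEXT)"
)

rows = (
    (i, i * 2, i % 7, i * float(SAMPLE_INTERVAL_S), "S%04d" % (i % 10000))
    for i in range(record_count)
)

# "with db" runs all inserts in one transaction and commits once at the end,
# instead of committing after every single row.
with db:
    db.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?)", rows)

stored = db.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(record_count, stored)
```

In Delphi the equivalent would be starting a transaction before the insert loop and committing after it, using whatever transaction calls your data access components provide.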
I have also tested their search speed with the Locate command; unfortunately the server-based databases are always slower at it.
MSSQL and SQLite3 are extremely difficult to manage from Delphi; they made me very tired.
These are my test results.
In the end I decided to keep AbsoluteDB, DBISAM and ElevateDB, and removed the rest from the PC.
ElevateDB doesn't support the RecNo function, so extra code is required at runtime to manage that, which makes the database slower. The other problem with ElevateDB is autoinc fields: there is no way to reset them. Therefore I have not chosen ElevateDB, even though it is the fastest database. They advertise many good functions, but how many of them do we actually use? They left the most important functions unsupported while fixing many, many unnecessary ones, and it seems there has been no progress in 8 years either.
If you want to see with your own eyes, please just try it and see.
I am now deciding between the two: AbsoluteDB or DBISAM 4.
Firebird all the way. Does pretty well everything and so far version 2.1 is very solid.
Firebird offers the opportunity to scale up to multiple users sometime down the line, or if you need concurrency (if your application goes multi-threaded).
SQLite is quite unrivaled if you only need single-user access, no other database comes close to it on any aspect, be it performance, convenience, SQL support or stability.
Firebird is really awesome and has a small footprint, so you can use it embedded,
and it can be scaled upward for many users,
and it does Unicode fairly well.
I use Devart components with Delphi 2009
and FIBPlus for Delphi 6/7 (their version for 2009 and Unicode is not ready yet, too bad).
Hmmm, no one has recommended the BDE - I wonder why that is ;-)
BlackFishSQL is another possibility, although I haven't tested in depth as yet.
When it comes to embedded databases, the first question is: is it multi-user?
Actually, who needs a database that does not allow multiple connections (read & write) to it?
I have tried (intensively) all the mentioned databases and found only one that actually functions the way it should. And that is Accuracer.
The only pity with Accuracer is that it's a three-man band with a chronic lack of proper support. Development is also mostly static; we have seen no real new features in years. Not surprising, since only one person actually develops it. It seems they are living on old fame. User praise reflects that (the comments are usually 10 years old).
For a single-user experience I would recommend Absolute Database.
As for the major players, I would recommend SQL Server from Microsoft. Oracle has become bloatware and is slowly dying out.
PS
What is nice in Accuracer is that its embedded database functions just like a full-blown server. It locks only the current record if it's in use, while the rest functions normally. Nice database. The only pity is that it is stagnant.