I need some help choosing a database for my application.
My web application will basically consist of a main table; let's call it the "User" table.
It will have the user info like name, ID, password, address, phone, etc.
There will be 5 other related tables where I will save each user's info,
e.g. a table for books read, a table for songs heard, food eaten, etc.
Overall I don't expect my data to go beyond 1,000 users.
So I have tiny data requirements.
Normally I would have gone with MySQL, but I am feeling a bit adventurous.
I want to try out some of the new solutions on the block.
My requirements are:
1. pure performance
2. good documentation, ease of use
Since my DB shouldn't be more than a few hundred megs in size, I'd rather keep the entire tablespace in memory for faster performance. How about some of the new NoSQL DBs?
Any recommendations? I have worked mainly with Oracle and MySQL and don't have much idea about all the new exciting stuff out there.
I would suggest going with SQLite if your database requirements are small.
From the SQLite website:
SQLite is a compact library. With all features enabled, the library
size can be less than 350KiB, depending on the target platform and
compiler optimization settings. (64-bit code is larger. And some
compiler optimizations such as aggressive function inlining and loop
unrolling can cause the object code to be much larger.) If optional
features are omitted, the size of the SQLite library can be reduced
below 200KiB. SQLite can also be made to run in minimal stack space
(4KiB) and very little heap (100KiB), making SQLite a popular database
engine choice on memory constrained gadgets such as cellphones, PDAs,
and MP3 players. There is a tradeoff between memory usage and speed.
SQLite generally runs faster the more memory you give it.
Nevertheless, performance is usually quite good even in low-memory
environments.
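To give a feel for the in-memory use the question mentions, here is a minimal sketch using Python's standard sqlite3 module; the table layout is only an illustration, not anything SQLite prescribes:

    import sqlite3

    conn = sqlite3.connect(":memory:")          # whole tablespace lives in RAM
    conn.execute("""CREATE TABLE user (
                        id INTEGER PRIMARY KEY,
                        name TEXT, phone TEXT, address TEXT)""")
    conn.execute("INSERT INTO user (name, phone, address) VALUES (?, ?, ?)",
                 ("Alice", "555-0100", "12 Main St"))
    for row in conn.execute("SELECT id, name FROM user"):
        print(row)
    conn.close()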
Object-oriented DBs such as db4o or Versant could also be used.
Neo4j (for Java) is a pretty awesome tool. It's technically a graph database, but from the sound of your data model, I think it would be well suited to you. From what I've seen it performs very well, its documentation is incredibly good, and if you are using Java then it feels like second nature. You basically point it at a directory and it sets up shop there.
If you are feeling adventurous and happen to be using Java, I suggest you give it a try.
I think redis is exactly what you want!
Yesterday I downloaded and installed it for the first time. It runs completely in memory, which meets your performance requirement. (It only writes the data to disk so it can recover from things like a power failure, much like a backup, but this does not slow down writes.)
For Linux and the like there is a tar.gz on the download page.
For Windows you can download Dusan's native port: http://redis.io/download - it is precompiled and also includes the client console to try out.
The documentation is very good. For example, this is the page for the data types: http://redis.io/topics/data-types and you can also find all the other relevant information there as a quick-to-browse reference.
And there is a nice online tutorial to get started quickly: http://try.redis-db.com/ which is actually fun to work through.
I like the atomic operations like "increment by" and the list structures with push and pop.
There is also a hash type.
For Python there is redis-py: https://github.com/andymccurdy/redis-py
Being a Python coder myself, I think the data structures that Redis offers map very well onto Python's data types.
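As a small sketch of those features through redis-py (it assumes a Redis server running on localhost:6379, and the key names are just made up for the example):

    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    r.set("user:1:name", "Alice")                       # plain string key
    r.incrby("user:1:books_read", 1)                    # atomic "increment by"
    r.rpush("user:1:songs", "song-a", "song-b")         # list structure: push...
    print(r.lpop("user:1:songs"))                       # ...and pop
    r.hset("user:1", mapping={"name": "Alice", "phone": "555-0100"})  # hash type
    print(r.hgetall("user:1"))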
I don't have experience in database development, so I need your suggestions on choosing a database that can be used with FireMonkey.
I need to store HTML files (without media for now, but they may include it later); their total size is around 20 GB of uncompressed text. The main requirement is maximally fast text searching in the database, and it must be possible to implement human-style searching (like Google). Compression would be nice (20 GB is a lot to store), but it is not required if it makes searching slow.
What kinds of databases are appropriate for this?
Thanks a lot for your suggestions!
Edit
Requirements:
Price: free
Location: local or remote
Operating system support: Windows
System requirements: a database with a large footprint is fine (hopefully in exchange for better performance)
Performance: fast text searching
Concurrent users: 20
Full-text indexing and searching: human (Google-like) fast text searching is required
Manageability: doesn't matter much
I know of an online legal database that can search for words across 100 GB of information in milliseconds. I need the same kind of performance, and Google-like searching is required.
The Delphi database access layer is separate from FireMonkey; it's the same one used by the VCL (although FM, AFAIK, relies only on LiveBindings to access data, but that's not an issue in your case).
Today 20 GB is really not much data. Almost any database will handle it without much effort if properly configured. Which engine to choose depends on:
Price: how much are you going to spend for it?
Location: do you need a local database (same machine) or a remote one (LAN or WAN)?
Operating system support: which OS should it run on?
System requirements: do you need a database with a small footprint, or can you use one with a larger footprint (hopefully in exchange for better performance)?
Performance: what level of performance do you require?
Concurrent users: how many users will connect to the database concurrently?
Full text indexing and searching: not all databases offer it out of the box
Manageability: some databases may require more management than others.
There is no "one database fits all" yet.
I'm no DBA so I can't say directly, and honestly I'm not sure that any one person could give a direct answer to this question, as it's one of those "it just depends" scenarios.
http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
That's a good starting point to compare features and platform compatibility. I think the major thing to consider here is what hardware will be running it and how can you best utilize that to accomplish the task at hand.
If you have a server farm, be sure your DB supports distribution and some sort of load balancing (most do to some degree, from what I understand).
To speed up searching, unless you code up a custom algorithm that somehow searches the compressed version, I think you're going to want to keep the data uncompressed. Searching compressed data could in principle be faster: if you can compare your plain-text search terms against an index built over the compressed file, you only look for the matching keys in the index, and only when a key is found do you touch the compressed data itself. But without tons of custom code I haven't heard of any DB that supports this idea of searching compressed text (though I could easily be wrong on this point).
If the entire data set needs to be decompressed before doing the search, it will very likely be much slower (memory is relatively cheap compared to CPU time). It also looks like FireMonkey has a limited selection of DBs to use, so that will help narrow your choices down.
What I would suggest, based on your edited question, is to write (or find) a parser or regular expression to extract all the important elements from the HTML that you would like to be searchable, then store those in a database along with a reference to where they were found in the HTML. In terms of Google-like searching, if you mean how it can correct misspellings and use synonyms, you probably need some custom code to do dictionary lookups for spelling and thesaurus lookups for synonyms. I believe full-text searching in any modern DB will handle queries with LIKE or similar statements in the WHERE clause.
Looks like ldsandon's answer covers most of this anyhow. TL;DR; if not, thanks for reading.
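For what it's worth, here is a rough Python sketch of that extract-and-store idea (the folder, table and search term are made up; in a Delphi app the same SQL would run through whatever data access layer you pick):

    import sqlite3
    from html.parser import HTMLParser
    from pathlib import Path

    class TextExtractor(HTMLParser):
        """Collects the plain text between the HTML tags."""
        def __init__(self):
            super().__init__()
            self.chunks = []
        def handle_data(self, data):
            self.chunks.append(data)

    conn = sqlite3.connect("pages.db")
    conn.execute("CREATE TABLE IF NOT EXISTS pages (path TEXT, body TEXT)")

    for path in Path("html_dump").glob("*.html"):
        extractor = TextExtractor()
        extractor.feed(path.read_text(encoding="utf-8", errors="ignore"))
        # keep a reference to where the text came from
        conn.execute("INSERT INTO pages VALUES (?, ?)",
                     (str(path), " ".join(extractor.chunks)))
    conn.commit()

    # Simple LIKE-based lookup; a real full-text index would replace this.
    for (path,) in conn.execute("SELECT path FROM pages WHERE body LIKE ?",
                                ("%contract law%",)):
        print(path)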
I would recommend PostgreSQL for this task. It has good performance, and built in full text search capability for Google-like searching. And it's free and open source.
Unfortunately Delphi doesn't come with Postgres data access components out of the box. You can connect by ODBC, or you can purchase components available from, for example, Devart, DA-Soft or microOLAP.
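To illustrate the kind of query PostgreSQL's full-text search gives you, here is a hedged Python/psycopg2 sketch (the table and column names are assumptions; from Delphi the same SQL would go over ODBC or one of the commercial component sets):

    import psycopg2

    conn = psycopg2.connect("dbname=docs user=postgres password=secret")
    cur = conn.cursor()
    cur.execute("""
        SELECT path,
               ts_rank(to_tsvector('english', body),
                       plainto_tsquery('english', %s)) AS rank
        FROM pages
        WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT 10
    """, ("contract law", "contract law"))
    for path, rank in cur.fetchall():
        print(path, rank)
    conn.close()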
Have you considered NoSQL databases? The Wikipedia article explains how they differ from SQL databases and also mentions that some of them are suited to use as a document store.
http://en.wikipedia.org/wiki/NoSQL
The article lists around a dozen implementations in the document store category, many of them open source (Jackrabbit, CouchDB, MongoDB).
This question on Stackoverflow contains some pointers to Delphi clients:
Delphi and NoSQL
I would also consider caching on the application server, to speed up search. And of course a text indexing solution like Apache Lucene.
I would take Microsoft SQL Server Express Edition. I think 2008 R2 is the latest stable version, but there is also Denali (2011). It matches all the criteria you have.
You can use ADO to work with it.
Try the Advantage Database Server.
It's easy to manage and configure.
Both dbase-like and SQL data management languages.
Fast indexed full text search capabilities.
Plus, unparalleled support from the developers themselves.
The local server (stand-alone version, as opposed to the network based server) is free.
devzone.advantagedatabase.com
There is a Firebird version with full-text search, according to its documentation (http://www.red-soft.biz/en/document_21); it uses Apache Lucene, a popular search engine.
I am about to write a program to keep track of my school assignments, and I was wondering what database language would be the most efficient and the simplest to implement for tracking the assignments' metadata. I am thinking about XML, but it would require several documents.
I (currently) have at least ten assignments per week for 45 weeks. The data that has to be stored includes name, issue date, due date, path, and various states of completion. Whatever language it's in would have to handle a large increase in both the number of assignments and the amount of metadata without large changes to either the format or the retrieval system.
Quite frankly, if you pick a full-fledged database you run the risk of spending more time on data entry than you do on your homework. If you really need to keep track of this, I would seriously recommend a spreadsheet.
First, I think you are confusing a relational database system with a database language. In all likelihood, you will be using a database that uses SQL. From there, you will need another programming platform to build an application around it. If you wanted, you could use a Microsoft Access database, which allows you to build a simple front end stored in the same file as the database. In this case you would be programming with VBA.
Pretty much any modern database system would be suitable for your needs; even Access can handle orders of magnitude more work than you are describing.
Some possible database systems are, again, Microsoft Access, Microsoft SQL Server Express, VistaDB, and SQLite (probably the best choice after Access for your needs), and of course there are many others.
You could build either a web front end or a desktop one; I assume you are using Windows. You could use Visual Studio C# Express for this if you wanted. Or you could go with VB.NET, VB6, or what have you.
My answer isn't directly related, but as you design your database structures you might want to take a look at some of the objects in the SIF specification, in particular the Assignment and GradingAssignment objects.
As for how to store the data, you could use an RDBMS (SQLite, MySQL) or perhaps a key-value database (ZODB, for example).
Of course, if this is just a small personal project you could simply serialize the data to something like XML, JSON or CSV and store it as a file. It might be better in the long run to use a database, though; a database will probably scale a lot more easily.
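If you do go the database route, a tiny sketch with Python's built-in sqlite3 module shows how little is needed; the field names mirror the ones in the question, the status values are my own invention:

    import sqlite3

    conn = sqlite3.connect("assignments.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS assignments (
                        id INTEGER PRIMARY KEY,
                        name TEXT, issue_date TEXT, due_date TEXT,
                        path TEXT, status TEXT)""")
    conn.execute("INSERT INTO assignments (name, issue_date, due_date, path, status) "
                 "VALUES (?, ?, ?, ?, ?)",
                 ("Essay 3", "2011-09-12", "2011-09-19", "essays/essay3.doc", "started"))
    conn.commit()

    # Everything not finished yet, soonest deadline first.
    for name, due in conn.execute("SELECT name, due_date FROM assignments "
                                  "WHERE status != 'done' ORDER BY due_date"):
        print(name, due)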
I would recommend Oracle Express (with Application Express). It will scale up to 4 GB of user data; beyond that, you would have to start paying. Application Express is very simple to build CRUD applications with, which is what it sounds like yours is.
For a project like that I would use SQLite or MySQL; either would be fast enough, and they're easy to set up.
I am creating a desktop app in Delphi and plan to use an embedded database. I've started the project using SQLite3 with the DISQLite3 library. It works, but the documentation seems a bit light. I recently found Firebird (yes, I've been out of Windows for a while) and it seems to have some compelling features and support.
What are some pros and cons of each embedded db? Size is important as well as support and resources. What have you used and why?
I'm using Firebird 2.1 Embedded and I'm quite happy with it. I like the fact that the database size is practically unlimited (tested with > 4 GB databases and it works) and that the database file is compatible with the Firebird server, so I can use standard tools for database management and inspection. Distribution consists of dropping a few files into your exe folder.
Simultaneous access from multiple programs is not supported but simultaneous access from multiple threads is (as long as you ensure that only one 'connect' operation is in progress at any given moment).
I have used SQLite3 for a lot of projects (but from C/C++ and Objective-C). It's extremely small -- no dependencies whatsoever -- and the database is in a single file.
It's the db of choice for Mac developers because it's directly supported by CoreData and on the iPhone -- so there is a big user base (not to mention all of the other users).
I've been using SQLite (via DISQLite3) in FeedDemon for several months, and I highly recommend it - it has been extremely fast and stable. As Javier said, the docs for the library may be thin, but the docs for SQLite itself are very good.
I've used DBISAM on a number of projects. It is completely embedded, without even a need for an external DLL. Unlike the others you listed it is commercial. It has a lot of great features though, and is very well documented and supported. They have a successor to it that I haven't tried yet.
Let's see, quick comparison:
SQLite:
dynamic typing in the database
cross-platform files
runs on Windows, Linux, Mac, etc.
public domain
supports transactions
relies on file system security, does not include own security
Firebird embedded:
strong typing in the database
not all SQL datatypes are supported
cross-platform files
Firebird embedded only runs on Windows
Files from Firebird embedded are in the same format as the full server version
Files from Firebird embedded can be copied to a non-Windows server for use
available under a modified MPL ("what's ours is ours and must remain free, what's yours is yours and you don't have to release it")
supports transactions, triggers, etc.
MySQL embedded:
support for SQL features depends on file format
(IIRC) cross-platform files
GPL unless you pay royalties
runs on Windows, Linux, Mac
incredibly popular with the open source crowd
Even embedded databases have their strengths and weaknesses. You'll need to weigh those strengths and weaknesses against what you're doing to decide.
Firebird embedded is our #1 choice because, with no code changes, a single-user Delphi app with an embedded database can be migrated to a multi-user server-based deployment without sacrificing any of the high-end features (such as stored procedures, triggers, views, etc.). And it's a truly free database that doesn't GPL your code in the process.
I strongly recommend using AnyDAC when working with databases and Delphi; then you can choose to target FB or SQLite seamlessly.
My preference would be for FB for embedded apps.
Tom
I use Sybase's Advantage Database Server, but I'm also the R&D Manager, so this post is biased. :)
We have native Delphi TTable and TQuery components for both WIN32 VCL and VCL.NET. Direct table access in addition to SQL support makes Advantage unique among many of the other Delphi offerings. Advantage supports large tables (only limited by the number of records, 2 billion) and has a free local engine, which is nice for development PCs and for small customer sites that don't require client/server functionality. Switch to client/server with a single connection property, no other changes.
We have a ton of clients so accessing the data outside of Delphi is also very easy (.NET data provider, ODBC, OLE DB, PHP, Perl, JDBC, etc).
Main Product Web Site: http://www.advantagedatabase.com
Developer's Web Site: http://devzone.advantagedatabase.com
It really depends on what you need. For single-user applications, Firebird Embedded or SQLite are probably the best choices (and the price is right). At the other end, if you need support for a large number of concurrent users, you should probably use regular Firebird instead of the Embedded version (the server is simple to install, so you won't have many problems there).
And if you need something in between, for a moderate multi-user application, one of the flat-file databases would be better. I found ComponentAce's Absolute Database a better choice for my needs than DBISAM, NexusDB or VistaDB.
It leaves a relatively small footprint (no DLLs), it's a single-file DB (a must for me), it supports Unicode, BLOB compression and encryption, and its technical limits seem impressive for a flat-file database. Moreover, support was good on the few occasions when I needed it.
As for cons, I noticed it doesn't support nested transactions, but other than that I had no problems.
As for size, nothing beats SQLite.
When you refer to the lack of documentation, I guess you mean the docs for DISQLite3. The SQLite docs themselves are quite complete.
Take a look at NexusDB. I have used it very successfully in the past.
The problem with (embedded) Firebird is that the database cannot reside on a network drive. Also, it is difficult to have a database on a read-only drive (CD/DVD).
For some hacks around these limitations see the Delphi Wiki:
http://delphi.wikia.com/wiki/Firebird_tipps
NexusDB offers the full range from embedded, to full client/server / remote. Also SQL2003 compliant, I believe. I'm using it on a few projects, and am very pleased so far, and the fact that it can work in such a wide range of "scales" is a big plus (not having to learn another DB for scaled-up apps, etc).
Look at this embedded database comparison: http://sql-db.cz.cc/ - it can be helpful. Most of the above-mentioned products are covered there: Advantage, DBISAM, Firebird, MS SQL Server, and more besides: Accuracer, Apollo, ElevateDB, NexusDB, TurboDB.
I am partial to ComponentAce's Absolute DB. Although it is a commercial product ($), it is solid, easy to use, has a small footprint and is well documented. If you are looking at a huge multi-user application, this is not the way to go, but if your multi-user needs are light (or non-existent) it is a solid option.
I'm using SQL Server Express and the ADO components. Works great. You can run the SQL Server Express install from the command line to hide the complexities from the users. You can also distribute a database that you load by filename. There are millions of SQL Server users, so solutions to any problems are easily found on the intertubes :-)
I did a websearch to find a fast database package for my Delphi Application. I wanted it to be completely contained in the executable with no external DLLs or libraries required. I originally found Accuracer by AidAim. They had posted how fast their database was and even gave comparisons with other similar packages to “prove” their point.
I wanted to believe their claims but I thought I’d search the web a bit more to find timings of other packages. I was very surprised to find a post at the Delphi discussion forums where a person asked what database to use, and there were 14 different suggestions. One of the responders had done his own timing comparisons and had found Accuracer to be quite slow compared to several others, which Accuracer had (conveniently) left out of their own comparison page.
The post, plus additional followup web research by me, led me to lean toward DISQLite3, a product based on the Open Source SQLite program, but with enhancements to work in Delphi very quickly, with very small overhead, and with command-based calls - which I like. It is actively under development and will soon have an official Delphi 2009 version, although apparently the current version will work under D2009.
Addendum: DISQLite3 version 2.0.0, released Nov 17, supports D2009.
I know MS Access is a comparatively crap DB (and I expect to be shot down in flames here), but if only a small amount of data is needed it may have advantages when MS Office is used anyway. For me it was a way to store program data with more flexibility than CSV files, which are a common approach for scientific code.
You can create an Access DB from Delphi code without having MS Office installed, using ADO and the ODBC driver (it might be necessary to have an initial .accdb file without tables to copy and then populate; I can't remember this detail). I'm not sure about the licensing situation when doing this.
The .accdb extension can be changed to something else and the file password-protected (to a limited degree), so it's not immediately obvious to users that it's Access, if that's desired.
I know a few commercial developers use this method, and I copied it myself. I found it easier to set up than SQLite, but maybe that's because I'd already used ADO and Access in the past.
I have used ScimoreDB. They give it away royalty-free, but it has its quirks in data types and with some installation issues. This was on a C# project.
If embedded is an absolute must, look at DBISAM.
kbMemTable is a good candidate. Runs in memory, fast, multi-threading. Used to be free.
Components4Developers
I have used DBISAM and kbMemTable on different occasions.
What I like about DBISAM is that it has great features, and is usually very reliable. I have used it in large databases, full-text search, read-only mode, CGIs and many other situations.
It is fairly large compared to kbMemTable or SQLite based components, though. And you can't have a single file per database (or even table) - depending on the situation, that is a major disadvantage.
kbMemTable is tiny and it's great for small amounts of data. Since it runs in memory, it has to be a small amount of data, of course.
One other option I've taken in a couple of my desktop apps is dumping the data directly from/to my object hierarchy using TWriter/TReader. This is by far the smallest option, and is absurdly fast compared to using a database. The data files are tiny, too.
It has all kinds of drawbacks, though: you have to code in versioning if you ever want to add or change fields, anything beyond in-memory use gets even more complicated, there is no multi-user support at all, etc.
Firebird embedded is our #1 choice as well. And the suite Unified Interbase v2.0 with it. A great and stable solution!
I have a database where I have to record 5 fields of data every 20 seconds for 10 days: 3 fields are integers, 1 field is a double (time) and 1 field is a string[5].
I am still using Delphi 6 SR2 because of my components. Newer Delphi versions are terrible with components; I would have to spend thousands of dollars to rebuild my component library, so Delphi 6 is still the best choice for my real commercial applications. Newer versions of Delphi give me many problems at points such as USB or COM-port reading and so on, and new releases appear before the previous versions have even settled in the market.
I set up code in Delphi 6 that appends 43,200 records to a table as a test, because I will deploy the table in the application with 43,200 records in it. I will show all the data on a DBChart.
The test results below show how long each database took to fill the table with 43,200 records using insert commands:
DBISAM = 34 sec,
ElevateDB = 11 sec,
AbsoluteDB = 45 sec,
SQLite = 32 min,
Firebird = 12 min,
MSSQL12 LocalDB = 28 min,
EasyTable = 8 min,
BDE = blocked.
I haven't tested Oracle, Blackfish, Sybase, NexusDB, etc., but it seems they would also be very slow. I have connected them to a DBChart, and only ElevateDB and AbsoluteDB loaded the 43,200 records into the DBChart in acceptable time, around 7-10 seconds; all the others took minutes. So the slower databases always need coding tricks to succeed at some real jobs.
I have tested their search speed as well, using the Locate command, and unfortunately the server-based databases are always slower at that.
MSSQL and SQLite3 are extremely difficult to manage from Delphi; they made me very tired.
These are my test results.
In the end I decided among AbsoluteDB, DBISAM and ElevateDB, and threw the rest off the PC.
ElevateDB doesn't support a RecNo function, which requires extra code at runtime to work around and makes things slower. Another problem with ElevateDB is autoinc fields: there is no way to reset them. Therefore I have not chosen ElevateDB even though it is the fastest database. They advertise many good features, but how many of them do we actually use? They left the most important functions unsupported while polishing many, many unnecessary ones, and it seems there has been no real progress in 8 years either.
If you want to see it with your own eyes, please just try it and see.
I am now deciding between two: AbsoluteDB or DBISAM 4.
Firebird all the way. It does pretty much everything, and so far version 2.1 is very solid.
Firebird offers the opportunity to scale up to multiple users sometime down the line, or to handle concurrency if your application goes multi-threaded.
SQLite is quite unrivaled if you only need single-user access; no other database comes close to it in any aspect, be it performance, convenience, SQL support or stability.
Firebird is really awesome and has a small footprint, so you can use the embedded version,
and it can be scaled up for many users,
and it handles Unicode fairly well.
I use Devart components with Delphi 2009
and FIBPlus for Delphi 6/7 (their version for 2009 and Unicode is not ready yet, too bad).
Hmmm, no one has recommended the BDE - I wonder why that is ;-)
BlackfishSQL is another possibility, although I haven't tested it in depth as yet.
When it comes to embedded databases, the first question is: is it multi-user?
Actually, who needs a database that does not allow multiple connections (read and write) to it?
I have tried (intensively) all the databases mentioned and found only one that actually functions the way it should, and that is Accuracer.
The only pity with Accuracer is that it's a three-man band with a chronic lack of proper support. It is also mainly static in development; we have seen no real new features in years, which is not surprising since only one person actually develops it. It seems they are living on old fame, and the user praise reflects that (the comments are usually 10 years old).
For a single-user experience I would recommend Absolute Database.
As for the major players, I would recommend SQL Server from Microsoft. Oracle has become bloatware and is slowly dying out.
PS:
What is nice about Accuracer is that their embedded database functions just like a full-blown server: it locks only the current record if it is in use, while the rest work normally. Nice database; the only pity is that it is stagnant.
I have an experiment streaming 1 Mb/s of numeric data, which needs to be stored for later processing.
It seems as easy to write directly into a database as to a CSV file and I would then have the ability to easily retrieve subsets or ranges.
I have experience of sqlite2 (when it only had text fields) and it seemed pretty much as fast as raw disk access.
Any opinions on the best current in-process DBMS for this application?
Sorry, I should have added that this is C++, initially on Windows, but cross-platform would be nice. Ideally the DB's binary file format should be cross-platform.
If you only need to read/write the data, without any checking or manipulation done in the database, then both should do fine. Firebird's database file can be copied, as long as the systems have the same endianness (i.e. you cannot copy the file between systems with Intel and PPC processors, but Intel to Intel is fine).
However, if you need to ever do anything with data, which is beyond simple read/write, then go with Firebird, as it is a full SQL server with all the 'enterprise' features like triggers, views, stored procedures, temporary tables, etc.
BTW, if you decide to give Firebird a try, I highly recommend you use the IBPP library to access it. It is a very thin C++ wrapper around Firebird's C API. It has about 10 classes that encapsulate everything, and it's dead easy to use.
If all you want to do is store the numbers and be able to do range queries easily, you can just take any standard tree data structure available in the STL and serialize it to disk. This may bite you in a cross-platform environment, especially if you are trying to go cross-architecture.
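The question is C++ (where std::map plus your own serialization would be the natural fit), but as a rough, language-neutral sketch of that sorted-structure-plus-range-query idea (the record layout is made up):

    import bisect
    import pickle

    times = []   # sorted timestamps
    rows = []    # sample values aligned with times

    def add(t, values):
        i = bisect.bisect_left(times, t)
        times.insert(i, t)
        rows.insert(i, values)

    def time_range(t0, t1):
        lo = bisect.bisect_left(times, t0)
        hi = bisect.bisect_right(times, t1)
        return list(zip(times[lo:hi], rows[lo:hi]))

    add(10.0, (1, 2, 3)); add(10.5, (4, 5, 6)); add(11.0, (7, 8, 9))
    print(time_range(10.0, 10.6))           # -> the first two samples

    with open("samples.pkl", "wb") as f:    # crude "serialize to disk"
        pickle.dump((times, rows), f)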
As for more flexible, people-friendly solutions, SQLite3 is widely used, solid, stable, and very nice all around.
BerkeleyDB has a number of good features for which one would use it, but none of them apply in this scenario, imho.
I'd say go with sqlite3 if you can accept the license agreement.
-D
Depends on what language you are using. If it's C/C++, Tcl, or PHP, SQLite is still among the best in the single-writer scenario. If you don't need SQL access, a Berkeley DB-style library might be slightly faster, like Sleepycat or gdbm. With multiple writers you could consider a separate client/server solution, but it doesn't sound like you need it. If you're using Java, HSQLDB or Derby (shipped with Sun's JDK under the "JavaDB" branding) seem to be the solutions of choice.
You may also want to consider a numeric data file format that is specifically geared towards storing these types of large data sets. For example:
HDF -- the most common and well supported in many languages with free libraries. I highly recommend this.
CDF -- a similar format used by NASA (but useable by anyone).
NetCDF -- another similar format (the latest version is actually a stripped-down HDF5).
This link has some info about the differences between the above data set types:
http://nssdc.gsfc.nasa.gov/cdf/html/FAQ.html
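As a hedged illustration of the HDF5 suggestion (shown in Python with h5py for brevity; the official HDF5 C/C++ API exposes the same append-and-slice pattern, and the dataset name and shape here are my own assumptions):

    import numpy as np
    import h5py

    # Append-as-you-go: an extendable dataset of 5 values per sample.
    with h5py.File("experiment.h5", "w") as f:
        dset = f.create_dataset("samples", shape=(0, 5), maxshape=(None, 5),
                                dtype="f8", chunks=True)
        for _ in range(10):                      # pretend acquisition loop
            block = np.random.rand(1000, 5)      # one batch of incoming data
            dset.resize(dset.shape[0] + len(block), axis=0)
            dset[-len(block):] = block

    # Later processing: read back just a range without loading everything.
    with h5py.File("experiment.h5", "r") as f:
        subset = f["samples"][2000:2500]
        print(subset.shape)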
I suspect that neither database will let you write data at such a high speed. You can check this yourself to be sure. In my experience, SQLite failed to INSERT more than 1000 rows per second into a very simple table with a single integer primary key.
In case of performance problems, I would write the files in CSV format and later load their data into the database (SQLite or Firebird) for further processing.
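A minimal Python sketch of that write-CSV-now, load-into-SQLite-later flow (the file, table and column names are made up; batching the whole load into one transaction is typically what lifts SQLite's insert rate well above the figure mentioned above):

    import csv
    import sqlite3

    # Acquisition side: append samples to a plain CSV file as they stream in.
    with open("capture.csv", "a", newline="") as f:
        csv.writer(f).writerow([0.02, 1.234, 5.678])       # t, ch1, ch2

    # Later, offline: bulk-load the file into SQLite in a single transaction.
    conn = sqlite3.connect("capture.db")
    conn.execute("CREATE TABLE IF NOT EXISTS samples (t REAL, ch1 REAL, ch2 REAL)")
    with open("capture.csv", newline="") as f, conn:        # `with conn` = one transaction
        conn.executemany("INSERT INTO samples VALUES (?, ?, ?)",
                         ([float(x) for x in row] for row in csv.reader(f)))

    # Retrieve a time range for further processing.
    for row in conn.execute("SELECT * FROM samples WHERE t BETWEEN ? AND ?",
                            (0.0, 10.0)):
        print(row)
    conn.close()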