Database or format for help system? (C)

I'm implementing a help system for my app. It's written in C and GTK+.
Help is divided into books. Each book is just a bunch of HTML pages plus resources.
The problem I've encountered is that each book is ~30 MB (I use the WebKit GTK+ port to display it). After zipping it shrinks to ~7 MB, but opening a document becomes extremely slow :( So I'm thinking about using some kind of library that can provide me with: full-text search index creation, document listing, a tree structure (a la a file system), and of course compression.
Any ideas on such a thing?
P.S. Not all of the requirements are "must have"; I'm still exploring this area and not sure all of them are needed, so if it misses something, that's OK.

SQLite supports compressed databases (read-only, via a commercial extension), and is ideal for a single-user database.
However, you should think about whether you need compression at all. Hard disks are so big these days that a 700 MB library on a computer isn't much of a worry.
Alternatively, you could go for Firebird, which as far as I know doesn't support a compressed database; you could compress your individual pages instead, in which case you would need to build your own index for full-text search - which I would consider unnecessary work.
Firebird supports a feature called "Embedded Server" which is specifically designed for deployment with Windows applications.
I think you should make an ordered list of the features you want, then pick the top two or three and do those before moving on. E.g. getting the content into a database is something you'd want to do before thinking about compression.
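A minimal sketch of the SQLite route, assuming a build with the FTS5 extension enabled; the pages/pages_fts table names and the sample query are invented for illustration. Pages go in as blobs (you could zlib-compress each page yourself before inserting), and the virtual table gives you the full-text index:

```c
/* Sketch: help pages stored in SQLite, with an FTS5 full-text index.
 * Assumes SQLite compiled with FTS5; all names are illustrative only. */
#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("help.db", &db) != SQLITE_OK) return 1;

    /* One row per HTML page; 'path' doubles as the tree structure. */
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS pages(path TEXT PRIMARY KEY, html BLOB);"
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts USING fts5(path, body);",
        NULL, NULL, NULL);

    /* Full-text search against the index. */
    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(db,
        "SELECT path FROM pages_fts WHERE pages_fts MATCH ?", -1, &stmt, NULL);
    sqlite3_bind_text(stmt, 1, "toolbar", -1, SQLITE_STATIC);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        printf("hit: %s\n", (const char *)sqlite3_column_text(stmt, 0));
    sqlite3_finalize(stmt);

    sqlite3_close(db);
    return 0;
}
```

Document listing is then a plain SELECT over pages, and the path column gives the a-la-file-system tree for free.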

Related

How to reverse engineer a database file format?

It is an accounting database whose file format is proprietary. The problem is that the database is highly unreliable; corruption goes unnoticed for years, after which it becomes unbearably difficult to recover the data. We maintain the accounts of many companies, so reversing the format would be highly useful. Also, the software is ODBC compliant, which might be very helpful for reversing it. The software is partially open source, i.e. everything except the database engine is open source (and fully customizable), though they have their own distinct language.
I have no experience in reverse engineering. We would like to reverse it in such a way that we can write data in that format directly. So I would like to know where to get tools and how to approach this; any articles or videos would be awesome.
Usually it's not that "hard". For example, I like a hybrid approach:
Basic target file analysis: entropy, signatures, etc. It's a good idea to look at possible chunks in a hex editor (be ready to live in the hex editor for a long time); I recommend 010 Editor with its templates/scripts. A small entropy sketch follows this answer.
Basic target application analysis: you need to understand how it was written (language, frameworks, libraries and so on); this will be really useful in the next steps.
Classic reverse engineering of the application that can open the file format. The best way is to use something like IDA Pro + Hex-Rays (but be ready to spend a lot of money) to generate C-like sources. You can also corrupt the target file (change/remove some bytes) and break on the resulting exception in the target application, for example, as a starting point for exploration.
Try FileMon (now part of Process Monitor) to understand how the target application reads the target file. Sometimes this works well (some applications don't use a read buffer at all, so you will see small reads plus a call stack to explore!).
Various other tools can help in specific cases.
Best of all, upload a few samples of the database and the application and post a link. We can try to do it together and write a little tutorial for others.
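For the entropy part of the first step, a quick byte-histogram tool is often all you need to map out a file before settling into the hex editor. A rough sketch in plain C (nothing assumed about the target format):

```c
/* Shannon entropy of a file, byte-wise. Link with -lm.
 * Near 8 bits/byte usually means compressed or encrypted regions;
 * much lower values suggest structured records worth a closer look. */
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    unsigned long long count[256] = {0}, total = 0;
    int c;
    while ((c = fgetc(f)) != EOF) { count[c]++; total++; }
    fclose(f);

    double h = 0.0;
    for (int i = 0; i < 256; i++) {
        if (!count[i]) continue;
        double p = (double)count[i] / total;
        h -= p * log2(p);
    }
    printf("%.3f bits/byte\n", h);
    return 0;
}
```

Running it over fixed-size slices of the file instead of the whole thing gives an entropy profile, which makes chunk boundaries stand out.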

Which DB to go for with tiny data requirements

I need some help choosing a database for my application.
My web application will basically consist of a main table; let's call it the "User" table.
It will have user info like name, ID, password, address, phone, etc.
There will be 5 other related tables where I will save each user's info,
e.g. a table for books read, a table for songs heard, food eaten, etc.
Overall I don't expect my data to go beyond 1,000 users.
So I have tiny data requirements.
Generally I would have gone with MySQL, but I am feeling a bit adventurous.
I want to try out some of the new solutions on the block.
My requirements are:
1. Pure performance
2. Good documentation, ease of use
Since my DB shouldn't be more than a few hundred megs in size, I'd rather keep the entire tablespace in memory for faster performance. How about some of the new NoSQL DBs?
Any recommendations? I have worked mainly with Oracle and MySQL and don't know much about all the exciting new stuff out there.
I would suggest going with SQLite if your database requirements are small.
From sqlite website:
SQLite is a compact library. With all features enabled, the library
size can be less than 350KiB, depending on the target platform and
compiler optimization settings. (64-bit code is larger. And some
compiler optimizations such as aggressive function inlining and loop
unrolling can cause the object code to be much larger.) If optional
features are omitted, the size of the SQLite library can be reduced
below 200KiB. SQLite can also be made to run in minimal stack space
(4KiB) and very little heap (100KiB), making SQLite a popular database
engine choice on memory constrained gadgets such as cellphones, PDAs,
and MP3 players. There is a tradeoff between memory usage and speed.
SQLite generally runs faster the more memory you give it.
Nevertheless, performance is usually quite good even in low-memory
environments.
Object-oriented DBs like db4o or Versant could also be used.
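To illustrate the in-memory wish from the question: SQLite can keep the whole database in RAM by opening the special ":memory:" path. A minimal sketch - the users table is just the question's example, and note that an in-memory DB vanishes when closed, so you'd still persist to a file separately:

```c
/* Sketch: SQLite as a pure in-memory store. */
#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    sqlite3_open(":memory:", &db);   /* lives entirely in RAM */

    sqlite3_exec(db,
        "CREATE TABLE users(id INTEGER PRIMARY KEY, name TEXT);"
        "INSERT INTO users(name) VALUES('alice');",
        NULL, NULL, NULL);

    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(db, "SELECT id, name FROM users", -1, &stmt, NULL);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        printf("%d %s\n", sqlite3_column_int(stmt, 0),
               (const char *)sqlite3_column_text(stmt, 1));
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
```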
Neo4j (for Java) is a pretty awesome tool. It's technically a graph database, but by the sound of your data model, I think it would be well suited for you. From what I've seen it performs very well, its documentation is incredibly good, and if you are using Java then it's like second nature. You basically point it at a directory and it sets up shop there.
If you are feeling adventurous and happen to be using Java, I suggest you give it a try.
I think Redis is exactly what you want!
Yesterday I downloaded and installed it for the first time. It runs completely in memory, which meets your performance requirement. (It only writes the data to disk for persistence, e.g. to survive a power failure, like a backup, but this does not slow down writes.)
For Linux and the like there is a tar.gz on the download page.
For Windows you can download Dusan's native port: http://redis.io/download - it is precompiled and also has a client console to try things out.
The documentation is very good; for example, this is the page for the data types: http://redis.io/topics/data-types and you will also find all the other relevant information there as a quick-to-browse reference.
And there is a nice online tutorial to get started quickly: http://try.redis-db.com/ which is actually fun to work through.
I like the atomic operations like "increment by" and the list structures with push and pop (see the sketch after this answer).
There is also a hash type.
For Python there is redis-py: https://github.com/andymccurdy/redis-py
Being a Python coder myself, I think the data structures Redis offers match the Python data types very well.
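The atomic increment and list push/pop mentioned above look roughly like this through the hiredis C client (redis-py exposes the same commands to Python); the key names here are made up:

```c
/* Sketch of Redis atomic ops via hiredis; assumes a local server. */
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (!c || c->err) { fprintf(stderr, "connect failed\n"); return 1; }

    redisReply *r = redisCommand(c, "INCRBY pageviews 1");  /* atomic increment */
    printf("pageviews = %lld\n", r->integer);
    freeReplyObject(r);

    r = redisCommand(c, "LPUSH songs:heard %s", "some-song"); /* list push */
    freeReplyObject(r);
    r = redisCommand(c, "RPOP songs:heard");                  /* list pop */
    if (r->type == REDIS_REPLY_STRING) printf("popped %s\n", r->str);
    freeReplyObject(r);

    redisFree(c);
    return 0;
}
```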

Appropriate Database to store 20 GB for Delphi, Firemonkey

I don't have experience in database development, so I need your suggestions for choosing a database that can be used with FireMonkey.
I need to store HTML files (no media for now, but there may be some later); their total size is around 20 GB (uncompressed text). The main feature must be maximally fast text search in the database, and it must be possible to implement human-friendly searching (like Google). Compression would be a plus (20 GB is a lot to store), but if compression makes searching slow it's not required.
What kinds of databases are appropriate for my needs?
Thanks a lot for your suggestions!
Edited
Requirements:
Price: free
Location: local or remote
Operating system support: Windows
System requirements: a database with a large footprint is fine
(hopefully in exchange for better performance)
Performance: fast text searching
Concurrent users: 20
Full-text indexing and searching: human (Google-like) fast
text searching is required
Manageability: doesn't matter much
I know an online legal database that can search for words across 100 GB of information in milliseconds. I need the same performance; Google-like searching is required.
The Delphi database access layer is separate from FireMonkey; it's the same one used by the VCL (although FM, AFAIK, relies only on LiveBindings to access data, but that's not an issue in your case).
Today 20 GB is really not much data. Almost any database will handle it without much effort if properly configured. Which engine to choose depends on:
Price: how much are you going to spend on it?
Location: do you need a local database (same machine) or a remote one (LAN or WAN)?
Operating system support: which OS should it run on?
System requirements: do you need a database with a small footprint, or can you use one with a larger footprint (hopefully in exchange for better performance)?
Performance: what level of performance is required?
Concurrent users: how many users will connect to the database concurrently?
Full-text indexing and searching: not all databases offer it out of the box.
Manageability: some databases may require more management than others.
There is no "one database fits all" yet.
I'm no DBA, so I can't say directly, and honestly I'm not sure that any one person could give a direct answer to this question, as it's one of those "it just depends" scenarios.
http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
That's a good starting point to compare features and platform compatibility. I think the major thing to consider here is what hardware will be running it and how you can best utilize that to accomplish the task at hand.
If you have a server farm, be sure your DB supports distribution and some sort of load balancing (most do to some degree, from what I understand).
To speed up searching, unless you code up a custom algorithm that somehow searches the compressed version, I think you're going to want to keep the data uncompressed. Searching compressed data could in principle be faster: if you can compare your plain-text search parameters against an index built over the compressed file, you only need to check the entries the index matched. But without tons of custom code, I haven't heard of any DB that supports this kind of search over compressed text (though I could easily be wrong on this point).
If the entire data set needs to be decompressed before doing the search, it will very likely be much slower (memory is relatively cheap compared to CPU time). It also looks like FireMonkey has a limited selection of DBs to use, so that will help narrow your choices down as well.
What I would suggest, based on your edited question, is to write (or find) a parser or regular expressions to extract all the important elements from the HTML that you would like to be searchable, then store those in a database along with a reference to where they were found in the HTML (a toy sketch follows this answer). As for Google-like searching, if you mean correcting misspellings and using synonyms, you probably need some custom code doing dictionary lookups for spelling and thesaurus lookups for synonyms. I believe full-text search in any modern DB will handle the need to query with LIKE or similar statements in the WHERE clause.
Looks like ldsandon's answer covers most of this anyhow. TL;DR; if not, thanks for reading.
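As a toy illustration of the extract-then-index idea (not a real HTML parser - entities, scripts and malformed markup need a proper library), a tag stripper could start like this:

```c
/* Toy sketch: strip HTML tags, keeping only text to be indexed.
 * The extracted text would be stored alongside a reference to the
 * source document. Real HTML needs a real parser. */
#include <stdio.h>

static void strip_tags(FILE *in, FILE *out) {
    int c, in_tag = 0;
    while ((c = fgetc(in)) != EOF) {
        if (c == '<')       in_tag = 1;   /* entering a tag   */
        else if (c == '>')  in_tag = 0;   /* leaving the tag  */
        else if (!in_tag)   fputc(c, out);/* visible text     */
    }
}

int main(void) { strip_tags(stdin, stdout); return 0; }
```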
I would recommend PostgreSQL for this task. It has good performance and built-in full-text search capability for Google-like searching. And it's free and open source.
Unfortunately, Delphi doesn't come with PostgreSQL data access components out of the box. You can connect via ODBC, or you can purchase components from, for example, Devart, DA-Soft or microOLAP.
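For a feel of what PostgreSQL's built-in full-text search looks like, here is a hedged sketch using the C client library libpq; the same SQL works from Delphi over ODBC or the components mentioned above. The pages table, its columns and the connection string are assumptions, and in practice you'd back the query with a GIN index (CREATE INDEX ON pages USING gin(to_tsvector('english', body));) to get millisecond-class searches:

```c
/* Sketch: PostgreSQL full-text search via libpq.
 * Database, table and column names are invented for illustration. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void) {
    PGconn *conn = PQconnectdb("dbname=docs");   /* hypothetical DB */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    const char *params[1] = { "contract law" };  /* user's search terms */
    PGresult *res = PQexecParams(conn,
        "SELECT id, title FROM pages "
        "WHERE to_tsvector('english', body) @@ plainto_tsquery('english', $1) "
        "LIMIT 10",
        1, NULL, params, NULL, NULL, 0);

    for (int i = 0; i < PQntuples(res); i++)
        printf("%s %s\n", PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));

    PQclear(res);
    PQfinish(conn);
    return 0;
}
```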
Have you considered NoSQL databases? The Wikipedia article explains their differences from SQL databases and also mentions that they are suited to document storage.
http://en.wikipedia.org/wiki/NoSQL
The article lists around twelve implementations in the document store category, many of them open source (Jackrabbit, CouchDB, MongoDB).
This question on Stackoverflow contains some pointers to Delphi clients:
Delphi and NoSQL
I would also consider caching on the application server to speed up search, and of course a text-indexing solution like Apache Lucene.
I would take Microsoft SQL Server Express Edition. I think 2008 R2 is the latest stable version, but there is also Denali (2011). It matches all the criteria you have.
You can use ADO to work with it.
Try the Advantage Database Server.
It's easy to manage and configure.
It offers both dBASE-like and SQL data management languages.
It has fast indexed full-text search capabilities.
Plus, unparalleled support from the developers themselves.
The local server (the stand-alone version, as opposed to the network-based server) is free.
devzone.advantagedatabase.com
There is a Firebird version with full-text search, according to its documentation - http://www.red-soft.biz/en/document_21 - it uses Apache Lucene, a popular search engine.

Delphi embedded DB

I need a SMALL/LIGHTWEIGHT DB control (ideally delivered as a single PAS file) that I can integrate directly into my application. I need to store relatively small amounts of data in a small number of tables, and I want fast access to some columns. I know that Delphi 7 has the BDE, but I don't want to trouble the user with its installation process.
I use Delphi 7.
EDIT:
I think I asked the wrong question. So, here is what I actually need:
How to store dynamic data (unknown number of fields) to a file?
NexusDB offers a free embedded version. Here is an example.
If you're committed to not including any more dependencies with your application, take a look at TClientDataSet.
I'd recommend some sort of "embedded" database. For example, to use Firebird as an embedded database you only need to ship a single DLL, at minimum. You can put that DLL in your installer so the user doesn't need to install anything.
As an alternative, how about the freeware TDbf database? It compiles directly into your application and is reliable for lightweight uses.
Also, if you are old enough to remember the days when dBASE was the standard desktop database platform, then you probably already know how to use it. :-)
It's at http://tdbf.sourceforge.net
(If there doesn't seem to be a lot of recent activity, that's because it's been around for 10 years and is very stable.)
Just a thought.
You could try SQLite. It's an excellent embedded database: fast, reliable, and you can't beat the price (open source, public domain). There are a number of Delphi wrappers, or you can use the library directly if you want a lightweight solution.
Two open-source solutions (working from Delphi 6 up to XE):
One ORM-oriented solution, which can store data using SQLite or a pure-Delphi storage engine. It can run either stand-alone or as client/server.
One very fast pure-Delphi NoSQL table storage engine. A sample benchmark stored 1,000,000 records with one integer and one text field in 800 ms (with automatic index creation). You create your own table columns, then access the field contents via late binding.
If you have some budget to spend, give AnyDAC a try. It provides native and embedded SQLite access, so you don't even have to ship an external DLL.
TurboPower FlashFiler
I have tried (arranged by weight):
NexusDB - commercial; too big for what I need; adds quite some overhead
DISQLite - seems powerful; difficult to use
kbmMemTable - commercial, UNDOCUMENTED FOR TRIAL USERS (it cannot be trialed unless you purchase the documentation first, which defeats the purpose of the TRIAL concept)
TDBF - free but not maintained anymore; it also totally lacks documentation
Synopse BigTable - seems to be the solution I need. It consists of only 2 PAS files.
In some situations a custom system may fit better than a general one. So, for what I need, I will tailor my own system (sketched below). Because I know the size/type of the data, I can make fields that perfectly fit it. The DB will be smaller this way, and faster (plus it is free).
:)
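One possible shape for such a tailored format, as a sketch only (the record layout and names are invented): each record stores a field count followed by length-prefixed field values, which handles an unknown number of fields per record:

```c
/* Sketch: a minimal tagged record format for "dynamic data with an
 * unknown number of fields". Layout and names are illustrative only. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

static void write_record(FILE *f, const char **fields, uint16_t n) {
    fwrite(&n, sizeof n, 1, f);                  /* field count          */
    for (uint16_t i = 0; i < n; i++) {
        uint32_t len = (uint32_t)strlen(fields[i]);
        fwrite(&len, sizeof len, 1, f);          /* field length prefix  */
        fwrite(fields[i], 1, len, f);            /* field bytes          */
    }
}

int main(void) {
    const char *rec[] = { "John", "42", "extra-field" };
    FILE *f = fopen("data.bin", "wb");
    if (!f) return 1;
    write_record(f, rec, 3);
    fclose(f);
    return 0;
}
```

Reading is the mirror image: read the count, then loop reading length/bytes pairs. A fixed-width variant (sizes known up front, as the answer says) would be even smaller and allows seeking straight to a record.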
Solution: How to store dynamic data (unknown number of fields) to a file?
JvCSVDataSet and JvCSVBase from JVCL
kbmMemTable
Embedded Firebird server
jbDBF
While the manual is paid-for, the actual component is available for free, including a pretty extensive demo application showing many of the features. In addition, there is lots of information on the internet about how to use it.
Use www.codenewsfast.com or google.com to search for it.
best regards
Kim Madsen
kbm#components4developers.com

Nonrelational Databases for C++

I was thinking of starting a project that very clearly needs a persistent store. I was about to reluctantly decide on an RDBMS when I came across an article which briefly mentioned CouchDB. It seems some advances in DB technology have happened since I last looked, so I thought I would ask here about databases before getting into it.
Here are my criteria. (I list the criteria again at the end, so if you want to skip the explanations, just scroll down.)
The project is open source and I will not be asking anything for it, so preferably the database is open source and free. Furthermore, the software has to run on both Linux and Windows.
There are parts of the project that have to be in C++. The project is not large enough code-wise to justify using a second language, so basically the whole thing will be C++.
This project will not have anything to do with the web, so preferably the database will not require the detritus of a web library.
The objects I want to store fall into one of two categories: basic objects and container objects. The difference is that container objects contain further objects, i.e. a parts-of-parts problem. I need a database that can handle such cases cleanly and efficiently.
I also expect the schema to evolve rapidly, at least initially. I also suspect that some of the old data simply will not fit into the new schemas, so I would like to keep different versions of the schema around. When possible, I would like to be able to transform data from one schema into another.
For the application to work the way it is intended, people would have to exchange large chunks of the database with each other, so I would want simple ways of importing and exporting data which I could automate to some degree.
Finally, it would be nice if the database could in some way be simulated in unit tests.
Those are my requirements. I have replicated them below to make it easier for people answering.
Thank you
Non-technical requirements:
1. Open source, preferably free.
2. Runs on Windows and Linux.
Technical requirements:
3. Has a C++ interface.
4. Is able to handle a non-web application, preferably without REST.
5. Can handle a "parts of parts" problem fairly well.
6. Can handle multiple indexes.
7. Has some concept of schema versions, can handle multiple schema versions, and can migrate tables from one schema to another.
8. Should have a simple mechanism for moving data from one instance of the database to another.
9. Preferably has some mechanism for testing.
HDF5 is a binary format that behaves like a hierarchical database. It has bindings and libraries for C++ and Python (I only use the latter), and it is used to store large amounts of data, like those produced in certain physics and astronomy experiments.
http://www.hdfgroup.org/HDF5/
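A minimal sketch of the HDF5 idea using its C API (which the C++ binding wraps); the group and dataset names are invented, but groups nesting inside groups is exactly what maps onto the parts-of-parts requirement:

```c
/* Sketch: an HDF5 file with one group ("assembly") holding a dataset
 * ("part_ids"). Groups can nest arbitrarily deep. Link with -lhdf5. */
#include "hdf5.h"

int main(void) {
    hid_t file  = H5Fcreate("parts.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t group = H5Gcreate2(file, "/assembly",
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t dims[1] = {3};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(group, "part_ids", H5T_NATIVE_INT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    int ids[3] = {101, 102, 103};
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, ids);

    H5Dclose(dset); H5Sclose(space); H5Gclose(group); H5Fclose(file);
    return 0;
}
```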
I looked at a few NoSQL databases some time ago (I had a different requirement than you, though - I needed a standalone server). The ones I remember as particularly interesting are Redis and Kyoto Cabinet. Have a look.
BTW, you don't mention any performance requirements. If they're modest, have you considered SQLite? Simple, embedded, stable, and with the flexibility of SQL after all. With prepared statements the performance overhead of SQL should not be very high.
EDIT: oops, just noticed that you asked this more than a year ago... Well, perhaps you can tell us what you've chosen :)
