Is there a software caching API out there? - c

I'm looking for an API or an application that can cache data from a file or database.
My idea is that I have an application that reads a database, but access to the database is sequential and the database lives on disk.
What I basically want to do is get the data from cache first and then if it doesn't exist in cache, then hit the database. Note I'm not using a mainstream database, I'm using SQLite, but my performance requirements are very high.
So is there any product or API (free or commercial) that I can use for this purpose? Also I must have an API to interface with my cache.
I want to implement something like a web server cache or something like that.
I'm using C and Unix platform.
Thanks

You might want to look at using a shared memory cache such as memcached, although this requires a separate daemon, or roll something similar for yourself.
One thing I'd mention is that you should probably do some actual benchmarking to check that your database is your bottleneck, and if performance is a real concern there, then you're going to have to consider scaling up to a non-embedded DBMS. If that's not an option, then you may still be able to optimise the existing database accesses (query optimisation, indices, etc.).
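To make the "roll something similar for yourself" option concrete, here is a minimal sketch of a read-through cache in front of SQLite. It is written in Python for brevity rather than the asker's C, and the `items` table, its `key`/`value` columns and the `ReadThroughCache` class are hypothetical names chosen for illustration, not part of any library.

```python
import sqlite3
from collections import OrderedDict


class ReadThroughCache:
    """Small LRU cache that falls back to SQLite on a miss (illustrative only)."""

    def __init__(self, conn, capacity=1024):
        self.conn = conn
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        row = self.conn.execute(
            "SELECT value FROM items WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None  # missing keys are not cached in this sketch
        self.entries[key] = row[0]
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return row[0]


# Hypothetical table and data, just to exercise the cache.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [("a", "1"), ("b", "2")])

cache = ReadThroughCache(conn, capacity=1)
first = cache.get("a")   # miss: read from SQLite, then cached
second = cache.get("a")  # hit: served from memory
```

The hit and miss counters are there so you can benchmark whether the cache actually helps before committing to it.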

Check out memcached. Brian Aker has written a C library for it.
But I would also second Rob's suggestion. SQLite and "performance requirements are very high" may not necessarily go together, depending on what aspect of performance you mean.

You might try Zola's CaLi library:
http://icis.pcz.pl/~zola/CaLi/

Related

Redis for cakePHP app

I want to start a big cakePHP project where performance will be an issue. I will have a users table with act as tree behavior and many financial data related to the users. This application will make a lot of dynamic reports aggregating data for different tree nodes etc.
Since there is on github an easy to use library which sets data source of model to redis, I was wondering if it's a good idea to use it for entire app? Is there anyone who has experience with it, and what could be potential problems if I decide to depend on redis as main/only data storage?
EDIT: I have installed redis and tried to use RedisModel for two models with a simple HasMany/BelongsTo relation. When I tried to simply use those models like standard AppModels, it simply won't work (Redis Error: Missing key). Apparently you can't use Model->find, Model->save etc. in the standard way. You have to use redis methods instead (setKeyValue etc.). This means that pagination and other cakePHP features will also not work. So maybe it is not the best idea to use RedisModel for all my models...
I cannot speak for CakePHP specifically, but I'll talk about redis in general and the points of your question in particular, it should be applicable to your framework of choice in the end. Let's see:
You mention you want to start an application where performance will be an issue. I just wanted to mention you should be careful with the assumption that you will need a NoSQL solution, because this is hard to assess beforehand. Redis is hella fast, but MySQL for instance has been proven capable of handling millions of records and operations just fine, provided it's properly configured and used, and it's much simpler if you need lots of relational structures.
Concerning Redis as the main and only data store:
Redis is perfectly stable for the job. Instagram reportedly stored 300 million key-value pairs pseudo-sharded using hashes to great effect, and while it's not the only data storage system they use, it goes to show redis is pretty reliable. This very site (Stack Overflow) also uses redis extensively for caching purposes.
Redis is also reported to have an overall excellent continuous uptime on average (which shouldn't be surprising considering the point above).
Options exist to mitigate downtime issues: replication is supported to some extent, and Redis Cluster is coming soon to support proper distributed approaches.
The main problem you could face is not understanding properly how its persistence works. You should absolutely read this and this article before you get started because this point is important. In a nutshell, redis does not write changes immediately to disk, which means that depending on your configuration, a crash can cause a data loss ranging from a few seconds to several minutes since the last disk write. This might or might not be a problem depending on your use case; if the data is extremely sensitive (i.e., financial records) you might want to think twice before jumping to redis, or build a system where redis is not exclusively used but rather combined with another storage system.
Relational structures in a non-relational data store like redis mean doing more work and often duplicating/denormalizing data. It can be done, but it's something to consider; in your question you mention you'll need to aggregate data to generate dynamic reports, are you sure you want to use redis for this? it sounds like a relational database would give you way more flexibility at a very small cost of performance. If you know in advance you'll need to run complex queries over your data, it could be a good idea not to reinvent the wheel unless you absolutely need to.
My advice here would be to first get a better feeling for what redis is and how it works, potentially build your own models instead of relying on others' to better understand what can and cannot be done, and from there assess where you want to take it. Redis is reliable enough to be used standalone, but at the end of the day what's smart is to use the right tool for the right job, and you might find some parts of your app work well with redis while others are better off with a more traditional storage system.

which db to go for tiny data requirements

I need some help choosing databases for my application.
My web application will basically consist of a main table. lets call it the "User" table.
it will have the user info like name, id, password, address, phone etc.
There will be 5 other related tables where i will save each user's info.
eg. Table for books read, Table for songs heard, Food eaten etc.
Overall i dont expect my data to go beyond 1,000 users.
So, i have got tiny data requirements.
Generally i would have gone with mysql, but i am feeling a bit adventurous.
I want to try out some of the new solutions on the block.
my requirements are:
1. pure performance
2. good documentation, ease of use
since my db size shouldn't be more than a few hundred megs, i'd rather keep the entire tablespace in memory for faster performance. How about some of the new NoSQL DBs?
any recommendations? I have worked mainly on Oracle and MySQL and don't have much idea of all the new exciting stuff out there.
I would suggest to go with sqlite if your database requirement is small.
From sqlite website:
SQLite is a compact library. With all features enabled, the library
size can be less than 350KiB, depending on the target platform and
compiler optimization settings. (64-bit code is larger. And some
compiler optimizations such as aggressive function inlining and loop
unrolling can cause the object code to be much larger.) If optional
features are omitted, the size of the SQLite library can be reduced
below 200KiB. SQLite can also be made to run in minimal stack space
(4KiB) and very little heap (100KiB), making SQLite a popular database
engine choice on memory constrained gadgets such as cellphones, PDAs,
and MP3 players. There is a tradeoff between memory usage and speed.
SQLite generally runs faster the more memory you give it.
Nevertheless, performance is usually quite good even in low-memory
environments.
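Since the question asks for the whole dataset to live in memory, here is a minimal sketch of what that looks like with SQLite, using Python's standard `sqlite3` module and the special `:memory:` database name. The `user` and `book_read` tables are hypothetical stand-ins for the tables described in the question, and an in-memory database is lost when the connection closes, so it only fits if the data can be reloaded or is also kept elsewhere.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM for the lifetime of the connection.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema loosely based on the question's description.
conn.executescript("""
CREATE TABLE user (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE book_read (
    user_id INTEGER NOT NULL REFERENCES user(id),
    title   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO user (id, name) VALUES (?, ?)", (1, "alice"))
conn.executemany(
    "INSERT INTO book_read (user_id, title) VALUES (?, ?)",
    [(1, "Dune"), (1, "Emma")],
)
conn.commit()

# One row per user with the number of books recorded for them.
books_per_user = conn.execute(
    "SELECT u.name, COUNT(b.title) "
    "FROM user u LEFT JOIN book_read b ON b.user_id = u.id "
    "GROUP BY u.id"
).fetchall()
```

For a file-backed database, the same schema and queries work unchanged; only the name passed to `connect` differs.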
Object-oriented databases such as db4o or Versant can also be used.
Neo4j (for Java) is a pretty awesome tool. It's technically a graph database, but by the sounds of your data model, I think it would be well-suited for you. From what I've seen it performs very well, its documentation was just incredibly good, and if you are using Java then it's like second nature. You basically point it at a directory and it sets up shop there.
If you are feeling adventurous and happen to be using Java, I suggest you give it a try.
I think redis is exactly what you want!
Yesterday I downloaded and installed it for the first time. It runs completely in memory and that meets your performance requirement. (It only writes the data to disk for cases like power failure, like a backup, but this does not slow down the writes to it.)
For linux and such there is tar.gz on the download page.
For windows you can download Dusan's native port: http://redis.io/download - it is precompiled and also has the client console to try out.
The documentation is very good, for example this is the page for the data types: http://redis.io/topics/data-types and you also find all the other relevant information as a fast to browse reference there.
And there is a nice online tutorial to get started quickly: http://try.redis-db.com/ which is actually fun to work through.
I like the atomic operations like "increment by" and the list structures with push and pop.
There is also a hash type.
For python there is redis-py: https://github.com/andymccurdy/redis-py
Being a Python coder myself, I think the data structures that redis offers match the Python datatypes very well.

SQLite as a production database for a low-traffic site?

I'm considering using SQLite as a production database for a site that would receive perhaps 20 simultaneous users, but with the potential for a peak that could be many multiples of that (since the site would be accessible on the open internet and there's always a possibility that someone will post a link somewhere that could drive many people to the site all at once).
Is SQLite a possibility?
I know it's not an ideal production scenario. I'm only asking if this is within the realm of being a realistic possibility.
SQLite supports only limited concurrency (many simultaneous readers, but only one writer at a time), so you may have problems running it on a production website. If you're looking for a 'lighter' database, perhaps consider trying a contemporary object-document store like CouchDB.
By all means, continue to develop against SQLite, and you're probably fine to use it initially. If you find your application has more users down the track, you're going to want to transition to Postgres or MySQL however.
The author of SQLite addresses this on the website:
SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
The SQLite website (https://www.sqlite.org/) uses SQLite itself, of course, and as of this writing (2015), it handles about 400K to 500K HTTP requests per day, about 15-20% of which are dynamic pages touching the database. Dynamic content uses about 200 SQL statements per webpage. This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time.
So I think the long and short of it is, go for it, and if it's not working well for you, making the transition to an enterprise-class database is fairly trivial anyway. Do take care of your schema, however, and design your database with growth and efficiency in mind.
Here's a thread with some more independent comments around using SQLite for a production web application. It sounds like it has been used with some mixed results.
Edit (2014):
Since this answer was posted, SQLite now features a multi-threaded mode and write ahead logging mode which may influence your evaluation of its suitability for low-medium traffic sites.
Charles Leifer has written a blog post about SQLite's WAL (write ahead logging) feature and some well-considered opinions on appropriate use cases.
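As a rough illustration of the write-ahead logging mode mentioned above, the sketch below switches a database to WAL with `PRAGMA journal_mode=WAL` and shows a second connection reading while the first has an uncommitted write open. It uses Python's standard `sqlite3` module; the `site.db` file name and the `hits` table are made up for the example.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; the path and table name are hypothetical.
db_path = os.path.join(tempfile.mkdtemp(), "site.db")

writer = sqlite3.connect(db_path)
# The pragma returns the journal mode actually in effect.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE hits (path TEXT)")
writer.commit()

reader = sqlite3.connect(db_path)

# The INSERT opens a write transaction that is not yet committed.
writer.execute("INSERT INTO hits VALUES ('/index')")

# The reader is not blocked; it sees the last committed state.
visible_before = reader.execute("SELECT COUNT(*) FROM hits").fetchone()[0]

writer.commit()
visible_after = reader.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
```

WAL mode is persistent: once set, it stays in effect for that database file across connections until changed again.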
The small excerpt from SQLite website says it all.
Is the data separated from the application by a network? → choose client/server
Many concurrent writers? → choose client/server
Big data? → choose client/server
Otherwise → choose SQLite!
SQLite "just works" (until it doesn't of course)
We often use SQLite for internal databases; The employee directory, our calendar of events, and other intranet services all run on lightweight databases. It would be major overkill to be running these apps at the scale we do on a "real" database like mySQL. This is especially true when you factor in that they're running along side 4 other virtual machines on a single mid-range computer.
At one point we had an outward facing site that ran on an sqlite db for months with only a single reboot required. Obviously, it was very low traffic, but it putted along nicely for what it did.
We have encountered a similar option on an environment with absolutely no writes, and we selected using SQLite.
See my blog post on the subject:
Well, the main assumption which makes this solution theoretically
possible is that our SQLite database is totally read-only. Our server
code should never change it. This would solve any locking problems, as
there are no read locks. We could find nowhere on the internet anyone
saying there is a problem in high-throughput reading of SQLite when
there are no writes - it could be possible!
I think it would depend mostly on what your read/write ratio will be. If it's mostly reading from the database, you may be okay. Multi-user writing in SQLite can be a problem because of how it locks the database.
People speak about concurrency problems, but sqlite has a way to make incoming requests wait for some time when the database is locked (the busy timeout). It doesn't have to time out immediately.
I've read things about the default timeout setting being zero, meaning it times out immediately, and that's nonsense. Maybe people didn't adjust this setting?
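The timeout discussed here can be tried out directly. In the sketch below (Python's standard `sqlite3` module, whose `connect` takes a `timeout` in seconds), one connection holds the write lock, a connection with a zero timeout fails straight away, and a connection with a five-second timeout waits until the lock is released. The file name, the table and the 0.2 second delay are arbitrary choices for the demonstration.

```python
import os
import sqlite3
import tempfile
import threading

db_path = os.path.join(tempfile.mkdtemp(), "busy.db")

# isolation_level=None gives explicit control over BEGIN/COMMIT.
# check_same_thread=False lets the timer thread below issue the COMMIT.
holder = sqlite3.connect(db_path, isolation_level=None, check_same_thread=False)
holder.execute("CREATE TABLE t (x INTEGER)")
holder.execute("BEGIN IMMEDIATE")  # take and hold the write lock

# With a zero timeout, a competing writer gives up at once.
impatient = sqlite3.connect(db_path, timeout=0, isolation_level=None)
try:
    impatient.execute("BEGIN IMMEDIATE")
    failed_immediately = False
except sqlite3.OperationalError:  # "database is locked"
    failed_immediately = True

# Release the lock shortly afterwards from another thread.
threading.Timer(0.2, lambda: holder.execute("COMMIT")).start()

# With a non-zero timeout, the competing writer waits for the lock instead.
patient = sqlite3.connect(db_path, timeout=5.0, isolation_level=None)
patient.execute("BEGIN IMMEDIATE")  # blocks until the holder commits
patient.execute("INSERT INTO t VALUES (1)")
patient.execute("COMMIT")
rows = patient.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

In the C API, the equivalent knob is `sqlite3_busy_timeout()`; without it (or a custom busy handler), a locked database is reported to the caller immediately.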
Depends on the usage of the site. If most of the time you're just reading data, you can pretty much use anything for a DB and cache the data in the application to achieve good performance.
I am using it in a very low traffic web server (it is a genomic database) and I don't have any problems. But there are only SELECT statements, no writing to the DB involved.
To add to an already brilliant answer: since you are working with a server-less solution in this case, you can say goodbye to replication, or any sort of horizontal scaling of your db, as well as other advanced options. It also isn't the best choice if you have multiple users updating the same exact chunk of information. If you were to shard the database in the future you would have to migrate the data and move to something else. Also, if you have a load balancer and multiple systems involved, it would be difficult to maintain data centrality if using sqlite. These are just some of the reasons why it isn't recommended. It's great for smaller projects, and great for development.
It seems like with queuing you could also avoid a lot of the concurrent write problems with SQLite. Instead of writing directly to the sqlite db, you would write to a queue that in turn writes sequentially to the sqlite db in first-in, first-out order. I'm not sure whether your application gets to the point where this would be worth writing, or whether you should just move on to a client/server DB... but it's a thought.

In Memory Database

I'm using SqlServer to drive a WPF application. I'm currently using NHibernate and pre-read all the data so it's cached for performance reasons. That works for a single client app, but I was wondering if there's an in-memory database that I could use so I can share the information across multiple apps on the same machine. Ideally this would sit below my NHibernate stack, so my code wouldn't have to change. Effectively I'm looking to move my DB from its traditional format on the server to be an in-memory DB on the client.
Note I only need select functionality.
I would be incredibly surprised if you even need to load all your information in memory. I say this because, just as one example, I'm working on a Web app at the moment that (for various reasons) loads thousands of records on many pages. This is PHP + MySQL. And even so it can do it and render a page in well under 100ms.
Before you go down this route make sure that you have to. First make your database as performant as possible. Now obviously this includes things like having appropriate indexes and tuning your database, but even those are putting the cart before the horse.
First and foremost you need to make sure you have a good relational data model: one that lends itself to performant queries. This is as much art as it is science.
Also, you may like NHibernate but ORMs are not always the best choice. There are some corner cases, for example, that hand-coded SQL will be vastly superior in.
Now assuming you have a good data model and assuming you've then optimized your indexes and database parameters and then you've properly configured NHibernate, then and only then should you consider storing data in memory if and only if performance is still an issue.
To put this in perspective, the only times I've needed to do this are on systems that need to perform millions of transactions per day.
One reason to avoid in-memory caching is because it adds a lot of complexity. You have to deal with issues like cache expiry, independent updates to the underlying data store, whether you use synchronous or asynchronous updates, how you give the client a consistent (if not up-to-date) view of your data, how you deal with failover and replication and so on. There is a huge complexity cost to be paid.
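To give a feel for the expiry and staleness issues listed above, here is a small sketch of a time-to-live cache in Python. The `TTLCache` class and its `loader`/`clock` parameters are invented for this example; the clock is injectable only so the behaviour can be shown without real waiting.

```python
import time


class TTLCache:
    """Cache whose entries expire after a fixed number of seconds (illustrative)."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader      # function that fetches a value from the real store
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}         # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]       # still fresh as far as the cache knows
        value = self.loader(key)  # expired or missing: go back to the store
        self.entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop an entry when the application knows the store has changed."""
        self.entries.pop(key, None)


# A dict stands in for the database; a list holds the fake current time.
store = {"price": 10}
now = [0.0]
cache = TTLCache(loader=store.__getitem__, ttl_seconds=30, clock=lambda: now[0])

fresh = cache.get("price")      # loaded from the store
store["price"] = 12             # independent update the cache cannot see
stale = cache.get("price")      # old value served until the entry expires
now[0] = 31.0                   # 31 seconds later
refreshed = cache.get("price")  # entry expired, so the new value is loaded
```

Even this toy version serves stale data for up to 30 seconds after an outside update, which is exactly the kind of consistency question you take on once you add a cache.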
Assuming you've done all the above and you still need it, it sounds to me like what you need is a cache or grid solution. Here is an overview of Java grid/cluster solutions but many of them (eg Coherence, memcached) apply to .Net as well. Another choice for .Net is Velocity.
It needs to be pointed out and stressed that something like NHibernate is only consistent so long as nothing externally updates the database and that there is exactly one NHibernate-enabled process (barring clustered solutions). If two desktop apps on two different PCs are both updating the same database with NHibernate the caching simply won't work because the persistence units simply won't be aware of the changes the other is making.
http://www.db4o.com/ can be your friend!
Velocity is an out of process object caching server designed by Microsoft to do pretty much what you want although it's only in CTP form at the moment.
I believe there are also wrappers for memcached, which can also be used to cache objects.
You can use HANA, express edition. You can download it for free, it's in-memory, columnar and allows for further analytics capabilities such as text analytics, geospatial or predictive. You can also access with ODBC, JDBC, node.js hdb library, REST APIs among others.

Which embedded database capable of 100 million records has an efficient C or C++ API

I'm looking for a cross-platform database engine that can handle databases of up to hundreds of millions of records without severe degradation in query performance. It needs to have a C or C++ API which will allow easy, fast construction of records and parsing of returned data.
Highly discouraged are products where data has to be translated to and from strings just to get it into the database. The technical users storing things like IP addresses don't want or need this overhead. This is a very important criterion, so if you're going to refer to products, please be explicit about how they offer such a direct API. Not wishing to be rude, but I can use Google - please assume I've found most mainstream products and I'm asking because it's often hard to work out just what direct API they offer, rather than just a C wrapper around SQL.
It does not need to be an RDBMS - a simple ISAM record-oriented approach would be sufficient.
Whilst the primary need is for a single-user database, expansion to some kind of shared file or server operations is likely for future use.
Access to source code, either open source or via licensing, is highly desirable if the database comes from a small company. It must not be GPL or LGPL.
you might consider C-Tree by FairCom - tell 'em I sent you ;-)
i'm the author of hamsterdb.
tokyo cabinet and berkeleydb should work fine. hamsterdb definitely will work. It's a plain C API, open source, platform independent, very fast and tested with databases up to several hundreds of GB and hundreds of million items.
If you are willing to evaluate and need support then drop me a mail (contact form on hamsterdb.com) - i will help as good as i can!
bye
Christoph
You didn't mention what platform you are on, but if Windows only is OK, take a look at the Extensible Storage Engine (previously known as Jet Blue), the embedded ISAM table engine included in Windows 2000 and later. It's used for Active Directory, Exchange, and other internal components, optimized for a small number of large tables.
It has a C interface and supports binary data types natively. It supports indexes, transactions and uses a log to ensure atomicity and durability. There is no query language; you have to work with the tables and indexes directly yourself.
ESE doesn't like to open files over a network, and doesn't support sharing a database through file sharing. You're going to be hard pressed to find any database engine that supports sharing through file sharing. The Access Jet database engine (AKA Jet Red, totally separate code base) is the only one I know of, and it's notorious for corrupting files over the network, especially if they're large (>100 MB).
Whatever engine you use, you'll most likely have to implement the shared usage functions yourself in your own network server process or use a discrete database engine.
For anyone finding this page a few years later, I'm now using LevelDB with some scaffolding on top to add the multiple indexing necessary. In particular, it's a nice fit for embedded databases on iOS. I ended up writing a book about it! (Getting Started with LevelDB, from Packt in late 2013).
One option could be Firebird. It offers both a server based product, as well as an embedded product.
It is also open source and there are a large number of providers for all types of languages.
I believe what you are looking for is BerkeleyDB:
http://www.oracle.com/technology/products/berkeley-db/db/index.html
Never mind that it's Oracle, the license is free, and it's open-source -- the only catch is that if you redistribute your software that uses BerkeleyDB, you must make your source available as well -- or buy a license.
It does not provide SQL support, but rather direct lookups (via b-tree or hash-table structure, whichever makes more sense for your needs). It's extremely reliable, fast, ACID, has built-in replication support, and so on.
Here is a small quote from the page I refer to above, that lists a few features:
Data Storage
Berkeley DB stores data quickly and
easily without the overhead found in
other databases. Berkeley DB is a C
library that runs in the same process
as your application, avoiding the
interprocess communication delays of
using a remote database server. Shared
caches keep the most active data in
memory, avoiding costly disk access.
Local, in-process data storage
Schema-neutral, application native data format
Indexed and sequential retrieval (Btree, Queue, Recno, Hash)
Multiple processes per application and multiple threads per process
Fine grained and configurable locking for highly concurrent systems
Multi-version concurrency control (MVCC)
Support for secondary indexes
In-memory, on disk or both
Online Btree compaction
Online Btree disk space reclamation
Online abandoned lock removal
On disk data encryption (AES)
Records up to 4GB and tables up to 256TB
Update: Just ran across this project and thought of the question you posted:
http://tokyocabinet.sourceforge.net/index.html . It is under LGPL, so not compatible with your restrictions, but an interesting project to check out, nonetheless.
SQLite would meet those criteria, except for the eventual shared file scenario in the future (and actually it could probably do that too if the network file system implements file locks correctly).
Many good solutions (such as SQLite) have been mentioned. Let me add two, since you don't require SQL:
HamsterDB: fast, simple to use, can store arbitrary binary data. No provision for shared databases.
Glib HashTable module: seems quite interesting too and is very common, so you won't risk going into a dead end. On the other hand, I'm not sure there is an easy way to store the database on disk; it's mostly for in-memory stuff.
I've tested both on multi-million records projects.
As you are familiar with Fairtree, then you are probably also familiar with Raima RDM.
It went open source a few years ago, then dbstar claimed that they had somehow acquired the copyright. This seems debatable though. From reading the original Raima license, this does not seem possible. Of course it is possible to stay with the original code release. It is rather rare, but I have a copy archived away.
SQLite tends to be the first option. It doesn't store data as strings but I think you have to build a SQL command to do the insertion and that command will have some string building.
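On the string-building concern: the SQL text itself is a string, but it can be a fixed statement with `?` placeholders, and the values are bound as native types rather than formatted into it. The sketch below shows this with Python's standard `sqlite3` module, storing an IPv4 address as its 4 raw bytes in a BLOB column; in the C API, the corresponding calls are `sqlite3_prepare_v2()`, `sqlite3_bind_blob()` and `sqlite3_bind_int()`. The `seen` table is hypothetical.

```python
import ipaddress
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the address is stored as raw bytes, not as text.
conn.execute("CREATE TABLE seen (addr BLOB PRIMARY KEY, hits INTEGER NOT NULL)")

# 4 raw bytes for an IPv4 address; no dotted-quad string goes into the database.
packed = ipaddress.ip_address("192.0.2.10").packed

# The SQL text is constant; the values are bound as a blob and an integer.
conn.execute("INSERT INTO seen (addr, hits) VALUES (?, ?)", (packed, 7))

row = conn.execute(
    "SELECT addr, hits FROM seen WHERE addr = ?", (packed,)
).fetchone()
restored = ipaddress.ip_address(row[0])  # bytes come back unchanged
```

Whether the remaining cost of preparing and stepping a statement is acceptable is something only a benchmark on your own data can answer.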
BerkeleyDB is a well-engineered product if you don't need a relational DB. I have no idea what Oracle charges for it or whether you would need a license for your application.
Personally I would consider why you have some of your requirements. Have you done testing to verify the requirement that you need to do direct insertion into the database? Seems like you could take a couple of hours to write up a wrapper that converts from whatever API you want to SQL and then see if SQLite, MySQL,... meet your speed requirements.
There used to be a product called b-trieve but I'm not sure if source code was included. I think it has been discontinued. The only database engine I know of with an ISAM orientation is c-tree.

Resources