Using atom as in-memory database in ring website - database

I'm trying to build a very simple wiki-like system in Clojure and serving the http using Ring.
Instead of using a regular database, I was thinking about using just an atom and serialising it to a file whenever it changes. Something like https://github.com/alandipert/enduro, just with a delayed write.
Having the data in-mem in vectors and maps will surely make the service faster and the code simpler/more intuitive to write?
Will that work with a multithreaded Jetty/Ring server?
The content of the atom will surely fit in memory for now, but that might not hold true in the future. Any ideas on how I can structure the code to make it easier to switch to an alternative storage backend later?

This is the best guide for keeping data in memory and storing it to a single file: http://www.brandonbloom.name/blog/2013/06/26/slurp-and-spit/
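To make the "easy to swap the backend later" part concrete, here is a rough sketch of the idea in Python rather than Clojure (it is not taken from the linked guide). The names PageStore, InMemoryPageStore and render_page are made up for illustration. The handlers only talk to a small interface, and a lock plays the role the atom would play under a multithreaded server.

```python
import json
import threading
from pathlib import Path
from typing import Optional, Protocol


class PageStore(Protocol):
    """Hypothetical storage interface; request handlers depend only on this."""

    def get(self, title: str) -> Optional[str]: ...

    def put(self, title: str, body: str) -> None: ...


class InMemoryPageStore:
    """Keeps all pages in a dict and snapshots them to a single JSON file."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()  # guards _pages across server threads
        self._pages = {}
        if path.exists():
            self._pages = json.loads(path.read_text(encoding="utf-8"))

    def get(self, title: str) -> Optional[str]:
        with self._lock:
            return self._pages.get(title)

    def put(self, title: str, body: str) -> None:
        with self._lock:
            self._pages[title] = body

    def save(self) -> None:
        # Copy under the lock, then write outside it so readers are not blocked.
        with self._lock:
            snapshot = dict(self._pages)
        self._path.write_text(json.dumps(snapshot), encoding="utf-8")


def render_page(store: PageStore, title: str) -> str:
    """Example handler logic that never touches the storage details."""
    body = store.get(title)
    return body if body is not None else f"No page named {title!r} yet."
```

Switching to a real database later would then mean writing another class with the same get/put methods and passing it to the handlers instead.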

Datomic would give you a few options.
You could use the in-memory db which would give you query power and thread safety. It would also be very easy to switch to a persistent datastore if/when the time comes. However, I'm not sure about serialization of the in-memory db.
Or you could use Datomic just for Datalog, which can be used for querying data structures. In that case, you could use an atom and then serialize as planned. Moving to a persistent datastore would be more work than the first case, but still not much. In either case, most of your code wouldn't need to change.
In my opinion, you'd be better off just starting with the free version of Datomic that uses the file system as a datastore. I don't think using an atom simplifies your code very much.

I second the recommendation for Datomic.
I've been using it on a "real" project for a few weeks now, and the more I use it, the more I realize that it would be a solid foundation for handling your data in any non-trivial project. Even if you never plan to use a "real" database in the future, just having a fact-based data model, powerful querying, and even full-text search built in is a huge win over just using an atom to store some huge map.
I checked and the free version does give you local storage as well as the in-memory database, so that would solve your storage needs perfectly (it uses an H2 database behind the scenes). And if you ever find yourself needing to scale to something bigger, you're already set.

Related

Redis for cakePHP app

I want to start a big cakePHP project where performance will be an issue. I will have a users table with act as tree behavior and many financial data related to the users. This application will make a lot of dynamic reports aggregating data for different tree nodes etc.
Since there is on github an easy to use library which sets data source of model to redis, I was wondering if it's a good idea to use it for entire app? Is there anyone who has experience with it, and what could be potential problems if I decide to depend on redis as main/only data storage?
EDIT: I have installed redis and tried to use RedisModel for two models with a simple hasMany/belongsTo relation. When I tried to use those models like standard AppModels, it simply didn't work (Redis Error: Missing key). Apparently you can't use Model->find, Model->save, etc. in the standard way; you have to use the redis methods instead (setKeyValue etc.). This means that pagination and other CakePHP features will also not work. So maybe it is not the best idea to use RedisModel for all my models...
I cannot speak for CakePHP specifically, but I'll talk about redis in general and the points of your question in particular, it should be applicable to your framework of choice in the end. Let's see:
You mention you want to start an application where performance will be an issue. I just wanted to mention you should be careful with the assumption that you will need a NoSQL solution, because this is hard to assess beforehand. Redis is hella fast, but MySQL, for instance, has been proven capable of handling millions of records and operations just fine, provided it's properly configured and used, and it's much simpler if you need lots of relational structures.
Concerning Redis as the main and only data store:
Redis is perfectly stable for the job. Instagram reportedly stored 300 million key-value pairs pseudo-sharded using hashes to great effect, and while it's not the only data storage system they use, it goes to show redis is pretty reliable. This very site (Stack Overflow) also uses redis extensively for caching purposes.
Redis is also reported to have an overall excellent continuous uptime on average (which shouldn't be surprising considering the point above)
Options exist to mitigate downtime issues: replication is supported to some extent, and Redis Cluster is coming soon to support proper distributed approaches.
The main problem you could face is not understanding properly how its persistence works. You should absolutely read this and this article before you get started because this point is important. In a nutshell, redis does not write changes immediately to disk, which means that depending on your configuration, a crash can cause a data loss ranging from a few seconds to several minutes since the last disk write. This might or might not be a problem depending on your use case; if the data is extremely sensitive (i.e., financial records) you might want to think twice before jumping to redis, or build a system where redis is not exclusively used but rather combined with another storage system.
Relational structures in a non-relational data store like redis mean doing more work and often duplicating/denormalizing data. It can be done, but it's something to consider. In your question you mention you'll need to aggregate data to generate dynamic reports; are you sure you want to use redis for this? It sounds like a relational database would give you way more flexibility at a very small cost in performance. If you know in advance you'll need to run complex queries over your data, it could be a good idea not to reinvent the wheel unless you absolutely need to.
My advice here would be to first get a better feeling for what redis is and how it works, potentially build your own models instead of relying on others' so you better understand what can and cannot be done, and from there assess where you want to take it. Redis is reliable enough to be used standalone, but at the end of the day what's smart is to use the right tool for the right job, and you might find some parts of your app work well with redis while others are better off with a more traditional storage system.

Is it more efficient to initialize objects or read/write directly from/to the database?

I'm creating an app that uses a SQLite database to store its data. The application is similar to a task app, where it mostly just grabs and displays data and updates it when needed.
My question is whether it is worth it to initialize separate objects (which I would assume would happen when the app is loading), or whether it's better to simply read/write values from/to the database directly.
I've seen both methods used, but my intuition says it would be much cheaper to interface with the database directly, as there would be less memory overhead, and the values will have to be read from/saved to the database eventually anyway, but perhaps running queries every time data is loaded or updated would be slower than interacting with objects.
As for constraints, I am using a SQLite database over an ORM as I would like the code and data store to be as cross-platform as possible, and I haven't found any ORMs that interface with Python, (Obj-)C, and java, which are the target languages I'm using. If anyone has any suggestions that work with each of the languages, please let me know.
I think caching will help, especially for objects that will be needed often (like users - permissions).
You can use a pattern like this on the objects that you want stored in the cache, and later on easily switch this cache on or off at any time (for Java I recommend Ehcache):
getObject(key) {
    if (object present in cache) return object from cache;
    load object from database;
    store it in cache;
    return object;
}
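A runnable version of that pattern, as a minimal sketch in Python. ReadThroughCache and its load callback are illustrative names, not part of any library; load stands in for whatever query fetches the object from the database, and the enabled flag is the on/off switch.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Hypothetical helper: return cached objects, loading them on a miss."""

    def __init__(self, load: Callable[[K], V], enabled: bool = True) -> None:
        self._load = load        # stand-in for the real database lookup
        self.enabled = enabled   # flip to False to bypass the cache entirely
        self._items: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if self.enabled and key in self._items:
            return self._items[key]      # cache hit
        value = self._load(key)          # cache miss: go to the database
        if self.enabled:
            self._items[key] = value
        return value

    def invalidate(self, key: K) -> None:
        """Drop a cached entry, e.g. after the row was updated."""
        self._items.pop(key, None)
```

Calling invalidate after every write keeps the cached object from going stale; timing the same workload with enabled set to True and then False is a quick way to see whether the cache pays off.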
Mihai
Usually the cache will help, especially if you have a large number of concurrent accesses to your application.
Each time you have to go to the webserver you lose time (you need to go to another application, usually on a different server, and transport the data back and forth). Accessing a local object is much faster.
The easiest solution is to try this with cache on/off for a class like Users and see if it makes a difference.
If the database is not too big, I'd suggest you (create and) use the in-memory style of SQLite database; you can have a look at this just to begin with. Plus, after you are done with accessing/writing to the database in memory, you can always dump it to a file to be used later; loading and saving an SQLite database to memory and disk respectively is pretty straightforward using sqlite3_backup_init(), sqlite3_backup_step() and sqlite3_backup_finish() as is given here.
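Since Python is one of the target languages: its standard sqlite3 module exposes the same backup API as Connection.backup() (Python 3.7+), so the load/dump round trip can be sketched like this. The function names and the tasks table used to try it out are only examples.

```python
import sqlite3


def load_into_memory(path: str) -> sqlite3.Connection:
    """Copy an on-disk SQLite database into a fresh in-memory one."""
    disk = sqlite3.connect(path)
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)  # wraps sqlite3_backup_init/step/finish
    disk.close()
    return mem


def dump_to_disk(mem: sqlite3.Connection, path: str) -> None:
    """Write the current contents of the in-memory database to a file."""
    disk = sqlite3.connect(path)
    mem.backup(disk)
    disk.close()
```

Commit any pending changes on the in-memory connection before calling dump_to_disk, so the copy contains them.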
(I am not clear as to what functionality you require in an ORM.)

Recommend database in pure node.js with no dependencies?

I would like to know if a pure node.js web app can be developed, which means very simple deployment. From my understanding since node.js is good at i/o, a database in node.js should be good too. Does one exist? Especially one that lives in RAM and occasionally persists to disk.
First off, I don't see the problem in installing redis or mongodb. It can be done without any effort at all.
That said there are a number of such databases like:
ministore: save at specified intervals.
alfred: Reads are fast because indexes into files are kept in memory.
nStore: An index of all documents and their exact location on the disk is stored in memory for fast reads of any document.
jsonds: Jsonds is a 'data store' which is just a JSON object which is written to disk at a set frequency.
supermarket
chaos
node-dirty
node-tiny
nedb: Embedded pure JS database with MongoDB-compatible API.
Also, most of these products are very young and should probably not be used in production yet.
You could also code something yourself, I assume, using node-sqlite3 to store data back to disk.
If you want a database in Node that exists only in RAM, you could simply use JavaScript objects and arrays to contain your data. If you need something more powerful with queries that resemble SQL, then plain JavaScript objects may not be the best idea. With this approach you could also make the data persistent by flushing it to disk using JSON.stringify at a set interval.
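The "plain objects flushed at a set interval" idea is small enough to sketch. The version below is in Python rather than Node, with json.dumps standing in for JSON.stringify; JsonStore and its methods are made-up names. A dirty flag avoids rewriting the file when nothing changed.

```python
import json
import os
import threading


class JsonStore:
    """Hypothetical in-RAM key/value store persisted as one JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data = {}
        self._dirty = False
        self._lock = threading.Lock()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def set(self, key: str, value) -> None:
        with self._lock:
            self.data[key] = value
            self._dirty = True

    def flush(self) -> bool:
        """Write to disk only if something changed; report whether it did."""
        with self._lock:
            if not self._dirty:
                return False
            text = json.dumps(self.data)  # serialize a consistent snapshot
            self._dirty = False
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(text)
        return True

    def start_autoflush(self, interval_seconds: float) -> threading.Event:
        """Flush in the background; set the returned event to stop."""
        stop = threading.Event()

        def loop() -> None:
            # wait() returns False on timeout, True once stop is set
            while not stop.wait(interval_seconds):
                self.flush()

        threading.Thread(target=loop, daemon=True).start()
        return stop
```

Anything written after the last flush is lost if the process dies, so call flush() once more on shutdown.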
Try looking here: https://github.com/joyent/node/wiki/modules#database
Sorry for the short answer guys.

Database recommendation

I'm writing a CAD (Computer-Aided Design) application. I'll need to ship a library of 3d objects with this product. These are simple objects made up of nothing more than 3d coordinates and there are going to be no more than about 300 of them.
I'm considering using a relational database for this purpose. But given my simple needs, I don't want any thing complicated. Till now, I'm leaning towards SQLite. It's small, runs within the client process and is claimed to be fast. Besides I'm a poor guy and it's free.
But before I commit myself to SQLite, I just wish to ask your opinion whether it is a good choice given my requirements. Also is there any equivalent alternative that I should try as well before making a decision?
Edit:
I failed to mention earlier that the above-said CAD objects that I'll ship are not going to be immutable. I expect the user to edit them (change dimensions, colors etc.) and save back to the library. I also expect users to add their own newly-created objects. Kindly consider this in your answers.
(Thanks for the answers so far.)
The real thing to consider is what your program does with the data. Relational databases are designed to handle complex relationships between sets of data. However, they're not designed to perform complex calculations.
Also, the amount of data and relative simplicity of it suggests to me that you could simply use a flat file to store the coordinates and read them into memory when needed. This way you can design your data structures to more closely reflect how you're going to be using this data, rather than how you're going to store it.
Many languages provide a mechanism to write data structures to a file and read them back in again called serialization. Python's pickle is one such library, and I'm sure you can find one for whatever language you use. Basically, just design your classes or data structures as dictated by how they're used by your program and use one of these serialization libraries to populate the instances of that class or data structure.
edit: The requirement that the structures be mutable doesn't really affect much with regard to my answer - I still think that serialization and deserialization is the best solution to this problem. The fact that users need to be able to modify and save the structures necessitates a bit of planning to ensure that the files are updated completely and correctly, but ultimately I think you'll end up spending less time and effort with this approach than trying to marshall SQLite or another embedded database into doing this job for you.
The only case in which a database would be better is if you have a system where multiple users are interacting with and updating a central data repository, and for a case like that you'd be looking at a database server like MySQL, PostgreSQL, or SQL Server for both speed and concurrency.
You also commented that you're going to be using C# as your language. .NET has support for serialization built in so you should be good to go.
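For what the "updated completely and correctly" planning might look like, here is a minimal sketch using Python's pickle (the same shape applies with .NET serialization). save_library and load_library are illustrative names, and the shapes are plain dicts purely for the example. The library is written to a temporary file first and then swapped into place, so a crash mid-save leaves the old file intact.

```python
import os
import pickle
import tempfile


def save_library(shapes, path: str) -> None:
    """Serialize the whole library, replacing the old file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(shapes, f)
            f.flush()
            os.fsync(f.fileno())     # make sure the bytes reached the disk
        os.replace(tmp_path, path)   # atomic rename over the old file
    except Exception:
        os.unlink(tmp_path)          # do not leave a half-written temp file
        raise


def load_library(path: str):
    """Read the library back; only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With roughly 300 small objects, rewriting the whole file on every save is cheap, which is what keeps this simpler than an embedded database.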
I suggest you consider using H2; it's really lightweight and fast.
When you say you'll have a library of 300 3D objects, I'll assume you mean objects for your code, not models that users will create.
I've read that object databases are well suited to help with CAD problems, because they're perfect for chasing down long reference chains that are characteristic of complex models. Perhaps something like db4o would be useful in your context.
How many objects are you shipping? Can you define each of these Objects and their coordinates in an xml file? So basically use a distinct xml file for each object? You can place these xml files in a directory. This can be a simple structure.
I would not use a SQL database. You can easily describe every 3D object with an XML file. Put these files in a directory and pack (zip) them all together. If you need easy access to the metadata of the objects, you can generate an index file (with only the name or description) so that not all objects must be parsed and loaded into memory (nice if you have something like a library manager).
There are quick and easy SAX parsers available, and you can easily write an XML writer (or find some free code you can use for this).
Many similar applications use XML today. It's easy to parse and write, human readable, and doesn't need much space when zipped.
I have used SQLite; it's easy to use and easy to integrate with your own objects. But I would prefer a SQL database like SQLite for applications where you need good search tools for a huge number of data records.
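As an illustration of how little code one XML file per object takes, here is a sketch using Python's standard xml.etree.ElementTree (a tree parser rather than SAX, chosen for brevity). The shape and vertex element names and their attributes are an invented schema, not a standard format.

```python
import xml.etree.ElementTree as ET


def shape_to_xml(name, color, vertices) -> str:
    """Serialize one 3D object as a small XML document."""
    root = ET.Element("shape", {"name": name, "color": color})
    for x, y, z in vertices:
        # repr() keeps enough digits for floats to survive the round trip
        ET.SubElement(root, "vertex", {"x": repr(x), "y": repr(y), "z": repr(z)})
    return ET.tostring(root, encoding="unicode")


def shape_from_xml(text: str):
    """Parse the document back into (name, color, vertices)."""
    root = ET.fromstring(text)
    vertices = [
        (float(v.get("x")), float(v.get("y")), float(v.get("z")))
        for v in root.findall("vertex")
    ]
    return root.get("name"), root.get("color"), vertices
```

An index file could be built the same way by writing only the name attribute of each shape.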
For the specific requirement i.e. to provide a library of objects shipped with the application a database system is probably not the right answer.
First thing that springs to mind is that you probably want the file to be updatable, i.e. you need to be able to drop an updated file into the application without changing the rest of the application.
Second thing is that the data you're shipping is immutable - for this purpose therefore you don't need the capabilities of a relational db, just to be able to access a particular model with adequate efficiency.
For simplicity (sort of) an XML file would do nicely as you've got good structure. Using that as a basis you can then choose to compress it, encrypt it, embed it as a resource in an assembly (if one were playing in .NET) etc, etc.
Obviously, since SQLite stores its data in a single file per database, if you have other reasons to need the capabilities of a db in your storage system then yes, but I'd want to think about the utility of the db to the app as a whole first.
SQL Server CE is free, has a small footprint (no service running), and is SQL Server compatible

In Memory Database

I'm using SQL Server to drive a WPF application. I'm currently using NHibernate and pre-read all the data so it's cached for performance reasons. That works for a single client app, but I was wondering if there's an in-memory database that I could use so I can share the information across multiple apps on the same machine. Ideally this would sit below my NHibernate stack, so my code wouldn't have to change. Effectively I'm looking to move my DB from its traditional format on the server to be an in-memory DB on the client.
Note I only need select functionality.
I would be incredibly surprised if you even need to load all your information in memory. I say this because, just as one example, I'm working on a Web app at the moment that (for various reasons) loads thousands of records on many pages. This is PHP + MySQL. And even so it can do it and render a page in well under 100ms.
Before you go down this route, make sure that you have to. First make your database as performant as possible. Obviously this includes things like having appropriate indexes and tuning your database, but even those are putting the cart before the horse.
First and foremost you need to make sure you have a good relational data model: one that lends itself to performant queries. This is as much art as it is science.
Also, you may like NHibernate, but ORMs are not always the best choice. There are some corner cases, for example, in which hand-coded SQL will be vastly superior.
Now assuming you have a good data model and assuming you've then optimized your indexes and database parameters and then you've properly configured NHibernate, then and only then should you consider storing data in memory if and only if performance is still an issue.
To put this in perspective, the only times I've needed to do this are on systems that need to perform millions of transactions per day.
One reason to avoid in-memory caching is because it adds a lot of complexity. You have to deal with issues like cache expiry, independent updates to the underlying data store, whether you use synchronous or asynchronous updates, how you give the client a consistent (if not up-to-date) view of your data, how you deal with failover and replication and so on. There is a huge complexity cost to be paid.
Assuming you've done all the above and you still need it, it sounds to me like what you need is a cache or grid solution. Here is an overview of Java grid/cluster solutions but many of them (eg Coherence, memcached) apply to .Net as well. Another choice for .Net is Velocity.
It needs to be pointed out and stressed that something like NHibernate is only consistent so long as nothing externally updates the database and that there is exactly one NHibernate-enabled process (barring clustered solutions). If two desktop apps on two different PCs are both updating the same database with NHibernate the caching simply won't work because the persistence units simply won't be aware of the changes the other is making.
http://www.db4o.com/ can be your friend!
Velocity is an out of process object caching server designed by Microsoft to do pretty much what you want although it's only in CTP form at the moment.
I believe there are also wrappers for memcached, which can also be used to cache objects.
You can use HANA, express edition. You can download it for free, it's in-memory, columnar and allows for further analytics capabilities such as text analytics, geospatial or predictive. You can also access with ODBC, JDBC, node.js hdb library, REST APIs among others.
