Currently I am working on failover support for an existing application.
The application uses Postgres to store data but does not use any special features (views, triggers, etc.). The database is more of a configuration store than real data storage. When the application starts, it loads the data into memory and goes back to the database only when the configuration changes, which is seldom. Setting up a Postgres failover solution for this simple task feels like overkill.
Is there any lightweight database that has built-in failover support and is simple to configure and use in production? Most of my data model is a single table, and there are about 5 transactions per minute.
BerkeleyDB is a very simple key/value store. It is probably perfectly adequate for your application, and it has support for hot failover.
I am building a mobile app with the following business requirements:
Db to be stored locally on the device for use when disconnected from the cloud.
A NoSQL type store is required to provide for future changes without requiring complex db rebuild and data migration.
Utilises a SQL query language for simple programming.
Run on all target platforms - Windows, Android, iOS
No central database server - data to be synchronised by matching two local copies of the db file.
I have examined a lot of dbs for mobile and none provide all these features except Couchbase Lite 2.1 Enterprise Edition. The downside of that is that the EE license might be price prohibitive in my use case.
[EDIT: yes, the EE license is USD$35K for <= 1000 devices, so that option is out for me, sadly.]
Are there any other such products out there that someone could point me to?
The client-side synchronization of local databases done by Couchbase Lite is a way to replicate data from one mobile device to another. It is a limited feature, though, because it works peer-to-peer (P2P). Take BitTorrent as an example, the fastest and most effective P2P protocol: it still has flaws, including the risk of data corruption and partial data loss. A P2P synchronization would only be safe when running between two distinct applications on the same mobile device.
If both databases are on the same mobile device and managed by the same application, it would be much simpler. You could do the synchronization yourself by reading data from one and saving it in the other, dealing with conflicts if needed.
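A do-it-yourself synchronization of two local stores could look roughly like the sketch below. It is only an illustration under assumptions: the stores are plain dictionaries, `Record`, `merge_stores` and `sync` are hypothetical names, and conflicts are resolved with a simple "newest timestamp wins" rule, which is just one of several possible policies.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Record:
    value: str
    updated_at: int  # hypothetical logical timestamp; larger means newer


def merge_stores(a: Dict[str, Record], b: Dict[str, Record]) -> Dict[str, Record]:
    """Return the union of both stores; on a conflict the newer record wins."""
    merged = dict(a)
    for key, rec in b.items():
        current = merged.get(key)
        if current is None or rec.updated_at > current.updated_at:
            merged[key] = rec
    return merged


def sync(a: Dict[str, Record], b: Dict[str, Record]) -> None:
    """Bring both stores to the same merged state, in place."""
    merged = merge_stores(a, b)
    for store in (a, b):
        store.clear()
        store.update(merged)


if __name__ == "__main__":
    left = {"theme": Record("dark", 1)}
    right = {"theme": Record("light", 2), "lang": Record("en", 1)}
    sync(left, right)
    print(left == right, left["theme"].value)
```

A real application would also need to handle deletions (for example with tombstone records), which this sketch leaves out.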
I'm curious, why is it a requirement not to have a central database server? With one, you can fine-tune what data is shared and between which users it is shared. Here is how it works:
In the server-side user registry, each user is assigned a list of channel names. At the same time, each JSON document added or updated is also linked to a list of channel names. For every (user, document) pair with at least one channel name in common, the server allows push/pull replications to occur.
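The channel rule described above boils down to a set intersection. The following is a plain-Python model of that rule only, not the actual server API; `can_replicate`, `visible_docs` and the sample channel names are made up for illustration.

```python
from typing import Dict, Iterable, List


def can_replicate(user_channels: Iterable[str], doc_channels: Iterable[str]) -> bool:
    """True when the user and the document share at least one channel name."""
    return not set(user_channels).isdisjoint(doc_channels)


def visible_docs(user_channels: Iterable[str], docs: Dict[str, List[str]]) -> List[str]:
    """IDs of the documents this user would be allowed to push/pull."""
    channels = set(user_channels)
    return [doc_id for doc_id, doc_channels in docs.items()
            if can_replicate(channels, doc_channels)]


if __name__ == "__main__":
    docs = {
        "doc1": ["sales"],
        "doc2": ["hr", "sales"],
        "doc3": ["ops"],
    }
    print(visible_docs(["sales"], docs))
```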
Good luck !
We're developing a new version of our site using Node, but we need to continue using a legacy mysql database as-is yet also add new fields to some models via new tables in a new database, AND add a caching layer.
What's the best way to do this? We were thinking of using Jugglingdb and writing our own adapter. It would need to do several things:
round-robin select from several servers in our db herd.
cache into Redis for read-only connections
know which fields are in the legacy database and which are in the new database.
connect to databases for CRUD connections.
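To make the requirements above concrete, here is a language-neutral sketch of the read path, written in Python rather than as a real jugglingdb adapter. Everything in it is an assumption for illustration: the replicas are plain callables, a dictionary stands in for Redis, and the `LEGACY_FIELDS` / `NEW_FIELDS` sets are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical field split between the legacy and the new database.
LEGACY_FIELDS = {"id", "name", "email"}
NEW_FIELDS = {"avatar_url", "bio"}


class ReadRouter:
    """Cache-aside reads, with cache misses spread round-robin over replicas."""

    def __init__(self, replicas: List[Callable[[str], dict]],
                 cache: Optional[Dict[str, dict]] = None):
        self._replicas = itertools.cycle(replicas)
        # A dict stands in for Redis here; pass your own mapping if needed.
        self._cache = cache if cache is not None else {}

    def get(self, key: str) -> dict:
        if key in self._cache:           # cache hit: no database touched
            return self._cache[key]
        replica = next(self._replicas)   # next server in the herd
        row = replica(key)
        self._cache[key] = row
        return row

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read goes back to a database."""
        self._cache.pop(key, None)


def split_fields(record: dict) -> Tuple[dict, dict]:
    """Split one logical record into its legacy-db part and new-db part."""
    legacy = {k: v for k, v in record.items() if k in LEGACY_FIELDS}
    new = {k: v for k, v in record.items() if k in NEW_FIELDS}
    return legacy, new
```

Writes would go to the primary of whichever database owns each field and then call `invalidate`, so the databases are only touched on changes or cache misses.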
Is this something theoretically doable using a jugglingdb adapter? Or does anyone have other recommendations using another better technique and/or a completely different ORM package?
There's an adapter, jugglingdb-redis-hq, that has a "backyard" feature that is almost what we want, except that it seems to basically be for a sort of backwards caching, i.e. making a persistent copy of expired data in redis over to the database. We don't want to touch the database read/write unless we're changing or inserting something.
Amazing that it's been 3 years since I posted this question. What we ended up doing, and we're finally almost live with this, is this stack:
nodejs (of course)
hapijs for backend framework
Sequelize ORM to talk to mysql (Sequelize has built-in connection pooling!)
Redis for caching
graphql api using graphql-sequelize module
wrote a service layer under hapi application layer to make queries to graphql api
Crucially, Sequelize did not make it easy to have connections to 2 different databases, so we decided to only add new tables to the old schema and not make any changes to the old tables. We've since ended up making a couple of minor ALTER TABLEs when we really had to. I am still curious whether we could have done this part another way, and whether another ORM would have let us more easily meld the 2 databases under the hood.
I am working on a project where we are scoping out the specs for an interface to the backend systems of multiple wholesalers. Here is what we are working with:
Each wholesaler has multiple products, upwards of 10,000. And each wholesaler has customized prices for their products.
The list of wholesalers being accessed will keep growing in the future, so potentially 1000s of wholesalers could be accessed by the system.
Wholesalers are geographically dispersed.
The interface to this system will allow the user to select the wholesaler they wish and browse their products.
Product price updates should be reflected on the site in real time. So, if the wholesaler updates the price it should immediately be available on the site.
System should be database agnostic.
The system should be easy to setup on the wholesalers end, and be minimally intrusive in their daily activities.
Initially, I thought about creating databases for each wholesaler on our end, but with potentially 1000s of wholesalers in the future, is this the best option as far as performance and storage are concerned?
Would it be better to query the wholesalers' databases directly instead of storing their data locally? Can we do this and still remain database agnostic?
What would be the best technology stack for such an implementation? I need some kind of ORM tool.
Java based frameworks and technologies preferred.
Thanks.
If you want to create software that can switch databases, I would suggest using Hibernate (or NHibernate if you use .Net).
Hibernate is an ORM that is not tied to a specific database, which allows you to switch the DB very easily. It is already proven in large applications and well integrated with the Spring framework (but can be used without Spring, too). (Spring.net is the equivalent if using .Net.)
Spring is a good technology stack to build large scalable applications (contains IoC-Container, Database access layer, transaction management, supports AOP and much more).
Wiki gives you a short overview:
http://en.wikipedia.org/wiki/Hibernate_(Java)
http://en.wikipedia.org/wiki/Spring_Framework
Would it be better to query the wholesalers database directly instead of storing their data locally?
This depends on the availability and latency of access to the remote data. Databases themselves offer several ways to keep multiple server instances in sync. Ask yourself what should happen if a wholesaler database goes (partly) offline. Maybe not all data needs to be duplicated.
Can we do this and still remain database agnostic?
Yes, see my answer related to the ORM (N)Hibernate.
What would be best technology stack for such an implementation?
"Best" depends on your requirements. I like Spring. If you go with .Net, the built-in ADO.NET Entity Framework might be a fit, too.
Note: I have investigated CouchDB for some time and need to hear about some actual experiences.
I have an Oracle database for a fleet tracking service, and some stats are:
100 GB db
Huge insertion/sec (our received messages)
Reliable replication (via Oracle streams on 4 servers)
Heavy complex queries.
Now the question: Can CouchDB be used in this case?
Note: Why I thought of CouchDB?
I have read about its ability to scale horizontally very well. That's very important in our case.
Since it's schema free we can handle changes more properly since we have a lot of changes in different tables and stored procedures.
Thanks
Edit I:
I need transactions too, but I can tolerate other solutions. And if there is a little delay in replication, that would be no problem IF delivery is guaranteed.
You are enjoying the following features with your database:
Using it in production
The data is naturally relational (related to itself)
Huge insertion rate (no MVCC concerns)
Complex queries
Transactions
These are all reasons not to switch to CouchDB.
Of course, the story is not so simple. I think you have discovered what many people never learn: complex problems require complex solutions. We cannot simply replace our database and take the rest of the month off. Sure, CouchDB (and BigCouch) supports excellent horizontal scaling (and cross-datacenter replication too!) but the cost will be rewriting a production application. That is not right.
So, where can CouchDB benefit you?
I suggest that you begin augmenting your application with CouchDB applications. Deploy CouchDB, import your data into it, and build non mission-critical applications. See where it fits best.
For your project, these are the key CouchDB strengths:
It is a small, simple tool—easy for you to set up on a workstation or server
It is a web server. It integrates very well with your infrastructure and security policies.
For example, if you have a flexible policy, just set it up on your LAN
If you have a strict network and firewall policy, you can set it up behind a VPN, or with your SSL certificates
With that step done, it is now very easy to access. Just make http or https requests. Whether you are importing data from Oracle with a custom tool, or using your web browser, it's all the same.
Yes! CouchDB is an app server too! It has a built-in administrative app, to explore data, change the config, etc. (like a built-in phpmyadmin). But for you, the value will be building admin applications and reports as simple, traditional HTML/Javascript/CSS applications. You can get as fancy or as simple as you like.
As your project grows and becomes valuable, you are in a great position to grow, using replication
Either expand the core with larger CouchDB clusters
Or, replicate your data and applications into different data centers, or onto individual workstations, or mobile phones, etc. (The strategy will be more obvious when the time comes.)
CouchDB gives you a simple web server and web site. It gives you a built-in web services API to your data. It makes it easy to build web apps. Therefore, CouchDB seems ideal for extending your core application, not replacing it.
I don't agree with this answer.
I think CouchDB suits the fleet tracking use case especially well, due to its distributed nature. Moreover, the unreliable nature of the GPRS connections used for transmitting position data makes the offline-first paradigm of couchapps the perfect partner for your application.
For uploading data from the truck, the insertion rate can benefit greatly from CouchDB replication and bulk inserts, especially if performed on SSD-based CouchDB hosting.
For downloading data to the truck, CouchDB provides filtered replication, allowing each truck to download only the data it really needs, instead of the whole database.
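The idea behind filtered replication can be modelled in a few lines. In CouchDB itself the filter is a JavaScript function stored in a design document; the sketch below is only a Python model of the concept, and the `truck_id` and `broadcast` fields are hypothetical.

```python
from typing import Iterable, List


def truck_filter(doc: dict, truck_id: str) -> bool:
    """Keep documents addressed to this truck, plus fleet-wide broadcasts."""
    return doc.get("truck_id") == truck_id or bool(doc.get("broadcast", False))


def replicate_filtered(source: Iterable[dict], truck_id: str) -> List[dict]:
    """Return only the documents that would be sent to the given truck."""
    return [doc for doc in source if truck_filter(doc, truck_id)]


if __name__ == "__main__":
    central = [
        {"_id": "route-1", "truck_id": "T1"},
        {"_id": "route-2", "truck_id": "T2"},
        {"_id": "notice-1", "broadcast": True},
    ]
    print([d["_id"] for d in replicate_filtered(central, "T1")])
```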
Regarding complex queries, NoSQL databases are more flexible and can perform much faster than relational databases. It's only a matter of structuring and querying your data reasonably.
We have 2 server clusters: the first is made up of typical web applications backed by SQL databases. The second are highly optimized multiplayer game servers which keep all data in memory. Both clusters communicate with clients via HTTP (Ajax with JSON). There are a few cases in which we need to share data between the two server types, for example, reporting back and storing the results of a game (should ultimately end up in the database).
We're considering several approaches for inter-server communication:
Just share the MySQL databases between clusters (introduce SQL to the game servers)
Sharing data in a distributed key-value store like Memcache, Redis, etc.
Use an RPC technology like Google ProtoBufs or Apache Thrift
Using RESTful web services (the game server would POST back to the web servers, for example)
At the moment, we're leaning towards web services or just sharing the database. Sharing the database seems easy, but we're concerned this adds extra memory and a new dependency into the game servers. Web services provide good separation of concerns and fit with the existing Ajax we use, but add complexity, overhead and many more ways for communication to fail.
Are there any other good reasons not to use one or the other approach? Which would be easier to scale?
Sharing the DB brings the obvious drawback of not having one unit in control of the data going into the DB. This can be a big hassle, which is why I would recommend building an application layer.
If this application layer is what your web applications form, then I see nothing wrong with implementing client-server communication between the game servers and the web apps. Let the game servers push data to the application layer and have them subscribe to updates. This is a good fit to a message queueing system, but you could get away with building your own REST-based system for instance, if this fits better with your current architecture.
If the web apps do not form the application layer, I would suggest introducing such a layer by writing a small app which hides the specifics of the storage. Each side gets a handle to the app interface and writes its data to it.
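As a rough, in-process illustration of such a layer, consider the sketch below. The class and method names are hypothetical, a dictionary stands in for the real storage, and in practice the calls would go over a message queue or a REST interface rather than direct method calls.

```python
from typing import Callable, Dict, List, Optional


class ResultsService:
    """Single point of control for game results; hides the storage details."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}  # stand-in for the real database
        self._subscribers: List[Callable[[str, dict], None]] = []

    def subscribe(self, callback: Callable[[str, dict], None]) -> None:
        """Register a callback invoked for every saved result."""
        self._subscribers.append(callback)

    def save_result(self, game_id: str, result: dict) -> None:
        """Called by a game server to push a finished game's result."""
        self._store[game_id] = dict(result)
        for callback in self._subscribers:
            callback(game_id, result)

    def get_result(self, game_id: str) -> Optional[dict]:
        """Called by the web side to read a stored result."""
        return self._store.get(game_id)


if __name__ == "__main__":
    service = ResultsService()
    service.subscribe(lambda gid, res: print("update:", gid, res))
    service.save_result("game-42", {"winner": "alice", "score": 10})
    print(service.get_result("game-42"))
```

Because both sides only see `save_result`, `get_result` and `subscribe`, the storage behind the layer can later be swapped without touching the game servers or the web apps.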
In order to share the data between the two systems, the application layer could then use a distributed DB, like mnesia, or implement a multi-level cache system with replication. The simplest version of this would be time-triggered replication with for instance MySQL as you mention. Other options are message queues, replicated memory (Terracotta) and/or replicated caches (memcached), although these do not provide persistent storage.
I'd also suggest looking at Redis as a data store and nodered for distributed pub-sub.
Although Redis is an in-memory K/V store, the latest version has VM support where keys are kept in memory, but values may be swapped out as memory pressure hits a configurable threshold. It also has simple master-slave replication and publish-subscribe built in.
NodeRed is built on node.js which is a scalable and ridiculously fast server-side js engine.