ZooKeeper distributed system design

I have an interesting distributed systems problem and am not sure whether ZooKeeper will solve it. My enterprise setup has two applications integrated through REST/web services, say North and South. Multiple North instances are created and pooled for interaction with South. Each North instance is a REST client that invokes the REST APIs exposed by South; South also generates a lot of events, which the North instances subscribe to and process. North is computationally heavy, writes its data into a shared in-memory datastore such as Hazelcast, and has many associated applications connected to it for processing the responses received from South. For this reason the North instances are split into a distributed system. Under load (especially events from South), additional North instances are created (is there a framework that takes care of this?). Given a pool of nodes, can I split the North instances across those nodes and then use ZooKeeper for HA, fault tolerance and synchronization?

Why are you not using Hazelcast for everything? Fault tolerance and synchronization are its key features.


Mobile database with client-side synchronisation of local databases required

I am building a mobile app with the following business requirements:
Db to be stored locally on the device for use when disconnected from the cloud.
A NoSQL type store is required to provide for future changes without requiring complex db rebuild and data migration.
Utilises a SQL query language for simple programming.
Run on all target platforms - Windows, Android, iOS
No central database server - data to be synchronised by matching two local copies of the db file.
I have examined a lot of dbs for mobile and none provide all these features except Couchbase Lite 2.1 Enterprise Edition. The downside of that is that the EE license might be price prohibitive in my use case.
[EDIT: yes, the EE license is USD$35K for <= 1000 devices, so that option is out for me, sadly.]
Are there any other such products out there that someone could point me to?
The client-side synchronization of local databases done by Couchbase Lite is a way to replicate data from one mobile device to another. It is a limited feature, though, because it works peer-to-peer (P2P). Take BitTorrent as an example, the fastest and most effective P2P protocol: it still has flaws, such as the risk of data corruption and partial data loss. A P2P synchronization would only be safe when running between two distinct applications on the same mobile device.
If both databases are on the same mobile device and managed by the same application, it would be much simpler. You could do the synchronization yourself by reading data from one and saving it in the other, dealing with conflicts if needed.
I'm curious, why is it a requirement not to have a central database server? With a server you can fine-tune what data is shared and between which users it is shared. Here is how it works:
In the server-side user registry, each user is assigned a list of channel names. At the same time, each JSON document added or updated is also linked to a list of channel names. For every user/document pair with at least one channel name in common, the server allows push/pull replications to occur.
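To make the channel rule concrete, here is a minimal sketch of the matching logic only. It is not the actual server implementation or API; the user names, document IDs and helper functions are hypothetical and exist just to show the "at least one channel in common" check.

```python
# Hypothetical registry data: channel lists per user and per document.
users = {
    "alice": ["sales", "managers"],
    "bob": ["support"],
}
documents = {
    "order-17": ["sales"],
    "ticket-4": ["support", "managers"],
}


def can_replicate(user_channels, doc_channels):
    """True when the user and the document share at least one channel name."""
    return not set(user_channels).isdisjoint(doc_channels)


def visible_documents(user):
    """IDs of all documents the given user would be allowed to push/pull."""
    return sorted(
        doc_id
        for doc_id, channels in documents.items()
        if can_replicate(users[user], channels)
    )
```

With this data, alice would see both documents (through "sales" and "managers"), while bob would only see ticket-4.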
Good luck !

Where should I access my database?

I'm curious how you would handle the following database access scenario.
Suppose you have a computer that hosts your database as part of its server work, and multiple client PCs running client-side software that needs to get information from this database.
AFAIK there are 2 ways to do this:
each client-side application connects directly to the database
each client-side application connects to server-side software, which connects to the database as some sort of data access layer.
So what I would like to know is:
What are the pros and cons of each solution?
And are there other solutions out there that may be "better" for this work?
I would DEFINITELY go with suggestion number 2. No client application should talk to a datastore without a broker, i.e.:
ClientApp -> WebApi -> DatabaseBroker.class -> MySQL
This is the sound way to do it, as you separate concerns and define an organized path to the datastore.
Some benefits are:
it decouples the client from the database
you can centralize all upgrades, additions and operability in one location (DatabaseBroker.class) for all clients
it's very scalable
it's safe with regard to business logic
Think of it like this, with this layman's example:
Marines are not allowed to bring their own weapons to battle (client apps talking directly to the DB). Instead they check out a weapon from the armory (the API). The armory has control over all weapons, repairs and upgrades (data from the database) and determines who gets what.
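A rough sketch of that ClientApp -> WebApi -> DatabaseBroker -> database chain might look like the following. Everything here is an assumption made for illustration: the customers table, the method names, and the use of an in-memory SQLite database standing in for MySQL so the example runs on its own.

```python
import sqlite3


class DatabaseBroker:
    """The only component that knows the schema and the SQL (hypothetical)."""

    def __init__(self, connection):
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add_customer(self, name):
        # Business rules live here, not in every client.
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("name must not be empty")
        cursor = self._conn.execute(
            "INSERT INTO customers (name) VALUES (?)", (cleaned,)
        )
        self._conn.commit()
        return cursor.lastrowid

    def get_customer(self, customer_id):
        row = self._conn.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}


class WebApi:
    """What clients talk to; returns (status code, body) pairs."""

    def __init__(self, broker):
        self._broker = broker

    def handle_create(self, payload):
        try:
            new_id = self._broker.add_customer(payload.get("name", ""))
        except ValueError as error:
            return 400, {"error": str(error)}
        return 201, {"id": new_id}

    def handle_get(self, customer_id):
        customer = self._broker.get_customer(customer_id)
        if customer is None:
            return 404, {"error": "not found"}
        return 200, customer


# SQLite in memory is a stand-in for the real MySQL connection.
api = WebApi(DatabaseBroker(sqlite3.connect(":memory:")))
```

A client only ever calls something like api.handle_create({"name": "Ada"}); swapping the storage engine or changing the schema then touches DatabaseBroker alone.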
What you have described sounds like two different kinds of multitier architecture.
The first option matches a two-tier architecture and the second could be a three-tier one.
AFAIK there are 2 ways to do this
You can divide your application into several physical tiers, so you will find more cases that fit this (n-tier) architecture than the two described above.
What are the pros and cons of each solution?
Usually the motivation for splitting your application into tiers is to meet some non-functional requirement (maintainability, availability, security, etc.). The problem is that when you add extra tiers you also add complexity, e.g. your app components need to communicate with each other, and this is more difficult when they are distributed among several machines.
And are there other solutions out there that may be "better" for this work?
I'm not sure what you mean by "work" here, but notice that you don't need to add extra tiers to access a database. If you have a desktop application installed on a few machines, a classic client/server (two-tier) model should be enough. However, a web-based application needs an extra tier for interacting with the browser. In this case the database access is not the motivation for adding this extra tier.

How is singleton code+data handled in scale-out architectures?

This is more of a conceptual question, but answers specific to open-source products (JBoss, etc.) are also welcome.
If my enterprise app needs to scale and I want to choose the scale-out model (instead of the scale-up model), how would the multiple app server instances preserve the singleton semantics of a piece of code/data?
Example: Let's say, I have an ID-generation class whose logic demands that it be instantiated as a singleton. This class may or may not talk to the underlying database. Now, how would I ensure that the singleton semantics of this class are maintained when scaling out?
Secondly, is there a book or an online resource that both lists such conceptual issues and suggests solutions?
EDIT: In general, how would one handle generic application state in the app server layer to allow the application to scale out? What design patterns, software components/products, etc. should I be exploring further?
The further you scale out, the less able you are going to be to manage global static state atomically. In other words, if you have 100 servers that need to share state (knowing which ID is next in an ID-generating singleton class), then there is no technology I know of that will quickly and atomically get that ID for you.
For ID generation, data has to travel from machine to machine.
There are a few options I can think of for the scenario you mentioned:
Wait for all machines to catch up/sync before accepting a new ID. You could generate the ID locally and then check that it's good across other machines - or - run a job to get the next ID across all machines (think map/reduce).
Think sharding. With sharding you can generate IDs "locally" and be guaranteed to have uniqueness. So if you had 100 machines, machines 1-10 are for users in California, machines 11-20 are for users in New York, etc. Picking a sharding key can be tough.
Start looking at messaging systems. You would create/modify your object locally on a machine and then send the result to a service bus/messaging system; the other machines subscribe to a topic/queue and can get the object and process it.
Pick a horizontally scalable database to manage objects. They've already solved the issues of syncing and replication.
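As one possible illustration of the sharding option, each node can be given its own slice of the ID space so that no coordination is needed at generation time. This is only a sketch under assumptions of mine: the class name, the 40-bit counter width and the 1024-node limit are made up, and the counter restarts at zero on every process start, so a real system would also have to persist it or mix in a timestamp.

```python
import itertools
import threading


class ShardedIdGenerator:
    """Per-node generator: node id in the high bits, local counter in the low bits."""

    COUNTER_BITS = 40   # assumption: room for about 10**12 IDs per node
    MAX_NODES = 1024    # assumption: node ids 0..1023

    def __init__(self, node_id):
        if not 0 <= node_id < self.MAX_NODES:
            raise ValueError("node_id out of range")
        self._node_id = node_id
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def next_id(self):
        # The lock keeps the generator a well-behaved singleton within one node.
        with self._lock:
            local = next(self._counter)
        if local >= 1 << self.COUNTER_BITS:
            raise OverflowError("local counter exhausted")
        return (self._node_id << self.COUNTER_BITS) | local
```

Two nodes created as ShardedIdGenerator(1) and ShardedIdGenerator(2) can never produce the same value, because their high bits differ; within a node the class still behaves like the original singleton.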

Web services or shared database for (game) server communication?

We have 2 server clusters: the first is made up of typical web applications backed by SQL databases. The second consists of highly optimized multiplayer game servers that keep all data in memory. Both clusters communicate with clients via HTTP (Ajax with JSON). There are a few cases in which we need to share data between the two server types, for example, reporting back and storing the results of a game (which should ultimately end up in the database).
We're considering several approaches for inter-server communication:
Just share the MySQL databases between clusters (introduce SQL to the game servers)
Share data in a distributed key-value store like Memcached, Redis, etc.
Use an RPC technology like Google Protocol Buffers or Apache Thrift
Use RESTful web services (the game server would POST back to the web servers, for example)
At the moment, we're leaning towards web services or just sharing the database. Sharing the database seems easy, but we're concerned this adds extra memory and a new dependency into the game servers. Web services provide good separation of concerns and fit with the existing Ajax we use, but add complexity, overhead and many more ways for communication to fail.
Are there any other good reasons not to use one or the other approach? Which would be easier to scale?
Sharing the DB brings the obvious drawback of not having one unit in control of the data going into the DB. This can be a big hassle, which is why I would recommend building an application layer.
If this application layer is what your web applications form, then I see nothing wrong with implementing client-server communication between the game servers and the web apps. Let the game servers push data to the application layer and have them subscribe to updates. This is a good fit for a message queueing system, but you could get away with building your own REST-based system, for instance, if this fits better with your current architecture.
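A very rough sketch of the "game servers push, application layer stores" flow could look like this. It is not a recommendation of any particular broker: a Python queue.Queue stands in for the message queue, an in-memory SQLite database stands in for MySQL, and the game_results table and function names are invented for the example.

```python
import queue
import sqlite3
import threading

results = queue.Queue()   # stand-in for a real message queue
_STOP = object()          # sentinel telling the worker to shut down


def game_server_report(game_id, winner, score):
    """Game-server side: fire and forget, no SQL and no DB driver needed."""
    results.put({"game_id": game_id, "winner": winner, "score": score})


def application_layer_worker(conn):
    """Application-layer side: the single writer to the database."""
    while True:
        message = results.get()
        if message is _STOP:
            break
        conn.execute(
            "INSERT INTO game_results (game_id, winner, score) VALUES (?, ?, ?)",
            (message["game_id"], message["winner"], message["score"]),
        )
        conn.commit()


def run_demo():
    # check_same_thread=False because the worker thread does the writes;
    # the main thread only reads after the worker has been joined.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute(
        "CREATE TABLE game_results "
        "(game_id TEXT PRIMARY KEY, winner TEXT, score INTEGER)"
    )
    worker = threading.Thread(target=application_layer_worker, args=(conn,))
    worker.start()
    game_server_report("g1", "alice", 42)
    game_server_report("g2", "bob", 17)
    results.put(_STOP)
    worker.join()
    return conn.execute(
        "SELECT game_id, winner, score FROM game_results ORDER BY game_id"
    ).fetchall()
```

The point of the shape is that the game servers only know how to enqueue a small message; everything about storage stays on the other side of the queue.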
If the web apps do not form the application layer, I would suggest introducing such a layer by writing a small app which hides the specifics of the storage. Each side gets a handle to the app interface and writes its data to it.
In order to share the data between the two systems, the application layer could then use a distributed DB, like Mnesia, or implement a multi-level cache system with replication. The simplest version of this would be time-triggered replication with, for instance, MySQL as you mention. Other options are message queues, replicated memory (Terracotta) and/or replicated caches (memcached), although these do not provide persistent storage.
I'd also suggest looking at Redis as a data store and nodered for distributed pub-sub.
Although Redis is an in-memory K/V store, the latest version has VM support where keys are kept in memory, but values may be swapped out as memory pressure hits a configurable threshold. It also has simple master-slave replication and publish-subscribe built in.
NodeRed is built on node.js which is a scalable and ridiculously fast server-side js engine.

At what level should I implement communication between nodes on a distributed system?

I'm building a web application that from day one will be at the limits of what a single server can handle. So I'm considering adopting a distributed architecture with several identical nodes. The goal is to provide scalability (add servers to accommodate more users) and fault tolerance. The nodes need to share some state, so some communication between them is required. I believe I have the following alternatives for implementing this communication in Java:
Implement it using sockets and a custom protocol.
Use web services (each node can send and receive/parse HTTP requests).
Use another high-level framework like Terracotta or Hazelcast.
I would like to know how these technologies compare to each other:
When the number of nodes increases
When the amount of communication between the nodes increases (1000s of messages per second and/or messages up to 100KB etc)
On a practical level (eg ease of implementation, available documentation, license issues etc)
I'm also interested to know which technologies people are using in real production projects (as opposed to experimental or academic ones).
Don't forget Jini.
It gives you automatic service discovery, service leasing, and downloadable proxies so that the actual client/server communication protocol is up to you and not enforced by the framework (e.g. you can choose HTTP/RMI/whatever).
The framework is built around acknowledgement of the 8 Fallacies of Distributed Computing and around recovery-oriented computing, i.e. you will have network problems, and the architecture is built to help you recover and maintain a service.
If you also use JavaSpaces, it's trivial to implement workflows and producer-consumer architectures. Producers write into the space, and one or more consumers take that work from the space (under a transaction) and work with it. So you scale simply by providing more consumers.
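To show the write/take pattern in miniature, here is an in-process analogy. It is not the JavaSpaces API and has no transactions, leases or networking; the Space class, its dictionary entries and the template matching are simplifications of my own, meant only to illustrate how adding consumers scales the work.

```python
import threading


class Space:
    """Toy stand-in for a tuple space: write entries, take the first match."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def write(self, entry):
        with self._cond:
            self._entries.append(entry)
            self._cond.notify_all()

    def _find(self, template):
        for index, entry in enumerate(self._entries):
            if all(entry.get(key) == value for key, value in template.items()):
                return index
        return None

    def take(self, template, timeout=None):
        """Remove and return one matching entry, or None if the wait times out."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            if not found:
                return None
            return self._entries.pop(self._find(template))


def consumer(space, done):
    # Each consumer keeps taking jobs until the space stays empty for a while.
    while True:
        job = space.take({"type": "job"}, timeout=0.2)
        if job is None:
            return
        done.append(job["payload"] * 2)


def run_demo(num_consumers=3, num_jobs=20):
    space = Space()
    done = []
    for n in range(num_jobs):
        space.write({"type": "job", "payload": n})   # the producer side
    workers = [
        threading.Thread(target=consumer, args=(space, done))
        for _ in range(num_consumers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sorted(done)
```

Because take removes the entry while holding the lock, each job is processed by exactly one consumer, and raising num_consumers is the only change needed to add capacity.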
