Database Clustering - database

My application is built on a monolithic architecture using the Laravel framework and a MySQL database.
The application is expected to serve more than 1 million users and, at peak hours, will have to handle more than 50k requests per second.
I know about load balancing, but I need help with database clustering.
I want to implement a master-slave topology, but I have no clue how to start.
I found some resources about ClusterControl (https://severalnines.com/product/clustercontrol), but I don't understand them well enough to use them as a guideline.
Question 1: What is the guideline for implementing database clustering?

You can watch a video on how to set up DB clustering. It is trivial with ClusterControl: set up the hosts with Debian/Ubuntu/RHEL/CentOS, provide their IPs/hostnames to ClusterControl, and it will deploy a database cluster, either active/standby (MySQL/MariaDB, PostgreSQL) or multi-master (Galera).
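On the application side, a master-slave topology usually means sending writes to the master and spreading reads over the slaves. Below is a minimal Python sketch of that routing idea only; it is not how ClusterControl or Laravel implements it, and the class name, host names, and the list of write verbs are all assumptions made for illustration.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the primary and rotate reads across replicas (hypothetical helper)."""

    # Statements that modify data or schema; an illustrative list, not exhaustive.
    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def host_for(self, sql):
        """Return the host that should execute the given SQL statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.primary
        # Everything else is treated as a read and goes to the next replica.
        return next(self._replicas)


# Hypothetical host names.
router = ReplicatedRouter("db-primary", ["db-replica-1", "db-replica-2"])
```

Keep in mind that replicas can lag behind the master, so a read issued right after a write may return stale data unless it is also sent to the master.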
https://www.youtube.com/watch?v=umgvVHHaBog

Related

Would Apache Gora fit when you have to build an application which writes/reads from a set of databases?

Would Apache Gora fit when you have to build an application which writes/reads from a set of databases including SQLServer, MongoDB, HBase & Cassandra?
The idea is to develop an application which is capable of performing CRUD operations across databases: request 1 goes to SQLServer, request 2 goes to MongoDB, request 3 goes to HBase, and so on. Each request carries the information about which database the application should hit, and there is a finite list of databases.
Are there any alternatives?
Any pointers?
Let me know if any other information is required.
From your description I would say "yes", except for accessing SQL Server (not supported).
Two things I can tell you as BIG tips to begin:
1. Create your datastores with the DataStoreFactory#createDataStore() method, which lets you configure a different "gora.properties" content and Configuration.
2. Remember that each gora-xxx-mapping.xml is shared between all the connections to the same backend.
Alternatives:
Kundera, maybe?
-- Edit from comments:
There is a gora-sql module, but it had to be disabled years ago because of some license issues. If you look at the modules in the pom, you will see that gora-sql is not being compiled. No one has taken on the task of rebuilding it :(
About point 2, suppose there are Application1MongoDB and Application2MongoDB. If they are different applications, each can have a different gora-xxx-mapping.xml in its own classpath.
If they are datastore instances created by calls to #createDataStore() in the same application, then all the mappings will have to go in the classpath's gora-xxx-mapping.xml. It is just a tip about something I found tricky.
More alternatives:
Hibernate OGM, as mentioned in the comments.
EclipseLink (although it does not support many backends)
DataNucleus
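Whichever library you pick, the routing described in the question (each request names the database it should hit, from a finite list) comes down to a lookup table of stores. Here is a minimal sketch in Python rather than Java, just to show the shape; it does not use Gora's API, and `InMemoryStore`, `StoreRegistry`, and the request fields are hypothetical names. In a real application each store would wrap the client for one backend.

```python
class InMemoryStore:
    """Stand-in for a real backend client (MongoDB, HBase, Cassandra, ...)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        self._rows.pop(key, None)


class StoreRegistry:
    """Map the backend name carried by a request to one store instance."""

    def __init__(self, stores):
        self._stores = dict(stores)

    def handle(self, request):
        backend = request["backend"]
        if backend not in self._stores:
            # The list of databases is finite, so unknown names are rejected.
            raise KeyError(f"unknown backend: {backend}")
        store = self._stores[backend]
        op = request["op"]
        if op == "put":
            return store.put(request["key"], request["value"])
        if op == "get":
            return store.get(request["key"])
        if op == "delete":
            return store.delete(request["key"])
        raise ValueError(f"unsupported operation: {op}")


registry = StoreRegistry({"mongodb": InMemoryStore(), "hbase": InMemoryStore()})
```

A request such as `{"backend": "mongodb", "op": "get", "key": "user:1"}` is then dispatched to the MongoDB store only, and the other stores are never touched.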

Clustering several instances of Apache Zeppelin?

I have been testing Apache Zeppelin to query several sources hosted on Apache Drill and then creating charts to analyze our data.
Since the product seems robust enough, I have been planning to roll this solution out to our analyst team for monitoring and discovering business data.
The problem I face now is that a single Zeppelin instance will not be enough to handle the users concurrently (and, thinking about HA, relying exclusively on 1 host is not a good idea). I have already built an Apache Drill cluster to handle the traffic volumes, but I couldn't find anything in the documentation on how to build a cluster of several Zeppelin instances that share their notebooks and user sessions behind a load balancer.
Can you advise whether what I'm trying to do is possible? If so, can you point me in the right direction?
Thanks
EDIT: I have been playing with Zeppelin 0.8.0-snapshot and its MongoDB integration to store notebooks. Although it seems to be able to write new notebooks and update existing ones, other connected Zeppelin instances only update their internal notebooks after a restart.

How to design my GWT application on DB level

I thought and read so much about GWT, AppEngine and CloudSQL last week, but I still can't find the right setup for my app on DB level.
My requirements are:
I want to use GWT, since I'm good at Java
I'm creating a big data mining project.
I'll analyse the data from the DB with Matlab (using the Database Toolbox over JDBC)
Later, users can use the application as a collaborative web 2.0 application
I want an easy solution, where I don't run into stupid Class*Exceptions every day while developing, due to incorrect design
I thought about the following setups:
1. GWT + Appengine + CloudSQL + local MySQL DB: I'll try to keep the CloudSQL DB and my MySQL DB in sync. Then I can run my data mining algorithms locally using the MySQL db.
2. GWT + MySQL: I'll connect GWT directly to my local MySQL db.
3. GWT + Appengine + NoSQL data store + synchronizing data with a local MySQL db using e.g. Approcket (http://code.google.com/p/approcket/). I'll then run my data mining algorithms on the synchronized MySQL db.
Comments:
With solutions 1 and 3, I like being able to deploy my app quickly, with the advantage of scaling. With solution 2, I'll need to host it myself. Moreover, I won't have a standardized (Google) solution.
With solutions 1 and 3, I worry that it will be very complicated to keep my cloud and local data in sync.
With solution 2, I can't use the advantages of AppEngine. Moreover, it will be more difficult to set up than solutions 1 and 3, which are the ones proposed by Google.
What do you think is the best and easiest solution for my problem?
Thank you very much!

Cloudant vs JustOneDB - Which one to choose?

I am trying to decide which add-on DB to use with my application when I deploy it on AppHarbor. I have two choices: JustOneDB or Cloudant. I am planning to develop a web and mobile application, which should work with terabytes of data.
I am searching for the easiest solution to deploy my database, without me needing to partition the DB and the tables. I want a DB that can handle a very large amount of data, but takes the sharding and partitioning architecture building away from the developer.
I also want a solution that will allow me to easily backup my large database and easily restore it.
From what I've read, Cloudant and JustOneDB are the two most popular ones, and those are available as add-ons on AppHarbor for easy deployment.
I need your recommendations on which one I should go with, the cons and pros of each one. I am developing my application in ASP.NET and C# inside Visual Studio.
There's a recent post on the Cloudant blog about using the MyCouch .Net library with Cloudant databases:
https://cloudant.com/blog/how-to-customize-quorum-with-cloudant-using-mycouch/
Cloudant also offers free hosting until your bill exceeds $5, and it can work with Apache CouchDB's replication if you want to develop locally and sync to the cloud for production/deployment. Multi-master replication isn't something many other databases offer.
Best of luck with your application!
MyCouch.Cloudant was just released. In addition to support for the core CouchDB and Cloudant features, the MyCouch.Cloudant NuGet package adds support for searches. More Cloudant-specific features will be added to it. It's written in C# and supports .Net40, .Net45 and Windows Store apps.
You will find more info about MyCouch in the GitHub repo.
You should probably also consider MongoDB and RavenDB.
If you're just starting out, your first concern should probably be to find a database that'll let you quickly get started and build the application you have in mind. When the application becomes a success and actually attracts terabytes of data, you can start worrying about how to scale it. If the application is soundly architected, adapting it to use an appropriate datastore should not be a monumental task.

To CouchDB or not to?

Note: I have investigated CouchDB for some time and need to hear about some actual experiences.
I have an Oracle database for a fleet tracking service, and some stats are:
100 GB db
Huge insertion/sec (our received messages)
Reliable replication (via Oracle streams on 4 servers)
Heavy complex queries.
Now the question: Can CouchDB be used in this case?
Note: Why I thought of CouchDB?
I have read about its ability to scale horizontally very well. That's very important in our case.
Since it's schema-free, we could handle changes more easily; we have a lot of changes in different tables and stored procedures.
Thanks
Edit I:
I need transactions too, but I can tolerate other solutions. And if there is a little delay in replication, that would be no problem as long as replication is guaranteed.
You are enjoying the following features with your database:
Using it in production
The data is naturally relational (related to itself)
Huge insertion rate (no MVCC concerns)
Complex queries
Transactions
These are all reasons not to switch to CouchDB.
Of course, the story is not so simple. I think you have discovered what many people never learn: complex problems require complex solutions. We cannot simply replace our database and take the rest of the month off. Sure, CouchDB (and BigCouch) supports excellent horizontal scaling (and cross-datacenter replication too!) but the cost will be rewriting a production application. That is not right.
So, where can CouchDB benefit you?
I suggest that you begin augmenting your application with CouchDB applications. Deploy CouchDB, import your data into it, and build non mission-critical applications. See where it fits best.
For your project, these are the key CouchDB strengths:
It is a small, simple tool—easy for you to set up on a workstation or server
It is a web server. It integrates very well with your infrastructure and security policies.
For example, if you have a flexible policy, just set it up on your LAN
If you have a strict network and firewall policy, you can set it up behind a VPN, or with your SSL certificates
With that step done, it is now very easy to access. Just make http or https requests. Whether you are importing data from Oracle with a custom tool, or using your web browser, it's all the same.
Yes! CouchDB is an app server too! It has a built-in administrative app, to explore data, change the config, etc. (like a built-in phpmyadmin). But for you, the value will be building admin applications and reports as simple, traditional HTML/Javascript/CSS applications. You can get as fancy or as simple as you like.
As your project grows and becomes valuable, you are in a great position to grow, using replication
Either expand the core with larger CouchDB clusters
Or, replicate your data and applications into different data centers, or onto individual workstations, or mobile phones, etc. (The strategy will be more obvious when the time comes.)
CouchDB gives you a simple web server and web site. It gives you a built-in web services API to your data. It makes it easy to build web apps. Therefore, CouchDB seems ideal for extending your core application, not replacing it.
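To make the "just make http or https requests" point concrete, here is a minimal Python sketch that builds the request for storing one document with `PUT /{db}/{docid}`. The server URL (CouchDB's default port 5984 on localhost), the `fleet` database, the document id, and the helper name are assumptions for illustration; a secured server would also need credentials, which are left out here.

```python
import json
import urllib.request
from urllib.parse import quote


def build_put_document(base_url, db, doc_id, doc):
    """Build (but do not send) the HTTP request that stores one document."""
    url = f"{base_url.rstrip('/')}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical database and document, e.g. one row exported from Oracle.
request = build_put_document(
    "http://localhost:5984", "fleet", "truck-42", {"lat": 35.7, "lon": 51.4}
)
# To send it against a running server: urllib.request.urlopen(request)
```

Reading the document back is a plain GET on the same URL, which is why a browser works just as well as a custom import tool.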
I don't agree with this answer.
I think CouchDB suits the fleet tracking use case especially well, because of its distributed nature. Moreover, the unreliable nature of the GPRS connections used for transmitting position data makes the offline-first paradigm of couchapps the perfect partner for your application.
For uploading data from the trucks, the insertion rate can benefit hugely from CouchDB replication and bulk inserts, especially on SSD-based CouchDB hosting.
For downloading data to the trucks, CouchDB provides filtered replication, allowing each truck to download only the data it really needs instead of the whole database.
Regarding complex queries, NoSQL databases are more flexible and can perform much faster than relational databases. It's only a matter of structuring and querying your data reasonably.
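As a rough illustration of the bulk inserts and filtered replication mentioned above, the Python sketch below builds the JSON bodies for CouchDB's `POST /{db}/_bulk_docs` and `POST /_replicate` endpoints. The helper names, database names, the `tracking/by_truck` filter (which would have to be defined in a design document on the server), and the `truck` parameter are all assumptions; nothing is sent over the network.

```python
import json


def bulk_docs_body(positions):
    """Wrap many position messages in one body for POST /{db}/_bulk_docs."""
    return json.dumps({"docs": list(positions)})


def filtered_replication_body(source, target, filter_name, truck_id):
    """Body for POST /_replicate that copies only one truck's documents."""
    return json.dumps({
        "source": source,
        "target": target,
        # "designdoc/filtername"; the filter function itself lives on the server.
        "filter": filter_name,
        # Passed to the filter function so it can select documents per truck.
        "query_params": {"truck": truck_id},
    })


# Hypothetical data: two position messages received from the same truck.
upload = bulk_docs_body([
    {"truck": "42", "lat": 35.70, "lon": 51.40},
    {"truck": "42", "lat": 35.71, "lon": 51.42},
])
download = filtered_replication_body(
    "fleet", "http://truck-42.local:5984/fleet", "tracking/by_truck", "42"
)
```

Batching messages this way trades one HTTP round trip per position for one per batch, which is where most of the insertion-rate gain would come from.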

Resources