MongoDB HA with DRBD (Active-Standby) - database

I am working on MongoDB HA. I don't want to go with the HA approach in the official MongoDB docs due to resource limitations.
I have already done MySQL (Active-Active) HA with DRBD, Corosync & Pacemaker, and I have now done MongoDB HA (Active-Standby) with DRBD, Corosync & Pacemaker. I have tested it with a small amount of data and it works fine.
I have read that MongoDB with DRBD is not a good approach and can lead to data corruption.
Should I go with this approach?
If not, is there any other approach apart from the official one?

If you're doing Active/Passive (Active/Standby) clustering, there is no difference between MongoDB on DRBD and MongoDB on any other block device.
If you had multiple active MongoDB instances accessing a dual-primary (Active/Active) DRBD device, that's where the potential for corruption would come in.
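For reference, a minimal sketch of how such an Active/Passive arrangement is typically enforced with pcs (the resource names, the DRBD resource r0, the device path, and the dbpath are assumptions; exact syntax varies across pcs/Pacemaker versions):

    # DRBD master/slave resource (assumed DRBD resource name: r0)
    pcs resource create mongo_data ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
    pcs resource master mongo_data_clone mongo_data master-max=1 master-node-max=1 \
        clone-max=2 clone-node-max=1 notify=true
    # Filesystem + mongod grouped so they always move together
    pcs resource create mongo_fs ocf:heartbeat:Filesystem device=/dev/drbd0 \
        directory=/var/lib/mongo fstype=xfs
    pcs resource create mongo_db systemd:mongod
    pcs resource group add mongo_grp mongo_fs mongo_db
    # The group may only run where DRBD is Primary, and only after promotion
    pcs constraint colocation add mongo_grp with mongo_data_clone INFINITY with-rsc-role=Master
    pcs constraint order promote mongo_data_clone then start mongo_grp

With master-max=1, Pacemaker guarantees only one node is ever DRBD Primary, which is exactly the single-writer property that keeps this setup safe.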

Related

Which M.. tier do I need for my app (MongoDB)?

I'm new to MongoDB.
I have an Ionic app for a local restaurant where you have some products which you can order. The app also has a registration flow to create users. There is also an Angular web app where you can add products, look up users, etc.
Both apps are connected to MongoDB. Unfortunately, I don't have any clue which data plan is necessary for deploying these two apps.
Would it maybe be better to switch to Firebase?
Can anybody help me, please?
Best regards
Basti
Selecting a tier in MongoDB Atlas depends on various factors such as data size, IOPS, and price. Since this is for a local restaurant, I would assume traffic to the app will be fairly low; in that case you can go with M10, because that's where MongoDB Atlas starts providing features that are valuable for a production environment. For a development environment you can try an M5 cluster. Some features you get with M10 or above:
Dedicated cluster: these clusters deploy each mongod process to its own instance, whereas M0, M2 & M5 run in a shared environment. On the shared tiers, Atlas automatically upgrades the cluster to the latest version as it becomes available, which is not ideal for production apps, since a feature or package can break with an upgrade.
Queryable backups: you can query a specific continuous backup snapshot, which is really helpful for restoring part of the data instead of an entire dataset backed up a day ago.
Network peering: since most projects nowadays deploy apps on cloud platforms, clusters >= M10 support network peering.
Metrics & Performance Advisor: this is one of the most important benefits of clusters >= M10. Using alerts, you'll know which queries are taking a long time and how many connections are open at a given time, and you can monitor CPU thresholds and get alerted. Additionally, MongoDB can suggest indexes to create for better performance of queries that fail to use an existing index.
At the end of the day, most other features remain almost the same. In my experience you usually estimate & pre-pay a certain amount for a MongoDB Atlas account for around 3 years, and you don't get anything back if you haven't utilized all of it. You can also upgrade & downgrade clusters manually at any time, or have them scaled up or down automatically based on incoming traffic.
Ref: cluster-tier

Writing to many replicas of MongoDB

Let's say I have a distributed application that writes to the database. To decrease latency, one instance (app + database) is hosted in Australia and another in Europe. Both database instances need to share the same data.
So what we are after here is data locality. The reason is obvious: we don't want users in Australia sending requests to our database in Europe, because that would increase latency.
The natural choice would be to deploy both database instances in one replica set. But it seems that with MongoDB you can write to only one instance (the primary) within a replica set.
What are the strategies with MongoDB for having two database instances that share the same data and can both accept writes? Or is MongoDB just the wrong choice for this requirement?
Huge subject, but I'll try to give you a short and simple answer:
As your two instances must share the same data, you can't use a sharded cluster with zones. But a replica set can be your solution:
Create a replica set with at least the following:
a server in a 'neutral' zone. It will be the primary server (give it a higher priority). This server, as long as it stays primary, will handle your write operations.
your two existing servers, with lower priority.
In your application, set the read preference to 'nearest'. This way, your read operations will be handled by the server with the lowest network latency, regardless of the primary/secondary roles of the servers.
But I highly recommend you check the documentation to see how to deploy this architecture correctly. Here's a good start.
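For illustration, a minimal sketch with pymongo (the hostnames and replica set name are hypothetical): the 'neutral' member gets a higher priority so it wins elections and takes the writes, while reads go to the nearest member:

    from pymongo import MongoClient

    # Run replSetInitiate once, against the would-be primary
    # (directConnection is needed because no primary exists yet).
    seed = MongoClient("neutral.example.com", 27017, directConnection=True)
    seed.admin.command("replSetInitiate", {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "neutral.example.com:27017", "priority": 2},
            {"_id": 1, "host": "sydney.example.com:27017", "priority": 1},
            {"_id": 2, "host": "berlin.example.com:27017", "priority": 1},
        ],
    })

    # Application connection: writes are routed to the primary, reads are
    # served by whichever member has the lowest network latency.
    client = MongoClient(
        "mongodb://neutral.example.com,sydney.example.com,berlin.example.com"
        "/?replicaSet=rs0&readPreference=nearest"
    )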
EDIT
Some considerations about this solution:
This use case is one of the rare ones where it's better to read from secondaries. In general, prefer reading your data from the primary, since replica sets are designed for high availability, not for scalability.
If some of your data can be 'located' (tied to a region) for faster access, consider sharding those collections as a better solution.

What is redundancy for databases?

I'm using mLab and got the message "Sandbox databases do not have redundancy and therefore are not suitable for production" while using the free Sandbox tier.
It means that you do not have any protection against service unavailability or data loss, because there are no replicas. In production you should generally use a 3- or 5-node replica set to protect against both when a failure occurs.
P.S. I'm curious why you're using mLab rather than MongoDB Atlas?
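To make the data-loss half concrete, a minimal sketch with pymongo (hostnames, database, and collection are hypothetical): against a 3-node replica set, a w="majority" write is acknowledged only once 2 of the 3 members have it, so it survives the failure of any single node:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://db1.example.com,db2.example.com,db3.example.com"
                         "/?replicaSet=rs0")
    # Require acknowledgement from a majority of members before returning.
    orders = client.shop.get_collection("orders",
                                        write_concern=WriteConcern(w="majority"))
    orders.insert_one({"status": "paid"})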

Migrating Solr Cloud cluster over new cloud vendor

We need to move our Solr Cloud cluster from one cloud vendor to another. The cluster is composed of 8 shards with a replication factor of 2, spread among 8 servers, with roughly 500GB of data in total.
I wonder what the common approaches are to migrating the cluster, and especially its data, with the least impact on availability, performance, etc.
I was thinking of some sort of initial dump copy, then synchronizing to catch up the diff (which could be huge); after keeping the two sides in sync, we would just switch over whenever everything is ready on the other side.
Is that doable? What tools should/could I use?
Thanks!
You have multiple choices depending on your existing setup and Solr version:
Make use of the backup and restore APIs from the Collections API (see the sketch after this list).
If you have Solr 6 or above, I would recommend exploring CDCR, Solr's native Cross Data Center Replication.
Reindex onto the new cluster, then leverage Solr collection aliasing to switch your application endpoints to the target provider once reindexing completes.
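A minimal sketch of the backup/restore route (the hosts, collection name, and backup location are assumptions; the location must be a path every node of the respective cluster can reach, e.g. a shared filesystem):

    import requests

    OLD = "http://old-cloud-solr:8983/solr"
    NEW = "http://new-cloud-solr:8983/solr"

    # BACKUP snapshots the collection to the shared location on the old cluster.
    requests.get(f"{OLD}/admin/collections", params={
        "action": "BACKUP",
        "name": "products_snapshot",
        "collection": "products",
        "location": "/mnt/solr-backups",
    }).raise_for_status()

    # After copying /mnt/solr-backups to the new vendor, RESTORE it there.
    requests.get(f"{NEW}/admin/collections", params={
        "action": "RESTORE",
        "name": "products_snapshot",
        "collection": "products",
        "location": "/mnt/solr-backups",
    }).raise_for_status()

Repeat per collection; the diff accumulated between backup and cutover is what CDCR (or a final reindex of recent documents) would cover.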

Improving database record retrieval throughput with appengine

Using App Engine with Python and the HRD, retrieving records sequentially (via an indexed field that is an incrementing integer timestamp), we get 15,000 records returned in 30-45 seconds. (Batching and limiting are used.) I experimented with running queries on two instances in parallel but still achieved the same overall throughput.
Is there a way to improve this overall number without changing any code? I'm hoping we can just pay more and get better database throughput. (You can pay more for bigger frontends, but that didn't affect datastore throughput.)
We will be changing our code to store multiple underlying data items in one datastore record, but hopefully there is a short-term workaround.
Edit: these are log records being downloaded to another system. We will fix it in the future and know how to do so, but I'd rather work on more important things first.
Try splitting the records across different entity groups. That might force them onto different physical servers. Then read the entity groups in parallel from multiple threads or instances.
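A minimal sketch of that idea with the (legacy) ndb client; the "LogShard" ancestor scheme and the shard count are assumptions, and the records must have been written under those parent keys:

    from google.appengine.ext import ndb

    NUM_SHARDS = 4  # one entity group per shard

    class LogRecord(ndb.Model):
        ts = ndb.IntegerProperty()  # incrementing integer timestamp

    def fetch_all_shards(batch_size=1000):
        # Issue one ancestor query per entity group; fetch_async lets the
        # queries run concurrently instead of sequentially.
        futures = []
        for i in range(NUM_SHARDS):
            parent = ndb.Key('LogShard', 'shard-%d' % i)
            q = LogRecord.query(ancestor=parent).order(LogRecord.ts)
            futures.append(q.fetch_async(batch_size))
        return [f.get_result() for f in futures]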
Caching might not work well for large tables, but maybe you can cache your records, e.g. with Memcache:
https://developers.google.com/appengine/docs/python/memcache/
This could definitely speed up your application's access. I don't think the App Engine Datastore is designed for speed, but for scalability. Memcache, however, is.
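A minimal read-through sketch with the Memcache API from the link above (the LogRecord model and the 5-minute TTL are assumptions):

    from google.appengine.api import memcache

    def get_record(record_id):
        rec = memcache.get(str(record_id))
        if rec is None:
            rec = LogRecord.get_by_id(record_id)  # fall back to the Datastore
            memcache.set(str(record_id), rec, time=300)  # cache for 5 minutes
        return rec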
BTW, if you are conscious about the performance GAE gives you for what you pay, you could also try setting up your own App Engine cloud with:
AppScale
JBoss CapeDwarf
Both have active community support. I'm using CapeDwarf in my local environment; it's still in BETA, but it works.
Move to an in-memory database. If you have Oracle Database, using TimesTen will improve the throughput severalfold.
