Does clustering NestJS with PM2 create multiple instances of MongoDB? - database

The problem is that we have a long-running write operation, and MongoDB locks the collection against other reads and writes. If we instantiate multiple instances with PM2, will they use the same MongoDB connection, so that the locking problem persists?

Related

How to have AWS RDS synchronous read replication?

For AWS RDS there are 2 ways to create a "clone" of your DB:
1/ Read replica: create a read replica; replication is asynchronous, meaning there is a small delay.
2/ Multi-AZ standby: create a standby DB; replication is synchronous, meaning the data is exactly the same at all times, but the standby is for failover and cannot be used unless the main DB is down.
So the "synchronous" ability is already there, but I can't find any option to have a synchronous read-only replica.
In my case, I want a read replica to reduce the read load on the main DB, but the data is very sensitive, so I cannot afford to read stale data at all. Any suggestion for my case with the AWS RDS service, e.g. making the standby readable?
If you're using Postgres or MySQL, you can deploy to Aurora rather than standard RDS. It uses a shared data storage layer, so it gives you synchronous read replicas, in addition to improved data durability and automatic failover.
There is a new option in RDS that allows having two readable standbys with synchronous replication: https://aws.amazon.com/rds/features/multi-az/#Amazon_RDS_Multi-AZ_with_two_readable_standbys. It's a relatively new offering, so it still needs testing to confirm that there is really no lag when you read from the replicas.

Realtime streaming of SQL Server (RDS) transactions to NoSQL

I have a situation where I want to stream all the updates, deletes, and inserts from my AWS RDS SQL Server to a NoSQL DB such as DynamoDB or RethinkDB.
What I am trying to achieve is to divide my users into critical and non-critical databases, reducing the load on my RDS server, and to use technologies like RethinkDB or DynamoDB Streams to send the non-critical set of data to the front end.
I have thought of various ways to do this:
The most obvious is to just asynchronously write the entry to both databases, though I can end up in a situation where one of the writes fails.
The second is to use RabbitMQ or a queuing service such as AWS SQS to queue the second write and make sure that it gets inserted.
The third (which I want to achieve) is for a Node.js service to somehow listen to the MS SQL change stream and push the content to the NoSQL store.
What can be done in a situation like this?
The benefit I am looking for is to store a dataset in NoSQL that can be served to over 100k users, since they all want to see the same data (with only some WHERE-clause changes) in real time. This in turn will reduce the RDS server transactions to a minimum of reads and writes.
You can use one of the 2 approaches below:
AWS DMS
Or combine EMR, Amazon Kinesis, and Lambda (with custom scripts).
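As an illustration of the second approach, here is a minimal sketch (not the answerer's code) of a Java Lambda handler that consumes change records from a Kinesis stream and upserts them into a DynamoDB table. The table name, key attribute, and "id|json" payload format are assumptions made for the example.

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

import java.nio.charset.StandardCharsets;
import java.util.Map;

public class ChangeFeedHandler implements RequestHandler<KinesisEvent, Void> {

    private final DynamoDbClient dynamo = DynamoDbClient.create();

    @Override
    public Void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
            // Assumption: each Kinesis record carries one row change as "id|json" text.
            String payload = StandardCharsets.UTF_8
                    .decode(record.getKinesis().getData()).toString();
            String id = payload.split("\\|", 2)[0];

            // Upsert the latest version of the row into the read-optimized NoSQL table.
            dynamo.putItem(PutItemRequest.builder()
                    .tableName("non_critical_data")   // hypothetical table name
                    .item(Map.of(
                            "id", AttributeValue.builder().s(id).build(),
                            "doc", AttributeValue.builder().s(payload).build()))
                    .build());
        }
        return null;
    }
}

The front end can then read (or subscribe to) the DynamoDB table instead of hitting RDS for every user.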

Docker 1.12: Multiple replicas, single database

With the introduction of the new 'swarm mode' in Docker 1.12, we've been trying to migrate our application onto containers and make use of swarm mode's orchestration and clustering.
Our application requires some initial database scripts to be run for it to start.
We're not packaging the database inside our dockerized application, so that it follows a stateless microservice architecture and multiple containers eventually talk to a single (at the moment) database instance.
While creating the service, we cannot use --replicas with the docker service create command, as multiple instances would try to create tables on a single database and fail. Our scripts do check whether the database has been set up and skip the creation, but since all containers start simultaneously this check cannot be relied on.
We couldn't find any wait-for kind of mechanism that we could leverage with Docker for this issue. It would have been good if we could only start the second container when the first one had created the database (and exposed the ports), but how can we configure inter-container communication for this?
Alternatively, can tools like flywaydb help in some way?
How should this be used in production?
From the Flyway FAQ:
Can multiple nodes migrate in parallel?
Yes! Flyway uses the locking technology of your database to coordinate multiple nodes. This ensures that even if multiple instances of your application attempt to migrate the database at the same time, it still works. Cluster configurations are fully supported.
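As a rough sketch of how this applies here, each container could simply run Flyway's migrate() on startup (shown below with the current Flyway Java API) and let Flyway's database-level locking decide which instance actually applies the scripts; the JDBC URL and credentials are placeholders.

import org.flywaydb.core.Flyway;

public class MigrateOnStartup {
    public static void main(String[] args) {
        // Every replica runs this; Flyway's lock ensures only one applies each migration.
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:postgresql://db:5432/app", "app_user", "secret") // placeholders
                .load();
        flyway.migrate();

        // ... start the application once the schema is guaranteed to be in place
    }
}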
There is no easy way to coordinate this among containers; it basically requires a distributed lock solution. The first container that gets the lock can create the DB, while the other containers that do not get the lock need to wait.
In AWS, you could leverage DynamoDB for this. DynamoDB supports conditional updates. Each container first tries to create the lock key in DynamoDB with a condition expression of "attribute_not_exists(yourKey)". The first creation will succeed and the other creations will be rejected. The first container then creates another key in DynamoDB to indicate that the DB is ready; the other containers simply wait until that ready key is created.
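A minimal sketch of that conditional-write lock, assuming a pre-created table named "migration-locks" with a string partition key "lockKey" (both names are made up for the example), using the AWS SDK for Java v2:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

import java.util.Map;

public class DbInitLock {
    public static boolean tryAcquire(DynamoDbClient dynamo) {
        try {
            // Only succeeds for the first container; everyone else gets a rejection.
            dynamo.putItem(PutItemRequest.builder()
                    .tableName("migration-locks")   // hypothetical table
                    .item(Map.of("lockKey", AttributeValue.builder().s("db-init").build()))
                    .conditionExpression("attribute_not_exists(lockKey)")
                    .build());
            return true;   // this container runs the database setup scripts
        } catch (ConditionalCheckFailedException e) {
            return false;  // another container holds the lock; poll for the "ready" key instead
        }
    }
}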
Or you could do it in your service deployment script: the script creates the service with 1 replica, then keeps checking whether the DB has been created; once it has, it scales the service, e.g. docker service update yourservice --replicas 5.

Loadbalancer and Solrcloud

I am wondering how a load balancer can be set up on top of SolrCloud, or whether a load balancer is not needed at all.
If the former, do the shard leaders need to be added to the load balancer? Then what if a shard leader changes for some reason? Or is it better to add all machines in the cluster (including replicas) to the load balancer?
If the latter, I guess a CNAME needs to point to the SolrCloud cluster using round-robin DNS?
Any advice from actual SolrCloud operating experience would be really appreciated.
Usually SolrCloud is used in combination with ZooKeeper, and the client uses CloudSolrServer to access SolrCloud.
A query is handled in the following flow.
Note that I have only read parts of the Solr source code, so there is a lot of guessing here. Also, what I read was the source code of Solr 4.1, so it might be outdated.
ZooKeeper holds the list of IPAddress:Port of all SolrCloud servers.
(Client side) The instance of CloudSolrServer retrieves the list of servers from ZooKeeper.
(Client side) The instance of CloudSolrServer chooses one of the SolrCloud servers randomly and sends the query to it. (Also, LBHttpSolrServer chooses the server in round-robin fashion?)
(Server side) The SolrCloud server which received the query randomly chooses one replica per shard from the server list and forwards the query to them. (Note that every SolrCloud server holds the server list, which it receives from ZooKeeper.)
An update is handled in the same manner as above, but is also propagated to all relevant servers.
Note that in SolrCloud the leader and the replicas differ very little, and we can send a query/update to any of the servers; it is automatically forwarded to the right servers.
In short, load balancing is done on both the client side and the server side.
So you don't need to worry about it.
A load balancer is needed, and that role is effectively filled by ZooKeeper used in conjunction with SolrCloud.
When you use SolrCloud you must set up sharding and replication through ZooKeeper, either using the embedded ZooKeeper server that comes bundled with Solr or a stand-alone ZooKeeper ensemble (which is recommended for redundancy).
Then you would use CloudSolrClient, which reads the cluster state from ZooKeeper and sends your query to the correct shard in your cluster. CloudSolrClient requires the addresses of all your ZooKeeper instances upon instantiation, and your load balancing is handled as appropriate from there.
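For illustration, a minimal SolrJ sketch of that instantiation (the ZooKeeper hosts and collection name are placeholders, and the builder shown is the SolrJ 7+ form):

import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrCloudQuery {
    public static void main(String[] args) throws Exception {
        // The client only needs the ZooKeeper ensemble; it discovers Solr nodes itself.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                List.of("zk1:2181", "zk2:2181", "zk3:2181"),   // placeholder ZK hosts
                Optional.empty())                              // no chroot
                .build()) {
            client.setDefaultCollection("mycollection");       // placeholder collection
            QueryResponse rsp = client.query(new SolrQuery("*:*"));
            System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
        }
    }
}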
Please see the following excellent tutorial:
http://www.francelabs.com/blog/tutorial-solrcloud-amazon-ec2/
Solr Docs:
https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
This quote refers to the latest version of Solr, which at the time of writing was version 7.1.
Solrcloud - Distributed Requests
When a Solr node receives a search request, the request is routed behind the scenes to a replica of a shard that is part of the collection being searched.
The chosen replica acts as an aggregator: it creates internal requests to randomly chosen replicas of every shard in the collection, coordinates the responses, issues any subsequent internal requests as needed (for example, to refine facets values, or request additional stored fields), and constructs the final response for the client.
Solrcloud - Read Side Fault Tolerance
In a SolrCloud cluster each individual node load balances read requests across all the replicas in a collection. You still need a load balancer on the 'outside' that talks to the cluster, or you need a smart client which understands how to read and interact with Solr's metadata in ZooKeeper and only requests the ZooKeeper ensemble's address to start discovering to which nodes it should send requests. (Solr provides a smart Java SolrJ client called CloudSolrClient.)
I am in a similar situation where I can't rely on CloudSolrServer for load balancing. A possible solution that I am evaluating is to use Airbnb's Synapse (http://nerds.airbnb.com/smartstack-service-discovery-cloud/) to dynamically reconfigure an existing HAProxy load balancer based on the status of the SolrCloud cluster that we get from ZooKeeper.

inserting data into a table concurrently - hibernate

I have an application that uses Hibernate to insert data into a table.
The database is SQL Server. The application itself is deployed in Tomcat 6.
To insert data into the DB table, I am using BasicDataSource with a minimal configuration for the Tomcat connection pool (like maxActive=150, maxIdle=10, ...).
The problem now is that I want to add concurrency to the application. In the process, I am making concurrent calls to the business-layer method that calls the DAO-level methods that perform the DB inserts. This is resulting in the below error:
Exception occurred java.util.concurrent.ExecutionException: org.hibernate.HibernateException:
Illegal attempt to associate a collection with two open sessions
When I monitor the database, I see that multiple threads are being created but are never closed.
I am not sure how to proceed further to debug/fix this. Any pointers would be helpful.
If Hibernate is telling you:
Illegal attempt to associate a collection with two open sessions
then basically you are opening two sessions, each with its own transaction, and you are trying to save a collection loaded in one session through the other. Yes, concurrency is your major problem here. It can be tackled if you design your application to handle sessions carefully.
The stack trace will tell you which functions are causing the exceptions. Look at how long your unit of work keeps a session open, try to reduce that, and make sure your sessions are always closed after use.
An application implemented with Hibernate can follow various patterns.
You need the session-per-request pattern. In this model, a request from the client is sent to the server, where the Hibernate persistence layer runs. A new Hibernate Session is opened, and all database operations are executed in this unit of work. On completion of the work, and once the response for the client has been prepared, the session is flushed and closed. Use a single database transaction to serve the client's request, starting and committing it when you open and close the Session. The relationship between the two is one-to-one, and this model is a perfect fit for many applications.
Do not use the anti-patterns session-per-user-session or session-per-application.
The Transactions and Concurrency chapter of the Hibernate documentation gives an in-depth analysis and examples.
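A minimal sketch of the session-per-request pattern described above, assuming a single shared SessionFactory; the point is that each request gets its own Session and its own transaction, and the Session is always closed afterwards (never shared across threads):

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class GenericDao {

    private final SessionFactory sessionFactory;   // one per application

    public GenericDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // One request = one Session = one transaction.
    public void save(Object entity) {
        Session session = sessionFactory.openSession();
        Transaction tx = null;
        try {
            tx = session.beginTransaction();
            session.save(entity);
            tx.commit();
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();
            }
            throw e;
        } finally {
            session.close();   // always close, even on failure
        }
    }
}

With this structure, concurrent calls each work on their own Session, so no collection ends up associated with two open sessions.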
