We're setting up a MongoDB cluster with Amazon DocumentDB on AWS.
DocumentDB lets you set up replication within the cloud, but we also want local replicas. Is it possible to have a local replica of the cluster that runs in the cloud?
Thanks in advance!
Yes. You can use DocumentDB Change Streams to achieve this. This GitHub repository contains sample implementations. Episode 1 of the "DocumentDB from the dining table" series also shows how to use this feature for your goal.
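To make the approach concrete, here is a minimal sketch of tailing a DocumentDB change stream with PyMongo and replaying events into a local MongoDB. The endpoints, credentials, and database/collection names are placeholders, and change streams must first be enabled on the source collection (e.g. via the modifyChangeStreams admin command).

# Placeholder endpoints and names; DocumentDB also requires TLS options
# and a CA bundle, omitted here for brevity.
from pymongo import MongoClient

source = MongoClient("mongodb://user:pass@docdb-cluster:27017/?tls=true")
local = MongoClient("mongodb://localhost:27017")

src_coll = source["appdb"]["orders"]
dst_coll = local["appdb"]["orders"]

# watch() yields change events; replay each one against the local replica.
with src_coll.watch(full_document="updateLookup") as stream:
    for event in stream:
        op = event["operationType"]
        key = event["documentKey"]["_id"]
        if op == "insert":
            dst_coll.insert_one(event["fullDocument"])
        elif op == "update" and event.get("fullDocument"):
            dst_coll.replace_one({"_id": key}, event["fullDocument"], upsert=True)
        elif op == "delete":
            dst_coll.delete_one({"_id": key})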
I am trying to migrate a Cassandra cluster to Amazon Keyspaces (for Apache Cassandra).
After the migration is done, how can I verify that the data has been migrated successfully as-is?
Many solutions are possible. For instance, you could read all the rows of a partition, compute a checksum or signature, and compare it with your original data; then iterate through all your partitions, and repeat for every table. Checksums work.
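A rough sketch of that per-partition checksum idea in Python with the Cassandra driver; the keyspace, table, and partition-key names are hypothetical, and the Keyspaces connection is simplified (it also needs TLS and SigV4 or service-specific credentials).

import hashlib
from cassandra.cluster import Cluster

def partition_checksum(session, keyspace, table, pk_value):
    # Read one partition; clustering columns give a deterministic row order.
    rows = session.execute(
        f"SELECT * FROM {keyspace}.{table} WHERE pk = %s", (pk_value,))
    digest = hashlib.sha256()
    for row in rows:
        # repr() of the row tuple is a simple, stable serialization.
        digest.update(repr(tuple(row)).encode("utf-8"))
    return digest.hexdigest()

# Compute the same checksum against both clusters and compare.
src = Cluster(["cassandra-node-1"]).connect()
dst = Cluster(["cassandra.us-east-1.amazonaws.com"], port=9142).connect()
assert partition_checksum(src, "shop", "orders", "customer-42") == \
       partition_checksum(dst, "shop", "orders", "customer-42")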
You could use AWS Glue to perform an 'except' operation. Spark has a lot of useful functions for working with massive datasets, and Glue is serverless Spark. You can use the Spark Cassandra Connector with both Cassandra and Keyspaces to work with those datasets in Glue. For example, you may want to see the data that is not in Keyspaces:
cassandraTableDataframe.except(keyspacesTableDataframe)
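The same diff in a Glue PySpark job might look roughly like this; the hosts, keyspace, and table names are assumptions, and the Spark Cassandra Connector must be on the job's classpath (Keyspaces additionally requires TLS and SigV4 auth, omitted here).

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("keyspaces-migration-diff").getOrCreate()

def read_table(host, port):
    # Connection options can be supplied per-read to target each cluster.
    return (spark.read.format("org.apache.spark.sql.cassandra")
            .options(keyspace="shop", table="orders")
            .option("spark.cassandra.connection.host", host)
            .option("spark.cassandra.connection.port", port)
            .load())

cassandra_df = read_table("cassandra-node-1", "9042")
keyspaces_df = read_table("cassandra.us-east-1.amazonaws.com", "9142")

# Rows present in the source cluster but missing from Keyspaces.
missing = cassandra_df.exceptAll(keyspaces_df)
print(missing.count())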
You could also do this by exporting both datasets to S3 and performing these queries in Athena.
Here is a helpful repository of Glue and Keyspaces functions including export, count, and distinct.
As a web developer, every day we hear about new technologies. Recently I came across Elasticsearch, which is used to analyze big volumes of data. My data is in MongoDB; is it possible to use Elasticsearch on it?
MongoDB Atlas has a feature called 'Atlas Search', which implements the Apache Lucene engine. This could be a solution for your search requirements.
See the Atlas Search documentation for details.
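For a flavor of what an Atlas Search query looks like, here is a hedged sketch using the $search aggregation stage via PyMongo; it assumes a search index (here the default one, named "default") already exists on the collection, and the connection string, database, and field names are placeholders.

from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
articles = client["appdb"]["articles"]

results = articles.aggregate([
    # $search is an Atlas-only stage backed by the Lucene engine.
    {"$search": {
        "index": "default",
        "text": {"query": "replica set elections", "path": "body"},
    }},
    {"$limit": 10},
    {"$project": {"title": 1, "score": {"$meta": "searchScore"}}},
])
for doc in results:
    print(doc["title"], doc["score"])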
It depends on what you mean by "analyze big volumes of data": what are your requirements? Don't pay too much attention to marketing slogans. Maybe you can connect Elasticsearch with MongoDB via an ODBC driver. Elasticsearch is a document-oriented NoSQL database, like MongoDB. As usual, both have their pros and cons.
MongoDB is more like a database, i.e. it supports CRUD (Create, Read, Update, Delete) operations, and its Aggregation Framework is very powerful.
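As a small illustration of the Aggregation Framework (collection and field names are made up for the example):

from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

# Total revenue per customer, highest first.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer_id", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
]
for row in orders.aggregate(pipeline):
    print(row["_id"], row["revenue"])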
In Elasticsearch you can store data and analyze or query it. I remember that in earlier releases it was not so easy to delete or update single existing documents.
I made a website that uses a SQLite database, and I'm trying to get my program onto AWS using Elastic Beanstalk. I've been googling but can't find any instructions or tutorials on how to get a SQLite database running on AWS. Does AWS support SQLite? Is there some trick to making it work? And if not, what do you recommend? Many thanks.
You can refer to the documentation below, which will help you get to the Beanstalk console and add a database to your environment. The walkthrough is for MySQL, but you can change the database engine in the database settings.
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.managing.db.html
I am not entirely sure whether this is possible, because I have not done it before, but I'll point you in the right direction.
There is documentation that shows you how to get started with a custom Amazon Machine Image (AMI) for your Elastic Beanstalk environment. So what I would recommend doing is:
install sqlite3 on an EC2 instance,
configure sqlite3 to your requirements,
ensure the instance starts the sqlite3 service on boot,
create an AMI of the instance,
follow this documentation:
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.customenv.html
Please let me know how you go and I may be able to help if you get stuck along the way.
It would be epic if AWS released a service or intermediate server for it. I love SQLite.
However, the problem is that SQLite essentially does not support transactions over NFS. I actually tried storing a SQLite database on AWS EFS and then mounting EFS from both AWS Lambda and AWS Batch, so I hit this wall organically.
Given that cloud environments are largely multi-machine/node, you really start to see the benefit of a server-based approach like PostgreSQL.
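To make the failure mode concrete: SQLite is an embedded library that writes straight to a single database file and relies on OS-level file locking for its transactions, which is exactly the part that breaks down on network filesystems. A quick local illustration (the file name is arbitrary):

import sqlite3

conn = sqlite3.connect("app.db")  # the entire database lives in this one file
conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.commit()  # the commit depends on file locks on app.db, which NFS/EFS handle poorly
print(conn.execute("SELECT * FROM users").fetchall())
conn.close()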
I want to implement a website with a big database; each table has about 5 million rows. I want to know what would work better for me: a database in the cloud, or a local database? Do cloud databases have latency ("ping") problems?
I'm using a cloud database for a similar workload and it works very well.
I think there's no problem with using a cloud database.
I'm planning to code a Web 2.0 online application and I'm looking for best practices.
I'm talking about online apps similar to collaborative or billing web apps. I'm wondering how they set up their DB: do they put all the users' info in the same DB, or does each user have their own DB?
Take a look at Multi-Tenant Data Architecture.
Generally, everyone is going to be in the same DB. Different users or accounts are differentiated by a unique identifier. Once they need to scale beyond a single database, they will set up replication or clustering to distribute the load across different DB servers. Those servers are mirrors of each other's data.
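A minimal sketch of that shared-database, shared-schema approach, using SQLite for brevity; the schema and names are illustrative only. Every row carries a tenant identifier, and every tenant-facing query filters on it.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,   -- the unique identifier separating accounts
    amount REAL NOT NULL)""")
db.executemany("INSERT INTO invoices (tenant_id, amount) VALUES (?, ?)",
               [("acme", 100.0), ("acme", 25.0), ("globex", 9.5)])

def invoices_for(tenant):
    # Every query is scoped by tenant_id so accounts never see each other.
    return db.execute("SELECT id, amount FROM invoices WHERE tenant_id = ?",
                      (tenant,)).fetchall()

print(invoices_for("acme"))  # only acme's rows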