Which M.. tier do I need for my app (MongoDB)? - database

I’m new to MongoDB,
I have an Ionic App for a local restaurant where you have some products which you can order. The app also have a register to create some users. There is also a Angular Web App where you can put in products and look up users etc.
Both apps are connected to the MongoDB. Unfortunately I don’t have any clue which data plan is necessary for the deployment of these two apps.
Is it maybe better to switch to Firebase?
Can anybody help me please?
Best regards
Basti

Selecting a tier in MongoDB-Atlas depends on various factors like data size, IOPS, Price etc.. As you're saying this is for a local restaurant & I would assume there could be less traffic to the App, then in that case you can go with M10 cause that's where MongoDB Atlas really provide some valuable features to database which is used in production environment. For development environment you can give a try with M5 cluster. Some features you can enjoy using M10 or above are :
Dedicated Cluster : Clusters deploy each mongod process to its own instance, Where as M0, M2 & M5 are in shared environment, So Atlas will automatically upgrade the cluster to latest version upon availability which is not preferred in realtime Apps as there could be a functionality/package that can break with upgrades.
Queryable backups : You can take advantage of querying specific continuous backup snapshot, Which is really helpful to restore a part of data instead of entire dataset which is back'd up a day ago.
Supports Network Peering : As most of projects now a days use cloud platforms to deploy Apps, Clusters >= M10 supports network peering.
Metrics & Performance Advisor : This is one most important thing which you'll get benefited using clusters >= M10. Using alerts you'll get to know which kind of queries are taking much time, How many connections are open at a given time, monitor CPU threshold & get alerted, additionally MongoDB can suggest you with indexes to be created for better performance of queries being run on collections which fail to use index already present in.
At the end of the day, Remaining most other features remain almost same. From my experience usually you'll estimate & pre-pay certain amount for MongoDB Atlas account for around 3 years, Where you don't get back anything if you've not utilized all of it. Also you can upgrade & downgrade clusters at anytime manually or can be automatically scaled up or down based on incoming traffic.
Ref : cluster-tier

Related

estimate server cost using aws

pardon me if this isn't the place to ask such question. But I have finished my project and thinking to deploy it using amazon elastic beanstalk and got a huge worry. My worry is that my project's database can be humongous. It's a community website like a reddit that users can create a page that other people can post text,link,pic,video(youtube). Also users get a profile page, and are able to comment as well. This was my first big project, and I don't want to pay more than $200 for server fee every month.
should I still deploy this? or just be happy I proved myself I can make this? how much do you think I'll have to pay assuming I get about (max)100 users?
For starters, you can look at the costs for any AWS service by going to that services 'homepage' and clicking "Pricing" usually on the left side. I typically get to the pricing page by Googling "AWS <> Pricing" (e.g. "AWS EC2 Pricing").
Whether or not you incur any cost and what that cost is, really depends on how you deploy your website. Questions like, is your database self-managed (i.e. installed on your own EC2 instance) or are you using RDS? Are you using S3 to store static content? Will you be serving your web contents via Cloudfront (AWS' CDN)?
Many of the basic services (EC2, S3, RDS, etc.) have free-tiers which will allow you to use them for free, provided you stay within certain (and usually very low) levels of usage.
If your database is going to get VERY large, and cost is your primary concern, it's usually more cost-effective to manage it on your own EC2 server, however then things like updates, security, scaling, backups, etc. all become your problem to deal with and often can incur additional cost (i.e. your backups will likely require volume snapshots which will cost you vs. RDS' backups are free).
If you're going to have a significant amount of static content, it will be more cost effective to host it on your own EC2 server, however again, all maintenance will be your responsibility, such as backups and scaling (which can incur cost) to meet demand vs. all of that is taken care of by S3, though you pay each time a file is accessed.
If cost is your primary concern, my suggestion is to start out your development using the AWS services (RDS, S3 maybe Elastic Beanstalk), though that can often complexity to your development efforts (dealing with authentication, additional SDK's, etc.). You can typically and pretty easily roll out your own service (MySQL, EBS filesystem to replace S3, etc.) as replacement. Additionally, again depending on your roll-out, there can be network traffic costs. Usually, this isn't a problem if you're doing things the way Amazon wants you to, but...it wouldn't be unheard of.
To get you started:
https://aws.amazon.com/s3/pricing/
https://aws.amazon.com/ec2/pricing/
https://aws.amazon.com/rds/pricing/
Additionally, there is a nifty calculator which can help you estimate your costs. You will need to know what your traffic expectations as well as service requirements are, but you can play around with those numbers.
https://calculator.s3.amazonaws.com/index.html
You don't have to worry about charges. As AWS has a Free tier which offers most of the services free for 12 months.
https://aws.amazon.com/free/
You can have 01 t2.micro (1 vCpu + 1 gb RAM) instance for Elastic beanstalk (EBS) with auto scaling turned off, purchase Reserved instances by making 1 or 3 year all upfront payment to save more.
https://aws.amazon.com/ec2/purchasing-options/reserved-instances/
You must use Relational Database Service (RDS) for database, instead of installing db in your EBS instance & store files on Simple Storage Service (S3).
RDS Pricing is $0.100 per GB-month, after first 12 months. So you dont need to worry about the database size.
After first 12 months, your monthly bill would be less than USD 50.
On production, we are using 2 t2.micro instances (Windows) with 1 Ms Sql database on RDS & we only have to pay for the extra EC2 machine, about USD 14 per month.
I did find some relevant information in the below stackoverflow thread which talks about the capabilities of t2.micro,t2.small & t2.medium ec2 instances.Do have a look at it.
several t2.micro better than a single t2.small or t2.medium

Moving app to GAE

Context:
We have a web application written in Java EE on Glassfish + MySQL with JDBC for storage + XMPP connection for GCM CCS upstream messaging + Quartz scheduler library. Those are the main components of the app.
We're wrapping up our app development stage and we're trying to figure out the best option for deploying it, considering future scalability, both in number of users and their geographical location (ex, multiple VMS on multiple continents, if needed). Currently we're using a single VM instance from DigitalOcean for both the application server and the MySQL server, which we can then scale up with some effort (not much, but probably more than GAE).
Questions:
1) From what I understood, Cloud SQL is replicated in multiple datacenters across the world, thus storage-wise, the geographical scalability is assured. So this is a bonus. However, we would need to update the DB interaction code in the app to make use of Cloud SQL structure, thus being locked-in to their platform. Has this been a problem for you ? (I haven't looked at their DB interaction code yet, so I might be off on this one)
2) Do GAE instances work pretty much like a normal VM would ? Are there any differences regarding this aspect ? I've understood the auto-scaling capability, but how are they geographically scalable ? Do you have to select a datacenter location for each individual instance ? Or how can you achieve multiple worldwide instance locations as Cloud SQL does right out of the box ?
3) How would the XMPP connection fare with multiple instances ? Considering each instance is a separate VM, that means each instance will have a unique XMPP connection to the GCM CCS server. That would cause multiple problems, ex if more than 10 instances are being opened, then the 10 simultaneous XMPP connections limit cap will be hit.
4) How would the Quartz scheduler fare with the instances ? Right now we're saving the scheduled jobs in the MySQL database and scheduling them at each server start. If there are multiple instances, that means the jobs will be scheduled on each instance; so if there are multiple instances, the jobs will be scheduled multiple times. We wouldn't want that.
So, if indeed problems 3 & 4 are like I described, what would be the solution to those 2 problems ? Having a single instance for the XMPP connection as well as a single instance for the Quartz scheduler ?
1) Although CloudSQL is a managed replicated RDBMS, you still need to chose a region when creating an instance. That said, you cannot expect the latency to be great seamlessly globally. You would still need to design a proper architecture to achieve that.
2) GAE isn't a VM in any sense. GAP is a PaaS and, therefore, a runtime environment. You should expect several differences: you are restricted to Java, PHP, GO and Python, the mechanisms GAE provide do you out-of-the-box and compatible third-parties libraries. You cannot install a middleware there, for example. On the other hand, you don't have to deal with anything from the infrastructure standpoint, having transparent high-availability and scalability.
3) XMPP is not my strong suit, but GAE offers a XMPP service, you should take a look at it. More info: https://developers.google.com/appengine/docs/java/xmpp/?hl=pt-br
4) GAE offers a cron service that works pretty well. You wouldn't need to use Quartz.
My last advice is: if you want to migrate an existing application, your best choice would probably be GCE + CloudSQL, not GAE.
Good luck!
Cheers!

Is it a best practice in continuous deployment to deploy to all production servers immediately or just to a subset at first?

We use CD in our project and since the application is used world wide we use more than one data center (one per region). Each data center hosts an isolated instance of the application (each regional deployment uses its own DB, application server etc). Data is not shared between data centers.
There are two different approaches that we can take:
Deploy to integration server (I) where all tests are run, then
deploy to the first data center A and then (once the deployment to A
is finished) to a data center B.
Region A has a smaller user base and to prevent outage in both A
and B caused by a software bug that was not caught on the
integration server (I), an alternative is to deploy to the integration server and then "bake" the code in
region A for 24 hours and deploy the application to data center B
only after it was tested in production for 24 hours. Does this
alternative go against CI best practices since there is no
"continuous" deployment in this case?
There is a big difference between Continuous Integration and Continuous Deploy. CI is for a situation where multiple users work on the same code base and integration tests are run repetitevely for multiple check ins so that integration failures are handled quickly and programatically. Continuous deploy is a pradigm which encapsulate deploying quickly and programmatically accepting your acceptance tests so that you deploy as quick as feasible (instead of the usual ticketing delays that exist in most IT organizations). The question you are asking is a mix for both
As per your specific question, your practice does go against the best practices. If you have 2 different data centers, you have the chance of running into separate issues on the different data centers.
I would rather design your data centers to have the flexibility to switch between current and next version. That way , you can deploy your code to the "next" environment, run your tests there. Once your testing confirms that your new environment is good to go , you can switch your environments from current to next .
The best practice, as Paul Hicks commented, is probably to decouple deployment from feature delivery with feature flags. That said, organizations with many production servers usually protect their uptime by either deploying to a subset of servers ("canary deployment") and monitoring before deploying to all, or by using blue-green deployment. Once the code is deployed, one can hedge one's bet further by flipping a feature flag only for a subset of users and, again, monitoring before exposing the feature to all.

Nutch 2.1 (HBase, SOLR) with Amazon Web Services

I experienced Nutch 2.1 locally without any difficulty. I have also tried on a 3 machine distributed cluster. We're now discussing whether to run it with Amazon Web Services or not. I do not have much experience with AWS. My question is that, is it possible and neccessary to try Nutch2.1 crawling and indexing parts on the cloud. What possible advantages and disadvantages we will have?
Thanks.
If you have a cluster with same capacity as that of a AWS cluster (that you plan to invest in) then there is no advantage except for #1 below.
Here are several factors that you should think about before switching to AWS:
Locality of hosts crawled: If you are sitting in Europe and the websites that you want to crawl are hosted far away ... say Australia. If you buy AWS nodes located in Australia, it would be much faster for crawling that data rather than crawling from Europe.
Cost: For using AWS machines, you need to pay then on hourly basis. Can you afford that ? If not better use your own machines
Current cluster capacity : does your current cluster has ample capacity and space to handle the amount of crawled data ? I think there wont be problem in terms of computational speed as Nutch runs on Hadoop which was designed to run on commodity hardware. Can your cluster accommodate entire data that is being fetched by the crawler.
Volume of data : What is a rough estimate of the data that is being crawled ? If its less, then it makes no sense to have an AWS cluster.
Time constraints : Is there any time bound for completion for the crawl ?
If you are doing this for a professional project, then these factors must be given a thought.
If you are doing it for fun/hobby/learning, go ahead and use free tier nodes of AWS. Those are low capacity nodes given free by Amazon. Its fun to learn new things :)
Advantages of AWS:
No need to buy machines for setting up a cluster. get started without having any hardware except a terminal PC.
Locality
No need to look after machines. If a node crashes badly, leave it (its not your problem :P). Buy a new one, add it to the cluster and go ahead.
Disadvantages of AWS:
Costly.
Copying data to any machine outside AWS cluster is charged.
Your data is NOT persisted when u give up the procured AWS nodes. If u want to persist it, pay them and use the S3 storage service.

Will using a Cloud PaaS automatically solve scalability issues?

I'm currently looking for a Cloud PaaS that will allow me to scale an application to handle anything between 1 user and 10 Million+ users ... I've never worked on anything this big and the big question that I can't seem to get a clear answer for is that if you develop, let's say a standard application with a relational database and soap-webservices, will this application scale automatically when deployed on a Paas solution or do you still need to build the application with fall-over, redundancy and all those things in mind?
Let's say I deploy a Spring Hibernate application to Amazon EC2 and I create single instance of Ubuntu Server with Tomcat installed, will this application just scale indefinitely or do I need more Ubuntu instances? If more than one Ubuntu instance is needed, does Amazon take care of running the application over both instances or is this the developer's responsibility? What about database storage, can I install a database on EC2 that will scale as the database grow or do I need to use one of their APIs instead if I want it to scale indefinitely?
CloudFoundry allows you to build locally and just deploy straight to their PaaS, but since it's in beta, there's a limit on the amount of resources you can use and databases are limited to 128MB if I remember correctly, so this a no-go for now. Some have suggested installing CloudFoundry on Amazon EC2, how does it scale and how is the database layer handled then?
GAE (Google App Engine), will this allow me to just deploy an app and not have to worry about how it scales and implements redundancy? There appears to be some limitations one what you can and can't run on GAE and their price increase recently upset quite a large number of developers, is it really that expensive compared to other providers?
So basically, will it scale and what needs to be done to make it scale?
That's a lot of questions for one post. Anyway:
Amazon EC2 does not scale automatically with load. EC2 is basically just a virtual machine. You can achieve scaling of EC2 instances with Auto Scaling and Elastic Load Balancing.
SQL databases scale poorly. That's why people started using NoSQL databases in the first place. It's best to see which database your cloud provider offers as a managed service: Datastore on GAE and DynamoDB on Amazon.
Installing your own database on EC2 instances is very impractical as EC2 has ephemeral storage (it looses all data on "disk" when it reboots).
GAE Datastore is actually a one big database for all applications running on it. So it's pretty scalable - your million of users should not be a problem for it.
http://highscalability.com/blog/2011/1/11/google-megastore-3-billion-writes-and-20-billion-read-transa.html
Yes App Engine scales automatically, both frontend instances and database. There is nothing special you need to do to make it scale, just use their API.
There are limitations what you can do with AppEngine:
A. No local storage (filesystem) - you need to use Datastore or Blobstore.
B. Comet is only supported via their proprietary Channels API
C. Datastore is a NoSQL database: no JOINs, limited queries, limited transactions.
Cost of GAE is not bad. We do 1M requests a day for about 5 dollars a day. The biggest saving comes from the fact that you do not need a system admin on GAE ( but you do need one for EC2). Compared to the cost of manpower GAE is incredibly cheap.
Some hints to save money (an speed up) GAE:
A. Use get instead of query in Datastore (requires carefully crafting natiral keys).
B. Use memcache to cache data you got form datastore. This can be done automatically with objectify and it's #Cached annotation.
C. Denormalize data. Meaning you write data redundantly in various places in order to get to it in as few operations as possible.
D. If you have a lot of REST requests from devices, where you do not use cookies, then switch off session support ( or roll your own as we did). Sessions use datastore under the hood and for every request it does get and put.
E. Read about adjusting app settings. Try different settings (depending how tolerant your app is to request delay and your traffic patterns/spikes). We were able to cut down frontend instances by 70%.

Resources