I'm new to cloud programming but it's my understanding that the server I'm building can be run in multiple regions as multiple "instances" to improve user experience. In other words, my server code is just running on several different machines at once, all independently of each other. However, things like a database server for example should only be run as a single instance no matter how many server instances there are.
Is there a way to do this using Google App Engine? More specifically, is there a way to categorize portions of the project as scalable and others as non-scalable? My initial plan was simply to make two different projects - one which scales automatically and one which does not scale - and have them communicate through network requests. This could (potentially?) have the added benefit of spreading the resources used by my project across multiple cloud projects, reducing the per-project usage for billing purposes.
I'd love to know if I'm on the right track, or if what I'm doing is over-complicating things.
Yes, this is possible. You will need to create 2 separate App Engine services, set one to automatic scaling and the other to manual scaling, and configure how many instances you want for that service to be up and running at all times. You can read this documentation for more details on the types of scaling and this documentation on how to set this up in your app.yaml file.
That being said, I don't think this will reduce your cost. In fact the opposite might happen: App Engine is designed to reduce wasted resources as much as possible with automatic scaling, and if you use manual scaling with more instances than you actually need, you will be charged for them, so you need to factor this into the design of your infrastructure. I also recommend taking a look at the App Engine pricing documentation.
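For illustration only, here is a minimal sketch of what the two services' configuration files might look like; the service name, runtime, and instance counts are made-up placeholders, not values from the question:

```yaml
# app.yaml - the default, user-facing service that scales with traffic
runtime: python39
automatic_scaling:
  min_instances: 0
  max_instances: 10      # illustrative cap, tune to your actual load

# backend/app.yaml - the service that should stay at a fixed size
service: backend         # hypothetical service name
runtime: python39
manual_scaling:
  instances: 1           # exactly one instance running at all times
```

Each service is deployed from its own app.yaml, and the manually scaled one keeps its single instance running regardless of traffic.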
Related
In Naming Developer Environments Google suggests 2 approaches for implementing different CI/CD environments for GAE apps
based on different services (which used to be called modules) inside the same project/app:
If you choose to create your microservices application by using only multiple services, you can create a single App Engine project for each of your environments and name them accordingly, such as web-app-dev, web-app-qa, and web-app-prod.
based on different projects/apps:
Alternatively, if you choose to create your microservices application by using multiple projects, you can achieve the same separation between environments, but you'll need to use more projects, such as web-app-dev, web-app-prod, user-service-dev, and user-service-prod. You will need to use code patterns to ensure that the dev projects only call other dev projects and the prod projects only call other prod projects.
The phrasing in the above documentation snippets appears to suggest the 2 approaches would be roughly equivalent, but there is at least one significant difference between the 2 approaches: a project/app based approach ensures data isolation, while a service/module based one does not - the datastore and memcache are shared by all services.
A more detailed comparison between the 2 approaches from the isolation perspective is documented in Comparison of service isolation and project isolation:
The following table provides a comparison between using multiple services and multiple projects in a microservices architecture:
My question is: apart from the above-mentioned differences, are there other advantages of using the project-based approach versus the service-based one? Or anything that may be considered a disadvantage?
The project based approach also allows you to separate billing concerns, and IAM roles.
You could go as far as having different credit cards charged, or just set billing limits independently (who wants prod to go down because a dev bug exceeded your billing limit?). You'll also get separate billing reports, so you can more easily determine what prod vs dev is costing you.
The service based approach potentially minimizes additional administrative work. For example, if for some reason you needed to set up VPNs or other networking aspects, a single project means you only need to configure it once, rather than once per project.
I have two parts of a Project that I think predate Google Cloud Console and now show up in Google Cloud Console separately:
An App Engine Project
A Google APIs and Google Cloud Storage Project
These two "Projects" are part of the same real-life software project.
Should I try to eventually migrate my API and Storage Project into the App Engine Cloud Project? Would there be any benefits?
There really isn't any easy way to do this, and the benefits will probably not outweigh the costs. Unless you're merging two app engine apps into one (that can result in significant cost benefits) it probably doesn't make a difference.
You should definitely try to migrate your API and Storage Project into the App Engine Cloud Project (by enabling the API in the associated Cloud Project, copying your resources and re-creating your credentials).
This will make it easier for you to use Google Cloud Datastore and other Cloud APIs in association with your App Engine application.
I think the question you should ask yourself is if these two components are distinct parts of your infrastructure or if they're effectively the same. It's kind of a subjective and abstract question, but ideally you want your software stack broken into logically cohesive portions.
There's also a more practical consideration related to the size of your organization. If you're working with a small organization with one or two teams, it's likely that you'll want to have a more "monolithic" infrastructure. Larger organizations will likely want an infrastructure based more on "microservices" where individual pieces of the pie are broken down into smaller pieces.
I suppose a good general rule of thumb you could use is that the number of projects you have should be on the order of the number of teams you have working on different components of your software stack. In other words, if you have a handful of teams working on a handful of components, you should have a handful of projects. If you have hundreds of teams working on hundreds of components, you should have hundreds of projects.
We are starting a new project that requires two main components:
Backend for task management, e.g. retrieving a task from a queue and validating it according to some specific logic.
Run a real compiler on that specific task and create an executable that an end user should receive.
We love App Engine; however, the second part will require a concrete instance where an actual compiler has to be installed, and App Engine is not capable of that. We were thinking of mixing App Engine and AWS instances to accomplish the task (part 1 on App Engine and part 2 on AWS).
All of our senses say it's a bad idea:
unneeded traffic between the two providers, which someone unfortunately needs to pay for.
We'll have to deal with two systems, two deployment processes, and each system's own quirks --> double the work.
But we love App Engine.
Does anyone have any experience in combining the two systems? Any recommendations?
There's no reason why what you suggest won't work, especially if you separate your concerns well, by exposing a clean 'compiler' interface on AWS or a similar service. Yes, you will have to pay for traffic between the two services, but this is unlikely to be substantial. If you are serving up the end result to the user, you can link them directly to AWS, rather than fetching it with your app first.
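As a rough sketch of that separation, the App Engine side only hands the validated task to the external compiler service and returns the user a link that points straight at the other provider. The endpoint URL, payload fields, and response shape below are hypothetical, not something from the question; urlfetch is App Engine's outbound HTTP API:

```python
# A minimal sketch of the App Engine side, assuming a hypothetical
# compiler service exposed at https://compiler.example.com on AWS.
import json
from google.appengine.api import urlfetch

COMPILER_ENDPOINT = 'https://compiler.example.com/build'  # hypothetical URL

def submit_build(task_id, source_code):
    """Hand the validated task to the external compiler service."""
    payload = json.dumps({'task_id': task_id, 'source': source_code})
    result = urlfetch.fetch(
        url=COMPILER_ENDPOINT,
        payload=payload,
        method=urlfetch.POST,
        headers={'Content-Type': 'application/json'},
        deadline=30)  # seconds
    if result.status_code != 200:
        raise RuntimeError('compiler service returned %d' % result.status_code)
    # The compiler service is assumed to reply with a direct download URL
    # (e.g. an S3 link), so the user fetches the executable from AWS
    # rather than proxying it back through App Engine.
    return json.loads(result.content)['download_url']
```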
AWS's EC2 instances are literally just vanilla Linux boxes in the sky. I would also put forward the suggestion of just moving to AWS completely. Porting your system over may be easier than it sounds if you're Unix savvy.
I am new to the area of web development and am currently interviewing with companies. One of the favorite questions people ask is:
How do you scale your web server if it starts hitting a million queries?
What would you do if you have just one database instance running at that time? How do you manage that?
These questions are really interesting and I would like to learn about them.
Please pour in your suggestions / practices (that you follow) for such scenarios
Thank you
How to scale:
Identify your bottlenecks.
Identify the correct solution for the problem.
Check to see whether you can implement the correct solution.
Identify an alternate solution and check it as well.
Typical Scaling Options:
Vertical Scaling (bigger, faster server hardware)
Load balancing
Split tiers/components out onto more/other hardware
Offload work through caching/CDN
Database Scaling Options:
Vertical Scaling (bigger, faster server hardware)
Replication (active or passive)
Clustering (if DBMS supports it)
Sharding (see the sketch after this list)
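To make the sharding option concrete, here is a minimal hash-based sketch; the shard count and connection strings are made-up placeholders, and any real scheme would also need to handle resharding:

```python
# A minimal sketch of hash-based sharding: each user's rows always live
# on the same database node, chosen deterministically from the user id.
import zlib

# Hypothetical connection strings for three shard nodes.
SHARDS = [
    'db-shard-0.internal:5432',
    'db-shard-1.internal:5432',
    'db-shard-2.internal:5432',
]

def shard_for(user_id):
    """Pick the shard that holds this user's data, the same one every time."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    bucket = zlib.crc32(str(user_id).encode('utf-8')) % len(SHARDS)
    return SHARDS[bucket]
```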
At the most basic level, scaling web servers consists of writing your app in such a way that it can run on > 1 machine, and throwing more machines at the problem. No matter how much you tune them, the eventual scaling will involve a farm of web servers.
The database issue is much stickier to deal with. What is your read/write percentage? What kind of application is this? OLTP? OLAP? Social media? What is the database? How do we add more servers to handle the load? Do we partition our data across multiple DBs? Or replicate all changes to loads of slaves?
Your questions call for more questions; i.e., in an interview, if someone just "has the answer" to a generic question like the one you've posted, then they only know one way of doing things, and that way may or may not be the best one.
There are a few approaches I'd take to the first question:
Are there hardware upgrades that might boost capacity enough to handle the million queries in a short time? If so, this is likely an initial point to investigate.
Are there software changes that could be made to optimize the performance of the server? I know IIS has a ton of different settings that could be used to improve performance to some extent.
Consider going to a web farm rather than using a single server. I actually did have a situation at a place where I worked once where we had millions of hits a minute, and it was thrashing our web servers rather badly and taking down a number of sites. Our solution was to change the load balancer so that a few of the servers served up the site that was thrashing them, so the other servers could keep the other sites up; this was in the fall, and in retail that is your big quarter. While some would start here, I'd likely come here last, as this can open a bigger can of worms than the other two options.
As for the database instance, it would be a similar set of options to my mind, though I might do the multi-server option first, as redundancy may be an important side benefit here that I'm not sure is as easy to get with a web server. I may be way off, but that is how I'd initially tackle this.
Use a caching proxy
If you serve identical pages to all visitors (say, a news site) you can reduce load by an order of magnitude by caching generated content with a caching proxy such as Varnish or Apache Traffic Server.
The proxy will sit between your server and your visitors. If you get 10,000 hits to your front page it will only have to be generated once, the proxy will send the same response to the other 9999 visitors without asking your app server again.
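For the proxy (Varnish or similar) to be allowed to reuse a response, the app has to mark it as cacheable. A minimal sketch, assuming a webapp2-style handler and a made-up 60-second lifetime:

```python
# A minimal sketch of marking a page as cacheable so a caching proxy can
# serve it to repeat visitors without asking the app server again.
import webapp2

def render_front_page():
    # Stand-in for however the page is actually generated.
    return '<html><body>front page</body></html>'

class FrontPage(webapp2.RequestHandler):
    def get(self):
        # 'public' lets a shared cache (the proxy) store the response;
        # max-age is how long it may be reused (60 seconds here, made up).
        self.response.headers['Cache-Control'] = 'public, max-age=60'
        self.response.write(render_front_page())

app = webapp2.WSGIApplication([('/', FrontPage)])
```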
Probably, before developers start building the system, they will consider the specification of the server.
Maybe you can reduce load from search engines by blocking them from crawling the site (crawling is a task that takes a lot of resources).
Try to index everything well so that searches stay cheap.
Deploy it on the cloud, and make sure your web server and web app are cloud-ready and can scale across different nodes. I recommend the Cherokee web server (it is very easy to load balance across different servers, and in benchmarks it proves faster than Apache). For example, Google's cloud (appspot) needs your web app to be Python or Java.
Use a caching proxy, e.g. Nginx.
For the database, use memcache for queries that are expected to be repeated.
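A minimal sketch of that cache-aside pattern using App Engine's memcache API; the cache key, expiry, and the expensive query are placeholders:

```python
# A minimal sketch of caching a repeated query result in memcache so the
# database is only hit when the cached copy is missing or expired.
from google.appengine.api import memcache

def run_expensive_query():
    # Placeholder for the slow database query being cached.
    return ['article-1', 'article-2']

def get_top_articles():
    articles = memcache.get('top_articles')   # hypothetical cache key
    if articles is None:
        articles = run_expensive_query()
        # Keep the result for 5 minutes; repeats within that window skip the DB.
        memcache.set('top_articles', articles, time=300)
    return articles
```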
If the company wants data to be private, build a private cloud. Ubuntu is doing a very good job at this, fully free and open source: http://www.ubuntu.com/cloud/private
I am trying to figure out the best way to deploy a single Google App Engine application across multiple regions.
The same code is to be used, but the stored data is specific to each region. Motivating examples are hyperlocal review sites, like yelp.com or urbanspoon, where restaurants and other businesses to review are specific to a region (e.g. boston.app.com, seattle.app.com).
A couple options include:
Create multiple GAE applications, and duplicate the code across them.
Create a single GAE application, and store all data for all regions in the same Datastore, with a region identifier field for each model delimiting the relevant region.
Some of the trade-offs:
Option 2 seems like it will be increasingly inefficient (space: replicating a region identifier for each record of every model; time: filtering/indexing on the identifier for every query).
Option 1 requires an app ID for every region, while GAE only allows 10 apps per account. Moreover, deploying the code across every region, as well as Datastore migration, seems like it could be a pain to manage.
In the ideal world, I would have a single application instance. From that instance, I could route between subdomains (like here), as well as have a separate Datastore for each subdomain. But I believe GAE only allows a single datastore per application.
Does anyone have ideas on the best way to solve this problem? Or options that I am not considering?
Thanks for your time!
I would recommend your approach #2. Storage space is cheap (and region codes are short), and datastore performance does not degrade with size, unlike most databases. Using a single app also makes for easier management and upgrades, and avoids any issues with the TOS (which prohibit sharding your app to avoid billing charges).
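As a small sketch of approach #2 (the model, property names, and region codes below are made up), every entity carries a region code and each subdomain's queries simply filter on it:

```python
# A minimal sketch of approach #2: one Datastore, a region code on every
# entity, and every query filtered by that code.
from google.appengine.ext import ndb

class Restaurant(ndb.Model):
    name = ndb.StringProperty()
    region = ndb.StringProperty()   # e.g. 'boston', 'seattle'
    rating = ndb.FloatProperty()

def restaurants_for(region, limit=20):
    """Return the restaurants belonging to a single region's subdomain."""
    return (Restaurant.query(Restaurant.region == region)
            .order(-Restaurant.rating)
            .fetch(limit))
```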
If you use source code revision control, then it is not too bad to push identical code into multiple apps. You could set a policy whereby only full-fledged tags are allowed to be pushed up to GAE. Another option is to make your application version the same as the revision number.
With App Engine, I (and I believe most others) always migrate data from within my model code. You can't easily do bulk migrations in GAE and the usual solution is to migrate data as you come across it in code. In this way, you can keep your models pretty much identical across applications.
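A sketch of that "migrate as you go" idea, reusing the hypothetical Restaurant model from above and assuming older entities are missing the newer region property; the fallback value is made up:

```python
# A minimal sketch of lazy migration: whenever code touches an entity that
# predates the new 'region' property, it is backfilled and saved on the spot.
from google.appengine.ext import ndb

class Restaurant(ndb.Model):
    name = ndb.StringProperty()
    region = ndb.StringProperty()   # property added after launch

def get_restaurant(key, fallback_region='boston'):
    restaurant = key.get()
    if restaurant is not None and restaurant.region is None:
        restaurant.region = fallback_region  # made-up default for old data
        restaurant.put()  # entity is now migrated; later reads skip this branch
    return restaurant
```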
Having said that, I would probably still go with a unified application. It's more future-proof. What if users want to join their L.A. identity and their New York identity? Or what if an advertiser offers you a sweet deal for you to run some marketing reports on your own data?
Finally, a few bytes of data doesn't matter so much on App Engine. As your site grows, you will very quickly discover that you will always be bumping into ceilings. GAE limits are extremely small compared to a traditional web server and so you will have to work within those limits anyway. For example, you can only fetch 1,000 records at a time. So your architecture will already support a piecemeal paging solution. So don't worry too much about an extra field or two in your record.