Google App Engine vs virtual/dedicated server

Hi, I have rolled out some sites on Google App Engine, but I'm not really happy with it, for these reasons:
Django is not fully supported, so there is no admin interface etc.
Using Django patches etc. sometimes causes problems when a new GAE release comes out.
After registering a domain (with GoDaddy) and adding it to Google App Engine (no redirection), pages seem to take too long to display.
I waste time working around restrictions, e.g. no SQL, a limited number of queries etc.
I'm a little scared because my traffic is not large but it is rising, so I can't predict how much it will really cost.
So my question is: in your experience, is a Slicehost-style server better (some dedicated servers cost $50 and give you 1.5 TB), or something else, or is Google App Engine economical and professional?
Just to give an idea, my site is yappiedo.com, and it sometimes seems slow.
Other ideas and suggestions are welcome.
Thanks

In grab bag format:
As long as you have a model for making money off your visitors, you do not need to be scared about rising costs. See this blog for what happens when this guy was hit with major traffic. The costs are reasonable (for him at least) and any other non-GAE solution would have died a screaming death, costing him many thousands of dollars in lost revenue.
You are not working on a traditional relational system. You need to design your application to work with GAE's strengths, not to fight GAE's weaknesses. If you spend all your time working around datastore limitations, take your data model back to the drawing board and make one that plays well with GAE. De-normalization is usually the key word.
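As a rough illustration of what de-normalization means here, the sketch below uses plain Python rather than GAE's datastore API, and the `Author`, `Post` and `Store` names are made up for the example. Instead of joining at read time, you copy the fields you need onto the entity you read most often, and pay for that at write time.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    author_id: int
    name: str


@dataclass
class Post:
    post_id: int
    title: str
    author_id: int
    # Denormalized copy of Author.name, so rendering a post needs no second lookup.
    author_name: str = ""


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a key-value datastore."""
    authors: dict = field(default_factory=dict)
    posts: dict = field(default_factory=dict)

    def add_post(self, post_id, title, author_id):
        author = self.authors[author_id]
        # Write-time cost: copy the author's name onto the post.
        self.posts[post_id] = Post(post_id, title, author_id, author.name)

    def rename_author(self, author_id, new_name):
        self.authors[author_id].name = new_name
        # The price of denormalization: every copy has to be updated too.
        for post in self.posts.values():
            if post.author_id == author_id:
                post.author_name = new_name

    def render_post(self, post_id):
        post = self.posts[post_id]  # one lookup, no join
        return f"{post.title} by {post.author_name}"


store = Store()
store.authors[1] = Author(1, "Ann")
store.add_post(10, "Hello GAE", 1)
print(store.render_post(10))
```

The trade-off is visible in `rename_author`: reads get cheaper, but a rare write has to touch every copy.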
There are significant slowdowns associated with starting an instance of your application the first time. These go away when your app is hit often enough that it stays in memory.
There should be no difference between hitting *.appspot.com and a properly set up Google Apps domain. Check your DNS configuration.

And the answer is: it depends.

Related

Network latency with using different PaaS solutions

Not sure if this belongs on stackoverflow or stackexchange. Mods - please point me towards the right platform.
I am unable to find statistics for the average network latency (due to network calls) and the average cost increase (because these platforms also charge for network ingress/egress) caused by the number of web-service calls involved, especially when we use different providers for webapp hosting and database hosting. Pretty much everything is on SSL, which adds more delay. Is this delay/cost noticeable to a consumer? I understand caching will help, but there's a limit to that.
Just to add some context - I am wondering if it's a smart decision for a startup to go with PaaS (I am planning to use Cloudbees/mongolab), or whether to prefer rolling out everything on IaaS (like EC2). I guess GAE will not have such issues because everything, including the Datastore, is part of their cloud?
Thanks!
Disclaimer: I'm working at CloudBees. Contact me ndeloof#cloudbees.com if you want to discuss specific application constraints
CloudBees (and probably other PaaS, can't tell) doesn't bill for network traffic. Compared to an IaaS that would bill for I/O, network, CPU cycles, etc., a PaaS offers a higher level of abstraction, and with it a higher-level pricing model.
Network latency is indeed a major concern when you are hosted on a PaaS that may be located on another continent. CloudBees offers US-east and EU-west regions to host applications. For European customers, being hosted in the EU zone with a low-latency network connection is a major improvement.
Hosting on an IaaS rather than a PaaS can make sense, but probably not while your startup is at an early stage. Use the PaaS as a booster to get online quickly and deliver features to your customers. If/when you're successful, maybe you'd prefer for whatever reason to switch (partially?) to an IaaS, and even later follow Facebook and Google in building your own datacenter :P
We have many startups as CloudBees customers that benefit from a high-level service to reduce time-to-market and focus on the company's actual business. Even if working on an IaaS is fun for engineers, from a business perspective it's not really what you want developers to focus on when your company has to be quick in a competitive market - and there are lots of other topics you can have engineers have fun with ;)
I don't get your comment on GAE. Google is indeed hosting its own DataStore. A PaaS like CloudBees relies on a partner SaaS for Mongo (mongoHQ.com), but as this one is hosted on AWS as well, network latency is the same as if CloudBees hosted its own mongo instances.

Final GAE vs AWS architectural decision

I know this has been asked one way or another before, but most of the main issues to do with GAE stability seem to have been asked around the end of 2008, early 2009, or aren't directly related to games at scale (which I'm interested in).
Basically, I have been arguing back and forth with my business partner about whether to use GAE or AWS for the back-end of our social game engine, and now it's crunch time. I love GAE (Java) for so many reasons, and although it used to be unstable, it's pretty good now. The main argument in favour of AWS is the fact that AWS has proven itself with multiple games running tens of millions of active users per day. The obvious pin-up child for AWS is Zynga, with its Farmville peaking at 80+million DAU. And that's just one of the hugely successful games running on the AWS infrastructure. Remarkable achievement.
So, one way or another it's KNOWN to work. GAE on the other hand doesn't have any examples that I could find doing these sorts of numbers. Not even close. So can I trust it? Is there a single example of a large social game with 2 million+ Daily Active Users, using GAE?
The main considerations for our social game back-end are:
Reliable CDN (Amazon CloudFront/S3 is excellent for this, as is Google's obviously excellent DataStore).
Ability to scale without falling over (AWS-EC2 is proven here, GAE doesn't seem to have examples of large game apps which can run into the 1000s of requests per second. GAE used to be quite unstable in this regard and so is my main concern).
Reliable no-SQL database. (AWS-SimpleDB and Google's DataStore are both excellent for this. We really don't need SQL).
Support/someone to call/contact if there is a problem. (This is one of the biggest worries with GAE. I have no idea who I can call, or if it's even possible. AWS has an SLA and support.)
I look forward to your thoughts, but please also note, this is not intended to start any sort of flame war. I love both systems, but both have their positives and negatives, but I'm about to make an architectural decision that likely won't be undone moving forward.
Regards,
Shane
I've never worked with AWS-EC2 so I'm going to share my knowledge just on the Google App Engine side.
Google App Engine is not meant to be a CDN; though it can serve static content through its powerful infrastructure providing caching close to the users, it does not guarantee the same kind of high quality and high availability service of a real CDN because it's not part of its duties.
Further data:
Maximum size of a file using the BlobStore service: 2 Gigabytes
Maximum size of a static file: 10 Megabytes
Currently App Engine always returns a 200 status for static files, even on conditional GETs (you have to rely on a third-party caching library, cirruxcache for example).
Recently Google App Engine team has shut down the App Gallery for one simple reason: too many Toy Apps!
Google wants to counteract this tendency showing successful businesses case studies; here are some of them:
BuddyPoke (viral Facebook app with 65 million installs)
WalkScore (serves 3 million request a day to thousands of real estate partner sites)
WebFilings
Snapabug
Optimizely
Ubisoft Facebook TikTok game
Other interesting case studies here
"We are well aware of downtimes and reliability issues, and are working hard to solve them: Improving App Engine reliability is our number one priority" was recently said by a Google Developer Relations Manager here.
App Engine is still in beta and is an evolving platform so you have to be prepared to deal with downtimes and issues.
Google App Engine team has just launched a preview of App Engine for Business providing 99.9% uptime service level agreement and premium developer support available.
Here is my opinion for what it's worth:
I'm aware that it's a tough call; having read a lot of articles about GAE I have mixed feelings about it because you can go from the recent catastrophic Carlos Ble report to the happy experience of Flower Garden or Gri.pe.
App Engine for Business looks promising and I would consider it in the case of a serious business project plan.
The fresh SDK 1.4.0 is huge and it clearly shows that the Team is really pushing hard to fix some annoying issues (Warmup requests) and relaxing some limitations (10 minutes process on TaskQueue).
Last thing to consider: if you are going to have big numbers, the Google App Engine Team will probably take your app as a successful case study to follow, with a boost of free and powerful hype.
BuddyPoke is one example of a large-scale social app running on GAE. How large I'm not sure. This article says 30m daily page views (not users):
http://googleappengine.blogspot.com/2008/10/app-engine-case-studies.html
Their facebook page says 2.7 million monthly (not daily) users:
http://www.facebook.com/buddypoke
Although, they are also on a heap of other social networks:
http://www.buddypoke.com/
Personally I decided to go with GAE, for a couple of main reasons:
The unit of scalability is a single request, not a whole instance like it is with AWS.
I can work at a higher level, without having to worry about configuring instances.
If your point 4 is a big one for you, then you may be better off with AWS. With GAE there appears to be nothing you can do, and no-one you can contact.
About a week ago I had an issue with my app - it had suddenly started failing in Google's code, in a location which had been working fine for the last 5 days, ie since I had last uploaded my app. The only way to report issues to Google seems to be via their production issue template, here:
http://code.google.com/p/googleappengine/issues/entry?template=Production%20issue
I reported the issue, and didn't hear anything. Since it's running on Google's servers I was unable to resort to any 'usual' emergency tactics like restarting a server. An hour later and the problem resolved itself - I'm not sure if someone at Google saw my message and fixed something, or if it just went away. I updated my bug report to say the problem was fixed, but even now a week later the issue hasn't been closed or even acknowledged. Also since the issue has to be posted publicly, my app is now getting random hits from bots.
Admittedly my app is currently only in beta and so only has a hundred or so users, and so it wasn't a major incident for me. If I was getting thousands / millions of hits, maybe either Google would have noticed the problem themselves earlier, or they would have paid more attention to my bug report.
On your point 3, even my small app with a small amount of traffic throws occasional data store errors (even during times which aren't reported on the availability charts as outages).
Having said this, I still like GAE (I am using the Python version), and plan to stick with it. The promise of GAE is its scalability - although it falls over occasionally now for my small traffic, it shouldn't fall over any more when it scales to much more traffic (ie your point 2), provided I've coded it correctly to avoid contention. I'll see how it goes.
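One common way to "code it correctly to avoid contention" is a sharded counter: instead of every request updating the same record, each write goes to one of several shards and reads add them up. The sketch below is plain Python with an in-memory list standing in for datastore entities; the class name and the shard count are assumptions for illustration, and in a real datastore each shard would be its own entity updated in its own transaction.

```python
import random


class ShardedCounter:
    """Spreads increments over several shards so writers rarely collide."""

    def __init__(self, num_shards=20):
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Each write touches one randomly chosen shard.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def value(self):
        # Reads sum all shards.
        return sum(self.shards)


counter = ShardedCounter(num_shards=5)
for _ in range(1000):
    counter.increment()
print(counter.value())
```

More shards means less write contention but more work per read, so the shard count is a tuning knob rather than a fixed number.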
Finally regarding your point 1, the blobstore and/or static files are more like a CDN on GAE, than the datastore. However for very large amounts of traffic, a real CDN may be cheaper. It's also not necessarily a CDN, see Google app engine & CDN.

Resolving Overloaded Webserver Issues

I am new to the area of web development and am currently interviewing with companies. The most common questions people ask are:
How do you scale your webserver if it starts hitting a million queries?
What would you do if you have just one database instance running at that time? How do you manage that?
These questions are really interesting and I would like to learn about them.
Please pour in your suggestions / practices (that you follow) for such scenarios
Thank you
How to scale:
Identify your bottlenecks.
Identify the correct solution for the problem.
Check to see if you can implement the correct solution.
Identify an alternate solution and check again.
Typical Scaling Options:
Vertical Scaling (bigger, faster server hardware)
Load balancing
Split tiers/components out onto more/other hardware
Offload work through caching/cdn
Database Scaling Options:
Vertical Scaling (bigger, faster server hardware)
Replication (active or passive)
Clustering (if DBMS supports it)
Sharding
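To make the sharding option a little more concrete, here is a minimal sketch of hash-based routing: each record key is mapped to one of several databases by a stable hash. The shard names and the `database_for` helper are hypothetical; real entries would be hosts or connection strings.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical shard names; real ones would be database hosts or connection strings.
SHARDS = ["db0", "db1", "db2", "db3"]


def database_for(user_id: str) -> str:
    """Pick the database that owns this user's rows."""
    return SHARDS[shard_for(user_id, len(SHARDS))]


print(database_for("user-42"))
```

Note that with plain modulo routing, changing the number of shards remaps most keys, so the shard count is best chosen with future growth in mind.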
At the most basic level, scaling web servers consists of writing your app in such a way that it can run on > 1 machine, and throwing more machines at the problem. No matter how much you tune them, the eventual scaling will involve a farm of web servers.
The database issue is way more sticky to deal with. What is your read / write percentage? What kind of application is this? OLTP? OLAP? Social Media? What is the database? How do we add more servers to handle the load? Do we partition our data across multiple dbs? Or replicate all changes to loads of slaves?
Your questions call for more questions, i.e. in an interview, if someone just "has the answer" to a generic question like the ones you've posted, then they only know one way of doing things, and that way may or may not be the best one.
There are a few approaches I'd take to the first question:
Are there hardware upgrades that may get things up enough to handle the million queries in a short time? If so, this is likely an initial point to investigate.
Are there software changes that could be made to optimize the performance of the server? I know IIS has a ton of different settings that could be used to improve performance to some extent.
Consider moving to a web farm rather than using a single server. I actually did have a situation at a place I once worked where we had millions of hits a minute, and it was thrashing our web servers rather badly and taking down a number of sites. Our solution was to change the load balancer so that a few of the servers served the site that was thrashing the servers, so that the other servers could keep the other sites up, as this was in the fall and in retail that is your big quarter. While some would start here, I'd likely come here last, as this can be opening a big can of worms compared to the other two options.
As for the database instance, it would be a similar set of options to my mind though I may do the multi-server option first as redundancy may be an important side benefit here that I'm not sure it is as easy with a web server. I may be way off, but that is how I'd initially tackle this.
Use a caching proxy
If you serve identical pages to all visitors (say, a news site) you can reduce load by an order of magnitude by caching generated content with a caching proxy such as Varnish or Apache Traffic Server.
The proxy will sit between your server and your visitors. If you get 10,000 hits to your front page it will only have to be generated once, the proxy will send the same response to the other 9999 visitors without asking your app server again.
Before the developers start building the system, they will probably consider the specification of the server.
Maybe you can reduce the use of SEO and block search engines from crawling the site (crawling is a task that takes a lot of resources).
Try to index everything well, and avoid making search too easy.
Deploy it on the cloud, and make sure your web server and webapp are cloud-ready and can scale across different nodes. I recommend the cherokee web server (very easy to load balance across different servers, and benchmarks show it is faster than Apache). For example, Google's cloud (appspot) needs your web app to be written in Python or Java.
Use a caching proxy, e.g. Nginx.
For the database, use memcache for queries that are expected to be repeated.
If the company wants its data to be private, build a private cloud. Here Ubuntu is doing a very good job, fully free and open source: http://www.ubuntu.com/cloud/private

Pros & Cons of Google App Engine [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
[An Updated List 21st Aug 09]
Help me Compile a List of all the Advantages & Disadvantages of Building an Application on the Google App Engine
Pros:
No need to buy servers or server space (no maintenance).
Makes solving the problem of scaling easier.
Free up to a certain level of consumed resources.
Cons:
Locked into Google App Engine ?
Developers have read-only access to the filesystem on App Engine.
App Engine can only execute code called from an HTTP request (except for scheduled background tasks).
Users may upload arbitrary Python modules, but only if they are pure-Python; C and Pyrex modules are not supported.
App Engine limits the maximum rows returned from an entity get to 1000 rows per Datastore call. (Update - App Engine now supports cursors for accessing larger queries)
Java applications may only use a subset (The JRE Class White List) of the classes from the JRE standard edition.
Java applications cannot create new threads.
Known Issues!! : http://code.google.com/p/googleappengine/issues/list
Hard limits
Apps per developer - 10
Time per request - 30 sec
Files per app - 3,000
HTTP response size - 10 MB
Datastore item size - 1 MB
Application code size - 150 MB
Update: Blobstore now allows storage of files up to 50 MB
Pro or Con?
App Engine's infrastructure removes many of the system administration and development challenges of building applications to scale to millions of hits. Google handles deploying code to a cluster, monitoring, failover, and launching application instances as necessary.
While other services let users install and configure nearly any *NIX compatible software, App Engine requires developers to use Python or Java as the programming language and a limited set of APIs. Current APIs allow storing and retrieving data from a BigTable non-relational database; making HTTP requests; sending e-mail; manipulating images; and caching. Most existing Web applications can't run on App Engine without modification, because they require a relational database.
Pros:
Scalable
Easy and cheaper (in short term).
Nice option for start-ups/individuals.
Suitable for apps that just store and retrieve data.
Cons:
Not suitable for CPU intensive calculations. They are slower and expensive.
Scalability doesn't matter much, because if an app works at Google scale then it probably makes enough money to run on its own servers.
They have lots of limitations thrown in here and there; as a result, deep data analysis is difficult. For example, you cannot produce a social graph using GAE.
I would say it's not meant for serious businesses, and it is expensive in the long run.
(A huge new) PRO: GAE now supports MySQL :
https://developers.google.com/cloud-sql/
Pros:
built-in ui for unified logs
built-in web interface for task queues
built-in indexes on list of primary objects.
Cons:
lose logs very fast
VERY expensive
VERY expensive
VERY expensive
Un-hackable. Scales because you're obligated to code in a way that scales.
Longer development cycles. Sometimes you just want to hack something together and throw it away after 5 hours. With appengine you have to code it properly and write a lot of stuff to make sure it scales. You can't just do a "find . | grep .avi | xargs ffmpeg -compress ...." :)
You will lose hours trying to do the simplest tasks, like sending push notifications to APNS (iPhone). Although it's fine if you only want to support Android in the future.
Terrible for doing cleanups on the database. It's a HUGE pain in the ass to fix rows in the database, mainly because it is terribly slow, but it also requires a lot of code to loop properly within its time constraints.
It was a pain to port Lucene to work on its "filesystem".
Slow for what you pay.
Even MORE expensive if your app has spikes of traffic. My app has those spikes if a user that has many followers makes an action and we have to push notifications to his followers. Because of that I have to keep 10 inactive servers always on ($$$$$) to handle spikes.
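The point above about database cleanups needing "a lot of code to loop properly within its time constraints" usually comes down to a pattern like the following: process rows in batches, stop before the request deadline, and hand back a cursor so a follow-up task can resume. This is a plain-Python sketch; `fetch_batch`, the list standing in for the datastore, and the 20-second budget are all assumptions for illustration, not App Engine APIs.

```python
import time


def fetch_batch(rows, cursor, batch_size):
    """Hypothetical stand-in for a cursor query: returns a batch and the next cursor."""
    batch = rows[cursor:cursor + batch_size]
    next_cursor = cursor + len(batch)
    return batch, (next_cursor if next_cursor < len(rows) else None)


def cleanup(rows, fix_row, cursor=0, batch_size=100, budget_seconds=20.0):
    """Fix rows in batches until done or until the time budget runs out.

    Returns the cursor to resume from, or None when everything was processed.
    """
    deadline = time.monotonic() + budget_seconds
    while cursor is not None:
        if time.monotonic() >= deadline:
            return cursor   # out of time: caller re-enqueues with this cursor
        batch, next_cursor = fetch_batch(rows, cursor, batch_size)
        for i, row in enumerate(batch):
            rows[cursor + i] = fix_row(row)
        cursor = next_cursor
    return None


rows = [" a ", "b ", " c"] * 100
resume_at = cleanup(rows, str.strip, batch_size=50)
while resume_at is not None:        # in a real app this would be a new task
    resume_at = cleanup(rows, str.strip, cursor=resume_at, batch_size=50)
print(rows[:3])
```

On a single root server the same fix is one UPDATE statement, which is the contrast the answer is drawing.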
Appengine isn't too bad, because I have the option to burn $$$$ instead of being concerned about scalability and fixing bottlenecks to reduce server usage. Sometimes it's worth it.
My advice to people starting new products is to go with hetzner.de, which is where I host my other products' servers. It's cheap and extremely hackable. I have one server at hetzner that is handling 3x more traffic than the product that I have on appengine. The difference in price is $100 a month versus $2700 a month!
I have system admin experience, so the bottom line is that I would never choose appengine over having my own ROOT server. Don't be that bored software engineer wanting to experiment new things instead of building great products!
Pro: Unlimited scalability for your application, and it scales with demand.
Con: Not available in some countries (Argentina).
Edit
Available worldwide, but only through Google Groups for App Engine.
When assessing pros and cons, I think it is important to clarify the market one is representing. Developers looking for a cost-effective solution to help them with the steep part of their planned hockey-stick growth curve will weigh the cons already listed heavily. For a small business owner, however, GAE is a godsend. These folks are most often looking to "the cloud" as a means to run their business more effectively (i.e. sell physical products and services). For the SMB, the GAE pros already listed can be much more valuable than they are to the hockey-stick-seeking dev, whilst the cons weigh in at a fraction of the dev's measure. I don't see the GAE team doing anything related to SMB positioning, so I guess answers like this are me just pulling on Superman's cape and spitting into the wind. Really, GAE should be absolutely ruling the SMB space by now. If not (I have no insights re: user base), then it is a greatly lamentable failure.
I believe GAE has yet to mature in terms of providing the basic features for serious business, such as a Datastore with complex primary keys and java.awt.* support, to name just a few.
Other than for the free space and for building some "hobby" websites, I strongly feel GAE is NOT the place Java guys should be looking into.
I have applications built on JSP/Servlets and MySQL and am thinking about migrating to GAE, but I find I would be spending more "value time" on the migration than just buying space from some Java hosting provider such as EATJ, etc. (sorry, not marketing, just an experience).
Another big issue I've got is migrating my existing MySQL data into GAE; bulkupload is really pathetic and has no client support.
No support for local DB to server DB upload.
Once GAE has addressed all the cons mentioned above, I think we can look into this migration.
You are forced to own a cell phone line, and your country+carrier must be able to receive international SMSs.
(I hate cell phones, and my mom's or co-workers won't get the SMSs)
Con: Other RDBMS or NoSQL databases are not possible.
Con: All your base are belong to us
... On a serious note:
Con: You don't control the environment your application runs in. The same cons as with outsourcing any component. Fun for toys, not for business (yet) IMHO.
Various things like API for Google proprietary backends such as their database system and other 'lockdowns' and frameworks that mean your code is tied, in some loose sense to their system can create cost issues later if you want to migrate from GAE. Of course, you could abstract these.
I like GAE, AppJet and others. They are cool. But everything has its place. If you want freedom and the ability to control your language's modules, API, syntax/stdlib versions and whatnot ... don't relinquish control to a service provider.
The lack of standards for environments and specifications for what your app can expect worries me in the cloud arena.
common sense stuff really.
Con: Limited to Java and Python

Is Google App Engine good for scalability and portability?

I'm evaluating hosted production environments and currently have interest in Google App Engine.
Currently I'm enjoying the free quotas. I'm concerned if it is efficient to scale up using
Google App Engine. Portability is being analyzed as well.
Please advise if Google App Engine is good for scalability and portability.
Thank you in advance.
Portability is guaranteed by the fact that Google has open-sourced all the parts of App Engine that live "in front of" the RPC layer, thus facilitating the work (which would happen anyway of course!-) of third parties like appcelerator and bdbdatastore that implement compatible environments running on different infrastructure -- you only need to stay on Google's systems if Google gives you better ROI for your apps, otherwise you can easily migrate them to alternative implementations (I'm sure many more third parties will join the ranks of these two, offering a variety of such alternatives).
Scalability, when the apps are programmed appropriately, seems proven, e.g. by the example of Obama's Town Hall Meeting -- the app, using an open-sourced Google codebase known as "Moderator", was handling 700 QPS for a total of many millions of visits in a few hours, while maintaining excellent latency and impeccable uptime.
A LOT has been written (and recorded on video) about the right techniques needed to obtain such seamless scalability with App Engine -- there's really no way to summarize all of the hits in this google search! Suffice it to say, it's not trivial, but in the end it's easier (for suitable kinds of apps, at least -- ones that are "front-end heavy" as opposed to ones focused on huge "batch" jobs) than with any other technology I know of.
