How Can You Determine When a Request Started on GAE Managed VM? - google-app-engine

On Google App Engine, there are multiple ways a request can start: a web request, a cron job, a taskqueue, and probably others as well.
How could you (especially on Managed VM) determine the time when your current request began?
One solution is to instrument all of your entry points, and save the start time somewhere, but it would be nice if there was an environment variable or something that told when the request started. The reason this is important is because many GAE requests have deadlines (either 60 seconds or 10 minutes in various scenarios), and it's helpful to determine how much time you have left in a request when you are doing some additional work.

We don't specifically expose anything that lets you know how much time is left on the current request. You should be able to do this by recording the time at the entrypoint of a request, and storing it in a thread local static.
The need for this sounds... questionable. Why are you doing this? It may be a better idea to use a worker / queue pattern with polling for something that could take a long time.

You can see all this information in the logs in your Developer console. You can also add more data to the logs in your code, as necessary.
See Writing Application Logs.

Related

Appengine responses becoming slower?

my ajax calls to AppEngine doing some very basic logic (and doing all the actual processing in the background, isolated from the frontend) tend to be at least 200% slower than they used to. Like taking 3 seconds instead of one out of a sudden since a week or so.
I am wondering if you guys had a similar experience or something changed in the meantime I am not aware of, quota wise maybe. I am using the free quota.
Thanks
Zac
To my knowledge there is no particular change going on, but we can't be sure. However slow response time can have multiple root causes.
If you have no traffic on your application then you might have zero instance running, therefore when you make your request there is the time for an instance to start up.
If you have a lot of traffic, depending on your configuration the request can take more time. You need to fine tune wether the request waits to be handled by an "overloaded" instance or if another instance should start.
If you use an API maybe there is something wrong with it.
I would suggest you enable appstats in your app, it will show you what takes time in your request: you will definitely see if this is something on your side or not.

Identify why Google app engine is slow

I developed an application for client that uses Play framework 1.x and runs on GAE. The app works great, but sometimes is crazy slow. It takes around 30 seconds to load simple page but sometimes it runs faster - no code change whatsoever.
Are there any way to identify why it's running slow? I tried to contact support but I couldnt find any telephone number or email. Also there is no response on official google group.
How would you approach this problem? Currently my customer is very angry because of slow loading time, but switching to other provider is last option at the moment.
Use GAE Appstats to profile your remote procedure calls. All of the RPCs are slow (Google Cloud Storage, Google Cloud SQL, ...), so if you can reduce the amount of RPCs or can use some caching datastructures, use them -> your application will be much faster. But you can see with appstats which parts are slow and if they need attention :) .
For example, I've created a Google Cloud Storage cache for my application and decreased execution time from 2 minutes to under 30 seconds. The RPCs are a bottleneck in the GAE.
Google does not usually provide a contact support for a lot of services. The issue described about google app engine slowness is probably caused by a cold start. Google app engine front-end instances sleep after about 15 minutes. You could write a cron job to ping instances every 14 minutes to keep the nodes up.
Combining some answers and adding a few things to check:
Debug using app stats. Look for "staircase" situations and RPC calls. Maybe something in your app is triggering RPC calls at certain points that don't happen in your logic all the time.
Tweak your instance settings. Add some permanent/resident instances and see if that makes a difference. If you are spinning up new instances, things will be slow, for probably around the time frame (30 seconds or more) you describe. It will seem random. It's not just how many instances, but what combinations of the sliders you are using (you can actually hurt yourself with too little/many).
Look at your app itself. Are you doing lots of memory allocations in the JVM? Allocating/freeing memory is inherently a slow operation and can cause freezes. Are you sure your freezing is not a JVM issue? Try replicating the problem locally and tweak the JVM xmx and xms settings and see if you find similar behavior. Also profile your application locally for memory/performance issues. You can cut down on allocations using pooling, DI containers, etc.
Are you running any sort of cron jobs/processing on your front-end servers? Try to move as much as you can to background tasks such as sending emails. The intervals may seem random, but it can be a result of things happening depending on your job settings. 9 am every day may not mean what you think depending on the cron/task options. A corollary - move things to back-end servers and pull queues.
It's tough to give you a good answer without more information. The best someone here can do is give you a starting point, which pretty much every answer here already has.
By making at least one instance permanent, you get a great improvement in the first use. It takes about 15 sec. to load the application in the instance, which is why you experience long request times, when nobody has been using the application for a while

Is it possible to track only slow requests with appstats?

Our application processes several dozens of requests per second and small portion of them takes significantly more time to process than others. We are interested to 'profile' those slow requests, however appstats seems to keep just small window of processed requests, so the ones we are interested in fades out very fast. Is it possible to configure appstats somehow to keep log of just requests taking more time than specified threshold ?
A detailed list of configuration options regarding Appstats, is presented at Sample Appstats Configuration Example in the GAE SDK. Based on this file, it seems that currently it's not possible to capture requests based on their execution time.
Hope this helps.
Unfortunately, there's no built in mechanism for this.
You could add it yourself by monkeypatching the end_recording method, or the Recorder.save method.

how many users in a GAE instance?

I'm using the Python 2.5 runtime on Google App Engine. Needless to say I'm a bit worried about the new costs so I want to get a better idea of what kind of traffic volume I will experience.
If 10 users simultaneously access my application at myapplication.appspot.com, will that spawn 10 instances?
If no, how many users in an instance? Is it even measured that way?
I've already looked at http://code.google.com/appengine/docs/adminconsole/instances.html but I just wanted to make sure that my interpretation is correct.
"Users" is a fairly meaningless term from an HTTP point of view. What's important is how many requests you can serve in a given time interval. This depends primarily on how long your app takes to serve a given request. Obviously, if it takes 200 milliseconds for you to serve a request, then one instance can serve at most 5 requests per second.
When a request is handled by App Engine, it is added to a queue. Any time an instance is available to do work, it takes the oldest item from the queue and serves that request. If the time that a request has been waiting in the queue ('pending latency') is more than the threshold you set in your admin console, the scheduler will start up another instance and start sending requests to it.
This is grossly simplified, obviously, but gives you a broad idea how the scheduler works.
First, no.
An instance per user is unreasonable and doesn't happen.
So you're asking how does my app scale to more instances? Depends on the load.
If you have much much requests per second then you'll get (automatically) another instance so the load is distributed.
That's the core idea behind App Engine.

crawler on appengine

i want to run a program continiously on appengine.This program will automatically crawl some website continiously and store the data into its database.Is it possible for the program to
continiously keep doing it on appengine?Or will appengine kill the process?
Note:The website which will be crawled is not stored on appengine
i want to run a program continiously
on appengine.
Can't.
The closest you can get is background-running scheduled tasks that last no more than 30 seconds:
Notably, this means that the lifetime
of a single task's execution is
limited to 30 seconds. If your task's
execution nears the 30 second limit,
App Engine will raise an exception
which you may catch and then quickly
save your work or log process.
A friend of mine suggested following
Create a task queue
Start the queue by passing some data.
Use an Exception handler and handle DeadlineExceededException.
In your handler create a new queue for same purpose.
You can run your job infinitely. You only need to consider used CPU Time and storage.
You might want to consider Backends introduced in the newer version of GAE.
These run continuous processes
Is Possible Yes, I have already build a solution on Appengine - wowprice
Sharing all details here will make my answer lengthy,
Problem - Suppose I want to crawl walmart.com, As i known that I cant crawl in one shot(millions products)
Solution - I have designed my spider to break the task in smaller task.
Step 1 : I input job for walmart.com, Job scheduler will create a task.
Step 2 : My spider will pick the job and its notice that Its index page, now my spider will create more jobs as starting page as categories page, Now its enters 20 more tasks
Step 3 : now spider make more smaller jobs for subcategories, and its will go till it gets product list page and create task for it.
Step 4 : for product list pages, its get the product and make call to to stores the product data and in case of next page It ll make one task to crawl them.
Advantages -
We can crawl without breaking 30 seconds rules, and speed of crawling will depends backend machine, It will provide parallel crawling for single target.
they fixed it for you.
you can run background threads on a manual scaled instance.
check https://developers.google.com/appengine/docs/python/modules/#Python_Background_threads
You cannot literally run one continuous process for more than 30 seconds. However, you can use the Task Queue to have one process call another in a continuous chain. Alternatively you can schedule jobs to run with the Cron service.
Use a cron job to periodically check for pages which have not been scraped in the past n hours/days/whatever, and put scraping tasks for some subset of these pages onto a task queue. This way your processes don't get killed for taking too long, and you don't hammer the server you're scraping with excessive bursts of traffic.
I've done this, and it works pretty well. Watch out for task timeouts; if things take too long, split them into multiple phases and be sure to use memcached liberally.
Try this:
on appengine run any program. You connect from browser, click for start url during ajax. Ajax call server, download some data from internet and return you (your browser) next url. This is not one request, each url is one diferent request. You mast only resolve in JS how ajax is calling url un cycle.
You can using lasted GAE service called backends . Check this http://code.google.com/appengine/docs/java/backends/
Backends are special App Engine instances that have no request deadlines, higher memory and CPU limits, and persistent state across requests. They are started automatically by App Engine and can run continously for long periods. Each backend instance has a unique URL to use for requests, and you can load-balance requests across multiple instances.

Resources