Bandwidth usage in Google App Engine

How can I find out how my bandwidth is used in Google App Engine? I want to extract the top bandwidth hogs so I can cut down on my outgoing bandwidth usage.

App Engine logs all requests. Each log entry includes information about the request (path, query string, wall/CPU/API time, and the approximate amount of data transferred out, in KB) and about the requester (IP address and, if the user is logged in, Google account name). You should be able to compute a reasonable estimate of your bandwidth usage from this information.
You can periodically download your app's logs with appcfg; how often you need to do this depends on how much traffic your site handles.
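For example, once the logs are downloaded, a short script can tally bytes per path. This is a rough sketch assuming the Apache-style log format that appcfg produces; adjust the regex if your log layout differs:

```python
import re
from collections import Counter

# Apache-style request log line: host ident user [date] "METHOD path proto" status bytes
# This layout is an assumption; tweak the regex to match your actual logs.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+) (?P<bytes>\d+|-)')

def top_bandwidth_hogs(log_file, n=20):
    bytes_by_path = Counter()
    with open(log_file) as f:
        for line in f:
            m = LOG_RE.match(line)
            if not m or m.group('bytes') == '-':
                continue
            # Drop the query string so variants of one handler group together.
            path = m.group('path').split('?', 1)[0]
            bytes_by_path[path] += int(m.group('bytes'))
    return bytes_by_path.most_common(n)

if __name__ == '__main__':
    for path, total in top_bandwidth_hogs('request_logs.txt'):
        print('%12d bytes  %s' % (total, path))
```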

It may also be helpful to review usage from the Admin Console, both in aggregate and via the logs; the Admin Console is at https://appengine.google.com/

Related

How can I enforce rate limit for users downloading from Google Cloud Storage bucket?

I am implementing a dictionary website using App Engine and Cloud Storage. App Engine handles the backend, such as user authentication, and Cloud Storage is used to store a JSON file for each dictionary entry.
I would like to rate-limit how much a user can download in a given time period, so they can't bulk-download the JSON files and run up a big bill for me. Ideally, the dictionary would display a captcha if a user downloads too much at once, and allow them to keep downloading if they pass the captcha. What is the best way to achieve this?
Is there a specific service for rate limiting based on IP address or authenticated user? Should I do this through App Engine and only access Cloud Storage through App Engine (perhaps slower, since it would use some of my dynamic resources to serve static content)? Or is it possible to have the frontend access Cloud Storage directly and implement the rate limiting on Cloud Storage itself? Is a Cloud Storage bucket the right service for storage here? And how can I allow search engine indexing bots to bypass the rate limiting?
As explained by Doug Stevenson in this post:
"There is no configuration for limiting the volume of downloads for files stored in Cloud Storage."
and, explaining further:
"If you want to limit what end users can do, you will need to route them through some middleware component that you build that tracks how they're using your provided API to download files, and restrict what they can do based on their prior behavior. This is obviously nontrivial to implement, but it's possible."
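A minimal sketch of such a middleware on App Engine, assuming a Python/Flask runtime and the google-cloud-storage client; the bucket name, limits, and captcha handling are placeholders, and a real deployment needs a shared counter store since App Engine runs many instances:

```python
import time
from flask import Flask, abort, request
from google.cloud import storage  # assumes google-cloud-storage is installed

app = Flask(__name__)
storage_client = storage.Client()
BUCKET = 'my-dictionary-bucket'  # placeholder bucket name
LIMIT, WINDOW = 100, 3600        # max downloads per hour (pick your own numbers)

# In-memory counters are only valid per instance; production code should use
# a shared store (Memorystore, Datastore) so all instances see the same counts.
_counters = {}

def over_limit(key):
    now = time.time()
    start, count = _counters.get(key, (now, 0))
    if now - start > WINDOW:
        start, count = now, 0
    _counters[key] = (start, count + 1)
    return count + 1 > LIMIT

@app.route('/entry/<word>')
def entry(word):
    # Key on the authenticated user if you have one; fall back to the IP.
    key = request.remote_addr
    if over_limit(key):
        # A full implementation would serve a captcha page here and reset
        # the counter once the captcha is passed.
        abort(429, 'Too many downloads; please complete the captcha.')
    blob = storage_client.bucket(BUCKET).blob('entries/%s.json' % word)
    if not blob.exists():
        abort(404)
    return blob.download_as_bytes(), 200, {'Content-Type': 'application/json'}
```

Serving the files through App Engine like this does consume dynamic resources, but it is what gives you a place to count downloads, key the limit to authenticated users, and exempt verified crawlers.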

My Google App Engine instances do not seem to be in the correct region

I have just created a Google App Engine application and a 2nd Generation MySQL instance in the eu-west2 region. In the GCP Console they both seem to be in the eu-west2 region.
However, when I try to geolocate my IPs, they appear to be somewhere in the US.
What should I do to use GCP in the eu-west2 region?
[screenshots: my GCP instances and their locations]
Google has an extensive worldwide network. What you are seeing is us routing you to Google's closest Point of Presence (POP); from that point on you're on a software-defined network (SDN). What this means is that we get your traffic onto our fast network as quickly as possible and abstract away the details of getting you to the machine in question.
Check latency from you to these hosts, then spin up a VM in Europe and check latency from that VM to these hosts - you'll find the numbers will confirm they really are in eu-west2.
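A quick way to compare is to time a TCP handshake from both places; the hostname below is a placeholder:

```python
import socket
import time

def tcp_latency_ms(host, port=443, samples=5):
    # Time a few TCP handshakes and keep the best one to filter out jitter.
    timings = []
    for _ in range(samples):
        start = time.time()
        sock = socket.create_connection((host, port), timeout=5)
        timings.append((time.time() - start) * 1000)
        sock.close()
    return min(timings)

print('%.1f ms' % tcp_latency_ms('your-app.appspot.com'))
```

Run it once from your own machine and once from a VM in eu-west2; the VM's number should be dramatically lower if the instances really live there.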
I faced the same issue; you can find more about it here: Outgoing HTTP Request Location on Google App Engine.
It's about how the Google network works: outgoing traffic leaves from a Point of Presence rather than from the instance's region, and that POP can change dynamically. In my case I have no solution, since it's mandatory for my API to make its requests from Brazil =\

Stackdriver vs ELK for App Engine

I'm a little confused about this, because the docs say I can use Stackdriver for "Request logs and application logs for App Engine applications". So does that mean web requests? Millions of web requests?
Stackdriver's pricing is per resource, so does that mean I can log all of my web servers' request logs (which would be HUGE) for no extra cost, i.e. I would not be charged by the volume of storage the logs use?
Does Stackdriver use GCP Cloud Storage as a backend, and do I have to pay for that storage? It looks like I can get hundreds of gigabytes of log aggregation for virtually no money; I just want to make sure I'm understanding this.
I bring up ELK because Elastic just partnered with Google, so Stackdriver presumably doesn't do everything Elasticsearch does (for almost no money), otherwise it would be a competitor?
Things definitely seem to be moving quickly at Google's cloud division and documentation does seem to suffer a bit.
Having said that, the document you linked to also details the limitations:
"The request and application logs for your app are collected by a Cloud Logging agent and are kept for a maximum of 90 days, up to a maximum size of 1GB. If you want to store your logs for a longer period or store a larger size than 1GB, you can export your logs to Cloud Storage. You can also export your logs to BigQuery and Pub/Sub for further processing."
It should work out of the box for small to medium-sized projects, though the built-in log viewer is pretty basic.
From your description it sounds like you may have specific needs, so you should not assume this will be free. Factor in Cloud Storage costs for the logs you want to retain, and BigQuery costs if you need to crunch them.
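If you do need longer retention, a sink can be created from code as well as from the console. A hedged sketch using the google-cloud-logging Python client; the sink name, filter, and bucket are placeholders:

```python
from google.cloud import logging  # assumes the google-cloud-logging package

client = logging.Client()

# Export App Engine request logs to a Cloud Storage bucket so they survive
# past the built-in retention window. The filter string is an assumption;
# check it against your actual log entries.
sink = client.sink(
    'gae-request-log-archive',
    filter_='resource.type="gae_app" AND logName:"request_log"',
    destination='storage.googleapis.com/my-log-archive-bucket',
)
if not sink.exists():
    sink.create()
    # Remember to grant the sink's writer identity write access to the bucket.
```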

Stack Exchange API compliant request throttle implementation on Google App Engine Cloud infrastructure

I have been writing a Google Chrome extension for Stack Exchange. It's a simple extension that allows you to keep track of your reputation and get notified of comments on Stack Exchange sites.
Currently I've run into some issues that I can't handle myself.
My extension uses Google App Engine as its back end to make external requests to the Stack Exchange API. A single client request from the extension for new comments on one site can trigger plenty of requests to the API endpoint to prepare the response, even for a light user. The average user has accounts on at least 3 sites from the Stack Exchange network, and some have more than 10!
Stack Exchange API has request limits:
A single IP address can only make a certain number of API requests per day (10,000).
The API will cut my requests off if I make more than 30 requests over 5 seconds from a single IP address.
It's clear that all requests should be throttled to 30 per 5 seconds. I've currently implemented request-throttling logic based on a distributed lock with memcached, using memcached as a simple lock manager to coordinate the activity of GAE instances and throttle UrlFetch requests; roughly, it works like the sketch below.
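(A simplified sketch of my approach, not the exact code; it uses the first-generation Python runtime's memcache and urlfetch APIs:)

```python
import time
from google.appengine.api import memcache, urlfetch

LIMIT, WINDOW = 30, 5  # Stack Exchange rule: 30 requests per 5 seconds

def throttled_fetch(url):
    # The bucket key rotates every WINDOW seconds, so counts reset themselves.
    bucket = 'se-api-throttle:%d' % (int(time.time()) // WINDOW)
    # incr is atomic across instances; initial_value creates the key on first use.
    count = memcache.incr(bucket, initial_value=0)
    if count is None or count > LIMIT:
        return None  # over quota (or memcache unavailable); caller retries later
    return urlfetch.fetch(url, deadline=10)
```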
But I think it's a big waste to limit such powerful infrastructure to no more than 30 requests per 5 seconds. Such an API request rate does not allow me to keep developing new, interesting and useful features, and one day the app will stop working properly altogether.
My app now has 90 users and is growing, and I need to come up with a solution to maximize the request rate.
As is known, App Engine makes external UrlFetch requests via a shared pool of IPs.
My goal is to write request-throttling functionality that ensures compliance with the API terms of usage while utilizing GAE's distributed capabilities. So my question is: how do I get the maximum practical API throughput under those constraints?
Advice to use another platform/host/proxy is just useless in my mind.
If you are searching for a way to programmatically manage Google App Engine's shared pool of IPs, I firmly believe you are out of luck.
Anyway, quoting this advice from the FAQ, I think you have more than a chance to keep your awesome app running:
What should I do if I need more requests per day?
Certain types of applications - services and websites to name two - can legitimately have much higher per-day request requirements than typical applications. If you can demonstrate a need for a higher request quota, contact us.
EDIT:
I was wrong; actually you don't have any chance.
Google App Engine [app]s are doomed.
First off: I'm using your extension and it rocks!
Have you considered using memcache and caching the results?
Instead of hitting the API directly each time, first look for the results in the cache; if they're there, use them, and if not, retrieve them, cache them, and let them expire after X minutes.
Second, try to batch up users' requests: instead of asking for the reputation of a single user, ask for the reputation of several users together.
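A hedged sketch of both ideas together, again for the first-generation Python runtime; the Stack Exchange API accepts semicolon-separated ID vectors, so one fetch can cover many users (error handling and the API's gzip compression are glossed over):

```python
import json
from google.appengine.api import memcache, urlfetch

CACHE_TTL = 300  # seconds; tune to how fresh reputation needs to be

def get_reputations(site, user_ids):
    reps, missing = {}, []
    for uid in user_ids:
        cached = memcache.get('rep:%s:%d' % (site, uid))
        if cached is not None:
            reps[uid] = cached
        else:
            missing.append(uid)
    if missing:
        # One batched API call covers every user we couldn't serve from cache.
        url = ('https://api.stackexchange.com/2.2/users/%s?site=%s'
               % (';'.join(str(u) for u in missing), site))
        data = json.loads(urlfetch.fetch(url, deadline=10).content)
        for item in data.get('items', []):
            uid, rep = item['user_id'], item['reputation']
            reps[uid] = rep
            memcache.set('rep:%s:%d' % (site, uid), rep, time=CACHE_TTL)
    return reps
```

Every cache hit and every extra user folded into a batch is one fewer request counted against the 30-per-5-seconds budget.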

Can Google App Engine site be blacklisted for exceeding Twitter API rate limit?

I have a website that uses the Twitter API. The thing is that the site goes blank once the API limit is reached (I think), and then after a while it starts displaying results again.
I am running on GAE appspot. Because I have the appspot subdomain, does this mean that I can never be blacklisted?
Also, what is the point of using the Twitter API when I can just search Twitter's public site directly?
No, your application can be blacklisted.
The REST API does account- and IP-based rate limiting.
You can't even be on their whitelist in this situation (being on Google App Engine), according to their documentation:
(...) This works in most situations but for cloud platforms like Google App Engine, applications without a static IP address cannot receive Search whitelisting. (...)
(emphasis is mine)
Read Twitter's Rate Limiting documentation for complete information about the other limits.
If your application is being blocked due to exceeding the limit, then you should get a 400 HTTP response code. If you've written your application such that it generates a blank page when it gets an HTTP failure, then you have your answer. (How you check for HTTP errors in your particular development framework is a separate matter.)
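For instance, with App Engine's urlfetch the check is just a status-code test before rendering; the endpoint below is the old Twitter Search API and is only a placeholder:

```python
from google.appengine.api import urlfetch

def search_twitter(query):
    resp = urlfetch.fetch(
        'https://search.twitter.com/search.json?q=' + query, deadline=10)
    if resp.status_code != 200:
        # Rate-limited (400, per the docs above) or otherwise failing:
        # surface an error instead of silently rendering a blank page.
        raise RuntimeError('Twitter API error %d' % resp.status_code)
    return resp.content
```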
You should use the API instead of scraping the public Twitter pages because IP addresses are subject to API rate limiting just like authenticated API accounts. When you authenticate with your account, you're not subject to the IP limit, so other people abusing Twitter from the same IP address (as might happen from a shared server environment like Google's) won't limit your use. This is all explained in the Rate limiting documentation from Twitter.
