My app looks like it is logging a significant amount of GET requests to my homepage (for an app which is still in test), about 1-2 per minute. Is there a builtin mechanism in Google Cloud to figure out where these are coming from, IP-wise?
I followed this guide: Quickstart for Python. After deploying the "hello, world" app to App Engine (flex), I went to [project].appspot.com and noticed that it is very slow. I tried testing it on different devices and under different network conditions, and I still have the same issue. I went to Cloud Trace but can't build a report due to a lack of traces. It is also slow over both HTTP and HTTPS. I deployed to us-central and I am in Texas.
I have attached some logs from Logging and a snippet from Google Chrome's Dev Tools to show the slowness.
Logs from Logging:
Chrome Dev Tools:
The images don’t show anything especially unexpected. Response time will vary with the client's distance from the region where the App Engine Flex instance runs. Responses that take an especially long time are most likely due to a cold start, i.e. a new instance being spun up to serve the request.
You are probably using a free instance of App Engine. Because it's free, its lifespan is very short: it shuts down after a short period without requests. When a new request arrives after such a gap, the instance first has to spin up and only then process the request, which takes time. You can keep pinging the app to keep the instance alive, for example with a cron job as sketched below. A similar question is answered here.
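A minimal keep-alive sketch using App Engine's cron service; the ping path and the five-minute interval are illustrative, and the file goes in a cron.yaml next to your app.yaml:

    cron:
    - description: keep-alive ping to reduce cold starts
      url: /
      schedule: every 5 minutes

Note that keeping an instance alive around the clock also keeps it consuming instance hours, so this trades cost for latency.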
I'm a little confused about this because the docs say I can use Stackdriver for "Request logs and application logs for App Engine applications". Does that mean web requests? Even millions of web requests?
Stackdriver's pricing is per resource, so does that mean I can log all of my web servers' request logs (which would be huge) for no extra cost, i.e. without being charged for the volume of storage the logs use?
Does Stackdriver use GCP Cloud Storage as a backend, and do I have to pay for that storage? It looks like I can get hundreds of gigabytes of log aggregation for virtually no money; I just want to make sure I'm understanding this.
I bring up ELK because Elastic just partnered with Google, so Stackdriver presumably doesn't do everything Elasticsearch does (for almost no money); otherwise it would be a competitor?
Things definitely seem to be moving quickly at Google's cloud division and documentation does seem to suffer a bit.
Having said that, the document you linked to also details the limitations -
The request and application logs for your app are collected by a Cloud Logging agent and are kept for a maximum of 90 days, up to a maximum size of 1GB. If you want to store your logs for a longer period or store a larger size than 1GB, you can export your logs to Cloud Storage. You can also export your logs to BigQuery and Pub/Sub for further processing.
It should work out of the box for small to medium-sized projects, though the built-in log viewer is fairly basic.
From your description, it sounds like you may have specific needs, so you should not assume this will be free: factor in Cloud Storage costs for the logs you want to retain, and BigQuery costs if you need to crunch them. A sketch of a log export sink follows below.
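For example, an export sink that ships App Engine logs to a Cloud Storage bucket can be created with gcloud; the sink and bucket names below are illustrative, and the bucket must already exist and grant the sink's service account write access:

    gcloud logging sinks create my-gae-log-sink \
        storage.googleapis.com/my-log-archive-bucket \
        --log-filter='resource.type="gae_app"'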
Is it possible to route HTTP traffic between google app engine applications without going through the public internet?
For example, if I'm running a Web Service API on one application and want to build a second application on top of it without traffic going through the internet - for performance reasons.
Between separate apps running on different domains? I suspect not.
But you can use backends to do different work behind the scenes:
Backends are special App Engine instances that have no request deadlines, higher memory and CPU limits, and persistent state across requests. They are started automatically by App Engine and can run continuously for long periods. Each backend instance has a unique URL to use for requests, and you can load-balance requests across multiple instances.
When I look at the logs of traffic between the backend and frontend instances, I see IPs like
0.1.0.3
So yes, those communication paths are internal (addresses in the 0.0.0.0/8 block are reserved and not routable on the public internet). I'd hazard a guess that, given how much of the internet's infrastructure Google runs, requests between different apps might not travel over the public internet either.
The logs indicate low-latency communication between front and back ends, though not under any particular load, so your mileage may vary.
Backends in Python
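For reference, a backend was declared in a backends.yaml file deployed alongside the app; the name, instance class, and count below are illustrative values:

    backends:
    - name: worker
      class: B2
      instances: 1
      options: dynamic

Each backend is then addressable at its own URL of the form http://worker.your-app-id.appspot.com, which is how the frontend reaches it.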
How can I find out how my bandwidth is used in Google App Engine? I want to extract the top bandwidth hogs so I can cut down on my outgoing bandwidth usage.
App Engine logs all requests. Each log entry includes information about the request (path, query string, wall/CPU/API time, and approximate data transferred out in KB) and about the requester (IP address and, if the user is logged in, Google account name). You should be able to compute a reasonable estimate from this information.
You can periodically download your app's logs with appcfg, as sketched below. How often you need to do this depends on how much traffic your site handles.
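A minimal sketch, assuming the logs are first downloaded with appcfg.py request_logs (the application directory, output path, and day count are illustrative):

    appcfg.py request_logs --num_days=7 myapp/ logs.txt

The downloaded file is in Apache combined log format, so the approximate bytes sent per client IP can be summed with a short script:

    import re
    from collections import Counter

    # Each request log line is in Apache combined format; capture the
    # client IP (first field) and the response size (the field after
    # the three-digit status code), which may be '-' when unknown.
    LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)')

    totals = Counter()
    with open('logs.txt') as f:
        for line in f:
            m = LINE_RE.match(line)
            if m and m.group(2) != '-':
                totals[m.group(1)] += int(m.group(2))

    # Print the top ten bandwidth consumers by IP.
    for ip, total_bytes in totals.most_common(10):
        print(ip, total_bytes)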
It may also be helpful to review the usage from the admin console, both in aggregate and via the logs; the admin console is at https://appengine.google.com/
I have a website that uses the Twitter API. The problem is that the site goes blank once the API limit is reached (I think), and then after a while it starts displaying results again.
I am running on GAE appspot. Because I have the appspot subdomain, does this mean that I can never be blacklisted?
Also, what is the point of using the Twitter API when I can search Twitter publicly instead?
No, your application can be blacklisted.
The REST API does account- and IP-based rate limiting.
You can't even be added to their whitelist in this situation (running on Google App Engine), according to their documentation:
(...) This works in most situations but for cloud platforms like Google App Engine, applications without a static IP address cannot receive Search whitelisting. (...)
Read Twitter's Rate Limiting documentation for complete information about these and other limits.
If your application is being blocked for exceeding the limit, you should get a 400 HTTP response code. If you've written your application such that it renders a blank page when it gets an HTTP failure, then you have your answer. (How you check for HTTP errors in your particular development framework is a separate matter; a Python sketch follows below.)
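A minimal sketch in Python, assuming urllib and an illustrative search endpoint; the point is only that an HTTP error from the API should be caught and turned into a fallback response instead of a blank page:

    import json
    import urllib.error
    import urllib.parse
    import urllib.request

    def fetch_tweets(query):
        # Illustrative endpoint; the real URL and authentication depend
        # on the Twitter API version you are using.
        url = ('https://api.twitter.com/1.1/search/tweets.json?q='
               + urllib.parse.quote(query))
        try:
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as e:
            # A 400 here typically means the rate limit was hit; return
            # something the page can render as a friendly message.
            return {'error': e.code}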
You should use the API rather than scraping the public Twitter pages, because unauthenticated requests are rate-limited per IP address just as authenticated requests are limited per account. When you authenticate with your account, you're not subject to the IP limit, so other people hammering Twitter from the same IP address (as can happen in a shared server environment like Google's) won't limit your use. This is all explained in the rate limiting documentation from Twitter.