Why is my Google App Engine site over quota? - google-app-engine

I'm getting "Over Quota
This application is temporarily over its serving quota. Please try again later." on my GAE app. It's not billing-enabled. I ran a security scan against it today, which presumably triggered the over quota, but I can't explain why based on the information in the console.
Note that 1.59G has been used responding to 4578 requests. That's an average of about 347k per request, but none of my responses should ever be that large.
By filtering my logs I can see that there was no request today whose response size was greater than 25k. So although the security scan generated a lot of small requests over its 14-minute run, it couldn't possibly account for 1.59G. Can anyone explain this?

Note: mostly suppositions ...
The Impact of Security Scanner on logs section mentions:
Some traces of the scan will appear in your log files. For instance,
the security scanner generates requests for unlikely strings such as
"~sfi9876" and "/sfi9876" in order to examine your application's error
pages; these intentionally invalid page requests will show up in your
logs.
My interpretation is that some of the scan requests will not appear in the app's logs.
I guess it's not impossible for some of the scanner's requests to similarly not be counted in the app's request stats, which might explain the suspicious computation results you reported. I don't see any mention of this in the docs to validate or invalidate this theory. However...
In the Pricing, costs, and traffic section I see:
Currently, a large scan stops after 100,000 test requests, not
including requests related to site crawling. (Site crawling requests
are not capped.)
A couple of other quotes from Google Cloud Security Scanner doc:
The Google Cloud Security Scanner identifies security vulnerabilities
in your Google App Engine web applications. It crawls your
application, following all links within the scope of your starting
URLs, and attempts to exercise as many user inputs and event handlers
as possible.
Because the scanner populates fields, pushes buttons, clicks links,
and so on, it should be used with caution. The scanner could
potentially activate features that change the state of your data or
system, with undesirable results. For example:
In a blog application that allows public comments, the scanner may post test strings as comments on all your blog articles.
In an email sign-up page, the scanner may generate large numbers of test emails.
These quotes suggest that, depending on your app's structure and functionality, the number of requests can be fairly high. Your app would need to be really basic for the quoted kinds of activities to be achieved in 4578 requests - kinda supporting the above theory that some scanner requests might not be counted in the app's stats.

Related

Redirect 404 page rendering to CDN?

Is it possible to offload all 404 page renders to a CDN instead of an origin web server to prevent DDoS attacks? It seems like if you are getting DDoS'd and the attack is tying up compute resources by forcing your web server to render 404 pages, those renders would be better served through a CDN. Does anyone have any experience around this that they would be willing to share?
CloudFront automatically protects against DDoS using Shield Standard.
If you'd like to prevent requests made to CloudFront from reaching your origin, a few options:
Geographic restrictions: block requests from countries where you do not expect your viewers to be located (or allow only the countries where they are). You can configure this in the CloudFront console, and there is no additional cost to use it.
Add AWS WAF. You can block common application-layer attacks, as well as create specific rules to block requests (for example, files ending in extensions you do not use - e.g., .php) or add rate limiting.
Write a CloudFront Function (a JavaScript function that executes at CloudFront's edge locations) to inspect the request and block any that do not match requests your application is capable of serving (for example, you could check that the incoming request matches one of the routes accepted by your application and, if not, return a 404).
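The last option above could be sketched as a viewer-request CloudFront Function along these lines. The route lists are hypothetical placeholders — replace them with the routes your application actually serves:

```javascript
// Hypothetical allowlists -- substitute the routes your app really accepts.
var ALLOWED = ['/', '/about', '/contact'];
var ALLOWED_PREFIXES = ['/static/', '/api/items/'];

function handler(event) {
    var request = event.request;
    var uri = request.uri;

    if (ALLOWED.indexOf(uri) !== -1) {
        return request; // exact route match: forward to the origin
    }
    for (var i = 0; i < ALLOWED_PREFIXES.length; i++) {
        if (uri.indexOf(ALLOWED_PREFIXES[i]) === 0) {
            return request; // prefix match (e.g. static assets)
        }
    }

    // Anything else is answered at the edge; the origin never renders it.
    return {
        statusCode: 404,
        statusDescription: 'Not Found',
        headers: { 'content-type': { value: 'text/plain' } },
        body: 'Not found'
    };
}
```

Associate the function with the distribution's viewer-request event; unmatched paths then cost a function invocation instead of origin compute.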
Both WAF and CloudFront Functions may add an additional cost. CloudFront has a perpetual free tier (meaning it is applied every month) that includes 1TB of data transfer out and 2 million CloudFront Function executions each month. Function executions beyond that are priced at $0.10 per 1 million invocations.
https://aws.amazon.com/cloudfront/pricing/
Several examples of CloudFront Functions to get you started available here - https://github.com/aws-samples/amazon-cloudfront-functions

Google App Engine: debugging Dashboard > Traffic > Sent

I have a GAE app (PHP72, env: standard) which hangs intermittently (once or twice a day, for about 5 minutes).
When this occurs I see a large spike in GAE dashboard's Traffic Sent graph.
I've reviewed all uses of file_get_contents and curl_exec within the app's scripts, not including those in /vendor/, and don't believe these to be the cause.
Is there a simple way in which I can review more info on these outbound requests?
There is no way to get more details in that dashboard. You're going to need to check your logs at the corresponding times. Obscure things to check for:
Cron jobs coming in at the same times
Task Queues spinning up

How to programmatically scale up app engine?

I have an application which uses app engine auto scaling. It usually runs 0 instances, except if some authorised users use it.
This application needs to run automated voice calls as fast as possible on thousands of people, with keypad interactions (no, it's not spam, it's redcall!).
Programmatically speaking, we ask Twilio to initiate calls through its Voice API at 5 calls/sec, and it basically works through webhooks: at least 2, but most of the time 4 hits per call. So GAE needs to scale up very quickly, and some requests get lost (which is just a hang-up on the user side) at the beginning of the trigger, when only one instance is ready.
I would like to know if it is possible to programmatically scale up App Engine (through an API?) before running such triggers, in order to be ready when the storm blasts.
I think you may want to give warmup requests a try, as they load your app's code into a new instance before any live requests reach that instance, reducing response time after your GAE app has scaled down to zero.
The link I have shared with you includes the PHP7 runtime, as I see you are familiar with it.
I would also like to agree with John Hanley, since finding a sweet spot on how many idle instances you have available, would also help the performance of your app.
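Putting the two suggestions together, an app.yaml along these lines enables warmup requests and keeps a few idle instances ready; the values shown are illustrative, not a recommendation:

```yaml
runtime: php72

# Enable warmup requests: App Engine sends GET /_ah/warmup to a new
# instance before routing live traffic to it.
inbound_services:
  - warmup

automatic_scaling:
  # Keep a couple of instances warm ahead of a call surge; tune this
  # value against your traffic pattern and budget.
  min_idle_instances: 2
```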
Finally, the solution was to delegate sending the communication through Cloud Tasks:
https://cloud.google.com/tasks/docs/creating-appengine-tasks
https://github.com/redcall-io/app/blob/master/symfony/src/Communication/Processor/QueueProcessor.php
Tasks can retry hitting App Engine in case of errors, and they make App Engine spin up new instances when the surge comes.

How to prevent use of my API, which provides data for my React app

I'm building a web service and I'm going to use React. Data for the service will be fetched from my API.
But there is a simple way to find out which endpoints I'm using and what data I'm sending. This knowledge gives a lot of options for making bots for my service.
Is there any option to prevent this?
I know I can require signing of all requests, but that's also easy to work out.
This cannot be done. Whatever is done in client-side JavaScript, can be reverse-engineered and simulated.
Efforts should be focused on preventing the API from being abused, i.e. throttling or blacklisting clients based on their activity or the available information (user agent, suspicious requests, generated traffic). If the use of the API allows a captcha, suspicious clients can be asked to prove they are human.
There are half-measures that can be applied to client side application and make it less advantageous for abuse (and also for development).
Prevent unauthorized access to unminified/unobfuscated JS and source maps. There may be a need to authorize them on a per-user basis. This will make debugging and bug reporting more difficult.
Hard-code parts that are involved in request signing to browser APIs, e.g.:
apiKey = hash(NOT_SO_SECRET_KEY + document.querySelector('.varyingBlock').innerHTML)
This requires bots to emulate a browser environment and makes their work much less efficient. It also affects the design of the application in a negative way. Obviously, there will be additional difficulties with SSR, and it won't translate to native platforms easily.
Here are two basic preventive measures that you can use.
Captcha
Use a captcha service like reCAPTCHA, so that a user can use your website only after passing the captcha test. It's highly difficult for bots to pass captchas.
Rate limit API usage.
Add rate limiting to your API, so that a logged-in user can only make, say, 100 requests in 10 minutes; the exact numbers will depend on your use case.
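A fixed-window limiter along those lines might look like this sketch. It is in-memory only (a real deployment would back it with Redis or similar so it works across instances), and the 100-requests-per-10-minutes figures are just the illustrative numbers from above:

```javascript
// Minimal in-memory fixed-window rate limiter: at most `limit` requests
// per `windowMs` milliseconds per user.
function createRateLimiter(limit, windowMs) {
    const windows = new Map(); // userId -> { start, count }

    return function allow(userId, now = Date.now()) {
        const w = windows.get(userId);
        if (!w || now - w.start >= windowMs) {
            // First request, or the previous window expired: start fresh.
            windows.set(userId, { start: now, count: 1 });
            return true;
        }
        if (w.count < limit) {
            w.count += 1;
            return true;
        }
        return false; // over the limit: respond with HTTP 429
    };
}

// e.g. 100 requests per 10 minutes per logged-in user
const allow = createRateLimiter(100, 10 * 60 * 1000);
```

Call `allow(userId)` in your request handler and return 429 when it yields `false`.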

Paying for crawlers on AppEngine

Yesterday my app was visited 35 times by HUMANS. It seems, however, that a machine was crawling the website, and I was over quota in a few hours (mostly frontend instance hours).
Today I pay a maximum of 5 USD per day. For 35 real people that seems way too much.
I don't feel good paying for crawlers that block access to my website for regular users. Two questions for you guys:
Is it normal that this happens?
What can I do to invest money in real users instead of crawlers? (And I'm not talking about not referencing my app.)
app : www.conceptstore.me
A well-behaved crawler should:
follow the rules in /robots.txt - so upload one. This alone should be enough.
provide a distinct User-Agent HTTP request header - so look at the User Agents automatically recorded in the App Engine logs, then return error pages for User-Agents you don't like.
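For the first point, a minimal robots.txt might look like this. The paths and the "BadBot" name are hypothetical, and note that only well-behaved crawlers honor these rules (and Googlebot ignores Crawl-delay):

```
# Served from the site root as /robots.txt
User-agent: *
Disallow: /search
Disallow: /api/

# Block a specific misbehaving crawler entirely
User-agent: BadBot
Disallow: /
```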
