Why the same IP of Cron Job in a App Engine is used by malicius requests? - google-app-engine

I noticed this:
(Note that I deployed this service just 2 days ago and no one know it!!)
The only valid request here is the one to "/extract" at 4:10, because it is correct and it is at the scheduled time (of the cron job).
(You can also notice that the user agent is "AppEngine")
I created some routing for the "hacking try" paths to reply with a "F*** Y**".
Then I decided to use a dos.yaml file to define a blacklist.
I was taking note of the IPs and I noticed that the IP of the valid request is also used in this request (and probably other "malicious" ones):
I noticed that every execution of the my scheduled "cron job" (automated one or using "Run now") has a different IP. Also all the malicious requests have different IPs but you can see that some of them are executed programmatically:
How is it possible?
My explanation is that Google Cloud Platform is hosting a lot of malicious services.
Also, the documentation says that cron jobs are issued by the IP 0.1.0.1.
Someone can explain me this unexpected behavior or where is the fault in my argument?
Alex

Related

Deploying a python bot script on Google Cloud Run (GCR)

I have been racking my brains on this for a few weeks now, trying different variations from Google Cloud service offerings but can't seem to find the proper one.
I have a python script with dependencies etc, that I have containerized, pushed, and deploy to GCR.
The script is a bot that connects to an external websocket receiving signals perpetually to then do other processing via API against another external service.
What would be the best service offering from Google Cloud to run this?
So far, I've tried:
GCR Web Service - requires listening service (:8080) which I do not provide in this use case, and, it scales down your service when there is no traffic so no go.
GCR Job Service - Seems like the next ideal option (no HTTP port requirement) - however, since the script (my entry point), upon launch, doesn't 'return' unless it quits, the service launch just allows it to run for a minute or so, until the jobs API declares it as 'failed' - basically, it is launching it via my entry point which just executes the script as if it was running locally and my script isn't meant to return anything.
To try and get around this, I went the google's recommended way and built a main.py with they standard boilerplate, and built it as a wrapper to act as a launcher for the actual script. I did this via a simple subprocess.Popen using their sample main.py as shown below.
main.py
import json
import os
import sys
import subprocess
# Retrieve Job-defined env vars
TASK_INDEX = os.getenv("CLOUD_RUN_TASK_INDEX", 0)
TASK_ATTEMPT = os.getenv("CLOUD_RUN_TASK_ATTEMPT", 0)
# Define main script
def main():
print(f"Starting Task #{TASK_INDEX}, Attempt #{TASK_ATTEMPT}...")
subprocess.Popen(["python3", "myscript.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(f"Completed Task #{TASK_INDEX}.")
# Start script
if __name__ == "__main__":
try:
main()
except Exception as err:
message = f"Task #{TASK_INDEX}, " \
+ f"Attempt #{TASK_ATTEMPT} failed: {str(err)}"
print(json.dumps({"message": message, "severity": "ERROR"}))
sys.exit(1) # Retry Job Task by exiting the process
My thinking being, this would allow the job to execute my script and mark the job as completed, while the actual script remains running. Also, since subprocess.Popen sets its stdout and stderr to PIPE, my thinking is it would get caught by the google logging and I would see the output.
The job runs and marks it as succeed, however, I see no indication of the actual script executing anywhere.
I had similar issue with Google Cloud functions. Jobs seemed like an ideal option since I can run on their scheduler to make sure it is launching after saying, every hour (my script uses a lock file so it doesn't run again if running).
Am I just missing the point on how these cloud services run?
Do offerings like google cloud run jobs/functions, etc meant to execute jobs that return and quit until launched again by however scheduled?
Do I need to consider Google Computing engine as an option for this use case that is, a full running VM instead of stateless/serverless options?
I am trying to use this in a containerized, scalable as needed, fashion to both make my project portable and minimize costs as much as possible given the always running nature of the job.
Lastly, I know services like pythonanywhere as I am sure others, make this kinda stuff easier, but I would like to learn how to do this via standard cloud offerings like GCR, AWS, etc.
thanks for any insight / advice!
Cloud Run best fit is for HTTP Rest APIs serving (stateless services). There are also Jobs in beta.
One of the top feature of Run is that it scales to 0, when there are not requests to your service (your service instance gets totally destroyed).
If your bot needs to stay alive "for ever", Run is not for you... (Even if you can configure Run to keep at least one instance live).
I would consider instead AppEngine or Compute.

Google App Engine: debugging Dashboard > Traffic > Sent

I have an GAE app PHP72, env: standard which is hanging intermittently (once or twice a day for about 5 mins).
When this occurs I see a large spike in GAE dashboard's Traffic Sent graph.
I've reviewed all uses of file_get_contents and curl_exec within the app's scripts, not including those in /vendor/, and don't believe these to be the cause.
Is there a simple way in which I can review more info on these outbound requests?
There is no way to get more details in that dashboard. You're going to need to check your logs at the corresponding times. Obscure things to check for:
Cron jobs coming in at the same times
Task Queues spinning up

Using 1 intance of google-app-engine to monitor external service

I planning to create a NodeJS program, that work 24/7, that ping and make requests to an external server (outside of google cloud) every minute. Just to see that it the external services are are live.
If there is any error it will notify me by SMS & Email.
I don't need any front-end for this app, and no one needs to connect to it. Just simple NodeJS program.
The monitoring and configuration will be by texts files.
Now the questions:
It looks like it will cost me just $1.64. It sounds very cheap. Am I missing something?
It needs to work around the clock, I will request it to start it once, and it need to continue working, (by using setInterval). Is it will be aborted?
What it is exactly mean buy 1 instance. What an instance can do? Only respond to one request or what?
I tried to search in Google: appengine timeout, but didn't found anything that helps.
Free Quota
If you write your application in Python, PHP, Go or Java it can fit in free usage quota:
https://cloud.google.com/appengine/docs/quotas
So there will be absolutely no costs to run it on Google App Engine platform.
There are limit of 657,000 UrlFetch API Calls per day (more than 450 calls per minute in 24/7 mode) for free apps. 4GB traffic may also be sufficient for this kind of work.
Keep in mind there is no SMS sending services provided by Google App Engine and you will need to spend additional UrlFetch API calls to use external SMS services.
Email sending is also limited to 100 Emails per day (or 5000 Emails to admin address), so try not so send repeated notifications about same monitored server every minute, or you'll deplete your Email quote in 1.5 hours.
Scheduled Tasks
There is no way to run single process indefinitely without interruption on App Engine. But you don't have to!
You'll need to encapsulate all the work you're planning to execute in every iteration into single task and then schedule it to run every minute with Cron. See this documentation for Python: https://cloud.google.com/appengine/docs/python/config/cron
It is recommended to have some configuration page where you can set some internal configuration or see monitoring statistics, at least manage flag to temporarily pause tasks execution without redeploying your app.

Why is C1 requesting my admin page remotely?

Some time ago I had Composite C1 installed on a public url to test it (for example http://c1.mydomain.com). But I did remove it already some time ago.
I checked my firewall logs recently and I discovered requests for http://c1.mydomain.com/Composite/top.aspx every single night from IP address 109.238.52.32. (Composite.net's ip address is 109.238.52.24, which is almost the same, so I assume the requests are comming from Composite.net.)
So the question is: Why is Composite.net requesting my admin page every single day?
What you are seeing is a process that check long-term usage of the software in order to generate statistics. The daily URL request looks at the HTTP status code (HTTP 200 vs other) to determine if your website is still online and using the CMS. Since the URL is unique and likely not used by any other system it is a good indication.
When that is said it is probably silly to keep on requesting the same URL once the check starts reporting 404 or similar.

GAE urlfetch Host header set to IP address instead of hostname

I am calling a third party web service from app engine. This particular service is picky. I ran into an issue where calls would work fine for a while, then stop working, then start working again. I realized that if I manually stopped all instances in the admin console, that the calls would work again.
I setup a proxy to route the calls through that so I could see the headers and all detail. I think I have tracked the issue down to the following. After an instance has been up for a while (the app usually just needs 1 to 3 instances right now) app engine will start using the IP address of the destination as the value for the host header instead of the hostname. Well the service doesn't like that. Whether it should care is another matter.
So my question is, why does app engine use the ip address for the host header eventually instead of the hostname? And, of course, is there anything I can do about it? I know that I cannot set the host header, but maybe there is something else that can be done.
Thanks for any insight.
First, thank you for finding this behavior. We have had intermittent issues with urlfetch for a long time, and will try to detect if this is the issue.
One thing you could try is to target a specific instance/module:
http://instance.version.module.app-id.appspot.com
and cycle through the instances. If you just target the module, it will kill the instance after some inactivity. So, perhaps that would not trigger the GAE DNS shortcut.
Another trick would be to add a fake, random, query string after your url: ?foo=D7hfka67h. Perhaps that would prevent GAE from recognizing the repeat url, and trying to shortcut the DNS.

Resources