I have this code:
// main.go
package magnum

import (
    "net/http"

    "google.golang.org/appengine"
    "google.golang.org/appengine/log"
)

func init() {
    http.HandleFunc("/tasks/backup", handler)
}

func handler(w http.ResponseWriter, r *http.Request) {
    ctx := appengine.NewContext(r)
    log.Debugf(ctx, "Testing cron tasks using Go")
}
// cron.yaml
cron:
- description: extraction
  url: /tasks/backup
  schedule: every 5 minutes
And I'm not seeing the "Testing cron tasks using Go" text when I check the logs on the GAE dashboard. Facts:
- Locally, I can see that text when I test the cron task in the development environment.
- On the deployed app, I can see a successful request on the GAE dashboard when the task is performed.
- Standard Environment.
What could I be missing?
I believe the local development server doesn't display logged messages by default. Try starting your local development server with this additional argument:
dev_appserver.py --log_level=info
You can also pass the --logs_path=... argument if you want to log the output to a file in addition to the console. Depending on the amount of logging you need to do, that might be useful as well.
For your deployed application, you can find your logged application messages in the Logging section of the Cloud Platform Console. If you don't see them, ensure that the drop-down filter has GAE Application selected.
It's been a long time since the first post, but recently I had the same problem on the Go 1.11 GAE Standard runtime, and had to switch to the cloud.google.com/go/logging package.
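For reference, writing a log entry with that package looks roughly like this (a minimal sketch; the project ID and log name are placeholders):

import (
    "context"

    "cloud.google.com/go/logging"
)

func logDebug() error {
    ctx := context.Background()
    // "my-project-id" is a placeholder for your GCP project ID.
    client, err := logging.NewClient(ctx, "my-project-id")
    if err != nil {
        return err
    }
    // Close flushes any buffered entries.
    defer client.Close()

    // "cron-log" is an arbitrary log name.
    logger := client.Logger("cron-log")
    logger.Log(logging.Entry{
        Severity: logging.Debug,
        Payload:  "Testing cron tasks using Go",
    })
    return nil
}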
I have been racking my brains on this for a few weeks now, trying different Google Cloud service offerings, but I can't seem to find the proper one.
I have a Python script with dependencies etc. that I have containerized, pushed, and deployed to GCR.
The script is a bot that connects to an external websocket, perpetually receiving signals, and then does further processing via an API against another external service.
What would be the best service offering from Google Cloud to run this?
So far, I've tried:
GCR Web Service - requires a listening service (:8080), which I do not provide in this use case, and it scales your service down when there is no traffic, so no go.
GCR Job Service - seems like the next ideal option (no HTTP port requirement). However, since the script (my entry point) doesn't return unless it quits, the job launch just lets it run for a minute or so until the Jobs API declares it 'failed'; basically, it launches my entry point, which executes the script as if it were running locally, and my script isn't meant to return anything.
To try to get around this, I went Google's recommended way and built a main.py with their standard boilerplate as a wrapper to act as a launcher for the actual script, via a simple subprocess.Popen, using their sample main.py as shown below.
main.py

import json
import os
import sys
import subprocess

# Retrieve Job-defined env vars
TASK_INDEX = os.getenv("CLOUD_RUN_TASK_INDEX", 0)
TASK_ATTEMPT = os.getenv("CLOUD_RUN_TASK_ATTEMPT", 0)

# Define main script
def main():
    print(f"Starting Task #{TASK_INDEX}, Attempt #{TASK_ATTEMPT}...")
    subprocess.Popen(["python3", "myscript.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(f"Completed Task #{TASK_INDEX}.")

# Start script
if __name__ == "__main__":
    try:
        main()
    except Exception as err:
        message = f"Task #{TASK_INDEX}, " \
            + f"Attempt #{TASK_ATTEMPT} failed: {str(err)}"
        print(json.dumps({"message": message, "severity": "ERROR"}))
        sys.exit(1)  # Retry Job Task by exiting the process
My thinking was that this would allow the job to execute my script and be marked as completed while the actual script keeps running. Also, since subprocess.Popen sets its stdout and stderr to PIPE, my thinking was that the output would get caught by Google's logging and I would see it.
The job runs and is marked as succeeded; however, I see no indication of the actual script executing anywhere.
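(Side note: nothing ever waits on the Popen child or reads its pipes, so the wrapper exits immediately and the job's container is torn down with the script still inside it. A minimal sketch of a wrapper that waits on the child and lets it inherit stdout/stderr, so Cloud Run's logging captures the output, assuming the same myscript.py entry point:)

import subprocess
import sys

def main():
    # Without stdout=PIPE/stderr=PIPE the child writes straight to the
    # job's own stdout/stderr, which Cloud Run's log agent captures.
    result = subprocess.run(["python3", "myscript.py"])
    # Propagate the script's exit code so the job reports success/failure.
    sys.exit(result.returncode)

if __name__ == "__main__":
    main()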
I had a similar issue with Google Cloud Functions. Jobs seemed like an ideal option, since I can run them on a schedule to make sure the script launches, say, every hour (my script uses a lock file so it doesn't run again if it is already running).
Am I just missing the point of how these cloud services run?
Are offerings like Google Cloud Run jobs/functions, etc. meant to execute jobs that return and quit until launched again on whatever schedule?
Do I need to consider Google Compute Engine as an option for this use case, that is, a full running VM instead of stateless/serverless options?
I am trying to do this in a containerized, scalable-as-needed fashion, to both make my project portable and minimize costs as much as possible, given the always-running nature of the job.
Lastly, I know services like PythonAnywhere, and I am sure others, make this kind of thing easier, but I would like to learn how to do this via standard cloud offerings like GCP, AWS, etc.
Thanks for any insight/advice!
Cloud Run's best fit is serving stateless HTTP REST APIs. There are also Jobs, currently in beta.
One of Run's top features is that it scales to zero when there are no requests to your service (your service instance gets totally destroyed).
If your bot needs to stay alive forever, Run is not for you... (even though you can configure Run to keep at least one instance alive).
I would consider App Engine or Compute Engine instead.
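For completeness, keeping one Run instance alive looks something like this (the service name is a placeholder):

gcloud run services update my-bot-service --min-instances=1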
I successfully deployed a twitter screenshot bot on Google App Engine.
This is my first time deploying.
First thing I noticed was that the app didn't start running until I clicked the link.
When I did, the app worked successfully (replied to tweets with screenshots) as long as the tab was loading and open.
When I closed the tab, the bot stopped working.
Also, in the cloud shell log, I saw:
Handling signal: term
[INFO] Worker exiting (pid 18)
This behaviour surprises me, as I expected it to keep running on Google's servers indefinitely.
My bot works by streaming with the Twitter API. The "worker exiting" line above also surprises me.
Here is the relevant code:
def get_stream(set):
    global servecount
    with requests.get(f"https://api.twitter.com/2/tweets/search/stream?tweet.fields=id,author_id&user.fields=id,username&expansions=author_id,referenced_tweets.id", auth=bearer_oauth, stream=True) as response:
        print(response.status_code)
        if response.status_code == 429:
            print(f"returned code 429, waiting for 60 seconds to try again")
            print(response.text)
            time.sleep(60)
            return
        if response.status_code != 200:
            raise Exception(
                f"Cannot get stream (HTTP {response.status_code}): {response.text}"
            )
        for response_line in response.iter_lines():
            if response_line:
                json_response = json.loads(response_line)
                print(json.dumps(json_response, indent=4))
                if json_response['data']['referenced_tweets'][0]['type'] != "replied_to":
                    print(f"that was a {json_response['data']['referenced_tweets'][0]['type']} tweet not a reply. Moving on.")
                    continue
                uname = json_response['includes']['users'][0]['username']
                tid = json_response['data']['id']
                reply_tid = json_response['includes']['tweets'][0]['id']
                or_uid = json_response['includes']['tweets'][0]['author_id']
                print(uname, tid, reply_tid, or_uid)
                followers = api.get_follower_ids(user_id='1509540822815055881')
                uid = int(json_response['data']['author_id'])
                if uid not in followers:
                    try:
                        client.create_tweet(text=f"{uname}, you need to follow me first :)\nPlease follow and retry. \n\n\nIf there is a problem, please speak with my creator, #JoIyke_", in_reply_to_tweet_id=tid, media_ids=[mid])
                    except:
                        print("tweet failed")
                    continue
                mid = getmedia(uname, reply_tid)
                #try:
                client.create_tweet(text=f"{uname}, here is your screenshot: \n\n\nIf there is a problem, please speak with my creator, #JoIyke_", in_reply_to_tweet_id=tid, media_ids=[mid])
                #print(f"served {servecount} users with screenshot")
                #servecount += 1
                #except:
                #    print("tweet failed")
                editlogger()

def main():
    servecount, tries = 1, 1
    rules = get_rules()
    delete = delete_all_rules(rules)
    set = set_rules(delete)
    while True:
        print(f"starting try: {tries}")
        get_stream(set)
        tries += 1
If this is important, my app.yaml file has only one line:
runtime: python38
and I deployed the app from cloud shell with gcloud app deploy app.yaml
What can I do?
I have searched and can't seem to find a solution. Also, this is my first time deploying an app successfully.
Thank you.
Google App Engine works on demand, i.e. when it receives an HTTP(S) request.
Neither warmup requests nor min_instances > 0 will meet your needs. A warmup tries to start up an instance before your requests come in. min_instances > 0 simply says not to kill the instance, but you still need an HTTP request to invoke the service (which is what you did by opening a browser tab and entering your app's URL).
You may ask: since you've started up the instance by opening a browser tab, why doesn't it keep running afterwards? The answer is that every request to a Google App Engine (Standard) app must complete within 1 to 10 minutes, depending on the type of scaling your app is using (see the documentation). For Google App Engine Flexible, the timeout goes up to 60 minutes. This tells you that your service will time out after at most 10 minutes on GAE Standard or 60 minutes on GAE Flexible.
I think the best solution for you on GCP is Google Compute Engine (GCE). Spin up a virtual machine (pick the lowest configuration so you can stay within the free tier), deploy your code to it, and kick it off. Your code then runs continuously.
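A minimal sketch of spinning up such a VM (the instance name and zone are placeholders; e2-micro is the free-tier-eligible machine type in certain US regions):

gcloud compute instances create bot-vm \
    --machine-type=e2-micro \
    --zone=us-central1-a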
App Engine works on demand, i.e. it will only be up if there are requests to the app (this is why the app works when you click on the URL). While you can set one instance to be running all the time (min_instances), that would be an anti-pattern for what you want to accomplish with App Engine. Please read How Instances are Managed.
Looking at your code, you're pulling data from Twitter every minute, so the best option for you is Cloud Scheduler + Cloud Functions.
Cloud Scheduler will call your function, and the function will check if there is data to process; if not, the process terminates. This will help you save costs, because instead of having something running all the time, the function only runs for the time it needs.
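A minimal sketch of that check-and-exit shape, assuming an HTTP-triggered function invoked by Cloud Scheduler (fetch_pending_mentions and process_mention are hypothetical helpers):

import functions_framework

@functions_framework.http
def check_tweets(request):
    # Hypothetical helper: query Twitter for mentions since the last run.
    mentions = fetch_pending_mentions()
    if not mentions:
        return "nothing to do", 200
    for mention in mentions:
        process_mention(mention)  # hypothetical: screenshot + reply
    return f"processed {len(mentions)} mentions", 200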
On the other hand, I'm not an expert with the Twitter API, but if there is a way for Twitter to call your function directly instead of you pulling data from it, that would be even better: the function would only run when there is data to process rather than checking every n minutes, optimizing your costs further.
As general advice, first review all the options you have in GCP (or whichever provider you use), then choose the best one for your use case. Just selecting one that works with your programming language will not necessarily work as you expect, as in this case.
I recently updated my cron.yaml file and now my cron tasks fail with no entries in the logs.
It is acting like the Java servlet at the URL is not being run.
I can paste the URL into a browser and the servlet runs fine.
My cron.yaml file:
cron:
- description: Daily revenues report
  url: /revenues
  schedule: every day 07:35
  timezone: America/Denver
I deploy it with the deploycron.sh script below:
PROJECT_ID='my-project-id'
gcloud config set project ${PROJECT_ID}
gcloud info
gcloud app deploy cron.yaml
Is there an error in my .yaml?
Is there a special task queue set up required?
Is some other configuration or permissions piece missing?
It was running fine last week. I have tried deleting and starting over to no avail.
https://console.cloud.google.com/cloudscheduler?project=project-id shows the job, with 'Failed' in the Result column.
The Logs 'View' link shows:
protoPayload.taskName="01661931846119241031" protoPayload.taskQueueName="__cron"
with no log entries.
Is __cron not automatic?
I am at a loss.
App Engine Standard. Java 8.
Update (02/02/2021): after installing the latest gcloud update locally and re-running the deploy cron script, the cron jobs now run as before.
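(Presumably via something like the following, though the exact command depends on how gcloud was installed:)

gcloud components update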
'Failed' means that the endpoint /revenues is not returning a success HTTP status code.
Logs 'View' link shows: protoPayload.taskName="01661931846119241031" protoPayload.taskQueueName="__cron" with no log entries
Maybe don't use the premade filter; just try filtering for /revenues, or view all the logs at 07:35 (when it was supposed to have run).
Is there an error in my .yaml?
If there were, gcloud app deploy cron.yaml would fail.
Is there a special task queue set up required?
You shouldn't need to do anything; I didn't.
I can paste the URL into a browser and the servlet runs fine.
When you paste the URL into the browser, is there any redirecting (like from /revenues to /revenues/) or anything else your browser is handling for you? Maybe /revenues is expecting cookies to be present now.
Are there any special app.yaml or dispatch.yaml rules that /revenues would be hitting?
Is /revenues being handled by a service other than the default service?
I had a similar problem: cron tasks failing without any logs.
The root cause was that the IP address that App Engine cron requests come from was blocked by the App Engine firewall, so I had to update the allow-list, as described here: https://cloud.google.com/appengine/docs/standard/nodejs/scheduling-jobs-with-cron-yaml#validating_cron_requests
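If it helps, the allow rule looks something like this (assuming, per those docs, that cron requests originate from 0.1.0.2; the priority value 100 is arbitrary):

gcloud app firewall-rules create 100 --action=allow --source-range="0.1.0.2" --description="Allow App Engine cron requests"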
I started having the same problem a few days ago on my existing cron schedules. I've tried everything, including tearing my code down to the bare minimum and creating a new GAE project with the Hello World quickstart. It still fails: nothing useful in the logs, and the UI just says 'Failed'. I'm pulling my hair out.
Sorry I don't have an answer to contribute, but your post makes me think it's on Google's side. I know they're moving cron jobs to Cloud Scheduler -> App Engine Cron Jobs. My gut tells me it's a permissions issue related to this move and IAM. I'm really at a loss.
I cannot get the appengine taskqueue to accept any context I throw at it:
import (
    "context"

    "google.golang.org/appengine"
    "google.golang.org/appengine/taskqueue"
)

/* snip */

ctx := context.Background()
task := taskqueue.NewPOSTTask("/b/mytask", params)
_, err = taskqueue.Add(ctx, task, "")
if err != nil {
    return fmt.Errorf("adding background task with path %s: %v", task.Path, err)
}
I'm calling appengine.Main() in my main.go main func, as stated by the go111 migration docs (but this line is missing in the go112 migration docs, so I'm not sure it's required).
I've tried:
context.Background()
request.Context()
appengine.NewContext(r)
appengine.BackgroundContext()
context.TODO()
All result in the error:
not an App Engine context
except for appengine.BackgroundContext(), which gets:
service bridge HTTP failed: Post http://appengine.googleapis.internal:10001/rpc_http: dial tcp 169.254.169.253:10001: i/o timeout
I experienced the same problems when migrating a GAE Standard project from go19 to go112 in order to use Go modules. In addition, I got a lot of "502 Bad Gateway" messages.
Replacing http.ListenAndServe() in main() with appengine.Main() fixed the context problem. Moving to go111 instead of go112 took care of the other issue. The docs and examples are not very clear on this.
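In other words, under go111 main() ends up shaped roughly like this (a minimal sketch; taskHandler is a hypothetical handler for the task URL above):

package main

import (
    "net/http"

    "google.golang.org/appengine"
)

// taskHandler is a hypothetical handler for the task URL.
func taskHandler(w http.ResponseWriter, r *http.Request) {
    // ... handle the task ...
}

func main() {
    // Register handlers as usual, then hand control to the appengine
    // package instead of calling http.ListenAndServe.
    http.HandleFunc("/b/mytask", taskHandler)
    appengine.Main()
}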
The documentation for migration to Go 1.12 states:
Use Cloud Tasks to enqueue tasks from Go 1.12 using the cloudtasks package. You can use any App Engine service as the target of an App Engine task.
But the cloudtasks package documentation (as of today's date) is clearly marked as beta and unstable. So the answer here is probably: this feature is unsupported.
That said, I'm using it in production under go111 without any serious issues that I have noticed so far.
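For reference, enqueueing an App Engine task through the cloudtasks package looks roughly like this (a minimal sketch; the project, location, and queue in the parent path are placeholders):

import (
    "context"
    "fmt"

    cloudtasks "cloud.google.com/go/cloudtasks/apiv2"
    taskspb "google.golang.org/genproto/googleapis/cloud/tasks/v2"
)

func enqueue(ctx context.Context) error {
    client, err := cloudtasks.NewClient(ctx)
    if err != nil {
        return fmt.Errorf("cloudtasks.NewClient: %v", err)
    }
    defer client.Close()

    req := &taskspb.CreateTaskRequest{
        // Placeholder queue path: projects/PROJECT/locations/LOCATION/queues/QUEUE
        Parent: "projects/my-project/locations/us-central1/queues/default",
        Task: &taskspb.Task{
            // Target the same relative URI the taskqueue code used.
            MessageType: &taskspb.Task_AppEngineHttpRequest{
                AppEngineHttpRequest: &taskspb.AppEngineHttpRequest{
                    HttpMethod:  taskspb.HttpMethod_POST,
                    RelativeUri: "/b/mytask",
                },
            },
        },
    }
    _, err = client.CreateTask(ctx, req)
    return err
}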
You're seeing internal.flushLog: Flush RPC: service bridge HTTP failed because you have appengine.Main() or other appengine lib calls while trying to run a Go 1.12+ runtime. (My guess is that the older runtime had to call into some Google-internal accounting infrastructure and that's not available for the 1.12 "next gen" systems.)
The solution isn't to downgrade your Go version -- you'd be missing a ton of performance and security improvements and you can't take advantage of the new hardware -- the solution is to remove all the calls to the appengine lib and use GCP's cloud libraries instead (see https://godoc.org/cloud.google.com/go).
I'm doing development on Google App Engine and I want to print out values that are sent to the server from the browser. When I insert print commands in my handlers, nothing gets printed on the console running the local server!
How do I get around this?
Gath
Rather than just printing out values, use the logging module. This is generally good practice anyway, as it means you can potentially leave in some handy debug output even when you are running in production mode.
import logging
logging.error("This is an error message that will show in the console")
By default, dev_appserver.py will not show any messages below logging level INFO, but you can run dev_appserver.py with the --debug flag; then you will be able to see output from the lower logging.DEBUG level and can scatter various logging calls through your code like:
logging.debug("some helpful debug info")
If you leave these in once your app is in production, then these will be visible in your app's admin area in the logs section.
As @Chris Farmiloe suggested, you should use the logging module. Google App Engine supports the logging module by default, and you can see the various log levels for your applications in the App Engine console. The logging module comes in handy in the production environment, where you can see all your logs.
For general info, use:
logging.info('This is just for information')
self.response.out.write("Hallo World")
Using the Go language (I don't know if this works with the other languages), you can do this through the appengine.Context.Debugf function:
https://developers.google.com/appengine/docs/go/reference#Context
func yourfunction(r *http.Request) {
    c := appengine.NewContext(r)
    c.Debugf("%v\n", "Your message")
    ....
}
You may have to set the minimum log_level in the development server:
dev_appserver.py --log_level debug [your application]
When I want a quick and dirty way to print out some info, I do this:
assert False, "Helpful information"
After getting the needed info, I delete the line.