How to use Google App Engine Cron jobs in a cheaper way? - google-app-engine

I have a Twitter bot which tweets out content a few times a day. Here is my cron.yaml file:
cron:
- description: "twitter instagram scraper"
  url: /scrape/twitter_intra
  schedule: every 24 hours
  target: scraper
- description: "USD to LKR scraper"
  url: /scrape/exRates
  schedule: every 1 hours
  target: scraper
- description: "last 24 hour weather"
  url: /scrape/weather_last24hours
  schedule: every day 12:00
  target: scraper
- description: "tweet out last 24 hour weather"
  url: /tweet/weather_last24hours
  schedule: every day 13:00
  target: twitter
- description: "tweet out exchange Rate USD to LKR"
  url: /tweet/exRates
  schedule: every day 7:00
  target: twitter
Here is an example of one request handler:
app.get(`/tweet/weather_last24hours`, async (req, res, next) => {
  console.log(`Tweet!! last 24 hours`);
  try {
    //await tweetText('This is a test');
    const report = await getWeatherLast24Hours();
    const content = makeTweetLast24HourWeather(report);
    console.log(content);
    await tweetText(content);
    res.status(200)
      .set('Content-Type', 'text/plain')
      .send(`Completed Successfully...!`)
      .end();
  } catch (error) {
    next(error);
  }
});
Right now it's working fine, except it costs me $2.20 a day because I have to keep an instance up all day by doing the following:
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`App listening on port ${PORT}`);
  console.log('Press Ctrl+C to quit.');
});
I tried removing this part of the code, assuming that the GCP cron job could bring up an instance on its own and run the task. The cron job itself was able to bring up an instance, but the task failed with a 500 error code and the following message:
This request caused a new process to be started for your application,
and thus caused your application code to be loaded for the first time.
This request may thus take longer and use more CPU than a typical
request for your application.
This was the error message in the logs. I tried it a few times and got the same result.
Is there a solution to this, such as starting an instance just before the cron job executes?
Can the newly introduced Cloud Scheduler help?
Am I doing something wrong?
Thank you in advance.

Your 500 is a warning to tell you something you already know: that bootup time will be longer due to needing to warm up a new instance. This is not necessarily a problem in itself, unless you notice that your tasks are not running properly.
To use Cloud Scheduler, consider decomposing your application into Cloud Functions. You can build a separate function for each of your 5 endpoints, plus a sixth that contains the shared logic (e.g. getWeatherLast24Hours()). You can then invoke them on a schedule using Cloud Scheduler.
The cost of running with Cloud Functions + Scheduler will be near-zero, so you'll want to do your own ROI evaluation to determine whether the development effort is worth the savings.
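For example, the /tweet/weather_last24hours endpoint could become an HTTP-triggered Cloud Function along these lines (a minimal sketch, assuming the existing helpers getWeatherLast24Hours(), makeTweetLast24HourWeather() and tweetText() are moved into a shared module, here called ./shared):
const {getWeatherLast24Hours, makeTweetLast24HourWeather, tweetText} = require('./shared');

// HTTP-triggered Cloud Function (Node.js runtime): Cloud Scheduler calls its URL,
// the function runs once and responds, and no instance stays up in between.
exports.tweetWeatherLast24Hours = async (req, res) => {
  try {
    const report = await getWeatherLast24Hours();
    const content = makeTweetLast24HourWeather(report);
    await tweetText(content);
    res.status(200).send('Completed Successfully...!');
  } catch (error) {
    console.error(error);
    res.status(500).send('Failed');
  }
};
A Cloud Scheduler job pointed at the function's trigger URL then replaces the corresponding cron.yaml entry, e.g. gcloud scheduler jobs create http tweet-weather --schedule "0 13 * * *" --uri <function URL> (the job name here is made up).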

Related

Google Cloud Run pubsub pull listener app fails to start

I'm testing a Pub/Sub "pull" subscriber on Cloud Run, using just the listener part of this sample Java code (SubscribeAsyncExample, reworked slightly to fit in my Spring Boot app):
https://cloud.google.com/pubsub/docs/quickstart-client-libraries#java_1
It fails to start up during deploy, but while it's trying to start, it does pull items from the Pub/Sub queue. Originally, I had an HTTP "push" receiver (a @RestController) on a different Pub/Sub topic and that worked fine. Any suggestions? I'm new to Cloud Run. Thanks.
Deploying...
Creating Revision... Cloud Run error: Container failed to start. Failed to start and then listen on the port defined
by the PORT environment variable. Logs for this revision might contain more information....failed
Deployment failed
In logs:
2020-08-11 18:43:22.688 INFO 1 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 4606 ms
2020-08-11T18:43:25.287759Z Listening for messages on projects/ce-cxmo-dev/subscriptions/AndySubscriptionPull:
2020-08-11T18:43:25.351650801Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x31,0x3eca02dfd974,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11T18:43:25.351770555Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x12,0x3eca02dfd97c,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11 18:43:25.680 WARN 1 --- [ault-executor-0] i.g.n.s.i.n.u.internal.MacAddressUtil : Failed to find a usable hardware address from the network interfaces; using random bytes: ae:2c:fb:e7:92:9c:2b:24
2020-08-11T18:45:36.282714Z Id: 1421389098497572
2020-08-11T18:45:36.282763Z Data: We be pub-sub'n in pull mode2!!
Nothing else after this and the app stops running.
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import org.springframework.stereotype.Component;

@Component
public class AndyTopicPullRecv {

    public AndyTopicPullRecv() {
        subscribeAsyncExample("ce-cxmo-dev", "AndySubscriptionPull");
    }

    public static void subscribeAsyncExample(String projectId, String subscriptionId) {
        ProjectSubscriptionName subscriptionName =
            ProjectSubscriptionName.of(projectId, subscriptionId);

        // Instantiate an asynchronous message receiver.
        MessageReceiver receiver =
            (PubsubMessage message, AckReplyConsumer consumer) -> {
                // Handle incoming message, then ack the received message.
                System.out.println("Id: " + message.getMessageId());
                System.out.println("Data: " + message.getData().toStringUtf8());
                consumer.ack();
            };

        Subscriber subscriber = null;
        try {
            subscriber = Subscriber.newBuilder(subscriptionName, receiver).build();
            // Start the subscriber.
            subscriber.startAsync().awaitRunning();
            System.out.printf("Listening for messages on %s:\n", subscriptionName.toString());
            // Block indefinitely unless an unrecoverable error occurs.
            // subscriber.awaitTerminated(30, TimeUnit.SECONDS);
            subscriber.awaitTerminated();
            System.out.printf("Async subscribe terminated on %s:\n", subscriptionName.toString());
        // } catch (TimeoutException timeoutException) {
        } catch (Exception e) {
            // Stop receiving messages (guard against the builder having failed).
            if (subscriber != null) {
                subscriber.stopAsync();
            }
            System.out.println("Async subscriber exception: " + e);
        }
    }
}
Kolban's question is very important! With the shared code, I would say "no". The Cloud Run contract is clear:
Your service must answer HTTP requests. Outside of a request, you pay nothing and no CPU is dedicated to your instance (the instance is like a suspended daemon when no request is being processed).
Your service must be stateless (not your case here; I won't spend time on this).
If you want to pull from your Pub/Sub subscription, create an endpoint in your code with a REST controller. While you are processing this request, run your pull mechanism and process messages.
This endpoint can be called regularly by Cloud Scheduler to keep the process up.
Be careful: there is a maximum request processing timeout of 15 minutes (today; subject to change in the near future), so you can't run your process for more than 15 minutes. Make it resilient to failure, and set your scheduler to call your service every 15 minutes.
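A minimal sketch of that pattern, shown in Node.js rather than Java for brevity (the project and subscription names are taken from the question; the /pull route is an assumption): the endpoint does a bounded synchronous pull, acks what it received, and returns before the timeout.
const express = require('express');
const {v1} = require('@google-cloud/pubsub');

const app = express();
const subClient = new v1.SubscriberClient();

app.post('/pull', async (req, res) => {
  const subscription = subClient.subscriptionPath('ce-cxmo-dev', 'AndySubscriptionPull');
  // Synchronous pull: returns at most maxMessages of the currently available messages.
  const [response] = await subClient.pull({subscription, maxMessages: 100});
  for (const received of response.receivedMessages) {
    console.log('Id:', received.message.messageId);
    console.log('Data:', Buffer.from(received.message.data).toString('utf8'));
  }
  const ackIds = response.receivedMessages.map((m) => m.ackId);
  if (ackIds.length > 0) {
    await subClient.acknowledge({subscription, ackIds});
  }
  // Answer well before the request timeout; Cloud Scheduler will call again.
  res.status(200).send(`Processed ${ackIds.length} messages`);
});

app.listen(process.env.PORT || 8080);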

How to reuse config in cron.yaml?

I have a lot of cron jobs with the same config, and I want to use variables to reuse some of it.
Here is my attempt.
cron.yaml:
cron:
- description: 'a'
  url: /cron/events/a/b
  schedule: &schedule every 1 hours
  target: &target reuse-cron-config
- description: 'b'
  url: /cron/events/a/c
  schedule: *schedule
  target: *target
But when I ran gcloud app deploy ./cron.yaml, it threw an error:
ERROR: (gcloud.app.deploy) An error occurred while parsing file: [/Users/ldu020/workspace/nodejs-gcp/src/app-engine/standard-environment/reuse-cron-config/cron.yaml]
Anchors not supported in this handler
in "/Users/ldu020/workspace/nodejs-gcp/src/app-engine/standard-environment/reuse-cron-config/cron.yaml", line 4, column 15
All of my cron jobs have the same target and schedule. How can I solve this? Thanks.
Update
I have a route like this that gets the params for each cron URL:
app.get('/cron/events/:topic/:retryTopic', (req, res) => {
  console.log(req.params); // { topic: 'a', retryTopic: 'b' }
  res.sendStatus(200); // respond so the cron request doesn't hang
});
You could wrap all of these cron entries into a single entry called 'hourly tasks' or 'daily tasks', and the request handler could then launch each of these tasks via the task queue.
This would also help you stay well under the cap imposed on the total number of cron tasks you're allowed to have:
https://cloud.google.com/appengine/docs/standard/python/config/cronref#limits
Free applications can have up to 20 scheduled tasks. Paid applications can have up to 250 scheduled tasks.
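For example (a sketch with made-up project, location, and queue names, using the Node.js Cloud Tasks client in place of the classic Task Queue API that the linked Python docs describe), a single cron entry:
cron:
- description: 'hourly tasks'
  url: /cron/hourly
  schedule: every 1 hours
  target: reuse-cron-config
and a handler that fans the work out as one task per endpoint:
const {CloudTasksClient} = require('@google-cloud/tasks');
const tasksClient = new CloudTasksClient();

app.get('/cron/hourly', async (req, res) => {
  // Project, location, and queue name are assumptions.
  const parent = tasksClient.queuePath('my-project', 'us-central1', 'cron-fanout');
  const uris = ['/cron/events/a/b', '/cron/events/a/c'];
  for (const relativeUri of uris) {
    // Each task is delivered back to the App Engine service as an HTTP GET.
    await tasksClient.createTask({
      parent,
      task: {appEngineHttpRequest: {httpMethod: 'GET', relativeUri}},
    });
  }
  res.status(200).send(`Enqueued ${uris.length} tasks`);
});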

AWS: Why does my RDS instance keep starting after I turned it off?

I have an RDS database instance on AWS and have turned it off for now. However, every few days it starts up on its own. I don't have any other services running right now.
There is this event in my RDS log:
"DB instance is being started due to it exceeding the maximum allowed time being stopped."
Why is there a limit to how long my RDS instance can be stopped? I just want to put my project on hold for a few weeks, but AWS won't let me turn off my DB? It costs $12.50/mo to have it sit idle, so I don't want to pay for this, and I certainly don't want AWS starting an instance for me that does not get used.
Please help!
That's a limitation of this new feature.
You can stop an instance for up to 7 days at a time. After 7 days, it will be automatically started. For more details on stopping and starting a database instance, please refer to Stopping and Starting a DB Instance in the Amazon RDS User Guide.
You can set up a cron job to stop the instance again after 7 days. You can also change to a smaller instance size to save money.
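For the cron-job option, here is a sketch of a Lambda handler (Node.js with the aws-sdk v2 client; newer Lambda runtimes ship only the v3 SDK, so treat this as a sketch) that a CloudWatch Events/EventBridge schedule such as rate(7 days) could invoke. The instance identifier is an assumption:
const AWS = require('aws-sdk');
const rds = new AWS.RDS();

exports.handler = async () => {
  try {
    // Ask RDS to stop the instance again.
    await rds.stopDBInstance({DBInstanceIdentifier: 'my-db-instance'}).promise();
    console.log('Stop requested');
  } catch (err) {
    // The instance may still be stopped or mid-transition; treat that as a no-op.
    if (err.code !== 'InvalidDBInstanceState') throw err;
    console.log('Instance not in a stoppable state, skipping');
  }
};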
Another option is the upcoming Aurora Serverless, which stops and starts for you automatically. It might be more expensive than a dedicated instance when running 24/7.
Finally, there is always Heroku, which gives you a free database instance that starts and stops itself, with some limitations.
You can also try saving the following CloudFormation template as KeepDbStopped.yml and then deploy with this command:
aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DB=arn:aws:rds:us-east-1:XXX:db:XXX
Make sure to change arn:aws:rds:us-east-1:XXX:db:XXX to your RDS ARN.
Description: Automatically stop RDS instance every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
  DB:
    Description: ARN of database that needs to be stopped
    Type: String
    AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:db:[^:]*
Resources:
  DatabaseStopperFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt DatabaseStopperRole.Arn
      Runtime: python3.6
      Handler: index.handler
      Timeout: 20
      Code:
        ZipFile:
          Fn::Sub: |
            import boto3
            import time

            def handler(event, context):
                print("got", event)
                db = event["detail"]["SourceArn"]
                id = event["detail"]["SourceIdentifier"]
                message = event["detail"]["Message"]
                region = event["region"]
                rds = boto3.client("rds", region_name=region)
                if message == "DB instance is being started due to it exceeding the maximum allowed time being stopped.":
                    print("database turned on automatically, setting last seen tag...")
                    last_seen = int(time.time())
                    rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])
                elif message == "DB instance started":
                    print("database started (and sort of available?)")
                    last_seen = 0
                    for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
                        if t["Key"] == "DbStopperLastSeen":
                            last_seen = int(t["Value"])
                    if time.time() < last_seen + (60 * 20):
                        print("database was automatically started in the last 20 minutes, turning off...")
                        time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
                        rds.stop_db_instance(DBInstanceIdentifier=id)
                        print("success! removing auto-start tag...")
                        rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])
                    else:
                        print("ignoring manual database start")
                else:
                    print("error: unknown database event!")
  DatabaseStopperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: Notify
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - rds:StopDBInstance
                Effect: Allow
                Resource: !Ref DB
              - Action:
                  - rds:AddTagsToResource
                  - rds:ListTagsForResource
                  - rds:RemoveTagsFromResource
                Effect: Allow
                Resource: !Ref DB
                Condition:
                  ForAllValues:StringEquals:
                    aws:TagKeys:
                      - DbStopperLastSeen
  DatabaseStopperPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt DatabaseStopperFunction.Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DatabaseStopperRule.Arn
  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS DB Instance Event"
        resources:
          - !Ref DB
        detail:
          Message:
            - "DB instance is being started due to it exceeding the maximum allowed time being stopped."
            - "DB instance started"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda
It has worked for at least one person. If you run into issues, please report them.

Google Cloud Storage (gcs) Error 200 on non-final Chunk

I'm running into the following error when running an export-to-CSV job on App Engine using the new Google Cloud Storage library (appengine-gcs-client). I have about 30 MB of data I need to export on a nightly basis. Occasionally, I will need to rebuild the entire table. Today, I had to rebuild everything (~800 MB total) and only about 300 MB of it actually made it across. I checked the logs and found this exception:
/task/bigquery/ExportVisitListByDayTask
java.lang.RuntimeException: Unexpected response code 200 on non-final chunk: Request: PUT https://storage.googleapis.com/moose-sku-data/visit_day_1372392000000_1372898225040.csv?upload_id=AEnB2UrQ1cw0-Jbt7Kr-S4FD2fA3LkpYoUWrD3ZBkKdTjMq3ICGP4ajvDlo9V-PaKmdTym-zOKVrtVVTrFWp9np4Z7jrFbM-gQ
x-goog-api-version: 2
Content-Range: bytes 4718592-4980735/*
262144 bytes of content
Response: 200 with 0 bytes of content
ETag: "f87dbbaf3f7ac56c8b96088e4c1747f6"
x-goog-generation: 1372898591905000
x-goog-metageneration: 1
x-goog-hash: crc32c=72jksw==
x-goog-hash: md5=+H27rz96xWyLlgiOTBdH9g==
Vary: Origin
Date: Thu, 04 Jul 2013 00:43:17 GMT
Server: HTTP Upload Server Built on Jun 28 2013 13:27:54 (1372451274)
Content-Length: 0
Content-Type: text/html; charset=UTF-8
X-Google-Cache-Control: remote-fetch
Via: HTTP/1.1 GWA
at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService.put(OauthRawGcsService.java:254)
at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService.continueObjectCreation(OauthRawGcsService.java:206)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl$2.run(GcsOutputChannelImpl.java:147)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl$2.run(GcsOutputChannelImpl.java:144)
at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:78)
at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:123)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl.writeOut(GcsOutputChannelImpl.java:144)
at com.google.appengine.tools.cloudstorage.GcsOutputChannelImpl.waitForOutstandingWrites(GcsOutputChannelImpl.java:186)
at com.moose.task.bigquery.ExportVisitListByDayTask.doPost(ExportVisitListByDayTask.java:196)
The task is pretty straightforward, but I'm wondering if there is something wrong with the way I'm using waitForOutstandingWrites() or the way I'm serializing my outputChannel for the next task run. One thing to note is that each task is broken into daily groups, each outputting its own individual file. The day tasks are scheduled to run concurrently, 10 minutes apart, to push out all 60 days.
In the task, I create a PrintWriter like so:
OutputStream outputStream = Channels.newOutputStream( outputChannel );
PrintWriter printWriter = new PrintWriter( outputStream );
and then write data out to it 50 lines at a time, calling the waitForOutstandingWrites() function to push everything over to GCS. When I'm coming up on the open-file limit (~22 seconds), I put the outputChannel into Memcache and then reschedule the task with the data iterator's cursor.
printWriter.print( outputString.toString() );
printWriter.flush();
outputChannel.waitForOutstandingWrites();
This seems to be working most of the time, but I'm getting these errors, which are creating corrupted and incomplete files in GCS. Is there anything obvious I'm doing wrong in these calls? Can I only have one channel open to GCS at a time per application? Is there some other issue going on?
Appreciate any tips you could lend!
Thanks!
Evan
A 200 response indicates that the file has been finalized. If this occurs on an API call other than close, the library throws an error, as this is not expected.
This is likely occurring due to the way you are rescheduling the task. It may be that when you reschedule the task, the task queue is duplicating its delivery for some reason (this can happen), and if there are no checks to prevent this, two instances could be attempting to write to the same file at the same time. When one closes the file, the other sees an error. The net result is a corrupt file.
The simple solution is not to reschedule the task. There is no time limit on how long a file can be held open with the GCS client (unlike the deprecated Files API).

Google App Engine: Cron handler is called by cron, but no code is run

I have the following problem. I have defined a cron job in Google App Engine, but my get method is not called (or, to be precise, it is called every other time: if I run it manually it does nothing the first time, but the second time it works flawlessly). This is the output from logging for the call made by cron:
2011-07-04 11:39:08.500 /suggestions/ 200 489ms 70cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
0.1.0.1 - - [04/Jul/2011:11:39:08 -0700] "GET /suggestions/ HTTP/1.1" 200 0 - "AppEngine-Google; (+http://code.google.com/appengine)" "bazinga-match.appspot.com" ms=489 cpu_ms=70 api_cpu_ms=0 cpm_usd=0.001975 queue_name=__cron task_name=a449e27ff383de24ff8fc5d5f05f2aae
As you can see, it makes a GET request to /suggestions/, but nothing happens, not even my log messages (they are printed when I run it a second time manually). Do you have any idea why this might be happening?
My handler:
class SuggestionsHandler(RequestHandler):
    def get(self):
        logging.debug('Creating suggestions')
        for key in db.Query(User, keys_only=True).order('last_suggestion'):
            make_suggestion(key)
        logging.debug('Done creating suggestions')
        print
        print('Done creating suggestions')
This is my cron.yaml:
cron:
- description: daily suggestion creation
  url: /suggestions/
  schedule: every 6 hours
and the relevant section of my app.yaml:
- url: /suggestions/
  script: cron.py
  login: admin
You're missing this declaration from the bottom of your handler file, after main:
if __name__ == '__main__':
    main()
The first time a handler script is run, App Engine simply imports it, and this snippet calls your main() in that situation. On subsequent requests, App Engine calls the main() function directly, if you defined one.
