I'm considering moving from AppEngine to EC2/Elastic Beanstalk, as I need my servers located within the EU (AppEngine doesn't offer a server location option, AFAIK). I've run the Elastic Beanstalk sample application, which is good as far as it goes; however, one of the AppEngine features I rely on heavily is the offline task queue / cron facility, as I periodically fetch a lot of data from other sites. I'm wondering what I would need to set up on Elastic Beanstalk / EC2 to replicate this task queue facility, whether there are any best practices yet, how much work it would take, etc.
Thanks!
A potential problem with cron services in Beanstalk is that a given scheduled command might be invoked more than once if the application is running on more than one instance. Coordination is needed between the running Tomcat instances to ensure that each job is run by only one of them, and that the cron service isn't interrupted if that instance dies.
Here's how I'm implementing it:
Package the cron job "config file" with the WAR. This file should contain frequencies and URLs (each actual cron job is simply an invocation of a specific URL, the same way AppEngine does it).
Use a single database table to maintain coordination. It requires at least two columns:
- a primary or unique key (string) to hold the command along with its frequency, e.g. "#daily http://your-app/some/cron/handler/url"
- a second column to hold the last execution time.
Each Tomcat instance runs a cron thread that reads the configuration from the WAR and sleeps as long as needed until the next service invocation. Once the time hits, the instance first attempts to "claim" the invocation: it grabs the last invocation time for that command from the database, then updates it to take the "lock".
query(SELECT last_execution_time FROM crontable WHERE command = ?)
if(NOW() - last_execution_time < reasonable window) skip;
query(UPDATE crontable SET last_execution_time = NOW() WHERE command = ? AND last_execution_time = ?)
if(number of rows updated == 0) skip;
run task()
The key element here is that we also include last_execution_time in the WHERE clause of the UPDATE, ensuring that if some other instance updates it between our SELECT and our UPDATE, the UPDATE will report that no rows were affected and this instance will skip executing that task.
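Here's a minimal sketch of that claim step in Java/JDBC, assuming the table and column names from the pseudocode above (crontable, command, last_execution_time) and an already-open connection; windowMillis is whatever slack you consider reasonable for your frequencies:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class CronClaimer {
    /** Returns true if this instance won the right to run the command. */
    static boolean tryClaim(Connection conn, String command, long windowMillis)
            throws SQLException {
        Timestamp last;
        try (PreparedStatement select = conn.prepareStatement(
                "SELECT last_execution_time FROM crontable WHERE command = ?")) {
            select.setString(1, command);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next()) return false; // unknown command
                last = rs.getTimestamp(1);
            }
        }
        long now = System.currentTimeMillis();
        if (now - last.getTime() < windowMillis) return false; // ran too recently

        // Optimistic lock: the UPDATE succeeds only if no other instance has
        // touched last_execution_time since our SELECT.
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE crontable SET last_execution_time = ? "
                        + "WHERE command = ? AND last_execution_time = ?")) {
            update.setTimestamp(1, new Timestamp(now));
            update.setString(2, command);
            update.setTimestamp(3, last);
            return update.executeUpdate() == 1; // 0 rows => another instance claimed it
        }
    }
}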
If you're moving your app, you're probably better off simply using TyphoonAE or AppScale. Both are alternate environments in which you can run your App Engine app unmodified, and both support EC2.
In our development environment, we have been charged about USD 100 every month for an instance we didn't know existed (and of course are not using), and we can't find it anywhere in App Engine or the Cloud Console.
Also, the usage report shows no activity for the whole month, but we are still getting the charges.
The instance is: Flex Instance Core Hours Sao Paulo
I found similar posts on Stack Overflow, so here are my questions:
- Is this some bad strategy from Google?
- Where can I see this instance, so I can stop or delete it?
- Where can I see who started this instance, and when?
Of course, I called Google support and received no answer.
Many thanks!
Google Cloud Platform Support here! I found your ticket and see that you were already provided an answer there. In addition to what Dan described in his answer, if your app currently has the "Serving" status, it will keep running its instances regardless of whether any requests are coming in. As long as the version is serving, you will continue to be billed for the instance hours you are using. Also, if you are using automatic scaling with a minimum number of instances:
that specified number of instances run as resident instances while any
additional instances are dynamic
(Instance scaling description in GCP docs)
You can use basic or manual scaling if this is not what you're interested in.
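For example, a minimal app.yaml fragment along those lines (a sketch only; the instance count is an arbitrary illustration):

# Hypothetical fragment: pin the version to a fixed number of instances
# instead of automatic scaling with resident minimums.
manual_scaling:
  instances: 1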
Check the App Engine Versions page for each of your projects; you should find at least one version using the Flexible environment. The Deployed column should indicate who deployed it and when.
Based on that information you can decide whether to keep or delete the respective version(s). Simply stopping the instance may not be sufficient: depending on the scaling configuration for that service version, GAE may automatically start one or more new instances.
You should also check the App Engine Instances page for your projects and cross-reference it with the versions info to make sure no undesired instances are accidentally left behind (at least in the standard environment they are normally stopped when the respective versions are deleted; I'm not entirely certain the same is true for the flex environment).
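If you prefer the command line, these gcloud commands cover the same checks (the service and version names are placeholders):

gcloud app versions list                          # all versions and when they were deployed
gcloud app instances list                         # running instances per service/version
gcloud app versions delete v1 --service=myservice # remove an unwanted version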
The running flexible environment instances are billed by the hour, even if they receive no requests, which could explain why you're seeing charges without any activity.
Apparently, the source of this instance was a Firebase setting we made to run some tests, which automatically created the instance. I shut off the billing account for this project, and instantly received an email from Firebase saying it had detected changes that would make some functions unavailable.
Background
From the docs, at https://github.com/GoogleCloudPlatform/gradle-appengine-plugin
I see that putting my functional tests in /src/functionalTests/java does the following:
Starts the Local GAE instance
runs tests in the functionalTests directory
Stops the Local instance after the tests are complete
My Issue
For my microservices, I need to have 2 local servers running for my tests. One server is responsible for a lot of auth operations, and the other microservices talk to this server for some verification operations.
I've tried
appengineFunctionalTest.dependsOn ':authservice:appengineRun'
This does start the dependent server, but then it hangs and the tests don't continue. I see that I can set daemon = true and start the server on a background thread, but I can only seem to do that in isolation.
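For reference, here's roughly what I'm attempting; this is only a sketch, and it assumes the plugin's daemon flag makes appengineRun return instead of blocking, and that appengineStop works as documented:

// in authservice/build.gradle: run the dev server in the background
appengine {
    daemon = true
}

// in the consuming service's build.gradle: start the auth server before
// the functional tests and stop it once they finish
appengineFunctionalTest.dependsOn ':authservice:appengineRun'
appengineFunctionalTest.finalizedBy ':authservice:appengineStop'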
Is there a way to have 'dependsOn' also pass parameters to the dependent task? I haven't found a way to make that happen.
Or perhaps there is another way to accomplish this.
Any help appreciated
I have developed an app with Twilio which I would like to run from the cloud. I tried learning about AWS and Google App Engine, but am quite confused at this stage.
I have 2 questions which I hope to get your help on:
1) How can I store my scripts and database in the cloud? Right now, everything runs on my local machine, but I would like to transfer the scripts and DB to another server and run my app at a predetermined time of day. What would be the best way to do this?
2) How can I write a batch file to run my app at a predetermined time of day in the cloud?
I understand this question doesn't include code, but I really hope someone can point me in the right direction. I have spent lots of time trying to understand this myself but am still unsure. Thanks in advance.
Update: The application is a Twilio app that makes calls to people; the script simply applies an algorithm to make the calls in a certain fashion, and the database is a MySQL DB that holds the details of the people to be called.
It's difficult to provide an exact answer without understanding what the application, the DB, and the script you wish to run actually are.
I can give you a couple of ideas that might be helpful in such cases.
OpsWorks (http://aws.amazon.com/opsworks/) is a managed service for deploying and managing applications. You define your stack (multiple layers like web, workers, DB...) and which Chef recipes should run at various points in the lifecycle of the instances in each layer (startup, shutdown, app deployment, or stack modification). You can then add instances to each layer on specific days and at specific hours, to implement the run-at-predetermined-times functionality you asked about.
In such a solution you can either keep some of your instances (like the DB) always on, or even bootstrap them using the Chef recipes every day, restoring from a snapshot on start and creating a snapshot on shutdown.
Another AWS service you can use is Data Pipeline (http://aws.amazon.com/datapipeline/). It is designed to move data periodically between data sources, for example from a MySQL database to Amazon Redshift, the data warehouse service. But you can also use it to trigger arbitrary shell scripts (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-shellcommandactivity.html) and schedule them to run under various conditions, like every hour/day or at specific times (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts-schedules.html).
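As a rough illustration, a pipeline definition for that approach might look something like this; it's a sketch only, with placeholder IDs, command, and schedule values, and the exact field set and period syntax should be checked against the docs linked above:

{
  "objects": [
    {
      "id": "DailySchedule",
      "type": "Schedule",
      "period": "1 day",
      "startDateTime": "2015-01-01T09:00:00"
    },
    {
      "id": "Ec2Instance",
      "type": "Ec2Resource",
      "instanceType": "t1.micro",
      "schedule": { "ref": "DailySchedule" }
    },
    {
      "id": "RunCallScript",
      "type": "ShellCommandActivity",
      "command": "python /home/ec2-user/make_calls.py",
      "runsOn": { "ref": "Ec2Instance" },
      "schedule": { "ref": "DailySchedule" }
    }
  ]
}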
A simple path here would be to just create an EC2 instance in AWS and put the components needed to run your app on it. A thorough walkthrough is here:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/get-set-up-for-amazon-ec2.html
Essentially you will create an EC2 virtual machine, which for most purposes you can treat just like any other Linux server. You can install MySQL on it, copy your script there, and run it. Of course, whatever container or support libraries your code requires will need to be installed as well.
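For the "run at a predetermined time" part of your question, a plain crontab entry on that instance is enough; for example (the script path and schedule are placeholders):

# run the calling script every day at 09:00, appending output to a log
0 9 * * * /usr/bin/python /home/ec2-user/make_calls.py >> /home/ec2-user/make_calls.log 2>&1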
You don't say what OS you are using locally, but if it is Mac or Linux, you should be able to follow almost the same process to get your script running on an EC2 instance that you used on your local machine.
As you get to know AWS, there are more sophisticated services you can use for deployment, infrastructure orchestration, database services, and so on. But just to get started, running a script from a virtual machine should be pretty straightforward.
I recently developed a Twilio application using Ruby on Rails for the backend and found Heroku extremely simple to set up and launch on. While Heroku does cost more than AWS, I found that the time I saved using Heroku more than made up for it. As an early-stage startup, we wanted to spend our time developing important features, not "wasting" it optimizing our AWS cloud.
However, while I believe Heroku is ideal for early-stage websites/startups I do believe hosting should be reevaluated once a company reaches a certain size. At some point it becomes economically viable to devote resources into optimizing an AWS cloud solution because it will be cheaper than Heroku in the long run.
Solr's default /admin/ping configuration, provided for load balancer health checks, integrates well with Amazon ELB health checks.
However, since we're using master-slave replication, when we provision a new node Solr starts up and replication happens, but in the meantime /admin/ping returns success before the index has replicated across from the master and actually contains documents.
We'd like nodes to be brought live only once they have completed their first replication and have documents. I don't see any way of doing this with the /admin/ping PingRequestHandler: it always returns success if the search succeeds, even with zero results.
Nor is there any way of matching (or not matching) expected text in the response with the ELB health check configuration.
How can we achieve this?
To expand on the nature of the problem here, the PingRequestHandler will always return success unless:
Its query results in an exception being thrown.
It is configured to use a healthcheck file, and that file is not found.
Thus my suggestion is that you configure the PingRequestHandler to use a healthcheck file. You can then use a cron job on your Solr system whose job is to check for the existence of documents and create (or remove) the healthcheck file accordingly. If the healthcheck file is not present, the PingRequestHandler will return an HTTP 503, which should be sufficient for ELB.
The rough algorithm that I'd use (sketched as a script below):
Every minute, query http://localhost:8983/solr/select?q=*:*
If numDocs > 0 then touch /path/to/solr-enabled
Else rm /path/to/solr-enabled (optional, depending on your strictness)
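A minimal cron script along those lines, assuming a default single-core Solr on port 8983 and the healthcheck path used below:

#!/bin/sh
# Query Solr for the document count and create/remove the healthcheck file.
NUM_DOCS=$(curl -s 'http://localhost:8983/solr/select?q=*:*&rows=0&wt=json' \
  | grep -o '"numFound":[0-9]*' | cut -d: -f2)

if [ "${NUM_DOCS:-0}" -gt 0 ]; then
  touch /path/to/solr-enabled
else
  rm -f /path/to/solr-enabled  # optional, depending on your strictness
fi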
The healthcheck file can be configured in the <admin> block, and you can use an absolute path, or a filename relative to the directory from which you have started Solr.
<admin>
  <defaultQuery>solr</defaultQuery>
  <pingQuery>q=*:*</pingQuery>
  <healthcheck type="file">/path/to/solr-enabled</healthcheck>
</admin>
Let me know how that works out! I'm tempted to implement something similar for read slaves at Websolr.
I ran into an interesting solution here: https://jobs.zalando.com/tech/blog/zookeeper-less-solr-architecture-aws/?gh_src=4n3gxh1
It's basically a servlet that you could add to the Solr webapp to check all of the cores and make sure they have documents.
I'm toying with a more sophisticated solution but haven't tested it/made much progress yet: https://gist.github.com/er1c/e261939629d2a279a6d74231ce2969cf
What I like about this approach (in theory) is the ability to check the replication status/success for multiple cores. If anyone finds an actual implementation of this approach please let me know!
Recently I've started using limited staging on my Google App Engine project. The data is still shared between all versions, but behaviour (especially user-facing behaviour) is different.
Naturally, when I implement something brand new it only runs on the latest version of my code, and I don't feel like it should be backported to the older versions.
Some of this new functionality requires cron jobs to be run periodically, but I'm hitting a problem. I have to run a cron job to call the latest code, but this is what Google's documentation has to say about the issue:
Cron requests are always sent to the default version of the application.
The default version is the oldest, because the first versions of the client code that went out to users weren't future-proof and don't know how to select which API version to call.
So my question is, how can I get around this limitation and make a cron job that will call the latest rather than the default version of the application?
You can now specify a version using the target tag:
<target>version-2</target>
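In context, a cron.xml entry would look like this (the URL and schedule are placeholders; version-2 is the version to target):

<cronentries>
  <cron>
    <url>/tasks/my_new_job</url>
    <schedule>every 24 hours</schedule>
    <target>version-2</target>
  </cron>
</cronentries>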
You can't change cron jobs to run on a version other than the default.
Depending on how much time your cron job takes to run, you could change your cron job script to do a URLFetch to "http://latest.appname.appspot.com/cron_job_endpoint".
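A sketch of that forwarding idea in Java, using the URLFetch API (the endpoint is the placeholder URL from above):

import java.io.IOException;
import java.net.URL;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;

public class CronForwarder {
    // Called from the default version's cron handler; relays the request to
    // the latest version and reports its status code.
    static int forwardToLatest() throws IOException {
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
        HTTPResponse response =
            fetcher.fetch(new URL("http://latest.appname.appspot.com/cron_job_endpoint"));
        return response.getResponseCode();
    }
}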
If your cron job takes longer than 10 minutes to run, I would design it in a way that lets you chain the different tasks using task queues.
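And a sketch of that chaining idea with the task queue API (the handler URL and parameter are illustrative):

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class ChainedCron {
    // Each chunk of work enqueues the next one, keeping every task well
    // under the request deadline.
    static void enqueueNextChunk(int nextOffset) {
        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(TaskOptions.Builder
            .withUrl("/cron_job_endpoint")
            .param("offset", Integer.toString(nextOffset)));
    }
}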