We have an application that was recently migrated from WebSphere 6 to WebSphere 8.5. The application uses EJB annotations and EJB timers. The timers are set to execute every 5 minutes. This feature worked for years without any problems on WebSphere 6. Since migrating to WebSphere 8.5, the EJB timers are being triggered every millisecond, indefinitely, as opposed to every 5 minutes (a predefined value). Can anybody help me find the root cause of this problem?
If you are using the same database tables before and after the migration, such that pre-existing timer tasks remain scheduled, and there was a period of time during which they were unable to run, the behavior you describe could be due to catching up on missed executions.
If this is the case, try querying the table (documented here) for the NEXTFIRETIME column. If the number of milliseconds represented by this value corresponds to a date in the past, you can expect the missed executions to run. One option is to let them run and allow the timer to catch up to the current time. Otherwise, you could cancel and reschedule the timer tasks.
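As a rough illustration of the catch-up arithmetic, the sketch below converts a NEXTFIRETIME value (milliseconds since the Unix epoch) to a UTC timestamp and estimates how many fixed-interval runs are still owed. It assumes every missed interval is replayed; whether WebSphere actually does that depends on how the timer was created, and the function name `describe_next_fire` is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def describe_next_fire(next_fire_ms: int, interval: timedelta, now: datetime) -> tuple[datetime, int]:
    """Convert NEXTFIRETIME (ms since the Unix epoch) to a UTC datetime and
    estimate how many fixed-interval runs are owed as of `now` (timezone-aware).

    Hypothetical helper: assumes each missed interval is executed once."""
    next_fire = datetime.fromtimestamp(next_fire_ms / 1000, tz=timezone.utc)
    if next_fire > now:
        # Next run is still in the future: nothing to catch up on.
        return next_fire, 0
    # Overdue: the run at next_fire itself, plus one per full interval since then.
    owed = (now - next_fire) // interval + 1
    return next_fire, owed
```

For a NEXTFIRETIME of 1704067200000 (2024-01-01 00:00 UTC) checked at 01:00 UTC with a 5-minute interval, this estimates 13 owed runs (00:00, 00:05, and so on up to 01:00), which a timer service replaying them back to back would fire in rapid succession.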
I'm developing my first App Engine Flexible Environment application.
The docs explain that virtual machines are restarted weekly:
VM instances are restarted on a weekly basis. During restarts Google's management services will apply any necessary operating system and security updates.
Will restarts result in downtime for apps with automatic scaling enabled? If so, are there any steps I can take to avoid downtime?
For example, I could frequently migrate traffic to new instances so that no instance runs for more than one week.
Later, I checked with the Google support team, and here is their recommendation for avoiding the downtime.
My questions are:
The weekly update is not fixed in time. Is there perhaps a time window in which I should expect the instances to reboot (e.g., every Friday night)?
Does the weekly update involve all instances, regardless of when they were created (e.g., will an instance created 1 hour or 1 day before the weekly update be restarted)?
How are we supposed to handle such a problem? It returns 502 for all requests in the meantime.
1.- At this moment there is no way to know when the weekly restart is going to happen. GCP determines when it is necessary and restarts certain instances (once per week).
2.- No. As long as you have more than one instance running, you won’t see all of them being restarted at the same time.
3.- To avoid downtime due to weekly restarts, we recommend setting the minimum number of instances to more than 1. Try to set at least 2 instances as the minimum.
I hope this information is useful to others.
The answer to your question is in the docs:
App Engine attempts to keep manual scaling instances running indefinitely, but there is no uptime guarantee. Hardware or software failures that cause early termination or frequent restarts can occur without warning and can take considerable time to resolve. Your application should be able to handle such failures.
Here are some good strategies for avoiding downtime due to instance restarts:
Use load balancing across multiple instances.
Configure more instances than required to handle normal traffic.
Write fall-back logic that uses cached results when a manual scaling instance is unavailable.
Reduce the amount of time it takes for your instances to start up and shutdown.
Duplicate the state information across more than one instance.
For long-running computations, checkpoint the state from time to time so you can resume it if it doesn't complete.
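The last strategy can be sketched in a few lines of Python. This is a minimal, hypothetical example (`run_with_checkpoints` and the JSON checkpoint format are illustrative, not an App Engine API). It writes progress to a local file only to stay self-contained; local disk on App Engine instances is not durable across restarts, so a real checkpoint would go to durable storage such as Cloud Storage or Datastore.

```python
import json
import os
import tempfile


def run_with_checkpoints(items, process, checkpoint_path, every=100):
    """Process `items` in order, saving progress every `every` items so a
    restarted instance resumes where it left off. Returns the resume index."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for i in range(start, len(items)):
        process(items[i])
        # Checkpoint periodically and once more at the very end.
        if (i + 1) % every == 0 or i + 1 == len(items):
            _save(checkpoint_path, i + 1)
    return start


def _save(path, next_index):
    # Write to a temporary file and atomically rename it, so a crash in the
    # middle of a write never leaves a truncated checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)
```

If the instance is restarted after item 250 of 300 with `every=100`, the next run resumes from item 200, so at most `every` items are redone; a smaller `every` trades more writes for less repeated work.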
I'm seeing a possible concurrency issue in the SQL Server ADO.NET implementation on Mono 4.2.2 that manifests when queries are cancelled or time out on the client, using the SqlCommand.ExecuteReader API.
To reproduce the issue seen in the field:
I start 3 new timed tasks concurrently every second; each runs 3-5 relatively small queries that complete and return (all using SqlCommand.ExecuteReader). This runs as expected.
Then I add a long-running query to the test run, set to execute every 65 seconds but cancel after 60 seconds. The query would take longer than 60 seconds to complete, so it is cancelled every time (using SqlCommand.Cancel()).
After running for several minutes, most attempts to iterate the SqlDataReader suddenly fail because the expected fields are not present on the returned rows, so when the data layer tries to access them by name, an exception is thrown.
Adding logging code to print the fields on each row shows that they come from another query in the test, one that is either running concurrently or ran very recently.
Once this problem occurs for one query, it happens very frequently; in fact, most queries fail. In the field, even services handling only 5 or so queries a minute were returning the wrong result sets for most queries.
Restarting the process fixes the problem.
FYI
A new connection, command and reader object are instantiated per query, and each is used within its own 'using' block.
Default connection pooling for ADO.NET is being used.
Most connections are to the same DB; a separate connection is made to another DB on another server once per task run, but this always completes successfully.
The code is mature and in production use on Windows .NET Framework systems without issue, and running the same tests on the Windows .NET Framework does not reproduce the problem, so it is unlikely to be an issue common to both platforms.
Has anyone else seen this and can tell me what I might be doing wrong? Would simply disabling connection pooling be a (temporary) workaround for this problem?
Following further testing, explicitly disabling connection pooling does in fact work around this issue, but of course this comes with additional overhead, especially for applications with a high frequency of queries.
I have 2 instances running independently and using the same database. I want to run the timer on one instance and disable it on the other. What should I do to achieve this?
I have also tried to configure my batch job to run on only one instance. Unfortunately, I am not aware of a way to explicitly disable the batch on certain nodes.
But as shi suggests, it is possible to keep your batch processes on all instances and synchronize them via the DB, which has, e.g., the advantage of failover. However, for EJB timers this is available only in WildFly 9 (see the issue).
I solved it by using Quartz Scheduler in clustered configuration which uses an approach very similar to the clustered EJB timers.
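For comparison, the "synchronize via the DB" idea can be illustrated with a generic lease row that every instance competes for before running the timer body. This is a hypothetical Python sketch with SQLite standing in for the shared database; the `job_lease` table and the function names are made up for illustration and do not reflect Quartz's actual schema or locking code. The lease expires after `ttl` seconds, so if the owning instance dies, another one can take over.

```python
import sqlite3
import time


def init_lease_table(conn):
    # Hypothetical table: one row per job, recording who holds it and until when.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_lease ("
        " job TEXT PRIMARY KEY, owner TEXT, expires_at REAL)"
    )
    conn.commit()


def try_acquire(conn, job, owner, ttl, now=None):
    """Return True if `owner` holds the lease for `job` after this call."""
    now = time.time() if now is None else now
    # Make sure the row exists (ownerless and already expired) before competing for it.
    conn.execute(
        "INSERT OR IGNORE INTO job_lease (job, owner, expires_at) VALUES (?, NULL, 0)",
        (job,),
    )
    # A single conditional UPDATE: it only matches if the lease is free, expired,
    # or already ours, so two instances cannot both win the same round.
    cur = conn.execute(
        "UPDATE job_lease SET owner = ?, expires_at = ? "
        "WHERE job = ? AND (owner IS NULL OR owner = ? OR expires_at < ?)",
        (owner, now + ttl, job, owner, now),
    )
    conn.commit()
    return cur.rowcount == 1
```

Each instance would call `try_acquire` whenever its timer fires and skip the work if it returns False. Because expiry is compared against each instance's own clock, the instances' clocks need to be reasonably synchronized, and `ttl` should comfortably exceed the timer interval so a healthy owner renews before its lease lapses.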
I am using Google App Engine Task push queues to schedule future tasks that I'd like to run within one second of their scheduled time.
Typically I would schedule a task 30 seconds from now, that would trigger a change of state in my system, and finally schedule another future task.
Everything works fine on my local development server.
However, now that I have deployed to the GAE servers, I notice that the scheduled tasks run late. I've seen them run as much as two minutes after their scheduled time.
From the task queues admin console, it actually says for the ETA:
ETA: "2013/11/02 22:25:14 0:01:38 ago"
Creation Time: "2013/11/02 22:24:44 0:02:08 ago"
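The two console rows are consistent with each other, and a quick calculation shows what they imply: the task was enqueued with a 30-second delay and, when the page was rendered, was already 98 seconds past its ETA. The Python sketch below (hypothetical helper names) does that arithmetic, assuming the "ago" values are measured from the moment the console page was generated.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"


def parse_ago(text):
    """Parse an H:MM:SS offset such as '0:01:38'."""
    h, m, s = (int(part) for part in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


def summarize(eta, eta_ago, created, created_ago):
    eta_t = datetime.strptime(eta, FMT)
    created_t = datetime.strptime(created, FMT)
    viewed_at = eta_t + parse_ago(eta_ago)
    # Both rows should point back to the same "now" at which the page was rendered.
    if viewed_at != created_t + parse_ago(created_ago):
        raise ValueError("console rows are inconsistent")
    return {
        "requested_delay": eta_t - created_t,  # delay the code asked for
        "overdue_by": viewed_at - eta_t,        # time the task sat past its ETA
    }
```

Applied to the values above, it yields a requested delay of 30 seconds and an overdue time of 98 seconds.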
Why would this be?
I could not find any documentation about the expectation and precision of tasks scheduled by ETA.
I'm programming in Python, but I doubt this makes any difference.
In the Python code, the eta parameter is documented as follows:
eta: A datetime.datetime specifying the absolute time at which the task
should be executed. Must not be specified if 'countdown' is specified.
This may be timezone-aware or timezone-naive. If None, defaults to now.
My queue settings:
queue:
- name: mgmt
  rate: 30/s
The system is under no load whatsoever, except for 5 tasks that should each run every 30 seconds or so.
UPDATE:
I have found https://code.google.com/p/googleappengine/issues/detail?id=4901, which is an accepted feature request for timely queues, although nothing seems to have been done about it. It acknowledges that tasks with an ETA can run late, even by many minutes.
What other alternative mechanisms could I use to schedule a trigger with second-precision?
GAE makes no guarantees about clock synchronization within and across their data centers; see UTC Time on Google App engine? for a related discussion. So you can't even specify the absolute time accurately, even if they made the (different) guarantee that tasks are executed within some tolerance of the target time.
If you really need this kind of precision, you could consider setting up a persistent GAE "backend" instance that synchronizes itself with a trusted external clock, and provides task queuing and execution services.
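For reference, the standard NTP calculation (RFC 5905) that such a backend would rely on estimates the local clock's offset from four timestamps taken during one request/response exchange. The Python sketch below only illustrates the arithmetic; it assumes symmetric network delay, and in practice you would run an NTP client rather than implement this yourself. Timestamps are in milliseconds.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """t1: client send, t2: server receive, t3: server send, t4: client receive.

    t1 and t4 are read from the local clock, t2 and t3 from the reference clock."""
    # Positive offset means the local clock is behind the reference clock.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Round-trip network time, excluding the server's processing time.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

With the local clock 5 seconds slow, 100 ms of network latency each way and 20 ms of server processing, the timestamps are 95000, 100100, 100120 and 95220 ms, giving an offset of 5000 ms and a round-trip delay of 200 ms.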
(Aside: Unfortunately, that approach introduces a single point of failure, so to fix that you could just take the next steps and build a whole cluster of these backends... But at that point you may as well look elsewhere than GAE, since you're moving away from the GAE "automatic transmission" model, toward AWS's "manual transmission" model.)
I reported the issue to the GAE team and I got the following response:
This appears to be an isolation issue. Short version: a high-traffic user is sharing underlying resources and crowding you out.
Not a very satisfying response, I know. I've corrected this instance, but these things tend to revert over time.
We have a project in the pipeline that will correct the underlying issue. Deployment is expected in January or February of 2014.
See https://code.google.com/p/googleappengine/issues/detail?id=10228
See also thread: https://code.google.com/p/googleappengine/issues/detail?id=4901
After they "corrected this instance", I did some testing for a few hours. The situation improved a little, especially for tasks without an ETA. But for tasks with an ETA, I still see at least half of them running at least 10 seconds late. This is far too unreliable for my requirements.
For now, I have decided to use my own scheduling service on a different host until the GAE team "corrects the underlying issue" and provides a more predictable task scheduling system.
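The author doesn't say how their own scheduling service is built. As a minimal in-process illustration of the idea, the sketch below uses Python's standard `sched` module on a monotonic clock and records how late each event actually fired; `run_schedule` is a hypothetical name. A real service would also need persistence, redundancy, and a host clock kept in sync with NTP.

```python
import sched
import time


def run_schedule(delays, action):
    """Fire action(label) after each (label, delay_seconds) pair and return
    a list of (label, lateness_seconds) in firing order."""
    # A monotonic clock is unaffected by wall-clock adjustments within this process.
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    start = time.monotonic()
    results = []

    def fire(label, due):
        action(label)
        results.append((label, time.monotonic() - due))

    for label, delay in delays:
        scheduler.enter(delay, 1, fire, argument=(label, start + delay))
    scheduler.run()  # blocks until every event has fired
    return results
```

For the pattern in the question, where each task changes state and then schedules the next one 30 seconds later, `fire` could call `scheduler.enter` again before returning; `sched` allows new events to be added while `run()` is executing.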
I am using the Task Queue in GAE to perform some background work for my application. I have learned that there is a 10-minute time limit for each task. My concern is how to test this in my local environment. I tried sleeping the thread, but it didn't throw any exception as described in the Google App Engine docs. Also, is this time limit measured in CPU time or wall-clock time?
Thanks.
The time is measured in wall-clock time. The development server doesn't enforce time limits. It's also unclear why you'd want to test this: your tests are unlikely to perform the same way in production, so trying to estimate how much you can accomplish in 10 minutes on the production servers by measuring how much you can accomplish in 10 minutes on the development server will fail horribly.
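Because the limit is wall-clock time, time spent sleeping or waiting on I/O counts toward it even though it uses almost no CPU. This small Python sketch (the `measure` helper is hypothetical) makes the difference visible by timing the same call with a wall clock and a CPU clock.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0
```

Timing `lambda: time.sleep(0.3)` reports roughly 0.3 seconds of wall-clock time but close to zero CPU time, which is why a sleep still eats into a wall-clock deadline.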
For your development server, start a timer when a task is initiated. Keep checking in your code whether you have reached 10 minutes of wall-clock time. When you do, raise a DeadlineExceededError. It is best to put the try and except statements in the handler classes that call a particular function of your code.
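A minimal sketch of that approach in Python, under a few assumptions: `DeadlineExceededError` is defined locally as a stand-in for the error App Engine raises in production, and `Deadline` and `handle_task` are hypothetical names. The clock is injectable so the behavior can be checked without waiting 10 minutes (600 seconds).

```python
import time


class DeadlineExceededError(Exception):
    """Local stand-in for App Engine's error, so the sketch runs anywhere."""


class Deadline:
    def __init__(self, seconds, clock=time.monotonic):
        self._clock = clock
        self._expires = clock() + seconds

    def check(self):
        # Raise once the wall-clock budget is used up.
        if self._clock() >= self._expires:
            raise DeadlineExceededError("task exceeded its wall-clock limit")


def handle_task(items, work, deadline):
    """Hypothetical task handler: process items until the deadline is hit."""
    done = []
    try:
        for item in items:
            deadline.check()
            done.append(work(item))
    except DeadlineExceededError:
        pass  # a real handler would checkpoint or re-enqueue the remaining items here
    return done
```

Note that the check only runs between items, so a single item that takes longer than the remaining budget can still overrun; in production, App Engine enforces the limit itself regardless of such checks.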