WildFly 10 Final: Error invoking timeout for timer

I have:
@Stateless
public class TimerMonitoraggioDatabase {

    @Schedule(hour="5", minute="10", dayOfWeek="Mon-Fri",
            dayOfMonth="*", month="*", year="*", info="MyTimer", persistent=false)
    private void scheduledTimeout(final Timer t) {
but if the activity runs longer than 10 minutes, I get this error (first problem):
2017-03-20 05:20:51,097 WARN  [com.arjuna.ats.arjuna] (EJB default - 1) ARJUNA012077: Abort called on already aborted atomic action 0:ffff0a93a0e9:-c2465a:58cbcab4:37e3
2017-03-20 05:20:51,099 ERROR [org.jboss.as.ejb3.timer] (EJB default - 1) WFLYEJB0020: Error invoking timeout for timer: [id=e2ecbbbf-f339-431d-a031-4f02ea8f67fb timedObjectId=Utopia-ear.Utopia-ejb.TimerMonitoraggioDatabase auto-timer?:true persistent?:false timerService=org.jboss.as.ejb3.timerservice.TimerServiceImpl@188c2824 initialExpiration=null intervalDuration(in milli sec)=0 nextExpiration=Tue Mar 21 05:10:00 CET 2017 timerState=IN_TIMEOUT info=MyTimer]: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back
at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleEndTransactionException(CMTTxInterceptor.java:137)
at org.jboss.as.ejb3.tx.CMTTxInterceptor.endTransaction(CMTTxInterceptor.java:117)
and then the timer starts again (second problem?) without throwing the exception at the end of the 10 minutes.

I did have the same problem with WildFly 10.
Since @Schedule is just a short and easy way to package an implicit EJB timer with a @Timeout method, all the usual EJB timer restrictions apply.
I am no expert on EJB, but it seems there is a default timeout within which the @Schedule method is expected to finish, and it is quite short (mere minutes).
Additionally, from empirical evidence, it looks like the method is triggered again once the timeout is reached (hence the second run after the first terminates).
I read that you can configure the default timeout directly in JBoss, but this was no option for me, so I did not pursue this lead (the JBoss transaction timeout setting).
What I did was set the timeout for the @Schedule method manually.
Since WildFly 10 uses EJB3 by default, make sure that you use the EJB3 version of the @TransactionTimeout annotation.
If not already implicit, please add the following to your pom.xml:
<dependency>
    <groupId>org.jboss.ejb3</groupId>
    <artifactId>jboss-ejb3-ext-api</artifactId>
    <version>2.2.0.Final</version>
    <scope>provided</scope>
</dependency>
Now you can set a timeout just for your @Schedule method, like:
import java.util.concurrent.TimeUnit;
import org.jboss.ejb3.annotation.TransactionTimeout;

@Schedule(hour="5", minute="10", dayOfWeek="Mon-Fri",
        dayOfMonth="*", month="*", year="*", info="MyTimer", persistent=false)
@TransactionTimeout(value = 23, unit = TimeUnit.HOURS)
private void scheduledTimeout(final Timer t) {
    // long-running work goes here
}

Related

Strange transactional id errors when using the Kafka sink

I had a Flink 1.15.1 job configured with
execution.checkpointing.mode='EXACTLY_ONCE'
that was failing with the following error
Sink: Committer (2/2)#732 (36640a337c6ccdc733d176b18adab979) switched from INITIALIZING to FAILED with failure cause: java.lang.IllegalStateException: Failed to commit KafkaCommittable{producerId=4521984, epoch=0, transactionalId=}
...
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value for configuration transactional.id: String must be non-empty
that happened after the first checkpoint was triggered. The strange thing about it is that the KafkaSinkBuilder was used without calling setDeliverGuarantee, and hence the default delivery guarantee was expected to be used, which is NONE [1].
Is that even possible to start with? Shouldn't Kafka transactions be involved only when one follows this recipe [2]?
* <p>One can also configure different {@link DeliveryGuarantee} by using {@link
* #setDeliverGuarantee(DeliveryGuarantee)} but keep in mind when using {@link
* DeliveryGuarantee#EXACTLY_ONCE} one must set the transactionalIdPrefix {@link
* #setTransactionalIdPrefix(String)}.
So, in my case, without calling setDeliverGuarantee (nor setTransactionalIdPrefix), I cannot understand why I was seeing these errors. To avoid the problem, I temporarily relaxed the checkpointing settings to
execution.checkpointing.mode='AT_LEAST_ONCE'
but I'd like to understand what was happening.
As the JavaDoc mentions, if you enable exactly-once, you must set a transactionalIdPrefix. A complete recipe for configuring exactly-once with Apache Kafka and Apache Flink can be found here: https://www.docs.immerok.cloud/docs/cookbook/exactly-once-with-apache-kafka-and-apache-flink/
Disclaimer: I work for Immerok
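For reference, a minimal sketch of an exactly-once KafkaSink configuration; the bootstrap servers, topic, and prefix below are placeholder values, and the builder methods are the Flink 1.15 ones referenced in the question:
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("broker:9092")                       // placeholder
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output-topic")                         // placeholder
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliverGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        // Required with EXACTLY_ONCE: gives each Kafka producer a non-empty transactional.id
        .setTransactionalIdPrefix("my-app-tx")                    // placeholder
        .build();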

Google Cloud Run pubsub pull listener app fails to start

I'm testing a Pub/Sub "pull" subscriber on Cloud Run using just the listener part of this sample Java code (SubscribeAsyncExample... reworked slightly to fit in my Spring Boot app):
https://cloud.google.com/pubsub/docs/quickstart-client-libraries#java_1
It fails to start up during deploy... but while it's trying to start, it does pull items from the Pub/Sub queue. Originally, I had an HTTP "push" receiver (a @RestController) on a different Pub/Sub topic, and that worked fine. Any suggestions? I'm new to Cloud Run. Thanks.
Deploying...
Creating Revision... Cloud Run error: Container failed to start. Failed to start and then listen on the port defined
by the PORT environment variable. Logs for this revision might contain more information....failed
Deployment failed
In logs:
2020-08-11 18:43:22.688 INFO 1 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 4606 ms
2020-08-11T18:43:25.287759Z Listening for messages on projects/ce-cxmo-dev/subscriptions/AndySubscriptionPull:
2020-08-11T18:43:25.351650801Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x31,0x3eca02dfd974,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11T18:43:25.351770555Z Container Sandbox: Unsupported syscall setsockopt(0x18,0x29,0x12,0x3eca02dfd97c,0x4,0x28). It is very likely that you can safely ignore this message and that this is not the cause of any error you might be troubleshooting. Please, refer to https://gvisor.dev/c/linux/amd64/setsockopt for more information.
2020-08-11 18:43:25.680 WARN 1 --- [ault-executor-0] i.g.n.s.i.n.u.internal.MacAddressUtil : Failed to find a usable hardware address from the network interfaces; using random bytes: ae:2c:fb:e7:92:9c:2b:24
2020-08-11T18:45:36.282714Z Id: 1421389098497572
2020-08-11T18:45:36.282763Z Data: We be pub-sub'n in pull mode2!!
Nothing else after this, and the app stops running.
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import org.springframework.stereotype.Component;

@Component
public class AndyTopicPullRecv {

    public AndyTopicPullRecv() {
        subscribeAsyncExample("ce-cxmo-dev", "AndySubscriptionPull");
    }

    public static void subscribeAsyncExample(String projectId, String subscriptionId) {
        ProjectSubscriptionName subscriptionName =
                ProjectSubscriptionName.of(projectId, subscriptionId);

        // Instantiate an asynchronous message receiver.
        MessageReceiver receiver =
                (PubsubMessage message, AckReplyConsumer consumer) -> {
                    // Handle incoming message, then ack the received message.
                    System.out.println("Id: " + message.getMessageId());
                    System.out.println("Data: " + message.getData().toStringUtf8());
                    consumer.ack();
                };

        Subscriber subscriber = null;
        try {
            subscriber = Subscriber.newBuilder(subscriptionName, receiver).build();
            // Start the subscriber.
            subscriber.startAsync().awaitRunning();
            System.out.printf("Listening for messages on %s:%n", subscriptionName.toString());
            // Allow the subscriber to run for 30s unless an unrecoverable error occurs.
            // subscriber.awaitTerminated(30, TimeUnit.SECONDS);
            subscriber.awaitTerminated();
            System.out.printf("Async subscribe terminated on %s:%n", subscriptionName.toString());
        // } catch (TimeoutException timeoutException) {
        } catch (Exception e) {
            // Shut down the subscriber. Stop receiving messages.
            subscriber.stopAsync();
            System.out.println("Async subscriber exception: " + e);
        }
    }
}
Kolban's question is very important! With the shared code, I would say "No". The Cloud Run contract is clear:
Your service must answer HTTP requests. Outside of a request, you pay nothing and no CPU is dedicated to your instance (the instance is like a daemon when no request is being processed).
Your service must be stateless (not your case here, so I won't spend time on this).
If you want to pull your Pub/Sub subscription, create an endpoint in your code with a REST controller. While you are processing this request, run your pull mechanism and process messages, as sketched below.
This endpoint can be called by Cloud Scheduler regularly to keep the process up.
Be careful: the maximum request processing timeout is 15 minutes today (subject to change in the near future), so you can't run your process for more than 15 minutes. Make it resilient to failure and set your scheduler to call your service every 15 minutes.
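A minimal sketch of that pattern, assuming a Spring Boot app and reusing the project/subscription names from the question; the endpoint path and the 60-second pull window are arbitrary choices:
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class PullController {

    // Called by Cloud Scheduler; pulls messages only while this request is running.
    @PostMapping("/pull")
    public String pull() throws Exception {
        ProjectSubscriptionName subscriptionName =
                ProjectSubscriptionName.of("ce-cxmo-dev", "AndySubscriptionPull");
        Subscriber subscriber = Subscriber.newBuilder(subscriptionName, (message, consumer) -> {
            // Handle the message, then ack it.
            System.out.println("Data: " + message.getData().toStringUtf8());
            consumer.ack();
        }).build();

        subscriber.startAsync().awaitRunning();
        try {
            // Pull for a bounded window, well under the Cloud Run request timeout.
            subscriber.awaitTerminated(60, TimeUnit.SECONDS);
        } catch (TimeoutException expected) {
            // Normal exit: the pull window elapsed.
        } finally {
            subscriber.stopAsync();
        }
        return "done";
    }
}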

AggregationStrategy warns on timeout all the time

Why does my AggregationStrategy implementation always log a warning when it times out? I do not see any exchange/data loss in the aggregation when this happens.
AggregateProcessor calls this timeout method when the completionTimeout requirement has been met. Any logging of that event could be debug or informational, but it shouldn't rise to a warning.
2020-06-25 16:06:54.454 WARN 1 --- [eTimeoutChecker] o.e.s.e.a.ElasticBulkAggregationStrategy : Parallel processing timed out after 1000 millis for number -1. This task will be cancelled and will not be aggregated.
Here is the aggregate portion of my route.
from("direct:...")
    ...
    .aggregate(constant(true)).id("aggregator" + id)
        .aggregationStrategyRef("elasticAggregationStrategy")
        .completionSize(aggregatorbatchSize)
        .completionTimeout(aggregatorbatchTimeout)
        .to("seda:aggregatedPayload")
    .end()
It is harmless. There is an open Jira issue, CAMEL-15244, to remove or reduce the severity of the message.
You have the following options:
Ignore the warning
Submit a PR (please add a Jira comment before working on the task)
Wait until someone resolves it

Flink 1.5.4 exception: Corrupt stream, found tag: 105

My program wants to join two streams without a Flink window.
I connect the two streams and define a class A extends RichCoFlatMapFunction to handle them.
In class A, I use a Guava cache to hold all the data from the flatMap1/flatMap2 methods and join the elements by a tag from the streams.
The Guava cache then has a removal listener that collects joined and expired data to the next Flink function.
private synchronized void collect(ReqFeatures features) {
    feaCollector.collect(features);
}
At the beginning it always runs well, but a few hours later it dies with this exception:
java.io.IOException: Corrupt stream, found tag: 105
at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:220)
at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:49)
at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:106)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:172)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
at java.lang.Thread.run(Thread.java:748)
And sometimes there's another error log:
java.lang.IllegalStateException: When there are multiple buffers, an unfinished bufferConsumer can not be at the head of the buffers queue.
at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
at org.apache.flink.runtime.io.network.partition.PipelinedSubpartition.pollBuffer(PipelinedSubpartition.java:158)
at org.apache.flink.runtime.io.network.partition.PipelinedSubpartitionView.getNextBuffer(PipelinedSubpartitionView.java:51)
at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:186)
at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:551)
at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:508)
at org.apache.flink.streaming.runtime.io.BarrierTracker.getNextNonBlocked(BarrierTracker.java:94)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:104)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
at java.lang.Thread.run(Thread.java:748)
If I use a Flink window function instead, this exception doesn't occur.
Why does this exception occur, and how can I resolve it?
I can confirm this also happens in Flink 1.9.1 (albeit for us, it happens when we run flink stop <job-id>)
I fixed the same problem by acquiring the checkpointing lock while collecting output. The user's flatMap function already holds the checkpointing lock, so collecting output inside the flatMap function can also fix this problem.
In Flink's code:
synchronized (checkpointingLock) {
    numRecordsIn.inc();
    streamOperator.setKeyContextElement1(record);
    streamOperator.processElement(record);
}
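A hedged sketch of that idea for the setup in the question: rather than emitting from the Guava removal listener's thread, queue evicted entries and drain them inside flatMap1/flatMap2, which already run under the checkpointing lock. The class and element type names below are assumptions, not the poster's actual code:
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

// Hypothetical element types standing in for the poster's stream records.
public class JoinByTag extends RichCoFlatMapFunction<Request, Feature, ReqFeatures> {

    // Filled by the cache's removal-listener thread, drained on the task thread.
    private final Queue<ReqFeatures> evicted = new ConcurrentLinkedQueue<>();

    // Call this from the Guava removal listener instead of collecting directly.
    void onEviction(ReqFeatures features) {
        evicted.add(features);
    }

    @Override
    public void flatMap1(Request value, Collector<ReqFeatures> out) {
        drain(out); // safe here: this thread holds the checkpointing lock
        // ... join logic and cache writes ...
    }

    @Override
    public void flatMap2(Feature value, Collector<ReqFeatures> out) {
        drain(out);
        // ... join logic and cache writes ...
    }

    private void drain(Collector<ReqFeatures> out) {
        ReqFeatures joined;
        while ((joined = evicted.poll()) != null) {
            out.collect(joined);
        }
    }
}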

Occasional, unpredictable exceptions with multi-threaded solrj queries

I have a multi-threaded application using solrj 4. There are a maximum of 25 threads. Each thread creates a connection using HttpSolrServer, and runs one query. Most of the time this works just fine. But occasionally I get the following exception:
Jan 10, 2013 9:29:07 AM org.apache.http.impl.client.DefaultRequestDirector tryConnect
INFO: I/O exception (java.net.NoRouteToHostException) caught when connecting to the target host: Cannot assign requested address
Jan 10, 2013 9:29:07 AM org.apache.http.impl.client.DefaultRequestDirector tryConnect
INFO: Retrying connect
Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
I wrote some code to retry the query if it fails:
while (!querySuccess && queryAttempts < m_MaxQueryAttempts) {
    try {
        queryAttempts++;
        rsp = m_Server.query(query);
        querySuccess = true;
    } catch (SolrServerException e) {
        querySuccess = false;
    }
}
After one or more retries the query usually works. But sometimes it fails even after 100 retries. Either way, I'd like to understand the cause of the problem. Why does it work some of the time? Is it an issue with concurrent access to Solr? Apart from this process, I only have one other process that is continually writing to the index using a single connection. The default server settings are below, so I don't think it's because of too many simultaneous connections.
INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Any suggestions on how to diagnose this would be much appreciated.
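One mitigation worth sketching, on the assumption (not confirmed in the thread) that the NoRouteToHostException: Cannot assign requested address comes from ephemeral-port exhaustion when every thread builds its own HttpSolrServer: share one instance across threads, since HttpSolrServer is documented as thread-safe, and pause between retries. The class name, URL, and backoff values are placeholders:
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SharedSolrClient {

    // One shared instance for all 25 query threads instead of one per thread.
    private static final HttpSolrServer SERVER =
            new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder URL

    public static QueryResponse queryWithRetry(SolrQuery query, int maxAttempts)
            throws SolrServerException, InterruptedException {
        SolrServerException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return SERVER.query(query);
            } catch (SolrServerException e) {
                last = e;
                Thread.sleep(100L * attempt); // simple linear backoff between attempts
            }
        }
        throw last;
    }
}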
