Camel ActiveMQ client blocking, temp storage usage immediately hits 100% - apache-camel

I'm seeing 100% utilisation of ActiveMQ's temp storage (configured to be 100 MB), and the ActiveMQ client is blocking. The usage stays at 100% permanently, and I have no idea what's going on.
I have a Camel route which consumes from a queue (QUEUE.IN) using the JmsTransactionManager.
public final class RouteUnderTest extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("activemq-transacted:QUEUE.IN")
            .bean(myBean)
            .to("activemq:QUEUE.OUT");
    }
}
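For reference, the "activemq-transacted" component is wired to the JmsTransactionManager roughly like this (a sketch, not my actual config; the broker URL and names are placeholders):

// Sketch: transacted ActiveMQ component backed by a JmsTransactionManager (URL and names assumed)
ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory("tcp://localhost:61616");
JmsTransactionManager transactionManager = new JmsTransactionManager(connectionFactory);

ActiveMQComponent transacted = ActiveMQComponent.activeMQComponent();
transacted.setConnectionFactory(connectionFactory);
transacted.setTransactionManager(transactionManager);
transacted.setTransacted(true);
camelContext.addComponent("activemq-transacted", transacted);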
While processing the message from this queue I'm invoking a spring-integration client (myBean) which is configured as follows
<int:gateway id="myBean" service-interface="MyBean">
<int:method name="request" request-channel="channel"/>
</int:gateway>
<int:chain input-channel="channel">
<int:transformer ref="transformedToJsonHere"/>
<jms:outbound-gateway request-destination-name="QUEUE.MYBEAN"
receive-timeout="5000"
explicit-qos-enabled="true"
time-to-live="5000"
delivery-persistent="false"/>
<int:transformer ref="transformedToAnObjectHere"/>
</int:chain>
My broker is configured to use LevelDB, and with the following usage limits:
<persistenceAdapter>
  <levelDB directory="${activemq.data}/leveldb"/>
</persistenceAdapter>

<systemUsage>
  <systemUsage>
    <memoryUsage>
      <memoryUsage percentOfJvmHeap="70"/>
    </memoryUsage>
    <storeUsage>
      <storeUsage limit="500 mb"/>
    </storeUsage>
    <tempUsage>
      <tempUsage limit="100 mb"/>
    </tempUsage>
  </systemUsage>
</systemUsage>
When my route consumes the message and then attempts to put a non-persistent message on QUEUE.OUT the client is blocked and my broker shows 100% usage of temp storage.
And I see the following activemq logs
2015-07-28 15:44:59,678 | INFO | Usage(default:temp:queue://QUEUE.MYBEAN:temp) percentUsage=0%, usage=104857600, limit=104857600, percentUsageMinDelta=1%;Parent:Usage(default:temp) percentUsage=100%, usage=104857600, limit=104857600, percentUsageMinDelta=1%: Temp Store is Full (0% of 104857600). Stopping producer (ID:orbit-vm-55561-1438094698190-1:1:3:1) to prevent flooding queue://QUEUE.MYBEAN. See http://activemq.apache.org/producer-flow-control.html for more info (blocking for: 1s) | org.apache.activemq.broker.region.Queue | ActiveMQ NIO Worker 6
The queues look like this (you can see that the QUEUE.IN message has not been dequeued because it's still being processed transactionally, and no message has gone to QUEUE.MYBEAN).
I can fix this problem with any one of the following approaches:
Use KahaDB instead of LevelDB
Increase temp storage limit (150MB seems to do it but I haven't experimented a great deal)
Configure tempDataStore in activemq.xml (see below)
When configuring the tempDataStore it looks like:
<tempDataStore>
  <bean xmlns="http://www.springframework.org/schema/beans" class="org.apache.activemq.leveldb.LevelDBStore">
    <property name="directory" value="${activemq.data}/tmp" />
  </bean>
</tempDataStore>
I should add, we were using KahaDB previously and this worked fine, but the upgrade to LevelDB has exposed this issue. Reverting to KahaDB is not an option.
I'm hoping someone can explain what we're seeing here, as the results are really difficult to understand. Why does using LevelDB necessitate a higher temp usage limit, and why does configuring the tempDataStore explicitly also fix the problem?
I don't fully understand what's going on here so I'm worried that simply increasing the temp usage limit a little will just hide the problem until a later date.
Versions:
ActiveMQ: 5.11.1
Camel: 2.14.0
Spring: 4.0.8.RELEASE
Spring Integration: 4.0.5.RELEASE

We ran into exactly the same issue with ActiveMQ 5.13.2
The solution when using LevelDB is to explicitly configure a dedicated tempDataStore as you did.
If not, the broker uses the same store (LevelDB) for both persistent messages (counted against store usage) and non-persistent messages (counted against temp usage). You may therefore end up in situations where the broker doesn't accept any non-persistent message anymore just because the store already holds persistent ones up to the configured tempUsage limit. It will however still accept persistent ones if your storeUsage limit is set higher...
When using KahaDB, the broker automatically uses a separate store for non-persistent messages (created in the tmp directory), so you don't have the problem...
Look at the following code for more in-depth information: https://github.com/apache/activemq/blob/activemq-5.13.2/activemq-broker/src/main/java/org/apache/activemq/broker/BrokerService.java#L1739
When reading that code, remember LevelDBStore implements PListStore, but KahaDBStore doesn't...
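To make that concrete, the logic in the linked method boils down to roughly the following (a simplified paraphrase, not the exact source; createDefaultTempStore stands in for the default temp-store setup):

// Simplified paraphrase of BrokerService#getTempDataStore() (not the exact source).
// If the persistence adapter itself implements PListStore (LevelDBStore does,
// KahaDBStore does not), the broker reuses it as the temp store, so persistent and
// non-persistent messages end up in the same on-disk store.
PListStore getTempDataStore() {
    if (tempDataStore == null) {
        if (persistenceAdapter instanceof PListStore) {
            tempDataStore = (PListStore) persistenceAdapter; // LevelDB case
        } else {
            tempDataStore = createDefaultTempStore();        // KahaDB case: separate tmp storage
        }
    }
    return tempDataStore;
}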

Related

Strange transactional id errors when using the kafka sink

I had a Flink 1.15.1 job configured with
execution.checkpointing.mode='EXACTLY_ONCE'
that was failing with the following error
Sink: Committer (2/2)#732 (36640a337c6ccdc733d176b18adab979) switched from INITIALIZING to FAILED with failure cause: java.lang.IllegalStateException: Failed to commit KafkaCommittable{producerId=4521984, epoch=0, transactionalId=}
...
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value for configuration transactional.id: String must be non-empty
that happened after the first checkpoint was triggered. The strange thing about it is that the KafkaSinkBuilder was used without calling setDeliverGuarantee, and hence the default delivery guarantee was expected to be used, which is NONE.
Is that even possible to start with? Shouldn't Kafka transactions be involved only when one follows the recipe in the JavaDoc quoted below?
* <p>One can also configure different {@link DeliveryGuarantee} by using {@link
* #setDeliverGuarantee(DeliveryGuarantee)} but keep in mind when using {@link
* DeliveryGuarantee#EXACTLY_ONCE} one must set the transactionalIdPrefix {@link
* #setTransactionalIdPrefix(String)}.
So, in my case, without calling setDeliverGuarantee (nor setTransactionalIdPrefix), I cannot understand why I was seeing these errors. To avoid the problem, I temporarily relaxed the checkpointing settings to
execution.checkpointing.mode='AT_LEAST_ONCE'
but I'd like to understand what was happening.
Like the JavaDoc mentions, if you enable exactly-once, you must set a transactionalIdPrefix. A complete recipe on how-to configure exactly-once with Apache Kafka can be found in this recipe: https://www.docs.immerok.cloud/docs/cookbook/exactly-once-with-apache-kafka-and-apache-flink/
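For completeness, an exactly-once sink configuration looks roughly like this (a sketch; bootstrap servers, topic and prefix are placeholder values):

// Sketch of an exactly-once KafkaSink; servers, topic and prefix are placeholders.
KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("broker:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("my-topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliverGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        // Required for EXACTLY_ONCE so the sink can derive non-empty transactional.ids
        .setTransactionalIdPrefix("my-app-prefix")
        .build();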
Disclaimer: I work for Immerok

How to configure checkpointing on an XTDB node using AWS S3

I am using XTDB 1.21.0 deployed on AWS/ECS (Fargate) with checkpoints configured (frequency 30 minutes) and stored on an S3 bucket (RocksDB). After a couple of successful checkpoints, they seem to be constantly failing with an XTDB warning due to an exception in the HTTP request to AWS, as shown below:
This leaves the S3 bucket with incomplete checkpoints (i.e., a folder containing a set of SST files and other RocksDB files, and no associated EDN index file):
The XTDB documentation mentions that an optional S3Configurator can be passed to the node configuration, and after a bit of Googling around I figured that makeClient should be overridden so that connectionAcquisitionTimeout can be set:
NettyNioAsyncHttpClient.builder()
.maxConcurrency(200)
.connectionAcquisitionTimeout(Duration.ofMillis(20000))
I am not too familiar with Netty, so I would appreciate it if someone could help with the right incantation.
Also, I am configuring the XT node from an EDN file and haven't figured out how to write an S3 configurator in an EDN file (or if it is even possible).
Thanks in advance!
This can happen for large datasets where the default S3 client will create a new async request for each object (and the number of objects may be very large, particularly if using the RocksDB index). Internally it uses the connectionAcquisitionTimeout as a kind of backpressure to ensure that incoming requests don't wait indefinitely for a connection from the connection pool. However, in this case we're the only source of these requests and we definitely want them to complete before starting the node, so it's reasonable to set the connectionAcquisitionTimeout to something very high (the default is only 10 seconds). A good choice of limit might be the maximum amount of time you are willing to wait for the node to start before failing.
This appears to be a non-optional parameter of the SDK, for what I can only assume is a sensible default strategy for requests coming from an external source; in our case we essentially want it to behave as if it were a synchronous operation.
Configuring this in Clojure with xtdb would look something like this:
(ns foo.db
  (:require
   [xtdb.api :as xtdb]
   [xtdb.checkpoint]
   [xtdb.rocksdb]
   [xtdb.s3.checkpoint])
  (:import
   (java.time Duration)
   (software.amazon.awssdk.http.nio.netty NettyNioAsyncHttpClient)
   (software.amazon.awssdk.services.s3 S3AsyncClient)
   (xtdb.checkpoint Checkpointer)
   (xtdb.s3 S3Configurator)))

(def s3-configurator
  (reify S3Configurator
    (makeClient [this]
      (.. (S3AsyncClient/builder)
          (httpClientBuilder
           (.. (NettyNioAsyncHttpClient/builder)
               (connectionAcquisitionTimeout
                (Duration/ofSeconds 600)) ;; Set a high limit here
               ;; We can rely on the defaults for maxConcurrency and
               ;; maxPendingConnectionAcquires
               ;; (maxConcurrency (Integer. 200))
               ;; (maxPendingConnectionAcquires (Integer. 10000))
               ))
          (build)))))

(defn start-node!
  []
  (xtdb/start-node
   {:xtdb/index-store
    {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                :db-dir "/var/xtdb/idxs"
                :checkpointer {:xtdb/module 'xtdb.checkpoint/->checkpointer
                               :store {:xtdb/module 'xtdb.s3.checkpoint/->cp-store
                                       :configurator (constantly s3-configurator)
                                       :bucket "checkpoints"}
                               :approx-frequency "PT3H"}}}}))

Polling byte / large messages from ActiveMQ Artemis using Apache Camel ConsumerTemplate

I'm struggling with a problem in an application based on Apache Camel when connecting to ActiveMQ Artemis via JMS. At the end of one of the Camel routes, messages are stored in an Artemis JMS queue. A legacy component running in the same application picks them up from there periodically using a ConsumerTemplate.
This works fine for Camel messages with plain text bodies, but causes errors when using byte array bodies: it seems Artemis treats any message with a byte array body as a "large message", which is streamed instead of kept in memory. Receiving via the ConsumerTemplate works, but as soon as the body or headers are accessed, an exception like the following is raised:
org.apache.camel.RuntimeCamelException: Failed to extract body due to: javax.jms.IllegalStateException: AMQ119023: The large message lost connection with its session, either because of a rollback or a closed session. Message: ActiveMQMessage[ID:90c4d1d5-3233-11ea-b0cc-44032c68a56f]:PERSISTENT/ClientLargeMessageImpl[messageID=2974, durable=true, address=mytest,userID=90c4d1d5-3233-11ea-b0cc-44032c68a56f,properties=TypedProperties[firedTime=Wed Jan 08 17:26:03 CET 2020,__AMQ_CID=90b4f34e-3233-11ea-b0cc-44032c68a56f,breadcrumbId=ID-NB045-evolit-co-at-1578500762151-0-1,_AMQ_ROUTING_TYPE=1,_AMQ_LARGE_SIZE=3]]
at org.apache.camel.component.jms.JmsBinding.extractBodyFromJms(JmsBinding.java:172) ~[camel-jms-2.22.1.jar:2.22.1]
at org.apache.camel.component.jms.JmsMessage.createBody(JmsMessage.java:221) ~[camel-jms-2.22.1.jar:2.22.1]
at org.apache.camel.impl.MessageSupport.getBody(MessageSupport.java:54) ~[camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.example.cdi.JmsPoller.someMethod(JmsPoller.java:36) ~[classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_171]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_171]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_171]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_171]
at org.apache.camel.component.bean.MethodInfo.invoke(MethodInfo.java:481) ~[camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.component.bean.MethodInfo$1.doProceed(MethodInfo.java:300) ~[camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.component.bean.MethodInfo$1.proceed(MethodInfo.java:273) ~[camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.component.bean.AbstractBeanProcessor.process(AbstractBeanProcessor.java:188) ~[camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.component.bean.BeanProcessor.process(BeanProcessor.java:53) ~[camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.component.bean.BeanProducer.process(BeanProducer.java:41) ~[camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:148) ~[camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:548) [camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:201) [camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:201) [camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.component.timer.TimerConsumer.sendTimerExchange(TimerConsumer.java:197) [camel-core-2.22.1.jar:2.22.1]
at org.apache.camel.component.timer.TimerConsumer$1.run(TimerConsumer.java:79) [camel-core-2.22.1.jar:2.22.1]
at java.util.TimerThread.mainLoop(Timer.java:555) [?:1.8.0_171]
at java.util.TimerThread.run(Timer.java:505) [?:1.8.0_171]
Caused by: javax.jms.IllegalStateException: AMQ119023: The large message lost connection with its session, either because of a rollback or a closed session
at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.saveBuffer(LargeMessageControllerImpl.java:273) ~[artemis-core-client-2.6.2.jar:2.6.2]
at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.saveToOutputStream(ClientLargeMessageImpl.java:115) ~[artemis-core-client-2.6.2.jar:2.6.2]
at org.apache.activemq.artemis.jms.client.ActiveMQMessage.saveToOutputStream(ActiveMQMessage.java:853) ~[artemis-jms-client-2.6.2.jar:2.6.2]
at org.apache.activemq.artemis.jms.client.ActiveMQMessage.setObjectProperty(ActiveMQMessage.java:693) ~[artemis-jms-client-2.6.2.jar:2.6.2]
at org.apache.camel.component.jms.JmsBinding.createByteArrayFromBytesMessage(JmsBinding.java:251) ~[camel-jms-2.22.1.jar:2.22.1]
at org.apache.camel.component.jms.JmsBinding.extractBodyFromJms(JmsBinding.java:163) ~[camel-jms-2.22.1.jar:2.22.1]
... 21 more
Caused by: org.apache.activemq.artemis.api.core.ActiveMQIllegalStateException: AMQ119023: The large message lost connection with its session, either because of a rollback or a closed session
at org.apache.activemq.artemis.core.client.impl.LargeMessageControllerImpl.saveBuffer(LargeMessageControllerImpl.java:273) ~[artemis-core-client-2.6.2.jar:2.6.2]
at org.apache.activemq.artemis.core.client.impl.ClientLargeMessageImpl.saveToOutputStream(ClientLargeMessageImpl.java:115) ~[artemis-core-client-2.6.2.jar:2.6.2]
at org.apache.activemq.artemis.jms.client.ActiveMQMessage.saveToOutputStream(ActiveMQMessage.java:853) ~[artemis-jms-client-2.6.2.jar:2.6.2]
at org.apache.activemq.artemis.jms.client.ActiveMQMessage.setObjectProperty(ActiveMQMessage.java:693) ~[artemis-jms-client-2.6.2.jar:2.6.2]
at org.apache.camel.component.jms.JmsBinding.createByteArrayFromBytesMessage(JmsBinding.java:251) ~[camel-jms-2.22.1.jar:2.22.1]
at org.apache.camel.component.jms.JmsBinding.extractBodyFromJms(JmsBinding.java:163) ~[camel-jms-2.22.1.jar:2.22.1]
... 21 more
The problem also occurs for messages that do not exceed the minLargeMessageSize of Artemis, in a test program even for 3 bytes.
Coincidentally, the same problem occurred in a standalone application used for testing the application. There, I was able to solve the issue by keeping the JMS session and receiver open until the JMS message body and headers were completely read. With Camel, that's abstracted away in the Spring JmsTemplate that Camel is based on.
I consulted the user documentation of the Camel JMS component to find configuration options that might help me. I've tried the following:
eagerLoadingOfProperties=true on consumer side: no effect, only seems to affect MessageListenerContainer. The documentation says:
It uses [...] Spring’s JmsTemplate for sending and a MessageListenerContainer for consuming.
However, while debugging it seemed that the MessageListenerContainer is only used when consuming messages from a JMS endpoint in a Camel route. Using a ConsumerTemplate like in my case uses a JmsTemplate for consuming.
messageConverter and mapJmsMessage on consumer side: no effect, they are executed when the session has already been closed
alwaysCopyMessage on producer side: I thought maybe copying prevents use of streamed large messages, no effect
streamMessageTypeEnabled=false on producer side: no effect
jmsMessageType=Bytes on both producer and consumer side: no effect
transferExchange=true on both producer and consumer side: this does seem to solve my specific case, but it feels like a workaround. Documentation advises to use the option with caution.
So right now, transferExchange seems to be my best bet, assuming it really solves my issue in all test cases. Nevertheless, I'd be glad to get a better understanding of the issue or different solutions:
Why does Artemis treat small byte array messages as large messages anyway?
Does Camel ConsumerTemplate support streamed large messages at all?
My versions are Camel 2.22.1 and Artemis 2.10.1.
I've been able to reproduce my problem by modifying the Camel Example camel-example-cdi from the release package of Camel to have the minimal classes shown below.
In addition I've added camel-jms and Artemis dependencies and started Artemis locally, both like described in the camel-example-artemis-large-messages example.
public class MyRoutes extends RouteBuilder {
    @Override
    public void configure() {
        setupJmsComponent();

        from("timer:writeTimer?period=6000")
            .log("writing to JMS")
            .setBody(() -> new byte[]{0, 1, 2})
            .to(JmsPoller.ENDPOINT);

        from("timer:pollTimer?period=3000")
            .to("bean:jmsPoller");
    }

    private void setupJmsComponent() {
        ActiveMQJMSConnectionFactory connectionFactory = new ActiveMQJMSConnectionFactory("tcp://localhost:61616");
        JmsComponent jmsComponent = new JmsComponent();
        jmsComponent.setConnectionFactory(connectionFactory);
        getContext().addComponent("jms", jmsComponent);
    }
}
@Singleton
@Named("jmsPoller")
public class JmsPoller {
    static final String ENDPOINT = "jms:queue:mytest";

    @Inject
    private ConsumerTemplate consumerTemplate;

    public void someMethod(String body) {
        Exchange exchange = consumerTemplate.receive(ENDPOINT, 1000L);
        System.out.println("Received " + (exchange == null ? null : exchange.getIn().getBody()));
    }
}
ActiveMQ Artemis doesn't treat just any message with a byte body as a "large" message. It's worth noting that the broker ultimately treats all message bodies as an array of bytes because that's exactly what they are. However, in order to be considered "large" the message has to exceed a certain size. The documentation states:
Any message larger than a certain size is considered a large message. Large messages will be split up and sent in fragments. This is determined by the URL parameter minLargeMessageSize.
Note:
Apache ActiveMQ Artemis messages are encoded using 2 bytes per character so if the message data is filled with ASCII characters (which are 1 byte) the size of the resulting Apache ActiveMQ Artemis message would roughly double. This is important when calculating the size of a "large" message as it may appear to be less than the minLargeMessageSize before it is sent, but it then turns into a "large" message once it is encoded.
The default value is 100KiB.
It looks like the application's use-case simply doesn't fit with the semantics of large message support in ActiveMQ Artemis since the session which the message came from is being closed before the message's body is fully received.
Therefore, I recommend that you either keep the session open until the body is read or increase the minLargeMessageSize on the URL of the application which is sending the message so that no messages are ever considered "large." The latter option may result in greater memory usage on the broker since the entire message body will be held in memory at once.
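For illustration, raising the client-side threshold in the setupJmsComponent example above could look something like this (the 10 MiB value is an arbitrary placeholder):

// Sketch: raise minLargeMessageSize on the client URL so small byte messages are
// never turned into Artemis "large" messages (10485760 = 10 MiB, arbitrary value).
ActiveMQJMSConnectionFactory connectionFactory =
        new ActiveMQJMSConnectionFactory("tcp://localhost:61616?minLargeMessageSize=10485760");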

App Engine Java and EclipseLink: Deadlock on shared-cache access

I have a Java Maven Google App Engine project configured as follows:
I'm using EclipseLink as the JPA persistence manager for Cloud SQL. My object contains some simple fields (string, date, ...) and a ManyToMany relationship, which is configured for lazy loading.
@Entity
@Table(name = "mytable")
public class MyObject1 {

    private String nome;
    private String descrizione;

    @ManyToMany
    @JoinTable(
        name = "myobject1_has_myobject2",
        joinColumns = { @JoinColumn(name = "object1_id", referencedColumnName = "id") },
        inverseJoinColumns = { @JoinColumn(name = "object2_id", referencedColumnName = "id") }
    )
    private List<MyObject2> relationshipObjects;
}
The project flow works like this:
- A query is made that retrieves x results of type MyObject1 (let's say 10 results)
- The query result list is iterated and each result is given to a different thread for processing
- Each thread iterates the ManyToMany relationship (the relationshipObjects object), which is lazy (confirmed because the code calls IndirectList.iterator), and does some processing for each MyObject2 item of the list
- When all the threads have finished, the query result of MyObject1 is iterated once again to create a request response
This kind of implementation is giving me trouble with the multi-threading and some sort of deadlock.
Here is the stacktrace
Caused by: Exception [EclipseLink-2001] (Eclipse Persistence Services - 2.6.4.v20160829-44060b6): org.eclipse.persistence.exceptions.ConcurrencyException
Exception Description: Wait was interrupted.
Message: [null]
at org.eclipse.persistence.exceptions.ConcurrencyException.waitWasInterrupted(ConcurrencyException.java:108)
at org.eclipse.persistence.internal.helper.ConcurrencyManager.acquireDeferredLock(ConcurrencyManager.java:187)
at org.eclipse.persistence.internal.identitymaps.CacheKey.acquireDeferredLock(CacheKey.java:210)
at org.eclipse.persistence.internal.identitymaps.AbstractIdentityMap.acquireDeferredLock(AbstractIdentityMap.java:84)
at org.eclipse.persistence.internal.identitymaps.IdentityMapManager.acquireDeferredLock(IdentityMapManager.java:146)
at org.eclipse.persistence.internal.sessions.IdentityMapAccessor.acquireDeferredLock(IdentityMapAccessor.java:81)
at org.eclipse.persistence.internal.sessions.AbstractSession.retrieveCacheKey(AbstractSession.java:5200)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:965)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildWorkingCopyCloneNormally(ObjectBuilder.java:899)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObjectInUnitOfWork(ObjectBuilder.java:852)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:735)
at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:689)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:805)
at org.eclipse.persistence.queries.ReadAllQuery.registerResultInUnitOfWork(ReadAllQuery.java:962)
at org.eclipse.persistence.queries.ReadAllQuery.executeObjectLevelReadQuery(ReadAllQuery.java:573)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1175)
at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:904)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1134)
at org.eclipse.persistence.queries.ReadAllQuery.execute(ReadAllQuery.java:460)
at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1222)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2896)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1857)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1839)
at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1804)
at org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:258)
at org.eclipse.persistence.internal.jpa.QueryImpl.getResultList(QueryImpl.java:473)
I cannot reproduce the problem every time; besides that, I can give you all the information I have gathered.
Looking inside the EclipseLink documentation I found a section related to this matter
Cache - If using a shared cache, EclipseLink requires locking the cache on reads and writes to ensure consistency.
You will see cache access, such as IdentityMapManager acquireLock or acquireDeferredLock, or WriteLockManager as the last call on the stack.
In my persistence unit I did not configure the shared-cache behaviour, so it is running on the default, which is enabled.
Here is my persistence-unit properties
<properties>
<!-- configure the various connection pool properties -->
<!-- http://www.eclipse.org/eclipselink/documentation/2.5/jpa/extensions/p_connection_pool.htm -->
<property name="eclipselink.connection-pool.default.initial" value="1" />
<property name="eclipselink.connection-pool.default.min" value="64" />
<property name="eclipselink.connection-pool.default.max" value="64" />
<property name="eclipselink.connection-pool.default.shared" value="true" />
<!-- whether connections in EclipseLink read connection pool should be shared (not exclusive). Connection sharing means the same JDBC connection will be used concurrently for multiple reading threads. -->
<property name="eclipselink.jdbc.connection_pool.read.shared" value="true" />
<!-- specify if JDBC statements should be cached -->
<!-- http://www.eclipse.org/eclipselink/documentation/2.5/jpa/extensions/p_jdbc_cachestatements.htm -->
<property name="eclipselink.jdbc.cache-statements" value="true" />
<!-- the number of statements held when using internal statement caching -->
<!-- http://www.eclipse.org/eclipselink/documentation/2.5/jpa/extensions/p_jdbc_cachestatements_size.htm#CACBICGG -->
<property name="eclipselink.jdbc.cache-statements.size" value="100" />
</properties>
I can see in my stacktrace that the IdentityMapManager.acquireDeferredLock(IdentityMapManager.java:146) line referred to is indeed there.
The thing is, this error is thrown by the App Engine request (see the last line of the stacktrace) when I call the getResultList method.
This call is made by the main request thread; the other threads (one per query result) have not been launched yet.
So I started looking at the shared-cache documentation and found this part:
The shared cache exists for the duration of the persistence unit (EntityManagerFactory, or server) and is shared by all EntityManagers and users of the persistence unit
My EntityManagerFactory instance is shared per App Engine instance (I have a static variable which is initialized on the first query).
So at first access (for each App Engine instance) the variable is initialized and then shared across all the HTTP requests served by the same instance.
I did this sort of "caching" because bootstrapping the EntityManagerFactory (parsing the deployment descriptor) on first access is very slow, and even if I pre-warm this object, opening a new EMF at every request costs about 1-2 seconds.
So I open/close a new EntityManager for each flow (and each thread, because EntityManager is not thread-safe), but the EMF object is shared.
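In other words, the access pattern is roughly this (a simplified sketch of what I described; class and persistence-unit names are placeholders):

// Simplified sketch of the EMF/EM usage pattern described above ("my-pu" is a placeholder).
// Types come from javax.persistence and java.util.function.
public final class EmfHolder {

    // One EntityManagerFactory per App Engine instance, created lazily and then reused.
    private static volatile EntityManagerFactory emf;

    static EntityManagerFactory emf() {
        if (emf == null) {
            synchronized (EmfHolder.class) {
                if (emf == null) {
                    emf = Persistence.createEntityManagerFactory("my-pu");
                }
            }
        }
        return emf;
    }

    // A fresh EntityManager per request/thread (EntityManager is not thread-safe), closed when done.
    static <T> T withEntityManager(Function<EntityManager, T> work) {
        EntityManager em = emf().createEntityManager();
        try {
            return work.apply(em);
        } finally {
            em.close();
        }
    }
}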
Also, there is another line which says
This is normally related to having relationships that do not use LAZY, ensure all relationship use LAZY.
My ManyToMany relationship is already lazy, as pointed out before, so this point cannot be the cause either.
Based on that, here is what I gathered:
- EclipseLink requires locking the cache on reads and writes to ensure consistency, so access to this cache is serialized and multiple threads are queued.
- The shared cache is tied to the EMF object.
- The EMF object is shared between requests of the same instance.
As suggested by the EclipseLink documentation I tried to disable the shared cache, and the whole flow appears to work, but it is now very slow.
Anyway, this is another point confirming that the problem here is related to the shared cache of JPA.
This solution is not suitable because, even without considering the speed problem, all those requests and threads concurrently fetching data from the DBMS (while iterating the lazy list) consume all the available connections and the DBMS starts giving connection errors.
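For reference, disabling the shared cache was done with a property along these lines (a minimal sketch; the persistence-unit name is a placeholder):

// Sketch: disable EclipseLink's shared (L2) cache when creating the EMF ("my-pu" is a placeholder).
Map<String, String> props = new HashMap<>();
props.put("eclipselink.cache.shared.default", "false");
EntityManagerFactory emf = Persistence.createEntityManagerFactory("my-pu", props);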
Another suggestion from the documentation
DeferredLockManager.SHOULD_USE_DEFERRED_LOCKS = false;
but the error is still the same, nothing changed (the error is on IdentityMapManager.acquireLock, so the deferred lock is not being used anyway).
From the App Engine logs I can see that all these requests are killed after the 60s timeout, so the "Wait was interrupted" message can be related to all the threads that were still waiting to access the shared cache when the App Engine deadline killed the request.
Because of that I tried to deploy with basic scaling (which does not have the 60s deadline) to see whether the request is merely slower than the deadline or truly stuck in a deadlock.
Inside the logs there is no error... but the longest requests do not even show up. At this point I can only think that the erroneous requests are stuck indefinitely and their request logs will never be shown at all.
Another test I made was reducing the persistence-unit configuration, removing all the shared configuration, as follows:
<properties>
<property name="eclipselink.connection-pool.default.initial" value="1" />
<property name="eclipselink.connection-pool.default.min" value="64" />
<property name="eclipselink.connection-pool.default.max" value="64" />
</properties>
But the error is still the same, so it is not related to connection-pool sharing but to the multi-threading itself.
As a final test I removed the multi-threaded flow (each query result is processed one-by-one by the main thread) while leaving the shared cache enabled.
This works.
At this point I'm wondering: since the shared cache is synchronized, there is a "funnel" that serializes the multi-threaded processing anyway, so should I just use the single-threaded implementation?

Tomcat database connections leak

Hi, we are using Tomcat 6 and context.xml is like below:
<Context>
<Resource defaultAutoCommit="false" defaultReadOnly="false"
driverClassName="oracle.jdbc.driver.OracleDriver"
fairQueue="false" initialSize="${DBPool.initialPoolSize}"
jdbcInterceptors="ConnectionState;StatementFinalizer"
jmxEnabled="true" logAbandoned="false" maxActive="${DBPool.maxPoolSize}"
maxIdle="30" maxWait="30000"
minEvictableIdleTimeMillis="5000" minIdle="${DBPool.minPoolSize}"
name="jdbc/BankDBPool" password="${DBPool.bankPassword}"
removeAbandoned="true" removeAbandonedTimeout="60"
testOnBorrow="false" testOnReturn="true"
testWhileIdle="false" timeBetweenEvictionRunsMillis="5000"
type="javax.sql.DataSource"
url="${DBPool.jdbcUrl}"
factory="uk.co.xxxx.encryption.dbcp.DecryptingBasicDataSourceFactory"
useEquals="false" username="${DBPool.bankUser}" validationInterval="30000" validationQuery="select 1 from dual" />
</Context>
DBPool.maxPoolSize=400
DBPool.minPoolSize=15
DBPool.initialPoolSize=15
The issue is that we have to set maxPoolSize very high, as otherwise we get a "connection not available" exception.
The DB monitoring tool shows the connections as idle, but it seems they cannot be reused. Traffic to this application is very low, around 10,000 hits a day.
We are trying to figure out what might be an issue here.
All my service methods are marked
@Transactional(propagation = Propagation.REQUIRED, readOnly = true or false)
DecryptingBasicDataSourceFactory only does the job of returning the datasource.
We are using Spring and Hibernate.
The issue was resolved. Apparently one of the methods had @Transactional missing. Another change was reducing the batch size in the Hibernate properties from 100 to 20. But mostly it was adding @Transactional which fixed the issue.
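A rough illustration of the kind of fix described (service, DAO and entity names are hypothetical):

// Hypothetical service illustrating the fix: this method originally lacked @Transactional,
// so the connection used by Hibernate was not managed by a Spring transaction and was not
// reliably returned to the pool.
@Service
public class StatementService {

    @Autowired
    private StatementDao statementDao; // hypothetical DAO

    @Transactional(propagation = Propagation.REQUIRED, readOnly = true)
    public List<BankStatement> findStatements(String accountId) {
        return statementDao.findByAccount(accountId);
    }
}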
