Best way to retry a transaction with Seam & Hibernate - sql-server

I've got a Seam web application running on Seam & Hibernate (JDBC to SQL Server).
It works well, but under heavy load (stress testing with JMeter) I get some LockAcquisitionExceptions or OptimisticLockExceptions.
The LockAcquisitionException is caused by a SQLServerException: "Transaction (Process ID 64) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."
I've therefore written a Seam interceptor to rerun such transactions on LockAcquisitionException:
@AroundInvoke
public Object aroundInvoke(final InvocationContext invocationContext) throws Exception {
    if (instanceThreadLocal.get() == null && isMethodInterceptable(invocationContext)) {
        try {
            instanceThreadLocal.set(this);
            int i = 0;
            PersistenceException exception = null;
            do {
                try {
                    return invocationContext.proceed();
                } catch (final PersistenceException e) {
                    final Throwable cause = e.getCause();
                    if (!(cause instanceof LockAcquisitionException)) {
                        throw e;
                    }
                    exception = e;
                    i++;
                    if (i < MAX_RETRIES_LOCK_ACQUISITION) {
                        log.info("Swallowing a LockAcquisitionException - #0/#1", i, MAX_RETRIES_LOCK_ACQUISITION);
                        try {
                            if (Transaction.instance().isRolledBackOrMarkedRollback()) {
                                Transaction.instance().rollback();
                            }
                            Transaction.instance().begin();
                        } catch (final Exception e2) {
                            throw new IllegalStateException("Exception while rolling back the current transaction and beginning a new one.", e2);
                        }
                        Thread.sleep(1000);
                    } else {
                        log.info("Can't swallow any more LockAcquisitionExceptions (#0/#1), will throw it.", i, MAX_RETRIES_LOCK_ACQUISITION);
                        throw e;
                    }
                }
            } while (i < MAX_RETRIES_LOCK_ACQUISITION);
            throw exception;
        } finally {
            instanceThreadLocal.remove();
        }
    }
    return invocationContext.proceed();
}
First question: do you think this interceptor will correctly do the job?
Googling around, I saw that Alfresco (with a forum discussion here), Bonita and Orchestra have methods to rerun such transactions too, and they catch many more exceptions, like StaleObjectStateException for instance (the cause of my OptimisticLockException).
Which brings me to my second question: for the StaleObjectStateException ("Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect)"), normally you can't just rerun the transaction, since it's a synchronization problem between the database and the @Version fields, isn't it? Why does Alfresco, for instance, retry transactions that fail with such exceptions?
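One possible reading of why such a retry can still make sense: frameworks like Alfresco rerun the whole unit of work, so each attempt re-reads the entity (and thus its fresh @Version value) before reapplying the change; merely replaying the same stale state would indeed be pointless. A minimal sketch of that idea, where MyEntity and applyBusinessChange() are hypothetical names (real code should also inspect the cause chain, as the interceptor above does):

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.PersistenceException;

public class OptimisticRetry {

    private static final int MAX_RETRIES = 3;

    public static void updateWithRetry(EntityManagerFactory emf, Long id) {
        for (int attempt = 1; ; attempt++) {
            EntityManager em = emf.createEntityManager();
            try {
                em.getTransaction().begin();
                // Re-read inside the fresh transaction: this picks up the latest
                // @Version, which is what makes the retry legitimate at all.
                MyEntity entity = em.find(MyEntity.class, id);
                entity.applyBusinessChange();
                em.getTransaction().commit();
                return;
            } catch (PersistenceException e) {
                if (em.getTransaction().isActive()) {
                    em.getTransaction().rollback();
                }
                if (attempt >= MAX_RETRIES) {
                    throw e;
                }
                // otherwise loop: the next attempt starts from fresh state
            } finally {
                em.close();
            }
        }
    }
}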
EDIT:
For the LockAcquisitionException caused by SQLServerException, I've looked at some resources on the web, and even if I should double-check my code, it seems it can happen anyway ... here are the links:
An article on the subject (with a comment which says it can also happen when running out of resources)
Another article with sub-links:
Microsoft talking about that on support.microsoft.com
A way to profile transactions
And some advice to reduce such problems
Even Microsoft says: "Although deadlocks can be minimized, they cannot be completely avoided. That is why the front-end application should be designed to handle deadlocks."
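Such a front-end retry doesn't have to live in a Seam interceptor; at the plain-JDBC level it could look like the following sketch, where SqlWork is a hypothetical callback interface and 1205 is SQL Server's deadlock-victim error code:

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public final class DeadlockRetry {

    private static final int MAX_RETRIES = 3;

    public interface SqlWork {
        void execute(Connection con) throws SQLException;
    }

    public static void runWithRetry(DataSource ds, SqlWork work) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try (Connection con = ds.getConnection()) {
                con.setAutoCommit(false);
                try {
                    work.execute(con);
                    con.commit();
                    return;
                } catch (SQLException e) {
                    con.rollback();
                    if (e.getErrorCode() != 1205 || attempt >= MAX_RETRIES) {
                        throw e; // not a deadlock, or out of retries
                    }
                    // deadlock victim: fall through and rerun the transaction
                }
            }
        }
    }
}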

Actually I finally found how to dodge the famous "Transaction (Process ID 64) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction".
So I won't really answer my question, but I will explain what I saw and how I managed to deal with it.
At first, I thought I had a "lock escalation problem" which would turn my row locks into page locks and produce my deadlocks (my JMeter test runs a scenario which deletes and updates rows while selecting others, but the deletes and updates don't necessarily concern the same rows as the selects).
So I read Lock Escalation in SQL2005, How to resolve blocking problems that are caused by lock escalation in SQL Server (by MS), and finally Diagnose SQL Server performance issues using sp_lock.
But before trying to detect whether I was in a lock escalation situation, I came across this page: http://community.jboss.org/message/95300. It talks about transaction isolation and notes that SQL Server has a special level called "snapshot isolation".
I then found Using Snapshot Isolation with SQL Server and Hibernate and read Using Snapshot Isolation (by MS).
So I first enabled the "snapshot isolation mode" on my database:
ALTER DATABASE [MY_DATABASE]
SET ALLOW_SNAPSHOT_ISOLATION ON
ALTER DATABASE [MY_DATABASE]
SET READ_COMMITTED_SNAPSHOT ON
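To be safe, you can verify that both options actually took effect before going further; a small sketch, assuming the sys.databases columns available since SQL Server 2005:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public final class SnapshotCheck {

    // Prints the snapshot settings of the database the DataSource points at.
    public static void printSnapshotSettings(DataSource ds) throws SQLException {
        String sql = "SELECT snapshot_isolation_state_desc, is_read_committed_snapshot_on "
                   + "FROM sys.databases WHERE name = DB_NAME()";
        try (Connection con = ds.getConnection();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            if (rs.next()) {
                System.out.println("ALLOW_SNAPSHOT_ISOLATION: " + rs.getString(1)); // expect ON
                System.out.println("READ_COMMITTED_SNAPSHOT:  " + rs.getInt(2));    // expect 1
            }
        }
    }
}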
Then I had to set the transaction isolation level of the JDBC driver to 4096 ... and the book "Hibernate in Action", in section "5.1.6 Setting an isolation level", reads:
Note that Hibernate never changes the isolation level of connections obtained from a datasource provided by the application server in a managed environment. You may change the default isolation using the configuration of your application server.
So I read Configuring JDBC DataSources (for JBoss 4) and finally edited my database-ds.xml file to add this:
<local-tx-datasource>
    <jndi-name>myDatasource</jndi-name>
    <connection-url>jdbc:sqlserver://BDDSERVER\SQL2008;databaseName=DATABASE</connection-url>
    <driver-class>com.microsoft.sqlserver.jdbc.SQLServerDriver</driver-class>
    <user-name>user</user-name>
    <password>password</password>
    <min-pool-size>2</min-pool-size>
    <max-pool-size>400</max-pool-size>
    <blocking-timeout-millis>60000</blocking-timeout-millis>
    <background-validation>true</background-validation>
    <background-validation-minutes>2</background-validation-minutes>
    <idle-timeout-minutes>15</idle-timeout-minutes>
    <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
    <prefill>true</prefill>
    <prepared-statement-cache-size>75</prepared-statement-cache-size>
    <transaction-isolation>4096</transaction-isolation>
</local-tx-datasource>
The most important part is of course <transaction-isolation>4096</transaction-isolation>.
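For non-managed code, outside the application server's datasource, the same setting boils down to one JDBC call per connection; a sketch, assuming the Microsoft driver (which, if I recall correctly, also exposes the value as SQLServerConnection.TRANSACTION_SNAPSHOT):

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public final class SnapshotConnections {

    // What <transaction-isolation>4096</transaction-isolation> amounts to per connection.
    public static Connection openSnapshotConnection(DataSource ds) throws SQLException {
        Connection con = ds.getConnection();
        con.setTransactionIsolation(4096); // SQL Server specific snapshot isolation level
        return con;
    }
}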
And since then, no more deadlocks at all! ... so my question is now more or less moot for me ... but perhaps someone will have a real answer!

Related

How should I properly handle CommandTimeout when calling ExecuteSqlCommand

I have some long-running commands in stored procedures that are at risk of timing out, and I run them using Context.Database.ExecuteSqlCommand.
It would appear that when a command times out, it leaves a lock in the database because the transaction is not rolled back.
I found an explanation for that here: CommandTimeout – How to handle it properly?
Based on the linked example I changed my code to:
Database database = Context.Database;
try
{
    return database.ExecuteSqlCommand(sql, parameters);
}
catch (SqlException e)
{
    // Transactions can stay open after a CommandTimeout,
    // so we need to roll back any open transactions
    if (e.Number == -2) // CommandTimeout occurred
    {
        // A single rollback exits all levels of nested transactions,
        // no need to loop.
        database.ExecuteSqlCommand("IF @@TRANCOUNT > 0 ROLLBACK TRAN;");
    }
    throw;
}
However, that threw an exception inside the catch, because the connection is now null:
ArgumentNullException: Value cannot be null.
Parameter name: connection
Following the comments from Annie and usr, I changed my code to this:
Database database = Context.Database;
using (var tran = database.BeginTransaction())
{
    try
    {
        int result = database.ExecuteSqlCommand(sql, parameters);
        tran.Commit();
        return result;
    }
    catch (SqlException)
    {
        var debug = database.SqlQuery<Int16>("SELECT @@SPID");
        tran.Rollback();
        throw;
    }
}
I really thought that would do it, but the locks in the database continue to accumulate when I set my CommandTimeout to a really small value to test it out.
I put a breakpoint at the throw, so I know the transaction has been rolled back. The debug variable tells me the session id, and when I check my locks with SELECT * FROM sys.dm_tran_locks I find a match for the session id in request_session_id, but it's a lock that was already there, not one of the new ones, so I'm a bit confused.
So, how should I properly handle CommandTimeout when using ExecuteSqlCommand to ensure locks are released immediately?
I downloaded sp_whoisactive and ran it; the spid appears to be linked to a query on tables used by Hangfire (I am using Hangfire to run the long-running queries in a background process). So I think perhaps I am barking up the wrong tree. I did have a problem with locking, but I've rewritten my own queries to avoid locking too many rows and I've disabled lock escalation on the tables where I had a problem. These last locks may be coming from Hangfire and may not be significant; nonetheless, I've decided to go with XACT_ABORT ON for now.

Spring Boot/HikariCP @Transactional not overriding isolation level

My application is experiencing lock contention on one of the heavily-trafficked tables in our SQL Server database. I have been advised by our DBA team to follow other teams' configuration, which sets the default transaction isolation level to READ_UNCOMMITTED. Those teams then, supposedly, set the isolation level back to READ_COMMITTED for their inserts and updates. I've fought against doing this for a while, as it feels like a cop-out, and I've seen warnings all over the place against using READ_UNCOMMITTED. However, my hands are now tied.
I'm using Spring Boot, with HikariCP and using Spring Data repositories to interact with my SQL Server database. I'm allowing Spring to auto-configure my DataSource from my application.properties, and have very little other configuration.
I have managed to set my default transaction isolation level as follows in my app properties:
spring.datasource.hikari.transaction-isolation=TRANSACTION_READ_UNCOMMITTED
I've been able to verify that this is working by querying the transaction log, taking the SPID from the transaction entry, and running the following query, which now returns "ReadUncommitted":
SELECT CASE transaction_isolation_level
    WHEN 0 THEN 'Unspecified'
    WHEN 1 THEN 'ReadUncommitted'
    WHEN 2 THEN 'ReadCommitted'
    WHEN 3 THEN 'Repeatable'
    WHEN 4 THEN 'Serializable'
    WHEN 5 THEN 'Snapshot' END AS TRANSACTION_ISOLATION_LEVEL
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID
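If you'd rather run the same check from application code (so it executes on the very session the pool hands out), a sketch using plain JDBC against the configured DataSource:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public final class IsolationCheck {

    // Returns the isolation level name of the current session, as SQL Server sees it.
    public static String currentIsolationLevel(DataSource ds) throws SQLException {
        String sql = "SELECT CASE transaction_isolation_level "
                   + "WHEN 0 THEN 'Unspecified' WHEN 1 THEN 'ReadUncommitted' "
                   + "WHEN 2 THEN 'ReadCommitted' WHEN 3 THEN 'Repeatable' "
                   + "WHEN 4 THEN 'Serializable' WHEN 5 THEN 'Snapshot' END "
                   + "FROM sys.dm_exec_sessions WHERE session_id = @@SPID";
        try (Connection con = ds.getConnection();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            return rs.next() ? rs.getString(1) : null;
        }
    }
}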
However, in one of my services I'm attempting to override the isolation level back to READ_COMMITTED, but it is not taking effect.
Given the following:
Selections from application.properties
spring.datasource.driver-class-name=com.microsoft.sqlserver.jdbc.SQLServerDriver
spring.datasource.type=com.zaxxer.hikari.HikariDataSource
spring.datasource.hikari.transaction-isolation=TRANSACTION_READ_UNCOMMITTED
JpaConfig.java
@Configuration
@EnableJpaRepositories("my.project.repository")
@EntityScan(basePackages = "my.project.model")
@EnableTransactionManagement
public class JpaConfig {
    // DataSource configured by Spring from application.properties
}
MyService.java
@Service
public class MyService {

    @Autowired private MyRepository myRepository;

    @Transactional(isolation = Isolation.READ_COMMITTED)
    public void myMethod() {
        // Logic and call to myRepository.save()
    }
}
MyRepository.java
public interface MyRepository extends JpaRepository<MyClass, Long> {
}
What am I missing? I do not have a custom TransactionManager, as I'm letting @EnableTransactionManagement configure that for me; so far I've found no indication anywhere that I should be providing my own implementation.
I have verified that the transaction rollback properly occurs when an exception is thrown, but I can't figure out why the @Transactional annotation isn't overriding the isolation level as I'd expect.
For what it's worth, the root problem we're trying to solve is the lock contention on our SQL Server database. From what I understand, in SQL Server even SELECTs put a lock on a table (or row?). The DBAs' first suggestion was to add the WITH (NOLOCK) hint to my queries. I can't figure out for the life of me how to do this cleanly without scrapping JPA entirely and using native queries. So their solution was to use READ_UNCOMMITTED by default, setting READ_COMMITTED explicitly on our write transactions.
From the HikariCP source code:
final int level = Integer.parseInt(transactionIsolationName);
switch (level) {
    case Connection.TRANSACTION_READ_UNCOMMITTED:
    case Connection.TRANSACTION_READ_COMMITTED:
    case Connection.TRANSACTION_REPEATABLE_READ:
    case Connection.TRANSACTION_SERIALIZABLE:
    case Connection.TRANSACTION_NONE:
    case SQL_SERVER_SNAPSHOT_ISOLATION_LEVEL: // a specific isolation level for SQL Server only
        return level;
    default:
        throw new IllegalArgumentException();
}
As you can see above, you have to give the numeric value of the transaction isolation level, like:
spring.datasource.hikari.transaction-isolation=1
All the levels' numeric values:
TRANSACTION_NONE = 0;
TRANSACTION_READ_UNCOMMITTED = 1;
TRANSACTION_READ_COMMITTED = 2;
TRANSACTION_REPEATABLE_READ = 4;
TRANSACTION_SERIALIZABLE = 8;
SQL_SERVER_SNAPSHOT_ISOLATION_LEVEL = 4096;
transactionIsolation
This property controls the default transaction isolation level of connections returned from the pool. If this property is not specified, the default transaction isolation level defined by the JDBC driver is used. Only use this property if you have specific isolation requirements that are common for all queries. The value of this property is the constant name from the Connection class such as TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ, etc. Default: driver default
ref: https://github.com/brettwooldridge/HikariCP
Make sure you set in your application.properties
spring.jpa.hibernate.connection.provider_class=org.hibernate.hikaricp.internal.HikariCPConnectionProvider
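For completeness, the same isolation setting can be applied when building the pool programmatically; a sketch assuming the standard HikariConfig setters, with placeholder URL and credentials:

import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public final class PoolFactory {

    // Programmatic equivalent of the application.properties setting above.
    public static DataSource buildPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:sqlserver://HOST;databaseName=DB"); // placeholder
        config.setUsername("user");                                 // placeholder
        config.setPassword("password");                             // placeholder
        // Depending on the HikariCP version, this accepts the Connection constant
        // name or, for SQL Server's snapshot level, the numeric string "4096".
        config.setTransactionIsolation("TRANSACTION_READ_UNCOMMITTED");
        return new HikariDataSource(config);
    }
}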
Thanks for the detail you provided in your question. It really helped in clarifying my scenario and you already provided the answer to use the NOLOCK option.
I was able to figure out how to apply the option using a custom dialect and some query adjustments to force the dialect logic to always be used.
We are querying a SQL Server database that is a read-only replica of our production database.
In our case, certain tables used for looking up user characteristics are completely deleted and recreated. This ripples into a large amount of locking on the SQL Server replica during the replication process.
We are seeing outliers with worst-case query times in the minutes (they should be < 10 milliseconds). We think this is most likely locking-related.
I was able to get the WITH (NOLOCK) to be emitted properly with the following approach:
Create a custom Dialect.
public class ReadOnlySqlServerDialect extends SQLServer2012Dialect {

    @Override
    public String appendLockHint(LockOptions lockOptions, String tableName) {
        // in our case the entire db is a replica and we never do any writes
        return tableName + " WITH (NOLOCK)";
    }
}
Configure hibernate.dialect to point at ReadOnlySqlServerDialect.class.getName()
Force queries to use LockModeType.PESSIMISTIC_READ, as this bypasses a check within Hibernate and ensures that the ReadOnlySqlServerDialect.appendLockHint() method is always called:
return entityMgr.createNamedQuery(UserLog.FIND_BY_EMAIL, UserLog.class)
        .setParameter("email", email)
        .setLockMode(LockModeType.PESSIMISTIC_READ)
        .getSingleResult();
Resulting in SQL generated like this:
select userlog0_.EMAIL, userlog0_.NAME as email0_18_ from APP.USER_LOG userlog0_ WITH (NOLOCK)

Java more than one DB connection in UserTransaction

static void clean() throws Exception {
    final UserTransaction tx = InitialContext.doLookup("UserTransaction");
    tx.begin();
    try {
        final DataSource ds = InitialContext.doLookup(Databases.ADMIN);
        Connection connection1 = ds.getConnection();
        Connection connection2 = ds.getConnection();
        PreparedStatement st1 = connection1.prepareStatement("XXX delete records XXX"); // delete data
        PreparedStatement st2 = connection2.prepareStatement("XXX insert records XXX"); // insert new data with the same primary key as the deleted data above
        st1.executeUpdate();
        st1.close();
        connection1.close();
        st2.executeUpdate();
        st2.close();
        connection2.close();
        tx.commit();
    } finally {
        if (tx.getStatus() == Status.STATUS_ACTIVE) {
            tx.rollback();
        }
    }
}
I have a web app whose DAOs take a DataSource as the object from which to create individual connections to perform database operations.
So I have a UserTransaction, and inside it two DAO objects doing separate actions: the first one does a deletion and the second one an insertion. The deletion is there to delete some records so the insertion can take place, because the insertion inserts data with the same primary keys.
I took out the DAO layer and translated the logic into the code above. There is one thing I couldn't understand: based on the code above, the insertion operation should fail. The code (inside the UserTransaction) takes two different connections which don't know each other, and the deletion obviously hasn't been committed, so the insertion should fail with a unique constraint violation, since the second connection can't see the first one's uncommitted changes. But amazingly it doesn't fail, and both statements work perfectly.
Can anyone help explain this? Can any configuration achieve this result? Or is my understanding wrong?
Since your application is running in WebLogic Server, the Java EE container is managing the transaction and the connections for you. If you call DataSource#getConnection multiple times inside a Java EE transaction, you will get multiple Connection instances joining the same transaction. Usually those connections connect to the database with the identical session. With Oracle you can check that with the following snippet in a @Stateless EJB:
@Resource(lookup = "jdbc/myDS")
private DataSource ds;

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
@Schedule(hour = "*", minute = "*", second = "42")
public void testDatasource() throws SQLException {
    try (Connection con1 = ds.getConnection();
         Connection con2 = ds.getConnection()) {
        String sessId1 = null, sessId2 = null;
        try (ResultSet rs1 = con1.createStatement().executeQuery("select userenv('SESSIONID') from dual")) {
            if (rs1.next()) sessId1 = rs1.getString(1);
        }
        try (ResultSet rs2 = con2.createStatement().executeQuery("select userenv('SESSIONID') from dual")) {
            if (rs2.next()) sessId2 = rs2.getString(1);
        }
        LOG.log(Level.INFO, "con1={0}, con2={1}, sessId1={2}, sessId2={3}",
                new Object[]{ con1, con2, sessId1, sessId2 });
    }
}
This results in the following log message:
con1=com.sun.gjc.spi.jdbc40.ConnectionWrapper40@19f32aa,
con2=com.sun.gjc.spi.jdbc40.ConnectionWrapper40@1cb42e0,
sessId1=9347407,
sessId2=9347407
Note that you get different Connection instances with the same session ID.
For more details see e.g. this question.
The only way to do this properly is to use a transaction manager and two-phase-commit XA drivers for all databases involved in the transaction.
My guess is that you have autocommit enabled on the connections. This is the default when creating a new connection, as is documented here
https://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html
System.out.println(connection1.getAutoCommit());
will most likely print true.
You could try
connection1.setAutoCommit(false);
and see if that changes the behavior.
In addition, it's not really defined what happens if you call close() on a connection without having issued a commit or rollback beforehand. Therefore it is strongly recommended to issue one of the two before closing the connection; see https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#close()
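In code, that recommendation takes roughly this shape: disable auto-commit, then always commit or roll back explicitly before close() so no implicit behavior kicks in. A sketch for plain JDBC (connections enlisted in a container-managed transaction must not call setAutoCommit):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public final class ExplicitTx {

    // One connection, one explicitly demarcated unit of work.
    static void deleteThenInsert(DataSource ds, String deleteSql, String insertSql) throws SQLException {
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement del = con.prepareStatement(deleteSql);
                 PreparedStatement ins = con.prepareStatement(insertSql)) {
                del.executeUpdate();
                ins.executeUpdate();
                con.commit(); // make both changes visible atomically
            } catch (SQLException e) {
                con.rollback(); // never leave the decision to close()
                throw e;
            }
        }
    }
}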
EDIT 1:
If autocommit is false, then it's probably due to the undefined behavior of close(). What happens if you swap the statements?
st2.executeUpdate();
st2.close();
connection2.close();
st1.executeUpdate();
st1.close();
connection1.close();
EDIT 2:
You could also try the "correct" way of doing it:
st1.executeUpdate();
st1.close();
st2.executeUpdate();
st2.close();
tx.commit();
connection1.close();
connection2.close();
If that doesn't fail, then something is wrong with your setup for UserTransactions.
Depending on your database this is quite a normal case.
An object implementing the UserTransaction interface represents a "logical transaction". It doesn't always map to a real, "physical" transaction that a database engine respects.
For example, there are situations that cause implicit commits (as well as implicit starts) of transactions. In the case of Oracle (I can't vouch for other DBs), closing a connection is one of them.
From Oracle's docs:
"If the auto-commit mode is disabled and you close the connection
without explicitly committing or rolling back your last changes, then
an implicit COMMIT operation is run".
But there can be other possible reasons for implicit commits: select for update, various locking statements, DDLs, and so on. They are database-specific.
So, back to our code.
The first transaction is committed by closing the first connection.
Then another transaction is implicitly started by the DML on the second connection. It inserts non-conflicting changes, and the second connection.close() commits them without a PK violation. tx.commit() won't even get a chance to commit anything (and how could it? the connections are already closed).
The bottom line: "logical" transaction managers don't always give you the full picture.
Sometimes transactions are started and committed without an explicit reason. And sometimes they are even ignored by a DB.
PS: I assumed you are using Oracle, but the same holds true for other databases as well. For example, see MySQL's list of implicit commit reasons.
If auto-commit mode is disabled and you close the connection
without explicitly committing or rolling back your last changes,
then an implicit COMMIT operation is executed.
Please check the link below for details:
http://in.relation.to/2005/10/20/pop-quiz-does-connectionclose-result-in-commit-or-rollback/

SQL deadlock in ColdFusion thread

I'm trying to figure out why I would be getting a deadlock error when executing a simple query inside a thread. I'm running CF10 with SQL Server 2008 R2, on a Windows 2012 server.
Once per day, I've got a process that caches a bunch of blog feeds in a database. For each blog feed, I create a thread and do all the work inside it. Sometimes it runs fine with no errors; other times I get the following error in one or more of the threads:
[Macromedia][SQLServer JDBC Driver][SQLServer]Transaction (Process ID
57) was deadlocked on lock resources with another process and has been
chosen as the deadlock victim. Rerun the transaction.
This deadlock condition happens when I run a query that sets a flag indicating that the feed is being updated. Obviously, this query can run concurrently with other threads that are updating other feeds.
From my research, I think I can solve the problem by putting an exclusive named lock around the query, but why would I need to do that? I've never had to deal with deadlocks before, so forgive my ignorance on the subject. How is it possible that I run into a deadlock condition at all?
Since there's too much code to post, here's a rough algorithm:
thread name="#createUUID()#" action="run" idBlog=idBlog {
    try {
        var feedResults = getFeed(idBlog);
        if (feedResults.errorCode != 0)
            throw(message="failed to get feed");
        transaction {
            /* just a simple query to set a flag */
            dirtyBlogCache(idBlog); /* this is where I get the deadlock */
            cacheFeedResults(idBlog, feedResults);
        }
    } catch (any e) {
        reportError(e);
    }
} /* thread */
This approach has been working well for me.
<cffunction name="runQuery" access="private" returntype="query">
    <!--- arguments if necessary --->
    <cfset var whatever = QueryNew("a")>
    <cfquery name="whatever">
        sql
    </cfquery>
    <cfreturn whatever>
</cffunction>
attempts = 0;
myQuery = "not a query";
while (attempts <= 3 && isQuery(myQuery) == false) {
    attempts += 1;
    try {
        myQuery = runQuery();
    }
    catch (any e) {
    }
}
After all, the message does say to re-run the transaction.

Custom constraint in EF fails, async issue

I have a controller action like this (ASP.NET Web API):
public HttpResponseMessage<Component> Post(Component c)
{
    // Don't allow equal setup ids within the same installation when the component is of type 5
    if (db.Components.Any(d => d.InstallationId == c.InstallationId && d.SetupId == c.SetupId && d.ComponentTypeId == 5))
        return new HttpResponseMessage<Component>(c, HttpStatusCode.Conflict);
    db.Components.Add(c);
    db.SaveChanges();
    return new HttpResponseMessage<Component>(c, HttpStatusCode.OK);
}
I send a number of POST requests from JavaScript, two of them being equal:
{SetupId: 7, InstallationId: 1, ComponentTypeId: 5}
I have verified this both using Fiddler and by stepping through the code on the server.
However, sometimes the constraint check above works as it should and other times it does not. I guess that since Post is an async action, request #2 sometimes checks the database for duplicates BEFORE the first request has managed to save to the database.
How can I solve this? Is there a way to lock EF operations to the database from the beginning of the post action until the end? Is that even a good idea?
I have thought of database constraints; however, since this applies only when the component type is 5, I'm not sure how to implement that, or whether it's even possible.
This is quite difficult to achieve with EF. In plain SQL you would start a transaction and add a table hint to your constraint query to force locking the records. The problem is that EF doesn't support table hints; you cannot force a LINQ or ESQL query to lock records.
Your options are:
Manual locking in your method. Using, for example, lock will dramatically reduce the throughput of your method, so you will most probably need some clever custom implementation locking per those ids.
Using custom SQL or a stored procedure instead of the LINQ query and forcing the locking there. I think UPDLOCK with HOLDLOCK hints should work in this case.
Alternatively, you can place a unique index on InstallationId, SetupId and ComponentTypeId and simply catch the exception when a concurrent request tries to insert a duplicate record. The problem is if that combination must be unique only in some cases but not in others.
I solved this in the database with the help of this answer: https://stackoverflow.com/a/5149263/94394
A conditional (filtered) unique index, allowed since SQL Server 2008:
create unique nonclustered index [funcix_Components_setupid_installationid_RecordStatus]
on [components]([setupid], [Installationid])
where [componenttypeid] = 5
Then I caught DbUpdateException and checked whether I got a unique constraint violation error code:
try
{
    db.Components.Add(c);
    db.SaveChanges();
}
catch (DbUpdateException ex)
{
    Exception innermostException = ex;
    while (innermostException.InnerException != null) // get the innermost exception
    {
        innermostException = innermostException.InnerException;
    }
    if (((System.Data.SqlClient.SqlException)innermostException).Number == 2601) // constraint exception id
    {
        return new HttpResponseMessage<Component>(c, HttpStatusCode.Conflict);
    }
}
return new HttpResponseMessage<Component>(c, HttpStatusCode.OK);
