How to resolve DB connection invalidated warning in Airflow Scheduler? - database

I am upgrading our Airflow instance from 1.9 to 1.10.3 and whenever the scheduler runs now I get a warning that the database connection has been invalidated and it's trying to reconnect. A bunch of these errors show up in a row. The console also indicates that tasks are being scheduled but if I check the database nothing is ever being written.
The following warning shows up where it didn't before
[2019-05-21 17:29:26,017] {sqlalchemy.py:81} WARNING - DB connection invalidated. Reconnecting...
Eventually, I'll also get this error
FATAL: remaining connection slots are reserved for non-replication superuser connections
I've tried to increase the SQL Alchemy pool size setting in airflow.cfg but that had no effect
# The SqlAlchemy pool size is the maximum number of database connections in the pool.
sql_alchemy_pool_size = 10
I'm using CeleryExecutor and I'm thinking that maybe the number of workers is overloading the database connections.
I run three commands, airflow webserver, airflow scheduler, and airflow worker, so there should only be one worker and I don't see why that would overload the database.
How do I resolve the database connection errors? Is there a setting to increase the number of database connections, if so where is it? Do I need to handle the workers differently?
Update:
Even with no workers running, starting the webserver and scheduler fresh, when the scheduler fills up the airflow pools the DB connection warning starts to appear.
Update 2:
I found the following issue in the Airflow Jira: https://issues.apache.org/jira/browse/AIRFLOW-4567
There is some activity with others saying they see the same issue. It is unclear whether this directly causes the crashes that some people are seeing or whether this is just an annoying cosmetic log. As of yet there is no resolution to this problem.

This has been resolved in the latest version of Airflow, 1.10.4
I believe it was fixed by AIRFLOW-4332, updating SQLAlchemy to a newer version.
Pull request

Related

Openliberty datasource always uses 1 database connection

I have configured openliberty (version 21) with a database (oracle) connection as follows in the server.xml :
<dataSource jndiName="jdbc/myds" transactional="true">
<connectionManager maxPoolSize="20" minPoolSize="5" agedTimeout="120s" connectionTimeout="10s"/>
<jdbcDriver libraryRef="jdbcLib" />
<properties.oracle URL="jdbc:oracle:thin:#..." user="..." password="..."/>
</dataSource>
The server starts and I can make queries to the database via my rest api but I have noticed that I only use 1 active database connection and parallel http queries result in queuing databases queries over that 1 connection.
I have verified this by monitoring the active open database connections in combination with slow queries (I make several rest calls in parallel). Only 1 connection is opened and 1 query is processes after the other. How do I open a connection pool with for example 5-20 connections for parallel operation.
Based on your described usage, the connection pool should be creating connections as requests come in if there are no connections available in the free pool.
Your connectionTimeout is configured to be 10 seconds. To ensure that your test really is running in parallel would be to make two requests to the server. The server should create a connection, use it, wait 11 seconds, then close the connection.
If your requests are NOT running in parallel, you will not get any exception since the second request won't start until after the first one finished and that would be an issue with your test procedure.
If your requests are running in parallel, and you do not get any exception output from Liberty. Then Liberty likely is making multiple connections and that can be confirmed by enabling J2C trace.
See: https://openliberty.io/docs/21.0.0.9/log-trace-configuration.html
Enable: J2C=ALL
If your requests are running in parallel, and no more than one connection is being created, then you will get a ConnectionWaitTimeoutException. This could be caused by the driver not being able to create more than one connection, incorrect use of the Oracle Connection Pool (UCP), or a number of other factors. I would need more information to debug that issue.

Couldn't commit processed log positions with the source database due to a concurrent connector shutdown or restart

I am receiving the mentioned warning continuously from Debezium Connector for SQL Server, that I am running by using connect-standalone. At a certain moment, I have tried to start two concurent connectors that connect to the same database, but that was yesterday, and after that I restarted this connector several times, and the other connector is stopped, so I don't know where is that information persisted, and why, because, at the moment only one connector is running, so this shouldn't be logged. As CDC does not work, this seems like a problem, regardless of the fact that it is logged as only a warning, because no (other) error is logged but such lines:
WARN Couldn't commit processed log positions with the source database due to a concurrent connector shutdown or restart (io.debezium.connector.common.BaseSourceTask:238)

What does `test-on-borrow` do?

What does it do? How does it work? Why am I supposed to test the database connection before "borrowing it from the pool"?
I was not able to find any related information as to why I should be using it. Just how to use it. And it baffles me.
Can anyone provide some meaningful definition and possibly resources to find out more?
"test-on-borrow" indicates that a connection from the pool has to be validated usually by a simple SQL validation query defined in "validationQuery". These two properties are commonly used in conjunction to make sure that the current connections in the pool are not stale (no longer connected to the DB actively as a result of a DB restart, or timeouts enforced by the DB, or whatever other reason that might cause stale connections). By testing the connections on borrow, the application can automatically reconnect to the DB using new connections (and dropping the invalid ones) without a manual restart of the app and thus preventing DB connection errors in the app.
You can find more information on jdbc connection pool attributes here:
https://tomcat.apache.org/tomcat-8.0-doc/jdbc-pool.html#Common_Attributes

Error 17886 - The server will drop the connection

We are running a website on a vps server with sql server 2008 x64 r2. We are being bombarded with 17886 errors - namely:
The server will drop the connection, because the client driver has
sent multiple requests while the session is in single-user mode. This
error occurs when a client sends a request to reset the connection
while there are batches still running in the session, or when the
client sends a request while the session is resetting a connection.
Please contact the client driver vendor.
This causes sql statements to return corrupt results. I have tried pretty much all of the suggestions I have found on the net, including:
with mars, and without.
with pooling and without
with async=true and without
we only have one database and it is absolutely multi-user.
Everything has been installed recently so it is up to date. They may be correlated with high cpu (though not exclusively according to the monitors I have seen). Also correlated with high request rates from search engines. However, high cpu/requests shouldn't cause sql connections to reset - at worst we should have high response times or iis refusing to send response.
Any suggestions? I am only a developer not dba - do i need a dba to solve this problem?
Not sure but some of your queries might cause deadlocks on the server.
At the point you detect this error again
Open Management Studio (on the server, install it if necessary)
Open a new query window
Run sp_who2
Check the blkby column which is short for Blocked By. If there is any data in that column you have a deadlock problem (Normally it should be like the screenshot I attached, completely empty).
If you have a deadlock then we can continue with next steps. But right now please check that.
To fix the error above, ”MultipleActiveResultSets=True” needs to be added to the connection string.
via Event ID 17886 MSSQLServer – The server will drop the connection
I would create an eventlog task to email you whenever 17886 is thrown. Then go immediately to the db and execute the sp_who2, get the blkby spid and run a dbcc inputbuffer. Hopefully the eventinfo will give you something a bit more tangible to go on.
sp_who2
DBCC INPUTBUFFER(62)
GO
Use a "Instance Per Request" strategy in your DI-instantiation code and your problem will be solved
Most probably you are using dependency injection. During web development you have to take into account the possibility of concurrent requests. Therefor you have to make sure every request gets new instances during DI, otherwise you will get into concurrency issues. Don't be cheap by using ".SingleInstance" for services and contexts.
Enabling MARS will probably decrease the number of errors, but the errors that are encountered will be less clear. Enabling MARS is always never the solution, do not use this unless you know what you're doing.

Can jdbc connections be recovered?

Can jdbc connections which are closed due to database un-availability be recovered.
To give back ground I get following errors in sequence. It doesn't look to be manual re-start. The reason for my question is that I am told that the app behaved correctly without
the re-start. So if the connection was lost, can it be recovered, after a DB re-start.
java.sql.SQLException: ORA-12537: TNS:connection closed
java.sql.SQLRecoverableException: ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
IBM AIX RISC System/6000 Error: 2: No such file or directory
java.sql.SQLRecoverableException: ORA-01033: ORACLE initialization or shutdown in progress
No. The connection is "dead". Create a new connection.
A good approach is to use a connection pool, which will test if the connection is still OK before giving it to you, and automatically create a new connection if needed.
There are several open source connection pools to use. I've used Apache's JDCP, and it worked for me.
Edited:
Given that you want to wait until the database comes back up if it's down (interesting idea), you could implement a custom version of getConnection() that "waits a while and tries again" if the database doesn't respond.
p.s. I like this idea!
The connection cannot be recovered. What can be done is to failover the connection to another database instance. RAC and data guard installations support this configuration.
This is no problem for read-only transactions. However for transactions that execute DML this can be a problem, especially if the last call to the DB was a commit. In case of a commit the client cannot tell if the commit call completed or not. When did the DB fail; before executing the commit, or after executing the commit (but not sending back the acknowledgment to the client). Only the application has this logic and can do the right thing. If the application after failing over does not verify the state of the last transaction, duplicate transactions are possible. This is a known problem and most of us experienced it buying tickets or similar web transactions.

Resources