Golang - many in use connections - database

#golang #oracle
I'm trying to understand how the max connections setting works. Basically I have this DB configuration:
params.MinSessions = 5
params.MaxSessions = 6
params.SessionTimeout = 0
params.WaitTimeout = 5 * time.Second
params.SessionIncrement = 0
params.ConnClass = "GOLANGPOOL"
// Connect!
result, err := sql.Open("godror", params.StringWithPassword())
result.SetMaxIdleConns(0)
However I can see 242 connections using sql.DB.Stats:
DB Established Open Conn (use + idle): 242
DB Idle Conn: 0
DB In Use Conn: 242
DB Max Idle Closed: 766
DB Max Idle Time Closed: 0
DB Max Lifetime Closed: 0
DB Max Open Conn: 0
DB Wait Count: 0
DB Wait Duration (sec): 0
How is this possible? Shouldn't the limit be 6?
Thanks

In Oracle connections and sessions are different concepts.
A connection is a network connection to the DB, while a session is an
encapsulation of a user's interaction with the DB...
referring to this book, and to "Relation between Oracle session and connection pool".

Assuming you are using the latest version of the driver, https://github.com/godror/godror,
sql.Open("godror", params.StringWithPassword())
implies the standaloneConnection=0 setting.
The stats you are seeing come from the Go database/sql connection pool. database/sql calls the driver's connect method, which in turn tries to get a connection from another pool (OCI maintains it because of standaloneConnection=0).
The number of outbound connections hasn't exceeded params.MaxSessions; 242 is just the database/sql connection counter numOpen.
Ideally, tune the database/sql pool settings to be close to the OCI pool values, so that goroutines don't simply block.
You can check the OCI pool stats by using the GetPoolStats()
method from godror.Conn and confirm the real maximum number of outbound connections; see the example in
https://github.com/godror/godror/blob/main/z_test.go

DB In Use Conn: 242
DB Max Idle Closed: 766
The sum is almost 1000, like the value of the default
poolMaxSessions=1000
I think you don't have 242 simultaneous connections in use. You have a pool of connections, and the database will limit the number of simultaneous sessions.
You should check how the sql package handles it (it is open source!) and how the specific driver handles it (also open source!), and if necessary open an issue on the driver project:
https://github.com/godror/godror

Related

Azure Databricks Spark DataFrame fails to insert into MS SQL Server using the MS Spark JDBC connector when executor tries fewer than 4,096 records

That's a title and a half, but it pretty much summarises my "problem".
I have an Azure Databricks workspace and an Azure Virtual Machine running SQL Server 2019 Developer. They're on the same VNET, and they can communicate nicely with each other. I can select rows very happily from the SQL Server, and some instances of inserts work really nicely too.
My scenario:
I have a spark table foo, containing any number of rows. Could be 1, could be 20m.
foo contains 19 fields.
The contents of foo needs to be inserted into a table on the SQL Server also called foo, in a database called bar, meaning my destination is bar.dbo.foo
I've got the com.microsoft.sqlserver.jdbc.spark connector configured on the cluster, and I connect using an IP, port, username and password.
My notebook cell of relevance:
df = spark.table("foo")
try:
    url = "jdbc:sqlserver://ip:port"
    table_name = "bar.dbo.foo"
    username = "user"
    password = "password"
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("append") \
        .option("truncate", True) \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("user", username) \
        .option("password", password) \
        .option("queryTimeout", 120) \
        .option("tableLock", True) \
        .option("numPartitions", 1) \
        .save()
except ValueError as error:
    print("Connector write failed", error)
If I prepare foo to contain 10,000 rows, I can run this script time and time again, and it succeeds every time.
As the row counts drop, the Executor occasionally tries to process exactly 4,096 rows in a task. As soon as it tries to do 4,096 in a task, weird things happen.
For example, having created foo to contain 5,000 rows and executing the code, this is the task information:
Index Task Id Attempt Status Executor ID Host Duration Input Size/Records Errors
0 660 0 FAILED 0 10.139.64.6 40s 261.3 KiB / 4096 com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
0 661 1 FAILED 3 10.139.64.8 40s 261.3 KiB / 4096 com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
0 662 2 FAILED 3 10.139.64.8 40s 261.3 KiB / 4096 com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
0 663 3 SUCCESS 1 10.139.64.5 0.4s 261.3 KiB / 5000
I don't fully understand why it fails after 40 seconds. Our timeouts are set to 600 seconds on the SQL box, and the query timeout in the script is 120 seconds.
Every time the Executor does more than 4,096 rows, it succeeds. This is true regardless of the size of the dataset. Sometimes it tries to do 4,096 rows on 100k row sets, fails, and then changes the records in the set to 100k and it immediately succeeds.
When the set is smaller than 4,096, the execution will typically generate one message:
com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed
and then immediately work successfully, having moved on to the next executor.
On the SQL Server itself, I see ASYNC_NETWORK_IO as the wait, using Adam Machanic's sp_WhoIsActive. This wait persists for the full duration of the 40s attempt. It looks like at 40s there's an immediate abandonment of the attempt and a new connection is created - consistent with the messages I see in the task information.
Additionally, when looking at the statements, I note that it's doing ROWS_PER_BATCH = 1000 regardless of the original number of rows. I can't see any way of changing that in the docs; I tried rowsPerBatch in the options for the df, but it didn't appear to make a difference - still showing the 1000 value.
I've been running this with lots of different amounts of rows in foo - and when the total rows is greater than 4,096 my testing suggests that the spark executor succeeds if it tries a number of records that exceeds 4,096. If I remove the numPartitions, there are more attempts of 4,096 records, and so I see more failures.
Weirdly, if I cancel a query that appears to be running for longer than 10s, and immediately retry it - if the number of rows in foo is != 4,096, it seems to succeed every time. My sample is obviously pretty small - tens of attempts.
Is there a limitation I'm not familiar with here? What's the magic of 4,096?
In discussing this with my friend, we're wondering whether there is some form of implicit type conversions happening in the arrays when they're <4,096 records, which causes delays somehow.
I'm at quite a loss on this one - and wondering whether I just need to check the length of the DF before attempting the transfer - doing an iterative cursor in PYODBC for fewer rows, and sticking to the JDBC connector for larger numbers of rows. It seems like it shouldn't be needed!
Many thanks,
Johan

Operational Error: An Existing connection was forcibly closed by the remote host. (10054)

I am getting this OperationalError periodically, probably when the application has been inactive or idle for long hours. On refreshing the page it vanishes. I am using an mssql+pyodbc connection string ("mssql+pyodbc:///?odbc_connect= ...") in the FormHandlers and DbAuth of Gramex.
How can I keep the connection alive in Gramex?
Add pool_pre_ping and pool_recycle parameters.
pool_pre_ping will normally emit SQL equivalent to “SELECT 1” each time a connection is checked out from the pool; if an error is raised that is detected as a “disconnect” situation, the connection will be immediately recycled. Read more
pool_recycle prevents the pool from using a particular connection that has passed a certain age. Read more
e.g.: engine = create_engine(connection_string, encoding='utf-8', pool_pre_ping=True, pool_recycle=3600)
Alternatively, you can add these parameters for FormHandler in gramex.yaml. This is required only for the first FormHandler with the connection string.
kwargs:
    url: ...
    table: ...
    pool_pre_ping: True
    pool_recycle: 60

Azure SQL - Timeout reached

I have a capacity test running on an S4 Azure database, and I am getting this error:
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
I have 500 "users" hitting my site. My connection string is this:
Server=tcp:database.net,1433;Initial Catalog=database-prd;Persist Security Info=False;User ID=username;Password=password;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;
I have checked my code and what I do is:
Using "using"
using (SqlConnection connection = new SqlConnection(_connectionString))
{
    connection.Open();
    // ... logic
}
Scoped repository
serviceCollection.AddScoped<IRepository, SqlServerRepository>();
I am now thinking of the default Max Pool Size. I haven't set it in the connection string. Should I? I have an S4, and its properties are:
Max concurrent sessions: 4800
Max concurrent Workers (requests): 200
according to this: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-dtu-resource-limits-single-databases#standard-service-tier-continued
What should I set the pool size to? Does it even matter? As I understand it, "Max Pool Size" is client-side and defaults to 100. I could try to raise it a bit, to maybe 500 or 800.
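For reference, Max Pool Size is just another key in the connection string (the ADO.NET pool lives on the client). A sketch of the string from above with the pool raised to 500 - the 500 is illustrative only, and the right number needs load testing against the 200 concurrent-worker limit quoted below:

```
Server=tcp:database.net,1433;Initial Catalog=database-prd;Persist Security Info=False;User ID=username;Password=password;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;Max Pool Size=500;
```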
Where it maxes out is on some pretty simple selects:
select p1,p2,p3 from baskets where Id=1234
and the same for the lines. Not too complex. The only complex query I have has 4 or 5 joins, but it is not hit that much.
Does anyone here have some pointers on Max Pool Size? Does it even matter?

MongoDB hidden node still receiving connections

I'm not sure if this question has been asked before or if the following behavior of MongoDB is normal. Searching online turned up no results for this scenario.
Initially, we had a 3 node deployment, 1 Primary, 1 Secondary, and 1 Arbiter.
We wanted to add a ReadOnly replica to the cluster and remove the Arbiter node as well in the process. We added the following to the new node:
priority: 0
hidden: true
votes: 1
And we removed the Arbiter in the same reconfiguration process, so we always have 3 voting members, leaving us with 1 Primary, 1 Secondary, and 1 ReadOnly node.
The complete process went through smoothly, however, we still end up seeing connections to the ReadOnly replica.
But when checking via db.currentOp(), no queries show up.
Based on the documentation on MongoDB website,
Hidden members are part of a replica set but cannot become primary and are invisible to client applications.
Is there a way to investigate why there are connections coming in? And if this is normal behavior?
EDIT: (for further clarification)
Assuming the following:
MongoDB A (Primary): 192.168.1.50
MongoDB B (Secondary): 192.168.1.51
MongoDB C (Hidden): 192.168.1.52
Client A: 192.168.1.60
Client B: 192.168.1.61
In the logs, we see the following:
2018-03-12T07:19:11.607+0000 I ACCESS [conn119719] Successfully authenticated as principal SOMEUSER on SOMEDB
2018-03-12T07:19:11.607+0000 I NETWORK [conn119719] end connection 192.168.1.60 (2 connections now open)
2018-03-12T07:19:17.087+0000 I NETWORK [listener] connection accepted from 192.168.1.60:47806 #119720 (3 connections now open)
2018-03-12T07:19:17.371+0000 I ACCESS [conn119720] Successfully authenticated as principal SOMEUSER on SOMEDB
So if the other MongoDB instances were connecting, that would be fine; my question is why the clients are able to connect even when the hidden option is true, and whether that behavior is normal.
Thank You

Occasionally retrieving "connection timed out" errors when querying Postgresql

I get this error every so often when using sqlx with pgx, and I believe it's a configuration error on my end, or a DB concept I'm not grasping:
error: 'write tcp [redacted-ip]:[redacted-port]->[redacted-ip]:[redacted-port]: write: connection timed out'
This occurs when attempting to read from the database. I init sqlx in the startup phase:
package main

import (
    _ "github.com/jackc/pgx/stdlib"
    "github.com/jmoiron/sqlx"
)

// NewDB attempts to connect to the DB
func NewDB(connectionString string) (*sqlx.DB, error) {
    db, err := sqlx.Connect("pgx", connectionString)
    if err != nil {
        return nil, err
    }
    return db, nil
}
Any structs responsible for interacting with the database have access to this pointer. The majority of them utilize Select or Get, and I understand those return connections to the pool on their own. There are two functions that use Exec, and they only return the result and error at the end of the function.
Other Notes
My Postgres db supports 100 max_connections
I only showed a few active connections at the time of this error
I am not using SetMaxIdleConnections or SetMaxOpenConnections
Refreshing the page and triggering the request again always works
Any tips on what might be happening here?
EDIT: I did not mention this server is on compose.io, which in turn is hosted on AWS. Is it possible AWS turns these connections into zombies because they've been open for so long and the timeout occurs after unsuccessfully trying them one by one?
With the help of some rough calculations, I've set the maximum lifetime of these connections to 10 minutes. I inserted this code into the init function in the question above to limit the number of open connections and idle connections, and to limit the life of each connection to 30 seconds.
db.SetConnMaxLifetime(time.Duration(30) * time.Second)
db.SetMaxOpenConns(20)
db.SetMaxIdleConns(20)
Hopefully this helps someone else.
SELECT * FROM pg_stat_activity; is great for nailing down connections as well.
