Tracking down abandoned / idle connections in PostgreSQL

Background Information
I found a bug in some code we have where a function connects to a PostgreSQL database but then "returns" out of it before closing the database connection.
This issue was only caught because, at one point, we had a huge number of concurrent connections that exceeded the max_connections limit, and I found a bunch of "idle" records in the pg_stat_activity view.
Question
I only see these idle connections if I create load on the database by looping in my script and calling this function a bunch of times.
Meaning, if I use the buggy code that doesn't close the connection, and connect just once, I don't see any "idle" records in the pg_stat_activity view. Mind you, it takes me a second or two to switch between the window running the script and the one running the psql client.
So here are my questions.
What's the best way to track idle connections? Am I using the right approach?
After the PostgreSQL session has selected the data I requested and returned it to the client, how long does the server wait before killing the idle session? Or are these idle records only cleaned up when my script finishes running through all its logic?
I've tried using TCP keepalives with very low values in case that's relevant, and I get the same results.
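For reference, this is the kind of check I mean — a minimal sketch, assuming PostgreSQL 9.2 or later, where pg_stat_activity exposes pid, state, and state_change (older releases show '<IDLE>' in a current_query column instead):

    -- Connected sessions that are not currently running anything, oldest first.
    SELECT pid,
           usename,
           application_name,
           client_addr,
           now() - state_change AS idle_for
    FROM   pg_stat_activity
    WHERE  state = 'idle'
    ORDER  BY idle_for DESC;

    -- A leaked connection can be killed by pid if needed:
    -- SELECT pg_terminate_backend(pid)
    -- FROM   pg_stat_activity
    -- WHERE  state = 'idle' AND now() - state_change > interval '10 minutes';

As far as I know, PostgreSQL never times out a purely idle client connection on its own by default; the "idle" rows disappear when the client process exits and its socket closes, which would explain why a short-lived script's leak is easy to miss.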
If my question is not clear enough, please let me know and I will revise.
Thanks

Related

Rogue Process Filling up Connections in Oracle Database

Last week we updated our DB password, and ever since, after every DB bounce the connections get filled up.
We have 20+ schemas, and connections to only one schema get filled up. Nothing shows up in the sessions. There may be old apps accessing our database with the old password and filling up the connections.
How can we identify how many processes are trying to connect to the DB server and how many are failing?
Every time we bounce our DB servers, connections go through, but after about an hour no one else can make new connections.
BTW: in our company, we have LOGON and LOGOFF triggers which persist the session connect and disconnect information.
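One hedged way to get at the "how many are failing" part is Oracle's session auditing — a sketch, assuming the audit trail is written to the database (audit_trail=db, a static parameter, so a restart is needed) and that failed logons therefore land in DBA_AUDIT_SESSION:

    -- Record logon attempts that fail (e.g., ORA-01017 from apps still
    -- presenting the old password).
    AUDIT SESSION WHENEVER NOT SUCCESSFUL;

    -- Then count failures by origin; RETURNCODE holds the ORA- error number
    -- (1017 = invalid username/password).
    SELECT os_username, userhost, username, returncode,
           COUNT(*) AS failed_attempts
    FROM   dba_audit_session
    WHERE  returncode <> 0
      AND  timestamp > SYSDATE - 1      -- last 24 hours
    GROUP  BY os_username, userhost, username, returncode
    ORDER  BY failed_attempts DESC;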
It is quite possible that what you are seeing are recursive sessions created by Oracle when it needs to parse SQL statements [usually not a performance problem, but the processes parameter may need to be increased]: ...
Example 1: high values for dynamic_sampling cause more recursive SQL to be generated;
Example 2: I have seen a situation in this application of excessive hard parsing; this drives up the process count, as hard parsing requires new processes to execute parse-related recursive SQL (we increased the processes parameter in this case, since it was a vendor app). Since your issue is related to the bounce, it could be that the app startup requires a lot of parsing.
Example 3:
“Session Leaking” Root Cause Analysis:
Problem summary: we observed periods where many sessions were being created, without a clear understanding of what part of the application was creating them and why.
RCA approach: since the DB doesn't persist inactive sessions, I monitored the situation by manually snapshotting v$session (a sketch of such a snapshot follows the analysis below).
Analysis:
- I noticed a pattern where multiple sessions have the same process#.
- As per the Oracle docs, these are recursive sessions created by Oracle under an originating process which needs to run recursive SQL to satisfy the query (at parse level). They go away when the process that created them is done and exits.
- If the process is long-running, they will stay around, inactive, until it is done.
- These recursive sessions don't count against your session limit, and the inactive sessions sit in an idle wait event, not consuming resources.
- The recursive sessions are most certainly a result of recursive SQL needed by the optimizer where optimizer stats are missing (as is the case with GTTs) and the initialization parameter setting of 4 for optimizer_dynamic_sampling.
- The 50,000 sessions in an hour that we saw the other day are likely the result of a couple thousand SELECT statements running (I've personally counted 20 recursive sessions per query, but this number can vary).
- The ADDM report showed that the impact is not much:
Finding 4: Session Connect and Disconnect
Impact is 0.3 [average] active sessions, 6.27% of total activity [currently on the instance].
Average Active Sessions is a measure of database load (values approaching the CPU count would be considered high). Your instance can handle up to 32 active sessions, so the impact is about 1/100th of capacity.
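For completeness, a minimal sketch of the kind of manual v$session snapshotting described above (the snapshot table name and cadence are illustrative, not from the original analysis):

    -- One-time setup: an empty copy of the interesting v$session columns.
    CREATE TABLE session_snap AS
    SELECT SYSTIMESTAMP AS snap_time,
           s.sid, s.serial#, s.process, s.status, s.program, s.event, s.sql_id
    FROM   v$session s
    WHERE  1 = 0;

    -- Run periodically (e.g., from a DBMS_SCHEDULER job every minute or so).
    INSERT INTO session_snap
    SELECT SYSTIMESTAMP,
           s.sid, s.serial#, s.process, s.status, s.program, s.event, s.sql_id
    FROM   v$session s;

    -- Afterwards, look for the pattern above: many sessions per process#.
    SELECT snap_time, process, COUNT(*) AS sessions
    FROM   session_snap
    GROUP  BY snap_time, process
    HAVING COUNT(*) > 1
    ORDER  BY snap_time, sessions DESC;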

How to update and sync database tables at exactly the same time?

I need to sync DB tables (upload first to the remote DB, then download to the mobile device) between a mobile device and a remote DB; the device may insert/update/delete rows in multiple tables.
The remote DB performs other operations based on the uploaded sync data. When the sync goes on to download data to the mobile device, the remote DB is still performing the previous tasks, and the sync fails. It's something like a 'critical section' problem, where both the sync and the DB operations want access to the remote database. How do I solve this issue? Is it possible to sync the DB and operate on the same DB at the same time?
I am using a SQL Server 2008 DB and MobiLink sync.
Edit:
The operations I do, in sequence:
1. An iPhone is loaded with an application which uses MobiLink to sync data.
2. SYNC means UPLOAD (from device to remote DB) followed by DOWNLOAD (from remote DB to device).
3. The remote DB is the consolidated DB; the device DB is an UltraLite DB.
4. The remote DB has triggers that fire when certain tables are updated.
5. An UPLOAD from the device to the remote DB fires those triggers when the sync upload finishes.
6. The very moment the UPLOAD finishes, the DOWNLOAD to the device starts.
7. At exactly the same moment, those DB triggers fire.
8. Now a deadlock occurs between the DB sync (DOWNLOAD) and the trigger operations (which include UPDATE queries).
9. The sync fails with an error saying it cannot access some tables.
I did a lot of workarounds and Googling, and came out with a simple(?!) solution for the problem
(though the exact problem cannot be solved at this point... I tried my best):
Keep track of all clients who do a sync (a kind of user-details record).
Create a SQL job scheduler which contains all the operations to be performed when a user syncs (a sketch of such a job follows).
Announce a "maintenance period" every day to execute the SQL job's tasks against the saved user/client sync details.
Keeping track of client details every time is costlier, but much needed!
The remote consolidated DB is "completely updated" only after the maintenance period.
Any approaches better than this would be appreciated! All suggestions are welcome!
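A rough sketch of the scheduled-job part, using SQL Server Agent's msdb procedures (the job, schedule, and procedure names are all illustrative; dbo.ApplySavedSyncWork stands in for whatever replays the saved per-client sync operations):

    USE msdb;
    GO
    -- The job and its single T-SQL step.
    EXEC dbo.sp_add_job     @job_name  = N'SyncMaintenance';
    EXEC dbo.sp_add_jobstep @job_name  = N'SyncMaintenance',
                            @step_name = N'Replay saved sync work',
                            @subsystem = N'TSQL',
                            @command   = N'EXEC dbo.ApplySavedSyncWork;';

    -- Run it daily at 02:00: the "maintenance period".
    EXEC dbo.sp_add_schedule    @schedule_name     = N'Daily0200',
                                @freq_type         = 4,     -- daily
                                @freq_interval     = 1,
                                @active_start_time = 20000; -- 02:00:00 (HHMMSS)
    EXEC dbo.sp_attach_schedule @job_name      = N'SyncMaintenance',
                                @schedule_name = N'Daily0200';
    EXEC dbo.sp_add_jobserver   @job_name = N'SyncMaintenance';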
My understanding of your system is the following:
The mobile application sends an UPDATE statement to the SQL Server DB.
There is an ON UPDATE trigger that updates around 30 tables (= at least 30 UPDATE statements in the trigger + 1 main UPDATE statement).
The UPDATE executes in a single transaction, which ends only when the trigger completes all its updates.
The mobile application does not wait for the UPDATE to finish and sends multiple SELECT statements to get data from the database.
These SELECT statements query the same tables the trigger is updating.
Blocking and deadlocks occur on some query for some user, because the trigger has not completed its updates before the SELECTs arrive and keeps locks on the tables.
When optimizing, we try to make our processes easier on the computer: achieve the same result in fewer iterations and use fewer resources, or resources that are more available/less overloaded.
My suggestions for your design:
Use parameterized SPs. Every time SQL Server receives an ad-hoc statement, it creates an execution plan. For 1 UPDATE statement with a trigger, the DB needs at least 31 execution plans. This happens in a busy production environment for every connection, every time the app updates the DB. It is a big waste.
How would SPs help reduce blocking?
Now you have 1 transaction for 31 queries, where locks are issued against all tables involved and held until the transaction commits. With SPs you'll have 31 small transactions, and only 1-2 tables will be locked at a time.
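A minimal sketch of one such small procedure (table and column names are illustrative, not from the original system):

    -- One parameterized unit of work: the plan is compiled once and reused,
    -- and locks are held on a single table only for this one statement.
    CREATE PROCEDURE dbo.UpdateInvoiceStatus
        @InvoiceId INT,
        @Status    TINYINT
    AS
    BEGIN
        SET NOCOUNT ON;
        UPDATE dbo.InvoiceHeader         -- assumed table
        SET    Status    = @Status
        WHERE  InvoiceId = @InvoiceId;
    END;
    GO
    -- The app (or one orchestrating SP) then calls each small SP in turn:
    EXEC dbo.UpdateInvoiceStatus @InvoiceId = 42, @Status = 2;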
Another question I would like to address: how do you do asynchronous updates to your database?
There is a feature in SQL Server called Service Broker. It lets you process a message queue (rows in a queue table) automatically: it monitors the queue, takes messages from it, performs the processing you specify, and deletes processed messages from the queue.
For example, you save the parameters for your SPs as messages, and Service Broker executes the SPs with those parameters.
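A condensed sketch of that setup (all object names are illustrative, the database must have Service Broker enabled, and dbo.ApplySyncWork stands in for the SP that does the real work):

    -- Plumbing: message type, contract, queue, service.
    CREATE MESSAGE TYPE SyncTaskMsg VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT SyncTaskContract (SyncTaskMsg SENT BY INITIATOR);
    CREATE QUEUE SyncTaskQueue;
    CREATE SERVICE SyncTaskService ON QUEUE SyncTaskQueue (SyncTaskContract);
    GO
    -- Activation proc: runs automatically when messages arrive.
    CREATE PROCEDURE dbo.ProcessSyncTask
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER, @body XML;
        WAITFOR (
            RECEIVE TOP (1) @h    = conversation_handle,
                            @body = CAST(message_body AS XML)
            FROM SyncTaskQueue
        ), TIMEOUT 1000;
        IF @h IS NOT NULL
        BEGIN
            -- EXEC dbo.ApplySyncWork @body;  -- the saved SP parameters
            END CONVERSATION @h;
        END
    END;
    GO
    ALTER QUEUE SyncTaskQueue WITH ACTIVATION (
        STATUS = ON, PROCEDURE_NAME = dbo.ProcessSyncTask,
        MAX_QUEUE_READERS = 1, EXECUTE AS OWNER);
    GO
    -- Sender: enqueue the parameters as a message and return immediately.
    DECLARE @dialog UNIQUEIDENTIFIER;
    BEGIN DIALOG CONVERSATION @dialog
        FROM SERVICE SyncTaskService TO SERVICE 'SyncTaskService'
        ON CONTRACT SyncTaskContract WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @dialog MESSAGE TYPE SyncTaskMsg (N'<task invoice="42"/>');

The sender's transaction only covers the SEND, so the mobile upload commits quickly; the heavy 30-table work happens afterwards, on the queue's schedule.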

MS SQL query timeout expired error (in case of a multi-server and multi-DB infrastructure)

We have multiple servers and multiple DB instances in EC2. On one of the servers we have the main DB (master DB); the other servers have only the transaction DBs. We are using SQL linked servers to connect everything.
Initially there was no problem with the infrastructure, but as the data load increased, I often get a 'timeout expired' error even for a normal SELECT query.
It's not all of the processes: if there are 500 processes running on a particular server, at least 200 of them throw this timeout expired error.
Recently, I moved all my servers into VPC.
Note:
All my queries run only from the master DB, because only the master DB knows which transaction DB is tied to the respective transaction requests. All EC2 instances are in the same region.
Is there a solution for my problem (the timeout error exception)? Kindly help me with your suggestions. This is really turning into a critical, business-affecting issue.
Error Msg:
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Since it worked fine before and started to fail as the load increased, it sounds like you are running out of resources (mostly hardware). There are two things you can do:
1) Buy more hardware;
2) Tune your queries to do more work with less (hardware).
The first is (maybe) cheaper. Tuning your DB takes a lot of time, both for learning and for testing whatever you learn.
I suggest you take the hard way and try to optimize your queries/isolation levels/schema, etc.
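Purely as a hedged aside (an assumption about where the timeout might originate, not a diagnosis): when the failing statements cross linked servers, one knob worth checking is SQL Server's server-level remote query timeout, which defaults to 600 seconds:

    -- Show, and if appropriate raise, the timeout applied to the remote leg
    -- of linked server queries (seconds; 0 = no timeout).
    EXEC sp_configure 'remote query timeout';
    EXEC sp_configure 'remote query timeout', 1200;
    RECONFIGURE;

The client-side command timeout of the calling application still applies on top of this, so tuning the queries remains the real fix.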

Failover strategy for database application

I've got a database application that reads and writes and holds a local cache. In case of an application server fault, a backup server shall start working.
The primary and backup applications can only run exclusively (one at a time) because of the local cache and a low isolation level on the database.
As far as my communication knowledge goes, it is impossible for the two servers to always figure out between themselves which one is allowed to run exclusively.
Can I somehow solve this communication conflict by using the database as a third entity? I think this is a quite typical problem, and there might not be a 100% safe method, but I would be happy to know how other people recommend solving such issues, or whether there is a best practice for this.
It's okay if both applications are down for 30 minutes or so, but there is not enough time to get people out of bed and let them figure out what the problem is.
Can you set up a third server which monitors both application servers for health? This server could then decide appropriately when one of the servers appears to be gone: instruct the hot standby to start processing.
If I get the picture right, your backup server constantly polls the primary server for data updates. It wouldn't be hard to check whether that poll fails: reschedule it for 30 seconds later, up to 3 times, and on the third failure dynamically update the DNS entry for the database server to reflect the change in active server. Both Windows DNS and BIND accept dynamic updates, signed and unsigned.
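On the question's idea of using the database as the third entity: a common pattern is an expiring lease row that both servers compete for with one atomic UPDATE. A rough sketch in generic SQL (the table and names are illustrative, and the interval arithmetic varies by vendor):

    -- Seed once: a single row representing the exclusive "primary" role.
    CREATE TABLE app_lease (
        lease_name VARCHAR(64) PRIMARY KEY,
        holder     VARCHAR(64) NOT NULL,
        expires_at TIMESTAMP   NOT NULL
    );
    INSERT INTO app_lease VALUES ('primary-role', 'nobody', CURRENT_TIMESTAMP);

    -- Each server runs this heartbeat every few seconds with its own name.
    -- The row matches only if this server already holds the lease or the
    -- lease has expired; whoever sees "1 row updated" is allowed to run.
    UPDATE app_lease
    SET    holder     = 'server-A',
           expires_at = CURRENT_TIMESTAMP + INTERVAL '30' SECOND
    WHERE  lease_name = 'primary-role'
      AND (holder = 'server-A' OR expires_at < CURRENT_TIMESTAMP);

If the primary crashes or loses its network, its heartbeat stops, the lease expires, and the standby's next heartbeat claims the role. The 30-second window trades failover speed against the brief risk of both servers believing they are primary, so it should be longer than a normal heartbeat hiccup but shorter than the acceptable outage.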

Background Worker Process and Connection Timeout

OK, I was deciding between a thread and a BackgroundWorker process, and based on the responses from this thread I decided to go with the BackgroundWorker. Here is the thing, though: when I started the worker process, it stopped halfway through with a connection timeout error from the database. This is normal when the process is run directly on the DB server (4-5 minutes), as I am talking about a lot of invoices here. Anyway, I know I can adjust the connection string timeout, but has anyone run into similar issues? What's the average timeout used in these types of scenarios?
I was thinking of creating a separate connection with a different timeout, especially for this task. This invoice-generating task will be run by one person.
Are you sure this is related to the connection timeout and not the command timeout?
The time it takes to connect to the database is unrelated to how heavy the query is, so a connection timeout might indicate a different problem.
If the query is very slow, you should first optimize the query and then set the command timeout to the expected runtime of the query.
