Last week we updated our DB password, and ever since, after every DB bounce the connections fill up.
We have 20+ schemas, and connections to only one schema get filled up. Nothing shows up in the sessions. There may be old apps accessing our database with the old password and filling up the connections.
How can we identify how many processes are trying to connect to the DB server and how many of those attempts fail?
Every time we bounce our DB servers, connections go through; about an hour later no one else can make new connections.
BTW: in our company, we have LOGON and LOGOFF triggers which persist the session connect and disconnect information.
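For reference, our LOGON trigger is conceptually along the lines of the minimal sketch below (the table and column names here are only illustrative, not our actual in-house objects):
CREATE OR REPLACE TRIGGER trg_session_logon
AFTER LOGON ON DATABASE
BEGIN
  -- record who connected, from where, and when (illustrative audit table)
  -- note: a LOGON trigger only fires for successful connections, so failed attempts are not captured here
  INSERT INTO session_audit_log (username, osuser, machine, logon_time)
  VALUES (SYS_CONTEXT('USERENV', 'SESSION_USER'),
          SYS_CONTEXT('USERENV', 'OS_USER'),
          SYS_CONTEXT('USERENV', 'HOST'),
          SYSTIMESTAMP);
END;
/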
It is quite possible that what you are seeing are recursive sessions created by Oracle when it needs to parse SQL statements [usually not a performance problem, but the processes parameter may need to be increased]: ...
Example 1: high values for optimizer_dynamic_sampling cause more recursive SQL to be generated;
Example 2: I have seen excessive hard parsing in an application; this will drive up the process count, as hard parsing requires new processes to execute parse-related recursive SQL (I increased the processes parameter in that case, since it was a vendor app). Since your issue is tied to the bounce, it could be that the app startup requires a lot of parsing.
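If you suspect the processes limit is what you are hitting after the bounce, a quick sanity check against the standard dynamic views is something like:
SELECT resource_name, current_utilization, max_utilization, limit_value
FROM v$resource_limit
WHERE resource_name IN ('processes', 'sessions');
-- if max_utilization sits at limit_value, the parameter can be raised, e.g.:
-- ALTER SYSTEM SET processes = 600 SCOPE = SPFILE;  -- 600 is only an example value; requires a restart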
Example 3:
“Session Leaking” Root Cause Analysis:
Problem Summary: We observed periods where many sessions were being created, without a clear understanding of which part of the application was creating them and why.
RCA Approach: Since the DB doesn't persist information about inactive sessions, I monitored the situation by manually snapshotting v$session.
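A snapshot query along these lines (a sketch, not necessarily the exact statement used) surfaces the pattern described in the analysis below:
-- group sessions by their owning server process to spot processes with many sessions
SELECT s.paddr, p.spid AS os_process, COUNT(*) AS sessions_for_process
FROM v$session s
JOIN v$process p ON p.addr = s.paddr
GROUP BY s.paddr, p.spid
HAVING COUNT(*) > 1
ORDER BY sessions_for_process DESC;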
Analysis:
I noticed a pattern where multiple sessions share the same process#.
As per the Oracle docs, these are recursive sessions created by Oracle under an originating process that needs to run recursive SQL to satisfy the query (at parse time). They go away when the process that created them is done and exits.
If the process is long running, then they will stay around inactive until it is done.
These recursive sessions don't count against your session limit, and the inactive ones sit in an idle wait event, not consuming resources.
The recursive sessions are most certainly a result of the recursive SQL needed by the optimizer where optimizer stats are missing (as is the case with GTTs), combined with the initialization parameter setting of 4 for optimizer_dynamic_sampling.
The 50,000 sessions in an hour that we saw the other day are likely the result of a couple of thousand SELECT statements running (I've personally counted 20 recursive sessions per query, but this number can vary).
The ADDM report showed that the impact is small:
Finding 4: Session Connect and Disconnect
Impact is 0.3 [average] active sessions, 6.27% of total activity [currently on the instance].
Average Active Sessions is a measure of database load (values approaching the CPU count would be considered high). Your instance can handle up to 32 active sessions, so the impact is about 1/100th of that capacity.
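As a rough cross-check of that Average Active Sessions figure, AAS can be estimated directly from ASH (sketch below; ASH samples active sessions roughly once per second, and querying it requires the Diagnostics Pack):
-- active-session samples over the last hour, divided by elapsed seconds
SELECT ROUND(COUNT(*) / (60 * 60), 2) AS approx_avg_active_sessions
FROM v$active_session_history
WHERE sample_time > SYSTIMESTAMP - INTERVAL '1' HOUR;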
Related
We currently run scheduled overnight jobs to sync heavily calculated data into flat tables for use in reports. These processes can take anywhere from 5 minutes to 2 hours per database, depending on the size. This has been working fine for a long time.
The need we now have is to try to keep the data as up-to-date as possible with our current setup.
I wrote a sync routine that can sync a specific user's data as it gets modified.
The short version: a trigger inserts the user IDs into a holding table when their records get modified, and a job that runs every 10 seconds checks that table; if a user ID is found, it fires the sync for that user, updates the flat table, and then flags the record as complete. This can take anywhere from instant to ~1 minute, depending again on how much data it needs to calculate. Far better than the 24 hours it used to be.
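To illustrate, the check that the 10-second job performs boils down to something like the sketch below (table, column and procedure names are invented placeholders, not our real ones):
-- pull one pending user id; READPAST lets concurrent runs skip rows another run has locked
DECLARE @UserId int;

SELECT TOP (1) @UserId = UserId
FROM dbo.SyncQueue WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE IsComplete = 0
ORDER BY QueuedAt;

IF @UserId IS NOT NULL
BEGIN
    EXEC dbo.usp_SyncUserToFlatTable @UserId = @UserId;  -- hypothetical sync routine

    UPDATE dbo.SyncQueue
    SET IsComplete = 1
    WHERE UserId = @UserId
      AND IsComplete = 0;
END;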
Now onto the 'problem'.
We have upwards of 30 databases that all need to be checked.
Currently, our overnight jobs have a step for each database and run through it in turn.
The problem I can foresee is that if customer 1 has a lot of users syncing, the job will wait to finish them before moving on to customer 2's database, and so on; by the time you get to customer 30 it could be a relatively long wait before their sync even begins, regardless of how quick the actual sync is. I.e., two customers enter data at the same time: as far as customer 1 is concerned, their sync happened in seconds, whereas customer 30 waited 30 minutes before their data updated, even though the sync routine itself only took seconds to complete, because it had to wait for 29 other databases to finish their work first.
My idea is to change how we do our scheduled job here and create a job for each database/customer. That way the sync check will run concurrently across all databases, and no customer at the end of the queue will be waiting for other customers' syncs to finish before theirs starts.
Is this a viable solution? Is it a bad idea to have 30 jobs checking 30 databases every 10 seconds? Is there another, better option I haven't considered?
Our servers are currently running Microsoft SQL Server Web and Standard editions, though I believe we may be upgrading to Enterprise at some point (if the version makes a difference to my options here).
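For what it's worth, the 'job per database' idea would amount to roughly the following per customer database (all names below are placeholders; SQL Server Agent schedules can go down to 10-second intervals):
EXEC msdb.dbo.sp_add_job @job_name = N'Sync check - Customer01';

EXEC msdb.dbo.sp_add_jobstep
     @job_name = N'Sync check - Customer01',
     @step_name = N'Process sync queue',
     @subsystem = N'TSQL',
     @database_name = N'Customer01',
     @command = N'EXEC dbo.usp_ProcessSyncQueue;';  -- hypothetical wrapper around the sync check

EXEC msdb.dbo.sp_add_jobschedule
     @job_name = N'Sync check - Customer01',
     @name = N'Every 10 seconds',
     @freq_type = 4,               -- daily
     @freq_interval = 1,
     @freq_subday_type = 2,        -- units of seconds
     @freq_subday_interval = 10;   -- every 10 seconds

EXEC msdb.dbo.sp_add_jobserver @job_name = N'Sync check - Customer01';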
A technique that is generally considered poor is to create a separate database session for every atomic DB activity.
You may sometimes encounter strategies like:
processing a large number of items in a loop, where each iteration creates a DB session, executes a small set of SQL statements and terminates the session
a polling process that checks a SQL result once per second, each time in a new DB session
But what costs are generated by frequently connecting and disconnecting DB sessions?
The internal recording of database activity (AWR/ASH) has no answer because establishing the DB connection is not a SQL activity.
The superficial practical answer depends on how you define 'connection': is a connection what the app knows as a connection, the network connection to the DB, or the DB server process and memory used to do any processing? The theoretical overall answer is that establishing some application context and starting a DB server process with some memory allocated - and then doing the reverse when the app has finished running SQL statements - is 'expensive'. This is what was measured in Peter Ramm's answer.
In practice, long-running applications that expect to handle a number of users create a connection pool (e.g. in Node.js or in Python). These connections remain open for the life of the application. From the application's point of view, getting a connection from the pool to do some SQL is a very quick operation. The initial cost of creating the connection pool (a few seconds of startup at most) is amortized over the life of the application process.
The number of server processes (and therefore overhead costs) on the database tier can be reduced further by also using a 'Database Resident Connection Pool'.
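As a sketch, enabling the built-in Database Resident Connection Pool is a one-off DBA step, and clients then opt in via the connect string (the EZConnect string below is only an example):
-- start the default pool (SYS_DEFAULT_CONNECTION_POOL); run as a DBA
BEGIN
  DBMS_CONNECTION_POOL.START_POOL();
END;
/
-- clients then request a pooled server, e.g. EZConnect:  dbhost:1521/orclsvc:POOLED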
These connection pools have other benefits for Oracle in terms of supporting Oracle's High Availability features, often transparently. But that's off topic.
A simple comparison of system load gives a rough hint at the price of connection creation.
Example:
An idle database instance on a single host with 4 older CPU cores (Intel Xeon E312xx, 2.6 GHz)
An external SQL*Plus client (not on the DB host) which executes a single "SELECT SYSTIMESTAMP FROM DUAL" per DB session
The delay between the SQL*Plus calls is chosen so that 1 connection per second is created and destroyed
6 threads active, each with 1 session creation per second
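Each SQL*Plus call in this test amounts to no more than the following script (the driver loop and the one-second delay live outside the database; connect details are whatever fits the environment):
-- one connect, one trivial statement, one disconnect per invocation
SELECT SYSTIMESTAMP FROM DUAL;
EXIT;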
Result:
with an idle database, CPU load across the 4 CPU cores averages 0.22%
with 6 threads creating and destroying sessions each second, CPU load is 6.09%
I/O wait also occurs, at 1.07% on average
so on average 5.87% of the 4 CPU cores are consumed by these 6 threads
equivalent to 23.48% of one CPU core for the 6 threads, or 3.91% per thread
That means:
Connecting and disconnecting an Oracle DB session once per second costs approximately 4% of one CPU core of the DB server.
Keeping this value in mind should help you decide whether or not it's worth changing process behaviour regarding session creation.
P.S.: This does not consider the additional cost of session creation on the client side.
In SQL Server 2014 I open 3 sessions on the same database. In the first session I run Update Statistics A. I time this to take around 1 minute.
In my 2nd and 3rd sessions I run an Update Statistics B (one at a time). Each takes about 1 minute as well.
I then run Update Statistics A on session 1, and Update Statistics B on session 2, both at the same time. Each query finishes in around 1 minute, as expected.
I then run Update Statistics A on window 1, and Update Statistics B on window 3, both at the same time. Each query takes close to 2 minutes now.
I checked sp_who2 and can see 3 distinct sessions here. What could be a possible cause for this?
Also, when I check the query status, I notice that in the scenario where I run the queries in windows 1 and 3, one status is always 'running' while the other is either 'runnable' or 'suspended'. In the other scenario, where I run them in windows 1 and 2, both are always 'running'.
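If it helps, I can also check what the suspended request is waiting on with something like this (the session_id values are just placeholders for the three SPIDs sp_who2 shows me):
SELECT session_id, status, command, wait_type, wait_time, last_wait_type, blocking_session_id
FROM sys.dm_exec_requests
WHERE session_id IN (51, 52, 53);  -- replace with the actual SPIDs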
Sessions are always different. You can see the SPID at the bottom of the SQL Server Management Studio window, or you can run:
SELECT @@SPID
to see the session number. SQL Server is a multi-user system that manages its own sessions and threads. The number of active sessions depends on what the sessions are doing, how long a session has been doing it, and how much resource the machine SQL Server is running on has - typically more memory/CPUs will allow more simultaneously active sessions. Of course, a few heavy sessions could use all the resources of the machine.
Depending on the locking being used, sessions are somewhat, or very, isolated from each other. SQL Server decides when to temporarily suspend a session so that other sessions can have a turn using the CPU/memory. If a whole bunch of sessions are all updating the same table, there may be contention that slows SQL Server down. What you are asking is like asking how long it takes to drive 100 miles. It depends, of course.
If you have three sessions doing the same thing, SQL Server will not fold the three actions into one. It will do each one, simultaneously if possible, or else serially.
Background:
We have a number of databases of similar size and identical schema. All of them have identical settings and are on the same instance. Everyone uses an application to access and query the databases. Within the application, all connection strings are identical (except for login and password) for all databases. Many users experience significant slowness when logging into and querying one of our databases, but not the other ones.
Problem:
One of the databases has gradually become slower and slower to access. Query execution time is also affected, but not as significantly as the time it takes for a user to log in. It now takes around 50 seconds to log in; for all the other databases, log-in time is only about 4-5 seconds.
Question:
I would like to compare normal log-in sessions on the "healthy" databases to a log-in session on the problematic database. Could you please suggest a way to monitor what exactly happens during the log-in? I know how to trace queries run against a specific database, but I don't know what to look for to find what makes logging in slow. Would either Profiler or Extended Events show such information? Is there any other way to analyse what happens during the time a user waits to log in?
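For example, would an Extended Events session roughly like this sketch (standard XE events and actions; the session and file names are just examples) capture enough to compare the two log-ins?
CREATE EVENT SESSION [login_compare] ON SERVER
ADD EVENT sqlserver.login(
    ACTION (sqlserver.client_app_name, sqlserver.database_name, sqlserver.username)),
ADD EVENT sqlserver.error_reported(
    ACTION (sqlserver.database_name, sqlserver.username))
ADD TARGET package0.event_file (SET filename = N'login_compare.xel');

ALTER EVENT SESSION [login_compare] ON SERVER STATE = START;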
You can use SQL Server Profiler to trace every query sent to the database, with the ability to filter based on user name, database name, etc.
See https://msdn.microsoft.com/en-us/en-en/library/ms175047.aspx
I would also take a look at the database's indexes and statistics, as these are areas that can slow your database down if they are not well maintained.
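As a starting point for that, something like the following sketch (sys.dm_db_stats_properties is available from SQL Server 2008 R2 SP2 / 2012 SP1 onwards) shows how stale each statistic is in the slow database:
-- statistics age and modification counts for user tables
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       sp.last_updated,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.last_updated;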
I have two SQL Server 2005 instances that are geographically separated. Important databases are replicated from the primary location to the secondary using transactional replication.
I'm looking for a way that I can monitor this replication and be alerted immediately if it fails.
We've had occasions in the past where the network connection between the two instances has gone down for a period of time. Because replication couldn't occur and we didn't know, the transaction log blew out and filled the disk causing an outage on the primary database as well.
My Google searching some time ago led us to monitor the MSrepl_errors table and alert when there were any entries, but this simply doesn't work. The last time replication failed (last night, hence the question), errors only hit that table when it was restarted.
Does anyone else monitor replication and how do you do it?
Just a little bit of extra information:
It seems that last night the problem was that the Log Reader Agent died and didn't start up again. I believe this agent is responsible for reading the transaction log and putting records in the distribution database so they can be replicated on the secondary site.
As this agent runs inside SQL Server, we can't simply make sure a process is running in Windows.
We have emails sent to us for Merge Replication failures. I have not used Transactional Replication but I imagine you can set up similar alerts.
The easiest way is to set it up through Replication Monitor.
Go to Replication Monitor and select a particular publication. Then select the Warnings and Agents tab and then configure the particular alert you want to use. In our case it is Replication: Agent Failure.
For this alert, we have the Response set up to Execute a Job that sends an email. The job can also do some work to include details of what failed, etc.
This works well enough for alerting us to the problem so that we can fix it right away.
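If you prefer to script it instead of clicking through Replication Monitor, the same alert can be wired up roughly like this ('Replication: agent failure' is one of the predefined replication alerts; the job name is a placeholder for whatever notification job you create):
EXEC msdb.dbo.sp_update_alert
     @name = N'Replication: agent failure',
     @enabled = 1,
     @job_name = N'Notify DBA - replication failure';  -- placeholder notification job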
You could run a regular check that data changes are taking place, though this could be complex depending on your application.
If you have some form of audit trail table that is very regularly updated (e.g. our main product has a base audit table that lists all actions that result in data being updated or deleted), then you could query that table on both servers and make sure the result you get back is the same. Something like:
SELECT CHECKSUM_AGG(CHECKSUM(*))
FROM audit_base
WHERE action_timestamp BETWEEN <time1> AND <time2>
where <time1> and <time2> are round values chosen to allow for different delays in contacting the databases. For instance, if you are checking at ten past the hour you might check items from the start of the last hour to the start of this hour. You now have two small values that you can transmit somewhere and compare. If they are different then something has most likely gone wrong in the replication process - have whatever process does the check/comparison send you a mail and an SMS so you know to check and fix any problem that needs attention.
By using SELECT CHECKSUM_AGG(CHECKSUM(*)) the amount of data returned for each table is very, very small, so the bandwidth used by the checks will be insignificant. You just need to make sure your checks are not too expensive in the load they apply to the servers, and that you don't check data that might be part of open replication transactions and so might be expected to differ at that moment (hence checking the audit trail a few minutes back in time instead of right now in my example), otherwise you'll get too many false alarms.
Depending on your database structure the above might be impractical. For tables that are not insert-only within the timeframe of your check (unlike the audit trail above, which sees no updates or deletes), working out what can safely be compared while avoiding false alarms is likely to be both complex and expensive, if not actually impossible to do reliably.
You could manufacture a rolling insert-only table if you do not already have one, by having a small table (containing just an indexed timestamp column) to which you add one row regularly - this data serves no purpose other than to exist so you can check that updates to the table are getting replicated. You can delete data older than your checking window, so the table shouldn't grow large. Only testing one table does not prove that all the other tables are replicating, but finding an error in this one table would be a good "canary" check (if this table isn't updating on the replica, then the others probably aren't either).
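A sketch of that rolling table and the two statements a scheduled job would run against it (the names and the one-day retention window are arbitrary choices):
CREATE TABLE dbo.replication_heartbeat (
    beat_time datetime NOT NULL CONSTRAINT PK_replication_heartbeat PRIMARY KEY
);

-- run on the publisher on a schedule (e.g. once a minute)
INSERT INTO dbo.replication_heartbeat (beat_time) VALUES (GETDATE());
DELETE FROM dbo.replication_heartbeat WHERE beat_time < DATEADD(day, -1, GETDATE());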
This sort of check has the advantage of being independent of the replication process - you are not waiting for the replication process to record exceptions in logs, you are instead proactively testing some of the actual data.