Scaling a Postgres database to handle parallel requests

I am using Postgres as the primary database for a Rails application. The current configuration of the database server is 8 CPU cores and 32 GB of RAM, hosted on AWS. However, I need to scale this server because I am running into serious bottlenecks.
I need to be able to handle 10000 parallel requests to this server. Even if this is not achievable, I at least need to know the maximum number this database can handle. The requests include complex SELECT and UPDATE queries.
I have changed settings such as max_connections to 10000, and in the Rails config/database.yml the connection pool is set to 10000.
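Roughly, the changes look like this (paraphrasing from memory, so treat the exact syntax as approximate):
# postgresql.conf
max_connections = 10000
# config/database.yml (production)
pool: 10000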
I am running Rails code that currently uses 50 parallel threads. The code runs fine for a while until I receive this error:
PG::ConnectionBad: PQsocket() can't get socket descriptor
After this, when I log into the Postgres server and try to execute anything, I get:
db_production=# select id from product_tbl limit 5;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
db_production=#
After this, restarting the Rails code works fine again, but only for a while, until I receive the same error.
I looked at the CPU usage while the threads are running. Some observations:
One SELECT query uses all the CPU (100% usage) on one core while the rest sit idle at 0%. I can infer this because when only one process or query is running, only one core shows 95-100% usage. Does this mean I can only run 8 queries or threads in parallel, since there are only 8 cores?
RAM stays underutilized. Even when 7 out of 8 queries or threads are running in parallel, only around 3 GB of RAM is used. Can I somehow increase this and accommodate more parallel queries?
For every running thread, this is the picture: virtual memory usage -> 751M, resident memory usage -> 154-158M, shared memory used -> 149-150M, CPU% -> 60-100%, MEM% -> 0.5-0.6%. Is everything right here, or do I need to tweak some settings to use the resources more effectively?
What configuration settings do I need to change in order to scale the database? What would be the ideal number of threads so that I at least don't receive the connection error? What are the checks and steps to scale the database to handle a large number of parallel connections? How can I make full use of the available resources?
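For what it's worth, this is the kind of check I have been running in psql while the threads are active, to see how many backends are actually working versus sitting idle (a sketch, not tied to any particular table):
-- confirm the current limit and break down the backends by state
SHOW max_connections;
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;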

Related

IIS server and Azure server hang due to high SQL CPU and memory utilization and cannot process further requests

I have a service hosted on IIS for submitting some information, and when it is called by thousands of users from a mobile app at the same time, the server gets stuck and is unable to respond to requests.
When this happens I can see in Task Manager that SQL Server is under heavy load, at approximately 70-75% CPU and memory.
Because of this we need to restart SQL Server twice a day, in the morning and in the evening. (I know this is a bad idea for performance and statistics, but otherwise the server hangs.)
I built the API using the .NET Framework and SQL Server 2012.
Any idea what I can do to handle this issue?
The following methods can usually help solve this problem.
Optimize your application code, especially frequent opening and closing of database connections. Each new connection consumes CPU resources, so if a connection is idle it is better to close it explicitly instead of waiting for the GC to reclaim it. For data queries, use indexed queries as much as possible, especially when there is a lot of data. I think this applies to you, because you have a large number of users and a lot of user data.
Because of the way SQL Server is designed, it will by default consume nearly all of the physical memory on the server, which can degrade performance for everything else on the machine. You can set max server memory to limit the amount SQL Server allocates to the buffer pool, which is usually the largest memory consumer.
Effects of min and max server memory
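For example, a cap on max server memory can be set with sp_configure; the 8192 MB below is only an illustrative value, size it for your own server:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- leave enough physical memory for IIS, the OS and your application
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;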
As Lex Li said, moving the database to a separate machine is a good approach, especially when there is a lot of user data and data processing. When IIS and SQL Server are on the same machine, the server not only has to process application requests and responses, it also has to allocate resources to SQL Server to process queries, so it is easy to run into performance bottlenecks.

Oracle-DB: what are the CPU costs of session connect/disconnect

A technique that is generally considered poor practice is to create a separate database session for every atomic DB activity.
You may sometimes encounter strategies like these:
processing a large number of items in a loop, where each step in the loop creates a DB session, executes a small set of SQL statements, and terminates the session
a polling process that checks a SQL result once a second, each time in a new DB session
But what costs are incurred by frequently connecting and disconnecting DB sessions?
The database's internal activity recording (AWR/ASH) has no answer, because establishing a DB connection is not a SQL activity.
The superficial practical answer depends on how you define 'connection': is a connection what the app knows as a connection, is it the network connection to the DB, or is it the DB server process and the memory used to do any processing? The theoretical answer is that establishing some application context and starting a DB server process with its memory allocation, and then doing the reverse when the app has finished running SQL statements, is 'expensive'. This was measured in Peter Ramm's answer.
In practice, long running applications that expect to handle a number of users would create a connection pool (e.g. in Node.js or in Python). These remain open for the life of the application. From the application's point of view, getting a connection from the pool to do some SQL is a very quick operation. The initial cost (a few seconds of startup at most) of creating the connection pool can be amortized over the process life of the application.
The number of server processes (and therefore overhead costs) on the database tier can be reduced by additional use of a 'Database Resident Connection Pool'.
These connection pools have other benefits for Oracle in terms of supporting Oracle's High Availability features, often transparently. But that's off topic.
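A rough sketch of enabling the Database Resident Connection Pool mentioned above on the database side (run as a suitably privileged user; pool sizing is left at the defaults here and is something you would tune):
BEGIN
  -- starts the default pool; sizes can be adjusted with DBMS_CONNECTION_POOL.CONFIGURE_POOL
  DBMS_CONNECTION_POOL.START_POOL();
END;
/
Clients then opt in to the pool, for example by appending :POOLED to an EZConnect string (host:1521/myservice:POOLED) or by using (SERVER=POOLED) in the connect descriptor.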
A simple comparison of system load gives a rough hint of the cost of connection creation.
Example:
An idle database instance on a single host with 4 older CPU cores (Intel Xeon E312xx, 2.6 GHz)
an external SQL*Plus client (not on the DB host) that executes a single "SELECT SYSTIMESTAMP FROM DUAL" per DB session
the delay between the SQL*Plus calls is timed so that 1 connection per second is created and destroyed
6 threads active, each creating one session per second
Result:
with the database idle, CPU load across the 4 CPU cores averages 0.22%
with 6 threads creating and destroying sessions every second, CPU load is 6.09%
I/O wait also occurs, averaging 1.07%
so on average 5.87% of the 4 CPU cores (6.09% minus the 0.22% idle baseline) is consumed by these 6 threads
equivalent to 23.48% of one CPU core for the 6 threads, or 3.91% per thread
That means:
Connecting and disconnecting an Oracle DB session once per second costs approximately 4% of one CPU core on the DB server.
With this value in mind, you can judge whether it is worth changing a process's behavior regarding session creation.
P.S.: This does not include the additional cost of session creation on the client side.

How to handle JDBC with SQL Server and Sybase for a long-running, high-memory-consumption process

I have a batch job querying Sybase and SQL Server databases. This batch job can run for a day or more. We are currently running it on a small data set and have seen no connection timeout errors so far. My questions are:
How should I handle this long-running process? Should I configure a reconnect period so that the connection gets closed and reopened periodically?
How should I handle the result set when it can return 1 million records to the client?
EDIT #1:
This sounds like a general JDBC question, but it is not, because each database provider has its own options, such as fetch size. It is very much up to each provider whether to support this. If Sybase does not support it, it will load all results into memory at once.
This is a general question not strictly related to Sybase (SAP) ASE.
If you want the TCP/IP connection not to break, use keep-alive parameters for the network connection. If you want to handle network connection breaks, use a connection pooling library.
You don't have to store the whole result set in memory. Just read the rows and process them on the fly. If you do want to fetch all 1 million rows before doing anything with them, then simply give more memory to the JVM.
According to https://docs.oracle.com/cd/E13222_01/wls/docs90/jdbc_drivers/sybase.html:
We can use setFetchSize() to determine the maximum number of rows kept in memory at one time. If you have enough memory, you can set it to 0. This lets us limit the memory used by each fetch so that it doesn't blow up the heap.

Rogue Process Filling up Connections in Oracle Database

Last week we updated our DB password, and ever since, after every DB bounce the connections fill up.
We have 20+ schemas, and connections to only one schema fill up. Nothing shows up in the sessions. There could be old apps accessing our database with the old password and filling up connections.
How can we identify how many processes are trying to connect to the DB server, and how many of them fail?
Every time we bounce our DB servers, connections go through, but after about an hour no one else can make new connections.
BTW: in our company, we have LOGON and LOGOFF triggers which persist the session connect and disconnect information.
It is quite possible that what you are seeing are recursive sessions created by Oracle when it needs to parse SQL statements [usually not a performance problem, but the processes parameter may need to be increased]: ...
Example 1: high values for dynamic_sampling cause more recursive SQL to be generated.
Example 2: I have seen a situation where an application did excessive hard parsing; this drives up the process count because hard parsing requires new processes to execute parse-related recursive SQL (I increased the processes parameter in that case, since it was a vendor app). Since your issue is related to the bounce, it could be that the app startup requires a lot of parsing.
Example 3:
“Session Leaking” Root Cause Analysis:
Problem Summary: We observed periods where many sessions were being created, without a clear understanding of which part of the application was creating them and why.
RCA Approach: Since the DB doesn't persist inactive sessions, I monitored the situation by manually snapshotting v$session.
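The snapshot was essentially along these lines (a simplified sketch, not the exact script that was used):
-- count sessions per originating client process
SELECT process, COUNT(*) AS session_count
FROM v$session
GROUP BY process
HAVING COUNT(*) > 1
ORDER BY session_count DESC;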
Analysis:
 I noticed a pattern where multiple sessions have the same process#.
 As per the Oracle docs, these sessions are recursive sessions created by Oracle under an originating process that needs to run recursive SQL to satisfy the query (at parse level). They go away when the process that created them finishes and exits.
 If the process is long running, then they will stay around inactive until it is done.
 These recursive sessions don't count against your session limit and the inactive sessions are in an idle wait event and not consuming resources.
 The recursive sessions are almost certainly a result of recursive SQL needed by the optimizer where optimizer stats are missing (as is the case with GTTs), combined with the initialization parameter optimizer_dynamic_sampling being set to 4.
 The 50,000 sessions in an hour that we saw the other day are likely the result of a couple thousand SELECT statements running (I have personally counted 20 recursive sessions per query, but this number can vary).
 The ADDM report showed that the impact is small:
Finding 4: Session Connect and Disconnect
Impact is 0.3 [average] active sessions, 6.27% of total activity [currently on the instance].
Average Active Sessions is a measure of database load (values approaching CPU count would be considered high). Your instance can handle up to 32 active sessions, so the impact is about 1/100th of the capacity.
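If you want to see how close the instance actually gets to its limits after a bounce, a check along these lines can help (a sketch):
SELECT resource_name, current_utilization, max_utilization, limit_value
FROM v$resource_limit
WHERE resource_name IN ('processes', 'sessions');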

connection timeout

My method executes lots of asynchronous SQL requests, and I constantly get connection timeout exceptions. What else can I do besides increasing the connection timeout value and adding proper indexing? I mean on the database side, not the code side; I can't change the code. The application runs fine on other servers, but only I experience these timeout exceptions, on my PC with a local MS SQL Server 2008 R2 database (which is on the same PC). So I think this is clearly a performance issue, since the connection timeout is already set to 3 minutes. Maybe there is something I can change on the server? Maybe there is a limit on the number of simultaneous requests? Each of my requests clearly needs less than 3 minutes, but there are about 26,000 of them running asynchronously, and only I experience these problems on my local PC and local DB.
I've run the process monitor and I can see that once my code starts, SQL Server eventually consumes 200 MB of RAM and takes up about half of the CPU time. But I still have 1 GB of RAM free, so this is not a memory problem.
I think the number of connections could be the cause. Make sure you close connections properly, or try to reduce their number. You could also use named pipes, which avoid some of the limitations of ordinary TCP connections.
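To check whether that is the case, you can look at how many connections and concurrently executing requests the instance is actually handling (a sketch; these views should be available on SQL Server 2008 R2):
SELECT COUNT(*) AS open_connections FROM sys.dm_exec_connections;
SELECT COUNT(*) AS active_requests
FROM sys.dm_exec_requests
WHERE status IN ('running', 'runnable', 'suspended');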
