SQL Server commit and working set

There's a SQL Server instance, MSSQLSERVER, running on localhost on Windows 7. I realized that its commit is much larger than its working set. Here's a comparison between my local instance and another instance, MS_MSBI_SSDS, running on Windows Server 2008 R2.
Local SQL Server
Image         PID    Hard Faults/sec   Commit (KB)   Working Set (KB)   Shareable (KB)
sqlservr.exe  2380   0                 45,615,948    61,992             17,784

Remote SQL Server
Image         PID    Hard Faults/sec   Commit (KB)   Working Set (KB)   Shareable (KB)   Private (KB)
sqlservr.exe  1964   1                 6,464,988     5,496,884          40,608           5,456,636
The large commit makes the local machine almost unusable: the commit charge hits 100% as soon as MSSQLSERVER launches. Note that nothing in particular is running on the local SQL Server, and it hosts two databases (8 GB) copied from the remote instance.
My questions are:
Why does the local instance have such a large commit when its working set is so small?
How can I find out what has actually been committed?
How can I decrease the commit charge?
Could the problem come from McAfee? I don't have the rights to modify it due to company policy, so what can I do? Here's a related post: SQLSERVR.EXE High Commit Usage causing a low virtual memory condition.
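For reference, a minimal sketch of where to look from inside SQL Server for the second and third questions; the DMV column names assume SQL Server 2012 or later, and the 4096 MB cap is purely illustrative:
-- What the process has committed, as SQL Server itself reports it
SELECT physical_memory_in_use_kb,
       virtual_address_space_committed_kb,
       memory_utilization_percentage
FROM sys.dm_os_process_memory;
-- Which memory clerks hold that memory (the buffer pool is usually the biggest consumer)
SELECT TOP (10) type, name, pages_kb, virtual_memory_committed_kb
FROM sys.dm_os_memory_clerks
ORDER BY pages_kb DESC;
-- Cap how much the instance will commit (illustrative value; size it to the workstation)
EXEC sys.sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sys.sp_configure 'max server memory (MB)', 4096; RECONFIGURE;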

Related

Why does Instant File Initialisation make my restore slower

I have an Azure VM with the following:
Windows Server 2019 Datacenter
SQL Server 2017 Developer
A 6 TB virtual drive (built from 12 × 512 GB Premium SSD disks)
112 GB of RAM
16 vCPUs
I have a database with a data file of approximately 5 TB (2 TB empty) and a log file of approximately 1 TB (99% empty).
I have backed this up to Azure blob storage (64 block blobs).
When I restore to my SQL Server with Instant File Initialisation enabled, it takes ~ 40 hours.
Network and disk throughput are really low during the restore.
When I disable Instant File Initialisation, it takes ~3 hours to zero out the files, and then the restore itself performs well, finishing in ~1.5 hours (on top of the ~3 hours of zeroing).
Does anyone know why this could be?
My restore command:
RESTORE DATABASE [<db_name>] FROM
    URL = 'https://.....url_1.bak',
    ...
    URL = 'https://.....url_64.bak'
WITH
    -- only moving the log file; the data file's location doesn't change
    MOVE 'db_log' TO 'new log location',
    STATS = 1, NORECOVERY;
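For reference, a quick way to confirm whether Instant File Initialisation is actually being honoured for the service account; the instant_file_initialization_enabled column assumes SQL Server 2016 SP1 or later, which holds for 2017:
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services;
-- Note: IFI only skips zero-initialization of data files; log files are always zeroed,
-- so the ~1 TB log file will be zeroed regardless of this setting.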

SQL Server performance slow when query text has a particular number of characters

I'm setting up a website on new app and database servers. Server details are:
Windows 2016 Standard
SQL Server 2017 (RTM) - 14.0.1000.169 (X64)
16GB RAM on SQL and 8GB on app server
4 virtual CPUs on SQL and 1 on app server
VMs running on local servers (i.e. not in the cloud)
TCP/IP protocol is configured for SQL Server, Named Pipes disabled
Web pages are running slowly, so I've dug in to find the cause. What I ended up finding, and can't explain, is that if I have a simple query, e.g. (this query exhibits the problem, but I don't run it on the website):
SELECT 1
The query on its own has no problem, but if I pad it with spaces so the query text is 676 characters or more (up to 675 runs fine), the execution magically gains 500 ms. If I keep adding spaces up to around 1,500 characters, performance is mostly slow with a random fast run here and there. Adding more spaces, to around 2,000 characters, the query becomes consistently fast again.
Running the query on the SQL Server itself shows no problem; it only happens when running remotely. I have tried a simple PowerShell script using SqlCommand on the app server and SQL Server Management Studio on another machine, and both are slow. SQL Profiler shows the queries running instantly: Duration is 0 (CPU, Reads and Writes are also 0).
Here are some sample runs, taken from SQL Server Management Studio client statistics (times in milliseconds); the slower runs are those where the query text is 676 characters or more, the quicker ones are below that.
Client processing time         0    0    0   15    0
Total execution time          15   62  531  546  531
Wait time on server replies   15   62  531  531  531
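For reference, a hypothetical way to build a padded batch of an exact length (676 characters here) in T-SQL; the slowdown only shows up when such a batch is sent from a remote client, so this illustrates the construction rather than a timing harness:
DECLARE @target int = 676;                -- total batch length to test
DECLARE @batch nvarchar(max) =
    N'SELECT 1' + REPLICATE(N' ', @target - LEN(N'SELECT 1'));
-- DATALENGTH counts the trailing spaces (LEN would not); divide by 2 because nvarchar is 2 bytes per character
SELECT DATALENGTH(@batch) / 2 AS batch_length;
EXEC sys.sp_executesql @batch;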
Getting back to the website: I'm not running "SELECT 1" or padding queries with spaces in the website code. The actual queries executed from the website mainly come from Entity Framework, and because of the way those queries are constructed (the columns listed in the SELECT clause, joins, WHERE clauses and so on), their text ends up reaching that magical 676-character threshold and the queries run slow. Here is an actual query from the website:
exec sp_executesql N'SELECT
[Extent1].[Id] AS [Id],
[Extent1].[ID] AS [Id1],
[Extent1].[Name] AS [Name],
[Extent1].[Code] AS [Code],
[Extent1].[DisplayOrder] AS [DisplayOrder],
[Extent1].[ScreenTypeId] AS [ScreenTypeId],
[Extent1].[Exclude] AS [Exclude],
[Extent1].[Author] AS [Author],
[Extent1].[Editor] AS [Editor],
[Extent1].[Created] AS [Created],
[Extent1].[Modified] AS [Modified],
[Extent2].[Id] AS [Id2],
[Extent2].[Name] AS [Name1],
[Extent2].[ScreenUrl] AS [ScreenUrl],
[Extent2].[Author] AS [Author1],
[Extent2].[Editor] AS [Editor1],
[Extent2].[Created] AS [Created1],
[Extent2].[Modified] AS [Modified1]
FROM [dbo].[ProfitCentre] AS [Extent1]
INNER JOIN [dbo].[ScreenType] AS [Extent2] ON [Extent1].[ScreenTypeId] = [Extent2].[Id]
WHERE [Extent1].[ScreenTypeId] = @p__linq__0',N'@p__linq__0 int',@p__linq__0=16
Why are the queries taking so long when the length of the query text changes and how can I fix it?
The issue in this case was due to the way the virtual machines were set up in the host environment. I don't deal with the infrastructure, so I'm not sure of the specific details, but the infrastructure people shifted the VMs into the appropriate cluster and that resolved the performance problem.

Connection drop from PostgreSQL on Azure virtual machine

I am a bit new to PostgreSQL. I have set up my PostgreSQL DB on the Azure cloud.
It's an Ubuntu 18.04 LTS machine (4 vCPU, 8 GB RAM) running PostgreSQL 9.6.
The problem is that when the connection to the PostgreSQL DB stays idle for some time, say 2 to 10 minutes, the connection stops responding: it doesn't fulfil the request and the query just keeps processing.
The same goes for my Java Spring Boot application: the connection doesn't respond and the query keeps processing.
This happens randomly, so the timing is not traceable: sometimes it happens after 2 minutes, sometimes after 10 minutes, and sometimes not at all.
I have tried the PostgreSQL configuration file parameters tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count, as well as statement_timeout and session_timeout, but nothing changes.
Any suggestion or help would be appreciated.
Thank you.
If you are setting up a PostgreSQL DB connection on an Azure VM, you have to be aware that there are inbound and outbound connection timeouts. According to https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections#idletimeout, outbound connections have a 4-minute idle timeout, and this timeout is not adjustable. The inbound timeout can be changed in the Azure Portal.
We ran into a similar issue and were able to resolve it on the client side. We changed the Spring Boot default Hikari configuration as follows (timeouts are in milliseconds):
hikari:
  connection-timeout: 20000
  validation-timeout: 20000
  idle-timeout: 30000              # retire idle connections after 30 s
  max-lifetime: 40000              # recycle every connection after 40 s, well inside Azure's 4-minute idle timeout
  minimum-idle: 1
  maximum-pool-size: 3
  connection-test-query: SELECT 1
  connection-init-sql: SELECT 1
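Alternatively, the same idle-timeout problem can be attacked from the server side by making PostgreSQL send TCP keepalives well inside the 4-minute window; a sketch with illustrative values, using the parameters the question already mentions (ALTER SYSTEM works on 9.6):
ALTER SYSTEM SET tcp_keepalives_idle = 120;      -- start probing after 2 minutes of idle time
ALTER SYSTEM SET tcp_keepalives_interval = 30;   -- then probe every 30 seconds
ALTER SYSTEM SET tcp_keepalives_count = 3;       -- drop the connection after 3 unanswered probes
SELECT pg_reload_conf();                         -- apply without restarting the server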

MS SQL Server failover

We have a Spring Java application that connects to a two-node MS SQL Server cluster (2016 SP2, Standard edition).
We are testing failover: if a node fails, the application needs 90 seconds before reconnecting to the other node, which would be too much for production.
After reading and re-reading the HikariCP documentation for Java, I tried to reproduce the scenario with DataGrip: I ran a long workload (inserting a row into a table every 500 ms for 10 minutes) and got the same issue: the database was unavailable for 90 seconds after one node failed.
Maybe the issue is on the cluster side and not the application side...
Is there any SQL Server cluster configuration that prevents us from reconnecting before 90 seconds?
How can the connection come back before those 90 seconds? Is there any caching or default configuration that we should update?
Thanks a lot for your help
EDIT
The test was wrong; I have updated the issue I am getting in the comments:
it reconnects as soon as the first node is back. The issue is after a second failover: no connection can be established at that point (I wait for the two nodes to synchronize before the second failover).
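If this is an Availability Group rather than a failover cluster instance (the post doesn't say, so this is an assumption), a sketch of how to confirm the secondary is really synchronized before issuing the second failover:
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       rs.role_desc,
       rs.operational_state_desc,
       rs.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS rs
JOIN sys.availability_replicas AS ar ON ar.replica_id = rs.replica_id
JOIN sys.availability_groups   AS ag ON ag.group_id   = rs.group_id;
-- Failing over while synchronization_health is not HEALTHY can leave the group
-- in RESOLVING state with no primary to accept connections.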

Windows Server 2012 with ASP.NET MVC application stops working (ESENT errors)

The solution is an ASP.NET MVC application using Entity Framework, hosted in IIS on a Windows Server 2012 R2 Standard VM running in a Hyper-V environment. The same VM also runs SQL Server 2012.
The hosting environment hosts 30 other solutions; there is plenty of free disk space and no known disk problems with either the hosting environment or the VM (chkdsk and sfc have been run on the VM and did not report any problems).
The problem is that the solution/server stops working for short periods of 5-10 minutes, and every time we see event IDs 508/533 from ESENT with a message about writing to "C:\Windows\system32\LogFiles\Sum".
A similar message had been seen for sqlservr, but that was solved by giving Everyone full rights to C:\Windows\system32\LogFiles\Sum.
While the problem persists it affects the whole VM, and sometimes it is not even possible to connect via Remote Desktop.
We have seen a high number of open SQL Server connections when the problem occurs, and prior to introducing caching for a specific Web API method we were actually able to exhaust the SQL Server connection pool. Just in case, we have increased the connection pool from 100 to 200 connections, even though we have not seen that particular problem since introducing the cache.
All DbContext instances are disposed via "using", an ApiController.Dispose override or a Controller.Dispose override, and only one SqlConnection is used (for the logging system).
I suspect the problem lies outside the solution and that the high number of SQL Server connections is related to SQL Server being unable to write to disk.
Below are some Windows Event Log excerpts for three recent "breakdowns", with some additional info about the number of web requests before the problem and after the server automatically recovered.
Any suggestions?
web requests during the 10 minutes right before the problem: 1399
web requests during the first 10 minutes after the server has recovered: 1630
18-03-2015 20:07:20 833 MSSQLSERVER
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Xxx.mdf] in database [Xxx] (5). The OS file handle is 0x0000000000000A7C. The offset of the latest long I/O is: 0x000003e104e000
18-03-2015 20:07:40 833 MSSQLSERVER
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Xxx_log.ldf] in database [Xxx] (5). The OS file handle is 0x0000000000000A8C. The offset of the latest long I/O is: 0x0000007f203000
18-03-2015 20:08:16 533 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 1806336 (0x00000000001b9000) for 4096 (0x00001000) bytes has not completed for 36 second(s). This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
18-03-2015 20:17:14 508 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 1806336 (0x00000000001b9000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (36 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
web requests during the 10 minutes right before the problem: 696
web requests during the first 10 minutes after the server has recovered: 614
19-03-2015 01:17:19 533 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3067904 (0x00000000002ed000) for 4096 (0x00001000) bytes has not completed for 36 second(s). This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
19-03-2015 01:33:02 508 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3067904 (0x00000000002ed000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (983 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
19-03-2015 01:33:03 833 MSSQLSERVER
SQL Server has encountered 5 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Xxx_log.ldf] in database [Xxx] (5). The OS file handle is 0x0000000000000A8C. The offset of the latest long I/O is: 0x000000a389d000
web requests during the 10 minutes right before the problem: 555
web requests during the first 10 minutes after the server has recovered: 784
19-03-2015 03:33:51 833 MSSQLSERVER
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Xxx_log.ldf] in database [Xxx] (5). The OS file handle is 0x0000000000000A8C. The offset of the latest long I/O is: 0x000000aa95f000
19-03-2015 03:40:48 533 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3846144 (0x00000000003ab000) for 4096 (0x00001000) bytes has not completed for 36 second(s). This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
19-03-2015 03:40:48 833 MSSQLSERVER
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MSDBLog.ldf] in database [msdb] (4). The OS file handle is 0x0000000000000A90. The offset of the latest long I/O is: 0x00000000108000
19-03-2015 03:40:49 508 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3846144 (0x00000000003ab000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (36 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
19-03-2015 03:40:49 17894 MSSQLSERVER
Dispatcher (0x1a88) from dispatcher pool 'XE Engine main dispatcher pool' Worker 0x00000000F03B8160 appears to be non-yielding on Node 0. Approx CPU Used: kernel 0 ms, user 0 ms, Interval: 336140.
Disk I/O problems were my initial thought, but the "funny" thing is that it has never happened during peak hours, and during peak hours the server is not stressed on CPU or disk I/O.
I cannot find any VM disk errors. I have no access to the hosting environment, but I am told there are no disk problems. The hosting environment performs VM backups, and if that is the cause there is nothing to do about it, as the backups are required. I might try to have the VM moved to another disk, but I do not know whether that is possible.
We have currently set up some detailed disk I/O monitoring on the VM; hopefully this will give us some information about the problem, but I rather doubt it.
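For that monitoring, a small sketch that reads SQL Server's own per-file I/O counters as a cross-check against the OS-level counters (the values are cumulative since the instance started, so take two samples a few minutes apart and compare):
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_latency_ms DESC;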
Maybe the VM is just "sick" and the next step might be to create a new one from scratch…
It sounds like your disk is just plain overloaded, since I/Os are taking so long. Ideally they should take around 10 milliseconds. Instead, they're taking over 1000x that long.
Since you're running in a VM, though, tracking down the problem can be a bit more tricky. Is it due to the I/O load in the virtual machine, or on the host? Your VM disk may be shared with other I/O load of the host.
Can you move the database to a different volume in the VM, hosted on a different physical spindle of the host?
Another possibility is that the underlying storage is going bad, and the I/Os are being retried by the underlying hardware.
-martin
