DB2OLEDB performance, single- vs. multi-threaded - sql-server

Having an issue with DB2OLEDB performance, using SQL Server 2017 to perform a data load from IBM i 7.3.
The client is a VMware VM; network settings seem OK and have been tuned to the best of my ability (vmxnet3 driver 1.8). Loads from other VMs or from the internet fly at over 100 Mbit/s.
Troubleshooting so far:
DB2OLEDB (Microsoft) performs substantially faster (3-5x) than IBMDASQL.
Setting the I/O affinity mask to one core doubles performance, but additional cores have no impact (see the sketch after this list).
RSS is on.
DB2OLEDB in-process on/off has no effect on throughput, but off introduces substantial spool-up time at the beginning of each query.
Performance is currently around 15 Mbit/s. The same table from another SQL Server (cached) loads about 3x faster at 50+ Mbit/s (different provider, obviously).
Interestingly, enabling the row cache spikes network throughput to 100-150 Mbit/s at the beginning, i.e. I'm inferring that there is plenty of network bandwidth available.
Finally, we are using an in-memory table as the destination in order to eliminate disk I/O as a culprit.
CPU is burning up one core and the remaining cores are at roughly 20%.
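For reference, this is how the I/O affinity experiment was set; a minimal sketch (the mask value 1 pins disk I/O to core 0 and is just an example, and this particular setting only takes effect after a service restart):

    -- Expose the advanced options.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Pin disk I/O handling to a single core (1 = core 0; illustrative value).
    -- Takes effect after a restart of the SQL Server service.
    EXEC sp_configure 'affinity I/O mask', 1;
    RECONFIGURE;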
Any thoughts?
I suspect that DB2OLEDB driver or some part of COM is the bottleneck at this point.
edit: @MandyShaw (too long for a comment) Windows side. The IBM i never breaks 1% for my particular workload, and it generally runs at 25%-50% load depending on time of day. SQL statements are varied, everything from a straight four-part query to a 7-table snowflake as a passthrough. One interesting thing: network throughput varies based on row length. Wider tables appear to pump at roughly the same row rate as thinner tables. This is true for both the IBM and Microsoft drivers. Reducing network latency had a great impact on performance (see RSC issues with the vmxnet3 driver 1.6.6.0). If I understand correctly, the OLEDB drivers fetch one row at a time (except possibly when loading the rowset cache).
In other words, for every row we're issuing a request from SQL Server to the COM/OLEDB driver, to the supervisor network driver, to the hypervisor network driver, to the physical NIC, through fiber, landing at the IBM i, and then back again. We have successfully been able to multiplex large table loads using Service Broker (but this is impractical for most applications). This, as well as other metrics, suggests that the IBM i has plenty of CPU and bandwidth to spare. The fiber fabric is mostly idle, and we've tuned the bejeezus out of the hypervisor (VMware) and the supervisor (TCP/IP stack) as well as the SQL Server itself.
This is why I'm looking at the COM/OLEDB provider for answers. Something in this model seems to stink. It's either not configured properly or simply doesn't support multiple threads of execution.
I'm also willing to accept that it's something in SQL Server, but for the life of me I can't find a way to make a linked server query run multi-threaded using any combination of configuration, options or hints. Again, it may just not be possible by design.
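For context, a rough sketch of the multiplexing idea mentioned above, done with two plain sessions instead of Service Broker; LINKEDSRV, MYLIB.MYTABLE and the ID split point are all placeholders. Each INSERT runs on its own connection, so each pull gets its own provider thread:

    -- Session 1: lower half of the key range.
    INSERT INTO dbo.Staging_A WITH (TABLOCK)
    SELECT *
    FROM OPENQUERY(LINKEDSRV,
        'SELECT * FROM MYLIB.MYTABLE WHERE ID < 500000');

    -- Session 2 (a separate connection, run concurrently): upper half.
    INSERT INTO dbo.Staging_B WITH (TABLOCK)
    SELECT *
    FROM OPENQUERY(LINKEDSRV,
        'SELECT * FROM MYLIB.MYTABLE WHERE ID >= 500000');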
At this point in time, the few known leads that I have involve (1) tuning network interrupt request coalescing and frequency to minimize interrupts to the OLEDB driver thread and (2) throwing the HIS gateway on a consumer x86 box with a high single core frequency (5ghz) in order to maximize single threaded performance.
These are both shitty options.
If you've got something particular in mind with the EBCDIC/ASCII conversion performance, I'd be happy to try it and report back. Shoot me a link/info.

Related

SQL Server 2008 sudden high IO Stall and queries dead in water

SQL Server 2008 Enterprise SP4 10.0.6547.0 x64
Running on Windows 2012R2 patched current.
A VM running on Cisco UCS blades under VMware ESXi 6.0 Update 3 plus patches.
A Nimble CS700 SAN for the storage.
This is a large OLTP server with 12 vCPU. Normal CPU usage hovers around 6-11%
What happens is that, without warning, the IO stall times will go through the roof (1000-2000 ms) and most queries will stop returning results. Adam Machanic's sp_whoisactive will show dozens of active queries. CPU is at 90+%.
SAN shows almost zero activity and all other VMs on the same SAN are operating optimally.
We see massive blocking as the stalled processes hold locks, with some timing out and sleeping with locks still hanging on the SPID. Killing the SPIDs in question provides temporary relief, but seconds later we are right back where we started.
The only thing that provides relief is a reboot of the server.
Management is rightly demanding an actual root cause. When this happened last summer, with visibility to the CEO level, we engaged Microsoft support, who were dumbfounded and offered no actual root cause.
What I can't do is upgrade the SQL server. The machine hosts a packaged application and the package publisher refuses to support their software if we implement any newer SQL Server version. I desperately want to go to 2014/2016/2017, and would feel that it would solve this problem and others.
In any event, I searched the bug reports and did not see anything that matched.
Has anyone run into this issue? If so did you suss out a root cause? I have a gut feel that there is a bug in either SQL 2008, Windows 2012R2 or how they interact. But I don't want to write that into the RCA without having some corroboration.
Would appreciate any pointers.
Here is my approach
1.) Try to eliminate storage issues. We once had a storage issue (SAN) and the root cause appeared to be an HBA. You can further check whether your storage is performing within acceptable limits.
You should start with the counters below and see if they are less than 15 ms:
Avg. Disk sec/Read - is the average time, in seconds, of a read of data from the disk.
Avg. Disk sec/Write - is the average time, in seconds, of a write of data to the disk.
There is more info here: https://www.mssqltips.com/sqlservertip/2460/perfmon-counters-to-identify-sql-server-disk-bottlenecks/
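If you prefer to check from inside SQL Server rather than perfmon, here is a minimal sketch using sys.dm_io_virtual_file_stats (the stall numbers are cumulative since the last instance restart):

    -- Average read/write latency (ms) per database file since instance start.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY avg_read_ms DESC;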
2.) Once you have eliminated storage issues, you can further check whether SQL Server is the only thing causing IO spikes or whether other applications are causing IO. You can use Resource Monitor to find this.
3.) If you have reached this point, SQL Server may be the culprit. Work through the steps below in the same sequence and see whether the problem persists after each one.
Remember that high IO can be caused by:
Stale stats and missing indexes: you might not be updating stats regularly, or some types of queries might need more frequent index rebuilds/stats updates.
Gather the queries causing high IO and try tuning them; you can observe the number of reads performed and try adding indexes to minimize those reads.
Further, check memory pressure; sometimes high memory usage can flush the buffer pool, and thereby queries will go to disk. You can look at the Page Life Expectancy (PLE) counter and see what is good for your environment (see the sketch below).
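A sketch of both checks; the TOP count is arbitrary:

    -- Top 10 cached statements by total logical reads.
    SELECT TOP (10)
           qs.total_logical_reads,
           qs.execution_count,
           st.text AS query_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;

    -- Page Life Expectancy in seconds; what counts as "good" depends on RAM size.
    SELECT cntr_value AS page_life_expectancy_sec
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
      AND object_name LIKE '%Buffer Manager%';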
Further research pointed to VMware. The machine was allocated 304 GB of RAM, 264 GB of which was assigned to SQL Server. However, the underlying host was overcommitted on RAM by a large amount. We suspect thrashing as Page Life Expectancy drops and other VMs also demand real RAM.
Thanks
John.

Dynamics GP 2010 Awful Report Performance

We are running Dynamics GP 2010 on 2 load-balanced Citrix servers. For the past 3 weeks we have had severe performance hits when users run Fixed Assets reporting.
The database is large in size, but when I run the reports locally on the SQL server, they run great. The SQL server seems to be performing adequately even when users are seeing slow performance.
Any ideas?
Just because your DB seems unstressed does not mean that it is fine. It could contain other bottlenecks. Typically, if a DB server is not maxing out its CPUs occasionally, it means there is a much bigger problem.
The standard process for troubleshooting performance problems on a data-driven app goes like this:
Tune DB indexes. The Tuning Wizard in SSMS is a great starting point if you haven't tried it yet.
Check resource utilization: CPU, RAM (see the sketch after this list). If your CPU is maxed out, consider adding/upgrading CPU, optimizing code, or splitting your tiers. If your RAM is maxed out, consider adding RAM or splitting your tiers.
Check HDD usage: if your queue length goes above 1 very often (more than once per 10 seconds), upgrade disk bandwidth or scale out your disks (RAID, multiple MDF/LDFs, DB partitioning).
Check network bandwidth
Check for problems on your app (Dynamics) server
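For step 2, a quick way to check CPU and RAM pressure from the database side; a sketch (needs VIEW SERVER STATE permission):

    -- Physical memory as the OS reports it to SQL Server.
    SELECT total_physical_memory_kb,
           available_physical_memory_kb,
           system_memory_state_desc
    FROM sys.dm_os_sys_memory;

    -- Schedulers with queued runnable tasks indicate CPU pressure.
    SELECT scheduler_id, current_tasks_count, runnable_tasks_count
    FROM sys.dm_os_schedulers
    WHERE scheduler_id < 255;  -- user schedulers only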
Shared report dictionaries are the bane of reporting in GP. They do tend to slow things down. Also, modifying reports becomes impossible, as somebody always has the dictionary open.
Use local report dictionaries and have a system to keep them synced with a "master" reports.dic.

Buffering question in Microsoft SQL Server

I'd like to know how to configure Microsoft SQL Server to work in the following manner:
All db writes are "write behind", all of the queries operate primarily out of the RAM cache (for speed), i.e. it persists the data to the hard drive at its leisure, in the background.
The reason? Speed. We assume 99.99% reliability of the underlying machine (it's an Amazon EC2 instance), so we don't mind caching all of the data in RAM (and even if there is a failure, we can just rebuild the database ourselves from 3rd party data sources).
For example:
User 1 writes data packet X to the database.
User 2 queries this same data packet X, 2ms later.
User 2 should see data packet X, as SQL will serve it straight out of its RAM cache (even if data packet X hasn't been persisted to the hard drive).
Data packet X will be persisted to the hard drive at leisure, maybe 500ms later.
If you have large amounts of memory and have set a high Min Server Memory setting in your SQL Server instance, then SQL will attempt to maximize its use.
The checkpoint process is what forces dirty pages to be written to disk (this happens automatically but can also be forced manually), so you might want to have a read of the following.
http://msdn.microsoft.com/en-us/library/ms188748.aspx
This subject is quite involved and can be affected by the hardware and solution you are using. For instance, virtualization brings a whole raft of other considerations.
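To make those two knobs concrete, a minimal sketch; the 8192 MB value is only an example, size it to your instance:

    -- Reserve a floor of memory so the buffer pool keeps pages cached in RAM.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'min server memory (MB)', 8192;
    RECONFIGURE;

    -- Dirty pages are flushed in the background by automatic checkpoints,
    -- but you can force a flush at any time:
    CHECKPOINT;

Bear in mind that committed writes still go to the transaction log synchronously; only the data-page writes are deferred, which is exactly the write-behind behavior described above.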

When can I host IIS and SQL Server on the same machine?

I've read that it's unwise to install SQL Server and IIS on the same machine, but I haven't seen any evidence for that. Has anybody tried this, and if so, what were the results? At what point is it necessary to separate them? Is any tuning necessary? I'm concerned specifically with IIS7 and SQL Server 2008.
If somebody can provide numbers showing when it makes more sense to go to two machines, that would be most helpful.
It is unwise to run SQL Server with any other product, including another instance of SQL Server. The reason for this recommendation is the nature of how SQL Server uses OS resources. SQL Server runs on a user-mode memory management and processor scheduling infrastructure called SQLOS. SQL Server is designed to run at peak performance and assumes that it is the only server on the OS. As such, SQLOS reserves all RAM on the machine for the SQL process and creates a scheduler for each CPU core, allocating tasks across all schedulers and using all the CPU it can get, when it needs it. Because SQL reserves all memory, other processes that need memory will cause SQL to see memory pressure, and the response to memory pressure is to evict pages from the buffer pool and compiled plans from the plan cache. And since SQL is the only server that actually leverages the memory notification API (there are rumors that the next Exchange will too), SQL is the only process that actually shrinks to give room to other processes (like leaky, buggy ASP pools). This behavior is also explained in BOL: Dynamic Memory Management.
A similar pattern happens with CPU scheduling, where other processes steal CPU time from the SQL schedulers. On high-end systems and on Opteron machines things get worse, because SQL uses NUMA locality to full advantage, but other processes are usually not NUMA-aware and, as much as the OS tries to preserve locality of allocations, they end up allocating all over the physical RAM and reduce the overall throughput of the system as the CPUs idle waiting for cross-NUMA-boundary page access. There are other things to consider too, like the increase in TLB and L2 misses due to other processes taking up CPU cycles.
So to sum up, you can run other servers with SQL Server, but it is not recommended. If you must, then make sure you isolate the two servers to the best of your ability: use CPU affinity masks for both SQL and IIS/ASP to isolate the two on separate cores, configure SQL to reserve less RAM so that it leaves free memory for IIS/ASP, and configure your app pools to recycle aggressively to prevent application pool growth. A sketch of the SQL side follows.
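A minimal sketch of the SQL-side settings, assuming an 8-core box where cores 6-7 are left for IIS; the mask and memory values are examples only:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Cap the buffer pool so IIS/ASP has headroom (example: 12 GB on a 16 GB box).
    EXEC sp_configure 'max server memory (MB)', 12288;
    RECONFIGURE;

    -- Bind SQL Server's schedulers to cores 0-5 (63 = binary 00111111),
    -- leaving cores 6-7 free for IIS.
    EXEC sp_configure 'affinity mask', 63;
    RECONFIGURE;

The IIS side (application pool processor affinity and recycling) is configured separately, in IIS Manager or applicationHost.config.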
Yes, it is possible and many do it.
It tends to be a question of security and/or performance.
Security is questioned as your attack surface is increased on a box that has both. Perhaps not an issue for you.
Performance is questioned as now your server is serving web and DB requests. Again, perhaps not an issue in your case.
Test vs. Production....
Many may feel fine in test environments but not production....
Again, your team's call. I like my test and production environments being as similar as possible if possible but that's my preference.
It's possible, yes.
A good idea for a production environment, no.
The problem that you're going to run into is that a SQL Server database under substantial load is, more than likely, going to be doing heavy disk I/O and have a large memory footprint. That combination is going to tie up the machine, and you're going to see a performance hit in IIS as it tries to serve up the pages.
It's unwise in certain contexts... totally wise in others.
If your machine is underutilized and won't experience heavy loads, then there is an advantage to installing the database on the same machine, because you simply won't have to transfer anything across the network.
On the other hand, if one or both of IIS or the database will be under heavy load, they will likely start to interfere, and the performance gain of dedicated hardware for each will probably outstrip the loss of having to go over the network.
Don't forget the maintenance issue... you can't reboot/patch one without nuking the other. If they are on two boxes, you can give your users a better experience than no response from the web server while you are maintaining the SQL box.
Not highest on the list, but should be noted.
You certainly can. You will run into performance issues if, for example, you have a large user base or there are a lot of heavy queries being run against the DB. I have worked on several sites, usually hosted at 1and1, that run IIS and SQL Server (Express!) on the same box with thousands of users (hundreds concurrent) and millions of records in poorly designed tables, accessed via poorly written stored procedures, and the user experience was certainly tolerable. It all comes down to how hard you plan on hitting the server.

SQL Server Performance Problem

Our primary database server is an 8-core box with 8 GB of RAM. The CPU is a Xeon E7330 @ 2.4 GHz. It runs Windows Server 2003 R2 (x64 edition) and SQL Server 2005.
I wanted to do some testing, so I set up SQL Server 2005 on another brand-new server, which is an 8-core box with 4 GB of RAM. It has a Xeon X5460 @ 3.16 GHz and runs Windows Server 2003 R2 Standard. I installed SQL Server 2005 out of the box, restored a backup of the primary database onto it, and ran UPDATE STATISTICS on all the tables.
The process I was testing executes the same stored proc many times. I was astounded to find from the profiler that this proc, which executes with duration=0 or 1 on the primary server, was consistently executing with durations in excess of 130. This essentially makes the secondary server useless for testing, because it's just too slow.
No other apps run on either of these two boxes, just SQL server. And unlike the primary database server, the test server only had me accessing it.
I can't believe the difference in spec between these two machines explains this colossal difference in performance. Can anybody suggest any settings I may need to change?
Updates in answers to questions:
Second server is 32 bit Windows
I'm inquiring now about the disk arrays and how comparable they are
On the primary server, the data and logs are on the same drive (!) and it works fine
Looking in task manager on the test server, the CPU is running at like 10%, only one core even showing activity
Task manager on the test server (4GB RAM) shows "PF Usage 2.01GB" with SQL Server running. On the primary server (8GB RAM) it shows "PF Usage 6.67GB". How would I make SQL Server on the test box use more of the RAM? Maybe that would make a difference
Another update:
The primary server has a RAID-5 with 15,000 RPM drives. The test box has a RAID-5 with 10,000 RPM drives.
A 32-bit OS means a 2 GB virtual address space for your processes, and a Standard edition OS means no AWE extensions either. So your test machine will be severely RAM-deprived compared with the production one. Your buffer pool will suffer from premature eviction of pages, your execution plans will not have the option to choose hash joins for a lot of queries, and so on and so forth. I doubt this explains the entire difference; I'm sure there must be something more at play. You say only 10% CPU usage during the query: is your MAXDOP setting 1 by any chance on the test server? Have you compared the output of sp_configure on the two machines? (Make sure you enable 'advanced options' too.)
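For example, run this on both servers and diff the results:

    -- List all configuration options, including 'max degree of parallelism'.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure;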
Can you run the same problem query on the two machines, from an SSMS query window, with SET STATISTICS IO ON and SET STATISTICS TIME ON? Run it 2-3 times on each and write down the results. Does it show the same number of logical reads but a vastly different number of physical reads? That would point to the RAM being insufficient to cache the needed pages. Is the number of logical reads very different? It probably means you're getting a bad execution plan on test.
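Something like this, with the procedure name a placeholder for your actual proc:

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    EXEC dbo.MyProblemProc;  -- hypothetical name; substitute your stored proc

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;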
Is the query write-intensive by any chance? If so, did you pre-grow the test database, or is your execution blocked by log-growth and database-growth events?
There are plenty of places to look at to narrow down the issue, like SQL performance counters, sys.dm_os_wait_stats, check the sys.dm_exec_requests wait_type and wait_resource.
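A quick sketch of those DMV checks:

    -- Top waits accumulated since instance start.
    SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;

    -- What currently executing requests are waiting on right now.
    SELECT session_id, status, wait_type, wait_resource, blocking_session_id
    FROM sys.dm_exec_requests
    WHERE session_id > 50;  -- skip system sessions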
Was the data in the memory cache yet, or was it all read from disk?
You either have a different plan being generated or some hardware differences. For hardware, you can check disk seconds/[read,write] (edit to clarify: you do this in perfmon) and see if you have some massive differences from caching (e.g. a high-performance RAID controller).
For the plan difference just check out the execution plans.
Also do SET STATISTICS IO ON and see if you are getting physical reads instead of logical reads. Maybe the memory difference is keeping your dataset from fitting in memory on the secondary but not the primary machine.
Although you may not be able to use AWE on your 32-bit server, you can provide SQL Server with a little more memory by adding the /3GB switch to the boot.ini file. Check out Books Online; it should give you more information.