MSSQL on ECS with EFS (NFS) Drive Performance?

I have a single container hosting an MSSQL instance. I'm streaming a ~90 MB gzipped CSV from S3 via a Lambda and writing a large number of inserts in a single transaction.
So far the Lambda has taken 5 minutes, but I have larger files than this and will likely exceed the 15-minute Lambda timeout.
I suspect that the EFS drive mounted inside the Docker container is an I/O bottleneck. Has anyone run into this before? I'm also unsure what would happen if ECS started a second container, with both MSSQL instances using the same data folder.
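For context, the Lambda is doing roughly the following. This is a minimal sketch assuming boto3 and pyodbc; the bucket, key, table, column count, and connection string are placeholders, not the real values:

```python
import csv
import gzip
import io

import boto3
import pyodbc

def handler(event, context):
    # Stream the gzipped CSV from S3 without loading it all into memory.
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket="my-bucket", Key="data/big.csv.gz")["Body"]
    reader = csv.reader(
        io.TextIOWrapper(gzip.GzipFile(fileobj=body), encoding="utf-8")
    )

    # Placeholder connection string.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=mssql.internal;"
        "DATABASE=mydb;UID=user;PWD=secret"
    )
    cur = conn.cursor()
    cur.fast_executemany = True  # send parameter batches, not row-by-row

    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= 10_000:
            cur.executemany("INSERT INTO my_table VALUES (?, ?, ?)", batch)
            batch.clear()
    if batch:
        cur.executemany("INSERT INTO my_table VALUES (?, ?, ?)", batch)
    conn.commit()  # one transaction for the whole file
```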

Related

How to access a large amount of CSV file data from an AWS EC2 instance in multiple Spot Instances?

I have 5 years of stock market data and strategies that I regularly run on my local machine, but my local machine takes so much time to process that data that I decided to process it on an AWS server.
So I did a little research on EC2 instances.
I found that I can store my raw stock market data on an EC2 instance and install Jupyter Notebook and the Python modules there,
and then run/process my data on multiple Spot Instances, which takes less processing time and is also more cost-efficient.
But I don't understand how I can access the data and connect the EC2 instance and the Spot Instances to each other.
Can someone please help me?
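A common pattern for this is to keep the raw data in S3 rather than on any one instance, so every Spot worker can pull it independently. A minimal boto3 sketch, with placeholder bucket and file names:

```python
import boto3

s3 = boto3.client("s3")

# On the instance that holds the raw data: push it to S3 once.
s3.upload_file("/data/stocks_raw.csv", "my-market-data", "raw/stocks.csv")

# On each Spot worker: pull the file (or a partition of it) before processing.
s3.download_file("my-market-data", "raw/stocks.csv", "/tmp/stocks.csv")
```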

Requests from docker container in Google Cloud Run Service to Google Cloud SQL take up to 2 minutes

I'm using the Google Cloud Run service to host my Spring application in a Docker container. The database is running in the Google Cloud SQL service. My problem is that requests from the application to the database can take up to 2 minutes. See the Google Cloud Run log (the long requests are painted yellow). And here are the Dockerfile and Docker Compose file.
The database is quite empty; it contains about 20 tables, but each contains only a few rows, so no request is bigger than a few kB. To make it stranger, after re-deploying the application the requests are fast again, but after a few minutes, hours, or even a whole day the requests slow down again. When I start the application on my local machine the requests are always fast (to both my local SQL and the Google SQL instance); I've never had a slow connection there. All actions within my application that don't require any DB request are still fast and take only a few ms.
Both services are running in the same region (europe-west), CPU usage of the Run service is never higher than 15%, and that of Google SQL never above 3%. The Google SQL instance uses 1 CPU and 3.75 GB, and the Cloud Run service has 4 GB RAM and 2 CPUs. But increasing the power of the Cloud Run service and Google SQL doesn't improve the request latency. Google Cloud SQL is using MySQL 5.7 (like my local DB).
Looking at the logs, only warnings are shown in the filtered Google SQL log (I really don't know why this happens). Additionally, here are my DB connection settings in the Spring config. But I don't think this has any impact; the config works perfectly when connecting my local application to my local SQL instance or to the Google SQL instance.
But maybe one of you has an idea?
While not a real answer, there is a bug filed at Google that is tracking the issue:
https://issuetracker.google.com/issues/186313545
This is really hurting our customers' experience and makes us lose trust in the service quality of Cloud Run. Even more so when there is no feedback from Google to tell us whether they are even addressing the issue.
Edit:
The issue now seems to be resolved, according to the interactions in https://issuetracker.google.com/issues/186313545

Nifi ExecuteSQLRecord: how to capture performance bottleneck

Objective
I have an Apache NiFi Docker container on an Azure VM with an attached premium, very high-throughput SSD disk. I have an MSSQL Server 2012 database on AWS. NiFi-to-database communication happens through the MSSQL JDBC jar v6.2, over a high-throughput AWS Direct Connect MPLS network.
Within the NiFi flow only one processor is executed: ExecuteSQLRecord. It uses only one thread/CPU and has 4 GB of JVM heap space available. ExecuteSQLRecord executes a query that returns 1 million rows, which equals a 60 MB FlowFile. The query is covered by table indexes, so there is nothing to optimize on the DB side. The query looks like: SELECT * FROM table WHERE id BETWEEN x AND y.
The issue
ExecuteSQLRecord, with 1 thread/CPU and 1 query, retrieves 1M rows (60 MB) in 40 seconds.
At the same time, the same query run from SSMS on the database's internal network takes 18 seconds.
The query is already optimized on the DB side (with indexes), and throughput scales linearly with an increasing number of threads/CPUs, so the network is not a bottleneck.
Questions
Is this performance okay for NiFi with 1 CPU? Is it okay that NiFi spends 22 seconds (out of 40) on retrieving and storing the results in the Content Repository?
How does NiFi pull the data from MSSQL Server? Is this a pull approach? If so, maybe we have too many roundtrips?
How can I check how much time NiFi spends on converting the result set to CSV, and how much on writing to the Content Repository?
Are you using the latest Docker image (1.11.4)? If so, you should be able to set the fetch size on the ExecuteSQLRecord processor (https://issues.apache.org/jira/browse/NIFI-6865).
I got a couple of different results when I searched for the default fetch size for the MSSQL driver; one site said 1 and another said 32. In your case, for that many records, I'd imagine you'd want it to be much higher (see https://learn.microsoft.com/en-us/previous-versions/sql/legacy/aa342344(v=sql.90)?redirectedfrom=MSDN#use-the-appropriate-fetch-size for setting the appropriate fetch size).
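In NiFi the fetch size is just a processor property, but you can sanity-check how much it matters with a standalone script. A rough sketch using pyodbc (connection string and table are placeholders, and exact batching behaviour depends on the ODBC driver):

```python
import time

import pyodbc

# Placeholder connection string; not from the original question.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myhost;"
    "DATABASE=mydb;UID=user;PWD=secret"
)

for fetch_size in (32, 1_000, 10_000):
    cur = conn.cursor()
    cur.arraysize = fetch_size  # rows requested per fetchmany() call
    start = time.monotonic()
    cur.execute("SELECT * FROM my_table WHERE id BETWEEN 1 AND 1000000")
    total = 0
    while True:
        rows = cur.fetchmany()  # uses cur.arraysize as the batch size
        if not rows:
            break
        total += len(rows)
    print(f"fetch size {fetch_size}: {total} rows in "
          f"{time.monotonic() - start:.1f}s")
```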
To add to Matt's answer, you can examine the provenance data for each flowfile and see the lineage duration (amount of time) it spent in each segment of the flow. You can also see the status history for every processor, so you can examine the data in/out by size and number of flowfiles, CPU usage, etc. for each processor.

Weird SQL Server slowdown when transferring files to AWS S3

I'm an experienced SQL Server DBA, but I just can't figure this out, and I'm wondering if anyone has experienced the same thing.
I have a 1 TB database running on an m4.10xlarge AWS instance. I have Full/Diff/Trans backups that happen on various schedules to a separate drive that's an "sc1" AWS Volume Type -- basically the cheap cold HDD storage type.
As an additional failsafe, I started experimenting with copying the files on this backup drive to S3. This is where my trouble and this great mystery began. I'm using the AWS CLI "sync" command to transfer files from the sc1 volume to S3, using a simple command like:
aws s3 sync "j:\sqlbackups" s3://****bucketname****/sqlbackups/
Because of the size of the backup files, an initial sync can take a few hours. Whenever I have this command running, at some point SQL Server grinds down to being ridiculously slow. Queries time out. Connections can't be established. CPU/memory usage remains the same -- nothing out of the ordinary there. The only thing that's out of the ordinary is that the Read Throughput on the drive containing the data file spikes:
https://www.dropbox.com/s/avqcyug700jdjzw/Screenshot%202019-12-08%2015.07.45.png?dl=0
BUT there's no relation between this drive (the one with the SQL Server data file) and the sc1 backup drive. And the aws sync command shouldn't touch the drive with the data file.
It's not just SQL Server that slows down -- the whole Windows server that it's on slows down. Launching Chrome takes 30 seconds instead of 1. Reading an event in Event Viewer takes 60 seconds instead of <1 second.
Once this odd slowdown happens, I kill the aws sync command, but the problem remains. I have to stop/restart the SQL Server service for the problem to go away.
I'm so confused as to how this could be happening. Any ideas?
EDIT 12/8/19 5:39PM EST
I've realized that it's not just the drive containing the data file whose throughput goes off the charts during an S3 transfer -- it's all the drives on the server. This led me to read about how EBS storage works, and I understand that EBS storage is network-attached storage and therefore uses network bandwidth.
This makes me think that the transfer of multi-TB files to S3 is saturating my network bandwidth, making my C/D/E drives perform really poorly. That would explain why every part of the Windows server, not just SQL Server, slows down during an S3 transfer.
What doesn't make sense is that, as an m4.10xlarge instance, this server has a 10 Gbps connection, which means more than 1 gigabyte/second of bandwidth. My S3 transfers show a maximum usage of only 150 megabytes/second. Plus, the m4.10xlarge instance type is EBS-optimized, meaning there's dedicated bandwidth available for the EBS volumes that shouldn't conflict with the bandwidth used for the S3 transfer.
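If network saturation is the suspect, one way to test it is to cap the transfer rate and see whether the slowdown disappears. A minimal sketch using boto3's TransferConfig (bucket, file name, and the 50 MB/s cap are placeholder choices, not values from the question):

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Cap the upload at ~50 MB/s so the S3 transfer can't saturate the
# instance's network link.
config = TransferConfig(max_bandwidth=50 * 1024 * 1024)

s3 = boto3.client("s3")
s3.upload_file(
    r"j:\sqlbackups\full_backup.bak",  # placeholder backup file
    "my-bucket",
    "sqlbackups/full_backup.bak",
    Config=config,
)
```

The AWS CLI has an equivalent setting for the sync command: `aws configure set default.s3.max_bandwidth 50MB/s`.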

Docker container IO performance

I am trying to investigate the I/O performance overhead of Docker, so I created a MySQL Docker container on a specific machine and ran the sysbench MySQL benchmark to measure I/O performance. Sysbench executes read/write transactions over a period of time and then outputs the number of completed transactions and the transactions-per-second rate.
When I run the benchmark on the native machine, I get 779.5 transactions per second. When I run the benchmark in a MySQL container, I get 336 transactions per second, almost half the native rate. Is this a normal performance overhead for Docker? This would be a huge disadvantage for running a database in a container in production systems, especially for I/O- and database-intensive applications.
Are you using a Docker volume for the database files? By default, file writes inside the container will be made to the copy-on-write filesystem, which will be inefficient for database files. Using a volume will mean you can write directly to the host filesystem.
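For example, bind-mounting a host directory over the MySQL data directory so writes bypass the copy-on-write layer. A sketch using the Docker SDK for Python (host path and password are placeholders):

```python
import docker

client = docker.from_env()

# Mount a host directory over the MySQL data dir so database writes
# go straight to the host filesystem, not the container's CoW layer.
client.containers.run(
    "mysql:8",
    detach=True,
    environment={"MYSQL_ROOT_PASSWORD": "secret"},
    volumes={"/data/mysql": {"bind": "/var/lib/mysql", "mode": "rw"}},
)
```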
