I'm using an App Engine task handler to process a workload (importing files into a database).
Looking at my Cloud SQL monitoring, I see that after a few minutes the write rate declines (see picture) and my task runs much slower. Does Google throttle the instance's CPU, or might there be other reasons?
According to the documentation https://cloud.google.com/compute/docs/cpu-platforms all cores have a turbo frequency, although it is not guaranteed:
All-core turbo frequency: The frequency at which each CPU typically
runs when all cores in the socket are not idle at the same time.
This post explains how you can monitor your CPU speed: https://askubuntu.com/questions/218567/any-way-to-check-the-clock-speed-of-my-processor
You can SSH into the machine and monitor it in real time.
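A minimal sketch of the same idea in Python, assuming a Linux guest where per-core frequencies are exposed in /proc/cpuinfo:

```python
# Watch the current per-core clock speed on a Linux VM, roughly the same
# idea as the /proc/cpuinfo approach described in the linked post.
import re
import time

while True:
    with open("/proc/cpuinfo") as f:
        mhz = re.findall(r"cpu MHz\s*:\s*([\d.]+)", f.read())
    print("per-core MHz:", ", ".join(f"{float(m):.0f}" for m in mhz))
    time.sleep(1)
```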
Most services, including Cloud SQL, have an IOPS quota that is based on disk size and other factors.
Your screenshot indicates that you have exceeded the read quota for Cloud SQL. The result is throttling of disk I/O.
When you created the Cloud SQL instance, you selected a very small storage disk. I recommend resizing that disk larger so that normal operations do not exceed the disk IOPS quota for either reads or writes.
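As a back-of-the-envelope illustration of how that quota scales with disk size (the 30 IOPS per GB figure and the cap are assumptions taken from the SSD persistent disk documentation; check the current Cloud SQL limits for your instance type):

```python
# Rough estimate of the per-disk IOPS ceiling as a function of disk size.
# 30 IOPS/GB and the 25,000 cap are assumed values; verify against the
# current Cloud SQL / persistent disk documentation.
def estimated_iops_limit(disk_size_gb, iops_per_gb=30, cap=25_000):
    return min(disk_size_gb * iops_per_gb, cap)

for size_gb in (10, 50, 100, 500, 1000):
    print(f"{size_gb:>5} GB disk -> ~{estimated_iops_limit(size_gb):,} IOPS")
```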
Related
I have 13 million documents on Azure Blob Storage that I can azcopy to my desktop within 24 hours. However, as soon as I try to transfer these files to my external hard drive, the time needed to complete the transfer jumps to 60 days. The files aren't large - about 100 KB each - so the entire transfer is about 1.3 TB. I have tried:
Zipping the files, transfer, unzip. Problem: Unzipping takes just as long
azcopy directly into the SSD hard drive
robocopy files from internal to external drive
Simple ctrl-c ctrl-v.
Each of the above options takes months to complete the transfer. Any ideas on how to speed this up? Why would azcopy be so much faster to an internal drive than to an external one?
There could be several reasons for the performance issue.
You can run a performance benchmark test on specific blob containers or file shares to view general performance statistics and to identify performance bottlenecks. You can run the test by uploading or downloading generated test data.
Use the following command to run a performance benchmark test.
Syntax
azcopy benchmark 'https://<storage-account-name>.blob.core.windows.net/<container-name>'
Optimize the performance of AzCopy with Azure Storage
There are several options for transferring data to and from Azure, depending on your needs: Transfer data to and from Azure
Fast Data Transfer is a tool for fast upload of data into Azure – up to 4 terabytes per hour from a single client machine. It moves data from your premises to Blob Storage, to a clustered file system, or direct to an Azure VM. It can also move data between Azure regions.
The tool works by maximizing utilization of the network link. It efficiently uses all available bandwidth, even over long-distance links. On a 10 Gbps link, it reaches around 4 TB per hour, which makes it about 3 to 10 times faster than competing tools we’ve tested. On slower links, Fast Data Transfer typically achieves over 90% of the link’s theoretical maximum, while other tools may achieve substantially less.
For example, on a 250 Mbps link, the theoretical maximum throughput is about 100 GB per hour. Even with no other traffic on the link, other tools may achieve substantially less than that. In the same conditions (250 Mbps, with no competing traffic) Fast Data Transfer can be expected to transfer at least 90 GB per hour. (If there is competing traffic on the link, Fast Data Transfer will reduce its own throughput accordingly, in order to avoid disrupting your existing traffic.)
Fast Data Transfer runs on Windows and Linux. Its client-side portion is a command-line application that runs on-premises, on your own machine. A single client-side instance supports up to 10 Gbps. Its server-side portion runs on Azure VM(s) in your own subscription. Depending on the target speed, between 1 and 4 Azure VMs are required. An Azure Resource Manager template is supplied to automatically create the necessary VM(s).
Fast Data Transfer is particularly well suited when:
Your files are very small (e.g. each file is only 10s of KB).
You have an ExpressRoute with private peering.
You want to throttle your transfers to use only a set amount of network bandwidth.
You want to load directly to the disk of a destination VM (or to a clustered file system). Most Azure data loading tools can’t send data direct to VMs. Tools such as Robocopy can, but they’re not designed for long-distance links. We have reports of Fast Data Transfer being over 10 times faster.
You are reading from spinning hard disks and want to minimize the overhead of seek times. In our testing, we were able to double disk read performance by following the tuning tips in Fast Data Transfer’s instructions.
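As a rough sanity check on the throughput figures quoted above, here is a small conversion from link speed to GB per hour (decimal units, ignoring protocol overhead):

```python
# Convert a link speed in megabits per second into gigabytes per hour,
# to sanity-check the quoted 250 Mbps and 10 Gbps figures.
def gb_per_hour(link_mbps, efficiency=1.0):
    bytes_per_second = link_mbps * 1_000_000 / 8
    return bytes_per_second * 3600 / 1_000_000_000 * efficiency

print(f"250 Mbps, theoretical: ~{gb_per_hour(250):.0f} GB/hour")            # ~113
print(f"250 Mbps at ~90% use:  ~{gb_per_hour(250, 0.9):.0f} GB/hour")       # ~101
print(f"10 Gbps, theoretical:  ~{gb_per_hour(10_000) / 1000:.1f} TB/hour")  # ~4.5
```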
I have a GAE standard Python app that does some fairly computational processing. I need to complete the processing within the 60 second request time limit, and ideally I'd like to do it faster for a better user experience.
Splitting the work across multiple threads doesn't seem to be a good solution, because the threads would likely run on the same CPU and thus wouldn't give a speed-up.
I was wondering if Google Cloud Functions (GCF) could be used in a similar manner as threads. For example, if I create a GCF to do the processing, split my work into 10 chunks, and make 10 GCF calls in parallel, can I expect to get results 10x faster? (aside from latency and GCF startup costs)
Each function invocation runs in its own server instance, and a function will scale up to 1000 instances to handle concurrent requests in parallel. So yes, you can do this, if you are willing to potentially pay the cold start cost of each server instance as it's allocated for its first request.
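As a sketch of what that fan-out could look like from the GAE side - the function URL, chunking scheme, and payload format are hypothetical placeholders, and retries/error handling are omitted:

```python
# Hypothetical fan-out: split the work into chunks and POST each chunk to a
# Cloud Function, waiting on all responses in parallel.
from concurrent.futures import ThreadPoolExecutor

import requests

FUNCTION_URL = "https://REGION-PROJECT.cloudfunctions.net/process-chunk"  # placeholder

def call_function(chunk):
    resp = requests.post(FUNCTION_URL, json={"items": chunk}, timeout=55)
    resp.raise_for_status()
    return resp.json()

def fan_out(items, parts=10):
    chunks = [items[i::parts] for i in range(parts)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return list(pool.map(call_function, chunks))
```

Waiting on the ten outbound requests is I/O-bound, so doing it from threads inside the GAE handler is cheap; the heavy computation happens inside the function instances.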
If you're able to split the workload into smaller chunks that you'd launch in parallel via separate (external) requests, I suspect you'd get better performance (and cost) by using GAE itself (maybe in a separate service) instead of CFs:
GAE standard environment instances can have higher CPU speeds - a B8 instance has 4.8 GHz, while the max CF CPU speed is 2.4 GHz
you have better control over the GAE scaling configuration and startup time penalties
I suspect networking delays would be at least the same, if not better, on GAE - no hop into another product's infra (unsure, though)
GAE costs would likely be smaller, since you pay per instance hour (regardless of how many requests the instance handles), not per request/invocation
I have an application that requires really low latency (real time game).
Currently, in my solution, it takes less than 2 milliseconds for a message to route from the client front-end server to the destination server.
Does anybody know how much time will it take in Google Cloud Pub/Sub to route a message from one server to another?
Thank you!
While Cloud Pub/Sub's end-to-end latency at the 99.9th percentile is sufficient for many applications -- including some using it for real-time interaction -- 2ms is lower than what the system can currently promise. We have thus far prioritized high throughput and strong delivery guarantees. End-to-end latency is also highly dependent on the rate at which a subscriber issues pull requests. A subscriber should always have at least a few open pull requests if throughput and/or latency are important. We do aim to significantly reduce our intra-region latencies, but at the moment Cloud Pub/Sub cannot guarantee 2ms intra-region latencies at the 99.9th percentile.
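To illustrate the point about keeping pull requests open, here is a minimal streaming-pull subscriber sketch using the google-cloud-pubsub client; the project, subscription, and flow-control values are placeholders:

```python
# Streaming-pull subscriber: the client keeps pull requests open in the
# background, which helps both throughput and latency.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Allow up to 100 outstanding messages at a time (placeholder value).
flow_control = pubsub_v1.types.FlowControl(max_messages=100)

def callback(message):
    # Handle the message here, then acknowledge it.
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback,
                              flow_control=flow_control)
try:
    future.result()  # block and let the callback process messages
except KeyboardInterrupt:
    future.cancel()
```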
I am running a site on App Engine (managed VM). It is currently running on f1-micro instances.
The Cloud Platform Console reports that CPU utilization is ~40%. I became a little suspicious because the site is receiving practically zero traffic. Is this normal for an idle golang app on an f1-micro instance?
I logged onto the actual instance and "top" reports CPU utilization ~2%.
What gives? Why is "top" saying something different than the Console?
top gives a momentary measure (I believe every second?), while the Console's data might be over a longer period of time during which the site had higher activity. With a micro instance, it seems plausible that relatively normal amounts of traffic could take up a relatively high percentage of the CPU, leading to such a metric.
I have an application that I'd like to make more efficient - it isn't taxing any one resource enough that I can identify it as a bottleneck, so perhaps the app is doing something that is preventing full efficiency.
The application pulls data from a database on one SQL Server instance, does some manipulation on it, then writes it to a database on another SQL Server instance - all on one machine. It doesn't do anything in parallel.
While the app is running (it can take several hours), none of the 4 CPU cores are maxed out (they hover around 40-60% utilization each), the disks are almost idle and very little RAM is used.
Reported values:
Target SQL Server instance: ~10% CPU utilization, 1.3GB RAM
Source SQL Server instance: ~10% CPU utilization, 300MB RAM
Application: ~6% CPU utilization, 45MB RAM
All the work is happening on one disk, which writes around 100KB/s during the operation, on average. 'Active time' according to Task Manager is usually 0%, occasionally flickering up to between 1 and 5% for a second or so. Average response time, again according to Task Manager, moves between 0ms and 20ms, mainly showing between 0.5 and 2ms.
Databases are notorious for IO limitations. Now, seriously, as you say:
The application pulls data from a database on one SQL Server instance,
does some manipulation on it, then writes it to a database on another
SQL Server instance - all on one machine.
I somehow get the idea this is an end-user-level machine, maybe a workstation. Your linear code (a bad idea for getting full utilization, btw, as you never run all 3 parts - read, process, write - in parallel) will be seriously limited by whatever IO subsystem you have.
But that will not come into play as long as you can state:
It doesn't do anything in parallel.
What it must do is run these things in parallel:
One task is reading the next data
One task does the data processing
One task does the data writing
You can definitely max out a lot more than your 4 cores. Last time I did something like that (read / manipulate / write) we were maxing out 48 cores with around 96 or so processing threads running in parallel (and a smaller number doing the writes). But the core point is that your application must actually start using multiple CPUs.
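A minimal sketch of that pipeline structure, using bounded queues so the three stages overlap; the read, transform, and write functions are placeholders for your actual source query, manipulation, and bulk insert, and for CPU-heavy transforms you would use several worker processes rather than a single thread:

```python
# Read/process/write pipeline: the three stages run concurrently instead of
# strictly one after another. read_batches(), transform() and write_batch()
# are placeholders for your own source query, manipulation and bulk insert.
import queue
import threading

SENTINEL = object()                   # marks end of the stream
read_q = queue.Queue(maxsize=100)     # batches read but not yet processed
write_q = queue.Queue(maxsize=100)    # batches processed but not yet written

def reader():
    for batch in read_batches():      # placeholder: stream rows from the source DB
        read_q.put(batch)
    read_q.put(SENTINEL)

def processor():
    while True:
        batch = read_q.get()
        if batch is SENTINEL:
            write_q.put(SENTINEL)
            break
        write_q.put(transform(batch))  # placeholder: your manipulation step

def writer():
    while True:
        batch = write_q.get()
        if batch is SENTINEL:
            break
        write_batch(batch)             # placeholder: bulk insert into the target DB

threads = [threading.Thread(target=t) for t in (reader, processor, writer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```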
If you do not parallelize:
You will only max out one core, at most.
You basically waste time waiting for the databases on both ends. The latency while you wait for data to be read or committed is time in which you are not processing anything.
;) And once you fix that you will get IO problems. Promised.
I recommend reading How to analyse SQL Server performance. You need to capture and analyze the wait stats. These will tell you what the execution is doing that prevents it from maxing out the CPU. You already have a feeling that the workload is causing the SQL engine to wait rather than run, but only after you understand the wait stats will you be able to get a feel for what it is waiting on. Follow the linked article for specific analysis techniques.
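As a quick starting point, the top wait types can be pulled from the sys.dm_os_wait_stats DMV; a small sketch using pyodbc follows (the connection string is a placeholder, and benign system waits are not filtered out here):

```python
# List the wait types SQL Server has spent the most time on, as a first
# step toward the wait-stats analysis described in the linked article.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes"  # placeholder connection string
)
rows = conn.execute(
    """
    SELECT TOP 15 wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC
    """
).fetchall()
for wait_type, tasks, ms in rows:
    print(f"{wait_type:<40} {tasks:>12} waits {ms:>14} ms")
```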