Google Cloud SQL Performance? - google-app-engine

I switched from a join and view based strategy on my old dedicated server to a multiple small queries strategy for Google Cloud. For me this is also easier to maintain and on my dev machine there was no noticeable performance difference. But on my App Engine and Cloud SQL it is really slow.
For example if I want to query the last 50 articles it takes 4-5 seconds, on my dev machine 160ms. For each article there are min. 12 queries on average 15 queries. That are ~750 queries, if I monitor the Cloud SQL I noticed that it always caps at ~200 queries per second. The CPU just peeks at 20%, I have just a db-n1-standard-1 with SSD. 200 queries per second also mean if I want to get the last 100 articles it will take 8-9 seconds and so on.
I already tried to set the App Engine Instance class to F4 to see if this will change anything. It didn't change anything, the number where the same. I haven't tried to increase the DB Instance because I can't see that it is at it's limit.
What to do?
Software: I use GO with a unlimited mysql connection pool.
EDIT: I even changed to the db-n1-standard-2 instance and there was no difference :(
EDIT2: I tried some changes over the weekend, 1500 iops, 4 cores, etc but nothing showed the expected improvements. The usage graphs were already indicating that there is no "hardware" limit. I managed to isolate the slow query tho... it was a super simple one where I query the country name via country-ISO2 and language-ISO3 both keys are indexed and still it takes 50ms for EACH. So I just cached ALL countires in memcache and done.

Google Cloud SQL uses GCE VM instances so things that apply to GCE apply to Cloud SQL.
When you create a db-n1-standard-1 instance your network throughput is caped to 250 MB/s by your CPU but your Read/Write (a)disk throughtput and (b)IOPS speed are capped by the the storage capacity and type, which is:
Write: 4.8ᵅ|300ᵇ
Read: 4.8ᵅ|300ᵇ
You can view what if anything is missing in your instance details:
https://console.cloud.google.com/sql/instances/[INSTANCE_NAME]
If you want to increase performance of your instance raise the number of its vCPUs and its storage capacity/type as sugested in the links above.

Related

Clickhouse DB slows down on a daily basis at 10am for seemingly no reason

I have been using Clickhouse at work for analytics purposes for a while now.
I am currently running Clickhouse v22.6.3 revision 54455 on-premise on a VM with:
fast storage
200Gb of RAM
no swap
a 40-cores CPU.
I have a few Tb of data, but no table bigger than 300 Gb. I do not use distributed tables or replication yet, and I write frequently into Clickhouse (but I don't use deletes or updates and prefer using things like the ReplacingMergeTree engine). I also leverage the MaterializedView feature for a few tables. Let me know if you need any more context or parameter, I use a pretty standard configuration.
Now, for a few months I have been experiencing performances issues where the server significantly slows down every day at 10am, and I cannot figure out why.
Based on Clickhouse built-in Graphite monitoring, the "symptoms" of the issue seem to be as follow:
At 10am:
On the server side:
Both load and RAM usage remain reasonable. Load goes up a little.
Disk write await time goes up (which I suspect is what leads to higher load)
Disk utilization % skyrockets to something between 90 and 100%
On Clickhouse side:
DiskSpaceReservedForMerge stays roughly the same (ie between 0 and 70Gb)
both OpenFileForRead and OpenFileForWrite go up by a factor of ~2
BackgroundCommonPoolTask goes slightly up, so does BackgroundSchedulePoolTask (which I found weird, because I thought this pool was dedicated to distributed operations - which I don't use) - both numbers remain seemingly reasonable
The number of active Merge tasks per minutes drop significantly but I'm unsure whether it's a consequence of slow writing or if it's causing it
both insert and general querying time are multiplied by ~10 which renders the database effectively unusable even for small tasks
Restarting Clickhouse usually fixes the problem but I obviously do not want to restart my main database every day at 10am. Most of the heavy load I put on the DB (such as data extraction and transformation, etc) happens earlier in the morning (and end around 7-8am) and runs fine. I do not have any heavy tasks running at 10am. The Clickhouse VM takes most of its host resources and I have confirmed with the devOps team that there doesn't seem to be a problem on the host or anything else scheduled on it at that time.
Is there any kind of background tasks or process that is run by Clickhouse on a daily basis and that could have a high impact on our disk capacity? What else can I monitor to figure out what is causing this problem?
Again, let me know if I can be more thorough on our settings and the state of the DB when the "bug" occurs.
Do you use https://github.com/innogames/graphite-ch-optimizer ?
Do you use TTL ?
select * from system.merges;
select * from system.part_log where event_time between ~10am~

Snowflake as backend for high demand API

My team and I have been using Snowflake daily for the past eight months to transform/enrich our data (with DBT) and make it available in other tools.
While the platform seems great for heavy/long running queries on large datasets and powering analytics tools such as Metabase and Mode, it just doesnt seem to behave well in cases where we need to run really small queries (grab me one line of table A) behind a high demand API, what I mean by that is that SF sometimes takes as much as 100ms or even 300ms on a XLARGE-2XLARGE warehouse to fetch one row in a fairly small table (200k computed records/aggregates), that added up to the network latency makes for a very poor setup when we want to use it as a backend to power a high demand analytics API.
We've tested multiple setups with Nodejs + Fastify, as well as Python + Fastapi, with connection pooling (10-20-50-100)/without connection pooling (one connection per request, not ideal at all), deployed in same AWS region as our SF deployment, yet we werent able to sustain something close to 50-100 Requests/sec with 1s latency (acceptable), but rather we were only able to get 10-20 Requests/sec with as high as 15-30s latency. Both languages/frameworks behave well on their own, or even with just acquiring/releasing connections, what actually takes the longest and demands a lot of IO is the actual running of queries and waiting for a response. We've yet to try a Golang setup, but it all seems to boil down to how quick Snowflake can return results for such queries.
We'd really like to use Snowflake as database to power a read-only REST API that is expected to have something like 300 requests/second, while trying to have response times in the neighborhood 1s. (But are also ready to accept that it was just not meant for that)
Is anyone using Snowflake in a similar setup? What is the best tool/config to get the most out of Snowflake in such conditions? Should we spin up many servers and hope that we'll get to a decent request rate? Or should we just copy transformed data over to something like Postgres to be able to have better response times?
I don't claim to be the authoritative answer on this, so people can feel free to correct me, but:
At the end of the day, you're trying to use Snowflake for something it's not optimized for. First, I'm going to run SELECT 1; to demonstrate the lower-bound of latency you can ever expect to receive. The result takes 40ms to return. Looking at the breakdown that is 21ms for the query compiler and 19ms to execute it. The compiler is designed to come up with really smart ways to process huge complex queries; not to compile small simple queries quickly.
After it has its query plan it must find worker node(s) to execute it on. A virtual warehouse is a collection of worker nodes (servers/cloud VMs), with each VW size being a function of how many worker nodes it has, not necessarily the VM size of each worker (e.g. EC2 instance size). So now the compiled query gets sent off to a different machine to be run where a worker process is spun up. Similar to the query planner, the worker process is not likely optimized to run small queries quickly, so the spin-up and tear-down of that process might be involved (at least relative to say a PostgreSQL worker process).
Putting my SELECT 1; example aside in favor of a "real" query, let's talk caching. First, Snowflake does not buffer tables in memory the same way a typical RDBS does. RAM is reserved for computation resources. This makes sense since in traditional usage you're dealing with tables many GBs to TBs in size, so there would be no point since a typical LRU cache would purge that data before it was ever accessed again anyways. This means that a trip to an SSD disk must occur. This is where your performance will start to depend on how homogeneous/heterogeneous your API queries are. If you're lucky you get a cache hit on SSD, otherwise its off to S3 to get your tables. Table files are not redundantly cached across all worker nodes, so while the query planner will make an attempt to schedule a computation on a node most likely to have the needed files in cache, there is no guarantee that a subsequent query will benefit from the cache resulting from the first query if it is assigned to a different worker node. The likeliness of this happening increases if you're firing 100s of queries at the VM/second.
Lastly, and this could be the bulk of your problem but have saved it for last since I am the least certain on it. A small query can run on a subset of the workers in a virtual warehouse. In this case the VH can run concurrent queries with different queries on different nodes. BUT, I am not sure if a given worker node can process more than one query at once. In that case, your concurrency will be limited by the number of nodes in the VH, e.g. a VH with 10 worker nodes can at most run 10 queries in parallel, and what you're seeing are queries piling up at the query planner stage while it waits for worker nodes to free up.
maybe for this type of workload , the new SF feature Search Optimization Service could help you speeding up performances ( https://docs.snowflake.com/en/user-guide/search-optimization-service.html ).
I have to agree with #Danny C - that Snowflake is NOT designed for very low (sub-second) latency on single queries.
To demonstrate this consider the following SQL statements (which you can execute yourself):
create or replace table customer as
select *
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
limit 500000;
-- Execution time 840ms
create or replace table customer_ten as
select *
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
limit 10;
-- Execution time 431ms
I just ran this on an XSMALL warehouse and it demonstrates currently (November 2022) Snowflake can copy a HALF MILLION ROWS in 840 milliseconds - but takes 431 ms to copy just 10 rows.
Why is Snowflake so slow compared (for example) to Oracle 11g on premises:
Well - here's what Snowflake has do complete:
Compile the query and produce an efficient execution plan (plans are not currently cached as they often lead to a sub-optimal plan being executed on data which has significantly increased in volume)
Resume a virtual warehouse (if suspended)
Execute the query and write results to cloud storage
Synchronously replicate the data to two other data centres (typically a few miles apart)
Return OK to the user
Oracle on the other hands needs to:
Compile the query (if the query plan is not already cached)
Execute the query
Write results to local disk
If you REALLY want sub-second query performance on SELECT, INSERT, UPDATE and DELETE on Snowflake - it's coming soon. Just check out Snowflake Unistore and Hybrid Tables Explained
Hope this helps.

What is the smallest AWS EC2 instance I can run a postgres db on?

There is the free tier on AWS and I can get a micro EC2 instance for free essentially.. or close to.. I'm sure setting up elastic ips - loads balancers etc is extra.
Would it effectively be possible for me to run a postgres DB - for a small api. Roughly about 50 inserts + 50 reads per second ... say about 6000 operations per min at the most.
I can't seem to find anything online - which makes me think that this might be a silly idea.
For this not to be an "open question" - it's simply: Is it possible and realistic to expect usable performance on an EC2 instance running my postgres DB.
The best way to determine whether the database can handle a particular workload is to test it at that capacity. Launch the database, simulate traffic and monitor its performance. Please note that every application uses a database differently, so nobody can provide "general advice" as to whether a particular-sized database would meet the needs of your particular application.
If you are going to run 'production' workloads, try to avoid using the Burstable performance instances (T2, T3) since they can hit limits under heavy workloads unless the 'Unlimited' option is selected. T2/T3 is great for bursty workloads, but not for sustained workloads.
Comparing m5.xlarge between EC2 and RDS:
Amazon EC2: 19.2c/hr ($4.61/day)
Amazon RDS: 35.6c/hr ($8.54/day)
For the additional price, Amazon RDS provides a fully-managed database, automated backups, CloudWatch metrics, etc. This is probably worth much more than $4 of your time every day.
Alternatively, if you can modify your application to use NoSQL instead of SQL, you could use Amazon DynamoDB where the capacity you mention would cost 4c/hour ($1/day) plus request and storage costs.
Don't underspend on your database — it powers everything you do. Instead, try to save money by turning off non-production systems when they aren't being used (eg weekends and evenings). That will hopefully give you enough savings to afford an appropriately-powered database.

What is fastest way to Improve database performance? aim straight to my target (5000 users)? Or first use artificial milestones?

When improving database performance, What is the fastest way?
should I go straight to my target (5000 users)?
Or should I go through artificial milestones - (for example - first 1000 users, only then 2500...)?
we have a performance problem. For the purpose of the question, suppose the only bottleneck is the Database and all other resources consume zero time.
Facts:
Database performance for 50 concurrent users is good
Database performance for more then 50 is not acceptable and by that I mean database requests takes too long. Instead of ~0.5 sec they usually take between ~2 and ~7 sec, and sometimes even ~30 sec
On about 70 users, the system stop working at all, most database requests takes more then ~30 sec
Database is implemented in SQL SERVER 2005\8
Requirement:
The application need to support 5000 concurrent users.
From my understanding, and please correct me if I'm wrong or there is something missing, the best practise for database performance improvement is:
Define performance objective targets.
Establish testing environment identical (as much as possible) to production environment
Check if the performance targets is OK. if not do the following until performance targets is OK
3.1 Improve performance (on at a time) by - server re-configuration, indexes improvements ,code improvements, database structure improvements and more, using trace and monitor tools such as - performance counters, SQL Server profiler, logs and more
3.2 Test and analyse the affects of the change and accept or reject it
My question is:
Assuming that getting the system to work on anything else then 5000, say for example 1000, or even 2500 users Doesn't provide any business value, What is the fastest way to get to 5000 users?
Getting there directly?
Or first get to smaller milestones?
for example - first target 500 users, then 1000, then 2500 ....
Something else??
Do I get any technical value from using milestones, that will eventually take me to 5000 users faster?
Tuning up to 500 users, for example, can lead you to solutions that won't be effective when you increase to 1000 users. Example: let's imagine that you can tune cache for 500 users, perhaps you won't have enought RAM to get a good result for 5,000.
Go for your goal.
I don't see that artificial milestones really add any value, if the requirement if 5000 concurrent users, go straight for that, otherwise you run the risk of wasting time making optimisations which don't scale.
I would say that getting small milestone would make you a problem, imagine that you get a small number of users (or a very big one) you would have to find a point where the milestones are not too big nor too small. That can be tricky.

is Google App Engine-MapReduce my best bet for a massively parallel solution in a cloud?

Is Google App Engine-MapReduce my best bet for a massively parallel solution in a cloud? My problem takes hours multi-threaded on a 4 core PC. I'd say 600 minutes might do. I would prefer 1000 servers get it done in 36 seconds. Switching from 4 core threading to 1000 server processing is eminently doable in my app. In fact, I can already send 1000 small jobs to 4 cores but it's not going to get done sooner than 4 big jobs to 4 cores given that I still have only 4 cores. (My dataset is small so Map-Reduce, which was designed for large datasets, might have a different sweet-spot than my type of compute-bound problem.)
I think I can get this done if I have 1000 simultaneous URL fetches but as you may know Google limits at 10 requests. It seems Google is actively discouraging outsiders from putting massively parallel solutions on their infrastructure.
I started looking into Google App Engine because upon deployment there will be very few users and it appeared App Engine has fine-grained costs - a feature I really like. My impression was that Amazon EC2 would be more work but also that costs were more likely to be chunky. Given that I'm a home-based business, I don't want to pay anything more than a nominal amount when in the early months I don't expect a lot of visitors to my website. May be they will never visit.
In general, where do people turn to for massively parallel (compute-bound) problems that ought to be served by a cloud?
For compute bound tasks, EC2 is often better than App Engine. App Engine is focused on serving web requests, not pure number crunching. It is not designed to go from 0 requests this minute to 1000 requests the next minute and back to 0 requests the minute after that. In fact, one of its features is that you generally don't need explicit control over how many instances are running at once. Also, long running jobs are not possible, though for many tasks you can use Task Queues to chain jobs together. I think the current limit on background tasks is 10 minutes.
EC2 does have a super low tier of service that you can get for free. EC2 lets you explicitly bring servers up and down, but I think the smallest increment you can pay for is 1 hour.
Of course, if you want to literally run your job on 1000 servers, neither app engine nor EC2 will likely let you do that for free. Both are very elastic/adaptive, but bringing 1000 servers up for 30 seconds of work is not very economical for them. On App Engine you will likely run up against an hourly or daily quota before you had 1000 simultaneous instances running. On EC2, you generally pay by the server instance. So you would be paying for 1000 hours of instance time. Of course, one of Amazon's High CPU instances might be much more powerful than your PC, so maybe you'd only need a 100 or so. or maybe you could compromise and have only 20 instances running at a time, meaning it takes a few minutes to finish your computation, but you don't go broke.
Have you checked Amazon's Elastic MapReduce? http://aws.amazon.com/elasticmapreduce/
With App Engine you should also investigate the task queues. If you already know how to split the big problem into many small ones, you could create one task that takes in the big problem and then creates 1000 (or 10.000) subtasks to tackle the smaller problems. And after that collect the results in one task, if needed.
Individual tasks can run up to 10 minutes before they are terminated, which makes them a little bit easier to use for computing tasks than regular requests.

Resources