I'm using a Snowflake trial account to do a performance test.
I run 9 heavy queries (each takes about 20 minutes on an XS warehouse) at the same time and watch the Warehouses or History pane. However, the page takes far too long to display; about 30 seconds.
I think the cloud services layer (something like a Hadoop head node?) doesn't have adequate resources for this.
Is this because I'm using the trial version? Would the same thing happen on the Enterprise or Business Critical editions?
The "cloud services", is actually unrelated to your warehouses, and 9 queries is not enough overload that. But at the same time the trial accounts might be on slightly underpowered SC layer, but I am not sure why that would be the case. The allocated credit spend is just that.
I am puzzled what you are trying to "test" about running many slow for the server sized queries at the same time?
When you say the page takes 30 seconds to load, do you mean if you do nothing the query execution status/time is only updated every ~30 seconds, or if you do a full page reload it is blank for 30 seconds?
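If you want to rule out the warehouses themselves, one thing worth checking (a rough sketch; I'm assuming the standard INFORMATION_SCHEMA.QUERY_HISTORY table function and its timing columns here) is how much of each query's elapsed time was spent queued versus compiling versus executing. If QUEUED_OVERLOAD_TIME is large, the warehouse is saturated; the UI slowness would be a separate matter.

-- split elapsed time into queueing, compilation and execution for recent queries
select query_id,
       warehouse_name,
       total_elapsed_time,
       queued_overload_time,
       compilation_time,
       execution_time
from table(information_schema.query_history(result_limit => 100))
order by start_time desc;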
The actual execution plan of a query shows a total of 2.040 seconds (summing the time taken at every step), but the query takes 52 seconds to complete (the time shown at the bottom of SQL Server Management Studio).
Why is there such a big difference between the two times, and how can I reduce the 52 seconds?
The elapsed time in SSMS includes network round-trip time, client render time, etc.
The execution plan indicates how long it took the server to process the query, not how long it took to stream the results to you.
If you're outputting data to a messages pane or, worse, a grid, that isn't free. As SSMS draws the data in the grid, the server is sending you rows over the network, but the query engine isn't doing anything anymore. Its job is done.
The execution plan itself only knows about the time the query took on the server. It has no idea about network latency or slow client processing. SSMS will tell you how much time it spent doing that, and the execution plan doesn't have any visibility into it at all because it's generated before SSMS has done its thing.
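If you want to see the server-side numbers for yourself, SET STATISTICS TIME ON prints, per statement, the CPU time the engine actually burned alongside the elapsed time; a quick sketch against a hypothetical table:

-- CPU time = work the engine did; elapsed time = how long the statement took end to end
set statistics time on;

select *
from dbo.SomeBigTable;   -- hypothetical large table

set statistics time off;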
The execution plan runs on the server. It doesn't even know what SSMS is, never mind what it's doing with your 236,833 rows. Let's think about it another way:
You buy some groceries, and the cash register receipt says it took you 4 minutes to check out. Then you take the long way home, stop for coffee, drop the groceries on the way into the house, and spend 20 minutes remembering where everything goes. Finally, you sit down on the couch. The cash register receipt doesn't update to add your travel and organization time, and that extra time is equivalent to what SSMS is doing when it struggles to show you 236,833 rows.
And this is why we don't try to time the performance of a query by adding in unrealistic things that won't happen in the real world; no real-world user can process 200,000 rows of anything. Don't draw any conclusions about real-world performance from testing in a client GUI. Is your application going to do pagination, or aggregation, or something else so an end user doesn't have to wait for 200,000 rows to render? If so, test that. If not, reconsider.
To make this faster in the meantime, try the "Discard results after execution" option in SSMS.
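Another way to time only the engine's work, without touching SSMS options, is to assign the columns to variables so that no rows are ever streamed to the grid; a sketch with hypothetical table and column names:

-- nothing is returned to the client, so rendering cost disappears from the timing
declare @c1 int, @c2 varchar(100);

select @c1 = id,
       @c2 = name
from dbo.SomeBigTable;   -- hypothetical table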
I have been using ClickHouse at work for analytics purposes for a while now.
I am currently running ClickHouse v22.6.3 (revision 54455) on-premises on a VM with:
fast storage
200 GB of RAM
no swap
a 40-core CPU.
I have a few TB of data, but no table bigger than 300 GB. I do not use distributed tables or replication yet, and I write frequently into ClickHouse (but I don't use deletes or updates, preferring things like the ReplacingMergeTree engine). I also leverage the MaterializedView feature for a few tables. Let me know if you need any more context or parameters; I use a pretty standard configuration.
Now, for a few months I have been experiencing performance issues where the server slows down significantly every day at 10am, and I cannot figure out why.
Based on ClickHouse's built-in Graphite monitoring, the "symptoms" of the issue seem to be as follows:
At 10am:
On the server side:
Both load and RAM usage remain reasonable. Load goes up a little.
Disk write await time goes up (which I suspect is what leads to higher load)
Disk utilization % skyrockets to something between 90 and 100%
On Clickhouse side:
DiskSpaceReservedForMerge stays roughly the same (i.e. between 0 and 70 GB)
both OpenFileForRead and OpenFileForWrite go up by a factor of ~2
BackgroundCommonPoolTask goes up slightly, as does BackgroundSchedulePoolTask (which I found weird, because I thought this pool was dedicated to distributed operations, which I don't use); both numbers remain seemingly reasonable
The number of active merge tasks per minute drops significantly, but I'm unsure whether that is a consequence of the slow writes or their cause
Both insert and general query times are multiplied by ~10, which renders the database effectively unusable even for small tasks
Restarting ClickHouse usually fixes the problem, but I obviously do not want to restart my main database every day at 10am. Most of the heavy load I put on the DB (data extraction and transformation, etc.) happens earlier in the morning (and ends around 7-8am) and runs fine. I do not have any heavy tasks running at 10am. The ClickHouse VM takes most of its host's resources, and I have confirmed with the DevOps team that there doesn't seem to be a problem on the host or anything else scheduled on it at that time.
Is there any kind of background task or process that ClickHouse runs on a daily basis and that could have a high impact on our disk capacity? What else can I monitor to figure out what is causing this problem?
Again, let me know if I can be more thorough on our settings and the state of the DB when the "bug" occurs.
Do you use https://github.com/innogames/graphite-ch-optimizer ?
Do you use TTL ?
select * from system.merges;
select * from system.part_log where event_date = today() and toHour(event_time) between 9 and 11;
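It can also be worth seeing which statements were actually running when the disk saturates; assuming query logging is enabled (it is by default), something along these lines against system.query_log shows the heaviest queries around the 10am window:

-- heaviest finished queries around 10am, by duration
select event_time, query_duration_ms, read_rows, written_rows, memory_usage, query
from system.query_log
where event_date = today()
  and toHour(event_time) between 9 and 11
  and type = 'QueryFinish'
order by query_duration_ms desc
limit 20;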
I have spent a lot of time optimizing a query for our DB admin. When I run it on our test server and on our live server I get similar results. However, once it is actually running in production, with the query being used heavily, it performs very poorly.
I could post the code, but the query is over 1400 lines and it would take a lot of time to obfuscate. All data types match and my queries use indexes. It is broken down into 58 temp tables. When I test it using SQL Sentry, a particular exec statement uses 707 CPU and 90,007 reads and takes 1.2 seconds to run. The same parameters in production last week used 10,831 CPU, 2.9 million reads, and took 33.9 seconds to run.
My question is: what could be making the optimizer use more CPU and reads in production than in my one-off tests? Like I mentioned, I could post code if needed, but I am looking for answers that point me in a direction to troubleshoot such a discrepancy. This particular procedure runs a lot during our billing cycle, so it hits the server hundreds of times a day; that will become thousands as we near the 15th of the month.
Thanks
ADDENDUM:
I know how to optimize queries, as this is a regular part of my job duties. As I stated in a comment, my test runs don't usually differ this much from actual production. I don't know the SQL Server side of things and wondered if there was something I needed to be aware of that might affect my query when the server is under a heavier load. This may be outside the scope of this forum, but I thought I would reach out for ideas from the community.
UPDATE:
I am still troubleshooting this, just replying to some of the comments.
The execution plans are the same between my one-off tests and the production-level executions. I am testing in the same environment, on the same server as production. This is a procedure for report data. The data returned, the records, and the tables hit are all the same. I am testing with the same parameters that took an astronomical amount of time to process during production; the difference between what I am doing and what happened during the production run is the load on the server. Not all production executions take a long time; the vast majority are within acceptable thresholds for CPU and reads, but the outliers show a huge discrepancy: 500 times the CPU and 150 times the reads of the average execution (even with the same parameters).
I have zero control over the server side of things; I can only control how the proc is coded. I realize that my proc is large and that without it, it is probably impossible to get a good answer on this forum. I also realize that even with the code posted here I would not likely get a good answer, due to the size of the procedure.
I was/am looking only for insights and directions of things to look at, drawing on the experience of other developers who have overcome similar problems. Comments stating that the size of my code is the reason performance is in the toilet, and that code of that size is rarely needed, are not helpful and, quite frankly, ignorant. I am working with legacy C# code and a database that deals with millions of transactions for billing. There are thousands of tables in dozens of interconnected databases, with an ERD that would blow your mind; I am optimizing a nightmare. That said, I am very good at it, and when both I and the database administrators are stumped as to why we see such stark numbers, I thought I would widen my net and see if this larger community had any ideas.
Below is an image showing a report of the top 32 executions of this procedure in a 15-minute window. Even among the top 32, the numbers are not consistent. The image below that shows all of the temp tables and the main query that I just ran on the same server, using the parameters of the #1 resource hog from the first image. The rows are the different temp tables, with a sum at the bottom. The sum shows 1.5 (1.492) seconds to run, with 534 CPU and 92,045 reads. Contrast that with the 33.9 seconds, 10,831 CPU, and 2.9 million reads yesterday:
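For what it's worth, one low-effort way to see how much per-execution variance the server itself has recorded for the current cached plan (rather than relying on one-off captures) is sys.dm_exec_procedure_stats; a rough sketch, run in the database that owns the proc and with a hypothetical procedure name:

-- min / average / max worker time and reads for the cached plan of the proc
select object_name(object_id) as proc_name,
       execution_count,
       min_worker_time, total_worker_time / execution_count as avg_worker_time, max_worker_time,
       min_logical_reads, total_logical_reads / execution_count as avg_logical_reads, max_logical_reads
from sys.dm_exec_procedure_stats
where database_id = db_id()
  and object_name(object_id) = 'MyBillingProc';   -- hypothetical name

A large min-to-max spread on the same cached plan tends to point at blocking or resource contention under load rather than a plan change, which would line up with the execution plans being identical.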
I have a report with a refresh time of about 1 minute, that I reduced from 5 minutes by doing the following:
Displaying fewer elements (e.g. tables, charts)
Making the formulae as simple as possible (e.g. simplifying nested if statements)
Making the queries simpler (e.g. pulling through fewer columns)
My question is, are there any other ways for me to reduce refresh time that I haven't considered?
Talking around the office showed that the methods I used were optimal from my side. Slow refresh times were caused by systemic under-investment by my employer, e.g.
Universes being clunky and indexed incorrectly
Slow internet connection/being restricted to IE
Slow laptop to access BO4
This led me to build a business case for investing in the infrastructure, based on the amount of time lost waiting for reports to refresh.
We are running an eCommerce venture that has around 2000 unique visitors in a day. The total data is around 6 GB as of now.
We are using SQL Server as our database and in the coming months the website may scale up to 10000 users per day.
From this link I deciphered that it would be best to use an M1 instance, but I'm really clueless as to what to purchase from these options; could anyone help?
Note: Our budget is around 170 dollars per month.
EDIT: The number of concurrent users we have had is around 150
I'd try to fit everything in memory. If you can't due to budget, you need to make sure the disk response times are up to par for your expected load. Your application can vary widely: one visit to a homepage could generate many queries, or maybe you have application caching set up, so it's hard for anyone to just tell you. You should also get solid numbers on your peak number of concurrent users so you can plan for that. You don't mention your current environment, but you can get some numbers about CPU, disk MB read/written per second, and memory used to help you pick the right size.
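Since you're on SQL Server, one way to get a rough number for how much memory the hot data actually wants (so you know whether "fit everything in memory" is realistic) is to look at how much of each database currently sits in the buffer pool; a sketch, assuming you can query the current live instance:

-- buffer pool usage per database, in MB (pages are 8 KB)
select db_name(database_id) as database_name,
       count(*) * 8 / 1024 as buffer_mb
from sys.dm_os_buffer_descriptors
group by database_id
order by buffer_mb desc;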
I'd look at the m1.xlarge. That gives you 15 GB of memory to play with. You'll be able to cache all the data you need, have some left over for the OS, and still have room to grow. CPU probably won't be your issue, but be sure to check your current usage.
If you have some time to spend on it, I'd try setting up JMeter to do some load testing and see how many concurrent users you can max out with one of the cheaper options.
This topic may be better suited to ServerFault.
I'd suggest you look at reserved instances in the heavy utilization category, where you can get interesting discounts if you plan to run the instance for a year or so.
But with that budget you should be thinking about an m1.medium instance, which might be a little tight for your requirements.