I am doing a load test on an application that uses Vespa as a database, using a set of sample records. When I run the load test for the first time, Vespa caches the query results, which affects the results of our next test scenario.
Is there a way to disable result caching for testing purposes and then enable it again?
I am hoping to get the same response time from Vespa when running the same query a second time.
I have added the configuration below to the services.xml file of our Vespa application, yet the response time still changed drastically the second time the query ran.
<content id="content" version="1.0">
    <engine>
        <proton>
            <tuning>
                <searchnode>
                    <summary>
                        <store>
                            <cache>
                                <maxsize>0</maxsize>
                                <compression>
                                    <type>none</type>
                                </compression>
                            </cache>
                        </store>
                    </summary>
                </searchnode>
            </tuning>
        </proton>
    </engine>
    ...
</content>
Vespa does not cache the query result, and with the summary cache disabled there is no caching at all.
Vespa (like many other databases) takes a while to "warm up" - due to effects like JIT compilation of Java code, OS disk caches, CPU instruction/data caches and so on. You should therefore begin by running enough queries that the query latency reaches a steady state before you start measuring.
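As an illustration, a rough warm-up sketch in Python could look like this - the endpoint and the YQL query below are placeholders, adjust them to your application:

# Warm-up sketch: fire representative queries until the median latency stabilises,
# then start the real load test. Endpoint and query are assumptions, not part of
# the application described above.
import statistics
import time

import requests

ENDPOINT = "http://localhost:8080/search/"                           # assumed Vespa query endpoint
PARAMS = {"yql": "select * from sources * where true", "hits": 10}   # placeholder query

previous = None
for round_no in range(20):
    samples = []
    for _ in range(100):
        start = time.perf_counter()
        requests.get(ENDPOINT, params=PARAMS, timeout=5)
        samples.append((time.perf_counter() - start) * 1000)
    median = statistics.median(samples)
    print(f"round {round_no}: median latency {median:.1f} ms")
    if previous is not None and abs(median - previous) / previous < 0.05:
        break  # consecutive rounds within ~5 %: consider the system warm
    previous = median

Once consecutive rounds report similar latencies, the numbers from your actual test scenarios become comparable between runs.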
We are seeing very inconsistent performance scores in web.dev with our Next.js application. At first we had a performance score of around 30, so we started optimising. Now we are at around 90, with a margin of about 5, locally in Lighthouse. However, when we test on web.dev, our score varies from 73 to 99, which is a huge difference. What could be the cause of this? Comparing two reports with exactly the same bundle size, one has 670 ms total blocking time and the other 70 ms. The "Minimize main-thread work" and "Reduce JavaScript execution time" audits also differ a lot: "Minimize main-thread work" is 3.5 s on the less performant run and 2.8 s on the high-performing run, while "Reduce JavaScript execution time" is 1.5 s on the less performant run and not present at all (so 0 s, I assume) on the performant run. Again, this is with exactly the same JS and CSS bundle.
What could cause this drop in performance? Is this some kind of error in my code, or is it just an issue with Lighthouse/web.dev? I am hosting on Vercel, which serves my website through a CDN, and I am also using a CDN for serving images.
Any help will be appreciated.
Two factors come to mind:
CDN related
Your CDN provider runs many datacenters around the globe. A request from any user, including web.dev, is routed to the nearest datacenter, which may or may not have the requested resource in its cache. If it doesn't, the resource (the .html page, a script bundle, etc.) is requested from your server - this takes extra time and performance suffers.
Once in the cache, the resource remains there for some time. No CDN provider will keep it there forever, so sooner or later it gets evicted. When that happens depends on things like the CDN provider's policy, the free or paid plan you are on, the HTTP headers set by your web server, and the demand for the resource.
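To check whether a given slow run coincided with a cache miss, you can inspect the response headers of your page. A quick sketch - the URL is a placeholder, and the exact header names depend on the provider (x-vercel-cache is what Vercel typically reports, cf-cache-status is Cloudflare's equivalent):

# Print CDN cache status headers for a page, to correlate slow Lighthouse runs
# with cold-cache responses. Header names vary by CDN provider.
import requests

resp = requests.get("https://example.com/")   # placeholder: your deployed URL
for header in ("x-vercel-cache", "cf-cache-status", "age", "cache-control"):
    print(f"{header}: {resp.headers.get(header)}")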
Lighthouse related
The report generated by web.dev shows a "CPU/Memory Power" value at the bottom. It reflects the power of the hardware Lighthouse ran on, and it affects the performance results a lot.
The hosted instance of Lighthouse at web.dev runs on a shared cloud VM, and this value reflects the current workload, which varies from run to run.
P.S.
Server related
When the CDN requests a resource from the web server, performance can take a further hit if the server suffers from cold starts.
My team and I have been using Snowflake daily for the past eight months to transform/enrich our data (with DBT) and make it available in other tools.
While the platform seems great for heavy, long-running queries on large datasets and for powering analytics tools such as Metabase and Mode, it just doesn't seem to behave well when we need to run really small queries ("grab me one row of table A") behind a high-demand API. What I mean is that Snowflake sometimes takes as much as 100 ms or even 300 ms on an XLARGE-2XLARGE warehouse to fetch one row from a fairly small table (200k computed records/aggregates), which, added to the network latency, makes for a very poor setup when we want to use it as the backend of a high-demand analytics API.
We've tested multiple setups with Node.js + Fastify as well as Python + FastAPI, with connection pooling (10-20-50-100 connections) and without it (one connection per request, not ideal at all), deployed in the same AWS region as our Snowflake deployment. Yet we weren't able to sustain anything close to 50-100 requests/sec with 1 s latency (acceptable); we only got 10-20 requests/sec with latency as high as 15-30 s. Both languages/frameworks behave well on their own, even when just acquiring/releasing connections; what actually takes the longest and demands a lot of I/O is running the queries and waiting for a response. We've yet to try a Golang setup, but it all seems to boil down to how quickly Snowflake can return results for such queries.
We'd really like to use Snowflake as the database powering a read-only REST API that is expected to handle something like 300 requests/second while keeping response times in the neighborhood of 1 s. (But we are also ready to accept that it was just not meant for that.)
Is anyone using Snowflake in a similar setup? What is the best tool/config to get the most out of Snowflake in such conditions? Should we spin up many servers and hope we reach a decent request rate? Or should we just copy the transformed data over to something like Postgres to get better response times?
I don't claim to have the authoritative answer on this, so feel free to correct me, but:
At the end of the day, you're trying to use Snowflake for something it's not optimized for. First, I'm going to run SELECT 1; to demonstrate the lower bound of latency you can ever expect to receive. The result takes 40 ms to return; looking at the breakdown, that is 21 ms for the query compiler and 19 ms to execute it. The compiler is designed to come up with really smart ways to process huge, complex queries, not to compile small, simple queries quickly.
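If you want to measure that floor yourself, a minimal sketch using the snowflake-connector-python package is enough - the account, credentials and warehouse below are placeholders:

# Time a trivial query a few times to see the per-query latency floor.
# Connection parameters are placeholders; replace them with your own.
import time

import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="MY_WH",
)
cur = conn.cursor()
cur.execute("SELECT 1")  # first call also pays session/warehouse warm-up costs
for _ in range(5):
    start = time.perf_counter()
    cur.execute("SELECT 1")
    cur.fetchone()
    print(f"round trip: {(time.perf_counter() - start) * 1000:.0f} ms")
cur.close()
conn.close()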
After it has its query plan, it must find worker node(s) to execute it on. A virtual warehouse is a collection of worker nodes (servers/cloud VMs); each warehouse size is a function of how many worker nodes it has, not necessarily the VM size of each worker (e.g. EC2 instance size). So the compiled query gets sent off to a different machine to be run, where a worker process is spun up. Like the query planner, the worker process is probably not optimized to run small queries quickly, so the spin-up and tear-down of that process might be significant (at least relative to, say, a PostgreSQL worker process).
Putting my SELECT 1; example aside in favor of a "real" query, let's talk caching. First, Snowflake does not buffer tables in memory the way a typical RDBMS does; RAM is reserved for computation. This makes sense, since in traditional usage you're dealing with tables many GBs to TBs in size, so a typical LRU cache would purge that data before it was ever accessed again anyway. This means a trip to an SSD must occur. This is where your performance starts to depend on how homogeneous or heterogeneous your API queries are. If you're lucky you get a cache hit on SSD; otherwise it's off to S3 to fetch your table files. Table files are not redundantly cached across all worker nodes, so while the query planner will try to schedule a computation on the node most likely to have the needed files in cache, there is no guarantee that a subsequent query will benefit from the cache built up by the first query if it is assigned to a different worker node. The likelihood of that increases if you're firing hundreds of queries per second at the warehouse.
Lastly - and this could be the bulk of your problem, but I have saved it for last since I am the least certain about it - a small query can run on a subset of the workers in a virtual warehouse, in which case the warehouse can run different queries concurrently on different nodes. BUT I am not sure whether a given worker node can process more than one query at once. If it cannot, your concurrency will be limited by the number of nodes in the warehouse, e.g. a warehouse with 10 worker nodes can run at most 10 queries in parallel, and what you're seeing is queries piling up at the query planner stage while it waits for worker nodes to free up.
Maybe for this type of workload the new Snowflake feature Search Optimization Service could help you speed up performance (https://docs.snowflake.com/en/user-guide/search-optimization-service.html).
I have to agree with #Danny C that Snowflake is NOT designed for very low (sub-second) latency on single queries.
To demonstrate this, consider the following SQL statements (which you can execute yourself):
create or replace table customer as
select *
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
limit 500000;
-- Execution time 840ms
create or replace table customer_ten as
select *
from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
limit 10;
-- Execution time 431ms
I just ran this on an XSMALL warehouse, and it demonstrates that currently (November 2022) Snowflake can copy HALF A MILLION ROWS in 840 milliseconds - yet it takes 431 ms to copy just 10 rows.
Why is Snowflake so slow compared to, for example, Oracle 11g on premises?
Well, here's what Snowflake has to complete:
Compile the query and produce an efficient execution plan (plans are not currently cached as they often lead to a sub-optimal plan being executed on data which has significantly increased in volume)
Resume a virtual warehouse (if suspended)
Execute the query and write results to cloud storage
Synchronously replicate the data to two other data centres (typically a few miles apart)
Return OK to the user
Oracle, on the other hand, needs to:
Compile the query (if the query plan is not already cached)
Execute the query
Write results to local disk
If you REALLY want sub-second query performance for SELECT, INSERT, UPDATE and DELETE on Snowflake - it's coming soon. Just check out Snowflake Unistore and Hybrid Tables Explained.
Hope this helps.
I have a SolrCloud setup, and I fire a simple query in a loop. The number and type of documents and the query never change, but the QTime and the corresponding request time vary from 6 ms to 1000+ ms within the same loop. What could be causing this?
This is because of caching. When you start the service, not all caches are loaded into memory yet (this depends on auto-warming).
The query time also depends on the load on the server machine. If GC is running because of other activity, the QTime will vary again.
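To see which effect you are hitting, it helps to log both the Solr-reported QTime and the client-side round trip for the same query in a loop. A rough sketch - the host, core name and query are placeholders:

# Fire the same query repeatedly and print server-side QTime vs. client round trip.
# If QTime drops to 0-1 ms after a few runs, the query result cache is warm; if both
# numbers spike together at irregular intervals, GC pauses or machine load are more likely.
import time

import requests

URL = "http://localhost:8983/solr/mycore/select"   # placeholder core/endpoint
PARAMS = {"q": "*:*", "rows": 10, "wt": "json"}    # placeholder query

for i in range(20):
    start = time.perf_counter()
    resp = requests.get(URL, params=PARAMS, timeout=10).json()
    elapsed_ms = (time.perf_counter() - start) * 1000
    qtime = resp["responseHeader"]["QTime"]        # Solr-reported query time in ms
    print(f"run {i:2d}: QTime={qtime:4d} ms, round trip={elapsed_ms:6.1f} ms")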
For the last couple of weeks I have been working with Elasticsearch and Solr, trying to do OLTP processing in real time. However, what strikes me is that they claim (especially ES) to be real time, and the meaning of "real time" looks quite fuzzy to me.
If we dig into it, both ES and Solr define a refresh rate or a soft-commit rate after which newly indexed documents become available for search, effectively providing only near-real-time capabilities.
It looks like calling it real-time search is either a marketing statement, or the term is made fuzzy by contrasting "real-time search" with batch or analytical processing.
Am I correct? Or correct me if I am wrong and real-time search is possible in a typical OLTP system, where every transaction has search visibility of the latest document.
Elasticsearch is a near-real-time engine for search, but it is real time for operations like Create, Update, Delete and Get.
By default, the refresh interval is 1 second. In some use cases it can appear to be real time. For example, I was working for a French government service where we produced statistics per day, so for our use case it was effectively real time from our perspective.
For logs, for example, 1 second is enough in most use cases.
You can modify this default value but it comes with a cost.
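For illustration, the refresh interval can be changed per index with the index settings API. A minimal sketch - the cluster address and index name are placeholders:

# Raise the refresh interval for one index. A longer interval (or -1 to disable
# automatic refresh) reduces refresh overhead, at the price of a longer delay
# before newly indexed documents become searchable.
import requests

ES = "http://localhost:9200"          # placeholder: local, unsecured cluster
requests.put(
    f"{ES}/my-index/_settings",
    json={"index": {"refresh_interval": "30s"}},   # default is 1s
)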
If you really need real time, then you probably want to use a SQL database.
My 2 cents.
Yes, DSE Search is indeed near real time and has not yet achieved the mythical goal of absolute zero latency. But... even traditional "real" real time is not real time once you factor in the time to perform the actual database update, plus the fact that a lot of traditional database updates are batch-oriented; and even if the update operation itself is not batched, there is likely some human process that delays the start of the database update relative to the original source of the data change.
Also keep in mind that the latency of a database update needs to include maintaining the required (tunable) consistency for replicating data updates in the cluster.
Rather than push you back towards SQL if you want real time, I would challenge you to fully justify the true latency requirements of the app. For example, with complex distributed applications you need to be prepared for occasional resource outages, such as network delays, so it is usually much better to design a modern distributed application to be a lot more flexible and asynchronous than a traditional, synchronous, fragile (think HealthCare.gov) app architecture that improperly depends on a perception of zero-latency distributed operations.
Finally, we are working on enhancements to reduce the actual latency of database updates, coupled with ongoing improvements in hardware performance that further shrink the update latency window.
But ultimately, all computing real-time measures will have some non-zero latency and modern distributed apps must be designed for at least some degree of decoupling between database updates and absolute dependency on those updates.
Worst case scenario, apps that need to synchronize with database updates may need to implement a polling strategy to wait for the update to complete.
Elasticsearch has real-time features for CRUD operations. On GET operations it checks the transaction log for any uncommitted changes and returns the latest version of the document.
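A small sketch illustrating the difference - the cluster address, index name and refresh interval are arbitrary choices for the example:

# A GET by id is real time (served from the translog), while a search only sees the
# document after the next refresh.
import requests

ES = "http://localhost:9200"  # placeholder: local, unsecured cluster
requests.put(f"{ES}/demo", json={"settings": {"index": {"refresh_interval": "60s"}}})
requests.put(f"{ES}/demo/_doc/1", json={"message": "hello"})

print(requests.get(f"{ES}/demo/_doc/1").json()["found"])        # True: GET is real time
search = {"query": {"match_all": {}}}
print(requests.post(f"{ES}/demo/_search", json=search).json()["hits"]["total"])  # no hits before refresh
requests.post(f"{ES}/demo/_refresh")                            # force a refresh
print(requests.post(f"{ES}/demo/_search", json=search).json()["hits"]["total"])  # 1 hit after refresh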
The Percolator feature enables real time for search queries as well. It allows you to register queries (percolations) that are evaluated at indexing time, returning the predefined queries that a newly indexed document matches.
This workflow looks like this:
Register specific query (percolation) in Elasticsearch
Index new content (passing a flag to trigger percolation)
The response to the indexing operation will contain the matched percolations
A very good blog post with a live example that explains the Percolator concept:
http://blog.qbox.io/elasticsesarch-percolator
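Note that in current Elasticsearch versions the Percolator is exposed as a percolator field type plus a percolate query run at search time, rather than the index-time flag described above. A rough sketch of that flow - the cluster address, index name and field names are made up for the example:

# Register a query in a percolator field, then find out which registered queries a
# candidate document matches by running a percolate query.
import requests

ES = "http://localhost:9200"  # placeholder: local, unsecured cluster

# 1. Index whose "query" field holds registered queries
requests.put(f"{ES}/alerts", json={
    "mappings": {"properties": {
        "query":   {"type": "percolator"},
        "message": {"type": "text"},
    }}
})
# 2. Register a query (the "percolation")
requests.put(f"{ES}/alerts/_doc/1", json={"query": {"match": {"message": "error"}}})
requests.post(f"{ES}/alerts/_refresh")
# 3. Percolate a candidate document: the hits are the registered queries it matches
resp = requests.post(f"{ES}/alerts/_search", json={
    "query": {"percolate": {"field": "query",
                            "document": {"message": "disk error on node 3"}}}
})
print([hit["_id"] for hit in resp.json()["hits"]["hits"]])   # ['1']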
I'm working on an Apache Solr project
(distributed in a cloud environment on Amazon EC2 instances).
I've noticed Solr does an excellent job of caching results.
When I execute the same queries again, the response reports a Solr QTime of 0 or 1 millisecond.
I want to stress test the Solr system. However, I have a limited list of queries I can use (50,000 unique queries). The problem now is that all the queries end up cached!
When I stress test, after 5 minutes or so all my queries have been sent to Solr and executed.
This makes the system sweat under the heavy load :) (which was the purpose).
But then, as I execute the same query set again, QTime is almost zero!
--> Solr has an easy time and isn't stressed.
My question:
How can you turn off ALL Solr caches (both Solr and Lucene caches)?
Or how can you limit the cache?
I've tried to turn off all of Solr's internal caches, but the caching behaviour remains
(queryResultCache and FieldCache).
Note: the config mentions that Lucene manages an internal cache itself - maybe this cache is the problem?
It's just weird that all 50,000 queries can be stored in the cache out of the box.
You can comment out the filterCache, queryResultCache and documentCache in your configuration. Lucene's FieldCache cannot be disabled.
Although it doesn't really make any sense to do so, even for benchmarking. Would you also disable disk caching in your operating system? CPU caches (all three levels)? The internal cache of each hard disk?
Caches are part of the system, if you disable them you won't accurately simulate what happens in production, thus rendering the benchmark useless.
Turning off caches is an excellent idea, at least for those that are application-specific. The benchmark in this case is intended, I gather, to find the response time/cost of a query that has not been seen before, as opposed to queries popular enough to still be sitting in a cache before it expires.
It sounds like you want metrics that tell you how the search system performs, not the query cache.
The previous answers are really out of left field, suggesting all benchmarks should measure the same thing - the answerer's own definition of "real life" performance. That is not how engineering works.
As for the remark about "disk caches": there are no disk caches in Linux, only a page cache. Whether a page is persisted on disk, created and destroyed in memory, or pre-allocated by a smart large-filesystem optimisation, they are all pages.
There is benefit to benchmarking with caches... if you bother to measure the cache performance metrics. duh.
BTW, between "-server" and "-XX:CompileThreshold" you want to make sure your first large set of queries is either random enough or specifically chosen to exercise as many code paths in Solr/Lucene as you can, so the JIT is both active and somewhat settled.