How to limit statistics in Gatling report to steady state - benchmarking

I have a classic Gatling benchmark setup: I ramp up users for a while and then keep a constant rate. Gatling's very nice reports show some stats, but if I understand correctly these cover the whole scenario, not just the steady state. Is it possible to limit the stats to only a certain part of the scenario (basically, ignore all requests started or finished outside this period)? Or do I have to manually truncate the simulation logs?

That's only available in FrontLine, our commercial product. You can get stats on any time window.
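If FrontLine isn't an option, the manual route mentioned in the question does work: filter simulation.log down to the steady-state window, then have Gatling rebuild the report from the truncated log (gatling.sh -ro <results-dir>). Below is a rough Python sketch of that filter. It assumes a tab-separated log whose REQUEST rows start with the record type and carry epoch-millis timestamps; the exact column layout varies by Gatling version, so verify the indices against your own log first.

    # Sketch: keep only REQUEST rows whose start timestamp falls inside the
    # steady-state window; pass everything else (RUN/USER rows) through.
    START_COL = 5                      # request-start timestamp column (version-dependent)
    STEADY_START = 1_600_000_000_000   # steady-state window, epoch millis (placeholders)
    STEADY_END = 1_600_000_600_000

    with open("simulation.log") as src, open("simulation-steady.log", "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if fields[0] == "REQUEST":
                start = int(fields[START_COL])
                if not (STEADY_START <= start <= STEADY_END):
                    continue  # drop requests outside the window
            dst.write(line)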

Related

Is there a way to reduce the Snowpipe load time?

I'm reading this User Guide and it mentions that "typically" Snowpipe takes 1 minute to load the data. In my experiments I found that it always takes a minute. Where is this 1-minute latency coming from? It feels like there is some per-minute batch processing going on. Is there a setting somewhere to reduce it further?
As of today, there's no setting to reduce this latency - you're essentially micro-batching at the one-minute level.
If you want to do more frequent updates, your best option is to keep a warehouse running and either submit UPDATE or COPY queries to it.
If you don't require sub-minute latency, you should use Snowpipe and potentially a tool like Kinesis Firehose to batch up records into a single file that drops into S3 once per minute.
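A minimal sketch of the keep-a-warehouse-running option above, using the snowflake-connector-python package; the connection details and the MY_TABLE / @MY_STAGE names are placeholders. Run it on whatever schedule you need - COPY tracks load history per file, so files that were already loaded are skipped.

    import snowflake.connector  # pip install snowflake-connector-python

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="...",
        warehouse="LOAD_WH", database="MY_DB", schema="PUBLIC",
    )
    try:
        # Load any new files that have landed in the stage since the last run.
        conn.cursor().execute(
            "COPY INTO MY_TABLE FROM @MY_STAGE FILE_FORMAT = (TYPE = JSON)"
        )
    finally:
        conn.close()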

Google Stackdriver Profiling: How to understand this profile

I have attached two Google Stackdriver profiles for our backend server.
The backend API simply tries to read from Memcache first; if the entry doesn't exist or the call times out, it retrieves the data from Bigtable.
Based on the wall-time and CPU-time profiles, I'd like to know:
Why does libjvm_so consume so much (45.8%) of the CPU time? Is it because the server allocates a lot of memory, causing garbage collection to use a lot of CPU?
The wall-time profile shows that 97% of thread time is spent waiting for a resource. I guess it's waiting for the Memcache/Bigtable servers to return data - is that true? Does this mean the server doesn't need that much CPU (currently 16 CPUs, with an average load of 35%)?
Any other insights? What could be improved?

How to make a Flink job with huge state finish

We are running a Flink cluster to process terabytes of historic streaming data. The computation has a huge state, for which we use keyed state - Value and Map states with the RocksDB backend. At some point in the calculation the job's performance starts degrading, and input and output rates drop to almost 0. At this point exceptions like "Communication with TaskManager X timeout error" can be seen in the logs, though the job is compromised even before that.
I presume the problem we are facing has to do with RocksDB's disk backend. As the state of the job grows, it needs to access the disk more often, which drags performance toward 0. We have played with some of the options and have set those which make sense for our particular setup:
We are using the SPINNING_DISK_OPTIMIZED_HIGH_MEM predefined profile, further tuned with optimizeFiltersForHits and some other options, which has somewhat improved performance. However, none of this provides a stable computation, and on a re-run against a bigger data set the job halts again.
What we are looking for is a way to modify the job so that it progresses at SOME speed even as the input and the state grow. We are running on AWS with limits set to around 15 GB per Task Manager and no limit on disk space.
Using SPINNING_DISK_OPTIMIZED_HIGH_MEM costs a huge amount of off-heap memory for RocksDB's memtables. Seeing as you are running the job with a memory limit of around 15 GB, I think you will encounter OOM issues; but if you choose the default predefined profile instead, you will face write stalls or CPU overhead from decompressing RocksDB's block cache. So I think you should increase the memory limit.
Here are some posts about RocksDB, FYI:
https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
https://www.ververica.com/blog/manage-rocksdb-memory-size-apache-flink
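For reference, here is a minimal PyFlink sketch of the configuration discussed above (the original job is presumably Java/Scala; names assume a Flink 1.13-era PyFlink API, and the checkpoint path is a placeholder):

    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.state_backend import RocksDBStateBackend, PredefinedOptions

    env = StreamExecutionEnvironment.get_execution_environment()

    # RocksDB state backend with incremental checkpoints enabled.
    backend = RocksDBStateBackend("file:///tmp/checkpoints", True)
    backend.set_predefined_options(PredefinedOptions.SPINNING_DISK_OPTIMIZED_HIGH_MEM)
    env.set_state_backend(backend)

If you do raise the memory limit, it is also worth letting Flink bound RocksDB's memory through managed memory (state.backend.rocksdb.memory.managed: true in flink-conf.yaml) so the memtables and block cache cannot outgrow the container.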

How real-time are Elasticsearch, Solr, and DSE real-time search?

For the last couple of weeks I have been working with Elasticsearch and Solr, trying to do OLTP processing in real time. However, it strikes me that they claim (especially ES) to be real time, and the meaning of "real time" looks quite fuzzy to me.
If we dig into it, both ES and Solr define a refresh rate or a soft-commit rate, after which newly indexed documents become available for search, effectively providing only near-real-time capabilities.
It looks like "real-time search" is either a marketing statement, or the term is made fuzzy by talking about real-time search as opposed to batch or analytical processing.
Am I correct, or am I wrong and real-time search is possible in a typical OLTP system, where every transaction has search visibility up to the last document?
Elasticsearch is a near-real-time engine for search. It is real time for operations like create, update, delete, and get.
By default, the refresh interval is 1 second, and in some use cases that can pass for real time. For example, I was working for a French government service where we produced statistics per day, so for our use case it was effectively real time from our perspective.
For logs, for example, 1 second is enough in most use cases.
You can modify this default value, but it comes with a cost.
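For illustration, lowering the refresh interval on one index looks like this with the 8.x Python client ("logs" is a placeholder index name); the cost is indexing throughput, since each refresh creates a new searchable segment:

    from elasticsearch import Elasticsearch  # pip install elasticsearch

    es = Elasticsearch("http://localhost:9200")
    es.indices.put_settings(
        index="logs",
        settings={"index": {"refresh_interval": "200ms"}},
    )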
If you really need real time, then you probably want to use a SQL database.
My 2 cents.
Yes, DSE Search is indeed near real time and has not yet achieved the mythical goal of absolute zero latency. But... even traditional "real" real time is not real time once you factor in the time to do the actual database update, plus the fact that a lot of traditional database updates are batch-oriented - and even when the update operation itself is not batched, there is likely some human process that delays the start of the database update relative to the original source of the data change.
Also keep in mind that the latency of a database update needs to include maintaining the required (tunable) consistency for replicating data updates in the cluster.
Rather than push you back towards SQL if you want real time, I would challenge you to fully justify the app's true latency requirements. For example, complex distributed applications need to be prepared for occasional resource outages such as network delays, so it is usually much better to design a modern distributed application to be flexible and asynchronous than to use a traditional, synchronous, fragile (think HealthCare.gov) architecture that improperly depends on a perception of zero-latency distributed operations.
Finally, we are working on enhancements to reduce the actual latency of database updates, coupled with ongoing improvements in hardware performance that further shrink the update latency window.
But ultimately, all computing real-time measures will have some non-zero latency and modern distributed apps must be designed for at least some degree of decoupling between database updates and absolute dependency on those updates.
Worst case scenario, apps that need to synchronize with database updates may need to implement a polling strategy to wait for the update to complete.
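A minimal sketch of such a polling strategy (the fetch and is_updated callables are hypothetical placeholders for your read and your visibility check):

    import time

    def wait_for_update(fetch, is_updated, timeout=30.0, interval=0.5):
        """Poll until is_updated(fetch()) is true, or give up after timeout seconds."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            doc = fetch()
            if is_updated(doc):
                return doc
            time.sleep(interval)
        raise TimeoutError("update not visible within timeout")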
Elasticsearch has real-time features for CRUD operations. On GET operations it checks the transaction log for any changes that have not yet been refreshed, and returns the most recent version of the document.
The percolator feature enables real time in search queries as well. It allows you to register queries (percolations) that are evaluated at indexing time, returning the registered queries that match the new document.
This workflow looks like this:
Register specific query (percolation) in Elasticsearch
Index new content (passing a flag to trigger percolation)
The response to the indexing operation will contain the matched percolations
A very good blog post with a live example explaining the percolator concept:
http://blog.qbox.io/elasticsesarch-percolator
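Note that the index-time percolation flag described above comes from older Elasticsearch releases; in current versions you map a percolator field, register queries as documents, and run a percolate search query. A minimal sketch with the 8.x Python client (index, query, and field names are placeholders):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # 1. Create an index whose mapping can store queries (and the fields they target).
    es.indices.create(index="alerts", mappings={
        "properties": {
            "query": {"type": "percolator"},
            "body": {"type": "text"},
        }
    })

    # 2. Register a query - the "percolation" - as an ordinary document.
    es.index(index="alerts", id="error-alert",
             document={"query": {"match": {"body": "error"}}})
    es.indices.refresh(index="alerts")  # make the registered query searchable

    # 3. Ask which registered queries match a new document.
    resp = es.search(index="alerts", query={
        "percolate": {"field": "query", "document": {"body": "disk error on node 3"}}
    })
    print([hit["_id"] for hit in resp["hits"]["hits"]])  # -> ['error-alert']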

Determine time spent waiting for stats to be updated - SQL05/08

Does anyone know how to specifically identify the portion of overall compilation time that queries spent waiting for statistics to be updated (after stats are deemed stale) in SQL Server 2005/2008? (I do not want to turn on asynchronous background stats updates, just in case that point of conversation comes up.) Thanks!
Quantum,
I doubt that level of detail and granularity is exposed in SQL Server. What is the real question here? Are you trying to compare how long queries take to recompile when stats are deemed stale against the normal compilation time? Is this a one-off request, or are you planning to put something in production and measure the difference over a period of time?
If it is the former, you can get that info by measuring the time taken individually (SET STATISTICS TIME ON) and combining the results. If it is the latter, I am NOT sure there is anything currently available in SQL Server.
PS: I haven't checked Extended Events (in Denali) in detail for this activity, but there could be something there for you. You may want to check that out if you are really interested.
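In lieu of a built-in counter, one crude way to act on the SET STATISTICS TIME suggestion is to time a cold run (plan compile, possibly a stats update) against a warm run of the same query and treat the delta as compile/stats overhead. A rough Python sketch via pyodbc; the DSN, table, and query are placeholders:

    import time
    import pyodbc

    conn = pyodbc.connect("DSN=MySqlServer")  # placeholder DSN
    cur = conn.cursor()

    def timed(sql):
        t0 = time.perf_counter()
        cur.execute(sql).fetchall()
        return time.perf_counter() - t0

    query = "SELECT * FROM dbo.BigTable WHERE SomeCol = 42"
    cold = timed(query)  # may include compile + stats update
    warm = timed(query)  # cached plan
    print(f"cold={cold:.3f}s warm={warm:.3f}s, delta ~ compile/stats overhead")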
