I'm starting a connection to a CnosDB cluster but would like to increase the concurrency of the client. Having found this parameter from the doc:
-t --target-partitions Optional; the number of slices to execute the query, increasing which can increase concurrency. Not specified by default.
If not specified, what is the default concurrency level? 1 or other number?
When the client starts without target-partitions specified, it checks the number of CPU cores on the machine and uses that count. We'll update the documentation later.
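So the effective default is the CPU core count, not 1. A minimal sketch of that fallback behavior (the helper name is mine, purely illustrative, using Python's os.cpu_count() as an analogy for the core-count probe):

```python
import os

def default_target_partitions(cli_value=None):
    """Hypothetical helper mirroring the described behavior: if
    --target-partitions is not given, fall back to the machine's
    CPU core count."""
    if cli_value is not None:
        return cli_value
    # os.cpu_count() may return None on unusual platforms; use 1 then
    return os.cpu_count() or 1

print(default_target_partitions())    # e.g. 8 on an 8-core machine
print(default_target_partitions(16))  # an explicit value wins: 16
```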
I have a six-node Solr cluster, and every node has 200GB of storage. We created one collection with two shards.
I'd like to know what will happen if my collection reaches 400GB (node1: 200GB, node2: 200GB). Will Solr automatically use another free node from my cluster?
What if my collection reaches 400GB (node1: 200GB, node2: 200GB)?
Ans: I am not sure exactly what error you may get; however, in production you should try not to face this situation. To avoid/handle such scenarios we have monitoring and autoscaling trigger APIs.
Will Solr automatically use another free node from my cluster?
Ans: No, extra shards will not be added automatically. However, whenever you observe that search is getting slow, or Solr is approaching the physical limits of its machines, you should go for splitShard.
So ultimately you can handle this with autoscaling triggers. That is, you can set autoscaling triggers to identify whether a shard is crossing specified limits on the number of documents or the size of the index, etc. Once a limit is reached, the trigger can call splitShard.
This link mentions
This trigger can be used for monitoring the size of collection shards,
measured either by the number of documents in a shard or the physical
size of the shard’s index in bytes.
When either of the upper thresholds is exceeded the trigger will
generate an event with a (configurable) requested operation to perform
on the offending shards - by default this is a SPLITSHARD operation.
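For illustration, such a trigger could be installed by POSTing a payload like the one below to Solr's autoscaling endpoint. This is only a sketch: the collection name, thresholds, and waitFor value are placeholders I chose, not values from the question.

```python
import json

# Hypothetical thresholds: request a split when a shard exceeds
# 10 million documents or roughly 150GB of index on disk.
payload = {
    "set-trigger": {
        "name": "index_size_trigger",
        "event": "indexSize",
        "collections": "mycollection",       # assumed collection name
        "aboveDocs": 10_000_000,             # document-count threshold
        "aboveBytes": 150 * 1024**3,         # index-size threshold in bytes
        "preferredOperation": "SPLITSHARD",  # the default operation anyway
        "enabled": True,
        "waitFor": "60s",
    }
}

# This would be sent to Solr's autoscaling API, e.g. with requests:
#   requests.post("http://localhost:8983/solr/admin/autoscaling", json=payload)
print(json.dumps(payload, indent=2))
```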
I am using HikariCP with a Spring Boot app which has more than 1000 concurrent users.
I have set the max pool size-
spring.datasource.hikari.maximum-pool-size=300
When I look at the process list of MySQL using
show processlist;
It shows at most 300, which is equal to the pool size. It never exceeds the max pool size. Is this intended?
I thought the pool size means the number of connections maintained so that they can be reused for future requests to the database, but that more connections could be created when the need arises.
Also, when I remove the max pool config, I immediately get:
HikariPool-0 - Connection is not available, request timed out after 30000ms.
How do I resolve this problem? Thanks in advance.
Yes, it's intended. Quoting the documentation:
This property controls the maximum size that the pool is allowed to reach, including both idle and in-use connections. Basically this value will determine the maximum number of actual connections to the database backend. A reasonable value for this is best determined by your execution environment. When the pool reaches this size, and no idle connections are available, calls to getConnection() will block for up to connectionTimeout milliseconds before timing out. Please read about pool sizing. Default: 10
So basically, when all 300 connections are in use, and you are trying to make your 301st connection, Hikari won't create a new one (as maximumPoolSize is the absolute maximum), but it will rather wait (by default 30 seconds) until a connection is available again.
This also explains why you get the exception you mentioned, because the default (when not configuring a maximumPoolSize) is 10 connections, which you'll probably immediately reach.
To solve this issue, you have to find out why these connections are blocked for more than 30 seconds. Even in a situation with 1000 concurrent users, there should be no problem if your query takes a few milliseconds or a few seconds at most.
Increasing the pool size
If you are invoking really complex queries that take a long time, there are a few possibilities. The first one is to increase the pool size. This however is not recommended, as the recommended formula for calculating the maximum pool size is:
connections = ((core_count * 2) + effective_spindle_count)
Quoting the About Pool Sizing article:
A formula which has held up pretty well across a lot of benchmarks for years is
that for optimal throughput the number of active connections should be somewhere
near ((core_count * 2) + effective_spindle_count). Core count should not include
HT threads, even if hyperthreading is enabled. Effective spindle count is zero if
the active data set is fully cached, and approaches the actual number of spindles
as the cache hit rate falls. ... There hasn't been any analysis so far regarding
how well the formula works with SSDs.
As described within the same article, that means that a 4 core server with 1 hard disk should only have about 10 connections. Even though you might have more cores, I'm assuming that you don't have enough cores to warrant the 300 connections you're making, let alone increasing it even further.
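The formula from the article is easy to turn into a quick sanity check. A minimal sketch, using the article's own rule of thumb (the function name is mine):

```python
def recommended_pool_size(core_count: int, effective_spindle_count: int) -> int:
    """Pool-sizing rule of thumb quoted above:
    connections = (core_count * 2) + effective_spindle_count.
    core_count excludes hyper-threaded logical cores."""
    return core_count * 2 + effective_spindle_count

# The article's example: 4 physical cores and a single hard disk.
print(recommended_pool_size(4, 1))  # -> 9, i.e. roughly 10 connections
```

Even a 64-core machine with a large disk array comes nowhere near 300 under this formula, which is the point being made above.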
Increasing connection timeout
Another possibility is to increase the connection timeout. As mentioned before, when all connections are in use, it will wait for 30 seconds by default, which is the connection timeout.
You can increase this value so that the application will wait longer before going in timeout. If your complex query takes 20 seconds, and you have a connection pool of 300 and 1000 concurrent users, you should theoretically configure your connection timeout to be at least 20 * 1000 / 300 = 67 seconds.
Be aware though, that means that your application might take a long time before showing a response to the user. If you have a 67 second connection timeout and an additional 20 seconds before your complex query completes, your user might have to wait up to a minute and a half.
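The back-of-the-envelope timeout calculation above can be written out explicitly (a sketch; the function name is mine, and this is a lower bound that assumes perfectly even scheduling):

```python
def min_connection_timeout_s(query_seconds: float,
                             concurrent_users: int,
                             pool_size: int) -> float:
    """Lower bound on the connection timeout: every user holds a
    connection for query_seconds, and pool_size connections serve
    the users in parallel."""
    return query_seconds * concurrent_users / pool_size

# 20-second queries, 1000 concurrent users, a pool of 300 connections:
print(min_connection_timeout_s(20, 1000, 300))  # -> 66.67, so at least ~67 s
```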
Improve execution time
As mentioned before, your primary goal would be to find out why your queries are taking so long. With a connection pool of 300, a connection timeout of 30 seconds and 1000 concurrent users, it means that your queries are taking at least 9 seconds before completing, which is a lot.
Try to improve the execution time by:
Adding proper indexes.
Writing your queries properly.
Improving database hardware (disks, cores, network, ...).
Limiting the amount of records you're dealing with by introducing pagination.
Dividing the work: take a look to see if the query can be split into smaller queries that produce intermediary results which can then be used in another query, and so on. As long as you're not working in transactions, the connection will be freed up in between, allowing you to serve multiple users at the cost of some performance.
Using caching.
Precalculating the results: if you're doing some resource-heavy calculation, you could try to pre-calculate the results during a moment when the application isn't used as often, e.g. at night, and store those results in a different table that can be easily queried.
...
I've been reading a bit on MaxDOP and have run into a question that I can't seem to find an answer for. If MaxDOP is set to a value, let's say 8, does that mean that SQL Server will always spin up 8 threads for the parallel activities in the query, or could it decide to use fewer threads for a particular operator?
It boils down to: are too many threads a performance concern if the workload is small (OLTP) and MaxDOP has been set too high?
A hint to the correct DMV would be nice. I got lost in DMV land, again.
The short answer is: SQL Server will dynamically decide whether to use a parallel execution of the query, but will not exceed the maximum degree of parallelism (MAXDOP) that you have indicated.
The following article has some more detailed information: How It Works: Maximizing Max Degree Of Parallelism (MAXDOP). I'll just cite a part of it here:
There are several stages to determining the degree of parallelism (MAXDOP) a query can utilize.
Stage 1 – Compile
During compilation SQL Server considers the hints, sp_configure and resource workgroup settings to see if a parallel plan should even be considered. Only if the query operations allow parallel execution:
If hint is present and > 1 then build a parallel plan
else if no hint or hint (MAXDOP = 0)
if sp_configure setting is 1 but workload group > 1 then build a parallel plan
else if sp_configure setting is 0 or > 1 then build parallel plan
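That compile-stage decision tree can be modeled roughly as follows. This is only a sketch of the quoted logic; the parameter names are mine, not SQL Server internals:

```python
def builds_parallel_plan(hint=None, sp_configure=0, workload_group=None):
    """Sketch of Stage 1 (compile) from the quoted article.
    hint          -- value of an OPTION (MAXDOP n) query hint, or None
    sp_configure  -- server-wide 'max degree of parallelism' setting
    workload_group-- Resource Governor workload group MAXDOP, or None
    Returns True if a parallel plan is built."""
    if hint is not None and hint > 1:
        return True                      # hint present and > 1
    if hint is None or hint == 0:        # no hint, or hint MAXDOP = 0
        if sp_configure == 1 and workload_group is not None and workload_group > 1:
            return True                  # server says 1 but workload group allows more
        if sp_configure == 0 or sp_configure > 1:
            return True                  # server-wide setting permits parallelism
    return False

print(builds_parallel_plan(hint=8))                          # True
print(builds_parallel_plan(hint=1))                          # False: hint forces serial
print(builds_parallel_plan(sp_configure=1))                  # False
print(builds_parallel_plan(sp_configure=1, workload_group=4))  # True
```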
Stage 2 – Query Execution
When the query begins execution, the runtime degree of parallelism is determined. This involves many factors, already outlined in SQL Server Books Online: http://technet.microsoft.com/en-US/library/ms178065(v=SQL.105).aspx
Before SQL Server looks at the idle workers and other factors it determines the target for the degree of parallelism.
[... see details in article ...]
If still 0 after the detailed calculations it is set to 64 (default max for SQL Server as documented in Books Online.) [...] SQL Server hard codes the 64 CPU target when the runtime target of MAXDOP is still 0 (default.)
The MAXDOP target is now adjusted for:
Actual CPU count (affinity settings from sp_configure and the resource pool).
Certain query types (index build for example) look at the partitions
Other query type limitations that may exist
Now SQL Server takes a look at the available workers (free workers for query execution.) You can loosely calculate the free worker count on a scheduler using (Free workers = Current_workers_count – current_tasks_count) from sys.dm_os_schedulers.
Once the target is calculated the actual is determined by looking at the available resources to support a parallel execution. This involves determining the node(s) and CPUs with available workers.
[...]
The worker location information is then used to target an appropriate set of CPUs to assign the parallel task to.
Using XEvents you can monitor the MAXDOP decision logic. For example:
XeSqlPkg::calculate_dop_begin
XeSqlPkg::calculate_dop
You can monitor the number of parallel workers by querying: sys.dm_os_tasks
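The free-worker estimate quoted above (Free workers = Current_workers_count - current_tasks_count) is straightforward to apply to the rows returned by sys.dm_os_schedulers. A sketch over made-up sample values:

```python
# Sample rows as they might come back from sys.dm_os_schedulers:
# (scheduler_id, current_workers_count, current_tasks_count).
# The numbers are invented for illustration.
schedulers = [
    (0, 35, 30),
    (1, 40, 28),
    (2, 33, 33),
]

# Loose free-worker estimate per scheduler, per the quoted formula.
free_workers = {sid: workers - tasks for sid, workers, tasks in schedulers}
print(free_workers)  # -> {0: 5, 1: 12, 2: 0}
```

A scheduler showing zero free workers (like scheduler 2 here) has no capacity left for additional parallel tasks, which is exactly what caps the runtime DOP below the MAXDOP target.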
It is only used to limit the max number of threads allowed per request:
https://msdn.microsoft.com/en-us/library/ms189094.aspx
So if SQL thinks using one thread is fastest it will just use one.
Generally, on an OLTP system you will keep this on the low side. On large warehouse DBs you may want a higher number.
Unless you are seeing specific problems, I wouldn't change it, and even then only if you are confident of the outcome.
SQL Server can also decide to use fewer threads; you can see this in the actual plan from the number of rows handled by each thread. The thread maximum also applies to each parallel section separately, and one query can have more than one section.
In addition to MAXDOP, there is the setting "cost threshold for parallelism", which decides whether a parallel plan is even considered for a query.
I am planning to decide on how many nodes should be present on Kafka Cluster. I am not sure about the parameters to take into consideration. I am sure it has to be >=3 (with replication factor of 2 and failure tolerance of 1 node).
Can someone tell me what parameters should be kept in mind while deciding the cluster size, and how they affect it?
I know of the following factors, but I don't know how they quantitatively affect the cluster size (I do know how they qualitatively affect it). Is there any other parameter which affects cluster size?
1. Replication factor (cluster size >= replication factor)
2. Node failure tolerance. (cluster size >= node-failure + 1)
What should the cluster size be for the following scenario, taking all the parameters into consideration?
1. There are 3 topics.
2. Each topic has messages of different sizes. Message sizes range from 10 KB to 500 KB, with an average of 50 KB.
3. Each topic has a different number of partitions: 10, 100, and 500.
4. Retention period is 7 days
5. There are 100 million messages posted every day to each topic.
Can someone please point me to relevant documentation or a blog which discusses this? I have searched Google but to no avail.
As I understand it, getting good throughput from Kafka doesn't depend only on the cluster size; there are other configurations which need to be considered as well. I will try to share as much as I can.
Kafka's throughput is supposed to be linearly scalable with the number of disks you have. The multiple data directories feature introduced in Kafka 0.8 allows Kafka's topics to have different partitions on different machines. As the partition number increases greatly, so do the chances that the leader election process will be slower, also affecting consumer rebalancing. This is something to consider, and it could be a bottleneck.
Another key thing could be the disk flush rate. As Kafka always immediately writes all data to the filesystem, the more often data is flushed to disk, the more "seek-bound" Kafka will be, and the lower the throughput. Again a very low flush rate might lead to different problems, as in that case the amount of data to be flushed will be large. So providing an exact figure is not very practical and I think that is the reason you couldn't find such direct answer in the Kafka documentation.
There will be other factors too: for example, the consumer's fetch size, compression, batch size for asynchronous producers, socket buffer sizes, etc.
Hardware and OS will also play a key role, as using Kafka in a Linux-based environment is advisable due to its page cache mechanism for writing data to disk. Read more on this here.
You might also want to take a look at how the OS flush behavior plays a key role before you actually tune it to fit your needs. I believe it is key to understand the design philosophy, which makes Kafka so effective in terms of throughput and fault tolerance.
Some more resources I find useful to dig into:
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
http://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/
https://grey-boundary.io/load-testing-apache-kafka-on-aws/
https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
I recently worked with Kafka, and these are my observations.
Each topic is divided into partitions, and all the partitions of a topic are distributed across the Kafka brokers. First of all, this helps to store topics whose size is larger than the capacity of a single broker, and it also increases consumer parallelism.
To increase reliability and fault tolerance, replicas of the partitions are made, and they do not increase consumer parallelism. The rule of thumb is that a single broker can host only a single replica per partition. Hence, the number of brokers must be >= the number of replicas.
All partitions are spread across the available brokers. The number of partitions can be independent of the number of brokers, but the number of partitions should equal the number of consumer threads in a consumer group (to get the best throughput).
The cluster size should be decided keeping in mind the throughput you want to achieve at consumer.
The total MB/s per broker would be:
Data/day = (100 × 10^6 messages/day) × 0.05 MB = 5 TB/day per topic
That gives us ~58 MB/s per topic. Assuming that the messages are split equally between partitions and you run the minimal 3-broker cluster, for the total cluster we get: 58 MB/s × 3 topics ≈ 174 MB/s, i.e. ~58 MB/s of original data per broker.
Now, for the replication, you have 1 extra replica per topic. Therefore this becomes ~58 MB/s per broker of INCOMING original data + ~58 MB/s per broker of OUTGOING replication data + ~58 MB/s per broker of INCOMING replication data.
This gives about ~116 MB/s per broker ingress and ~58 MB/s per broker egress.
The system load will get very high, and that is without taking any stream processing into consideration. The load could be handled by increasing the number of brokers and splitting your topics across more partitions.
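A back-of-the-envelope helper for this kind of estimate, with the scenario's numbers plugged in. All assumptions are mine: a minimal 3-broker cluster, replication factor 2, load spread evenly across brokers, and egress counting only replication traffic (consumers would add to it):

```python
def broker_throughput_mb_s(msgs_per_day, avg_msg_mb, topics, brokers,
                           replication_factor):
    """Rough per-broker throughput estimate for a Kafka cluster.
    Assumes even load distribution; ignores consumer egress."""
    per_topic = msgs_per_day * avg_msg_mb / 86_400        # MB/s per topic
    original = per_topic * topics / brokers               # original data per broker
    extra_replicas = replication_factor - 1
    ingress = original * (1 + extra_replicas)             # original + replica copies in
    egress = original * extra_replicas                    # replica copies out to followers
    return original, ingress, egress

orig, ingress, egress = broker_throughput_mb_s(
    msgs_per_day=100e6, avg_msg_mb=0.05, topics=3, brokers=3,
    replication_factor=2)
print(round(orig), round(ingress), round(egress))  # -> 58 116 58
```

Rerunning the same function with more brokers or a higher replication factor shows immediately how the per-broker load scales, which is the quantitative handle the question asks for.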
If your data are very important, then you may want a different (high) replication factor. Fault tolerance is also an important factor for deciding the replication.
For example, if you had very important data, then apart from the N active brokers (with the replicas) that are managing your partitions, you may want to add stand-by followers in different areas.
If you require very low latency, then you may want to further increase your partitions (by adding additional keys). The more keys you have, the fewer messages you will have on each partition.
For low latency, you may want a separate cluster (with its replicas) that manages only that special topic, so that no additional computation for other topics interferes.
If a topic is not very important, then you may want to lower the replication factor of that particular topic and be more elastic to some data loss.
When building a Kafka cluster, the machines supporting your infrastructure should be equally capable. Since the partitioning is done round-robin style, you expect each broker to be capable of handling the same load; therefore the size of your messages does not matter.
The load from stream processing will also have a direct impact. A good piece of software to monitor your Kafka cluster and manage your streams is Lenses, which I personally favor a lot since it does an amazing job of processing real-time streams.
What are the differences between LOG_CHECKPOINT_INTERVAL and LOG_CHECKPOINT_TIMEOUT? I need a clear picture of volume-based intervals and time-based intervals. What are the relations among LOG_CHECKPOINT_TIMEOUT, LOG_CHECKPOINT_INTERVAL and FAST_START_IO_TARGET?
A checkpoint is when the database synchronizes the dirty blocks in the buffer cache with the datafiles. That is, it writes changed data to disk. The two LOG_CHECKPOINT parameters you mention govern how often this activity occurs.
The heart of the matter is: if the checkpoint occurs infrequently it will take longer to recover the database in the event of a crash, because it has to apply lots of data from the redo logs. On the other hand, if the checkpoint occurs too often the database can be tied up as various background processes become a bottleneck.
The difference between the two is that the INTERVAL specifies the maximum amount of redo blocks which can exist between checkpoints and the TIMEOUT specifies the maximum number of seconds between checkpoints. We need to set both parameters to cater for spikes of heavy activity. Note that LOG_CHECKPOINT_INTERVAL is measured in OS blocks not database blocks.
FAST_START_IO_TARGET is a different proposition. It specifies a target for the number of I/Os required to recover the database. The database then manages its checkpoints intelligently to achieve this target. Again, this is a trade-off between recovery times and the amount of background activity, although the impact on normal processing should be less than that of badly set LOG_CHECKPOINT parameters. This parameter is only available with the Enterprise Edition. It was deprecated in 9i in favour of FAST_START_MTTR_TARGET, and Oracle removed it in 10g. There is a view, V$MTTR_TARGET_ADVICE, which, er, provides advice on setting FAST_START_MTTR_TARGET.
We should set either the FAST_START%TARGET or the LOG_CHECKPOINT_% parameters but not both. Setting the LOG_CHECKPOINT_INTERVAL will override the setting of FAST_START_MTTR_TARGET.