Effects of programmatically enabling `Efficiency Mode` for services in Windows 11? - c

Suppose there's a service that's extremely busy during the day but generally idle at night.
Currently, Task Manager shows that Efficiency mode is not enabled.
However, after applying the code changes below, Task Manager shows Efficiency mode as enabled.
It achieves this mode by applying these methods:
First, the Efficiency mode lowers the process priority of background
tasks so that Windows does not allocate important resources to these
apps.
Second, it deploys something called EcoQoS, which is a Quality of
Service package that reduces the clock speed for efficient tasks.
Through trial and error, I found that at a minimum these two steps are required for Efficiency mode to appear in Task Manager:
Set process priority class to IDLE_PRIORITY_CLASS
Throttle CPU power with PROCESS_POWER_THROTTLING_EXECUTION_SPEED
#include <windows.h>

// Sets the process priority to IDLE_PRIORITY_CLASS.
void set_process_priority()
{
    SetPriorityClass(GetCurrentProcess(), IDLE_PRIORITY_CLASS);
}

// Enables EcoQoS to reduce the clock speed.
void enable_ecoqos()
{
    PROCESS_POWER_THROTTLING_STATE PowerThrottling = { 0 };
    PowerThrottling.Version = PROCESS_POWER_THROTTLING_CURRENT_VERSION;
    PowerThrottling.ControlMask = PROCESS_POWER_THROTTLING_EXECUTION_SPEED;
    PowerThrottling.StateMask = PROCESS_POWER_THROTTLING_EXECUTION_SPEED;
    SetProcessInformation(GetCurrentProcess(), ProcessPowerThrottling, &PowerThrottling, sizeof(PowerThrottling));
}

int main(int argc, char* argv[])
{
    set_process_priority();
    enable_ecoqos();
    // Process is now running in Efficiency mode...
    return 0;
}
Question
Will enabling Efficiency mode degrade performance during the day, when the service is very busy? (I'm trying to lower the cost of CPU/memory usage and make the service more Green Software friendly.)
Are there other efficiency options that could be enabled to improve the overall Efficiency mode?

Because Efficiency Mode reduces the resources allocated to a service and reduces its priority relative to other running services, switching a process to Efficiency Mode would likely reduce its overall performance.
Whether that matters depends on many factors. A well-designed service should still perform adequately, however; the operating system typically defers to services that are in high demand.
The point of reducing resources is not to achieve performance gains for higher-priority processes; it is to keep the system responsive. It doesn't make much sense for a background process to consume large amounts of resources if it spends most of its time idle.
I once worked at a company that had a server which polled a SQL Server endpoint ten times per second. As you can imagine, this is quite a waste if the server sits idle most of the day and does the bulk of its work at night. I changed the code so that after six seconds of inactivity, it reduced the polling rate to once per second. After another minute of inactivity, it reduced the polling rate to once per minute. If a poll produced a request for activity, the rate went back up to 10 per second.
This had the effect of eliminating most of the wasteful activity, while still making the server responsive during its busy times.
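The back-off described above could be sketched roughly as follows. This is only an illustration in Python, not the original code: `next_poll_interval`, `poll_forever` and the `poll` callback (assumed to return True when it finds work) are hypothetical names.

```python
import time


def next_poll_interval(idle_seconds: float) -> float:
    """Seconds to wait before the next poll, given the time since the last activity."""
    if idle_seconds < 6:
        return 0.1   # busy: ten polls per second
    if idle_seconds < 66:
        return 1.0   # idle for six seconds: once per second
    return 60.0      # idle for a further minute: once per minute


def poll_forever(poll, sleep=time.sleep, clock=time.monotonic):
    """Call poll() repeatedly; poll is a hypothetical callback returning True on activity."""
    last_activity = clock()
    while True:
        if poll():
            last_activity = clock()  # activity resets the schedule to the fast rate
        sleep(next_poll_interval(clock() - last_activity))
```

The thresholds are the ones from the story (6 seconds, then another 60); a real service would tune them to its own traffic.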

Related

Does the max connection pool size also limit the max connections to the database?

I am using HikariCP with a Spring Boot app that has more than 1000 concurrent users.
I have set the max pool size:
spring.datasource.hikari.maximum-pool-size=300
When I look at the MySQL process list using
show processlist;
it shows at most 300 connections, which is equal to the pool size. It never rises above the max pool size. Is this intended?
I thought the pool size was the number of connections maintained for reuse by future database requests, and that more connections could be made when the need arises.
Also, when I remove the max pool config, I immediately get:
HikariPool-0 - Connection is not available, request timed out after 30000ms.
How can I resolve this problem? Thanks in advance.
Yes, it's intended. Quoting the documentation:
This property controls the maximum size that the pool is allowed to reach, including both idle and in-use connections. Basically this value will determine the maximum number of actual connections to the database backend. A reasonable value for this is best determined by your execution environment. When the pool reaches this size, and no idle connections are available, calls to getConnection() will block for up to connectionTimeout milliseconds before timing out. Please read about pool sizing. Default: 10
So basically, when all 300 connections are in use, and you are trying to make your 301st connection, Hikari won't create a new one (as maximumPoolSize is the absolute maximum), but it will rather wait (by default 30 seconds) until a connection is available again.
This also explains why you get the exception you mentioned, because the default (when not configuring a maximumPoolSize) is 10 connections, which you'll probably immediately reach.
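To make the blocking behaviour concrete, here is a toy model in Python. It is not HikariCP code and does not use its API; `TinyPool` and its methods are made-up names that only mimic the "hard cap plus timeout" idea with a semaphore.

```python
import threading


class TinyPool:
    """Toy stand-in for a fixed-size connection pool (not HikariCP itself)."""

    def __init__(self, maximum_pool_size: int, connection_timeout: float):
        self._slots = threading.BoundedSemaphore(maximum_pool_size)
        self._timeout = connection_timeout  # seconds

    def get_connection(self) -> None:
        # Block until a slot frees up, or give up once the timeout has passed.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("Connection is not available, request timed out")

    def release(self) -> None:
        # Hand the slot back so a waiting caller can proceed.
        self._slots.release()
```

With a pool of size 2, a third `get_connection()` call waits for the timeout and then fails unless someone calls `release()` first, which is the same shape as the exception in the question.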
To solve this issue, you have to find out why these connections are blocked for more than 30 seconds. Even in a situation with 1000 concurrent users, there should be no problem if your query takes a few milliseconds or a few seconds at most.
Increasing the pool size
If you are invoking really complex queries that take a long time, there are a few possibilities. The first one is to increase the pool size. This however is not recommended, as the recommended formula for calculating the maximum pool size is:
connections = ((core_count * 2) + effective_spindle_count)
Quoting the About Pool Sizing article:
A formula which has held up pretty well across a lot of benchmarks for years is
that for optimal throughput the number of active connections should be somewhere
near ((core_count * 2) + effective_spindle_count). Core count should not include
HT threads, even if hyperthreading is enabled. Effective spindle count is zero if
the active data set is fully cached, and approaches the actual number of spindles
as the cache hit rate falls. ... There hasn't been any analysis so far regarding
how well the formula works with SSDs.
As described within the same article, that means that a 4 core server with 1 hard disk should only have about 10 connections. Even though you might have more cores, I'm assuming that you don't have enough cores to warrant the 300 connections you're making, let alone increasing it even further.
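As a quick worked example of that formula (the function name is just for illustration):

```python
def recommended_pool_size(core_count: int, effective_spindle_count: int) -> int:
    """Pool size from the formula quoted above; core_count excludes HT threads."""
    return core_count * 2 + effective_spindle_count


# A 4-core server with a single hard disk.
print(recommended_pool_size(4, 1))  # 9, i.e. "about 10"
```

Working backwards, a pool of 300 would only match the formula on a machine with roughly 150 physical cores.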
Increasing connection timeout
Another possibility is to increase the connection timeout. As mentioned before, when all connections are in use, it will wait for 30 seconds by default, which is the connection timeout.
You can increase this value so that the application will wait longer before going in timeout. If your complex query takes 20 seconds, and you have a connection pool of 300 and 1000 concurrent users, you should theoretically configure your connection timeout to be at least 20 * 1000 / 300 = 67 seconds.
Be aware though, that means that your application might take a long time before showing a response to the user. If you have a 67 second connection timeout and an additional 20 seconds before your complex query completes, your user might have to wait up to a minute and a half.
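The arithmetic behind those numbers, as a small sketch (the function name is mine, and it assumes every user runs the same 20-second query):

```python
import math


def min_connection_timeout(query_seconds: float, concurrent_users: int, pool_size: int) -> int:
    """Smallest whole-second timeout that lets the last user in line get a connection."""
    return math.ceil(query_seconds * concurrent_users / pool_size)


timeout = min_connection_timeout(20, 1000, 300)
print(timeout)       # 67
print(timeout + 20)  # 87 seconds worst case: waiting for a connection plus the query itself
```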
Improve execution time
As mentioned before, your primary goal would be to find out why your queries are taking so long. With a connection pool of 300, a connection timeout of 30 seconds and 1000 concurrent users, it means that your queries are taking at least 9 seconds before completing, which is a lot.
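That 9-second figure comes from turning the same relationship around. A rough sketch, with a made-up function name:

```python
def implied_query_seconds(connection_timeout: float, pool_size: int, concurrent_users: int) -> float:
    """Lower bound on query time if users still hit the connection timeout."""
    return connection_timeout * pool_size / concurrent_users


print(implied_query_seconds(30, 300, 1000))  # 9.0
```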
Try to improve the execution time by:
Adding proper indexes.
Writing your queries properly.
Improving database hardware (disks, cores, network, ...).
Limiting the number of records you're dealing with, for example by introducing pagination.
Divide the work. Take a look to see if the query can be split into smaller queries that result in intermediary results that can then be used in another query and so on. As long as you're not working in transactions, the connection will be freed up in between, allowing you to serve multiple users at the cost of some performance.
Use caching
Precalculate the results: if you're doing some resource-heavy calculation, you could try to pre-calculate the results at a time when the application isn't used as often, e.g. at night, and store those results in a different table that can be easily queried.
...

Configuring a task queue and instance for non urgent work

I am using an F4 instance (because of memory needs) with automatic scheduling to do some background processing. It is run from a task queue. It takes 40s to 60s to complete each invocation. Because of the high memory needs, each instance should only handle one request at a time.
The action that needs to be done is not urgent. If it doesn't get scheduled for 30 minutes, that isn't a problem. Even 60 minutes is acceptable, and I'd rather make use of that time than spin up more instances. However, if the service gets popular and is getting more than 60 requests an hour, I want to spin up more instances to make sure there isn't more than a 60 minute wait.
I am having trouble figuring out how to configure the instance and queue parameters to keep my costs down but be able to scale in that way. My initial thought was something like this:
<queue>
  <name>non-urgent-queue</name>
  <target>slow-service</target>
  <rate>1/m</rate>
  <bucket-size>1</bucket-size>
  <max-concurrent-requests>1</max-concurrent-requests>
</queue>
<automatic-scaling>
  <min-idle-instances>0</min-idle-instances>
  <max-idle-instances>0</max-idle-instances>
  <min-pending-latency>20m</min-pending-latency>
  <max-pending-latency>1h</max-pending-latency>
  <max-concurrent-requests>1</max-concurrent-requests>
</automatic-scaling>
First of all those latency settings are invalid, but I can't find documentation on the valid range or units. Can anyone direct me to that info?
Secondly, if I understand the queue settings correctly, this configuration would limit it to 60 invocations an hour getting to the service, even if the task queue had 60+ jobs waiting.
Thanks for your help!
Indeed, throttling at the queue level basically defeats the ability to scale when needed. So you can't use the <rate> in the queue configuration at the values you have right now; you need to use the value matching the maximum rate you're willing to accept (with your max number of instances running simultaneously):
the max rate of requests that can go through the queue being limited at 1/min means you can't scale above 60/h
the <bucket-size> set at 1 means no peaks above the rate can be handled (as soon as one task starts the token bucket empties).
the <max-concurrent-requests> set at 1 will basically prevent multiple instances dealing simultaneously with the queued workload. They may be started by the autoscaler because of the request latencies, but they won't be able to help since only one queue task can be handled at a time.
In the <automatic-scaling> section the <max-concurrent-requests> set to 1 is good - this ensures no instance handles more than 1 request at a time - which is what you want.
The bad news is that the max values for the latencies appear to be 15s. At least when using the app.yaml config for python (but I think it's unlikely for that to differ across language sandboxes):
Error 400: --- begin server output ---
automatic_scaling.min_pending_latency (30s), must be in the range [0.010000s,15.000000s].
--- end server output ---
and
Error 400: --- begin server output ---
automatic_scaling.max_pending_latency (60s), must be in the range [0.010000s,15.000000s].
--- end server output ---
This probably also explains why your 20m and 1h values aren't accepted: I used 30s and 60s and got the above errors.
This means you won't be able to use the autoscaling parameters to tune processing as slow-moving as you want.
The only alternative I can think of is to have 2 queues:
a fast one feeding just trigger tasks for the slow-service jobs, which your service intercepts and saves in the datastore. Maybe these are handled by some faster service (you don't want them stuck behind a slow-service job execution, as that can cause unnecessary instance launching). Depending on the rest of your implementation, you may be able to replace this queue completely by just storing the job info in the datastore instead of enqueuing tasks in the fast queue.
a slow one for the actual slow-service job execution tasks
You'd also have a cron job executing once a minute, checking how many triggers are pending in the datastore, deciding how much to scale and enqueuing the corresponding number of slow-service job tasks in the slow queue. The autoscaler would simply bring up the corresponding number of instances (if needed). Low latency autoscaling configs would be desirable in this case - you already decided how you want your app to scale.
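One possible way for such a cron job to decide "how much to scale" is sketched below in Python. This is only an assumption about the policy, not App Engine code: `tasks_to_enqueue`, the 60-second job duration and the 60-minute wait limit are illustrative values taken from the question.

```python
import math


def tasks_to_enqueue(pending: int, in_flight: int,
                     job_seconds: float = 60.0,
                     max_wait_seconds: float = 3600.0) -> int:
    """How many slow-service tasks to add this minute (hypothetical policy).

    pending:   triggers waiting in the datastore
    in_flight: slow-service tasks already queued or running
    """
    # Parallel jobs needed to drain the backlog within the allowed wait.
    needed = math.ceil(pending * job_seconds / max_wait_seconds)
    return max(0, min(pending, needed - in_flight))


print(tasks_to_enqueue(pending=30, in_flight=0))   # 1: one instance keeps up
print(tasks_to_enqueue(pending=180, in_flight=2))  # 1: a third parallel job is needed
```

With up to 60 pending jobs a single instance is enough; beyond that the function asks for one more parallel job per extra 60 pending.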
This is how I ended up doing it. I use a slow queue and a fast queue configured like this:
<queue>
  <name>slow-queue</name>
  <target>pdf-service</target>
  <rate>2/m</rate>
  <bucket-size>1</bucket-size>
  <max-concurrent-requests>1</max-concurrent-requests>
</queue>
<queue>
  <name>fast-queue</name>
  <target>pdf-service</target>
  <rate>10/m</rate>
  <bucket-size>1</bucket-size>
  <max-concurrent-requests>5</max-concurrent-requests>
</queue>
The max-concurrent-requests in the slow queue ensures only one task will run at a time, so there will only be one instance active.
Before I post to the slow queue I check to see how many items are already on the queue. The result may not be totally reliable, but for my purposes it is sufficient. In Java:
QueueStatistics queueStats = queue.fetchStatistics();
if (queueStats.getNumTasks() < 30) {
    // post to slow queue
} else {
    // post to fast queue
}
So when my slow queue gets too full, I post to the fast queue which allows concurrent requests.
The instance is configured like this:
<automatic-scaling>
  <min-idle-instances>0</min-idle-instances>
  <max-idle-instances>automatic</max-idle-instances>
  <min-pending-latency>15s</min-pending-latency>
  <max-pending-latency>15s</max-pending-latency>
  <max-concurrent-requests>1</max-concurrent-requests>
</automatic-scaling>
So it will create new instances as slowly as possible (15s is the max latency) and make sure only one process runs on an instance at a time.
With this configuration I'll have a max of 6 instances at a time but that should do about 500/hr. I could increase the rate and concurrent requests to do more.
The downside of this solution is an element of unfairness. Under heavy load, some tasks will be stuck in the slow queue while others will get processed more quickly in the fast queue.
Because of that, I have decreased the max items on the slow queue to 13 so the unfairness won't be so extreme, maybe a 10 minute wait for jobs that go to the slow queue when it is full.
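A rough back-of-the-envelope check of those figures, as a Python sketch. The function names are mine, and the 40 to 60 second job duration is taken from the question rather than measured:

```python
def hourly_capacity(concurrent: int, job_seconds: float, rate_per_minute: float) -> float:
    """Jobs per hour, limited by both instance count and queue rate."""
    by_instances = concurrent * 3600 / job_seconds
    by_rate = rate_per_minute * 60
    return min(by_instances, by_rate)


def slow_queue_wait_minutes(queued: int, job_seconds: float) -> float:
    """Wait for the last job in a queue that runs one task at a time."""
    return queued * job_seconds / 60


# 1 slow + 5 fast concurrent requests; 2/m + 10/m queue rates.
print(hourly_capacity(6, 40, 12))       # 540.0 with 40 s jobs
print(hourly_capacity(6, 60, 12))       # 360.0 with 60 s jobs
print(slow_queue_wait_minutes(13, 60))  # 13.0 minutes with 60 s jobs
```

So "about 500/hr" sits at the optimistic end of the range, and 13 queued items give a wait of roughly 9 to 13 minutes depending on job length.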

Library design methodology

I want to make a "TRAP AGENT" library. The trap agent library keeps track of various parameters of the client system. If a parameter rises above its threshold, the trap agent library on the client side notifies the server about that parameter. For example, if CPU usage exceeds its threshold, it notifies the server that CPU usage has been exceeded. I have to measure 50-100 parameters (such as memory usage, network usage, etc.) on the client side.
I have a basic idea of the design, but I am stuck on the overall library design.
I have thought of the solutions below:
I can create a thread for each parameter (i.e. each thread will monitor a single parameter).
I can create a process for each parameter (i.e. each process will monitor a single parameter).
I can classify the parameters into groups (for example, data usage parameters fall into a network group, and CPU and memory usage parameters fall into a system group) and then create a thread for each group.
The 1st solution looks good compared to the 2nd. But if I adopt the 1st solution, it may fail when I want to upgrade my library from 100 to 1000 parameters, because I would have to create 1000 threads, which is not good design (I think; correct me if I am wrong).
The 3rd solution is good, but response time will be high since many parameters will be monitored in a single thread.
Is there any better approach?
In general, it's a bad idea to spawn threads 1-to-1 for any logical mapping in your code. You can quickly exhaust the available threads of the system.
In .NET this is very elegantly handled using thread pools:
Thread vs ThreadPool
Here is a C++ discussion, but the concept is the same:
Thread pooling in C++11
Processes are also high overhead on Windows. Both designs sound like they would ironically be quite taxing on the very resources you are trying to monitor.
Threads (and processes) give you parallelism where you need it. For example, letting the GUI be responsive while some background task is running. But if you are just monitoring in the background and reporting to a server, why require so much parallelism?
You could just run each check, one after the other, in a tight event loop in one single thread. If you are worried about not sampling the values as often, I'd say that's actually a benefit. It does not help to consume 50% CPU to monitor your CPU. If you are spot-checking values once every few seconds, that is probably a fine enough resolution.
In fact, high resolution is of no help if you are reporting to a server. You don't want to mount a denial-of-service attack on your own server by making an HTTP call to it multiple times a second once some value triggers.
NOTE: this doesn't mean you can't have a pluggable architecture. You could create some base class that represents checking a resource and then create subclasses for each specific type. Your event loop could iterate over an array or list of objects, calling each one successively and aggregating the results. At the end of the loop you report back to the server if any are out of range.
You may want to add logic to stop checking (or at least stop reporting back to the server) for some "cool down period" once a trap hits. You don't want to tax your server or spam your logs.
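A minimal sketch of that single-threaded loop with pluggable checks and a cool-down, written in Python for brevity. All names here (`Check`, `run_once`, the 60-second default cool-down) are hypothetical, and a callable per check stands in for the subclass-per-resource idea:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    name: str
    read: Callable[[], float]        # returns the current value of the parameter
    threshold: float
    cooldown: float = 60.0           # seconds to stay quiet after a trap
    last_trap: float = float("-inf")

    def tripped(self, now: float) -> bool:
        if now - self.last_trap < self.cooldown:
            return False             # still cooling down: skip the check entirely
        if self.read() > self.threshold:
            self.last_trap = now
            return True
        return False


def run_once(checks: List[Check], now: float) -> List[str]:
    """One pass of the event loop: names of the parameters to report."""
    return [check.name for check in checks if check.tripped(now)]
```

The outer loop would call `run_once` every few seconds, send a single report to the server if the returned list is non-empty, and sleep in between.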
You can follow the methodology below:
1. Use two threads: one dedicated to measuring emergency parameters, and a second that monitors non-emergency parameters. The response time for emergency parameters will therefore be lower.
2. Use three threads: the first monitors high-priority (emergency) parameters, the second monitors intermediate-priority parameters, and the last monitors the lowest-priority parameters. Overall response time will be better than with the first solution.
3. If response time is not a concern, monitor all the parameters in a single thread. In this case, though, response time gets worse when you upgrade your library from 100 to 1000 parameters.
In the 1st case, response time for non-emergency parameters will be higher, while in the 3rd case response time will definitely be very high.
So solution 2 is better.

SQL Server 2008 Activity Monitor Resource Wait Category: Does Latch include CPU or just disk IO?

In SQL Server 2008 Activity Monitor, I see Wait Time on Wait Category "Latch" (not Buffer Latch) spike above 10,000ms/sec at times. Average Waiter Count is under 10, but this is by far the highest area of waits in a very busy system. Disk IO is almost zero and page life expectancy is over 80,000, so I know it's not slowed down by disk hardware and assume it's not even touching SAN cache. Does this mean SQL Server is waiting on CPU (i.e. resolving a bajillion locks) or waiting to transfer data from the local server's cache memory for processing?
Background: System is a 48-core running SQL Server 2008 Enterprise w/ 64GB of RAM. Queries are under 100ms in response time - for now - but I'm trying to understand the bottlenecks before they get to 100x that level.
Class                          Count      Sum Time    Max Time
ACCESS_METHODS_DATASET_PARENT  649629086  3683117221  45600
BUFFER                         20280535   23445826    8860
NESTING_TRANSACTION_READONLY   22309954   102483312   187
NESTING_TRANSACTION_FULL       7447169    123234478   265
Some latches are IO, some are CPU, and some are other resources. It really depends on which particular latch type you're seeing the waits on. sys.dm_os_latch_stats will show which latches are hot in your deployment.
I wouldn't worry about the last three items. The two NESTING_TRANSACTION ones look very healthy (low average, low max). BUFFER is also OK, more or less, although the 8s max time is a bit high.
The AM_DS_PARENT latch (ACCESS_METHODS_DATASET_PARENT) is related to parallel queries/parallel scans. Its average is OK, but the max of 45s is rather high. Without going into too much detail, I can tell that long wait times on this latch type indicate that your IO subsystem can encounter spikes (and the max 8s BUFFER latch waits corroborate this).
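For reference, the averages mentioned above can be worked out from the numbers in the question. This Python sketch assumes the Sum Time and Max Time columns are in milliseconds, which matches reading 45600 as 45s:

```python
# (count, total wait in ms, max wait in ms), copied from the question
latches = {
    "ACCESS_METHODS_DATASET_PARENT": (649629086, 3683117221, 45600),
    "BUFFER": (20280535, 23445826, 8860),
    "NESTING_TRANSACTION_READONLY": (22309954, 102483312, 187),
    "NESTING_TRANSACTION_FULL": (7447169, 123234478, 265),
}


def average_wait_ms(count: int, total_ms: int) -> float:
    """Mean wait per latch request."""
    return total_ms / count


for name, (count, total_ms, max_ms) in latches.items():
    print(f"{name}: avg {average_wait_ms(count, total_ms):.2f} ms, max {max_ms / 1000:.1f} s")
```

The averages come out at roughly 5.7 ms, 1.2 ms, 4.6 ms and 16.5 ms respectively, so it is the maximums, not the averages, that stand out.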

Handling multiple calls to BeginExecuteNonQuery in SQL Server 2008

I have an application that is receiving a high volume of data that I want to store in a database. My current strategy is to fire off an asynchronous call (BeginExecuteNonQuery) with each record when it's ready. I'm using the asynchronous call to ensure that the rest of the application runs smoothly.
The problem I have is that as the volume of data increases, eventually I get to the point where I'm trying to fire a command down the connection while it's still in use. I can see two possible options:
Buffer the pending data myself until the existing command is finished.
Open multiple connections as needed.
I'm not sure which of these options is best, or if in fact there is a better way. Option 1 will probably lead to my buffer getting bigger and bigger, while option 2 may be very bad form - I just don't know.
Any help would be appreciated.
Depending on your locking strategy, it may be worth using several connections, but certainly not a number "without upper bounds". So a good strategy/pattern to use here is a "thread pool", with each of N dedicated threads holding a connection and picking up the next write request as soon as it finishes the previous one. The number of threads in the pool for best performance is best determined empirically, by benchmarking various possibilities in a realistic experimental/prototype setting.
If the "buffer" queue (in which your main thread queues write requests and from which the dedicated threads in the pool pick them up) grows beyond a certain threshold, it means you're getting data faster than you can possibly write it out, so, unless you can get more resources, you'll simply have to drop some of the incoming data -- maybe by a random-sampling strategy to avoid biasing future statistical analysis. Just count how much you're writing and how much you're having to drop due to the resource shortage in each period of time (say every minute or so), so you can use "stratified sampling" techniques in future data-mining explorations.
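The pattern can be sketched as below. This is a Python illustration rather than ADO.NET code, and every name in it (`BoundedWriter`, `submit`, the `write` callback) is hypothetical. For simplicity it drops the newest record when the buffer is full instead of sampling randomly, but it does keep the written/dropped counts suggested above.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


class BoundedWriter:
    """N worker threads drain a bounded queue; overflow is counted and dropped."""

    def __init__(self, write, workers, max_pending):
        self._write = write  # hypothetical callback: stores one record on a worker's connection
        self._pending = queue.Queue(maxsize=max_pending)
        self._lock = threading.Lock()
        self.written = 0
        self.dropped = 0
        self._threads = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(workers)]
        for thread in self._threads:
            thread.start()

    def submit(self, record):
        """Queue a record without blocking the caller; return False if it was dropped."""
        try:
            self._pending.put_nowait(record)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def _run(self):
        while True:
            record = self._pending.get()
            if record is _STOP:
                return
            self._write(record)
            with self._lock:
                self.written += 1

    def close(self):
        """Let the workers finish what is queued, then stop them."""
        for _ in self._threads:
            self._pending.put(_STOP)
        for thread in self._threads:
            thread.join()
```

The main thread only ever calls `submit`, so it never waits on the database; `workers` and `max_pending` are the two numbers to benchmark.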
Thanks Alex - so you'd suggest a hybrid method then, assuming that I'll still need to buffer updates if all connections are in use?
(I'm the original poster, I've just managed to get two accounts without realizing)

Resources