Is there a way to enforce a steady poll rate using the google-cloud-pubsub client? I want to avoid scenarios where, if there is a spike in the publish rate, the pull request rate also tends to increase.
The client provides FlowControl settings via the maxOutstanding messages setting. From my understanding, this sets the maximum batch size during a pull operation.
I want to understand how to create a constant pull rate, say 1000 RPS.
Message Flow Control can be used to set the maximum number of messages being processed at a given time (e.g., setting max_messages in the case of the Python client), which indirectly caps the rate at which messages are received.
While it doesn't let you set the exact number of messages received per second (that depends on how long a message takes to process and how many messages are processed at once), it should prevent the pull rate from spiking along with the publish rate.
If you really need a fixed rate of messages received per second, AFAIK it isn't available directly in the client libraries, so you'd have to implement it yourself with an asynchronous pull and timers that acknowledge the messages at your desired rate.
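For illustration, here is a minimal sketch with the Python client, combining flow control with a sleep before each acknowledgement. The subscription path, target rate, and outstanding-message cap are placeholders, and the effective rate also depends on how many callbacks the client runs concurrently, so this approximates rather than enforces a fixed RPS.

    import time
    from google.cloud import pubsub_v1

    TARGET_RPS = 1000        # desired messages per second (assumed target)
    MAX_OUTSTANDING = 100    # cap on messages being processed at once

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = "projects/my-project/subscriptions/my-sub"  # placeholder

    # Flow control caps how many messages are outstanding at a time; it does
    # not by itself set an exact receive rate.
    flow_control = pubsub_v1.types.FlowControl(max_messages=MAX_OUTSTANDING)

    def callback(message):
        # Crude pacing: sleeping before the ack slows processing down, and flow
        # control then limits how fast new messages are delivered. Tune the
        # sleep (and MAX_OUTSTANDING) empirically; the client's own scheduler
        # also bounds how many callbacks run at once.
        time.sleep(MAX_OUTSTANDING / TARGET_RPS)
        message.ack()

    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=callback, flow_control=flow_control
    )

    with subscriber:
        try:
            streaming_pull_future.result()
        except KeyboardInterrupt:
            streaming_pull_future.cancel()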
I'm using KDA with a Flink job that should analyse messages emitted by different IoT device sources. There is a Kinesis stream with 4 shards, each of which contains more or less the same amount of data (there are no hot shards). The Kinesis stream gets filled by the AWS Greengrass Stream Manager, which uses an increasing sequence number as the partition key. Each message contains a single value (something like temperature = 5).
With this setup the stream read by the Kinesis consumer in Flink is unordered, but I need to preserve the order of the messages. To do so I have written a small buffer function, more or less the logic from CepOperator, that buffers messages and restores the order. For this the stream is keyed by the id of a message; let's say a temperature message always has a unique id, so the stream is keyed by this id.
To create the respective watermarks I'm using the FlinkKinesisConsumer and register a BoundedOutOfOrdernessTimestampExtractor on it. If I use an out-of-orderness time of 10 seconds everything works fine, except that I get almost 50% late arrivals, which is not the desired behaviour. But if I increase the time to 60 seconds, the iterator of the Kinesis stream falls significantly behind (growing linearly over time). The Kinesis consumer documentation says only a little about these settings. I have also tried to register a JobManagerWatermarkTracker, but it does not seem to change the behaviour.
I do not understand why a higher out-of-orderness time makes the iterator fall further and further behind, while a smaller setting drops a significant number of messages. What do I need to do to find the proper settings, or is my implementation wrong?
UPDATE:
While investigating the issue I found that if the JobManagerWatermarkTracker isn't properly configured (I still don't understand how to configure it), the alignment to the global watermark stops subtasks from reading from the Kinesis stream, which causes the iterator to fall behind. I calculated a delta for how much "latency" a dropped event has and set this as the out-of-orderness time (in this case 60 seconds). With the JobManagerWatermarkTracker deactivated, everything works as expected.
Furthermore, it seems that the AWS Greengrass Stream Manager isn't optimal for such use cases: it distributes the load evenly across shards, but with an increasing number of shards the datapoints for one temperature source may be spread across all shards of the stream, which introduces a lot of unnecessary latency. I would appreciate any input on how to configure the JobManagerWatermarkTracker.
We are using Flink 1.9.1.
We have a source, a process function, and a sink. The application consumes from and produces to Kinesis.
The input rate (produced by a simulator) is 20 events per second, but the per-second output rate for the process function shows 14 per second. The backpressure metric for the source shows OK (green). The event count (number of events sent by the source) and the number of events received by the process function also match, with very little delay.
But this count does not match the event count pushed by the simulator; it matches the 14-per-second rate instead.
Now my question is, does Flink regulate the input rate automatically?
In my case, how is the input rate controlled at 14 per second?
If it is not, is there any other metric that I should be looking at that I'm missing?
It's not possible to force a Flink pipeline to consume events at a particular rate. By design, there is limited buffering in the network stack, and the slowest task in the execution graph will dictate the rate at which the pipeline will consume and process events.
The back pressure monitoring (that green OK signal) is not a definitive guide to whether back pressure is occurring. So long as the job is able to make steady forward progress, it probably won't indicate that there's a problem. You could examine some of the network queue metrics to get more insight: e.g., inPoolUsage, outPoolUsage, inputQueueLength. See Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing for a lot more on this topic.
20 events per second seems very slow, so I am a bit surprised that something can't keep up with that rate, but that appears to be what's happening.
How can we keep a fixed number of active concurrent users/requests at a time for a scenario?
I have a unique testing problem: I am required to performance test services with a fixed number of in-flight requests at any given moment, over a given time period such as 10 minutes, 30 minutes, or 1 hour.
I am not looking for a per-second rate. What I want is to start with N requests and, as any of those N requests completes, add one more, so that at any given moment there are exactly N concurrent requests.
Things I have tried: rampUsers(100) over 10 seconds, but I sometimes see more than 50 users at a given instant.
constantUsersPerSec(20) during (1 minute) also pushed the number of requests to 50+ for some time.
atOnceUsers(20) seems related, but I don't see a way to keep it running for a given number of seconds and to add more requests as previous ones complete.
Thank you in advance; I'm hoping for some direction from the community.
There is a throttling mechanism (https://gatling.io/docs/3.0/general/simulation_setup/#throttling) which allows you to set a maximum number of requests per second, but remember that users are injected into the simulation independently of the throttle: you must inject enough users to produce that maximum request rate, otherwise you will end up with a lower req/s. Also, users that are injected but cannot send requests because of the throttle will wait in a queue for their turn. This can result in a huge load just after the throttle ends, or it can extend your simulation, so it is better to keep the throttle duration longer than the injection duration and to add the maxDuration() option to the simulation setup.
You should also keep in mind that a throttled simulation is far from how users naturally behave. They never wait for another user to finish before opening a page or taking an action, so in real life you will always end up with a variable number of requests per second.
Use the closed workload model injection supported by Gatling 3.0. In your case, to simulate and maintain 20 active users/requests for a minute, you can use an injection like:
Script.<Controller>.<Scenario>.inject(constantConcurrentUsers(20) during (60 seconds))
I want to build a "trap agent" library. The trap agent library keeps track of various parameters of the client system. If a parameter of the client system rises above its threshold, the trap agent library on the client side notifies the server about that parameter. For example, if CPU usage exceeds its threshold, the library notifies the server that CPU usage has been exceeded. I have to measure 50-100 parameters (like memory usage, network usage, etc.) on the client side.
I have the basic idea, but I am stuck on the overall library design.
I have thought of the following solutions:
1. I can create a thread for each parameter (i.e., each thread monitors a single parameter).
2. I can create a process for each parameter (i.e., each process monitors a single parameter).
3. I can classify the parameters into groups, e.g. data usage parameters fall into a network group and CPU/memory usage parameters fall into a system group, and then create a thread for each group.
The 1st solution looks better than the 2nd, but it may fail when I want to scale my library from 100 to 1000 parameters, because I would then have to create 1000 threads, which does not seem like good design (correct me if I am wrong).
The 3rd solution is good, but response time will be high, since many parameters are monitored in a single thread.
Is there any better approach?
In general, it's a bad idea to spawn threads 1-to-1 for any logical mapping in your code. You can quickly exhaust the available threads of the system.
In .NET this is very elegantly handled using thread pools:
Thread vs ThreadPool
Here is a C++ discussion, but the concept is the same:
Thread pooling in C++11
Processes also carry high overhead on Windows. Both designs sound like they would ironically be quite taxing on the very resources you are trying to monitor.
Threads (and processes) give you parallelism where you need it. For example, letting the GUI be responsive while some background task is running. But if you are just monitoring in the background and reporting to a server, why require so much parallelism?
You could just run each check, one after the other, in a tight event loop in a single thread. If you are worried about not sampling the values as often, I'd say that's actually a benefit: it does no good to consume 50% of the CPU just to monitor your CPU. If you are spot-checking values once every few seconds, that is probably fine resolution.
In fact, high resolution is of no help if you are reporting to a server. You don't want to mount a denial-of-service attack on your own server by making an HTTP call to it multiple times a second once some value triggers.
NOTE: this doesn't mean you can't have a pluggable architecture. You could create some base class that represents checking a resource and then create subclasses for each specific type. Your event loop could iterate over an array or list of objects, calling each one successively and aggregating the results. At the end of the loop you report back to the server if any are out of range.
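For example, a rough sketch of that loop might look like this (Python, with hypothetical class and function names; the actual sampling and reporting are up to you):

    import time
    from abc import ABC, abstractmethod

    class ResourceCheck(ABC):
        """Base class for one monitored parameter."""
        name = "unnamed"

        @abstractmethod
        def sample(self):
            """Read the current value of the parameter."""

        @abstractmethod
        def is_out_of_range(self, value):
            """Return True if the value should be reported as a trap."""

    class CpuUsageCheck(ResourceCheck):
        name = "cpu_usage"
        threshold = 90.0  # percent; illustrative threshold

        def sample(self):
            return 0.0  # placeholder: read /proc/stat, a perf counter, etc.

        def is_out_of_range(self, value):
            return value > self.threshold

    def report_to_server(violations):
        pass  # placeholder: one HTTP call / trap message listing all violations

    def run_agent(checks, interval_seconds=5.0):
        # Single-threaded event loop: spot-check every parameter, report any
        # that are out of range, then sleep until the next round.
        while True:
            violations = []
            for check in checks:
                value = check.sample()
                if check.is_out_of_range(value):
                    violations.append((check.name, value))
            if violations:
                report_to_server(violations)
            time.sleep(interval_seconds)

    # run_agent([CpuUsageCheck(), ...])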
You may want to add logic to stop checking (or at least stop reporting back to the server) for some "cool down period" once a trap hits. You don't want to tax your server or spam your logs.
You can follow one of the approaches below:
1. You can have two threads: one thread dedicated to measuring the emergency parameters and a second thread monitoring the non-emergency parameters. Response time for the emergency parameters will then be low.
2. You can define three threads: the first monitors the high-priority (emergency) parameters, the second monitors the intermediate-priority parameters, and the last monitors the lowest-priority parameters. Overall response time improves compared to the first approach.
3. If response time is not a concern, you can monitor all the parameters in a single thread. But in this case response time becomes worst when you scale your library from 100 to 1000 parameters.
In the 1st case the non-emergency parameters see higher response times, and in the 3rd case response time is definitely very high. So approach 2 is better; a rough sketch of it follows below.
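A rough sketch of approach 2 (Python, with placeholder checks and illustrative polling intervals; the real checks would read CPU, memory, network usage, and so on):

    import threading
    import time

    def monitor_group(checks, interval_seconds):
        # Each priority group runs in its own thread with its own polling interval.
        while True:
            for check in checks:
                check()  # each check reads and evaluates one parameter
            time.sleep(interval_seconds)

    # Placeholder checks; replace with real parameter readers.
    high_priority = [lambda: None]
    medium_priority = [lambda: None]
    low_priority = [lambda: None]

    groups = [
        (high_priority, 1.0),    # emergency parameters, polled every second
        (medium_priority, 5.0),
        (low_priority, 30.0),
    ]

    for checks, interval in groups:
        threading.Thread(target=monitor_group, args=(checks, interval)).start()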
Does ScraperWiki somehow automatically rate limit scraping, or should I add something like sleep(1 * random.random()) to the loop?
There is no automatic rate limiting. You can add a sleep command written in your language to add rate limiting.
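For example, something like this inside the scraping loop (Python, with a placeholder URL list and scrape function):

    import random
    import time

    urls_to_scrape = ["https://example.com/page1", "https://example.com/page2"]  # placeholder

    def scrape(url):
        pass  # placeholder for the actual scraping logic

    for url in urls_to_scrape:
        scrape(url)
        time.sleep(0.5 + random.random())  # pause roughly 0.5-1.5 seconds between requests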
Very few servers check for rate limiting, and usually servers containing public data don't.
It is, however, good practice to make sure you don't overrun the remote server. By default, scrapers run in only one thread, so there is a built-in limit to the load you can produce.