Hystrix Configuration clarification: metrics.rollingStats.timeInMilliseconds

I am trying to understand how the metrics.rollingStats.timeInMilliseconds and metrics.rollingStats.numBuckets work together.
If I have the following configuration:
circuitBreaker.requestVolumeThreshold=20
circuitBreaker.errorThresholdPercentage=50
metrics.rollingStats.timeInMilliseconds=10000
metrics.rollingStats.numBuckets=10
To me this means:
1) I need a minimum of 20 requests in my window before a decision will be made
2) 50% or more of the requests need to fail for the breaker to open
But how does the number of buckets come into play? Are the requestVolumeThreshold and the error threshold applied per bucket? I am trying to understand if/how the buckets are used in deciding whether to trip the breaker.

I was able to answer my own question by creating a simple app that tests the circuit breaker.
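The decision to open the circuit compares the request volume threshold and the error threshold percentage against the totals for the entire metrics.rollingStats.timeInMilliseconds window, not against each bucket. The buckets are only used for how/when the rolling window is updated; with the configuration above, each bucket covers 1000 ms of the 10000 ms window.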
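To make the window-versus-bucket distinction concrete, here is a toy Python model of a bucketed rolling window. It is only a sketch of the behaviour described above, not Hystrix's actual implementation; the names RollingWindow and should_open are made up for illustration.

```python
from collections import deque


class RollingWindow:
    """Toy bucketed rolling window (illustrative, not Hystrix's real code)."""

    def __init__(self, window_ms=10_000, num_buckets=10):
        self.bucket_ms = window_ms // num_buckets
        self.num_buckets = num_buckets
        self.buckets = deque()  # entries: (bucket_index, [successes, failures])

    def record(self, now_ms, failed):
        idx = now_ms // self.bucket_ms
        # Start a new bucket when time has moved past the newest one.
        if not self.buckets or self.buckets[-1][0] != idx:
            self.buckets.append((idx, [0, 0]))
        self.buckets[-1][1][1 if failed else 0] += 1
        self._expire(idx)

    def _expire(self, idx):
        # Drop whole buckets that have slid out of the window.
        while self.buckets and self.buckets[0][0] <= idx - self.num_buckets:
            self.buckets.popleft()

    def totals(self, now_ms):
        self._expire(now_ms // self.bucket_ms)
        ok = sum(counts[0] for _, counts in self.buckets)
        bad = sum(counts[1] for _, counts in self.buckets)
        return ok, bad


def should_open(window, now_ms, volume_threshold=20, error_pct=50):
    """Apply both thresholds to the totals of the whole window."""
    ok, bad = window.totals(now_ms)
    total = ok + bad
    if total < volume_threshold:
        return False
    return 100 * bad / total >= error_pct
```

With two failures recorded in each of ten buckets, no single bucket comes near 20 requests, yet should_open returns True because the window total reaches 20.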

Efficient retrieval of lat-lon points that are within a square boundary

I have a react-native application that populates pins on a map that have been submitted by users. The front end gets the corners of the window and then the back end goes through each pin to check if it falls within the boundary, and returns the ones that do.
This is taking too long on the backend and I want to ask the community for ideas, because I doubt I have the best one.
My idea is to store tables of pins grouped by quadrants, effectively a cache, and then I can in almost constant time return the pins from the quadrants involved.
Is there a simpler way to do this?
Maybe using NoSQL?
🙏🏻
A month later, it seems geohashing is probably the best way, and AWS has a library that handles this automatically with DynamoDB. Apparently it takes the lat/lon corners of the screen and returns the items in view from the DB, in what I assume is close to constant time, since the whole point of geohashing is getting performance that works at scale.
https://www.npmjs.com/package/dynamodb-geo
https://aws.amazon.com/blogs/compute/implementing-geohashing-at-scale-in-serverless-web-applications/
Otherwise, a geohashing library built for serving mobile apps likely exists.
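For comparison, the "pins grouped by quadrants" idea from the question can be sketched in a few lines of Python. This is a plain fixed-size grid, used here as a simpler stand-in for geohashing; it is not the dynamodb-geo API. The cell size, the PinIndex class, and the pin data are all assumptions for illustration, and the sketch ignores views that cross the antimeridian.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # hypothetical cell size in degrees


def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to the integer (row, col) of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


class PinIndex:
    def __init__(self, size=CELL_DEG):
        self.size = size
        self.cells = defaultdict(list)  # (row, col) -> [(pin_id, lat, lon)]

    def add(self, pin_id, lat, lon):
        self.cells[cell_of(lat, lon, self.size)].append((pin_id, lat, lon))

    def query(self, south, west, north, east):
        """Return ids of pins inside the bounding box."""
        r0, c0 = cell_of(south, west, self.size)
        r1, c1 = cell_of(north, east, self.size)
        hits = []
        # Visit only the cells the box overlaps, then filter exactly.
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                for pin_id, lat, lon in self.cells.get((r, c), ()):
                    if south <= lat <= north and west <= lon <= east:
                        hits.append(pin_id)
        return hits
```

The work per query scales with the number of cells in view plus the pins inside them, rather than with the total number of pins stored.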

Non Redundant Image Extraction From Video

I am collecting data for a project. The data collection is done by recording videos of the subjects and the environment. However, while training the network, I would not want to train it with all the images collected in the video sequence.
The main objective is to not train the network with redundant images. The video sequence collected at 30 frames/sec can have redundant images (images that are very similar) within short intervals. The T-th frame and the (T+1)-th frame can be similar.
Can someone suggest ways to extract only the images that can be useful for training?
Update #2: Further resources,
https://github.com/JohannesBuchner/imagehash
https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/
https://www.pyimagesearch.com/2020/04/20/detect-and-remove-duplicate-images-from-a-dataset-for-deep-learning/
Update #1: You can use this repo to calculate similarity between given images. https://github.com/quickgrid/image-similarity
If frames with certain objects (e.g., vehicle, device) are important, then use pretrained object detectors, if available, to extract the important frames.
Next, use a similarity method to remove similar images in nearby frames. Keep removing the nearby N frames until a chosen threshold is exceeded.
This link should be helpful in finding the right method for your case,
https://datascience.stackexchange.com/questions/48642/how-to-measure-the-similarity-between-two-images
The repository below should help implement the idea with a few lines of code. It uses a CNN to extract features, then calculates their cosine distance as mentioned there.
https://github.com/ryanfwy/image-similarity
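As a rough illustration of the "drop nearby frames until they differ enough" step, here is a minimal Python sketch. It uses a simple average hash with Hamming distance rather than the CNN features from the repository above. It assumes each frame has already been converted to a small grayscale grid (for example 8x8) with whatever image library you use; the function names and the min_distance value are hypothetical.

```python
def average_hash(gray):
    """Hash a small grayscale grid: one bit per pixel, set if above the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)


def hamming(h1, h2):
    """Number of bit positions where two hashes differ."""
    return sum(a != b for a, b in zip(h1, h2))


def select_keyframes(frames, min_distance=5):
    """Keep a frame only if its hash differs enough from the last kept frame."""
    kept = []
    last_hash = None
    for index, frame in enumerate(frames):
        h = average_hash(frame)
        if last_hash is None or hamming(h, last_hash) >= min_distance:
            kept.append(index)
            last_hash = h
    return kept
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift of near-identical frames from all being kept. The threshold needs tuning on real footage.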

Gatling Throttle holdfor not working

New to Gatling world but an experienced Loadrunner user.
I created a sample simulation to run two scenarios, each with 10 users, and I want it to run for 10 minutes. Below is what I have in my setUp function. But each time I run the simulation, it only runs for 136 seconds. The holdFor doesn't seem to take effect.
    setUp(
      scn.inject(rampUsers(10) over (10 seconds)),
      scen.inject(rampUsers(10) over (10 seconds))
    )
      .protocols(httpProtocol)
      .throttle(
        reachRps(2) in (10 seconds),
        holdFor(10 minutes)
      )
I am using Gatling 2.2.2 bundle.
Output: Simulation computerdatabase.BasicSimulation completed in 136 seconds
The throttle works as a bottleneck, effectively acting as an upper bound on how many requests will be sent. If your scenarios + injection profiles aren't able to generate as many requests as you would like in the first place, the ones that are generated simply pass through the throttle unhindered. The throttle cannot increase the load to match the desired RPS; it can only decrease it.
You will need to inject enough users into your scenarios for them to be able to generate the 2 RPS you want in the first place, and keep adding more of them over the course of the simulation, in order for the throttle to do what you are looking for.
Try changing your injection profiles to for example something like this (and adjust the constantUsersPerSec value as needed), I believe this might give you a load-profile a step closer to what you are looking for:
    scn.inject(constantUsersPerSec(1) during (10 minutes)),
    scen.inject(constantUsersPerSec(1) during (10 minutes))
The example above was just a very quick and dirty way to illustrate the point of having to inject users over time, but as chance would have it, injecting 600 users in total over 10 minutes into a scenario is 10 users every ten seconds and should be exactly what you want, unless I'm falling ass first into a basic arithmetic error and/or misunderstanding.
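A quick way to sanity-check that arithmetic, and the point that the throttle is only a cap, is a couple of lines of Python. This is not Gatling code; both helper functions are hypothetical and only model the numbers.

```python
def total_users(users_per_sec, duration_sec):
    """Users injected by a constant-rate profile over the whole run."""
    return users_per_sec * duration_sec


def throttled_rps(generated_rps, throttle_limit_rps):
    """A throttle can only cap the request rate, never raise it."""
    return min(generated_rps, throttle_limit_rps)


ten_minutes = 10 * 60
users = total_users(1, ten_minutes)          # 1 user/sec for 600 seconds
per_ten_seconds = users / (ten_minutes / 10)  # users arriving in each 10 s slice
```

Here users works out to 600 and per_ten_seconds to 10, matching the figures above. A scenario that generates only 0.5 RPS stays at 0.5 RPS under a 2 RPS throttle.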
It will also naturally ramp up and down to some extent, although you can more explicitly control the ramp up by chaining injection steps if you need, for example like this:
    scn.inject(
      rampUsers(10) over (1 minute),
      constantUsersPerSec(1) during (10 minutes)
    )
For another approach to more explicitly control the ramp over time, you could also play around with a configuration like this:
    scn.inject(
      splitUsers(600) into (rampUsers(10) over (10 seconds)) separatedBy (10 seconds)
    )

ArduPilot, Dronekit-Python, Mavproxy and Mavlink - Hunt for the Bottleneck

I have Ardupilot on a plane, using a 3DR Radio back to a Raspberry Pi on the ground that does some advanced geo- and attitude-based maths and provides audio feedback to the pilot (rather than having them look at a screen).
I am using Dronekit-python, which in turn uses Mavproxy and Mavlink. What I am finding is that I am only getting new attitude data to the Pi at about 3 Hz, and I am not sure where the bottleneck is:
3DR is running at 57.6 kbaud and all happy
I have turned off the automatic push of logs from Ardupilot down to Pi (part of Mavproxy)
The Pi can ask for Attitude data (roll, yaw etc.) through the DroneKit Python API as often as it likes, but only gets new data (i.e., a change in value) about every 1/3 second.
I am not deep enough inside the underlying architecture to understand what the bottleneck may be. Can anyone help? Is it likely a round-trip message response time from base to plane and back (others seem to get around 8 Hz from Mavlink from what I have read)? Or latency across the combination of Mavproxy, Mavlink and DroneKit? Or is there some setting inside Ardupilot or Telemetry that could be driving this?
I am aware this isn't necessarily a DroneKit issue, but not really sure where it goes as it spans quite a few components.
Requesting individual packets should work, but that mechanism was never meant to be used many times per second.
In order to get a certain packet many times per second, set up streams. A stream will trigger a certain number of times per second, and will then send whichever packet is associated with it, automatically. The ATTITUDE message is in the group called EXTRA1.
Let's suppose you want to receive 10 ATTITUDE messages per second. The relevant parameter is called SR0_EXTRA1. This defines the number of Attitude packets sent per second. The default is 4. Try increasing that parameter to 10.
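From DroneKit-Python, that parameter can be changed through the vehicle's parameters mapping. The sketch below wraps this in a small helper; set_attitude_rate is a hypothetical name, and the connection string in the comments is a placeholder. Which SRn_ prefix applies depends on the serial port the radio is attached to, so SR0_EXTRA1 is taken from the answer above as an assumption and may need to be a different SRn_EXTRA1 on your setup.

```python
def set_attitude_rate(vehicle, hz=10, param="SR0_EXTRA1"):
    """Set the EXTRA1 stream rate (which carries ATTITUDE) and read it back.

    `vehicle` is expected to behave like a dronekit Vehicle, whose
    `parameters` attribute supports dict-style get and set.
    """
    vehicle.parameters[param] = hz
    return vehicle.parameters[param]


# Typical use with a real vehicle (connection string is a placeholder):
#
#   from dronekit import connect
#   vehicle = connect("/dev/ttyUSB0", baud=57600, wait_ready=True)
#   set_attitude_rate(vehicle, hz=10)
#   vehicle.add_attribute_listener(
#       "attitude", lambda self, name, value: print(value))
```

If the rate does not change, it may be worth checking whether another ground-station component in the chain is requesting its own stream rates.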

Create a histogram of session length in a given time period using Keen IO

We are trying to build a histogram of session length in a given time period. Currently, we have sess:start and sess:end events which contain the session id and user id. I am wondering what the best way to compute this data is. Can this be achieved using the funnel API?
Have you checked out the recipes section in Keen IO's docs? Here is an excerpt from the section on histogram recipes for Session Length that might be really helpful.
Excerpt
To create a histogram for session lengths, like the one shown above,
you can run a count analysis on an event collection for completed
sessions (e.g. session_end). Along the x-axis you’ll have segments of
time lapsed in a session, and along the y-axis you’ll have the
percentage of sessions that fit into a given session length cohort.
Note: this recipe incorporates the D3 histogram recipe, which is
explained further in the documentation.
    histogram('chart-1', {
      segment_length: 60, // In seconds
      data_points: 10, // i.e. There will be 10 bars on our chart
      analysis_type: 'count',
      query_parameters: {
        event_collection: 'session_end',
        timeframe: timeframe,
        filters: []
      }
    });
More information
Keen IO - Analytics for Developers
Keen IO - Documentation
Code excerpt: Keen IO - Recipes for Histograms
Lots of good stuff behind the link that Stephanie posted.
One extra thing I'll venture is that putting an integer sess:length property in the sess:end event would make things easier. You'd have to keep the start time for each session somewhere in your database so that you can compute the difference for the sess:end event. But then you'd have the difference as a plain old number of seconds and can do any type of numerical analysis on it.
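The suggestion above amounts to pairing each sess:end with its sess:start and storing the difference. A minimal Python sketch of that pairing, plus the bucketing into 60-second segments, might look like this. It does not use the Keen IO API; the event tuple layout, the function names, and the choice to fold longer sessions into the last bar are assumptions for illustration.

```python
from collections import Counter


def session_lengths(events):
    """Pair start/end events and return {session_id: length_in_seconds}.

    `events` is an iterable of (event_name, session_id, unix_timestamp).
    """
    starts = {}
    lengths = {}
    for name, session_id, ts in sorted(events, key=lambda e: e[2]):
        if name == "sess:start":
            starts[session_id] = ts
        elif name == "sess:end" and session_id in starts:
            lengths[session_id] = ts - starts.pop(session_id)
    return lengths


def histogram(lengths, segment_length=60, data_points=10):
    """Count sessions per segment; the last bar absorbs longer sessions."""
    counts = Counter(
        min(int(length // segment_length), data_points - 1)
        for length in lengths
    )
    return [counts.get(i, 0) for i in range(data_points)]
```

The value computed in session_lengths is what would be stored as the integer sess:length property on each sess:end event.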
