My task is to implement DAU (daily active users) and MAU (monthly active users) metrics using Prometheus as the data store.
To do this I created a counter, active_users:
Counter activeUsers = Counter.build().name("active_users").help("Active users").labelNames("username").register();
And on each new connection I do
activeUsers.labels(user.name).inc();
My question is: how can I write a Prometheus query to extract DAU and MAU from the active_users time series? How can I count increments of distinct username labels over a 24-hour/30-day window?
Prometheus isn't really the right tool for this, as it's aimed at system-level metrics rather than per-user data; a label per username also creates unbounded label cardinality, which Prometheus handles poorly. I'd recommend doing this with an event logging system such as the ELK stack.
In Apache Flink there are four types of metrics: Counters, Gauges, Histograms, and Meters.
The goal is to add additional info to the histogram metric: user_id and speed_of_processing
For example:
user_id: speed_of_processing
1: 30 sec
2: 15 sec
...
How can I add such type of information?
It looks like you want to measure end-to-end latency and report it separately for each user_id. That's not a good match for Flink's metrics: it's not a good idea to have per-key (e.g., per-user) metrics, because Flink's metrics system isn't designed with that sort of scale in mind.
You could instead perform these measurements yourself and send the results to a sink, or use a histogram metric that only reports results aggregated across all users.
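For the first option, a minimal sketch (the Event class, its accessors, and the tuple layout are illustrative assumptions, not part of any Flink API):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Computes per-record processing latency and emits (user_id, latencyMillis)
// tuples, which can then be written to any sink (Kafka, a database, files)
// and aggregated per user there instead of in Flink's metrics system.
public class LatencyExtractor implements MapFunction<Event, Tuple2<Long, Long>> {
    @Override
    public Tuple2<Long, Long> map(Event e) {
        long latencyMillis = System.currentTimeMillis() - e.getCreationTime();
        return Tuple2.of(e.getUserId(), latencyMillis);
    }
}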
I am writing a Flink application which consumes time series data from a Kafka topic. The time series data has components like metric name, tag key-value pairs, a timestamp, and a value. I have created a tumbling window to aggregate data based on a metric key (which is a combination of metric name, key-value pairs, and timestamp). Here is what the main stream looks like:

kafka source -> Flat Map which parses and emits Metric -> Key by metric key -> Tumbling window of 60 seconds -> Aggregate the data -> write to the sink.
I also want to check whether any metrics arrived too late for the window above. I want to count how many metrics arrived late and calculate the percentage of late metrics relative to on-time metrics. I am thinking of using Flink's "allowedLateness" feature to send the late metrics to a different stream. I am planning to add a "MapState" in the main "Aggregate the data" operator, with the metric key as the key and the count of metrics that arrived in the main window as the value:
kafka source -> Flat Map which parses and emits Metric -> Key by metric key
-> Tumbling window of 60 seconds -> Aggregate the data (maintain a MapState
of metric counts) -> write to the sink
        \
         Late data -> Key by metric key -> Collect late metrics and find
         the percentage of late metrics -> Write the result to the sink
My question is: can the "Collect late metrics and find the percentage of late metrics" operator access the "MapState" that was updated by the main stream? Even though they are keyed by the same metric key, I guess they are two different tasks. I want to calculate: number of late metrics / (number of late metrics + number of metrics that arrived on time).
There are several different ways you could approach this.
You could store the per-window state in the KeyedStateStore returned by windowState() on the Context passed to your ProcessWindowFunction. Used in combination with allowedLateness, you could compute the late event statistics as late firings occur. (No need for MapState with this approach, since the windowState is already scoped to a specific window and a specific key; ValueState will suffice.)
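A minimal sketch of that approach (Metric and WindowResult are illustrative placeholder types; with allowedLateness and the default event-time trigger, every late element re-fires the window with its full contents, so the difference from the first firing's count is the number of late arrivals):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class LateAwareWindowFunction
        extends ProcessWindowFunction<Metric, WindowResult, String, TimeWindow> {

    private final ValueStateDescriptor<Long> onTimeCountDesc =
            new ValueStateDescriptor<>("onTimeCount", Long.class);

    @Override
    public void process(String key, Context ctx, Iterable<Metric> elements,
                        Collector<WindowResult> out) throws Exception {
        // Scoped to this window and this key; no MapState needed.
        ValueState<Long> onTimeCount = ctx.windowState().getState(onTimeCountDesc);

        long total = 0;
        for (Metric ignored : elements) {
            total++;
        }

        if (onTimeCount.value() == null) {
            // First (on-time) firing: remember how many metrics arrived on time.
            onTimeCount.update(total);
            out.collect(new WindowResult(key, total, 0.0));
        } else {
            // Late firing: everything beyond the first firing's count is late.
            long late = total - onTimeCount.value();
            out.collect(new WindowResult(key, total, (double) late / total));
        }
    }
}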
Another idea would be to capture a side output stream of the late events from the primary window and send those late events through another window that counts them over some time frame. Then send both that late event analytics stream and the output of the first (main) window into a KeyedCoProcessFunction (or RichCoFlatMap) that can compute the late event vs on-time event statistics. (Here you will need MapState, since you may need to have several windows open simultaneously for each key of the keyed stream.)
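Capturing the late events as a side output is directly supported by the windowing API; a sketch (Metric, MetricAggregator, and AggregatedMetric are placeholders for your own types, and the lateness value is arbitrary):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

final OutputTag<Metric> lateTag = new OutputTag<Metric>("late-metrics") {};

SingleOutputStreamOperator<AggregatedMetric> mainResults = metrics
        .keyBy(m -> m.getKey())
        .window(TumblingEventTimeWindows.of(Time.seconds(60)))
        .allowedLateness(Time.seconds(30))
        .sideOutputLateData(lateTag)   // events later than allowedLateness land here
        .aggregate(new MetricAggregator());

// This stream can be windowed again and joined with mainResults in a
// KeyedCoProcessFunction to compute the late vs. on-time percentages.
DataStream<Metric> lateMetrics = mainResults.getSideOutput(lateTag);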
Or you could use a simple process function to split the initial stream into two (by comparing the timestamps to the current watermark) -- one for the late and another for the not-late events -- and then use Flink SQL to compute all of the statistics.
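A sketch of such a splitter (Metric is again a placeholder; this assumes event-time timestamps have been assigned, and treats an element at or behind the current watermark as late):

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class LateSplitter extends ProcessFunction<Metric, Metric> {
    public static final OutputTag<Metric> LATE = new OutputTag<Metric>("late") {};

    @Override
    public void processElement(Metric m, Context ctx, Collector<Metric> out) {
        if (ctx.timestamp() <= ctx.timerService().currentWatermark()) {
            ctx.output(LATE, m);   // behind the watermark: route to the late stream
        } else {
            out.collect(m);        // on time: stays on the main output
        }
    }
}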
Or just implement the whole thing in one KeyedProcessFunction. See https://ci.apache.org/projects/flink/flink-docs-stable/docs/learn-flink/event_driven/ for an example.
Currently, we are using the SubmitFeed API section (FeedType _POST_INVENTORY_AVAILABILITY_DATA_), but because of the large number of SKUs (over 200k) we sometimes have problems updating our stock on time. I wrote a sync job that constantly detects inventory changes and then, every 30 minutes, creates a new inventory feed, but it is still not enough.
Do you know any alternative API sections to SubmitFeed (_POST_INVENTORY_AVAILABILITY_DATA_), or do you have an optimized solution for this that avoids throttling?
The smallest and, thus, quickest feed to send is
_POST_FLAT_FILE_PRICEANDQUANTITYONLY_UPDATE_DATA_
It's basically a CSV file with SKU, new stock, and new price. Amazon handles it in a reasonably short time. We handle about 120k SKUs and send a full hourly update with no issues.
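For illustration, the file body looks roughly like this (the real template is tab-delimited; the column names here are assumed from Amazon's price-and-quantity flat file template, so verify against the current template):

sku        price    quantity
ABC-123    19.99    42
DEF-456    7.50     0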
Talking about counters with respect to StatsD: the way it works is that you keep posting a counter value, e.g. numOfRequests:1|c, to the StatsD daemon whenever the app gets a request. The daemon has a flush interval set; at each flush it pushes the aggregate of this counter over that period to an external backend and also resets the counter to 0.
Trying to map this to Flink counters:
Flink counters only have inc and dec methods, so until reporting time comes, the app can call inc or dec to change the value of a counter.
At reporting time, the latest value of the counter is reported to the StatsD daemon, but the Flink counter value is never reset (I was not able to find any code that does so).
So should the Flink counter be reported as a gauge value to StatsD? Or does Flink reset the counters?
Flink counters are basically a kind of gauge value. The counters are never reset, so numRecordsIn/numRecordsOut and other counter metrics keep increasing over the lifetime of a job. If you want to visualise the count over a duration, you need to calculate and send the delta to the external backend yourself in the report method, or use the external backend's own capabilities to graph the delta.
We use Datadog and used the following to graph the delta over a duration:
diff(sum:numRecordsIn{$app_name,$env}.rollup(max))
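If your backend cannot do this, a minimal sketch of computing the deltas yourself before sending (this helper class is illustrative and not part of Flink's reporter API):

import java.util.HashMap;
import java.util.Map;

// Remembers the last reported value per counter name and returns only the
// change since the previous report, turning a monotonically increasing
// counter into per-interval deltas.
class DeltaTracker {
    private final Map<String, Long> lastReported = new HashMap<>();

    long delta(String counterName, long currentValue) {
        long previous = lastReported.getOrDefault(counterName, 0L);
        lastReported.put(counterName, currentValue);
        return currentValue - previous;
    }
}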
I have a backtesting framework that needs to replay tick-level market data in order. I am currently using Cassandra, where my schema is structured to have all ticks for a single trade date in one row; each column represents a single tick. This makes the backtesting framework simple, because it can play date ranges by pulling one date at a time in sequence.
I would like to use ChronicleMap and compare its performance with Cassandra.
How do you model this in ChronicleMap to support the schema of one row per date of tick data?
ChronicleMap is designed to be a random access key-value store.
For backtesting, most people use Chronicle Queue to store ordered events. You can use it to store any type of data in order. To look up by time, you can do a binary search or a range search on a monotonically increasing field.
Note: Chronicle Queue is designed to record data in your application in real time, i.e. with less than a microsecond of overhead. You can replay this data either as it happens or later as historical data. It is designed to support GC-free writing and reading.
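A minimal sketch of appending ticks and replaying them in order with Chronicle Queue (the field names, sample values, and queue path are illustrative):

import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class TickReplayDemo {
    public static void main(String[] args) {
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("ticks").build()) {
            // Append one tick; entries are stored strictly in write order.
            ExcerptAppender appender = queue.acquireAppender();
            appender.writeDocument(w -> w
                    .write("time").int64(1622505600000L)
                    .write("symbol").text("EURUSD")
                    .write("price").float64(1.2212));

            // Replay from the start of the queue, in the original order.
            ExcerptTailer tailer = queue.createTailer();
            tailer.readDocument(w -> System.out.printf("%d %s %f%n",
                    w.read("time").int64(),
                    w.read("symbol").text(),
                    w.read("price").float64()));
        }
    }
}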