How to make this query using Prometheus?

I'm really new to Prometheus, and for the moment I want to run some test queries to get a bit more familiar with it.
The query container_last_seen[10s] returns a result like this:
container_last_seen{container_label_com_docker_compose_config_hash="dc8a2ab1347ad16ab37ff0ad03f3a00f86b381ea2d85d45a11367331526c3640",container_label_com_docker_compose_container_number="1",container_label_com_docker_compose_oneoff="False",container_label_com_docker_compose_project="dockprom",container_label_com_docker_compose_service="cadvisor",container_label_com_docker_compose_version="1.10.0",container_label_org_label_schema_group="monitoring",id="/docker/2b448d19a33b50411941a55435b03f5a4af19e3b3e9581054a67e4da3363ef19",image="google/cadvisor:v0.24.1",instance="cadvisor:8080",job="cadvisor",name="cadvisor"}
I want to get only the name attribute, so my idea was to do something like this:
container_last_seen[10s][name]
But that gives a parse error. How can I write this query?

It may seem a little counterintuitive for this purpose, but the aggregation operators allow reducing labels with the by and without clauses.
sum by(name) (container_last_seen{..criteria..})
should get you closer to what you want by returning series with only the name label.
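Against the series in your example, the result keeps only that one label, something like this (the value shown is a made-up last-seen timestamp):
{name="cadvisor"}  1486123456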
I think you want to go a little further, though: you don't want values, and you don't want the object part; you just want strings. Unfortunately, Prometheus deals in numeric metrics that can carry labels, specifically not string metrics.
While it requires additional software, Grafana is officially recommended by Prometheus, so I will mention it here, as it gets you very close to what I believe is your desired solution:
If you were to chart that query in Grafana, either with all the labels or just the name label, the legend format {{name}} would give you exactly what you want. Grafana also provides label_values to help with filtering for this purpose.
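For instance, a Grafana template variable backed by the query below would be populated with just the name strings (label_values is a Grafana templating function, not PromQL):
label_values(container_last_seen, name)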
Lastly, if this is not the right direction for you: for intensive string-based metrics, the ELK/EFK stack may be a better fit. There are projects like prometheus-es-exporter that can report the results of Elasticsearch queries as metrics.

This is not possible, as labels like 'name' are separate from the metric value. You should look at the JSON that the query and query_range endpoints return to see how this is exposed.
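For illustration, here is a minimal sketch of pulling just the name labels out of that JSON with Python; the endpoint path and response shape are the standard Prometheus HTTP API, while the server address is an assumption:

import requests

# Instant query against the Prometheus HTTP API
# (http://localhost:9090 is an assumed server address).
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "container_last_seen"},
)
series = resp.json()["data"]["result"]

# Each result entry holds a "metric" dict of labels plus a "value" pair;
# keep only the "name" label of each series.
names = [s["metric"].get("name") for s in series]
print(names)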

Related

Azure form-recognizer - prebuilt-invoice doesn't recognize currency, other custom fields

Azure Form Recognizer's prebuilt-invoice doesn't recognize the currency and some of my other custom fields from my invoice PDF. The prebuilt General Document model gets me all the key-value pairs, but with those I need to write an algorithm to categorize the invoice-related fields, which prebuilt-invoice already does.
I need all key-value pairs from the prebuilt-invoice API, so I can find the missing elements myself.
Has anybody faced this? How did you overcome it? One way I can think of is to call both APIs for the same document, but that affects performance and increases cost.
Any suggestions?
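For reference, a minimal sketch of the call-both-models idea using the azure-ai-formrecognizer Python SDK; the endpoint, key, and file name are placeholders, and whether the second call is worth the cost is exactly the open question:

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Endpoint and key are placeholders.
client = DocumentAnalysisClient(
    "https://<resource>.cognitiveservices.azure.com/",
    AzureKeyCredential("<key>"),
)

with open("invoice.pdf", "rb") as f:
    data = f.read()

# Two calls on the same bytes: categorized fields from prebuilt-invoice,
# raw key-value pairs from prebuilt-document.
invoice = client.begin_analyze_document("prebuilt-invoice", data).result()
general = client.begin_analyze_document("prebuilt-document", data).result()

invoice_fields = invoice.documents[0].fields if invoice.documents else {}
all_pairs = {
    kv.key.content: (kv.value.content if kv.value else None)
    for kv in general.key_value_pairs
}
# Anything present in all_pairs but not covered by invoice_fields is a
# candidate "missing element" to categorize yourself.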

Local aggregation for data stream in Flink

I'm trying to find a good way to combine a Flink keyed WindowedStream locally in a Flink application. The idea is similar to a combiner in MapReduce: combine partial results in each partition (or mapper) before the data (which is still a keyed WindowedStream) is sent to a global aggregator (or reducer). The closest function I found is aggregate, but I wasn't able to find a good example of its usage on a WindowedStream.
It looks like aggregate doesn't allow a WindowedStream output. Is there any other way to solve this?
There have been some initiatives to provide pre-aggregation in Flink; you have to implement your own operator. In the case of the streaming environment, you have to extend the class AbstractStreamOperator.
KurtYoung implemented a BundleOperator. You can also use the Table API on top of the stream API; the Table API already provides local aggregation. I also have an example of a pre-aggregate operator that I implemented myself. Usually, the drawback of all these solutions is that you have to set the number of items to pre-aggregate or a timeout for pre-aggregation. Without them you can run out of memory, or you never shuffle items (if the threshold number of items is never reached). In other words, they are rule-based. What I would like to have is something cost-based and more dynamic, something that adjusts those parameters at run time.
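To illustrate the Table API route, a minimal sketch of switching on its built-in local aggregation (mini-batching plus two-phase aggregation); this assumes PyFlink on a reasonably recent Flink version, where these configuration keys exist:

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
conf = t_env.get_config().get_configuration()

# Buffer input and fire partial aggregates by size or latency (rule-based,
# exactly the threshold/timeout trade-off described above).
conf.set_string("table.exec.mini-batch.enabled", "true")
conf.set_string("table.exec.mini-batch.allow-latency", "5 s")
conf.set_string("table.exec.mini-batch.size", "5000")
# Split aggregations into a local pre-aggregate and a global aggregate.
conf.set_string("table.optimizer.agg-phase-strategy", "TWO_PHASE")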
I hope these links can help you. And if you have ideas for the cost-based solution, please come talk to me =).

How do I get both objects and values() from a single Django query?

I want to export some database objects as JSON. The export format more or less matches the format of Example.objects.values(*columns), except that each dict should also contain a value computed by calling a method on the corresponding instance from Example.objects.all(), that is, on the Example(Model) instances.
How do I produce a suitable dictionary with only one database query?
My motivation for collapsing two queries (my current solution) into one is consistency: we run Django against PostgreSQL at the default "read committed" level, which may make my two queries return inconsistent data. (The better performance is a benefit too, but not the main motivation.)
I see two paths:
Query for objects and convert the objects to dictionaries in my Python code.
Query for dictionaries and convert the dictionaries to objects in my Python code.
Which is better? What should I watch out for? How do I do either one correctly? Django does a lot of complex stuff behind the scenes (the code I looked at seemed somewhat complex, anyway), so I'm not sure which path is the most robust.
NOTE: the solution needs to span __ lookups, i.e. I want e.g. {"other_model__pk": 1} to be a possible output. A quick test suggests that django.forms.models.model_to_dict doesn't do this.
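For what it's worth, a minimal sketch of path 1 (query for objects once, build the dicts in Python); Example, compute(), and the column list are hypothetical, and __ lookups are walked by hand on the instances:

def follow(obj, lookup):
    # Walk a Django-style "__" lookup on an instance,
    # e.g. "other_model__pk" -> obj.other_model.pk
    for attr in lookup.split("__"):
        obj = getattr(obj, attr)
    return obj

columns = ["pk", "name", "other_model__pk"]  # hypothetical columns

rows = [
    {**{col: follow(obj, col) for col in columns},
     "computed": obj.compute()}  # hypothetical method
    for obj in Example.objects.select_related("other_model")
]

Since select_related pulls the related rows in the same SELECT, this stays a single query and sidesteps the read-committed consistency concern.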

CakePHP virtual HABTM table/relation, something like fixture

First of all, I'd like to tell you that you're a terrific audience.
I'm making an application where I have a model Foo with a table Foos, and I'd like to give Foo another parameter, a HABTM parameter, let's say Bar. But I'd rather not create a table for Bar, because Bar will have around 5 entries at the start, and in 5 years it will grow to maybe 7 entries, or not at all. So I don't see a need to create another table and make CakePHP query it with another SELECT. Does anyone have an idea how this can be achieved?
One solution I can think of is making a fixture for the Bars table and only creating the Bars_Foos join table for real (it won't be big anyway), but I can't find a way to use test fixtures in a normal controller.
A second solution is to save JSON or a serialized array in a single field of Foo and move the logic to the model, something like a virtual field, but I don't know if that is the best solution.
Real life example:
So I have Bikes, and every Bike has a main_type, which for now is {"MTB","Road","Trekking","City","Downhill"}. I know this list will not grow much over time, maybe by 2 to 5 entries in a few years; it will stay relatively short.
(For those who would say there may be a hundred specialized bike types: I have another column for that, specialized_type.)
It needs to be a HABTM relation, but the main_types table will be very small, so I'd like to avoid creating it and find a simpler solution.
Because
It bothers MySQL for such a small amount of data
It complicates MySQL queries
I have to make an additional model for MainType
I have more models to unbind when I don't need most of the data and would like to use recursive
Insert here anything you'd like...
Judging from your real-life example, I'd say you're on the wrong track. The queries won't be complicated: CakePHP uses additional queries for HABTM relations, so it would be just one additional query, which shouldn't be very costly, and it's very easy to trim down by using the Containable behaviour. And if you really need to use recursive only (for whatever reason), then it's just one single additional model to unbind, which doesn't seem like overkill to me.
This might not be what you wanted to hear, but I really think a proper database solution is better than trying to hack in "virtual data". Also note that fixtures, as used in tests, only define data that is written to the database on the fly when running the test, so that would definitely be more costly than using data that already exists in the database.
Maybe you'll get a small performance boost on selects that do not query the main type when using an additional column to store the data, but you'll definitely lose all the flexibility that the RDBMS has to offer, including faster selects using proper indexing, affecting multiple records by updating a single related value, etc. That doesn't sound like a good trade-off to me. Think about it: how would you select all Downhill Trekking bikes when this information is stored as a string in a single column? You would probably end up using ugly LIKE selects.
Now wait, there's a SET data type in MySQL that can hold multiple values. Right, and it looks easier and less complex. But in the background it isn't: while a complex-looking join query can be pretty fast with proper indexing, a query on the SET type will have to scan every single row, since the data stored in the column cannot be indexed appropriately for making more specific selects.
In the end it probably depends on your data, so I'd suggest testing both methods in your specific environment to see how they compare under workload.

Apply Solr filter query to only part of the search results

I have a Solr solution working which requires two queries, but I'm looking for a way to do it in a single query. My idea is that if I can figure out a way to do this, I won't have to incur the overhead of twice the load on the Solr cluster.
The details: I'm running a simple query like "q=camera" with a filter query of say "fq=type:digital". The second query is identical to the first, but the filter is the inverse, like "fq=-type:digital". I'm imagining that if there's a way to run a single query, applying the first filter to get the first set of topDocs and then generating a second set with the second filter, the results could be merged and returned (it doesn't matter if sorting re-sorts and mixes the two sets).
I experimented with partitioning the data into two different groups by marking a specific field during indexing and then using Solr "grouping" queries, but the response time for these wasn't acceptable in my setup.
I'm looking for suggestions on the most Solr-congruent approach to experiment with: tuning to improve the two-query solution's performance, or investigating some kind of custom Solr post-filter (I read Yonik's 2/2012 blog post).
I have to implement this in Solr 3.5, although if there's a slam dunk solution in 4.0 I'll eventually be able to move to that.
I can think of two alternative approaches (example queries below):
Instead of filtering the results, use a variable, higher boost so that all the results for type:digital come out on top and the rest of the documents follow. No need for separate queries. The boost can be changed per type value.
The other approach is to not display results for types other than digital, but still display facets for the other types, with counts, so users know whether other types exist for the search term. Check out tagging and excluding filters.
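As hedged illustrations only (the boost value is made up), the boost approach could look like the first query below, and the tag-and-exclude facet approach like the second:

q=camera&defType=edismax&bq=type:digital^10

q=camera&fq={!tag=t}type:digital&facet=true&facet.field={!ex=t}type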
Result grouping might give you what you want. Just group by that parameter and specify a sufficiently large top number of documents in each group.
But I would test whether its performance is any better than two queries, since the documentation mentions performance in its limitations section.
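For reference, a grouped version of the example query might look like this, with group.limit controlling how many top documents come back per group:

q=camera&group=true&group.field=type&group.limit=10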
