Prometheus sum returns no data, can I set a default value?

I am trying to sum network traffic in/out from different IDCs, using snmp_exporter to get that information, but sometimes the SNMP exporter can't get some switches' information, maybe because of a timeout or a lost response. So there is no data update for that switch, and "/metrics" will only show part of the traffic info. The problem is that when I use
sum(irate(ifInOctets{ifIndex=...,instance=...})) +
sum(irate(ifInOctets{ifIndex=...,instance=...}))+
sum(irate(ifInOctets{ifIndex=...,instance=...}))
to get the total traffic value, the expression returns no data and breaks the graph.
I am a newbie to Prometheus and not sure if I am using it the wrong way.
Thanks

The way to approach this is to use rate() with a long enough range to tolerate a failed scrape. For example, if you are scraping once a minute then 5m is enough, so you would use sum without(instance) (rate(ifInOctets[5m])).
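For example, instead of adding three separate sum() expressions (the whole expression becomes empty as soon as any one operand has no data), a single aggregation over all the relevant series might look like this (a sketch; the job="snmp" matcher is only an assumed label for the SNMP targets):

# Series from a switch whose last scrapes failed simply drop out of the sum,
# instead of turning the whole result into "no data".
sum without(instance) (rate(ifInOctets{job="snmp"}[5m]))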

Related

How to Implement Patterns to Match Brute Force Login and Port Scanning Attacks using Flink CEP

I have a use case where a large number of logs will be consumed by Apache Flink CEP. My use case is to detect brute force attacks and port scanning attacks. The challenge here is that in ordinary CEP we compare a value against a constant, like event = "login". In this case the criteria are different; for a brute force attack the criteria are as follows:
The username is constant and event = "login failure" (the event happens 5 times within 5 minutes).
It means that logs with the login failure event are received for the same username 5 times within 5 minutes.
And for port scanning we have the following criteria:
The IP address is constant and the destination port is variable (the event happens 10 times within 1 minute). It means that logs with a constant IP address are received for 10 different ports within 1 minute.
With Flink, when you want to process the events for something like one username or one IP address in isolation, the way to do this is to partition the stream by a key, using keyBy(). The training materials in the Flink docs have a section on Keyed Streams that explains this part of the DataStream API in more detail. keyBy() is roughly the same concept as a GROUP BY in SQL, if that helps.
With CEP, if you first key the stream, then the pattern will be matched separately for each distinct value of the key, which is what you want.
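For illustration, keying the stream for both rules might look roughly like this (a sketch; the Event POJO, its getUsername()/getIp() accessors, and the LogSource are assumptions, not from the question):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// LogSource stands in for however the logs actually arrive (Kafka, files, etc.)
DataStream<Event> events = env.addSource(new LogSource());

// Brute-force rule: the pattern is then matched separately for each username
KeyedStream<Event, String> byUser = events.keyBy(Event::getUsername);

// Port-scan rule: the pattern is matched separately for each source IP address
KeyedStream<Event, String> byIp = events.keyBy(Event::getIp);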
However, rather than CEP, I would instead recommend Flink SQL, perhaps in combination with MATCH_RECOGNIZE, for this use case. MATCH_RECOGNIZE is a higher-level API, built on top of CEP, and it's easier to work with. In combination with SQL, the result is quite powerful.
You'll find some Flink SQL training materials and examples (including examples that use MATCH_RECOGNIZE) in Ververica's github account.
Update
To be clear, I wouldn't use MATCH_RECOGNIZE for these specific rules; neither it nor CEP is needed for this use case. I mentioned it in case you have other rules where it would be helpful. (My reason for not recommending CEP in this case is that implementing the distinct constraint might be messy.)
For example, for the port scanning case you can do something like this:
SELECT e1.ip, COUNT(DISTINCT e2.port)
FROM events e1, events e2
WHERE e1.ip = e2.ip AND timestampDiff(MINUTE, e1.ts, e2.ts) < 1
GROUP BY e1.ip HAVING COUNT(DISTINCT e2.port) >= 10;
The login case is similar, but easier.
Note that when working with streaming SQL, you should give some thought to state retention.
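For instance, with the Table API this can be configured directly on the table environment (a sketch; tableEnv is an assumed StreamTableEnvironment, and the 12h/24h values are arbitrary):

import org.apache.flink.api.common.time.Time;

// Let Flink clean up per-key state that has been idle for 12 to 24 hours; otherwise
// a query like the self-join above keeps state for every IP address it has ever seen.
tableEnv.getConfig().setIdleStateRetentionTime(Time.hours(12), Time.hours(24));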
Further update
This query is likely to return a given IP address many times, but it's not desirable to generate multiple alerts.
This could be handled by inserting matching IP addresses into an Alert table, and only generating alerts for IPs that aren't already there.
Or the output of the SQL query could be processed by a de-duplicator implemented using the DataStream API, similar to the example in the Flink docs. If you only want to suppress duplicate alerts for some period of time, use a KeyedProcessFunction instead of a RichFlatMapFunction, and use a Timer to clear the state when it's time to re-enable alerts for a given IP.
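A minimal sketch of such a de-duplicator, assuming an Alert type keyed by IP address (the class names and the 10-minute suppression window are illustrative, not from the question):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Emits the first alert for each IP and suppresses repeats for SUPPRESS_MS.
public class AlertDeduplicator extends KeyedProcessFunction<String, Alert, Alert> {

    private static final long SUPPRESS_MS = 10 * 60 * 1000; // illustrative window

    private transient ValueState<Boolean> alreadyAlerted;

    @Override
    public void open(Configuration parameters) {
        alreadyAlerted = getRuntimeContext().getState(
                new ValueStateDescriptor<>("alreadyAlerted", Types.BOOLEAN));
    }

    @Override
    public void processElement(Alert alert, Context ctx, Collector<Alert> out) throws Exception {
        if (alreadyAlerted.value() == null) {
            out.collect(alert); // first alert for this IP: pass it through
            alreadyAlerted.update(true);
            // schedule state cleanup so alerting is re-enabled after the window
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + SUPPRESS_MS);
        }
        // otherwise: a duplicate within the suppression window, so drop it
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Alert> out) {
        alreadyAlerted.clear(); // re-enable alerts for this IP
    }
}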
Yet another update (concerning CEP and distinctness)
Implementing this with CEP should be possible. You'll want to key the stream by the IP address, and have a pattern that has to match within one minute.
The pattern can be roughly like this:
Pattern<Event, ?> pattern = Pattern
    .<Event>begin("distinctPorts")
    .where(/* iterative condition 1 */)
    .oneOrMore()
    .followedBy("end")
    .where(/* iterative condition 2 */)
    .within(Time.minutes(1));
The first iterative condition returns true if the event being added to the pattern has a port distinct from those of all the previously matched events. This is somewhat similar to the example in the docs.
The second iterative condition returns true if size("distinctPorts") >= 9 and this event also has yet another distinct port.
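For reference, the first of these could be written as an iterative condition roughly like this (a sketch; Event and its getPort() accessor are assumptions):

import org.apache.flink.cep.pattern.conditions.IterativeCondition;

IterativeCondition<Event> addsDistinctPort = new IterativeCondition<Event>() {
    @Override
    public boolean filter(Event event, Context<Event> ctx) throws Exception {
        // compare against every event already accepted into "distinctPorts"
        for (Event previous : ctx.getEventsForPattern("distinctPorts")) {
            if (previous.getPort() == event.getPort()) {
                return false; // this port was already seen in the partial match
            }
        }
        return true; // the event contributes a new, distinct port
    }
};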
See this Flink Forward talk (youtube video) for a somewhat similar example at the end of the talk.
If you try this and get stuck, please ask a new question, showing us what you've tried and where you're stuck.

Trying to obtain specific data from an API with different IDs but nothing logs to the console

I'm trying to access data from an API that has 10 different IDs (1-10), with each ID containing bits of mock information for a client. The base URL is
https://europe-west2-mpx-tools-internal.cloudfunctions.net/frontend-mock-api/clients
Adding the {clientId} to the end of the URL returns the last 30 days of data for a specific client, for instance: https://europe-west2-mpx-tools-internal.cloudfunctions.net/frontend-mock-api/clients/1 returns data like so:
But when I try to console.log the data containing the (date, cost, impressions, clicks, conversions), nothing appears, and I'm not getting an error.
Here's my code from both files:
I'm essentially wondering how best to ensure I'm able to access the specific bits of data for each day and for each client from 1 - 10.
I'm not getting an error, but I believe I'm going wrong somewhere, so any help would be appreciated.
Thanks in advance and sorry if I've missed key info that's needed as I'm still learning!

Aggregation Strategy based on message size of aggregate

I would like to aggregate exchanges, and when the exchange hits a certain size (say 20KB) I would like to mark the exchange as closed.
I have a rudimentary implementation that checks the size of the exchange and returns true from my predicate once it reaches 18KB. However, if a message comes in that is 4KB while the aggregate is currently 17KB, I will complete the aggregation at 21KB, which is too big.
Any ideas on how to solve this? Can I do something in the aggregation strategy to reject the join and start a new Exchange to aggregate on?
I figured I could put it through another process that checks the actual size, removes messages off the end of the aggregate to fit the size limit, and pushes each removed message back through... but that seems a little ugly, because I would have a constantly compensating routine executing.
Thanks in advance for any tips.
I think there is an eager completion check option (eagerCheckCompletion) you can use to mark it as complete when you have that 17 + 4 > 20 situation. Then it will complete the 17, and start a new group with the 4.
See the docs at: https://github.com/apache/camel/blob/master/camel-core/src/main/docs/eips/aggregate-eip.adoc
And you would also likely need to use PreCompletionAwareAggregationStrategy and return true from preComplete in that 17 + 4 > 20 situation, as otherwise the messages would first be grouped together and then completed, e.g. as 21KB. But by using both the eager completion check option and this interface you can do what you want.
https://github.com/apache/camel/blob/master/camel-core/src/main/java/org/apache/camel/processor/aggregate/PreCompletionAwareAggregationStrategy.java
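A rough sketch of such a strategy, assuming String message bodies and a 20KB limit (both assumptions; adapt the size calculation to your actual payload type):

import java.nio.charset.StandardCharsets;

import org.apache.camel.Exchange;
import org.apache.camel.processor.aggregate.PreCompletionAwareAggregationStrategy;

public class SizeLimitAggregationStrategy implements PreCompletionAwareAggregationStrategy {

    private static final int MAX_BYTES = 20 * 1024;

    @Override
    public boolean preComplete(Exchange oldExchange, Exchange newExchange) {
        if (oldExchange == null) {
            return false; // nothing aggregated yet, so nothing to complete
        }
        int current = oldExchange.getIn().getBody(String.class)
                .getBytes(StandardCharsets.UTF_8).length;
        int incoming = newExchange.getIn().getBody(String.class)
                .getBytes(StandardCharsets.UTF_8).length;
        // complete the current group *before* adding the new message if it would overflow,
        // i.e. the 17KB aggregate is completed and the 4KB message starts a new group
        return current + incoming > MAX_BYTES;
    }

    @Override
    public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
        if (oldExchange == null) {
            return newExchange; // first message starts the group
        }
        String merged = oldExchange.getIn().getBody(String.class)
                + newExchange.getIn().getBody(String.class);
        oldExchange.getIn().setBody(merged);
        return oldExchange;
    }
}

The strategy is then passed to the aggregate EIP in the route together with the completion options described in the docs linked above.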

Understanding Datastore Get RPCs in Google App Engine

I'm using sharded counters (https://cloud.google.com/appengine/articles/sharding_counters) in my GAE application for performance reasons, but I'm having some trouble understanding why it's so slow and how I can speed things up.
Background
I have an API that grabs a set of 20 objects at a time and for each object, it gets a total from a counter to include in the response.
Metrics
With Appstats turned on and a clear cache, I notice that getting the totals for 20 counters makes 120 RPCs by datastore_v3.Get which takes 2500ms.
Thoughts
This seems like quite a lot of RPC calls and quite a bit of time for reading just 20 counters. I assumed this would be faster and maybe that's where I'm wrong. Is it supposed to be faster than this?
Further Inspection
I dug into the stats a bit more, looking at these two lines in the get_count method:
all_keys = GeneralCounterShardConfig.all_keys(name)
for counter in ndb.get_multi(all_keys):
If I comment out the get_multi line, I see that there are 20 RPC calls by datastore_v3.Get totaling 185ms.
As expected, this leaves get_multi as the culprit for 100 RPC calls by datastore_v3.Get taking upwards of 2500ms. I verified this, but this is where I'm confused. Why does calling get_multi with 20 keys cause 100 RPC calls?
Update #1
I checked out Traces in the GAE console and saw some additional information. They show a breakdown of the RPC calls there as well, but in the insights they say to "Batch the gets to reduce the number of remote procedure calls." I'm not sure how to do that outside of using get_multi; I thought that did the job. Any advice here?
Update #2
Here are some screenshots that show the stats I'm looking at. The first one is my baseline, the function without any counter operations. The second one is after a call to get_count for just one counter. This shows a difference of 6 datastore_v3.Get RPCs.
Base Line
After Calling get_count On One Counter
Update #3
Based on Patrick's request, I'm adding a screenshot of info from the console Trace tool.
Try splitting up the for loop that goes through each item and the actual get_multi call itself. So something like:
all_values = ndb.get_multi(all_keys)
for counter in all_values:
# Insert amazeballs codes here
I have a feeling it's one of these:
The generator pattern (the yield inside the for loop) is causing something funky with get_multi execution paths
Perhaps the number of items you are expecting doesn't match actual result counts, which could reveal a problem with GeneralCounterShardConfig.all_keys(name)
The number of shards is set too high. I've realized that anything over 10 shards causes performance issues.
When I've dug into similar issues, one thing I've learned is that get_multi can cause multiple RPCs to be sent from your application. It looks like the default in the SDK is set to 1000 keys per get, but the batch size I've observed in production apps is much smaller: something more like 10 (going from memory).
I suspect the reason it does this is that at some batch size, it actually is better to use multiple RPCs: there is more RPC overhead for your app, but there is more Datastore parallelism. In other words: this is still probably the best way to read a lot of datastore objects.
However, if you don't need to read the absolute most current value, you can try setting the db.EVENTUAL_CONSISTENCY option, but that seems to only be available in the older db library and not in ndb. (Although it also appears to be available via the Cloud Datastore API).
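For example, with the older db API that might look like this (a sketch, only applicable if the counter shards are db models rather than ndb models):

from google.appengine.ext import db

# Eventually-consistent batch get: the Datastore may serve slightly stale values,
# but it does not have to confirm the latest writes, which can reduce latency.
counters = db.get(all_keys, read_policy=db.EVENTUAL_CONSISTENCY)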
Details
If you look at the Python code in the App Engine SDK, specifically the file google/appengine/datastore/datastore_rpc.py, you will see the following lines:
max_count = (Configuration.max_get_keys(config, self.__config) or
             self.MAX_GET_KEYS)
...
if is_read_current and txn is None:
  max_egs_per_rpc = self.__get_max_entity_groups_per_rpc(config)
else:
  max_egs_per_rpc = None
...
pbsgen = self._generate_pb_lists(indexed_keys_by_entity_group,
                                 base_req.ByteSize(), max_count,
                                 max_egs_per_rpc, config)
rpcs = []
for pbs, indexes in pbsgen:
  rpcs.append(make_get_call(base_req, pbs,
                            self.__create_result_index_pairs(indexes)))
My understanding of this:
Set max_count from the configuration object, or 1000 as a default
If the request must read the current value, set max_egs_per_rpc from the configuration, or 10 as a default
Split the input keys into individual RPCs, using both max_count and max_egs_per_rpc as limits.
So, this is being done by the Python Datastore library.

GWT how to set up a Pager in a DataGrid when using Objectify Cursors

I recently got it to the point where I can retrieve data with a Cursor (see this link: GWT pass Objectify Cursor from Server to Client with RequestFactory and show more pages in DataGrid)
What I am running into: when I get the data back on the client side it's only a List of 25, so when I go to set the data in the DataGrid the pager on the bottom says showing 1-25 of 25. There are obviously more records in the database; I'm just retrieving 25 of them at a time with the cursor.
What I tried doing is setting the following:
pager.setRangeLimited(false);
Unfortunately, while this allows me to page and select more from the database, it never actually shows the total amount in the database. What I am wondering is: if I'm using a Cursor on the server side, how do I set the total count in the Pager?
One thing I thought about doing is simply adding a total count variable to the ListCursor wrapper object I'm returning. Unfortunately, this would require that whenever I request it with a null initial query I go through and get the total count, which seems horribly inefficient, and even once I get it back I still have no idea how to actually tell the pager that more data is available than I actually gave it.
Any help on this would be really appreciated
You set the total count in the pager by telling it that the row count is exact:
asyncDataProvider.updateRowCount(int size, boolean exact);
If you don't tell the pager that the row count is exact, then you can obviously not navigate to the last page.
The core issue is how to get hold of the total row count. Querying for the row count is indeed highly inefficient. A better bet would be to keep a counter in the datastore that tracks the number of records. This can be quite inefficient too, because you have to increment this counter in a synchronized/transactional way.
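For illustration, wiring an exact count into the pager could look roughly like this (a sketch; ClientData, dataGrid, and the service.fetchPage/service.fetchTotalCount calls are placeholder names for your own server calls, whose RequestFactory Receiver callbacks would replace the AsyncCallback used here):

import java.util.List;

import com.google.gwt.user.client.rpc.AsyncCallback;
import com.google.gwt.view.client.AsyncDataProvider;
import com.google.gwt.view.client.HasData;
import com.google.gwt.view.client.Range;

AsyncDataProvider<ClientData> provider = new AsyncDataProvider<ClientData>() {
    @Override
    protected void onRangeChanged(HasData<ClientData> display) {
        final Range range = display.getVisibleRange();

        // one page of rows, fetched via the cursor on the server side
        service.fetchPage(range.getStart(), range.getLength(),
                new AsyncCallback<List<ClientData>>() {
                    public void onSuccess(List<ClientData> page) {
                        updateRowData(range.getStart(), page);
                    }
                    public void onFailure(Throwable caught) { /* handle/log */ }
                });

        // total row count, e.g. read from a maintained counter entity
        service.fetchTotalCount(new AsyncCallback<Integer>() {
            public void onSuccess(Integer total) {
                updateRowCount(total, true); // exact count lets the pager reach the last page
            }
            public void onFailure(Throwable caught) { /* handle/log */ }
        });
    }
};
provider.addDataDisplay(dataGrid);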
In my project, I don't keep track of the exact row count, but I provide flexible search options.
