Graphite summarize() function results are inconsistent for different 'from' values - analytics

I am using Graphite to record user login information.
When I run the following query:
render?target=summarize(stats_counts.login.success,"1day")&format=json&from=-1days
I get this result:
[
{
"target": "summarize(stats_counts.login.success, \"1day\", \"sum\")",
"datapoints": [
[
5,
1435708800
],
[
21,
1435795200
]
]
}
]
But for the following query:
render?target=summarize(stats_counts.login.success,"1day")&format=json&from=-7days
I get this result:
[
{
"target": "summarize(stats_counts.login.success, \"1day\", \"sum\")",
"datapoints": [
[
0,
1435190400
],
[
1,
1435276800
],
[
0,
1435363200
],
[
0,
1435449600
],
[
5,
1435536000
],
[
16,
1435622400
],
[
6,
1435708800
],
[
21,
1435795200
]
]
}
]
Notice the value for the bucket 1435708800 in both results: in one result it is 5 and in the other it is 6.
In the first query I am trying to get the number of user logins per day for yesterday and today, and in the second one the number of user logins per day over the last week.
What is the reason for this difference?
UPDATE
Graphite Version: 0.9.10
Retention Settings:
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[real_time]
priority = 200
pattern = ^stats.*
retentions = 1:34560000
[stats]
priority = 110
pattern = .*
retentions = 1s:24h,1m:7d,10m:1y

Try setting alignToFrom to true, since the number of data points that fall into each bucket interval varies depending on the from time.
By default, buckets are calculated by rounding to the nearest interval. This works well for intervals smaller than a day. For example, 22:32 will end up in the bucket 22:00-23:00 when the interval=1hour.
Passing alignToFrom=true will instead create buckets starting at the from time. In this case, the bucket for 22:32 depends on the from time. If from=6:30 then the 1hour bucket for 22:32 is 22:30-23:30.
"summarize(ex.cpe.ex.xxx,'30s','avg', true)"

Related

How can I get list values that meet specific conditions?

I'm a beginner in Python.
I am implementing code to organize data with Python.
I have to extract the values that meet certain conditions from numerous lists.
It seems very simple, but it feels too difficult for me.
First, let me explain with the simplest example:
solutions
Out[73]:
array([[ 2.31350622e-04, -1.42539948e-02, -7.17361833e-02,
2.17545418e-01, -3.38251827e-01, 1.88254191e-01],
[ 4.23523963e-82, -9.48255372e-81, 5.22018863e-80,
-1.11271010e-79, 1.03507672e-79, -3.55573390e-80],
[ 2.31350597e-04, -1.42539951e-02, -7.17361800e-02,
2.17545409e-01, -3.38251817e-01, 1.88254187e-01],
[ 2.58309722e-02, -6.21550000e-01, 3.41867505e+00,
-7.53828444e+00, 7.09091365e+00, -2.39409614e+00],
[ 2.31350606e-04, -1.42539950e-02, -7.17361809e-02,
2.17545411e-01, -3.38251820e-01, 1.88254188e-01],
[ 1.14525725e-02, -3.25174709e-01, 2.11632584e+00,
-5.16113713e+00, 5.12508331e+00, -1.78380602e+00],
[ 9.75839726e-03, -3.08729919e-01, 2.26983591e+00,
-6.16462170e+00, 6.76409438e+00, -2.55992476e+00],
[ 1.13190092e-03, -6.72042220e-02, 7.10413638e-01,
-2.39952623e+00, 2.94849402e+00, -1.18046338e+00],
[ 5.24406689e-03, -1.86240596e-01, 1.36500589e+00,
-3.61106144e+00, 3.75606312e+00, -1.34699295e+00]])
coeff
Out[74]:
array([[ 1.03177808e-04, -6.35700011e-03, -3.19929208e-02,
9.70209594e-02, -1.50853634e-01, 8.39576506e-02,
4.45980248e-01],
[ 5.13911499e-83, -1.15062991e-81, 6.33426960e-81,
-1.35018220e-80, 1.25598048e-80, -4.31459067e-81,
1.21341776e-01],
[ 1.03177797e-04, -6.35700027e-03, -3.19929194e-02,
9.70209556e-02, -1.50853630e-01, 8.39576490e-02,
4.45980249e-01],
[ 4.26209161e-03, -1.02555298e-01, 5.64078896e-01,
-1.24381145e+00, 1.16999559e+00, -3.95024121e-01,
1.64999272e-01],
[ 1.03177801e-04, -6.35700023e-03, -3.19929198e-02,
9.70209566e-02, -1.50853631e-01, 8.39576495e-02,
4.45980248e-01],
[ 2.27512838e-03, -6.45980810e-02, 4.20421959e-01,
-1.02529362e+00, 1.01813129e+00, -3.54364724e-01,
1.98656535e-01],
[ 1.42058482e-03, -4.49435521e-02, 3.30432790e-01,
-8.97418681e-01, 9.84687293e-01, -3.72662657e-01,
1.45575629e-01],
[ 2.46722650e-04, -1.46486353e-02, 1.54850246e-01,
-5.23029411e-01, 6.42688990e-01, -2.57307904e-01,
2.17971950e-01],
[ 1.30617191e-03, -4.63880878e-02, 3.39990392e-01,
-8.99429225e-01, 9.35545685e-01, -3.35503798e-01,
2.49076135e-01]])
'solutions' is a NumPy array in which each row is solutions[0], solutions[1], ..., solutions[i]. 'coeff' is also a NumPy array, and coeff[0], coeff[1], ..., coeff[i] correspond to solutions[0], solutions[1], ..., solutions[i].
What I want is to find the specific solutions[i] and coeff[i] where all elements of solutions[i] are less than 10^-10 and all elements of coeff[i] are greater than 10^-3.
I wonder if there is an appropriate way to extract the rows of an array that meet more than one condition. I'm a Python beginner, so please excuse me.
This can be accomplished using boolean (advanced) indexing:
import numpy as np

# mask of rows where every element is below the threshold
solution_valid = np.all(solutions < 1e-10, axis=1)
# mask of rows where every element is above the threshold
coeff_valid = np.all(coeff > 1e-3, axis=1)
both_valid = coeff_valid & solution_valid
valid_solutions = solutions[both_valid]
valid_coeffs = coeff[both_valid]
But perhaps you mean that the absolute values should be above or below a certain threshold?
solution_valid = np.all(np.abs(solutions) < 1e-10, axis=1)
coeff_valid = np.all(np.abs(coeff) > 1e-3, axis=1)
both_valid = coeff_valid & solution_valid
valid_solutions = solutions[both_valid]
valid_coeffs = coeff[both_valid]
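If it helps, here is a minimal self-contained sketch with made-up 3x2 data (the arrays below are hypothetical, not your data) showing the same masking approach, plus np.nonzero to recover the indices i of the matching rows:
import numpy as np

solutions = np.array([[1e-12, -2e-11],
                      [5e-01,  1.3e+00],
                      [3e-11,  9e-12]])
coeff = np.array([[2e-02, 4e-01],
                  [1e-90, 2e-85],
                  [1e-02, 7e-01]])

solution_valid = np.all(np.abs(solutions) < 1e-10, axis=1)
coeff_valid = np.all(np.abs(coeff) > 1e-3, axis=1)
both_valid = solution_valid & coeff_valid

matching_indices = np.nonzero(both_valid)[0]  # -> array([0, 2])
print(matching_indices)
print(solutions[both_valid])  # the matching solutions[i] rows
print(coeff[both_valid])      # the corresponding coeff[i] rows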

How to retrieve top n children of each node in a tree in one query

I am recently evaluating graph databases or any other databases on one specific requirement:
The ability to retrieve the top n children of each node, ranked by an aggregated property over the node's direct children and all of their direct and indirect children, in a tree in one query. The result should return the correct hierarchical structure.
Example
root + 11
++ 111
++ 1111
++ 112
++ 1121
++ 1122
++ 1123
++ 113
+ 12
++ 121
++ 122
++ 1221
++ 1222
++ 1223
++ 123
+ 13
++ 131
++ 132
++ 133
++ 134
+ 14
Each node has a property recording how many direct children it has. And the tree has no more than 8 levels. Let's say I want to query the entire tree, keeping at each level only the top 2 children of each node, ranked by how many direct and indirect children they have. It would give us the following:
root + 11
++ 111
++ 1111
++ 112
++ 1121
++ 1122
+ 12
++ 121
++ 122
++ 1221
++ 1222
I am wondering if there is any graph database, or any other database, that supports such a query efficiently, and if so, how?
Using Neo4j
You can do this with Neo4j, but you'll need to ensure you're using the APOC Procedures plugin for access to some of the map and collection functions and procedures.
One thing to note first. You didn't define any criteria to use when selecting between child nodes when there is a tie between their descendant node counts. As such, the results of the following may not match yours exactly, as alternate nodes (with tied counts) may have been selected. If you do need additional criteria for the ordering and selection, you will have to add that to your description so I can modify the queries accordingly.
Create the test graph
First, let's create the test data set. We can do this through the Neo4j browser.
First let's set the parameters we'll need to create the graph:
:param data => [{id:11, children:[111, 112, 113]}, {id:12, children:[121, 122, 123]}, {id:13, children:[131,132,133,134]}, {id:14, children:[]}, {id:111, children:[1111]}, {id:112, children:[1121, 1122, 1123]}, {id:122, children:[1221,1222,1223]}]
Now we can use this query to use those parameters to create the graph:
UNWIND $data as row
MERGE (n:Node{id:row.id})
FOREACH (x in row.children |
MERGE (c:Node{id:x})
MERGE (n)-[:CHILD]->(c))
We're working with nodes of type :Node connected to each other by :CHILD relationships, outgoing toward the leaf nodes.
Let's also add a :Root:Node at the top level to make some of our later queries a bit easier:
MERGE (r:Node:Root{id:0})
WITH r
MATCH (n:Node)
WHERE NOT ()-[:CHILD]->(n)
MERGE (r)-[:CHILD]->(n)
The :Root node is now connected to the top nodes (11, 12, 13, 14) and our test graph is ready.
The Actual Query
Because the aggregation you want needs the count of all descendants of a node and not just its immediate children, we can't use the child count property of how many direct children a node has. Or rather, we COULD, summing the counts from all descendants of the node, but since that requires us to traverse down to all descendants anyway, it's easier to just get the count of all descendants and avoid property access entirely.
Here's the query in its entirety below, you should be able to run the full query on the test graph. I'm breaking it into sections with linebreaks and comments to better show what each part is doing.
// for each node and its direct children,
// order by the child's descendant count
MATCH (n:Node)-[:CHILD]->(child)
WITH n, child, size((child)-[:CHILD*]->()) as childDescCount
ORDER BY childDescCount DESC
// now collect the ordered children and take the top 2 per node
WITH n, collect(child)[..2] as topChildren
// from the above, per row, we have a node and a list of its top 2 children.
// we want to gather all of these children into a single list, not nested
// so we collect the lists (to get a list of lists of nodes), then flatten it with APOC
WITH apoc.coll.flatten(collect(topChildren)) as topChildren
// we now have a list of the nodes that can possibly be in our path
// although some will not be in the path, as their parents (or ancestors) are not in the list
// to get the full tree we need to match down from the :Root node and ensure
// that for each path, the only nodes in the path are the :Root node or one of the topChildren
MATCH path=(:Root)-[:CHILD*]->()
WHERE all(node in nodes(path) WHERE node:Root OR node in topChildren)
RETURN path
Without the comments, this is merely an 8-line query.
Now, this actually returns multiple paths, one path per row, and all the paths together create the visual tree you're after, if you view the graphical results.
Getting the results as a tree in JSON
However, if you're not using a visualizer to view the results graphically, you would probably want a JSON representation of the tree. We can get that by collecting all the result paths and using a procedure from APOC to produce the JSON tree structure. Here's a slightly modified query with those changes:
MATCH (n:Node)-[:CHILD]->(child)
WITH n, child, size((child)-[:CHILD*]->()) as childDescCount
ORDER BY childDescCount DESC
WITH n, collect(child)[..2] as topChildren
WITH apoc.coll.flatten(collect(topChildren)) as topChildren
MATCH path=(:Root)-[:CHILD*]->()
WHERE all(node in nodes(path) WHERE node:Root OR node in topChildren)
// below is the new stuff to get the JSON tree
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) YIELD value as map
RETURN map
The result will be something like:
{
"_type": "Node:Root",
"_id": 52,
"id": 0,
"child": [
{
"_type": "Node",
"_id": 1,
"id": 12,
"child": [
{
"_type": "Node",
"_id": 6,
"id": 122,
"child": [
{
"_type": "Node",
"_id": 32,
"id": 1223
},
{
"_type": "Node",
"_id": 31,
"id": 1222
}
]
},
{
"_type": "Node",
"_id": 21,
"id": 123
}
]
},
{
"_type": "Node",
"_id": 0,
"id": 11,
"child": [
{
"_type": "Node",
"_id": 4,
"id": 111,
"child": [
{
"_type": "Node",
"_id": 26,
"id": 1111
}
]
},
{
"_type": "Node",
"_id": 5,
"id": 112,
"child": [
{
"_type": "Node",
"_id": 27,
"id": 1121
},
{
"_type": "Node",
"_id": 29,
"id": 1123
}
]
}
]
}
]
}
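If you need to consume that tree from application code rather than the Neo4j browser, here is a minimal Python sketch using the official neo4j driver (the bolt URI and credentials are placeholders, not part of the original setup):
from neo4j import GraphDatabase

query = """
MATCH (n:Node)-[:CHILD]->(child)
WITH n, child, size((child)-[:CHILD*]->()) as childDescCount
ORDER BY childDescCount DESC
WITH n, collect(child)[..2] as topChildren
WITH apoc.coll.flatten(collect(topChildren)) as topChildren
MATCH path=(:Root)-[:CHILD*]->()
WHERE all(node in nodes(path) WHERE node:Root OR node in topChildren)
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) YIELD value as map
RETURN map
"""

# placeholder connection details
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    record = session.run(query).single()
    tree = record["map"]  # nested dict mirroring the JSON shown above
    print(tree)
driver.close()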

Attribute Syntax for JSON query in check_json.pl

So, I'm trying to set up check_json.pl in NagiosXI to monitor some statistics. https://github.com/c-kr/check_json
I'm using the code with the modification I submitted in pull request #32, so line numbers reflect that code.
The json query returns something like this:
[
{
"total_bytes": 123456,
"customer_name": "customer1",
"customer_id": "1",
"indices": [
{
"total_bytes": 12345,
"index": "filename1"
},
{
"total_bytes": 45678,
"index": "filename2"
},
],
"total": "765.43gb"
},
{
"total_bytes": 123456,
"customer_name": "customer2",
"customer_id": "2",
"indices": [
{
"total_bytes": 12345,
"index": "filename1"
},
{
"total_bytes": 45678,
"index": "filename2"
},
],
"total": "765.43gb"
}
]
I'm trying to monitor the sizes of specific files, so a check should look something like:
/path/to/check_json.pl -u https://path/to/my/json -a "SOMETHING" -p "SOMETHING"
...where I'm trying to figure out the SOMETHINGs so that I can monitor the total_bytes of filename1 in customer2 where I know the customer_id and index but not their position in the respective arrays.
I can monitor customer1's total bytes by using the string "[0]->{'total_bytes'}", but I need to be able to specify which customer and dig deeper into the file name (known) and file size (the stat to monitor). Also, the working query only gives me the status (OK, WARNING, or CRITICAL); when I add -p, all I get are errors.
The error with -p, no matter how I phrase it, is always:
Not a HASH reference at ./check_json.pl line 235.
Even when I can get a valid OK from the example "[0]->{'total_bytes'}", using that in -p still gives the same error.
Links pointing to documentation on the format to use would be very helpful. Examples in the README for the script or in the -h output are failing me here. Any ideas?
I really have no idea what your question is. I'm sure I'm not alone, hence the downvotes.
Once you have the decoded json, if you have a customer_id to search for, you can do:
my ($customer_info) = grep {$_->{customer_id} eq $customer_id} @$json_response;
Regarding the error on line 235, this looks odd:
foreach my $key ($np->opts->perfvars eq '*' ? map { "{$_}"} sort keys %$json_response : split(',', $np->opts->perfvars)) {
# ....................................... ^^^^^^^^^^^^^
$perf_value = $json_response->{$key};
If perfvars eq "*", you appear to be looking for $json_response->{"{total}"}, for example. You might want to validate the user's input:
die "no such key in json data: '$key'\n" unless exists $json_response->{$key};
This entire business of stringifying the hash ref lookups just smells bad.
A better question would look like:
I have this JSON data. How do I get the sum of total_bytes for the customer with id 1?
See https://stackoverflow.com/help/mcve

lucene solr - how to know numCount of each word in query

I have a query string with 5 words, for example "cat dog fish bird animals".
I need to know how many matches each word has.
At this point I create 5 queries:
/q=name:cat&rows=0&facet=true
/q=name:dog&rows=0&facet=true
/q=name:fish&rows=0&facet=true
/q=name:bird&rows=0&facet=true
/q=name:animals&rows=0&facet=true
and get the match count of each word from each query.
But this method takes too much time.
So is there a way to get the count for each word with one query?
Any help appreciated!
In this case, functionQueries are your friends. In particular:
termfreq(field,term) returns the number of times the term appears in the field for that document. Example syntax: termfreq(text,'memory')
totaltermfreq(field,term) returns the number of times the term appears in the field in the entire index. ttf is an alias of totaltermfreq. Example syntax: ttf(text,'memory')
The following query, for instance (URL-decoded it reads: q=*:*&fl=cntOnSummary:termfreq(summary,'hello') cntOnTitle:termfreq(title,'entry') cntOnSource:termfreq(source,'activities')&wt=json&indent=true):
q=*%3A*&fl=cntOnSummary%3Atermfreq(summary%2C%27hello%27)+cntOnTitle%3Atermfreq(title%2C%27entry%27)+cntOnSource%3Atermfreq(source%2C%27activities%27)&wt=json&indent=true
returns the following results:
"docs": [
{
"id": [
"id-1"
],
"source": [
"activities",
"activities"
],
"title": "Ajones3 Activity Entry 1",
"summary": "hello hello",
"cntOnSummary": 2,
"cntOnTitle": 1,
"cntOnSource": 1,
"score": 1
},
{
"id": [
"id-2"
],
"source": [
"activities",
"activities"
],
"title": "Common activity",
"cntOnSummary": 0,
"cntOnTitle": 0,
"cntOnSource": 1,
"score": 1
}
]
Please notice that while this works well on single-valued fields, it seems that for multivalued fields the functions consider just the first entry; for instance, in the example above, termfreq(source,'activities') returns 1 instead of 2.
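For what it's worth, here is a minimal Python sketch (the core name 'mycore', the local Solr URL, and the use of the requests library are assumptions) that asks for the index-wide count of each of the five words in a single request via ttf(), as described above:
import requests

words = ["cat", "dog", "fish", "bird", "animals"]
# one ttf() alias per word, space-separated as in the fl example above
fl = " ".join("cnt_{0}:ttf(name,'{0}')".format(w) for w in words)

params = {"q": "*:*", "rows": 1, "wt": "json", "fl": fl}
resp = requests.get("http://localhost:8983/solr/mycore/select", params=params).json()

doc = resp["response"]["docs"][0]
for w in words:
    print(w, doc["cnt_" + w])
Note that ttf() returns the total number of occurrences of the term in the field across the entire index, as described above.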

SOLR/Tika - I need to concat values from 2 columns from 2 entities

Please look at my schema: http://pastebin.com/uPxwq8Zs and my data config: http://pastebin.com/ebeDfPM9
So, I need to index the page content and some other fields, and also the text of all file attachments linked to each page. Because there can be multiple attachments, "text" and the rest of the fields relating to the files need to be declared as multivalued. So, example output could be:
{
"header": " ",
"page_id": 25352,
"title": "Informacje, których nie ma w BIP",
"content": [
"<p> TEST test test"
],
"file_name": [
"Wniosek",
"Wniosek"
],
"file_desc": [
"Wniosek o udostępnienie informacji publicznej - format PDF",
"Wniosek o udostępnienie informacji publicznej - format RTF"
],
"file_path": [
"/mnt/storage/content/www/html/smartsite/src/data/resource/1/2/422/zalacznik_nr_1_wniosek.pdf",
"/mnt/storage/content/www/html/smartsite/src/data/resource/1/2/422/zalacznik_nr_1_wniosek.rtf"
],
"text": [
"Dzień dobry - dokument testowy.\nTEST.\n"
],
"_version_": 1479310282003054600
},
As you can see, it generally works, but it's useless for me. In my example, there are 2 file attachments declared in the database for page_id 25352. The first attachment doesn't exist on disk on the server, so Tika was unable to index it; the second attachment was indexed successfully, and this is the text extracted:
"text": [
"Dzień dobry - dokument testowy.\nTEST.\n"
],
But I need to know which attachment it was extracted from. So my idea is to concat my "text" field value with the "file_path" value and some separator, so I will get a result like this:
"text": [
/mnt/storage/content/www/html/smartsite/src/data/resource/1/2/422/zalacznik_nr_1_wniosek.rtf*"Dzień dobry - dokument testowy.\nTEST.\n"
],
How can I get a result like this using Solr/Tika in my case?
