sum of squares or square of total? - newrelic-platform

According to https://newrelic.com/docs/plugin-dev/metric-data-for-the-plugin-api, plugins should also report the sum of squares for the specified period. Looking at the JSON example, I'd say this is more of a square of the total value. E.g.:
{
  "name": "Primary MySQL Database",
  "guid": "com.your_company_name.plugin_name",
  "duration": 60,
  "metrics": {
    "Component/ProductionDatabase[Queries/Second]": 100,
    "Component/AnalyticsDatabase[Queries/Second]": {
      "min": 2,
      "max": 10,
      "total": 12,
      "count": 2,
      "sum_of_squares": 144
    }
  }
}
Obviously there were two values, 2 and 10. According to the sample, the sum_of_squares is (10+2)^2 = 144, which I would call the "square of the total".
To me, however, "sum of squares" means 2^2 + 10^2 = 104.
So when talking about multi-value metrics - which one is the correct number?

You're absolutely right that the example values as documented are incorrect. I'll notify the relevant parties to update the documentation.
In the common case where only a single metric value is being reported, the count will be 1, in which case squaring the "total" value happens to yield the correct result.
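For clarity, here is a minimal sketch (plain Python, not the New Relic SDK) of how an agent would accumulate the fields of a multi-value metric; note that sum_of_squares for the values 2 and 10 comes out to 104, not 144:

def aggregate(values):
    return {
        "min": min(values),
        "max": max(values),
        "total": sum(values),
        "count": len(values),
        # sum of squares: square each value first, then add them up
        "sum_of_squares": sum(v * v for v in values),
    }

print(aggregate([2, 10]))
# -> {'min': 2, 'max': 10, 'total': 12, 'count': 2, 'sum_of_squares': 104}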

Related

Google Sheets: how to VLOOKUP by matching a value between max and min?

I have 2 sheets like this:
In the 2nd sheet, I want to look up the id (S/M/L/XL) by checking whether the value falls between the Min value and the Max value. The goal is for the 'level' column in the 2nd sheet to hold a formula that checks which range the value falls into and then retrieves the correct 'id' from the 1st sheet.
The rule is: the value must be >= min value and < max value.
How can I do this?
Thanks
use:
=INDEX(IF(A9:A="",,VLOOKUP(A9:A, {C2:C5, A2:A5}, 2, 1)))
Your first table has overlapping values, so I suggest you think carefully about the rules you want to apply.
For example, 1, according to your table, can match both "S" and "M"; the same goes for 3, which can be "M" or "L".
Once you have resolved that, you can use the QUERY function.
Example:
=QUERY($A$2:$D$5,
"select A,D where C<="&A2&" AND D >="&A2&" ORDER BY D DESC LIMIT 1 ")
A working solution can be found here:
https://docs.google.com/spreadsheets/d/1oLVwQqihT_df2y_ZQnfx7By77HnKSFz0bcbOzMuWqOM/edit?usp=sharing
Rather than having min and max columns, you could use a single column listing the incremental values that determine the next size, and use vlookup() with its sort option set to true - this avoids overlapping values:
=arrayformula({"level";if(A2:A<>"",VLOOKUP(A2:A,{Source!C:C,Source!A:A},2,1),)})
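For readers outside Sheets, the same range-lookup logic (sorted min thresholds, where the last threshold <= value wins) can be sketched in Python; the thresholds and labels below are illustrative, not taken from the question:

from bisect import bisect_right

thresholds = [0, 2, 4, 8]        # sorted "min" column (illustrative values)
labels = ["S", "M", "L", "XL"]   # corresponding "id" column

def level(value):
    # rule: value >= min and < next min, like VLOOKUP with sort = TRUE
    return labels[bisect_right(thresholds, value) - 1]

print([level(v) for v in [1, 3, 9]])   # ['S', 'M', 'XL']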

Calculate the count of array elements that are repeated in a range (100-1000) in Solr

I have a field in a Solr collection that is an array, like the following example:
"hashtag": [
  "#a",
  "#b",
  "#c",
  "#d"
]
The facet of this field looks like the following:
{
  "#a": 1000,
  "#b": 970,
  "#c": 960,
  "#d": 950,
  "#e": 850,
  ...
}
I want to calculate the count of hashtags that are repeated between 900 and 1000 times.
In the above example, "#a", "#b", "#c", and "#d" are repeated between 900 and 1000 times, which means 4 hashtags.
This count may be very large.
I want a response like this:
{
  "*-100": 241,
  "100-1000": 521,
  "1000-10000": 251,
  "10000-*": 854
}
How can I express this as a Solr query?
While it won't give you the bucketed counts directly, you can use the Terms Component to fetch all the hashtags within a document-frequency range:
terms.mincount
Specifies the minimum document frequency to return in order for a term to be included in a query response. Results are inclusive of the mincount (that is, >= mincount).
terms.maxcount
Specifies the maximum document frequency a term must have in order to be included in a query response. The default setting is -1, which sets no upper bound. Results are inclusive of the maxcount (that is, <= maxcount).
Your complete request would then be something like:
http://../solr/<core>/terms?terms.fl=hashtag&terms.mincount=900&terms.maxcount=1000&wt=xml
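If you do want the bucketed counts from the question, one option is to fetch the terms per frequency range and count them client-side. A rough sketch using Python's requests library (the Solr URL and core name are assumptions):

import requests

SOLR_TERMS = "http://localhost:8983/solr/mycore/terms"  # assumed URL

def count_terms(min_count, max_count):
    params = {
        "terms.fl": "hashtag",
        "terms.mincount": min_count,
        "terms.maxcount": max_count,
        "terms.limit": -1,   # return all matching terms, not just the first 10
        "wt": "json",
    }
    resp = requests.get(SOLR_TERMS, params=params).json()
    # the response holds a flat [term1, freq1, term2, freq2, ...] list
    return len(resp["terms"]["hashtag"]) // 2

buckets = {"100-1000": (100, 1000), "1000-10000": (1000, 10000)}
print({label: count_terms(lo, hi) for label, (lo, hi) in buckets.items()})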

In MongoDB, how do you retrieve a set of results where Field X is greater than Field Y by Value N?

I'm trying to query my local MongoDB for documents where Field X is greater than Field Y by Value N (in this case, Value N = 0.25).
I currently have:
db.getCollection('collection1').find({ $expr: { $gt: [ "$field1" , "$field2" ] }}).sort({"dateAdded" : -1})
But this only returns results where field1 is greater than field2. I need field1 to be greater than field2 by 0.25.
I can't find this in the documentation anywhere; I'm sure I'm missing something easy...
Thanks for your help.
Create an expression that calculates the difference first and then you can do the greater than comparison.
Take for instance this query which uses the $subtract operator to get the difference:
db.getCollection('collection1').find(
  { "$expr": {
    "$gt": [
      { "$subtract": ["$field1", "$field2"] },
      0.25
    ]
  }}
).sort({"dateAdded": -1})
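The same query from, say, pymongo looks like this (the database and collection names are assumptions for illustration):

from pymongo import MongoClient, DESCENDING

coll = MongoClient().mydb.collection1   # assumed database/collection names
cursor = coll.find(
    # match documents where field1 - field2 > 0.25
    {"$expr": {"$gt": [{"$subtract": ["$field1", "$field2"]}, 0.25]}}
).sort("dateAdded", DESCENDING)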

MongoDB time series database design: hour/minute/second-of-minute vs hour/second-of-hour?

I'm on an academic research project, using MongoDB to store time series data of accelerometer values (IoT/telemetry data). The granularity is individual samples, where the sample rate can be anything between 1 and 100 Hz. Currently I use one hour of data per document, containing a 3-dimensional array: the first level is minutes, the second level is seconds, and the third level is samples (double data type). This is inspired by the MongoDB for Time Series Data presentations (Part 1, Part 2).
e.g.
{
  "_id": "2018011200:4", /* Jan 12, 2018 hour 00 UTC for sensor 4 */
  "z": [
    00: [ /* 00h00m */
      00: [ 0.1, 0.0, -0.1, ... ], /* 00h00m00s */
      01: [ 0.1, 0.0, -0.1, ... ], /* 00h00m01s */
      02: [ 0.1, 0.0, -0.1, ... ], /* 00h00m02s */
      ...
      59: [ 0.1, 0.0, -0.1, ... ]  /* 00h00m59s */
    ], ...
  ]
}
With this layout, getting a subset of data using $slice can only be done at the minute level. For example, if I want the data from 00:00:00 to 00:00:01, I need to fetch the whole 00:00 minute (containing 60 seconds) from MongoDB and then extract the second(s) I need in the application. Likewise, if I want the data from 00:00:59 to 00:01:01, I need to fetch two whole minutes, subset each of them in the application, and merge the results. There is a bit of IO waste in this, and also some complexity in the app. BTW, I have no need to retrieve individual samples; the smallest unit of retrieval (and storage) is a second.
I'm considering a slightly different approach where the hour document is divided directly into an array of seconds (as there are 3600 seconds in an hour) and then an array of samples. This means that to get 5 seconds of data I will retrieve exactly 5 second-arrays (possibly from two different documents, if the time range crosses an hour boundary). There will still be application logic for merging the two parts from different documents, but it is simpler than with the hour/minute/second hierarchy.
{
  "_id": "2018011200:4", /* Jan 12, 2018 hour 00 UTC for sensor 4 */
  "z": [
    0: [ 0.1, 0.0, -0.1, ... ],    /* 00h00m00s */
    1: [ 0.1, 0.0, -0.1, ... ],    /* 00h00m01s */
    2: [ 0.1, 0.0, -0.1, ... ],    /* 00h00m02s */
    ...
    3599: [ 0.1, 0.0, -0.1, ... ]  /* 00h59m59s */
  ]
}
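For concreteness, here is a minimal pymongo sketch of that retrieval under the alternative layout (the database and collection names are assumptions): five seconds starting at 00h00m10s come back as exactly five sub-arrays.

from pymongo import MongoClient

samples = MongoClient().iot.samples   # assumed database/collection names
doc = samples.find_one(
    {"_id": "2018011200:4"},
    {"z": {"$slice": [10, 5]}}        # skip 10 seconds, take 5
)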
However, I'm also worried that the alternative approach has weaknesses that I'm not aware of.
Which one do you recommend? What are potential pitfalls that I need to consider? Or should I consider another design altogether?
Thank you in advance.
I think you very much overcomplicated your data model.
Updating a document is much more expensive than simply inserting one. And since your granularity seems to be seconds, we are well within the granularity that the BSON UTC datetime type provides: it is granular to the millisecond.
So as per your data model, assuming that you get a single value per write, simply use something like that:
{
  _id: new ObjectId(),
  value: 0.1,
  sensor: 4,
  ts: new ISODate()
}
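As a minimal sketch (pymongo, with assumed database and collection names), a single-sample write is then just:

from datetime import datetime, timezone
from pymongo import MongoClient

values = MongoClient().iot.values     # assumed database/collection names
values.insert_one({
    "value": 0.1,
    "sensor": 4,
    "ts": datetime.now(timezone.utc), # stored as BSON UTC datetime
})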
With this data model, we make sure that writes are as cheap as possible without sacrificing information. You can then use MongoDB's aggregation pipeline to query the data for interesting values. A simple example would be to count the number of values you have for sensor 4 between 2018-01-01T00:00:00.000Z and 2018-01-01T23:59:59.999Z:
db.values.aggregate([
  // select the sensor and time range of interest
  {"$match": {"sensor": 4, "ts": {"$gte": ISODate("2018-01-01"), "$lt": ISODate("2018-01-02")}}},
  {"$sort": {"ts": -1}},
  // group the values into per-second buckets
  {"$group": {
    "_id": {
      "year": {"$year": "$ts"},
      "dayOfYear": {"$dayOfYear": "$ts"},
      "hourOfDay": {"$hour": "$ts"},
      "minuteOfHour": {"$minute": "$ts"},
      // round the second down to a multiple of the bucket size; with a
      // divisor of 1 this is just the second itself, but a larger
      // divisor gives coarser buckets
      "secondOfMinute": {
        "$subtract": [
          {"$second": "$ts"},
          {"$mod": [{"$second": "$ts"}, 1]}
        ]
      }
    },
    "count": {"$sum": 1}
  }}
], {"allowDiskUse": true})
Even better, you can use the $out stage to save your aggregations for faster access.
EDIT: Please note that you have to make proper use of indexing to make this approach efficient. Without an index, even with my rather limited test set of 50M sample documents, the aggregation takes seconds. With an index, we are talking about 80 ms, to give you an impression.
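A compound index covering the $match stage is the obvious candidate; sketched here with pymongo (the exact index is my assumption, not spelled out above):

from pymongo import MongoClient, ASCENDING, DESCENDING

values = MongoClient().iot.values   # assumed database/collection names
# supports the {sensor, ts-range} $match and the descending ts sort
values.create_index([("sensor", ASCENDING), ("ts", DESCENDING)])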

Solr, multivalued field: how can I return documents where ALL values in the field are contained within a set?

For example, if I have these 2 Documents:
id: 1
multifield: 2, 5
id: 2
multifield: 2, 5, 9
Then say I have a set that I'm querying with, which is {2, 5, 7}. What I would want is document 1 returned because 2 and 5 are both contained in the set. But document 2 should not be returned because 9 is not in the set.
Both the multivalued field and my set are of arbitrary length. Hopefully that makes sense.
Figured this out. This was the inspiration, specifically the answer suggesting the use of Function Queries.
Using the same data in the question, I will add a calculated field to my documents which contains the number of values in my multivalued field.
id: 1
multifield: 2, 5
nummultifield: 2
id: 2
multifield: 2, 5, 9
nummultifield: 3
Then I'll use a frange query with some function queries. For each item in my set, I'll use the termfreq function, which will return 1 or 0. I will then sum up all of these values. Finally, if that sum equals the calculated field nummultifield, then I know that, for that document, every value in the document is present in the set. Remember, my set is 2, 5, 7, so my function query will look something like this:
fq={!frange l=0 u=0}sub( nummultifield, sum( termfreq(multifield,2), termfreq(multifield,5), termfreq(multifield,7)))
If we fill in the values for Document 1 and 2, it will look like this:
Document 1: sub(2, sum(1,1,0)) = 0, which is in my range of {0,0}, so Doc 1 is returned
Document 2: sub(3, sum(1,1,0)) = 1, which is not in the range of {0,0}, so it is not returned
I've tested it out and it works great. You need to make sure you don't duplicate any values in multifield or you'll get weird results. Incidentally, this trick of using frange could be used whenever you want to fake a boolean result from one or more function queries.
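Since the fq has to be regenerated for every query set, it is natural to build it programmatically; a small Python sketch (the field names are taken from the example above):

def frange_fq(field, count_field, allowed):
    # one termfreq() per allowed value; their sum must equal the number
    # of values stored in the document's multivalued field
    freqs = ", ".join(f"termfreq({field},{v})" for v in sorted(allowed))
    return f"{{!frange l=0 u=0}}sub({count_field}, sum({freqs}))"

print(frange_fq("multifield", "nummultifield", {2, 5, 7}))
# -> {!frange l=0 u=0}sub(nummultifield, sum(termfreq(multifield,2), ...))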
Faceting may be what you are looking for.
http://wiki.apache.org/solr/SolrFacetingOverview
http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
how to search for more than one facet in solr?
I adapted this from the Lucid Imagination link.
Choose all documents that have values 2 and 5 and 7 (each fq clause is a separate filter, so they intersect):
http://localhost:8983/solr/select?q=*
&facet=on
&facet.field=multifield
&fq=multifield:2
&fq=multifield:5
&fq=multifield:7
Incomplete: I don't know of any options to exclude all other values.
