How do InfluxDB data rollups work with respect to strings?

I have the data below. I want it grouped by metric, with value aggregated, and other columns like new retained as-is, provided the value is the same.
name: test
time metric new value
1658769634245415350 pdu tr 0.65
1658769640910397903 pdu tr 0.66
1658769643676297843 pdu tr 0.67
But when I execute the query below, I get only the mean of value; I want the other columns in the response too, indicating which metric the value is for.
select mean("value") from test GROUP BY "metric"
name: test
tags: metric=
time mean
0 0.662
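One detail worth noting in that output: the tags: metric= line is empty, which suggests metric was written as a field rather than a tag, so GROUP BY "metric" has nothing to group on (InfluxQL can only GROUP BY tags). Assuming metric is stored as a tag, a string field such as new can be carried through the aggregation with a selector function; a hedged sketch:
SELECT mean("value") AS "mean", last("new") AS "new"
FROM "test"
GROUP BY "metric"
last() is a selector, so it returns one representative value of new per group; since new is the same within a group here, that effectively retains it as-is.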

How to subtract a value in one table from a value in a different table based on a common ID in a calculation field

I'm trying to subtract a value in one field from a value in another table, depending on ID number, using FileMaker Pro 19.x. I thought I'd done this before without any problems but now it isn't working and I don't know why.
First, I want to do this in a calculation field, not in a script. I don't understand scripting at all and use it only when there is no alternative. In this case, I think a calculation field should work.
I have two tables, which I'll call "Data" and "Categories".
Both tables have the field "CID" ("Category ID").
The CID fields in both tables are linked in the Relationship Editor.
The Data table has a field "Product ID"
The Categories table has several fields related to products. Two of those are "MIN PID" and "MAX PID". These are the minimum and maximum product ID numbers.
Product IDs are assigned in ranges. If it is within a certain range, it has to belong to a certain category. These categories have CID numbers.
To assign the CID number to the products listed in the Data table, I ran a script that essentially recreated all the data within the Categories table. It was inefficient (in my eyes) because the data was sitting right there in the table; I couldn't figure out how to reference it, so I gave up and ran the script instead. The other problem is that if the CID ever changes for a product, I have to rerun the script (or someone else does, who might not know the script exists).
That said, I now have the correct CID assigned for all 62 product categories. What I want to do now is use the MIN PID and MAX PID values (among others) in calculation fields in the Data table.
For instance, if the product ID is "45,001", it belongs to Category "16", which has a MIN PID of "30,000" and a MAX PID of "50,000". I want to subtract the "30,000" from the "45,001" (PID - MIN PID) to return the result "15,001". This allows me to locate a product within the category: instead of being "Product 45,001", it will be "Product 16.15001".
The calculation I tried is:
If (CID = CID::CID ; PID - CID::CID MIN)
The result is a huge group of records where this field is empty.
I also tried:
PID - CID::CID MIN
Again, no results.
I have tried this with a test file and it works perfectly. It does not work in the database I am working within.
What am I doing wrong?
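A minimal sketch of the calculation, offered as a guess given the details above; the table-occurrence and field names are assumptions, so adjust them to your Relationships graph:
// Calculation field defined in the Data table (context: Data).
// Categories::MIN PID and Categories::MAX PID are assumed names.
If ( PID >= Categories::MIN PID and PID <= Categories::MAX PID ;
     PID - Categories::MIN PID )
If your real file's calculation references CID::CID MIN, the first things to check are that a table occurrence actually named CID exists, that it is related to the occurrence the calculation evaluates from, and that the calculation's context (selected at the top of the field-definition dialog) is the Data occurrence. A wrong context makes related fields evaluate as empty, which would match the empty results you're seeing.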

Unable to COUNTROWS dataset B from tablix with dataset A scope, looking for alternate solutions

I have two datasets (dsMain, dsHoliday). I've been trying to get the number of holidays between the DateStart and DateEnd of a tablix scoped to dsMain, but as expected it isn't working, since dsHoliday is outside the tablix's dataset scope.
I'd like to mention that dsMain is pulled from a SQL query, while dsHoliday is pulled from a Power BI dataset (it was originally an Excel file that I loaded into a PBI dataset so that I could pull it into Power BI Report Builder).
I only need the number of holidays between the two dates for each record. That said, is there some other way I can make this work given the circumstances? I tried the formula below but got an error (a different one depending on whether I added it as a calculated field or as a field expression). Would it perhaps be possible to use both dates as a LOOKUPSET condition to pull the dates into a single text value and count the number of date values in the combined text? (e.g., for ID 1: "05/02/2022, 05/03/2022")
Any leads or alternate solutions I can try?
=SUM(
    IIF(Fields!Date.Value >= Fields!DateStart.Value
        AND Fields!Date.Value <= Fields!DateEnd.Value, 1, 0),
    "dsHoliday"
)
Error (when adding formula as an expression):
The Value expression for the text box 'Textbox2' refers to the field 'DateStart'.
Report item expressions can only refer to fields within the current dataset scope or,
if inside an aggregate, the specified dataset scope. Letters in the names of fields
must use the correct case.
Error (when adding formula as a calculated field):
The expression used for the calculated field 'Test' includes an aggregate,
RowNumber, RunningValue, Previous or lookup function. Aggregate, RowNumber,
RunningValue, Previous and lookup functions cannot be used in calculated
field expressions.
Sample dataset dsMain:
ID  Country  DateStart   DateEnd
1   Tunisia  05/01/2022  05/03/2022
2   Tunisia  05/02/2022  05/04/2022
3   Vietnam  05/03/2022  05/03/2022
4   Vietnam  05/01/2022  05/04/2022
5   Tunisia  05/01/2022  05/01/2022
Sample dataset dsHoliday:
Country  Date        IsHoliday  IsWeekend
Tunisia  05/02/2022  TRUE       FALSE
Tunisia  05/03/2022  TRUE       FALSE
Vietnam  05/02/2022  TRUE       FALSE
Vietnam  05/03/2022  TRUE       FALSE
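Following the LookupSet idea from the question: one pattern that stays within the dsMain scope is to pull all holiday dates for the row's country with LookupSet (legal inside the current scope) and count the in-range ones in report custom code. A hedged sketch; CountInRange is a hypothetical helper, and matching on Country is an assumption about your data:
' Textbox expression inside the dsMain tablix:
=Code.CountInRange(
    LookupSet(Fields!Country.Value, Fields!Country.Value,
              Fields!Date.Value, "dsHoliday"),
    Fields!DateStart.Value, Fields!DateEnd.Value)

' Report Properties > Code:
Public Function CountInRange(ByVal dates As Object(), ByVal dateStart As Date, ByVal dateEnd As Date) As Integer
    If dates Is Nothing Then Return 0
    Dim n As Integer = 0
    For Each d As Object In dates
        ' count holiday dates that fall within [DateStart, DateEnd]
        If CDate(d) >= dateStart AndAlso CDate(d) <= dateEnd Then n += 1
    Next
    Return n
End Function
For ID 1 (Tunisia, 05/01/2022 to 05/03/2022) this would count the two Tunisian holidays above and return 2. It also sidesteps the second error, since the lookup lives in a textbox expression rather than a calculated field.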

Updating cumulative data per time unit via trigger or via schedule

I have a table of 100K records that logs user activity.
I want to show some charts based on that data, as a value per time unit, so I've created a chart-base table.
Besides the time-unit key column (1, 2, 3, 4, ...), there are two more columns in the chart-base table:
Absolute - example: 1, 2, 7, 20 ...
Accumulated and growing - example: 1, 3, 10, 30 ...
Which is better?
An update trigger (on status change) - light CPU work, touching only the specific record on every change to the table; the accumulated value is carried forward to the next time unit by an internal select.
or
Scheduled aggregation per time unit - heavy CPU work over all records; the accumulated value is recalculated from all records up to the current time unit.
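For the scheduled option, both columns can be produced in a single scan with a window function, so the "work hard on all records" cost is one pass rather than one query per time unit. A rough sketch; the activity table and all column names are assumptions:
-- Rebuild the chart-base table on a schedule.
-- absolute = per-unit count, accumulated = running total.
INSERT INTO chart_base (time_unit, absolute, accumulated)
SELECT per_unit.time_unit,
       per_unit.cnt,
       SUM(per_unit.cnt) OVER (ORDER BY per_unit.time_unit)
FROM (
    SELECT time_unit, COUNT(*) AS cnt
    FROM activity
    GROUP BY time_unit
) AS per_unit;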

From the output of the Cross Distances operator in RapidMiner, how to find the 'Request set' row number(s) in the 'Reference set'

I am new to RapidMiner Studio and its operators. While working with RapidMiner I got stuck on a strange doubt; the issue is described below:
I have a data set of 100 rows, and I am inputting this set to the 'Filter Example Range' operator.
The output of the 'Filter Example Range' operator will be an 'Example set' and an 'Original set'.
The 'Filter Example Range' output is set as input to the 'Cross Distances' operator. One input is the 'Request set', with first example: 5 and last example: 5 (this is the 'Example set' of 'Filter Example Range', and the number 5 indicates the row number in the actual data). The other input is the 'Reference set' - 100 rows of data (this is the 'Original set' of the 'Filter Example Range' operator).
From the 'Cross Distances' operator we get three outputs: a 'result set', the 'Request set', and the 'Reference set' (the latter two are the supplied inputs, passed through).
Now, after getting the output from the 'Cross Distances' operator, I want to know the row number of the 'Request set' within the supplied 'Reference set'.
Is there any chance to make a comparison of these two sets in the 'Execute R' operator? Or I request someone to please help me with any alternative.
The Cross Distances operator needs an id attribute and will add one if this is not present in the input example sets. The id attribute is a special attribute and is not used to calculate distances; only regular attributes are used for this. If the input example set contains an attribute called id that is regular, the operator changes this to be special thereby excluding it from the distance calculation.
The output is a distance between pairs and each pair is referred to using the id from each input.
So if the output looks like this (using the iris data set, with the fifth example selected as the request input and all the rest as the document input):
request document distance
id_5 id_5 0.0
id_5 id_1 0.141
it means that id_5 in the request and id_5 in the document are 0 distance apart, and id_5 in the request and id_1 in the document are 0.141 apart.
For id_1 and id_5 in the iris data set, the data is as follows.
id a1 a2 a3 a4
id_1 5.1 3.5 1.4 0.2
id_5 5.0 3.6 1.4 0.2
The distance is
sqrt((5.1-5.0)^2 + (3.5-3.6)^2 + (1.4-1.4)^2 + (0.2-0.2)^2)
which is sqrt(0.01 + 0.01 + 0 + 0) = sqrt(0.02),
and this comes to 0.141.
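The same arithmetic as a quick check in R (values copied from the table above):
x1 <- c(5.1, 3.5, 1.4, 0.2)  # id_1
x5 <- c(5.0, 3.6, 1.4, 0.2)  # id_5
sqrt(sum((x1 - x5)^2))       # 0.1414214, the 0.141 reported above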
Yes, you can do it in R with the Execute R operator. For this you need the compare() function from the compare package. To compare two datasets with this function you need to check that corresponding columns in both datasets have the same type. When calling the function you can specify various arguments; for example, if you expect the second data set to be just a piece of the first one, set shorten=TRUE. Other useful arguments are ignoreOrder, ignoreCase and ignoreColOrder.
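A minimal sketch of that inside Execute R; as far as I know, rm_main is the entry point the operator expects, and the two example sets are assumed to arrive as data frames in the order reference, request:
library(compare)

rm_main <- function(reference, request) {
    # shorten = TRUE: the request set may be just a piece of the reference set
    result <- compare(reference, request,
                      shorten = TRUE, ignoreOrder = TRUE, ignoreColOrder = TRUE)
    print(result)
    list(reference)
}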
What you can try in RapidMiner itself is just a Join, or Generate Attributes - for the second way, you can extract macros from your "small" example set and check whether any row of the "larger" set has those macro values.
The Cross Distances operator is, IMHO, too slow and doesn't reveal much about the data during the transformation process, so it is useful only for specific tasks.

Returning only specific rows (e.g. every 10th: #1, #11, #21...) from a query

I need to fetch only specific rows (kind of "nth rows") from a Solr index. For example, if the full result contains 10000 rows, I want to receive only the first and last row of each 100-item bucket:
items 1 and 100
items 101 and 200
items 201 and 300...
This grouping is dynamic and dependent on the number of results. So, if there are only 5000 total result rows, bucket size is 50 instead of 100. I can calculate the actual indexes but the problem is how to fetch those from Solr.
There are no indexed fields that could be used directly as query parameters. In practice, I am doing a search "name starts with A" (or some other letter) and want to receive the 1st item starting with A, the 100th item starting with A, the 101st item starting with A, etc.
The query parameters http://wiki.apache.org/solr/CommonQueryParameters include "rows" and "start", but these can't skip items, so I would need to get each item with a separate query, which is inefficient. I was also thinking about implementing a Filter Query which would just filter out items 2...99, 102...199, and so on, but I do not know how to implement that.
I don't know of an easy way to do this, but the following will reduce the amount of data that needs to be passed back and forth:
Do a regular query with the usual start and rows parameters, but tell Solr to return only the ID field of each document (via the fl parameter).
In your client code, store the IDs of the first and last documents of each bucket, and repeat the query with the next value for start.
Once you reach the end of the search results, you have the list of document IDs you want.
Run a new query that asks for exactly those document IDs, and this time retrieve the full documents.
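The same two-pass idea as a rough Python sketch; the core URL, the unique key field name id, and the JSON response shape are assumptions about your setup:
# First pass: page through the results fetching only IDs, keeping the
# first and last ID of each bucket. Second pass: fetch those documents.
import requests

SOLR = "http://localhost:8983/solr/mycore/select"  # assumed core URL

def bucket_edge_ids(query, bucket_size, total):
    wanted = []
    for start in range(0, total, bucket_size):
        params = {"q": query, "fl": "id", "start": start,
                  "rows": bucket_size, "wt": "json"}
        docs = requests.get(SOLR, params=params).json()["response"]["docs"]
        if docs:
            wanted.append(docs[0]["id"])   # first item of the bucket
            wanted.append(docs[-1]["id"])  # last item of the bucket
    return wanted

def fetch_full_docs(ids):
    # one final query restricted to the collected IDs
    q = "id:(" + " OR ".join(ids) + ")"
    params = {"q": q, "rows": len(ids), "wt": "json"}
    return requests.get(SOLR, params=params).json()["response"]["docs"]

ids = bucket_edge_ids("name:A*", bucket_size=100, total=10000)
print(fetch_full_docs(ids))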
