apache solr max and min in single query - solr

We are migrating from mysql to apache solr since solr is fast in searching. Thank you. We had a scenario to
find 1) difference (max-min)
2) with group by date(timeStamp)
Given below is our mysql table :
And our mysql query is,
SELECT Date(eventTimeStamp), MAX(field) - MIN(field) AS Energy FROM PowerTable GROUP BY DATE(eventTimeStamp);
will results,
So we have to calculate difference per day, where date column is in datetime format.
To reflect/migrate above mysql query in apache solr, we are using result grouping as
group=true&group.query=eventTimeStamp:[2019-12-11T00:00:00Z TO 2019-12-11T23:59:59Z]&group.query=eventTimeStamp:[2019-12-12T00:00:00Z TO 2019-12-12T23:59:59Z]
Using Apache solr statistics option, we are able to calculate max and min for whole result, But we need max and min value per day basis.
When we try to get max and min value per day basis, we are able to fetch either min or max using following query.
&group.sort=event1 desc or &group.sort=event1 asc
Definitely you have to spend some effort/time to understand this question.
So how to find both min and max in single query (per group; not for whole result).

Related

Best way to handle time consuming queries in InfluxDB

We have an API that queries an Influx database and a report functionality was implemented so the user can query data using a start and end date.
The problem is that when a longer period is chosen(usually more than 8 weeks), we get a timeout from influx, query takes around 13 seconds to run. When the query returns a dataset successfully, we store that in cache.
The most time-consuming part of the query is probably comparison and averages we do, something like this:
SELECT mean("value") AS "mean", min("value") AS "min", max("value") AS "max"
FROM $MEASUREMENT
WHERE time >= $startDate AND time < $endDate
AND ("field" = 'myFieldValue' )
GROUP BY "tagname"
What would be the best approach to fix this? I can of course limit the amount of weeks the user can choose, but I guess that's not the ideal fix.
How would you approach this? Increase timeout? Batch query? Any database optimization to be able to run this faster?
In such cases where you allow user to select in days, I would suggest to have another table that stores the result (min, max and avg) of each day as a document. This table can be populated using some job after end of the day.
You can also think changing the document per day to per week or per month, based on how you plot the values. You can also add more fields like in your case, tagname and other fields.
Reason why this is superior to using a cache: When you use a cache, you can store the result of the query, so you have to compute for every different combination in realtime. However, in this case, the cumulative results are already available with much smaller dataset to compute.
Based on your query, I assume you are using InfluxDB v1.X. You could try Continuous Queries which are InfluxQL queries that run automatically and periodically on realtime data and store query results in a specified measurement.
In your case, for each report, you could generate a CQ and let your users to query it.
e.g.:
Step 1: create a CQ
CREATE CONTINUOUS QUERY "cq_basic_rp" ON "db"
BEGIN
SELECT mean("value") AS "mean", min("value") AS "min", max("value") AS "max"
INTO "mean_min_max"
FROM $MEASUREMENT
WHERE "field" = 'myFieldValue' // note that the time filter is not here
GROUP BY time(1h), "tagname" // here you can define the job interval
END
Step 2: Query against that CQ
SELECT * FROM "mean_min_max"
WHERE time >= $startDate AND time < $endDate // here you can pass the user's time filter
Since you already ask InfluxDB to run these aggregates continuously based on the specified interval, you should be able to trade space for time.

Solr - calculate duration from start and end dates

Using Solr 8.0.0, with each document holding a start timestamp field and an end timestamp field, how would I query in a way that returns just the duration between these dates? So I would be going for an equation like this:
(Endtime - Starttime) - 500 seconds = 23 seconds over expected duration.
But getting the result across all documents in the collection.
Would this be the subject of a streaming expression? Any example code you can give? I specifically want to keep this calculation load within the SolrCloud.
You can use a function query. The ms function gives you the difference in milliseconds between two dates. You can use sub to subtract 500 seconds from that number.
You can use the frange query parser to filter documents that match a given range. Which means we end up with something like:
q={!frange l=0}sub(ms(endtime,starttime),500000)

Sunspot solr index search and index on range data

I am storing availability timing for my users where they enter for each day of week what timings they would be available
for example - Mr X would be available on
sunday for 2-5, 8-12, 15-18
monday for 1-3, 5-8, 10-12
and so on entire week
what would be the best possible way of indexing and searching this data in solr
a database query for searching such a dataset would be like
select * from schedule inner join days on schedule.day_id = days.id
where days.name = 'Sunday' and schedule.start>=5 and schedule.end>=8
Use the DateRangeField which became available in Solr5. This allows you to query for documents that contain ranges that matches your query time.
fq={!field f=dateRange op=Contains}[2013 TO 2018]
Before Solr 5 there's a neat hack that uses the spatial support in Solr to query for overlapping durations (if this point is contained within the expected time area, etc.).
Depending on the needed resolution, you could also index seven different fields (monday - sunday) and then index an integer for each hour that the person is available. You can then query the field with a regular query, such as available_sunday:15 to find matching persons.

Tableau – Using Nested Aggregations to Establish a Weekday/Hour Baseline

Background Information: We have an incident time tracker that tracks how long each user spends with a representative before the issue can be closed. We want to determine the average volume of incidents that are being handled for each hour. To say this in another way: We want to get an hourly baseline for each day of the week that will show us the average total call length within the specific time period. Eg: We want to average the total length of every call on Monday from 9AM-10AM for all the weeks in the database, and the same for other hourly intervals.
The simplest way to think of this is that I want AVG(SUM) for the specific time periods, but Tableau does not allow me to do this.
Tableau Output:
This is the desired, target visualization that I am looking for from Tableau.
SQL Query:
I have written a SQL query that returns the answer:
We are looking at two columns: start_time (time stamp) and interval_seconds(float)
In the inner query I use the hour_start function which truncates the date/time value to the hour start, so I can group by the hour and day of the week in the outer query.
SQL Results:
Question:
Is there a way to solve this problem ENTIRELY in Tableau that would get me the result that I am looking for without having to write any SQL code?
Files Stored on Drive
CSV File:
https://drive.google.com/open?id=0B4nMLxIVTDc7NEtqWlpHdVozRXc
Tableau Worksheet:
https://drive.google.com/open?id=0B4nMLxIVTDc7M3A4Q0JxbGdlTE0
You can use Level of Detail expressions to compute the SUM(interval_seconds) at the hour level and then use AVG to calculate the number you are looking for.
I created a couple of calculations:
hour which is defined as: DATETRUNC('hour',[start_time])
this should be equivalent to your hour_start(start_time).
and interval_hours which is defined as {FIXED [hour] : SUM([interval_seconds])/3600 }
This calculates the aggregate for each start_time truncated to the hour.
After this, you simply calculate AVG(interval_hours) and use it in your view.
I put a workbook in dropbox: https://www.dropbox.com/s/3hfvz8w529g9f46/Interval%20Time%20Baseline.twbx?dl=0
Although the chart looks similar to yours, the numbers I came up with are somewhat different from the "SQL Results" you show. Was the data you provided slightly different?

Sum field by date range

I'm using solr 3.6 and I'm kinda stuck trying to perform a special query.
I'm actually using facets by date range, the face.date.gap is set to +1DAY. Of course, the facet is supposed to return the count of docs at a date range but I also need to get the sum of a special field at the same ranges used in facet. It's like I need to count how many votes I have daily monthly, weekly, whatever... it depends on the gap params.
Any ideas? Should I use the group.query or facet.query?
One suggestion I have is to treat the weeks, days separately, and index them. For ex. Today is part of 24th week. Another suggestion is not to rule out multiple searches to service one request. One to calculate all oth facets and one to return counts for given date range (based on search results from first query).

Resources