How can I query the last 10 data points in InfluxDB 2.0 - database

I tried to query the latest 10 data points, but the official documentation only shows how to query the latest single point. Has anyone tried this?
|> last()

You need two functions:
Use the sort function with the desc flag set to true, sorting on _time so the newest points come first:
|> sort(columns: ["_time"], desc: true)
Then use the limit function to keep only 10 rows of the output, which in your case are the 10 latest points since the output is sorted by time in descending order:
|> limit(n: 10)
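Putting it together, a minimal end-to-end sketch (the bucket, measurement, and field names below are placeholders, not from the original question):
from(bucket: "my-bucket")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "my_measurement" and r._field == "my_field")
  |> sort(columns: ["_time"], desc: true)
  |> limit(n: 10)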

Calculate working days between two dates: null value returned

I'm trying to figure out the number of working days between two dates. The table (dfDates) is laid out as follows:
Key  StartDateKey  EndDateKey
1    20171227      20180104
2    20171227      20171229
I have another table (dfDimDate) with all the relevant date keys and whether the date key is a working day or not:
DateKey   WorkDayFlag
20171227  1
20171228  1
20171229  1
20171230  0
20171231  0
20180101  0
20180102  1
20180103  1
20180104  1
I'm expecting a result as so:
Key  WorkingDays
1    6
2    3
So far (I realise this isn't complete to get me the above result), I've written this:
workingdays = []
for i in range(0, len(dfDates)):
    value = dfDimDate.filter((dfDimDate.DateKey >= dfDates.collect()[i][1]) & (dfDimDate.DateKey <= dfDates.collect()[i][2])).agg({'WorkDayFlag': 'sum'})
    workingdays.append(value.collect())
However, only null values are being returned. Also, I've noticed this is very slow and took 54 seconds before it errored.
I think I understand what the error is about but I'm not sure how to fix it. Also, I'm not sure how to optimise the command so it runs faster. I'm looking for a solution in pyspark or spark SQL (whichever is easiest).
Many thanks,
Carolina
Edit: The error below was resolved thanks to a suggestion from @samkart, who said to put the agg after the filter
AnalysisException: Resolved attribute(s) DateKey#17075 missing from sum(WorkDayFlag)#22142L in operator !Filter ((DateKey#17075 <= 20171228) AND (DateKey#17075 >= 20171227)).;
A possible and simple solution:
from pyspark.sql import functions as F
dfDates \
    .join(dfDimDate, dfDimDate.DateKey.between(dfDates.StartDateKey, dfDates.EndDateKey)) \
    .groupBy(dfDates.Key) \
    .agg(F.sum(dfDimDate.WorkDayFlag).alias('WorkingDays'))
That is, first join the two datasets in order to link each dfDates row with all the dfDimDate rows in its range (dfDates.StartDateKey <= dfDimDate.DateKey <= dfDates.EndDateKey).
Then simply group the joined dataset by Key and sum WorkDayFlag to get the number of working days in each range.
In the solution you proposed, you are performing the calculation directly on the driver, so you are not taking advantage of the parallelism that Spark offers. This should be avoided when possible, especially for large datasets.
Apart from that, you are calling collect repeatedly inside the for-loop, even for the same data, which slows things down further.
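A self-contained sketch of that solution, rebuilt from the sample tables above (the SparkSession setup is assumed, not part of the original question):
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Sample data copied from the question
dfDates = spark.createDataFrame(
    [(1, 20171227, 20180104), (2, 20171227, 20171229)],
    ["Key", "StartDateKey", "EndDateKey"],
)
dfDimDate = spark.createDataFrame(
    [(20171227, 1), (20171228, 1), (20171229, 1), (20171230, 0), (20171231, 0),
     (20180101, 0), (20180102, 1), (20180103, 1), (20180104, 1)],
    ["DateKey", "WorkDayFlag"],
)

# Join each key's date range to the date dimension, then sum the working-day flags
result = (
    dfDates
    .join(dfDimDate, dfDimDate.DateKey.between(dfDates.StartDateKey, dfDates.EndDateKey))
    .groupBy(dfDates.Key)
    .agg(F.sum(dfDimDate.WorkDayFlag).alias("WorkingDays"))
)
result.show()  # expected: Key 1 -> 6 working days, Key 2 -> 3 working days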

neo4j How to get a subset using a for loop as a subquery?

For each person, I want to get the first 5 events and the last 5 events (based on eventTime). I want to compare the win rate of the first 5 events to the last 5 to see who has improved the most. I am struggling to find a way to handle the for-loop logic in Neo4j.
GDB Schema:
(p: Person) -[:PlaysIn]-> (e:Event {eventTime:, eventOutcome:})
The apoc.coll.sortNodes function does the trick for you. See https://neo4j.com/labs/apoc/4.1/overview/apoc.coll/apoc.coll.sortNodes/
MATCH (p:Person)
WITH p, apoc.coll.sortNodes([(p)-[:PlaysIn]->(e:Event) | e ], 'eventTime') AS events
RETURN p,
       events[0..5] AS first5Events,
       events[-5..] AS last5Events
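To go one step further towards the win-rate comparison, a hedged sketch building on that query; it assumes eventOutcome stores the string 'win' for wins (a hypothetical value) and keeps the same sort order as above:
MATCH (p:Person)
WITH p, apoc.coll.sortNodes([(p)-[:PlaysIn]->(e:Event) | e ], 'eventTime') AS events
WITH p, events[0..5] AS first5, events[-5..] AS last5
RETURN p,
       toFloat(size([e IN first5 WHERE e.eventOutcome = 'win'])) / size(first5) AS first5WinRate,
       toFloat(size([e IN last5 WHERE e.eventOutcome = 'win'])) / size(last5) AS last5WinRate
ORDER BY last5WinRate - first5WinRate DESC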

Show ALL items in order in MongoDB database

For some reason when I run db.products.find().pretty(), it doesn't list all the items in my database, and the ones it does list are not in order. Any idea why, or how to list everything? It does give me the option to type 'it' afterwards to show more, but it still doesn't show them all, or in order. I just want to see all 100 products in order and pretty().
I can understand it not being in order of productId, since I may not have told it to sort that way, but can I at least get it to list everything?
For setting the order you can use sort():
db.sortData.find().sort({id:-1}).pretty()
Here, -1 means descending order and 1 means ascending order on the id field of the collection.
By default, the mongo shell returns 20 records at a time (you then have to type it to show more). If you want to change the batch size you can run this command:
DBQuery.shellBatchSize = 30
Now the mongo shell returns 30 records of the collection at a time rather than 20.
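Applied to the question's products collection (assuming the sort field is literally called productId), a sketch that lists all 100 products in ascending order in a single batch:
DBQuery.shellBatchSize = 100
db.products.find().sort({productId: 1}).pretty()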

ArangoDB: Insert as function of query by example

Part of my graph is constructed using a giant join between two large collections, and I run it every time I add documents to either collection.
The query is based on an older post.
FOR fromItem IN fromCollection
  FOR toItem IN toCollection
    FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
    INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {} } INTO edgeCollection
This takes about 55,000 seconds to complete for my dataset. I would absolutely welcome suggestions for making that faster.
But I have two related issues:
I need an upsert. Normally, upsert would be fine, but in this case, since I have no way of knowing the key up front, it wouldn't help me. To get the key up front, I would need to query by example to find the key of the otherwise identical, existing edge. That seems reasonable as long as it doesn't kill my performance, but I don't know how in AQL to construct my query conditionally so that it inserts an edge if the equivalent edge does not exist yet, but does nothing if the equivalent edge does exist. How can I do this?
I need to run this every time data gets added to either collection. I need a way to run this only on the newest data so that it doesn't try to join the entire collection. How can I write AQL that allows me to join only the newly inserted records? They're added with Arangoimp, and I have no guarantees on which order they'll be updated in, so I cannot create the edges at the same time as I create the nodes. How can I join only the new data? I don't want to spend 55k seconds every time a record is added.
If you run your query as written without any indexes, then it will have to do two nested full collection scans, as can be seen by looking at the output of
db._explain(<your query here>);
which shows something like:
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 3 - FOR fromItem IN fromCollection /* full collection scan */
3 EnumerateCollectionNode 9 - FOR toItem IN toCollection /* full collection scan */
4 CalculationNode 9 - LET #3 = (fromItem.`fromAttributeValue` == toItem.`toAttributeValue`) /* simple expression */ /* collections used: fromItem : fromCollection, toItem : toCollection */
5 FilterNode 9 - FILTER #3
...
If you do
db.toCollection.ensureIndex({ type: "hash", fields: ["toAttributeValue"], unique: false });
Then there will be a single full collection scan over fromCollection, and for each item found there is a hash lookup in toCollection, which will be much faster. Everything will happen in batches, so this should already improve the situation. The db._explain() will show this:
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 3 - FOR fromItem IN fromCollection /* full collection scan */
8 IndexNode 3 - FOR toItem IN toCollection /* hash index scan */
To only work on recently inserted items in fromCollection is relatively easy: Simply add a timestamp of the import time to all vertices, and use:
FOR fromItem IN fromCollection
  FILTER fromItem.timeStamp > @lastRun
  FOR toItem IN toCollection
    FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
    INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {} } INTO edgeCollection
and of course put a skiplist index on the timeStamp attribute in fromCollection.
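In arangosh that index could look like this (assuming the attribute is literally called timeStamp):
db.fromCollection.ensureIndex({ type: "skiplist", fields: ["timeStamp"], unique: false });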
This should work beautifully to discover new vertices in the fromCollection. It will "overlook" new vertices in the toCollection that are linked to old vertices in fromCollection.
You can discover these by interchanging the roles of the fromCollection and the toCollection in your query (do not forget the index on fromAttributeValue in fromCollection) and remembering to only put in edges if the from vertex is old, like in:
FOR toItem IN toCollection
  FILTER toItem.timeStamp > @lastRun
  FOR fromItem IN fromCollection
    FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
    FILTER fromItem.timeStamp <= @lastRun
    INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {} } INTO edgeCollection
These two together should do what you want. Please find the fully worked example here.
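Regarding the upsert part of the question: AQL's UPSERT statement matches on an example document rather than a key, so a sketch along these lines (with an empty UPDATE so an existing edge is left untouched) should insert an edge only when no equivalent edge exists yet; treat it as a starting point under those assumptions rather than a tested solution:
FOR fromItem IN fromCollection
  FILTER fromItem.timeStamp > @lastRun
  FOR toItem IN toCollection
    FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
    UPSERT { _from: fromItem._id, _to: toItem._id }
      INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {} }
      UPDATE {} IN edgeCollection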

SSRS MDX for Data from previous two weeks

I am trying to get the data from the present calendar week back to the previous two weeks. If I specify the date range as parameters in the MDX query, the table gets filled properly.
SELECT NON EMPTY {[Measures].[planned_cumulative],[Measures].[Last Forecast CW] } ON COLUMNS,
NON EMPTY { ([Project].[Projekt-Task].[Task].ALLMEMBERS* { [Date].[Date - CW].[Week].&[2014]&[201420] : [Date].[Date - CW].[Week].&[2014]&[201422]})} ON ROWS FROM [DWH]
But if I try and use the lag function, I get an error. Here is the MDX query.
SELECT NON EMPTY {[Measures].[planned_cumulative],[Measures].[Last Forecast CW] } ON COLUMNS,
NON EMPTY { ([Project].[Projekt-Task].[Task].ALLMEMBERS* [Date].[DATECW - CW].[Week].CurrentMember.Lag(2) ) } ON ROWS
FROM [DWH]
Your first query (quite rightly) identifies a range of two weeks, by using two members from the dimension separated by a colon i.e. [Week X] : [Week Y].
Your second query uses the LAG function, which will return a member on your dimension, not a range of members.
Try this:
SELECT NON EMPTY {[Measures].[planned_cumulative],[Measures].[Last Forecast CW] } ON COLUMNS,
NON EMPTY {[Project].[Projekt-Task].[Task].ALLMEMBERS * TAIL([Date].[Date - CW].[Week],3) } ON ROWS
FROM [DWH]
Also, your first query refers to [Date].[Date - CW].[Week] whereas your second query uses [Date].[DATECW - CW].[Week]. I have gone with the first option as you say that query works.
Let me know if that helps.
Ash
EDIT: Apologies for that. I have amended the query to use the TAIL function, which gives you a number of tuples from a set, starting at the end and working backwards. So TAIL({Set Of Weeks}, 3) should give you the current week and the previous two weeks. I have tested a similar query on my cube and it seems to produce the expected results.
Hope that solves it for you.
Ash
