Compare a property value to an aggregated value in Gremlin

I'm trying to implement a sort of "window function" in a Gremlin query: I want to select all the edges leaving a vertex that have a timestamp within 24 hours of the latest update local to that vertex.
For example if User A accessed the following resources:
Resource 1 at 2019/04/02 23:00
Resource 2 at 2019/04/02 01:00
Resource 3 at 2019/04/01 22:00
... then I'd want the query to return resources 1 and 2, and omit resource 3, because it was accessed 25 hours before User A's latest access (outside the 24-hour window).
I've tried a few different approaches, for example using local and aggregate:
g.V()
.hasLabel(VertexLabel.User)
.local(__.outE(EdgeLabel.Accesses) // I also tried "sideEffect" here
.values(EdgeProperties.UpdateTime).max().math("_ - 24*60*60*1000")
.aggregate("windowStart"))
.where(
__.outE(EdgeLabel.Accesses)
.has(EdgeProperties.UpdateTime, P.gt("windowStart"))
)
This particular example gives me the error ClassCastException: java.lang.Double cannot be cast to org.apache.tinkerpop.gremlin.structure.Element.
And also using a sack:
g.V()
.hasLabel(VertexLabel.User)
.sack(Operator.assign).by(
__.outE(EdgeLabel.Accesses).values(EdgeProperties.UpdateTime).max())
.sack(Operator.minus).by(__.constant(24*60*60*1000))
.where(
__.outE(EdgeLabel.Accesses)
.not(__.sack().is(P.gt(__.values(EdgeProperties.UpdateTime))))
)
This gives me the error ClassCastException: org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal cannot be cast to java.lang.Long.
I feel like I'm just getting hung up on Gremlin semantics; I'm trying to compare the values in the wrong form. What do I need to do to access the "windowStart" value for the current vertex in the traversal, in a gt/lt predicate?

I've made a few assumptions about what the answers to my comments would be. The following query will give you every user and their respective accessed resources within the last 24 hours (the reference time being each user's most recent access time):
g.V().hasLabel(VertexLabel.User).
match(__.as("user").map(outE(EdgeLabel.Accesses).
values(EdgeProperties.UpdateTime).max()).
math("_-24*60*60*1000").as("m"),
__.as("user").outE(EdgeLabel.Accesses).
where(gt("m")).
by(EdgeProperties.UpdateTime).
by().
inV().fold().as("resources")).
select("user","resources")

Related

Generate Sequential Application number

I have a requirement where I need to generate a unique application number in a sequential format.
Sample application number: APP-Date-0001. The 0001 part keeps increasing for the entire day, and the counter should reset the next day, starting again from 0001 with the current date.
The problem occurs when two users are creating applications at the same time.
Keep the counter and the last date it was used in a custom setting or similar object.
But access that custom setting with a normal SOQL query, not via the special custom setting methods like getInstance().
Finally, in that SOQL query use FOR UPDATE: https://developer.salesforce.com/docs/atlas.en-us.soql_sosl.meta/soql_sosl/sforce_api_calls_soql_select_for_update.htm
If two operations start at the same time, one will be held until the other finishes or a timeout happens.
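A rough Apex sketch of that idea; the object and field names (App_Counter__c, Counter__c, Last_Used_Date__c) are illustrative, not from the original post, and the code assumes a single counter row has already been created:
// Lock the counter row so concurrent requests queue up behind it.
App_Counter__c c = [SELECT Id, Counter__c, Last_Used_Date__c
                    FROM App_Counter__c
                    LIMIT 1
                    FOR UPDATE];

if (c.Last_Used_Date__c != Date.today()) {
    c.Counter__c = 0;                  // new day: reset the counter
    c.Last_Used_Date__c = Date.today();
}
c.Counter__c += 1;
update c;

String appNumber = 'APP-' + Datetime.now().format('yyyyMMdd') + '-' +
                   String.valueOf(c.Counter__c.intValue()).leftPad(4, '0');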

Multi Condition Array With DateValue

I'm looking to create an Array Formula in Google Sheets that checks if two dates are within certain time periods.
Currently, I am using:
=ARRAYFORMULA(IFERROR(IF(DATEVALUE($B3:$B)-DATEVALUE($A3:$A)<E2,TRUE,FALSE),""))
This works per column, but the issue is that if a value is TRUE within 90 days, it also remains TRUE within 7300 days. I would like each column to be exclusive to its current period. I had hoped the following would work, however it just sets everything to TRUE.
=ARRAYFORMULA(IFERROR(IF(DATEVALUE($B3:$B)-DATEVALUE($A3:$A)<90 & DATEVALUE($B3:$B)-DATEVALUE($A3:$A)>1095,TRUE,FALSE),""))
Anyone know if there is a way for this to work? I appreciate it a ton.
try:
=ARRAYFORMULA(IF(A3:A="",,{DAYS(B3:B, A3:A)<C2,
(DAYS(B3:B, A3:A)>=C2)*(DAYS(B3:B, A3:A)<D2)=1,
(DAYS(B3:B, A3:A)>=D2)*(DAYS(B3:B, A3:A)<E2)=1,
(DAYS(B3:B, A3:A)>=E2)*(DAYS(B3:B, A3:A)<F2)=1}))
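A note on why this works where the original attempt didn't: in Sheets, & is string concatenation rather than logical AND, and AND() collapses to a single value inside ARRAYFORMULA, so the (condition1)*(condition2)=1 multiplication trick is used to AND two boolean arrays elementwise. A minimal illustration of the same pattern:
=ARRAYFORMULA((A1:A5>=2)*(A1:A5<4)=1)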

Creating custom rollups with SSAS

I am currently working on the following requirement and would appreciate some help figuring out a way to configure the aggregation of my measure:
I have a fact table with the columns ItemID, DateID, StoreID, ReceivedComments. The way received comments work is that on a daily basis a new record is created that adds to the value of received comments (for example, if Item 5 in Store 5 on Jan 1 had 23 received comments and it received 5 comments the following day, the row for Jan 2 would be Item 5, Store 5, Jan 2, 28).
We created a measure using MAX, and it works fine whenever Item ID is used in the query. When we start moving to a higher level, MAX produces wrong results. Our requirement is to set up the measure as follows:
If the member selected is on the Item level, use MAX; if it's on any other level (Date or Store), the measure should aggregate the MAX of all items under that date or store.
Due to the business rules and structure of the database, Store and Item are different dimensions, so I cannot include them in one hierarchy.
We have been playing around with Custom RollUps but so far haven't been able to get it to work.
Thanks
I would solve this by using a more traditional approach to your fact table. Instead of keeping a cumulative count in the ReceivedComments column, I would keep only the number of comments received THAT DAY.
That way, instead of using MAX, you can create your measure using SUM, and it will automatically rollup when you go to higher levels.
The only disadvantage I can see to this approach is that you will need to use a range of dates, instead of only the most recent date, to get a full total of all the comments for a given item/store/date. But that's a very small change to your MDX.
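With daily deltas, a cumulative total up to a date becomes a sum over a date range rather than the value at the latest date. A hedged MDX sketch of such a running total, with illustrative hierarchy names:
SUM(NULL:[Date].[Calendar].CURRENTMEMBER, [Measures].[ReceivedComments])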
Someone suggested using ISLEAF to determine the level. Instead of ISLEAF, I went with CASE WHEN [Item].[ItemID].CURRENTMEMBER.LEVEL IS [Item].[ItemID].[(All)], so I don't have to account for other dimensions such as Date and Store; I have several other dimensions that all behave the same way.
I then went with this formula to get the sum of the max of the items in a particular store:
SUM({[Item].[Item ID].children},[Measures].[ReceivedComments])
I expect some performance issues with this measure, but we are currently running tests to see whether it will be reliable to work with on actual data.
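Put together, the calculated measure might look roughly like this; it is an untested sketch with illustrative names ([Comments Rollup], the [Item].[ItemID] attribute hierarchy) that would need to be replaced with the cube's actual ones:
CREATE MEMBER CURRENTCUBE.[Measures].[Comments Rollup] AS
    CASE
        WHEN [Item].[ItemID].CURRENTMEMBER.LEVEL IS [Item].[ItemID].[(All)]
        THEN SUM([Item].[ItemID].[(All)].CHILDREN, [Measures].[ReceivedComments])
        ELSE [Measures].[ReceivedComments] // item level: base measure, aggregated with MAX
    END;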

strange appengine query result

What am I doing wrong in this query?
SELECT * FROM TreatmentPlanDetails
WHERE
accountId = 'ag5zfmRvbW9kZW50d2ViMnIRCxIIQWNjb3VudHMYtcjdAQw' AND
status = 'done' AND
category = 'chirurgia orale' AND
setDoneCalendarEventStartTimestamp >= [timestamp for 6 june 2012] AND
setDoneCalendarEventStartTimestamp <= [timestamp for 11 june 2012] AND
deleteStatus = 'notDeleted'
ORDER BY setDoneCalendarEventStartTimestamp ASC
I am not getting any records, and I am sure there are records meeting the WHERE clause conditions. To get the correct records I have to widen the timestamp interval by 1 millisecond. Is that normal? Furthermore, if I modify this query by removing the category filter, I am getting the correct results. This is definitely weird.
I also asked on google groups, but I got no answer. Anyway, for details:
https://groups.google.com/forum/?fromgroups#!searchin/google-appengine/query/google-appengine/ixPIvmhCS3g/d4OP91yTkrEJ
Let's talk specifically about creating timestamps to go into the query. What code are you using to create the timestamp record? Apparently that's important, because fuzzing with it a little bit affects the query. It may be relevant that in the datastore, timestamps are recorded as integers representing posix timestamps with microseconds, i.e. the number of microseconds since 1/1/1970 UTC (not counting leap seconds). It's also relevant that dates (i.e. without a time) are represented as midnight, i.e. the earliest time on that day. But please show us the exact code. (It may also be important to show the actual content of the record that you're attempting to retrieve.)
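One concrete pitfall consistent with that explanation, sketched below assuming the Python db API and date-only bounds: a date converts to midnight, so an upper bound built from "11 June 2012" excludes everything that happened later that day. Using the start of the next day as an exclusive upper bound avoids the problem:
import datetime

# Inclusive lower bound: midnight at the start of 6 June 2012.
start = datetime.datetime(2012, 6, 6)
# Exclusive upper bound: midnight at the start of 12 June 2012,
# so that all of 11 June is included.
end = datetime.datetime(2012, 6, 12)

q = TreatmentPlanDetails.all()  # assuming a db.Model subclass
q.filter("setDoneCalendarEventStartTimestamp >=", start)
q.filter("setDoneCalendarEventStartTimestamp <", end)
q.order("setDoneCalendarEventStartTimestamp")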
An aside that is not specific to your question: Entity property names count as part of your storage quota. If this is going to be a huge dataset, you might pay more $$ than you'd like for property names like setDoneCalendarEventStartTimestamp.
Because you write:
if I modify this query by removing the category filter, I am getting
the correct results
this probably means that category was not indexed at the time you wrote the matching records to the datastore. You have to re-write your records to the datastore if you want them added to the newly created index.

How to combine two search results effectively?

I'm programming a site in PHP/MySQL that gets search results for products via an API from an external site. This site will also have its own products, and the owners want the two sets of search results to be inter-connected.
If someone searches for VIDEO, ordered by date, then the results should all be in order regardless of the source they came from.
eg.
July 31 - Video A - our database
July 30 - Video B - via API
July 29 - Video C - via API
July 28 - Video D - our database
...
The problem I'm having is figuring out a way to do this effectively, especially regarding viewing multiple pages of results. If someone clicks through to the 2nd page of results, I need to figure out the last item on the first page (and the last item from the API), then fetch the items from the API starting after the last API item viewed on the previous page, do the same for our database results, and re-combine them.
In order to avoid this complex algorithm, another idea I had was to limit the results to a large amount - like 500 results and grab them all at once and order them. Then if the user goes forward a few pages, I do not have to re-grab all the data.
Does anyone have suggestions on good algorithms to use to combine two search results?
Whether you use it for caching or not, you will need to grab at least a page's worth of results from both sources, in case all of the next page's results come from a single source.
Grabbing a lot of results and caching them (in the session) is one solution you could use.
If for some reason you don't want to cache all the results (if the operation is expensive and you need this optimized), you could store a simple array in the session that contains the location of the results, and then you would know the starting number for the next page.
For example (pseudo code)
**Request 1**
Get 10 results from API
Get 10 results from Database
Merge the results
Display first 10 and save the order to an array
(A for API, D for Database, ex: A,A,A,D,A,D,D,A,D,A)
User clicks page 2
**Request 2** (Page 2)
Get 10 results from API starting at 6 (6 of the first 10 came from the API)
Get 10 results from Database starting at 4 (4 of the first 10 came from the database)
Repeat merge and display above.
You could also optionally cache what you have needed to retrieve so far (and you will have 10 extra results). This would make the first request longer, but could possibly make the second request much faster.
If the user jumps forward several pages, you would need to get the largest number of results that could have been displayed in the preceding unknown pages from each source.
If you are not too worried about performance from either source, I would retrieve up to a large number like you said and cache all results temporarily. As soon as a new search is executed, dump the old results.
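A small PHP sketch of the session-array idea; the function names and the 'date' key are illustrative:
<?php
// $sourceMap is the saved order array from the session,
// e.g. ['A','A','A','D','A','D','D','A','D','A'].
function offsetsForPage(array $sourceMap, int $page, int $perPage): array
{
    $seen   = array_slice($sourceMap, 0, ($page - 1) * $perPage);
    $counts = array_count_values($seen);
    return [
        'api' => $counts['A'] ?? 0, // API items consumed by earlier pages
        'db'  => $counts['D'] ?? 0, // database items consumed by earlier pages
    ];
}

// Merge two lists that are each already sorted by date descending,
// then keep one page worth of items.
function mergeByDate(array $apiItems, array $dbItems, int $perPage): array
{
    $merged = array_merge($apiItems, $dbItems);
    usort($merged, fn($x, $y) => $y['date'] <=> $x['date']);
    return array_slice($merged, 0, $perPage);
}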
