Querying Recurring Events In A Fauna FQL Database

I'm having some trouble querying recurring events using Fauna DB / FQL. I'm storing the events like so:
{
id: 1,
userId: 1,
title: "A Very Cool Title",
description: "A Basic Description",
date: {
day: 23,
month: 11,
year: 2022,
hour: 0,
minute: 0
},
frequency: {
minutes: 1,
hours: 1,
days: 1,
weeks: 1,
months: 1,
years: 1
}
}
The date is the original date of this event, and the frequency is how often it occurs. The frequency can be as low as 1 minute. I would like to be able to look up all events that fall between a start and end date for a specific userId.
My first instinct is to add the frequency to the date and check whether the result falls within the queried start and end dates, repeating that until it exceeds the end date. However, since my frequency can be as low as 1 minute, I would have to repeat that a huge number of times, which seems inefficient.
This could be a querying problem, or it may simply be that I need to store the events differently; I'm not sure.

I would like to be able to look up all events that fall between a start and end date for a specific userId.
Except for the starting events, all of your events are virtual and not amenable to "lookup". Instead, you'd have to calculate events.
There is no do { ... } while condition in FQL, and there are transaction limits that would prohibit processing many events.
You should definitely use the Timestamp type instead of an object to record dates. Then you can use the various Time() or Date() functions to perform the necessary time calculations.
Your model includes multiple levels of frequency (minutes, hours, etc.). The problem is notably harder to solve if a single event can recur every 1 year, 1 month, 1 week, 1 day, 1 hour, and 1 minute all at once.
It becomes easier when you limit the expression of the frequency to an amount and a unit. For example:
frequency: { amount: 3, unit: "days" }
With that model, you could determine the number of elapsed intervals with TimeDiff(start_date, end_date, unit), dividing by the amount when it is greater than 1. For example, if an event started on October 17, the number of daily events between then and now would be:
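As a rough illustration of that arithmetic (in Python, not FQL), the number of whole intervals between the first occurrence and a later time is the elapsed time divided by the interval length. The `UNIT_SECONDS` table and `elapsed_intervals` function are hypothetical names for this sketch, and it only handles fixed-length units; months and years vary in length and would need calendar arithmetic instead.

```python
from datetime import datetime, timezone

# Assumption: only fixed-length units are supported in this sketch.
# "months" and "years" vary in length and need calendar arithmetic.
UNIT_SECONDS = {"minutes": 60, "hours": 3_600, "days": 86_400, "weeks": 604_800}


def elapsed_intervals(first, end, amount, unit):
    """Number of whole `amount`-`unit` intervals between `first` and `end`
    (0 if `end` is before `first`)."""
    step = amount * UNIT_SECONDS[unit]
    elapsed = int((end - first).total_seconds())
    return max(elapsed // step, 0)


if __name__ == "__main__":
    start = datetime(2022, 10, 17, tzinfo=timezone.utc)
    now = datetime(2022, 11, 23, tzinfo=timezone.utc)
    print(elapsed_intervals(start, now, 1, "days"))  # 37
```

With an amount of 3 days, the same call would return 12, which is the "divide by the amount" step.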
> TimeDiff(Time("2022-10-17T00:00:00Z"), Now(), "days")
37
If you need to produce entries for each virtual event, then you'd bump into another limitation of FQL: there is no for loop. There is ForEach() or Map(), but you have to already have a set/array to iterate on.
So, you would have to combine a couple of tricks.
Once you have the number of virtual events, you can use Repeat() to generate a string based on a template that includes a delimiter:
> Repeat("a ", 37)
'a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a '
With that string, you can use RTrim() to remove the trailing space.
Then you can use the recipe for creating a SplitString function, and call that function:
> Call("SplitString", "a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a", " ")
[
'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a'
]
Now you have an array with one entry per virtual event. Iterate over it with Reduce(), starting at the event's original date and using TimeAdd() to add the frequency interval once for each entry, to compute each virtual event's timestamp.
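The timestamp computation can be sketched in Python (again, not FQL) to show the logic. One deliberate difference from stepping forward from the original date: this sketch jumps straight to the first occurrence at or after the window start using integer division, so the loop only visits occurrences inside the window, not every occurrence since the event began. The function name and the fixed-length `UNIT_SECONDS` table are assumptions for illustration, and in Fauna the transaction limits mentioned above would still cap how many occurrences one query can produce.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only fixed-length units are supported here.
UNIT_SECONDS = {"minutes": 60, "hours": 3_600, "days": 86_400, "weeks": 604_800}


def occurrences_in_window(first, amount, unit, window_start, window_end):
    """Return the occurrence timestamps of an event that starts at `first` and
    recurs every `amount` `unit`s, limited to [window_start, window_end]."""
    step = timedelta(seconds=amount * UNIT_SECONDS[unit])
    if window_end < first:
        return []  # the window closes before the event ever occurs
    if window_start <= first:
        k = 0  # the window opens before the first occurrence
    else:
        # Ceiling division: index of the first occurrence >= window_start.
        whole, remainder = divmod(window_start - first, step)
        k = whole + (1 if remainder else 0)
    occurrences = []
    current = first + k * step
    while current <= window_end:
        occurrences.append(current)
        current += step
    return occurrences


if __name__ == "__main__":
    utc = timezone.utc
    first = datetime(2022, 11, 23, 0, 0, tzinfo=utc)
    hits = occurrences_in_window(first, 1, "hours",
                                 datetime(2022, 11, 24, 2, 30, tzinfo=utc),
                                 datetime(2022, 11, 24, 5, 0, tzinfo=utc))
    print([t.isoformat() for t in hits])
```

For an hourly event that began at midnight on November 23, the window 02:30 to 05:00 on November 24 yields the 03:00, 04:00 and 05:00 occurrences without touching the 26 earlier ones.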
Without knowing the structure of any index you might be using for your event documents, I can't provide a query that demonstrates the complete effort, but these tips should help with the mechanics of the computation.

Related

How to code this in O(n) time instead of O(n^2)?

I have 3 lists:
(I added spaces between the groups to make the groupby grouping I'll be using easier to see.)
a=[1,1,1,1 ,2,2,2 ,3,3,3,3,3,3, 4,4]
b=['l2','l5','l1','l1' ,'l5','l2','' , 'l1','l3','l6','l2','l5','l1' , 'l5','l1']
c=['z','z','a','s' ,'z','z','a', 's','z','z','a','s','d' , 's','' ]
In list 'a' I have groups, and according to those groups I want to make changes in my list 'c' with respect to list 'b'.
List 'a' has a group of 1s, so for the indexes of those 1s I check list 'b', and wherever I find the string 'l5', I want to set all the following indexes of that group to an empty string ('') in list 'c'.
For example:
If list 'a' has [1,1,1,1] and list 'b' has ['l2','l5','l1','l1'],
I want to make the indexes after 'l5', i.e. indexes 2 and 3, empty in list 'c'.
Expected output of list c would be:
c= ['z','z','','']
I have written code for this that works fine, but I think it runs in O(n^2) time. Is there any way to optimize this code so that it runs in O(n) time?
Here's my code:
a=[1,1,1,1 ,2,2,2 ,3,3,3,3,3,3, 4,4]
b=['l2','l5','l1','l1' ,'l5','l2','' , 'l1','l3','l6','l2','l5','l1' , 'l5','l1']
c=['z','z','a','s' ,'z','z','a', 's','z','z','a','s','d' , 's','' ]
from itertools import groupby
g = groupby(a)
m = 0
for group, data in g:
    n = len(list(data))  # length of each group
    m += n  # sum of the group lengths so far (the end index of each group)
    while m:
        h = m - n  # beginning index of each group (total - length_of_current_group)
        nexxt = 0
        for i in range(h, m):  # loops over each group (h is the start and m the end index)
            if b[i] == 'l5' and nexxt == 0:
                nexxt = i + 1
                while nexxt < m and nexxt != 0:
                    c[nexxt] = ''
                    nexxt += 1
        break
print(c)
Output:
['z', 'z', '', '', 'z', '', '', 's', 'z', 'z', 'a', 's', '', 's', '']
Is there any way to write this in O(N) time?
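A single pass is enough, because each position only needs to know whether an 'l5' has already appeared earlier in the same run of equal values in `a`. The following Python sketch (a hypothetical `clear_after_l5` helper that returns a new list rather than modifying `c` in place) resets a flag whenever the group value changes, which gives O(n) time and treats groups as consecutive runs, just like groupby:

```python
def clear_after_l5(a, b, c, marker="l5"):
    """Return a copy of c in which, within each run of equal values in a,
    every position after the first occurrence of `marker` in b is ''."""
    result = list(c)
    seen_marker = False
    previous_group = object()  # sentinel that never equals a real group key
    for i, group in enumerate(a):
        if group != previous_group:
            # A new run of equal values starts here, so reset the flag.
            previous_group = group
            seen_marker = False
        if seen_marker:
            result[i] = ''
        elif b[i] == marker:
            seen_marker = True
    return result


a = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4]
b = ['l2', 'l5', 'l1', 'l1', 'l5', 'l2', '', 'l1', 'l3', 'l6', 'l2', 'l5', 'l1', 'l5', 'l1']
c = ['z', 'z', 'a', 's', 'z', 'z', 'a', 's', 'z', 'z', 'a', 's', 'd', 's', '']
print(clear_after_l5(a, b, c))
# ['z', 'z', '', '', 'z', '', '', 's', 'z', 'z', 'a', 's', '', 's', '']
```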

Cypher - How to call a procedure multiple times in loop?

In Neo4j Browser, I tried to call a procedure multiple times in a loop, but Neo4j reported the same error: Query cannot conclude with CALL (must be RETURN or an update clause). Specifically,
With UNWIND (documentation):
UNWIND [10, 20] AS age_num
MATCH (n:User {name: 'a', age: age_num})
CALL apoc.nodes.delete(n)
...got Neo.ClientError.Statement.SyntaxError:
Query cannot conclude with CALL (must be RETURN or an update clause) (line 3, column 1 (offset: 68))
"CALL apoc.nodes.delete(n)"
^
With apoc.periodic.iterate() (documentation):
CALL apoc.periodic.iterate(
"UNWIND [10, 20] AS age_num MATCH (n:User {name: 'a', age: age_num}) RETURN n",
"CALL apoc.nodes.delete(n)",
{batchMode: 'SINGLE', parallel: false}
)
...got errorMessages:
{
"Query cannot conclude with CALL (must be RETURN or an update clause) (line 1, column 15 (offset: 14))\r\n\" WITH $n AS n CALL apoc.nodes.delete(n)\"\r\n ^": 1
}
The procedure apoc.nodes.delete() here is just an example. Please don't advise me on using DETACH DELETE instead.
Question: In Cypher, how is it supposed to call a procedure multiple times in a loop, each time might have a different parameter, e.g. a different property value?
Environment: Neo4j Desktop v4.0.4, Windows 8.1 x64.
You have to add a RETURN clause at the end of the query, as the error states. If you only call a single procedure, Cypher won't complain about this. But if you do any kind of MATCH before a procedure call, you have to end the query with RETURN. You could also just use the DETACH DELETE Cypher statement instead.
Version with DETACH DELETE:
UNWIND [10, 20] AS age_num
MATCH (n:User {name: 'a', age: age_num})
DETACH DELETE n
Version with APOC:
UNWIND [10, 20] AS age_num
MATCH (n:User {name: 'a', age: age_num})
CALL apoc.nodes.delete(n) YIELD value
RETURN distinct 'done'
Edit: I have fixed the output per the OP's comment.

How to append the rows using indexing (or any other method)?

I have a dataset with 6 columns and 4.5 million rows. I want to iterate through the whole dataset and, for each row, find the rows whose last-column value matches that row's first-column value, then append those rows to it. The first and last columns are indexed, but their values are not integers.
I asked the same question on Stack Overflow and received a good answer based on NumPy arrays, but I am afraid it is too slow for a dataset this big.
Let's assume this is my dataset (in the real dataset, the first and last elements are not integers):
x = [['2', 'Jack', '8'],['1', 'Ali', '2'],['4' , 'sgee' , '1'],
['5' , 'gabe' , '2'],['100' , 'Jack' , '6'],
['7' , 'Ali' , '2'],['8' , 'nobody' , '20'],['9' , 'Al', '10']]
The result should look something like this:
[['2', 'Jack', '8', '1', 'Ali', '2', '5' , 'gabe' , '2','7' , 'Ali' , '2'],
['1', 'Ali', '2', '4' , 'sgee' , '1'],
['8' , 'nobody' , '20', '2', 'Jack', '8']]
I think I can use indexing to make the process faster, but my knowledge of databases is very limited. Does anybody have a solution (using indexes or any other tool)?
The NumPy solution is in this question:
How to compare two columns from the same data set?
Here is a link to a sample of the real data in SQLite: https://drive.google.com/open?id=11w-o4twH-hyRaX8KKvFLL6dQtkTKCJky
A potential SQL-based solution could go as follows (I'm using your big sample DB as a reference):
To make my proposed solution efficient I would do the following:
Create an index on the last column, and create a partial index to eliminate rows where the first and last columns are the same. The partial index is optional, so you may remove its condition from the later query if you think it causes a problem; if you do, you should create a full index on col 0. All three indexes are included here for completeness.
CREATE INDEX [index_my_tab_A] ON [tab]([0]);
CREATE INDEX [index_my_tab_B] ON [tab]([5]);
CREATE INDEX [index_my_tab_AB] ON [tab]([0]) where [0] != [5];
ANALYZE;
Then I would take advantage of join behavior to generate the listing you need to produce the result you are after. By joining the table to itself you can get multiple return rows for each row considered.
SELECT * from tab t1
JOIN tab t2 on t2.[5] = t1.[0]
WHERE t1.[0] != t1.[5]
AND t2.[5] != 'N/A' -- Optional
ORDER by t1.[0];
Running that SQL against your big sample database (after the ANALYZE step had completed) took 0.2 seconds on my machine. It produced three matching rows, which I presume to be correct.
It may not be immediately obvious what the resulting table means, so here is the result the above query gives when run against the small sample from your original post (with the SQL modified slightly to deal with the reduced number of columns). It produced the following result, which is equivalent to your desired result:
1 Ali 2 4 sgee 1
2 Jack 8 1 Ali 2
2 Jack 8 5 gabe 2
2 Jack 8 7 Ali 2
8 nobody 20 2 Jack 8
All you would have to do is run through this resulting list and combine the rows to produce the list you specified. The general idea is to append each row's second trio of entries to its first trio, continuing until the first trio changes, and to include the first trio only once.
So, starting with the first line, you would combine the Ali trio with the sgee trio, giving ['1', 'Ali', '2', '4' , 'sgee' , '1'].
You would then combine the three Jack rows, giving ['2', 'Jack', '8', '1', 'Ali', '2', '5' , 'gabe' , '2','7' , 'Ali' , '2'].
Then the final row combines to form ['8' , 'nobody' , '20', '2', 'Jack', '8'].
This matches the three arrays you specified (although not in the same order)
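To check the self-join and the combining step end to end, here is a small Python sketch using the standard-library `sqlite3` module and an in-memory database loaded with the three-column sample from the question. The table name `tab` and the quoted column names "0", "1", "2" mirror the answer's naming but are assumptions, as is the helper name `append_matching_rows`. It orders by `rowid` (rather than by column 0) so that two rows sharing a first-column value stay in separate groups; `itertools.groupby` then folds each group of joined rows into one list.

```python
import sqlite3
from itertools import groupby


def append_matching_rows(rows):
    """For each row, append every row whose last column equals its first
    column, using a self-join in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE tab ("0" TEXT, "1" TEXT, "2" TEXT)')
        conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", rows)
        # Index the join column, as in the answer's index on the last column.
        conn.execute('CREATE INDEX index_tab_last ON tab ("2")')
        joined = conn.execute('''
            SELECT t1.rowid, t1."0", t1."1", t1."2", t2."0", t2."1", t2."2"
            FROM tab t1
            JOIN tab t2 ON t2."2" = t1."0"
            WHERE t1."0" != t1."2"
            ORDER BY t1.rowid, t2.rowid
        ''').fetchall()
    finally:
        conn.close()

    result = []
    # Each group holds every joined row for one t1 row (same rowid).
    for _, group in groupby(joined, key=lambda r: r[0]):
        group = list(group)
        combined = list(group[0][1:4])  # the first trio, included once
        for r in group:
            combined.extend(r[4:])      # each matching trio appended
        result.append(combined)
    return result


x = [['2', 'Jack', '8'], ['1', 'Ali', '2'], ['4', 'sgee', '1'],
     ['5', 'gabe', '2'], ['100', 'Jack', '6'],
     ['7', 'Ali', '2'], ['8', 'nobody', '20'], ['9', 'Al', '10']]
print(append_matching_rows(x))
```

Because the ordering follows insertion order here, the three combined lists come out in the same order as the desired result in the question.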
Note: Your original question did not indicate what result you expected when the first and last columns match in the same row (e.g. [3, George, 3]). The WHERE clause eliminates two kinds of entries. First, I noticed that your big sample data has many rows where col 0 and col 5 are the same, so the WHERE clause removes these rows from consideration. Second, many rows have 'N/A' in col 5, so I removed those from consideration too.

SQL assignment: Using Variables and using IF statements

Okay, so here is the first question on the assignment. I just don't know where to start with the problem. If anyone could help me get started, I'd probably be able to figure it out. Thanks
Set two variable values as follows:
#minEnrollment = 10
#maxEnrollment = 20
Determine the number of courses with enrollments between the values assigned to #minEnrollment and #maxEnrollment. If there are courses with enrollments between these two values, display a message in the form
There is/are __ class(es) with enrollments between __ and __.
If there are no classes within the defined range, display a message in the form
“There are no classes with an enrollment between __ and __ students.”
.....
And here is the database to use:
CREATE TABLE Faculty
(Faculty_ID INT PRIMARY KEY IDENTITY,
LastName VARCHAR (20) NOT NULL,
FirstName VARCHAR (20) NOT NULL,
Department VARCHAR (10) SPARSE NULL,
Campus VARCHAR (10) SPARSE NULL);
INSERT INTO Faculty VALUES ('Brown', 'Joe', 'Business', 'Kent');
INSERT INTO Faculty VALUES ('Smith', 'John', 'Economics', 'Kent');
INSERT INTO Faculty VALUES ('Jones', 'Sally', 'English', 'South');
INSERT INTO Faculty VALUES ('Black', 'Bill', 'Economics', 'Kent');
INSERT INTO Faculty VALUES ('Green', 'Gene', 'Business', 'South');
CREATE TABLE Course
(Course_ID INT PRIMARY KEY IDENTITY,
Ref_Number CHAR (5) CHECK (Ref_Number LIKE '[0-9][0-9][0-9][0-9][0-9]'),
Faculty_ID INT NOT NULL REFERENCES Faculty (Faculty_ID),
Term CHAR (1) CHECK (Term LIKE '[A-C]'),
Enrollment INT NULL DEFAULT 0 CHECK (Enrollment < 40))
INSERT INTO Course VALUES ('12345', 3, 'A', 24);
INSERT INTO Course VALUES ('54321', 3, 'B', 18);
INSERT INTO Course VALUES ('13524', 1, 'B', 7);
INSERT INTO Course VALUES ('24653', 1, 'C', 29);
INSERT INTO Course VALUES ('98765', 5, 'A', 35);
INSERT INTO Course VALUES ('14862', 2, 'B', 14);
INSERT INTO Course VALUES ('96032', 1, 'C', 8);
INSERT INTO Course VALUES ('81256', 5, 'A', 5);
INSERT INTO Course VALUES ('64321', 2, 'C', 23);
INSERT INTO Course VALUES ('90908', 3, 'A', 38);
Your request is how to get started, so I'm going to focus on that instead of any specific code.
Start by getting the results that are being asked for, then move on to formatting them as requested.
First, work with the Course table and your existing variables, #minEnrollment = 10 and #maxEnrollment = 20, to get the list that meets the enrollment requirements. Hint: WHERE and BETWEEN. (The Faculty table you have listed doesn't factor into this at all.) After you're sure you have the right results in that list, use the COUNT function to get the number you need for your answer, and assign that value to a new variable.
Now, to the output. IF your COUNT variable is >0, CONCATenate a string together using your variables to fill in the values in the sentence you're supposed to write. ELSE, use the variables to fill in the other sentence.
Part of the problem is, you've actually got 3 or so questions in your post. So instead of trying to post a full answer, I'm instead going to try to get you started with each of the subquestions.
Subquestion #1 - How to assign variables.
You'll need to do some googling on 'how to declare a variable in SQL' and 'how to set a variable in SQL'. This one won't be too hard.
Subquestion #2 - How to use variables in a query
Again, you'll need to google how to do this - something like 'How to use a variable in a SQL query'. You'll find this one is pretty simple as well.
Subquestion #3 - How to use IF in SQL Server.
Not to beat a dead horse, but you'll need to google this. However, one thing I would like to note: I'd test this one first. Ultimately, you're going to want something that looks like this:
IF 1 = 1 -- note, this is NOT the correct syntax (on purpose.)
STUFF
ELSE
OTHERSTUFF
And then switch it to:
IF 1 = 2 -- note, this is NOT the correct syntax (on purpose.)
STUFF
ELSE
OTHERSTUFF
... to verify the 'STUFF' happens when the case is true, and that it otherwise does the 'OTHERSTUFF'. Only after you've gotten it down, should you try to integrate it in with your query (otherwise, you'll get frustrated not knowing what's going on, and it'll be tougher to test.)
One step at a time. Let me give you some help:
Set two variable values as follows: #minEnrollment = 10 #maxEnrollment = 20
Translated to SQL, this would look like:
Declare #minEnrollment integer = 10
Declare #maxEnrollment integer =15
Declare #CourseCount integer = 0
Determine the number of courses with enrollments between the values assigned to #minEnrollment and #maxEnrollment.
Now you have to query your tables to determine the count:
SET #CourseCount = (SELECT Count(Course_ID) from Courses where Enrollment > #minEnrollment
This doesn't answer your questions exactly (ON PURPOSE). Hopefully you can spot the mistakes and fix them yourself. The other answers gave you some helpful hints as well.

Is there any way to combine the functionality of LIKE and BETWEEN in a query?

I'm working on a query where the most obvious solution would be to combine the functionality of LIKE and BETWEEN, but I'm not sure if it's possible.
My goal is to get IDs within a certain range, where each ID is constructed of a leading code, followed by a datetime, and possibly a couple more characters, like so:
ABC180715051623XYZ
The range would be from the current time to 10 minutes prior. The leading characters don't matter for what is chosen, just the datetime digits in the middle. An additional issue is that the leading characters can vary in length: sometimes 2, and other times 4.
On that note, I've been trying to use wildcards and the LIKE operator, but they don't work as needed on their own. Is there any way to combine them?
Thank You
Assuming that:
The leading ID characters are always letters (actually just not numbers)
The date is always in the format yyMMddhhmmss
Then you could have your query as such:
SELECT * FROM [your_table]
WHERE CONVERT( -- Converts to a datetime
DATETIME, STUFF( -- STUFF #1
STUFF( -- STUFF #2
STUFF( -- STUFF #3
SUBSTRING([id_column],
PATINDEX('%[0-9]%',[id_column]), -- gets the index of the first number
12), -- gets the 12 digit date string (SUBSTRING)
7, 0, ' '), -- adds a space between date and time portions (STUFF #3)
10, 0, ':'), -- adds a ':' between hours and minutes (STUFF #2)
13, 0, ':')) -- adds a ':' between minutes and seconds (STUFF #1)
BETWEEN DATEADD(minute,-10, CURRENT_TIMESTAMP) -- 10 minutes ago
AND CURRENT_TIMESTAMP; -- now
This takes the date and time part of the ID and forms it into a string that can be converted into a datetime, which can then be used in a BETWEEN of 10 minutes ago and now. I've tried to format it so that it can be read, but I'll explain the parts from the inside out below so you can edit it if required.
Using your given value for ID of
ABC180715051623XYZ
First, PATINDEX('%[0-9]%',[id_column]) gets the (1-based) index of the first digit in the ID column, so this would be 4.
This makes SUBSTRING([id_column], 4, 12) which gets out 180715051623
So then STUFF('180715051623', 7, 0, ' ') puts a space at the 7th index, giving 180715 051623
Then STUFF('180715 051623', 10, 0, ':') puts a ':' at the 10th index, giving 180715 05:1623
Then STUFF('180715 05:1623', 13, 0, ':') puts a ':' at the 13th index, giving 180715 05:16:23
This is then converted into the datetime '2018-07-15 05:16:23.000' and used in the BETWEEN clause with the two other datetimes.
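The same extraction can be mirrored in Python as a quick way to sanity-check the logic outside SQL Server, under the same two assumptions (the date starts at the first digit and uses yyMMddHHmmss). This is a sketch, not a replacement for the T-SQL: the helper names are hypothetical, a regular expression stands in for PATINDEX, `datetime.strptime` stands in for the STUFF/CONVERT steps, and `now` is passed in explicitly so the result is reproducible.

```python
import re
from datetime import datetime, timedelta

_FIRST_DIGIT = re.compile(r"\d")  # plays the role of PATINDEX('%[0-9]%', ...)


def id_timestamp(id_value):
    """Parse the 12-digit yyMMddHHmmss block that starts at the first digit."""
    match = _FIRST_DIGIT.search(id_value)
    if match is None:
        raise ValueError(f"no digits in {id_value!r}")
    start = match.start()
    return datetime.strptime(id_value[start:start + 12], "%y%m%d%H%M%S")


def ids_in_last_minutes(ids, now, minutes=10):
    """Keep IDs whose embedded timestamp is BETWEEN now - minutes AND now
    (inclusive at both ends, like SQL BETWEEN)."""
    lower = now - timedelta(minutes=minutes)
    return [i for i in ids if lower <= id_timestamp(i) <= now]


if __name__ == "__main__":
    print(id_timestamp("ABC180715051623XYZ"))  # 2018-07-15 05:16:23
```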
