Cypher statement with distinct match conditions is returning the same result - database

I am using Neo4j as a database to store voting information related to another database object.
I have a Vote object which has fields:
type:String with values of UP or DOWN.
argId:String which is a string ID value linking to a unique argument object
I am trying to query the number of votes assigned to a given argId using the following queries:
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='DOWN'
RETURN {downvotes: COUNT(v)} AS votes
UNION
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
RETURN {upvotes: COUNT(v)} AS votes
Note that the Cypher above works and returns the expected result, like so:
[
{
"downvotes": 1
},
{
"upvotes": 10
}
]
But I feel like the query could be a bit neater and want to write something like this:
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
MATCH (b:Vote) WHERE b.argId = '214' AND b.type='DOWN'
RETURN {upvotes: COUNT(v), downvotes: COUNT(b)}
Just reading it through, I think it makes sense: b and v are declared as separate variables, so all should be good (or so I thought).
But running it gives me this:
{
"upvotes": 10,
"downvotes": 10
}
But it should be what I have above.
Why is this?
I'm kinda new to neo4j and cypher so I've probably not understood how cypher works fully.
Can anyone shine any light?
Thank you!
p.s. I'm using Neo4j 3.5.6 and running the queries via the Desktop web browser app.

I think if you run this query you will get a clearer picture of what is happening. Your query produces a cartesian product of the upvotes (10) and the downvotes (1). The product is a result set of 10 rows, and every row contains both a v and a b. When they are subsequently counted, there are ten of each.
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
MATCH (b:Vote) WHERE b.argId = '214' AND b.type='DOWN'
RETURN v.type, b.type
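To see the row multiplication outside Neo4j, here is a rough Python sketch of the same idea. The lists are hypothetical stand-ins for the matched nodes (10 UP votes and 1 DOWN vote, as in the question); it only mimics the pairing, not how Neo4j actually executes the query.

```python
from itertools import product

# Hypothetical stand-ins for the matched nodes: 10 UP votes and 1 DOWN vote.
up_votes = [f"up{i}" for i in range(10)]
down_votes = ["down0"]

# Two independent MATCH clauses pair every v with every b.
rows = list(product(up_votes, down_votes))

# COUNT(v) and COUNT(b) each count the non-null values across those rows.
count_v = sum(1 for v, b in rows if v is not None)
count_b = sum(1 for v, b in rows if b is not None)

print(len(rows), count_v, count_b)  # 10 10 10
```

Because the single DOWN vote appears once in each of the 10 rows, it is counted 10 times.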
In order to get the result you want you need to filter the values and count them individually.
Rather than have two match statements, have a single match statement that retrieves all of the values of interest and then use a conditional expression to sort them into upvotes and downvotes buckets.
Something like this may suit you.
MATCH (v:Vote {argId: '214'})
WHERE v.type IN ['UP', 'DOWN']
RETURN {
upvotes: count(CASE WHEN v.type = 'UP' THEN 1 END),
downvotes: count(CASE WHEN v.type = 'DOWN' THEN 1 END)
} AS vote_result
Using APOC, you could instead aggregate the counts by the type values themselves and then convert the resulting pairs to a map with the types as the keys.
MATCH (v:Vote {argId: '214'})
WHERE v.type IN ['UP', 'DOWN']
WITH [v.type, count(*)] AS vote_pair
RETURN apoc.map.fromPairs(collect(vote_pair)) AS votes
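The group-then-build-a-map step can be pictured in Python as well. This is only an analogy, with a hypothetical list of type values standing in for the matched nodes; Counter plays the role of count(*) grouped by v.type, and dict() plays the role of apoc.map.fromPairs.

```python
from collections import Counter

# Hypothetical type values of the matched Vote nodes for one argId.
vote_types = ["UP"] * 10 + ["DOWN"]

# Roughly what collect([v.type, count(*)]) produces: one pair per type.
pairs = list(Counter(vote_types).items())

# Roughly what apoc.map.fromPairs does: turn the pairs into a map.
votes = dict(pairs)

print(votes)  # {'UP': 10, 'DOWN': 1}
```

One thing to keep in mind with this shape: a type with no votes at all simply does not appear as a key.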

union request and pagination in cakephp4

I made two requests. The first one gives me 2419 results and I store the result in $requestFirst. The second, 1 result and I store the result in $requestTwo.
I make a union :
$requestTot = $requestFirst->union($requestTwo);
The total of the $requestTot is 2420 results so all is well so far.
Then :
$request = $this->paginate($requestTot);
$this->set(compact('request'));
This is where I get lost: on each page of the pagination I find the result of $requestTwo. Moreover, the pagination displays:
Page 121 of 121, showing 20 record(s) out of 2,420 total
This is the right number of results except that when I multiply the number of results per page by the number of pages I get 2540. This is the total number of results plus one per page.
Can anyone explain?
Check the generated SQL in Debug Kit's SQL panel. You should see that the LIMIT and OFFSET clauses are being set on the first query, not appended as global clauses that would affect the whole unionized query.
It will look something like this:
(SELECT id, title FROM a LIMIT 20 OFFSET 0)
UNION
(SELECT id, title FROM b)
So what happens then is that pagination will only be applied to the $requestFirst query, and the $requestTwo query will be unionized on top of it each and every time, hence you'll see its result on every single page.
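The numbers in the question can be reproduced with a small Python sketch. The lists below are hypothetical stand-ins for the two result sets (2419 rows and 1 row), and the two helper functions are made up for illustration; they only model where the LIMIT/OFFSET is applied, not CakePHP itself.

```python
import math

PAGE_SIZE = 20

# Hypothetical stand-ins for the two result sets (all rows distinct).
first = [f"a{i}" for i in range(2419)]
second = ["b0"]


def page_limit_on_first_only(page):
    """LIMIT/OFFSET applied to the first query, then the union added on top."""
    start = (page - 1) * PAGE_SIZE
    return first[start:start + PAGE_SIZE] + second


def page_limit_on_whole_union(page):
    """LIMIT/OFFSET applied to the union as a whole (the wrapped query)."""
    combined = first + second
    start = (page - 1) * PAGE_SIZE
    return combined[start:start + PAGE_SIZE]


total_pages = math.ceil((len(first) + len(second)) / PAGE_SIZE)
shown_limit_on_first = sum(
    len(page_limit_on_first_only(p)) for p in range(1, total_pages + 1)
)
shown_limit_on_union = sum(
    len(page_limit_on_whole_union(p)) for p in range(1, total_pages + 1)
)

print(total_pages, shown_limit_on_first, shown_limit_on_union)  # 121 2540 2420
```

With the limit on the first query only, every one of the 121 pages carries the extra row, which accounts for the 2540.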
A workaround for this current limitation would be to use the union query as a subquery or a common table expression from which to fetch the results. In order for this to work you need to make sure that the fields of your queries for the union are being selected without aliasing! This can be achieved by either using Table::subquery():
$requestFirst = $this->TableA
->subquery()
->select(['a', 'b'])
// ...
$requestTwo = $this->TableB
->subquery()
->select(['c', 'd'])
// ...
or by explicitly selecting the fields with aliases equal to the column names:
$requestFirst = $this->TableA
->find()
->select(['a' => 'a', 'b' => 'b'])
// ...
$requestTwo = $this->TableB
->find()
->select(['c' => 'c', 'd' => 'd'])
// ...
Then you can safely use those queries for a union as a subquery:
$union = $requestFirst->union($requestTwo);
$wrapper = $this->TableA
->find()
->from([$this->TableA->getAlias() => $union]);
$request = $this->paginate($wrapper);
or as a common table expression (in case your DBMS supports them):
$union = $requestFirst->union($requestTwo);
$wrapper = $this->TableA
->find()
->with(function (\Cake\Database\Expression\CommonTableExpression $cte) use ($union) {
return $cte
->name('union_source')
->field(['a', 'b'])
->query($union);
})
->select(['a', 'b'])
->from([$this->TableA->getAlias() => 'union_source']);
$request = $this->paginate($wrapper);

Peewee select query with multiple joins and multiple counts

I've been attempting to write a peewee select query which results in a table with 2 counts (one for the number of prizes associated with the lottery, and one for the number of packages associated with the lottery), as well as the fields in the Lottery model.
I've managed to write select queries with 1 count working (seen below), and then I've had to convert the ModelSelects to lists and join them manually (which I think is very hacky).
I did manage to write a select query where the results were joined, but it would multiply the packages count with the prizes count (I've since lost that query).
I also tried using a .switch(Lottery) but I didn't have any luck with this.
query1 = (Lottery.select(Lottery,fn.count(Package.id).alias('packages'))
.join(LotteryPackage)
.join(Package)
.order_by(Lottery.id)
.group_by(Lottery)
.dicts())
query2 = (Lottery.select(Lottery.id.alias('lotteryID'), fn.count(Prize.id).alias('prizes'))
.join(LotteryPrize)
.join(Prize)
.group_by(Lottery)
.order_by(Lottery.id)
.dicts())
lottery = list(query1)
query3 = list(query2)
for x in range(len(lottery)):
lottery[x]['prizes'] = query3[x]['prizes']
While the above code works, is there a cleaner way to write this query?
Your best bet is to do this with subqueries.
# Create query which gets lottery id and count of packages.
L1 = Lottery.alias()
subq1 = (L1
.select(L1.id, fn.COUNT(LotteryPackage.package).alias('packages'))
.join(LotteryPackage, JOIN.LEFT_OUTER)
.group_by(L1.id))
# Create query which gets lottery id and count of prizes.
L2 = Lottery.alias()
subq2 = (L2
.select(L2.id, fn.COUNT(LotteryPrize.prize).alias('prizes'))
.join(LotteryPrize, JOIN.LEFT_OUTER)
.group_by(L2.id))
# Select from lottery, joining on each subquery and returning
# the counts.
query = (Lottery
.select(Lottery, subq1.c.packages, subq2.c.prizes)
.join(subq1, on=(Lottery.id == subq1.c.id))
.join(subq2, on=(Lottery.id == subq2.c.id))
.order_by(Lottery.name))
for row in query.objects():
print(row.name, row.packages, row.prizes)
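To show why the subqueries matter, here is a sketch using raw SQL through Python's built-in sqlite3 module rather than peewee. The table and column names and the sample rows are assumptions made up for the illustration, not the asker's actual schema. The first query joins both link tables at once and multiplies the counts; the second aggregates each link table separately, which is roughly the SQL shape of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data: lottery A has 3 packages and 2 prizes,
# lottery B has no packages and 1 prize.
conn.executescript("""
CREATE TABLE lottery (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE lottery_package (lottery_id INTEGER, package_id INTEGER);
CREATE TABLE lottery_prize (lottery_id INTEGER, prize_id INTEGER);
INSERT INTO lottery VALUES (1, 'A'), (2, 'B');
INSERT INTO lottery_package VALUES (1, 10), (1, 11), (1, 12);
INSERT INTO lottery_prize VALUES (1, 20), (1, 21), (2, 22);
""")

# Joining both link tables directly pairs every package with every prize.
naive = conn.execute("""
SELECT l.name, COUNT(lp.package_id), COUNT(lz.prize_id)
FROM lottery l
LEFT JOIN lottery_package lp ON lp.lottery_id = l.id
LEFT JOIN lottery_prize lz ON lz.lottery_id = l.id
GROUP BY l.id
ORDER BY l.name
""").fetchall()

# Aggregating in separate subqueries keeps the two counts independent.
with_subqueries = conn.execute("""
SELECT l.name, p.packages, z.prizes
FROM lottery l
JOIN (SELECT l1.id, COUNT(lp.package_id) AS packages
      FROM lottery l1
      LEFT JOIN lottery_package lp ON lp.lottery_id = l1.id
      GROUP BY l1.id) p ON p.id = l.id
JOIN (SELECT l2.id, COUNT(lz.prize_id) AS prizes
      FROM lottery l2
      LEFT JOIN lottery_prize lz ON lz.lottery_id = l2.id
      GROUP BY l2.id) z ON z.id = l.id
ORDER BY l.name
""").fetchall()

print(naive)            # [('A', 6, 6), ('B', 0, 1)]
print(with_subqueries)  # [('A', 3, 2), ('B', 0, 1)]
```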

How to check if the value matches the one from previous ?th row? (? is dynamic)

Here is my data set.
Data in
I'd like to check if the gender with "Potential Original" matches the gender with "Potential Duplicate". There is no specified group, but 1 duplicate + 1 or more originals act like a group.
Here is the output I want (for duplicate it's NA because it's comparing to itself).
Data out
Appreciate your help. Thanks.
Thanks Rahul for looking into this. This is what I tried and I think it worked. The logic is to create the seq # first for each block of Duplicate and Original and then pull the lag value with corresponding distance.
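library(data.table)
# Number the rows within each block; a block starts at a "Potential Duplicate" row.
setDT(df)[, counter := seq_len(.N), by = list(cumsum(Status == "Potential Duplicate"))]
for (i in 1:nrow(df)) {
if (df$Status[i] == "Potential Duplicate") {
df$Gender_LAG[i] <- df$Gender[i]
}
else {
# Step back (counter - 1) rows to reach the duplicate that opens this block.
df$Gender_LAG[i] <- df$Gender[i - df$counter[i] + 1]
}
}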
Thanks.
Looking forwards to seeing other options.
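As one other option, the same block logic can be sketched without a counter by remembering the gender of the most recent duplicate row. This is a Python sketch rather than R, and the sample rows and the function name are made up for illustration, since the original data is only shown as images. It returns None where the desired output shows NA.

```python
def compare_to_block_duplicate(rows):
    """For each (status, gender) row, compare gender to the block's duplicate.

    A block starts at a "Potential Duplicate" row; that row itself gets None.
    """
    results = []
    duplicate_gender = None
    for status, gender in rows:
        if status == "Potential Duplicate":
            duplicate_gender = gender  # start of a new block
            results.append(None)
        else:
            results.append(gender == duplicate_gender)
    return results


# Hypothetical sample data, not the rows from the question.
rows = [
    ("Potential Duplicate", "F"),
    ("Potential Original", "F"),
    ("Potential Original", "M"),
    ("Potential Duplicate", "M"),
    ("Potential Original", "M"),
]
matches = compare_to_block_duplicate(rows)
print(matches)  # [None, True, False, None, True]
```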

Filter SQL datatable according to different parameters, without a WHERE clause

I'm building an application that needs to allow the user to filter a data table according to different filters. The user will have three different filter possibilities, but they might use only one, two, or all three of them at the same time.
So, let's say I have the following columns on the table:
ID (int) PK
Sede (int)
Programa (int)
Estado (int)
All of those columns will store numbers, integers. The "ID" column is the primary key, "Sede" stores 1 or 2, "Programa" is any number between 1 and 15, and "Estado" will store numbers between 1 and 13.
The user may filter the data stored in the table using any of those filters (Sede, Programa or Estado). But they might, as well, use two filters, or all three of them at the same time.
The idea is that this application works like the data filters on Excel. I created a simulated table on excel to show what I want to achieve:
This first image shows the whole table, without applying any filter.
Here, the user selected a filter for "Sede" and "Programa" but left the "Estado" filter empty. So the query returns the values that are equal to the filter, but leaves the "Estado" filter open, and brings all the records, filtering only by "Sede" (1) and "Programa" (6).
In this image, the user only selected the "Estado" filter (5), so it brings all the records that match this criteria, it doesn't matter if "Sede" or "Programa" are empty.
If I use a SELECT clause with a WHERE on it, it will work, but only if the three filters have a value:
DECLARE @sede int
DECLARE @programa int
DECLARE @estado int
SET @sede = '1'
SET @programa = '5'
SET @estado = '12'
SELECT * FROM [dbo].[Inscripciones]
WHERE
([dbo].[Inscripciones].[Sede] = @sede)
AND
([dbo].[Inscripciones].[Programa] = @programa)
AND
([dbo].[Inscripciones].[Estado] = @estado)
I also tried changing the "AND" to an "OR", but I can't get the desired result.
Any help will be highly appreciated!! Thanks!
This is a common problem. Try using coalesce on the variable, with the column you're comparing against as the second argument. Be careful, though: make sure NULL, and not an empty string, is being passed!
What this does is take the first non-null value: the variable passed in or, failing that, the column you're comparing to. Thus, if the value passed in is null, the column is compared with itself and the comparison will always return true.
WHERE
[dbo].[Inscripciones].[Sede] = coalesce(@sede, [dbo].[Inscripciones].[Sede])
AND
[dbo].[Inscripciones].[Programa] = coalesce(@programa, [dbo].[Inscripciones].[Programa])
AND
[dbo].[Inscripciones].[Estado] = coalesce(@estado, [dbo].[Inscripciones].[Estado])
If sede is null and programa and estado are populated the compare would look like...
?=? (or 1=1)
?=programa variable passed in
?=Estado variable passed in
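The pattern can be tried out quickly with Python's built-in sqlite3 module. This is a sketch under assumptions: the sample rows and the filter_ids helper are made up, and SQLite named parameters stand in for the T-SQL variables. Passing None binds NULL, which switches that filter off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Inscripciones "
    "(ID INTEGER PRIMARY KEY, Sede INTEGER, Programa INTEGER, Estado INTEGER)"
)
# Hypothetical sample rows: (ID, Sede, Programa, Estado).
conn.executemany(
    "INSERT INTO Inscripciones VALUES (?, ?, ?, ?)",
    [(1, 1, 6, 5), (2, 1, 6, 12), (3, 2, 6, 5), (4, 2, 3, 5)],
)

QUERY = """
SELECT ID FROM Inscripciones
WHERE Sede = COALESCE(:sede, Sede)
  AND Programa = COALESCE(:programa, Programa)
  AND Estado = COALESCE(:estado, Estado)
ORDER BY ID
"""


def filter_ids(sede=None, programa=None, estado=None):
    """Return matching IDs; a filter left as None (NULL) is ignored."""
    params = {"sede": sede, "programa": programa, "estado": estado}
    return [row[0] for row in conn.execute(QUERY, params)]


print(filter_ids(sede=1, programa=6))  # [1, 2]
print(filter_ids(estado=5))            # [1, 3, 4]
print(filter_ids())                    # [1, 2, 3, 4]
```

Note that a row whose column itself holds NULL would never match, because NULL = NULL is not true in SQL.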
Boa Sorte!
Thank you all for your answers. After reading the article posted in the comments by @SeanLange I was finally able to achieve what was needed. Using a CASE expression in the WHERE clause does the trick. Here's the code:
SELECT
*
FROM [dbo].[Inscripciones]
WHERE
([dbo].[Inscripciones].[Sede] = (CASE WHEN @sede = '' THEN [dbo].[Inscripciones].[Sede] ELSE @sede END))
AND
([dbo].[Inscripciones].[Programa] = (CASE WHEN @programa = '' THEN [dbo].[Inscripciones].[Programa] ELSE @programa END))
AND
([dbo].[Inscripciones].[Estado] = (CASE WHEN @estado = '' THEN [dbo].[Inscripciones].[Estado] ELSE @estado END))
AND
([dbo].[Inscripciones].[TipoIngreso] = (CASE WHEN @tipoingreso = '' THEN [dbo].[Inscripciones].[TipoIngreso] ELSE @tipoingreso END))
Thanks again!!

Linq - how get the minimum, if value = 0, get the next value

I have a test database which logs data from when a store logs onto a store portal and how long it stays logged on.
Example:
(just for visualizing purposes - not actual database)
Stores
Id Description Address City
1 Candy shop 43 Oxford Str. London
2 Icecream shop 45 Side Lane Huddersfield
Connections
Id Store_Ref Start End
1 2 2011-02-11 09:12:34.123 2011-02-11 09:12:34.123
2 2 2011-02-11 09:12:36.123 2011-02-11 09:14:58.125
3 1 2011-02-14 08:42:10.855 2011-02-14 08:42:10.855
4 1 2011-02-14 08:42:12.345 2011-02-14 08:50:45.987
5 1 2011-02-15 08:35:19.345 2011-02-15 08:38:20.123
6 2 2011-02-19 09:08:55.555 2011-02-19 09:12:46.789
I need to get various data from the database. I've already gotten the max and average connection duration. (So this is probably very self-evident.) I also need some information about which connection lasted the shortest time. I of course immediately thought of the Min() function of Linq, but as you can see, the database also includes connections that started and ended instantly. Therefore, that data isn't actually "valid" for data analysis.
So my question is how to get the minimum value, but if the value = 0, get the next value that is the lowest.
My linq query so far (which implements the Min() function):
var min = from connections in Connections
join stores in Stores
on connections.Store_Ref equals stores.Id
group connections
by stores.Description into groupedStores
select new
{
Store_Description = groupedStores.Key,
Connection_Duration = groupedStores.Min(connections =>
(SqlMethods.DateDiffSecond(connections.Start, connections.End)))
};
I know that it's possible to get the valid values through multiple queries and/or statements though, but I was wondering if it's possible to do it all in just one query, since my program expects linq queries to be returned and my preference goes to keeping the program as "light" as possible.
If you have a great/simple method to do so, please share it. Your contribution is very much appreciated! :)
What if you add, before the select new, a let clause for the duration of the connection with something like:
let duration = SqlMethods.DateDiffSecond(connections.Start, connections.End)
And then add a where clause
where duration != 0
var min = from connections in Connections.Where(connections => (SqlMethods.DateDiffSecond(connections.Start, connections.End) > 0))
join stores in Stores
on connections.Store_Ref equals stores.Id
group connections
by stores.Description into groupedStores
select new
{
Store_Description = groupedStores.Key,
Connection_Duration = groupedStores.Min(connections =>
(SqlMethods.DateDiffSecond(connections.Start, connections.End)))
};
Try this. By filtering out the "0" values you will get the right result; at least, that is my thought.
Include a where clause before calculating the Min value.
groupedStores.Where(conn => SqlMethods.DateDiffSecond(conn.Start, conn.End) > 0)
.Min(conn => (SqlMethods.DateDiffSecond(conn.Start, conn.End)))
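The filter-then-minimum idea can be checked against the sample table with a short Python sketch. The durations below are approximate whole-second values worked out by hand from the Connections rows in the question (an assumption about how the differences round), and plain tuples stand in for the joined rows.

```python
from collections import defaultdict

# (store description, connection duration in whole seconds), worked out
# by hand from the sample Connections table; zero means start == end.
connections = [
    ("Icecream shop", 0),
    ("Icecream shop", 142),
    ("Candy shop", 0),
    ("Candy shop", 513),
    ("Candy shop", 181),
    ("Icecream shop", 231),
]

# Drop the zero-length connections first, then group by store.
durations_by_store = defaultdict(list)
for store, seconds in connections:
    if seconds > 0:
        durations_by_store[store].append(seconds)

# The minimum of what is left is the shortest "valid" connection per store.
shortest = {store: min(d) for store, d in durations_by_store.items()}

print(shortest)  # {'Icecream shop': 142, 'Candy shop': 181}
```

A store whose connections are all zero-length would simply be absent from the result, which is also what filtering before the grouping does in the query.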
