Solr's Complex Query for Nested Documents is not Working - solr

The DataSet I am working on Solr.
[
{
"id": "doc_1",
"name": "Harpreet Chaggar",
"_childDocuments_": [
{ "id": "child_doc_a", "number": 22,"created_at":"2020-03-20T00:00:00Z" },
{ "id": "child_doc_b", "number": 10 ,"created_at":"2021-05-28T00:00:00Z"},
]
},
{
"id": "doc_2",
"name": "Hardik Deshmukh",
"_childDocuments_": [
{ "id": "child_doc_1", "number": 67,"created_at":"2022-03-20T00:00:00Z" },
{ "id": "child_doc_2", "number": 78 ,"created_at":"2022-05-28T00:00:00Z"},
]
},
]
My objective is to make exclude query for a nested Date Data along with some parent conditions and to return parent document for all queries.
I am trying to fetch "id" : "doc_2", "name": "Hardik Deshmukh" by the following query. Note:- I need parent document in return.
q = {!parent which='(name:("Hardik" OR "Harpreet") AND id:"doc_1")'}-created_at:[2020-01-17T00:00:00Z TO 2021-12-17T00:00:00Z]
But I am not getting any results.
To make sure if the date query is working properly, I executed the below query.
q = -created_at:[2020-01-17T00:00:00Z TO 2021-12-17T00:00:00Z]
And it was working.
"response":{"numFound":4,"start":0,"numFoundExact":true,"docs":[
{
"id":"doc_1",
"name":["Harpreet Chaggar"],
"_version_":1746310602768252928},
{
"id":"child_doc_1",
"number":["67"],
"created_at":"2022-03-20T00:00:00Z",
"_version_":1746310602791321600},
{
"id":"child_doc_2",
"number":["78"],
"created_at":"2022-05-28T00:00:00Z",
"_version_":1746310602791321600},
{
"id":"doc_2",
"name":["Hardik Deshmukh"],
"_version_":1746310602791321600}]
}}
Field types:
For created_at
Field-Type:org.apache.solr.schema.DatePointField
For name
Field-Type:org.apache.solr.schema.TextField
And if I want to fetch "id": "doc_1", I am able to get it by executing the following query.
{!parent which='(name:("Hardik" OR "Harpreet") AND id:"doc_1")'} ( created_at:[2020-01-17T00:00:00Z TO 2021-12-17T00:00:00Z] )
It fetches desired results.
"response":{"numFound":1,"start":0,"numFoundExact":true,"docs":[
{
"id":"doc_1",
"name":["Harpreet Chaggar"],
"_version_":1746310602768252928}]
}}

Related

useQuery's onCompleted being called with cached value

Hopefully I can articulate this question clearly without too much code as it's difficult to extract the pieces from my codebase.
I was observing odd behavior yesterday with useQuery that I can't seem to understand. I think I understand Apollo's cache pretty well but this particular behavior doesn't make sense to me. I have a query that looks something like this:
query {
reservations {
priceBreakdown {
sections {
id
name
total
}
}
}
}
The schema is something like:
type Query {
reservations: [Reservation]
}
type Reservation {
priceBreakdown: PriceBreakdown
}
type PriceBreakdown {
sections: [Section]
}
type Section {
id: String
name: String
total: Float
}
That id on Section is not a proper ID and, in fact, is not unique. It's just a string and all PriceBreakdowns have a list of Sections that contain the same ID. I've pointed this out to the backend folks and it's being fixed but I realize this causes incorrect caching with Apollo since there will be collisions w.r.t. __typename and id. My confusion comes from how onCompleted is called. I noticed when doing
const { data } = useQuery(myQuery, {
onCompleted: console.log
})
that when the network call returns, all PriceBreakdowns are unique and correct, as they should be. But when onCompleted is called with what I thought would be that same API data, it's different and seems to reflect the cached values. In case that's confusing, here are the two results. First is straight from the API and second is the log from onCompleted:
// api results
"data": [
{
"id": "92267",
"price_breakdown": {
"sections": [
{
"name": "Reservation",
"total": "$60.00",
"id": "RESERVATION"
},
{
"name": "Promotions and Fees",
"total": null,
"id": "PROMOTIONS_AND_FEES"
},
{
"name": "Total",
"total": "$51.00",
"id": "HOST_TOTAL"
}
]
}
},
{
"id": "92266",
"price_breakdown": {
"sections": [
{
"name": "Reservation",
"total": "$30.00",
"id": "RESERVATION"
},
{
"name": "Promotions and Fees",
"total": null,
"id": "PROMOTIONS_AND_FEES"
},
{
"name": "Total",
"total": "$25.50",
"id": "HOST_TOTAL"
}
]
}
}
]
// onCompleted log
"data": [
{
"id": "92267",
"price_breakdown": {
"sections": [
{
"name": "Reservation",
"total": "$60.00",
"id": "RESERVATION"
},
{
"name": "Promotions and Fees",
"total": null,
"id": "PROMOTIONS_AND_FEES"
},
{
"name": "Total",
"total": "$51.00",
"id": "HOST_TOTAL"
}
]
}
},
{
"id": "92266",
"price_breakdown": {
"sections": [
{
"name": "Reservation",
"total": "$60.00",
"id": "RESERVATION"
},
{
"name": "Promotions and Fees",
"total": null,
"id": "PROMOTIONS_AND_FEES"
},
{
"name": "Total",
"total": "$51.00",
"id": "HOST_TOTAL"
}
]
}
}
]
As you can see, in the onCompleted log, the Sections that had the same ID as Sections from the previous record are duplicated, suggesting Apollo is rebuilding the payload from cache and calling onCompleted with that. Is that what's happening? If I set the fetchPolicy to no-cache, the results are correct, but of course that's just a patch for the problem. I want to better understand Apollo because I thought I understood and now I see something unintuitive. I wouldn't have expected onCompleted to be called with something built from the cache. Thanks in advance.

Multikey partial index not used with elemMatch

Consider the following document format which has an array field tasks holding embedded documents
{
"foo": "bar",
"tasks": [
{
"status": "sleep",
"id": "1"
},
{
"status": "active",
"id": "2"
}
]
}
There exists a partial index on key tasks.id
{
"v": 2,
"unique": true,
"key": {
"tasks.id": 1
},
"name": "tasks.id_1",
"partialFilterExpression": {
"tasks.id": {
"$exists": true
}
},
"ns": "zardb.quxcollection"
}
The following $elemMatch query with multiple conditions on the same array element
db.quxcollection.find(
{
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
does not seem to use the index
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"tasks": {
"$elemMatch": {
"$and": [{
"id": {
"$eq": "1"
}
},
{
"status": {
"$not": {
"$eq": "active"
}
}
}
]
}
}
},
"direction": "forward"
}
How can I make the above query use the index? The index does seem to be used via dot notation
db.quxcollection.find({"tasks.id": "1"})
however I need the same array element to match multiple conditions which includes the status field, and the following does not seem to be equivalent to the above $elemMatch based query
db.quxcollection.find({
"tasks.id": "1",
"tasks.status": { "$nin": ["active"] }
})
The way the partial indexes work is it uses the path as a key. With $elemMatch you don't have the path explicitly in the query. If you check it with .explain("allPlansExecution") it is not even considered by the query planner.
To benefit from the index you can specify the path in the query:
db.quxcollection.find(
{
"tasks.id": "1",
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
It duplicates part of the elemMatch condition, so the index will be used to get all documents containing tasks of specific id, then it will filter out documents with "active" tasks at fetch stage. I must admit the query doesn't look nice, so may be add some comments to the code with explanations.

How to write a SQL query in CosmosDB for a JSON document which has nested/multiple array

I need to write a SQL query in the CosmosDB query editor, that will fetch results from JSON documents stored in Collection, as per my requirement shown below
The example JSON
{
"id": "abcdabcd-1234-1234-1234-abcdabcdabcd",
"source": "Example",
"data": [
{
"Laptop": {
"New": "yes",
"Used": "no",
"backlight": "yes",
"warranty": "yes"
}
},
{
"Mobile": [
{
"order": 1,
"quantity": 2,
"price": 350,
"color": "Black",
"date": "07202019"
},
{
"order": 2,
"quantity": 1,
"price": 600,
"color": "White",
"date": "07202019"
}
]
},
{
"Accessories": [
{
"covers": "yes",
"cables": "few"
}
]
}
]
}
Requirement:
SELECT 'warranty' (Laptop), 'quantity' (Mobile), 'color' (Mobile), 'cables' (Accessories) for a specific 'date' (for eg: 07202019)
I've tried the following query
SELECT
c.data[0].Laptop.warranty,
c.data[1].Mobile[0].quantity,
c.data[1].Mobile[0].color,
c.data[2].Accessories[0].cables
FROM c
WHERE ARRAY_CONTAINS(c.data[1].Mobile, {date : '07202019'}, true)
Original Output from above query:
[
{
"warranty": "yes",
"quantity": 2,
"color": "Black",
"cables": "few"
}
]
But how can I get this Expected Output, that has all order details in the array 'Mobile':
[
{
"warranty": "yes",
"quantity": 2,
"color": "Black",
"cables": "few"
},
{
"warranty": "yes",
"quantity": 1,
"color": "White",
"cables": "few"
}
]
Since I wrote c.data[1].Mobile[0].quantity i.e 'Mobile[0]' which is hard-coded, only one entry is returned in the output (i.e. the first one), but I want to have all the entries in the array to be listed out
Please consider using JOIN operator in your sql:
SELECT DISTINCT
c.data[0].Laptop.warranty,
mobile.quantity,
mobile.color,
c.data[2].Accessories[0].cables
FROM c
JOIN data in c.data
JOIN mobile in data.Mobile
WHERE ARRAY_CONTAINS(data.Mobile, {date : '07202019'}, true)
Output:
Update Answer:
Your sql:
SELECT DISTINCT c.data[0].Laptop.warranty, mobile.quantity, mobile.color, accessories.cables FROM c
JOIN data in c.data JOIN mobile in data.Mobile
JOIN accessories in data.Accessories
WHERE ARRAY_CONTAINS(data.Mobile, {date : '07202019'}, true)
My advice:
I have to say that,actually, Cosmos DB JOIN operation is limited to the scope of a single document. What possible is you can join parent object with child objects under same document. Cross-document joins are NOT supported.However,your sql try to implement mutiple parallel join.In other words, Accessories and Mobile are hierarchical, not nested.
I suggest you using stored procedure to execute two sql,than put them together. Or you could implement above process in the code.
Please see this case:CosmosDB Join (SQL API)

cloudant groupby and count number of times a value appears

First time working with a nosql DB and having trouble writing a query that can look in my DB and for a key count the number of time it appears by another key.
For instance if my DB contains
{
"person": "user1",
"status": "good"
},
{
"person": "user1",
"status": "good"
},
{
"person": "user1",
"status": "bad"
},
{
"person": "user2",
"status": "good"
}
would like to know that person1 was good 2 and bad 1 and person2 was only good 1
in sql would do
select person, status, count(*)
from mydb
groupby person, status
or to get it by a user in the db
select person, status, count(*)
from mydb
groupby person, status
where person = "user1"
You can achieve this with Cloudant's MapReduce views and suitably chosen query parameters. I created a view where the map is
function (doc) {
emit([doc.person, doc.status], null);
}
and the reduce the built-in _count. That gives us an index where the key is a vector, and we can then group at different levels. Using groupby=true with group_level=2 gives us the desired result:
curl 'https://A.cloudant.com/D/_design/so/_view/by-status?groupby=true&group_level=2'
{
"rows": [
{
"key": [
"user1",
"bad"
],
"value": 1
},
{
"key": [
"user1",
"good"
],
"value": 2
},
{
"key": [
"user2",
"good"
],
"value": 1
}
]
}

"There is no index available for this selector" despite the fact I made one

In my data, I have two fields that I want to use as an index together. They are sensorid (any string) and timestamp (yyyy-mm-dd hh:mm:ss).
So I made an index for these two using the Cloudant index generator. This was created successfully and it appears as a design document.
{
"index": {
"fields": [
{
"name": "sensorid",
"type": "string"
},
{
"name": "timestamp",
"type": "string"
}
]
},
"type": "text"
}
However, when I try to make the following query to find all documents with a timestamp newer than some value, I am told there is no index available for the selector:
{
"selector": {
"timestamp": {
"$gt": "2015-10-13 16:00:00"
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"_id": "asc"
}
]
}
What have I done wrong?
It seems to me like cloudant query only allows sorting on fields that are part of the selector.
Therefore your selector should include the _id field and look like:
"selector":{
"_id":{
"$gt":0
},
"timestamp":{
"$gt":"2015-10-13 16:00:00"
}
}
I hope this works for you!

Resources