Is it possible to use a Solr field value in another query? - solr

My query output generally looks like this:
"docs": [
{
"id": "552e1acd5971b",
"product_id": "NOT05253",
"attribute_id": 0,
"attribute_name": "Brend",
"value_id": 12923,
"value_name": "ACER",
"category_id": 166,
"_version_": 1498504104368930800
},
{
"id": "552e1acd7d479",
"product_id": "NOT05604",
"attribute_id": 0,
"attribute_name": "Brend",
"value_id": 12923,
"value_name": "ACER",
"category_id": 166,
"_version_": 1498504104523071500
},
{
"id": "552e1acdc9c7c",
"product_id": "NOT05988",
"attribute_id": 0,
"attribute_name": "Brend",
"value_id": 12923,
"value_name": "ACER",
"category_id": 166,
"_version_": 1498504104851275800
},
{
"id": "552e1acdef261",
"product_id": "NOT06361",
"attribute_id": 0,
"attribute_name": "Brend",
"value_id": 12923,
"value_name": "ACER",
"category_id": 166,
"_version_": 1498504105011708000
}, .....
What I'm trying to do is get the results whose product_id matches at least one product_id returned by another query:
category_id:166 AND value_id:12923
AND product_id:( _query_:"value_id:64 " )
I'm using this to make EAV product filtering faster. This particular query should narrow the set of product_ids for ACER notebooks down to the variants with a 15.6" screen size :)
In other words, something like this SQL query:
SELECT * FROM table WHERE value_id=12923 AND product_id IN (SELECT product_id FROM table WHERE value_id=64)
Any thoughts? :D

Solr has some join capabilities, described here [1]. They could be what you're looking for.
For the other clauses, I would use filter queries (e.g. fq=value_id:12923).
[1] https://wiki.apache.org/solr/Join
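As a rough sketch of how the join and the filter queries might fit together, the Python snippet below only builds the request parameters; it does not contact a Solr server. The helper name build_join_params is hypothetical, and the sketch assumes product_id is an indexed field that can be used on both sides of the join.

```python
from urllib.parse import urlencode


def build_join_params(category_id, brand_value_id, other_value_id):
    """Build Solr request parameters for a self-join on product_id (hypothetical helper)."""
    return [
        ("q", "*:*"),
        # Plain restrictions go into filter queries, as suggested in the answer.
        ("fq", f"category_id:{category_id}"),
        ("fq", f"value_id:{brand_value_id}"),
        # Join query parser: find the documents matching value_id:<other>,
        # collect their product_id values ("from"), and keep only documents
        # whose product_id ("to") is in that set.
        ("fq", f"{{!join from=product_id to=product_id}}value_id:{other_value_id}"),
    ]


params = build_join_params(166, 12923, 64)
query_string = urlencode(params)
print(query_string)
```

This mirrors the SQL "product_id IN (SELECT product_id ... WHERE value_id=64)" from the question: the join clause plays the role of the subquery.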

Related

Performance Improvement for Real time Updates from RDS to Snowflake

As part of an hourly process, we source data from a table in RDS and ingest it into the raw data layer in Snowflake.
After ingesting, we have to update certain fields in the final mart table based on new data ingested.
This mart table is a very wide table with more than 100 fields and the size of the table is around 10 GB.
So these updates are taking 6-7 minutes.
Edit: Update Query and Explain Plan:
update <target_table> tgt
set email_id =src.email_id,
first_name = (
case
when len(src.updated_first_name) < 2 then first_name
when src.updated_first_name is null or src.updated_first_name = '' then first_name
else src.updated_first_name
end ),
last_name = (
case
when len(src.updated_last_name) < 2 then last_name
when src.updated_last_name is null or src.updated_last_name = '' then last_name
else src.updated_last_name
end),
link = src._link,
title = src.updated_title,
is_verified = 1,
last_verified = src.verified_time,
last_active_type_update = src.verified_time,
active_type = 'ENGAGED',
title_id = src.title_id,
function = src.function,
role = src.role
from <sourcetable> src
where lower(tgt.email_id) = lower(src.email_id)
and src.demographic_disposition in ('Updated Last Name','Updated Title','Updated First Name','Verified','Updated Country','Partial Verify')
and src.verified_time > '<last_runtime>';
Explain Plan:
{
"GlobalStats": {
"partitionsTotal": 728,
"partitionsAssigned": 486,
"bytesAssigned": 9182406144
},
"Operations": [
[
{
"id": 0,
"operation": "Result",
"expressions": [
"number of rows updated",
"number of multi-joined rows updated"
]
},
{
"id": 1,
"parent": 0,
"operation": "Update",
"objects": [
"<target_table>"
]
},
{
"id": 2,
"parent": 1,
"operation": "InnerJoin",
"expressions": [
"joinKey: (LOWER(src.EMAIL_ID) = LOWER(tgt.EMAIL_ID))"
]
},
{
"id": 3,
"parent": 2,
"operation": "Filter",
"expressions": [
"(src.DEMOGRAPHIC_DISPOSITION IN 'Updated Last Name' IN 'Updated Job Title' IN 'Updated First Name' IN 'Verified' IN 'Updated Country' IN 'Partial Verify') AND (src.VERIFIED_TIME > '<last_runtime>') AND (LOWER(src.EMAIL_ID) IS NOT NULL)"
]
},
{
"id": 4,
"parent": 3,
"operation": "TableScan",
"objects": [
"<src_table>"
],
"expressions": [
"EMAIL_ID",
"LINK",
"UPDATED_FIRST_NAME",
"UPDATED_LAST_NAME",
"UPDATED_TITLE",
"DEMOGRAPHIC_DISPOSITION",
"VERIFIED_TIME",
"FUNCTION",
"ROLE",
"TITLE_ID"
],
"alias": "BH",
"partitionsAssigned": 1,
"partitionsTotal": 243,
"bytesAssigned": 1040384
},
{
"id": 5,
"parent": 2,
"operation": "Filter",
"expressions": [
"LOWER(tgt.EMAIL_ID) IS NOT NULL"
]
},
{
"id": 6,
"parent": 5,
"operation": "JoinFilter",
"expressions": [
"joinKey: (LOWER(src.EMAIL_ID) = LOWER(tgt.EMAIL_ID))"
]
},
{
"id": 7,
"parent": 6,
"operation": "TableScan",
"objects": [
"<target_table>"
],
"expressions": [
"EMAIL_ID",
"FIRST_NAME",
"LAST_NAME"
],
"partitionsAssigned": 485,
"partitionsTotal": 485,
"bytesAssigned": 9181365760
}
]
]
}
The new requirement is that this update job run every 5 minutes, so that we have near-real-time data in Snowflake.
However, we have tried every query optimisation we could think of and cannot bring the execution time of the updates below 5 minutes.
We are using a Small warehouse in Snowflake, and due to budget constraints we can't move to a larger one to speed up the update query.
Is there any other budget-friendly way to do near-real-time updates (after ingesting the data) in Snowflake?

Querying CosmosDB based on timestamps

I am working with a CosmosDB instance set up by one of my colleagues and connecting to it using a connection string. The database contains several JSON documents with the following schema:
{
"period": "Raw",
"source": "Traffic",
"batchId": "ee737270-0b72-49b7-a2f1-201f642e9c81",
"periodName": "Raw",
"sourceName": "Traffic",
"groupKey": "gc4151_a",
"partitionKey": "traffic-gc4151_a-raw-raw",
"time": "2021-08-05T23:55:10",
"minute": 55,
"hour": 23,
"day": 05,
"month": 08,
"quarter": 3,
"year": 2021,
"minEventTime": "2021-08-05T23:55:09",
"maxEventTime": "2021-08-05T23:55:11",
"meta": {
"siteId": "GC4151_A",
"from": {
"lat": "55.860894822588506",
"long": "-4.284365958508686"
},
"to": {
"lat": "55.86038667864348",
"long": "-4.2826901232101795"
}
},
"measurements": {
"flow": [
{
"calculation": "Raw",
"name": "flow",
"calculationName": "Raw",
"value": 0
}
],
"concentration": [
{
"calculation": "Raw",
"name": "concentration",
"calculationName": "Raw",
"value": 0
}
]
},
"added": "2021-08-05T12:21:32.000819Z",
"updated": "2021-08-05T12:21:32.000819Z",
"id": "d4346f50-543e-4c4d-82cf-835b480914c2",
"_rid": "4RRTAIYVA1AIAAAAAAAAAA==",
"_self": "dbs/4RRTAA==/colls/4RRTAIYVA1A=/docs/4RRTAIYVA1AIAAAAAAAAAA==/",
"_etag": "\"1c0015a1-0000-1100-0000-5f3fbc4c0000\"",
"_attachments": "attachments/",
"_ts": 1598012492
}
I am trying to write a SQL query to select all the records that fall between the current date-time and one week earlier, so I can use these to perform future calculations.
I have attempted to use both of the following:
SELECT *
FROM c
WHERE c.time > date_sub(now(), interval 1 week);
and
SELECT *
FROM c
WHERE c.time >= DATE_ADD(CURDATE(), INTERVAL -7 DAY);
However, both of these return the following error:
Gateway Failed to Retrieve Query Plan: Message: {"errors":[{"severity":"Error","location":{"start":124,"end":125},"code":"SC1001","message":"Syntax error, incorrect syntax near '1'."}]}
ActivityId: 51c3b6f7-e760-4062-bd80-8cc9f8de5352, Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Documents.Common/2.14.0
My question is what is the issue with my code, and how can I fix it?
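Functions such as date_sub, now(), CURDATE() and the INTERVAL keyword are MySQL syntax, which the Cosmos DB SQL dialect does not support. You can use DateTimeAdd and GetCurrentDateTime() instead, e.g.: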
SELECT *
FROM c
WHERE c.time > DateTimeAdd("day",-7,GetCurrentDateTime() )
Let me know if this works for you.
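If you would rather compute the cutoff on the client, a parameterized query is one alternative. The sketch below only builds the query text and its parameter list; one_week_query is a hypothetical helper, and it assumes c.time is stored in UTC in the same "YYYY-MM-DDTHH:MM:SS" format as the sample document, so that plain string comparison orders the values chronologically.

```python
from datetime import datetime, timedelta, timezone


def one_week_query(now=None):
    """Return a parameterized query selecting documents from the last 7 days."""
    # Allow the caller to inject "now" so the result is reproducible.
    now = now or datetime.now(timezone.utc)
    # Format the cutoff like the stored c.time values (assumed to be UTC).
    cutoff = (now - timedelta(days=7)).strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "query": "SELECT * FROM c WHERE c.time > @cutoff",
        "parameters": [{"name": "@cutoff", "value": cutoff}],
    }


spec = one_week_query(datetime(2021, 8, 12, 23, 55, 10, tzinfo=timezone.utc))
print(spec["parameters"][0]["value"])
```

The query text and parameter list can then be handed to whichever client you use to run queries against the container.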

How to write a SQL query in CosmosDB for a JSON document which has nested/multiple array

I need to write a SQL query in the CosmosDB query editor that fetches results from JSON documents stored in a collection, as per the requirement shown below.
The example JSON
{
"id": "abcdabcd-1234-1234-1234-abcdabcdabcd",
"source": "Example",
"data": [
{
"Laptop": {
"New": "yes",
"Used": "no",
"backlight": "yes",
"warranty": "yes"
}
},
{
"Mobile": [
{
"order": 1,
"quantity": 2,
"price": 350,
"color": "Black",
"date": "07202019"
},
{
"order": 2,
"quantity": 1,
"price": 600,
"color": "White",
"date": "07202019"
}
]
},
{
"Accessories": [
{
"covers": "yes",
"cables": "few"
}
]
}
]
}
Requirement:
SELECT 'warranty' (Laptop), 'quantity' (Mobile), 'color' (Mobile), 'cables' (Accessories) for a specific 'date' (for eg: 07202019)
I've tried the following query
SELECT
c.data[0].Laptop.warranty,
c.data[1].Mobile[0].quantity,
c.data[1].Mobile[0].color,
c.data[2].Accessories[0].cables
FROM c
WHERE ARRAY_CONTAINS(c.data[1].Mobile, {date : '07202019'}, true)
Original Output from above query:
[
{
"warranty": "yes",
"quantity": 2,
"color": "Black",
"cables": "few"
}
]
But how can I get this expected output, which includes every order in the 'Mobile' array?
[
{
"warranty": "yes",
"quantity": 2,
"color": "Black",
"cables": "few"
},
{
"warranty": "yes",
"quantity": 1,
"color": "White",
"cables": "few"
}
]
Since I wrote c.data[1].Mobile[0].quantity, with 'Mobile[0]' hard-coded, only one entry (the first one) is returned in the output, but I want all the entries in the array to be listed.
Please consider using the JOIN operator in your SQL:
SELECT DISTINCT
c.data[0].Laptop.warranty,
mobile.quantity,
mobile.color,
c.data[2].Accessories[0].cables
FROM c
JOIN data in c.data
JOIN mobile in data.Mobile
WHERE ARRAY_CONTAINS(data.Mobile, {date : '07202019'}, true)
Updated answer:
Your SQL:
SELECT DISTINCT c.data[0].Laptop.warranty, mobile.quantity, mobile.color, accessories.cables FROM c
JOIN data in c.data JOIN mobile in data.Mobile
JOIN accessories in data.Accessories
WHERE ARRAY_CONTAINS(data.Mobile, {date : '07202019'}, true)
My advice:
The Cosmos DB JOIN operation is limited to the scope of a single document: you can join a parent object with its child objects within the same document, but cross-document joins are NOT supported. Your SQL, however, tries to perform multiple parallel joins. Mobile and Accessories sit side by side as separate elements of data, not nested one inside the other, so no single element of data has both properties.
I suggest using a stored procedure to execute the two SQL queries and then combining their results. Alternatively, you could implement this process in application code.
Please see this case: CosmosDB Join (SQL API)
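As a minimal sketch of the "do it in code" option, the Python below flattens one document that has already been fetched. The function name flatten_orders is hypothetical, and note one deliberate difference from the queries above: it filters each Mobile entry by its own date instead of checking whether the array contains the date at all.

```python
def flatten_orders(doc, date):
    """Return one row per Mobile order on the given date (hypothetical helper)."""
    laptop, mobiles, accessories = {}, [], []
    # Each element of "data" carries exactly one of the three sections.
    for entry in doc.get("data", []):
        if "Laptop" in entry:
            laptop = entry["Laptop"]
        if "Mobile" in entry:
            mobiles = entry["Mobile"]
        if "Accessories" in entry:
            accessories = entry["Accessories"]
    cables = accessories[0].get("cables") if accessories else None
    return [
        {
            "warranty": laptop.get("warranty"),
            "quantity": mobile["quantity"],
            "color": mobile["color"],
            "cables": cables,
        }
        for mobile in mobiles
        if mobile.get("date") == date
    ]


# The example document from the question, trimmed to the fields that are used.
doc = {
    "id": "abcdabcd-1234-1234-1234-abcdabcdabcd",
    "data": [
        {"Laptop": {"New": "yes", "Used": "no", "backlight": "yes", "warranty": "yes"}},
        {"Mobile": [
            {"order": 1, "quantity": 2, "price": 350, "color": "Black", "date": "07202019"},
            {"order": 2, "quantity": 1, "price": 600, "color": "White", "date": "07202019"},
        ]},
        {"Accessories": [{"covers": "yes", "cables": "few"}]},
    ],
}

rows = flatten_orders(doc, "07202019")
print(rows)
```

For the example document this produces the two rows listed as the expected output in the question.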

Retrieve faceted results with a non-default count value (default is 10) from Azure Search

I am using an Azure Search index. My goal is to retrieve unique records based on some unique parameter, say System_ID, so I started using the facets feature. However, I cannot retrieve more than 10 unique facet values, even when I set the count to 20 in the query.
In summary:
I can retrieve only 10 unique records even though the index contains more than 10.
When I change the facet's count property to 20, I still get only 10 records.
Can you please help me modify the query so that I get more than 10 records?
Any help would be appreciated.
Default query:
$filter=(systemID ne null) and (ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')&facet=machineTagSystemID,sort:value&queryType=full
Default Results:
{"machineTagSystemID": [
{
"count": 9,
"value": "ABCS test machines-111-test - change|*1XA78RUGV23PVPN"
},
{
"count": 6,
"value": "Ajit Machine testing1jjcdxxxxxxxxxxxxxx|*1L693D439H5ZNG9"
},
{
"count": 19,
"value": "Anvesh test111dsaa|*13SSNP5AJ3L96C5"
},
{
"count": 3,
"value": "Dead End cross 2|*1NK7KNNLFVTM4QC"
},
{
"count": 3,
"value": "hehehe|*1NDC32TDNXT5RAH"
},
{
"count": 14,
"value": "high2 Machine12345678ppjk fvrf|*1T2F3VQEJ58ZLQL"
},
{
"count": 31,
"value": "prashant dev machine 213|*12L343TZTFGH3M6"
},
{
"count": 1,
"value": "ryansjcilaptop465986543|*1E2PG9V3BMEYDM7"
},
{
"count": 12,
"value": "snehali DEV June|*1QXEDL8E2V8MGBY"
},
{
"count": 27,
"value": "tarun Machine-dev|*1YRPHS3J7NGUVA8"
}
]}
Facet with count:
$filter=(systemID ne null) and (ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')&facet=machineTagSystemID,sort:value,count:20&queryType=full
But same results:
{"machineTagSystemID": [
{
"count": 9,
"value": "ABCS test machines-111-test - change|*1XA78RUGV23PVPN"
},
{
"count": 6,
"value": "Ajit Machine testing1jjcdxxxxxxxxxxxxxx|*1L693D439H5ZNG9"
},
{
"count": 19,
"value": "Anvesh test111dsaa|*13SSNP5AJ3L96C5"
},
{
"count": 3,
"value": "Dead End cross 2|*1NK7KNNLFVTM4QC"
},
{
"count": 3,
"value": "hehehe|*1NDC32TDNXT5RAH"
},
{
"count": 14,
"value": "high2 Machine12345678ppjk fvrf|*1T2F3VQEJ58ZLQL"
},
{
"count": 30,
"value": "prashant dev machine 213|*12L343TZTFGH3M6"
},
{
"count": 1,
"value": "ryansjcilaptop465986543|*1E2PG9V3BMEYDM7"
},
{
"count": 12,
"value": "snehali DEV June|*1QXEDL8E2V8MGBY"
},
{
"count": 27,
"value": "tarun Machine-dev|*1YRPHS3J7NGUVA8"
}
]}
This is based on the documentation link: https://learn.microsoft.com/en-us/azure/search/search-faceted-navigation
Facets honor the filter specified in a query. It's possible this is why you are only seeing 10 unique facet values for this field. Generally speaking, your query looks fine. If there were more than 10 unique values in this field for the query specified, I would expect them to show up.
How many total results are returned by this query? I see 125 total values in the facets you provided and I'm wondering if the count aligns with your results.
Mike
This is old, but I hit the same issue. By default a facet returns only 10 values; you can raise this limit by adding a count to a given facet, e.g.:
facet=Month,count:12&search=something
This can also be done in the C# API by adding the count to the facet name:
var options = new SearchOptions();
options.Facets.Add("Month,count:12");
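If you assemble facet expressions in several places, a small helper can keep the comma-separated format consistent. This is only a string-building sketch in Python; facet_expression is a hypothetical name, and the output follows the "field,sort:...,count:..." form used in the queries above.

```python
def facet_expression(field, count=None, sort=None):
    """Build a facet expression such as "Month,count:12" (hypothetical helper)."""
    parts = [field]
    if sort is not None:
        parts.append(f"sort:{sort}")
    if count is not None:
        # Without an explicit count, only the default number of values is returned.
        parts.append(f"count:{count}")
    return ",".join(parts)


expression = facet_expression("machineTagSystemID", count=20, sort="value")
print(expression)
```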

Finding Documents Array CouchDB

I have documents with this structure:
{id: ####,
rev: ####,
"Cam_name": "Camera SX",
"colour": "white",
"manufacturer": "Sony",
"rec_limit": 180,
"Customer": ["Mike","Ann","James"]
}
{id: ####,
rev: ####,
"Cam_name": "PXSV CAM",
"colour": "white",
"manufacturer": "LG",
"rec_limit": 144,
"Customer": ["Mike","Oliver","Mr. Rain"]
}
{id: ####,
rev: ####,
"Cam_name": "LxSV Double",
"colour": "white",
"manufacturer": "Phillips",
"rec_limit": 160,
"Customer": ["Mike"]
}
I want to write a map function query that shows ALL the Cam_names the customer Mike is using.
I have a similar map function, but it shows only the Cam_name LxSV Double, where Mike is the only customer. I want to show all the Cam_names that Mike is using.
My query:
function(doc){
if(doc.Customer == "Mike"){
emit(doc.Cam_name, doc.Customer)
This query does not give me the right result.
If your query looks exactly like that, then you have a syntax error. Also, doc.Customer is an array, so you can't do a simple equality check.
In fact, checking for a value in the array is unnecessary. Your map function can simply look like this:
function (doc) {
doc.Customer.forEach(function (customer) {
emit(customer, doc.Cam_name);
});
}
Then, query your view with /{db}/_design/{ddoc}/_view/{view}?key="Mike"
Your output will look like:
{
"total_rows": 3,
"offset": 0,
"rows": [
{
"id": "####",
"key": "Mike",
"value": "Camera SX"
},
{
"id": "####",
"key": "Mike",
"value": "PXSV CAM"
},
{
"id": "####",
"key": "Mike",
"value": "LxSV Double"
}
]
}
Now, you can use this same view to find any customer, not just whomever you specify in your map function.
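To see why one view serves every customer, here is a small Python simulation of the idea. It is not CouchDB code: map_doc and query_view are hypothetical stand-ins for the map function and the ?key= lookup, and the row order simply follows the order of the sample documents.

```python
docs = [
    {"Cam_name": "Camera SX", "manufacturer": "Sony", "Customer": ["Mike", "Ann", "James"]},
    {"Cam_name": "PXSV CAM", "manufacturer": "LG", "Customer": ["Mike", "Oliver", "Mr. Rain"]},
    {"Cam_name": "LxSV Double", "manufacturer": "Phillips", "Customer": ["Mike"]},
]


def map_doc(doc):
    """Emit one (customer, Cam_name) pair per customer, like the map function above."""
    for customer in doc.get("Customer", []):
        yield customer, doc["Cam_name"]


def query_view(documents, key):
    """Return the values of all emitted rows whose key equals the requested key."""
    return [value for doc in documents for k, value in map_doc(doc) if k == key]


print(query_view(docs, "Mike"))
print(query_view(docs, "Ann"))
```

Because the customer name is the key, the same index answers a lookup for "Mike", "Ann", or anyone else without changing the map function.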

Resources