I'm using Firestore to store data for multiple entities. In this example, each document is a company with details on the products it sells, and each product is associated with multiple keywords. Example structure:
Document 1:
company_name: 'Company 1',
products: [
{
name: 'Green tea',
keywords: ['green tea', 'healthy', 'matcha']
},
{
name: 'Sushi',
keywords: ['sushi', 'rice', 'healthy']
}
]
Document 2:
company_name: 'Company 2',
products: [
{
name: 'Apple',
keywords: ['fruit', 'healthy']
},
{
name: 'Cake',
keywords: ['dessert', 'sweet']
}
]
I would like to search for companies that sell products with certain keywords. For example, by searching for the keyword healthy, both documents Company 1 and Company 2 would be returned, as they both sell foods with that keyword. How would I do this with Firestore filtering/searching?
With your data structured this way, where the values to search sit inside objects nested in an array field, no single query can find everything you want. The problem is the array: Firestore queries cannot match on fields nested inside the elements of an array field.
When you have array elements that need to be matched individually with queries, they should instead be individual documents in a collection. Yes, that's more reads and writes, but that also means your queries become possible.
Imagine instead if your two documents didn't contain a products array field, and instead each document contained a subcollection called products where each item had a field called keywords.
companies (collection)
document 1 (doc)
company_name: string field
products (subcollection)
keywords: string array field
With this, you could then do a collection group query across all products across all companies like this (in JavaScript):
db.collectionGroup("products").where("keywords", "array-contains", keyword)
where keyword is the word you're looking for.
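To see why the restructuring makes the query possible, here is a minimal in-memory sketch in plain JavaScript. It makes no Firestore calls; flattenProducts and companiesSelling are hypothetical helper names, and the id and companyId fields are assumptions added so each product can be traced back to its company.

```javascript
// In-memory stand-in for the original documents (products stored as an array field).
const companies = [
  {
    id: "doc1", // hypothetical document ID
    company_name: "Company 1",
    products: [
      { name: "Green tea", keywords: ["green tea", "healthy", "matcha"] },
      { name: "Sushi", keywords: ["sushi", "rice", "healthy"] },
    ],
  },
  {
    id: "doc2", // hypothetical document ID
    company_name: "Company 2",
    products: [
      { name: "Apple", keywords: ["fruit", "healthy"] },
      { name: "Cake", keywords: ["dessert", "sweet"] },
    ],
  },
];

// Turn each array element into its own "document", as the subcollection layout would.
function flattenProducts(companyDocs) {
  return companyDocs.flatMap((company) =>
    company.products.map((product) => ({ ...product, companyId: company.id }))
  );
}

// Mimic where("keywords", "array-contains", keyword), then map matches back to companies.
function companiesSelling(companyDocs, keyword) {
  const matches = flattenProducts(companyDocs).filter((p) => p.keywords.includes(keyword));
  const ids = new Set(matches.map((p) => p.companyId));
  return companyDocs.filter((c) => ids.has(c.id)).map((c) => c.company_name);
}

console.log(companiesSelling(companies, "healthy")); // [ 'Company 1', 'Company 2' ]
console.log(companiesSelling(companies, "sweet")); // [ 'Company 2' ]
```

With the real collection group query, each matching product snapshot's ref.parent.parent should point at the owning company document. A company with several matching products shows up once per product, so deduplicate as the sketch does.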
MongoDB
I know that if you have an array of subdocuments and put a unique index on some field of those subdocuments, that field is only guaranteed to be unique across the documents in the collection, not within a single document's array.
Does the same apply to the _id property of those subdocuments?
For example, if I have the following
{
_id: 'Parent ID',
subdocArray: [
{
_id: 'Child ID 1'
}
]
}
If I decide to add another child document to the array, is it assured that the _id field will be unique in the same way that it would be in a regular top-level document?
No, _id has no special meaning to MongoDB in an array of subdocuments.
However, as asked here, you can enforce this desired restriction manually while adding elements to the array.
db.coll.update(
{_id: 'Parent ID', 'subdocArray._id': {$ne: 'Child ID 1'}},
{$push: {subdocArray: {_id: 'Child ID 1'}}})
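The logic of that guarded update can be modelled in plain JavaScript. This is only an in-memory sketch of the behaviour, not MongoDB driver code, and pushIfAbsent is a hypothetical helper name.

```javascript
// In-memory model of the guarded $push: the filter only matches the parent
// when no element of subdocArray already has the given _id.
function pushIfAbsent(parent, child) {
  const alreadyPresent = parent.subdocArray.some((sub) => sub._id === child._id);
  if (alreadyPresent) {
    return false; // the filter matched nothing, so nothing is modified
  }
  parent.subdocArray.push(child);
  return true;
}

const parentDoc = { _id: "Parent ID", subdocArray: [{ _id: "Child ID 1" }] };
console.log(pushIfAbsent(parentDoc, { _id: "Child ID 1" })); // false
console.log(pushIfAbsent(parentDoc, { _id: "Child ID 2" })); // true
console.log(parentDoc.subdocArray.length); // 2
```

In MongoDB the check and the push happen in one update operation, and the update result reports how many documents were modified, which tells you whether the child was actually added.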
I'm using facet.field and have run into a problem. My store has base products and variants, and when I use facet.field the counts include both base products and variants:
Category:
Chairs(30) <- this is count of base products and variants
Tables(20) <- this is count of base products and variants
I want to restrict facet.field so that the facet returns counts of variants only. Every product has a field like "productType":"baseProduct" or "productType":"variantProduct", and I want to use that field.
Any ideas on how I can do this in a query? Please help.
You can use facet.pivot to get distinct counts for each type:
&facet.pivot=productType,category
You can also use the JSON Facet API to do two separate facets:
{
base: {
type: terms,
field: category,
domain: { filter: "productType:baseProduct" }
},
variant: {
type: terms,
field: category,
domain: { filter : "productType:variantProduct" }
}
}
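If you send the JSON Facet request from code, it may be easiest to build the body as an object and serialize it, so the keys and values end up properly quoted. The sketch below assumes Node 18+ (for the global fetch) and a match-all main query; the URL, the collection name "products", and both function names are hypothetical.

```javascript
// Build a strictly valid JSON body for Solr's JSON Request API.
// "category" and "productType" are the field names from the question.
function buildTypeFacetBody() {
  const facetFor = (productType) => ({
    type: "terms",
    field: "category",
    domain: { filter: `productType:${productType}` },
  });
  return {
    query: "*:*", // assumption: facet over the whole index
    limit: 0, // only the facet counts are needed, not the documents
    facet: {
      base: facetFor("baseProduct"),
      variant: facetFor("variantProduct"),
    },
  };
}

// Hypothetical endpoint; replace host and collection name with your own.
async function fetchTypeFacets(solrUrl = "http://localhost:8983/solr/products/query") {
  const response = await fetch(solrUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildTypeFacetBody()),
  });
  return (await response.json()).facets;
}

console.log(JSON.stringify(buildTypeFacetBody(), null, 2));
```

If only the variant counts matter, the base facet can simply be dropped from the body.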
I'm running an instance of Solr 6.2. One of the use cases I'm exploring is to return records grouped by a field, including summed columns (facets) and sorted by those columns. I realize Solr is not meant to be utilized as a relational database, but is this possible?
Using the JSON API, I send the following data payload to the query endpoint of my Solr instance:
{
query: "*:*",
filter: ["status:1", "date:[2016-10-11T00:00:00Z-7DAYS/DAY TO 2016-10-11T00:00:00Z]"],
limit: 10,
params: {
group: true,
group.field: name,
group.facet: true
},
facet: {
funcs: {
type: terms,
field: name,
sort: { sum_v1: desc },
limit: 10,
facet: {
sum_v1: "sum(v1)",
sum_v2: "sum(v2)",
sum_v3: "sum(v3)"
}
}
}
}
This returns 10 records at a time in both the groups key and facets key of the response JSON. However, the sorted facet buckets do not match up with the grouped records. How can I get the facet counts with the relevant groups?
The only workaround I can come up with is to query for the grouped records first, then run a second query using the IDs from the first to get the facet counts. The downside is that I'd lose the ability to sort or filter by any of the facet counts.
Say I have documents (let's say books) that I want to search, with a facet (let's say genre) where a document can have many values for that facet. For example, a book could be "young adult", "fiction", and "sci-fi" all at once.
Can Azure Search faceting handle this situation, and if so, can it do it from simple strings with a delimiter?
Define the genre field in your index as a string collection (Collection(Edm.String)) and make it facetable. When indexing documents, pass the values for that field as a JSON array:
{
... other properties
"genre" : [ "young adult", "fiction", "sci-fi" ]
}
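If your source data stores the genres as one delimited string, one option is to split it yourself before uploading, so the field receives a real array. A minimal sketch, assuming a comma delimiter; toGenreCollection and the title property are hypothetical.

```javascript
// Convert a delimited genre string into the array a Collection(Edm.String) field expects.
function toGenreCollection(delimited, delimiter = ",") {
  return delimited
    .split(delimiter)
    .map((value) => value.trim()) // drop whitespace around each genre
    .filter((value) => value.length > 0); // ignore empty entries such as a trailing delimiter
}

const book = { title: "Example book", genre: toGenreCollection("young adult, fiction, sci-fi") };
console.log(JSON.stringify(book));
// {"title":"Example book","genre":["young adult","fiction","sci-fi"]}
```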
Imagine a Solr index with documents similar to this:
[
{
ProductId: 123,
Contract: abc
},
{
ProductId: 123,
Contract: def
},
{
ProductId: 123
},
{
ProductId: 567
},
{
ProductId: 567,
Contract: bar
}
]
For each ProductId there is always one document without a Contract.
Additionally, there may be 0 to n documents with a Contract.
I need a query that takes a Contract and returns one document per ProductId: the one with the given Contract if it exists, otherwise the single document without a Contract.
For example, a query with Contract: def should (somehow) give me this:
[
{
ProductId: 123,
Contract: def
},
{
ProductId: 567
}
]
The document with Contract:abc is not part of the result
The document with ProductId:123 but without Contract is not part of the result
The document ProductId:567 is part of the result, because there is no document with this ProductId and Contract: def
In other words, what I need is something like:
Give me one document per ProductId, with Contract:X XOR -Contract*, but not both.
Step 1 Write your query so that it returns records without a Contract as well as all records with the matching Contract, with the matching ones scoring highest. This gets around the problem that you sometimes want items in your results that don't match the contract value: q=Contract:"def" OR (*:* -Contract:[* TO *]). The (*:* -Contract:[* TO *]) clause matches all records without a contract, and Contract:"def" matches records with the correct contract. Records matching Contract:"def" should naturally score higher than those with no contract, but if there's any trouble or you just want to be sure, you can boost that clause: Contract:"def"^2.
Step 2 Add Result Grouping to the query, configured so that you are requesting only the highest scoring record for any given ProductId:
q=Contract:"def" OR (*:* -Contract:[* TO *])&group=true&group.field=ProductId
This requires that the ProductId field be configured in your schema.xml as multiValued="false", as multiValued fields cannot be used as groups. I'm also assuming that you are using the Standard Query Parser, either set as a default in your solrconfig.xml or by adding the argument defType=lucene when you make the query.
The results should look something like this:
'grouped'=>{
'ProductId'=>{
'matches'=>5,
'groups'=>[{
'groupValue'=>123,
'doclist'=>{'numFound'=>3,'start'=>0,'docs'=>[
{
'ProductId'=>123,
'Contract'=>'def'}]
}},
{
'groupValue'=>567,
'doclist'=>{'numFound'=>2,'start'=>0,'docs'=>[
{
'ProductId'=>567}]
}}]}}}
Note that neither the matches nor the numFound values in the result set will tell you how many groups have been returned, but the argument rows=XX can be used to define the maximum number of desired groups (in this case ProductIds).
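The two steps can be modelled in plain JavaScript to check the selection logic against the example documents. This is an in-memory sketch, not Solr code: pickPerProduct is a hypothetical name, and the scores 2 and 1 are stand-ins for "matching contract ranks above no contract", not real Solr relevance scores.

```javascript
const docs = [
  { ProductId: 123, Contract: "abc" },
  { ProductId: 123, Contract: "def" },
  { ProductId: 123 },
  { ProductId: 567 },
  { ProductId: 567, Contract: "bar" },
];

// Step 1: keep docs with the wanted contract or with no contract; rank matches higher.
// Step 2: group by ProductId and keep the top-ranked doc of each group.
function pickPerProduct(allDocs, contract) {
  const best = new Map();
  for (const doc of allDocs) {
    const score = doc.Contract === contract ? 2 : doc.Contract === undefined ? 1 : 0;
    if (score === 0) continue; // a different contract is not matched by the query
    const current = best.get(doc.ProductId);
    if (!current || score > current.score) {
      best.set(doc.ProductId, { score, doc });
    }
  }
  return [...best.values()].map((entry) => entry.doc);
}

console.log(pickPerProduct(docs, "def"));
// [ { ProductId: 123, Contract: 'def' }, { ProductId: 567 } ]
```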