Solr JSON facet query, unique counts for particular values

I have the following query. For the two company IDs, I would also like to get the unique counts (unique_internal_plays and unique_external_plays). Is that possible?
{
  "facet": {
    "unique_viewers": "unique(uuid)",
    "internal_plays": {
      "type": "query",
      "q": "company:100"
    },
    "external_plays": {
      "type": "query",
      "q": "-company:100"
    },
    "unique_internal_plays": {
      "type": "query",
      "q": "company:100"
    },
    "unique_external_plays": {
      "type": "query",
      "q": "-company:100"
    }
  }
}

For any facet in the JSON Facet API you can further divide the given facet into nested facets. If you combine this with a stats facet (an aggregate facet), you can get the unique count for a field in that specific bucket:
"internal_plays": {
"type": "query",
"q": "company:100",
"facet": {
"unique_viewers": "unique(uuid)"
}
}
This will create a nested facet under the facet query, effectively giving you a way to further pivot/run statistics across the set for the matching documents.
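Applied to the query from the question, the whole facet body could then look like this (a sketch that simply reuses the uuid field and the company:100 filter from above):
{
  "facet": {
    "unique_viewers": "unique(uuid)",
    "internal_plays": {
      "type": "query",
      "q": "company:100",
      "facet": {
        "unique_internal_plays": "unique(uuid)"
      }
    },
    "external_plays": {
      "type": "query",
      "q": "-company:100",
      "facet": {
        "unique_external_plays": "unique(uuid)"
      }
    }
  }
}
Each query facet then reports both its document count (the number of plays) and the nested unique(uuid) value (the unique viewers within that bucket).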

Related

How to let an Azure facet query just return facets

I have this basic facet query &facet=IncAcc and I just want to return the counts of the different facet values.
Azure Search returns not only the facet values and counts, but also an array of search documents. As I just need the facet values and counts, is there a way to tell Azure Search not to return anything else?
"#search.facets": {
"IncAcc": [
{
"count": 8124,
"value": "I"
},
{
"count": 6464,
"value": "A"
},
{
"count": 5,
"value": ""
}
]
},"value": [
{
}
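One option that is often suggested for this (not part of the original question, so treat it as a hedged sketch) is to ask for zero documents alongside the facets, which with the standard Azure Search REST query parameters means setting top to 0:
POST /indexes/my-index/docs/search?api-version=2020-06-30
{
  "facets": [ "IncAcc" ],
  "top": 0
}
The response should then still contain the @search.facets section but an empty value array. The index name my-index and the api-version here are placeholders.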

Search for exact field in an array of strings in elasticsearch

Elasticsearch version: 7.1.1
Hi, I have tried a lot but could not find any solution.
In my index, I have a field which contains an array of strings.
So, for example, I have two documents containing different values in the locations array.
Document 1:
"doc" : {
"locations" : [
"Cloppenburg",
"Berlin"
]
}
Document 2:
"doc" : {
"locations" : [
"Landkreis Cloppenburg",
"Berlin"
]
}
A user requests a search for the term Cloppenburg,
and I want to return only those documents which contain the term Cloppenburg
and not Landkreis Cloppenburg.
The results should contain only Document 1,
but my query is returning both documents.
I am using the following query and getting both documents back.
Can someone please help me out with this?
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "doc.locations": {
              "query": "cloppenburg",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
The issue is that you are using a text field with a match query.
Match queries are analyzed: the search terms go through the same analyzer that was used at index time, which for text fields is the standard analyzer. That analyzer breaks text on whitespace, so in your case Landkreis Cloppenburg produces the two tokens landkreis and cloppenburg at both index and search time, and therefore even a search for cloppenburg matches that document.
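You can verify this tokenization with the _analyze API (a quick sketch, not part of the original answer):
GET /_analyze
{
  "analyzer": "standard",
  "text": "Landkreis Cloppenburg"
}
This returns the two tokens landkreis and cloppenburg, which is why the match query on the analyzed field finds Document 2 as well.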
Solution: Use the keyword field.
Index def
{
  "mappings": {
    "properties": {
      "location": {
        "type": "keyword"
      }
    }
  }
}
Index both of your docs and then use the same search query:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "location": {
              "query": "Cloppenburg"
            }
          }
        }
      ]
    }
  }
}
Result
"hits": [
{
"_index": "location",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"location": "Cloppenburg"
}
}
]

Is there a way to dereference parameters across different facets in Solr?

I have a Solr JSON facet query which calculates a metric for the present year. I want to enhance this query to calculate this exact metric for the previous year as well and then calculate the ratio of increase/decrease.
Here is the JSON facet query that I have written so far -
json.facet={
  "thisYear": {
    "type": "terms",
    "field": "<some-value>",
    "domain": {
      "filter": "<query to identify this year's document>"
    },
    "facet": {
      "thisYearFacet": "<the metric calculated for the present year>"
    }
  },
  "lastYear": {
    "type": "terms",
    "field": "<some-value>",
    "domain": {
      "filter": "<query to identify last year's document>"
    },
    "facet": {
      "lastYearFacet": "<the metric calculated for the last year>"
    }
  },
  // Here is where I am facing trouble!
  "compare": {
    "type": "func",
    "func": "avg(div($thisYearFacet,$lastYearFacet))"
  }
}
But running the above query throws the following error -
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.search.SyntaxError"],
"msg":"org.apache.solr.search.SyntaxError: Missing param thisYearFacet while parsing function 'avg(div($thisYearFacet,$lastYearFacet))'",
"code":400}}
Is there a way to make the calculated variables "thisYearFacet" and "lastYearFacet" accessible in the "compare" facet?

"There is no index available for this selector" despite the fact I made one

In my data, I have two fields that I want to use as an index together. They are sensorid (any string) and timestamp (yyyy-mm-dd hh:mm:ss).
So I made an index for these two using the Cloudant index generator. This was created successfully and it appears as a design document.
{
  "index": {
    "fields": [
      {
        "name": "sensorid",
        "type": "string"
      },
      {
        "name": "timestamp",
        "type": "string"
      }
    ]
  },
  "type": "text"
}
However, when I try to make the following query to find all documents with a timestamp newer than some value, I am told there is no index available for the selector:
{
  "selector": {
    "timestamp": {
      "$gt": "2015-10-13 16:00:00"
    }
  },
  "fields": [
    "_id",
    "_rev"
  ],
  "sort": [
    {
      "_id": "asc"
    }
  ]
}
What have I done wrong?
It seems to me that Cloudant Query only allows sorting on fields that are part of the selector.
Therefore your selector should include the _id field and look like:
"selector":{
"_id":{
"$gt":0
},
"timestamp":{
"$gt":"2015-10-13 16:00:00"
}
}
I hope this works for you!
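Putting that selector back into the original request, the full query would then look something like this (a sketch based purely on the answer above):
{
  "selector": {
    "_id": {
      "$gt": 0
    },
    "timestamp": {
      "$gt": "2015-10-13 16:00:00"
    }
  },
  "fields": [
    "_id",
    "_rev"
  ],
  "sort": [
    {
      "_id": "asc"
    }
  ]
}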

Elasticsearch not returning hits for multi-valued field

I am using Elasticsearch with no modifications whatsoever. This means the mappings, norms, and analyzed/not_analyzed is all default config. I have a very small data set of two items for experimentation purposes. The items have several fields but I query only on one, which is a multi-valued/array of strings field. The doc looks like this:
{
  "_index": "index_profile",
  "_type": "items",
  "_id": "ega",
  "_version": 1,
  "found": true,
  "_source": {
    "clicked": [
      "ega"
    ],
    "profile_topics": [
      "Twitter",
      "Entertainment",
      "ESPN",
      "Comedy",
      "University of Rhode Island",
      "Humor",
      "Basketball",
      "Sports",
      "Movies",
      "SnapChat",
      "Celebrities",
      "Rite Aid",
      "Education",
      "Television",
      "Country Music",
      "Seattle",
      "Beer",
      "Hip Hop",
      "Actors",
      "David Cameron",
      ... // other topics
    ],
    "id": "ega"
  }
}
A sample query is:
GET /index_profile/items/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "profile_topics": [
              "Basketball"
            ]
          }
        }
      ]
    }
  }
}
Again, there are only two items, and the one listed should match the query because the profile_topics field contains the "Basketball" term. The other item does not match. I only get a result if I ask for clicked = ega in the should.
With Solr I would probably specify that the fields are multi-valued string arrays with no norms and no analyzer, so that profile_topics is not stemmed or tokenized, since each value should be treated as a single token (spaces included). I'm not sure this would solve the problem, but it is how I treat similar data in Solr.
I assume I have run afoul of some norm/analyzer/TF-IDF issue; if so, how do I solve this so that even with two items the query returns ega? If possible I'd like to solve this index- or type-wide rather than per field.
Basketball (with a capital B) in a terms query will not be analyzed. It is looked up in the Elasticsearch index exactly as written.
You say you have the defaults. If so, indexing Basketball under the profile_topics field means that the actual term in the index will be basketball (with a lowercase b), which is what the standard analyzer produces. So either you set profile_topics as not_analyzed, or you search for basketball and not Basketball.
Read the Elasticsearch documentation about the terms query.
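For example, with the default (analyzed) mapping a lowercased terms query should already match the document (a sketch, not part of the original answer):
GET /index_profile/items/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "profile_topics": [
              "basketball"
            ]
          }
        }
      ]
    }
  }
}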
Regarding setting all the fields to not_analyzed, you could do that with a dynamic template. Better still, with a template you can do what Logstash does: define a .raw subfield for each string field and make only this subfield not_analyzed. The original/parent field still holds the analyzed version of the same text, in case you want to use the analyzed field in the future.
Take a look at the dynamic template below; it's the one Logstash uses.
More specifically:
{
  "template": "your_indices_name-*",
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "omit_norms": true
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}
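With that template applied, exact matching can then target the not_analyzed subfield instead of the parent field, for example (a sketch assuming the index was created from this template):
GET /index_profile/items/_search
{
  "query": {
    "terms": {
      "profile_topics.raw": [
        "Basketball"
      ]
    }
  }
}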
