I am creating a custom catch-all field with a porter stemmer in Apache Solr. I want to index and query the data using this field.
First I created a field type using the following JSON
{
"add-field-type":{
"name":"text_general_extended",
"class":"solr.TextField",
"positionIncrementGap":"100",
"multiValued":true,
"indexAnalyzer":{
"tokenizer":{
"class":"solr.StandardTokenizerFactory"
},
"filters":[
{
"class":"solr.StopFilterFactory",
"words":"stopwords.txt",
"ignoreCase":"true"
},
{
"class":"solr.PorterStemFilterFactory"
},
{
"class":"solr.LowerCaseFilterFactory"
}
]
},
"queryAnalyzer":{
"tokenizer":{
"class":"solr.StandardTokenizerFactory"
},
"filters":[
{
"class":"solr.StopFilterFactory",
"words":"stopwords.txt",
"ignoreCase":"true"
},
{
"class":"solr.PorterStemFilterFactory"
},
{
"class":"solr.SynonymGraphFilterFactory",
"expand":"true",
"ignoreCase":"true",
"synonyms":"synonyms.txt"
},
{
"class":"solr.LowerCaseFilterFactory"
}
]
}
}
}
Then I created a field with the above field type
{
"add-field":{
"name":"_text_extended",
"type":"text_general_extended",
"multiValued":true,
"indexed":true,
"stored":false
}
}
Finally, I created a copy field with my text field, passage_text, as the source and '_text_extended' as the destination:
{"add-copy-field" : {"source":"passage_text","dest":"_text_extended"}}
I created all of this on top of the _default configset in Solr. I am not sure what I am missing here.
Appreciate the help!
> db.Dashboards.find({user:{$regex:"adams"}}).pretty();
{
"_id" : ObjectId("123"),
"user" : "adam.adams",
"widgets" : [
{
"_id" : ObjectId("124"),
"column" : 1,
"configMap" : {
"coleg" : "bas.baser,cer.ceras,tom.tomsa"
},
}
]
}
I have a Mongo database that keeps records like the one above. I need to find all users who have "cer.ceras" in the "coleg" field and then replace it with "per.peras".
I tried with
db.Dashboards.find({widgets:{"$elemMatch":{configMap:{"$elemMatch":{coleg:/.*ceras.*/}}}}}}).pretty();
But it does not return anything.
This may be a bit complex.
Filter:
The criteria use $regex to find the documents in which coleg contains the text.
Update:
The update has to be an aggregation pipeline (supported in updates since MongoDB 4.2; $replaceAll needs 4.4 or later).
1. $set - Set the widgets field.
1.1. $map - Iterate over the elements of the widgets array and return a new array.
1.1.1. $mergeObjects - Merge the current widget document with the result of 1.1.1.1.
1.1.1.1. A document with a configMap field, which uses $mergeObjects to merge the current widget's configMap document with the result of 1.1.1.1.1.
1.1.1.1.1. A document with a coleg field, which uses $replaceAll to replace every occurrence of "cer.ceras" with "per.peras".
Update Options
{ multi: true } updates every matching document instead of only the first one.
db.collection.update({
widgets: {
$elemMatch: {
"configMap.coleg": {
$regex: "cer\\.ceras"
}
}
}
},
[
{
$set: {
widgets: {
$map: {
input: "$widgets",
in: {
$mergeObjects: [
"$$this",
{
configMap: {
$mergeObjects: [
"$$this.configMap",
{
coleg: {
$replaceAll: {
input: "$$this.configMap.coleg",
find: "cer.ceras",
replacement: "per.peras"
}
}
}
]
}
}
]
}
}
}
}
}
],
{
multi: true
})
Demo # Mongo Playground
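To check what the pipeline is expected to produce without touching the database, the same transformation can be mimicked on a plain dictionary. This is only a sketch of the logic, not MongoDB code: `replace_coleg` is a hypothetical helper, the ObjectIds are replaced by plain strings, and it assumes every widget has a string configMap.coleg value.

```python
import copy


def replace_coleg(doc, find, replacement):
    """Mimic the $set / $map / $mergeObjects / $replaceAll steps on one document.

    Assumption: every widget carries a string configMap.coleg value.
    """
    updated = copy.deepcopy(doc)
    updated["widgets"] = [
        {
            **widget,  # outer $mergeObjects: keep all other widget fields
            "configMap": {
                **widget["configMap"],  # inner $mergeObjects: keep other configMap fields
                # $replaceAll: literal (non-regex) replacement of every occurrence
                "coleg": widget["configMap"]["coleg"].replace(find, replacement),
            },
        }
        for widget in doc["widgets"]  # $map over the widgets array
    ]
    return updated


# Sample document shaped like the one in the question (ids simplified to strings).
sample = {
    "_id": "123",
    "user": "adam.adams",
    "widgets": [
        {
            "_id": "124",
            "column": 1,
            "configMap": {"coleg": "bas.baser,cer.ceras,tom.tomsa"},
        }
    ],
}

result = replace_coleg(sample, "cer.ceras", "per.peras")
print(result["widgets"][0]["configMap"]["coleg"])
```

Note that the find value is matched literally here, just as in $replaceAll, so the dot does not need escaping; only the $regex in the filter needs the escaped "\\.".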
I am using Solr 8.6.1, started in solrcloud mode.
The field type is
{
"add-field-type" : {
"name":"articleTitle",
"positionIncrementGap":100,
"multiValued":false,
"class":"solr.TextField",
"indexAnalyzer":{
"tokenizer":{ "class":"solr.StandardTokenizerFactory" },
"filters":[
{ "class":"solr.LowerCaseFilterFactory" },
{ "class":"solr.ManagedStopFilterFactory", "managed":"english" },
{ "class":"solr.ManagedSynonymGraphFilterFactory", "managed":"english" },
{ "class":"solr.FlattenGraphFilterFactory" },
{ "class":"solr.PorterStemFilterFactory" }
]
},
"queryAnalyzer":{
"tokenizer":{ "class":"solr.StandardTokenizerFactory" },
"filters":[
{ "class":"solr.LowerCaseFilterFactory" },
{ "class":"solr.ManagedStopFilterFactory", "managed":"english" },
{ "class":"solr.ManagedSynonymGraphFilterFactory", "managed":"english" },
{ "class":"solr.PorterStemFilterFactory" }
]
}
}
}
After I add a document
{
"id": 100,
"articleTitle": "Best smartphone"
}
I update the synonyms list by API
curl -X PUT -H 'Content-type:application/json' --data-binary '["iphone", "smartphone"]' "http://localhost:8983/solr/articles/schema/analysis/synonyms/english"
and reload the collection by API
http://localhost:8983/solr/admin/collections?action=RELOAD&name=articles
However, when I search, the document does not show up:
http://localhost:8983/solr/articles/select?q=articleTitle:iphone
No results are returned. I expected the added document to be returned.
It works only if I first update the synonyms list and then add the document to the collection.
How can I configure Solr to find documents by synonyms when the synonyms are changed after the documents have been indexed?
This is an example of what my data looks like for an Elastic Search index called video_service_inventory:
{
'video_service': 'netflix',
'movies' : [
{'title': 'Mission Impossible', 'genre': 'action'},
{'title': 'The Hangover', 'genre': 'comedy'},
{'title': 'Zoolander', 'genre': 'comedy'},
{'title': 'The Ring', 'genre': 'horror'}
]
}
I have established in my index that the "movies" field is of type "nested"
I want to write a query that says "get me all video_services that contain both of these movies":
{'title': 'Mission Impossible', 'genre': 'action'}
AND
{'title': 'The Ring', 'genre': 'horror'}
where both the title and the genre must match. If one movie exists but not the other, I don't want the query to return that video service.
Ideally, I would like to do this in 1 query. So far, I haven't been able to find a solution.
Anyone have suggestions for writing this search query?
The syntax may vary depending on the Elasticsearch version, but in general you should combine multiple nested queries within a bool/must query. For nested queries you need to specify the path to "navigate" to the nested documents, and you need to qualify the properties with the path plus the field name:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "movies",
"query": {
"bool": {
"must": [
{ "term": { "movies.title": "Mission Impossible" } },
{ "term": { "movies.genre": "action" } }
]
}
}
}
},
{
"nested": {
"path": "movies",
"query": {
"bool": {
"must": [
{ "term": { "movies.title": "The Ring" } },
{ "term": { "movies.genre": "horror" } }
]
}
}
}
}
]
}
}
}
This example assumes that the title and genre fields are not analyzed. In newer versions of Elasticsearch they may be available as a .keyword sub-field, in which case you would use "movies.genre.keyword" to query the not-analyzed version of the data.
For details on bool queries you can have a look at the documentation on the ES website:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
For nested queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
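If the list of required movies is not fixed, the request body can be generated instead of written by hand. The following is a minimal sketch that only builds the JSON body; `all_movies_query` is a hypothetical helper name, and it assumes the fields are not analyzed so that single-value term queries match exactly.

```python
import json


def all_movies_query(movies, path="movies"):
    """Build a bool/must query with one nested clause per required movie."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": path,
                            "query": {
                                "bool": {
                                    # every field of one movie must match
                                    # inside the same nested document
                                    "must": [
                                        {"term": {f"{path}.{field}": value}}
                                        for field, value in movie.items()
                                    ]
                                }
                            },
                        }
                    }
                    for movie in movies
                ]
            }
        }
    }


body = all_movies_query(
    [
        {"title": "Mission Impossible", "genre": "action"},
        {"title": "The Ring", "genre": "horror"},
    ]
)
print(json.dumps(body, indent=2))
```

Each movie gets its own nested clause because a single nested clause is evaluated against one nested document at a time, so two different movies can never satisfy the same clause.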
In Elasticsearch we have a type with a field that holds an array of objects. When querying it from Kibana, I am getting inconsistent results.
Here is an extract from my mapping,
{
"myIndex-2017.08.22": {
"mappings": {
"typeA": {
"properties": {
.
.
.
"Collection": {
"properties": {
.
.
.
"FileType": {
"type": "text"
}
}
}
}
}
}
}
}
Collection can hold multiple objects, i.e. it is indexed as an array. When I query on one FileType, for example FileType: DOCX, I also get some records with FileType HTML.
Looking deeper, I found that this happens because some records have two Collection elements, one with FileType: DOCX and one with FileType: HTML.
Why does filtering work like this? Is there a way to filter so that I get only FileType: DOCX and not FileType: HTML?
I am running ES 5.3.
Elasticsearch flattens arrays of objects out of the box, so
{
"files" : [
{
"name" : "name1",
"fileType" : "doc"
},
{
"name" : "name2",
"fileType" : "html"
}
]}
becomes:
{
"files.name" : [ "name1", "name2" ],
"files.fileType" : [ "doc", "html" ]
}
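The flattening above can be mimicked in plain Python to see why a condition on one object can be satisfied by values from another. This is only an illustration of the idea, not how Elasticsearch is implemented; `flatten` and `matches_all` are hypothetical helper names.

```python
from collections import defaultdict


def flatten(doc):
    """Turn arrays of objects into one multi-valued field per inner key."""
    flat = defaultdict(list)
    for field, value in doc.items():
        if isinstance(value, list) and all(isinstance(v, dict) for v in value):
            for obj in value:
                for key, inner in obj.items():
                    flat[f"{field}.{key}"].append(inner)
        else:
            flat[field].append(value)
    return dict(flat)


def matches_all(flat_doc, conditions):
    """True if every condition finds its value somewhere in the flattened field."""
    return all(
        value in flat_doc.get(field, []) for field, value in conditions.items()
    )


doc = {
    "files": [
        {"name": "name1", "fileType": "doc"},
        {"name": "name2", "fileType": "html"},
    ]
}
flat = flatten(doc)

# No single file is both "name1" and "html", yet the flattened document matches,
# because the link between name and fileType within one object is lost.
cross_match = matches_all(flat, {"files.name": "name1", "files.fileType": "html"})
print(flat, cross_match)
```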
If you want to search the objects in this array individually, you have to use the nested datatype in the mapping of the collection:
{
"myIndex-2017.08.22": {
"mappings": {
"typeA": {
"properties": {
.
.
.
"Collection": {
"type": "nested",
"properties": {
.
.
.
"FileType": {
"type": "text"
}
}
}
}
}
}
}
}
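Once the field is mapped as nested, it has to be searched with a nested query. The sketch below only builds a possible request body; `nested_file_type_query` is a hypothetical helper, and the use of a match query is an assumption that follows from FileType being a text field. Keep in mind that switching an existing field to nested generally means creating a new index with the new mapping and reindexing the data. A matching parent document still comes back with its whole _source, so inner_hits is requested to show which Collection elements actually matched.

```python
import json


def nested_file_type_query(file_type, path="Collection"):
    """Build a nested query on <path>.FileType and ask for the matching elements."""
    return {
        "query": {
            "nested": {
                "path": path,
                # match is used because FileType is an analyzed text field
                "query": {"match": {f"{path}.FileType": file_type}},
                # return the individual nested objects that matched
                "inner_hits": {},
            }
        }
    }


body = nested_file_type_query("DOCX")
print(json.dumps(body, indent=2))
```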
Trying to perform an ES query, I ran into a problem while trying to do a nested filtering of objects in an array. Our structure of data has changed from being:
"_index": "events_2015-07-08",
"_type": "my_type",
"_source":{
...
...
"custom_data":{
"className":"....."
}
}
to:
"_index": "events_2015-07-08",
"_type": "my_type",
"_source":{
...
...
"custom_data":[ //THIS CHANGED FROM AN OBJECT TO AN ARRAY OF OBJECTS
{
"key":".....",
"val":"....."
},
{
"key":".....",
"val":"....."
}
]
}
This nested filter works fine on indices that have the new data structure:
{
"nested": {
"path": "custom_data",
"filter": {
"bool": {
"must": [
{
"term":
{
"custom_data.key": "className"
}
},
{
"term": {
"custom_data.val": "SOME_VALUE"
}
}
]
}
},
"_cache": true
}
}
However, it fails when going over indices that have the older data structure, so that feature cannot be added. Ideally I'd be able to find both data structures, but at this point I'd settle for a "graceful failure", i.e. just don't return results where the structure is old.
I have tried adding an "exists" filter on the field "custom_data.key", and an "exists" within "not" on the field "custom_data.className", but I keep getting "SearchParseException[[events_2015-07-01][0]: from[-1],size[-1]: Parse Failure [Failed to parse source"
There is an indices filter (and query) that you can use to perform conditional filters (and queries) based on the index that it is running against.
{
"query" : {
"filtered" : {
"filter" : {
"indices" : {
"indices" : ["old-index-1", "old-index-2"],
"filter" : {
"term" : {
"custom_data.className" : "SOME_VALUE"
}
},
"no_match_filter" : {
"nested" : { ... }
}
}
}
}
}
}
Using this, you should be able to transition off of the old mapping and onto the new mapping.
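As a rough sketch, the complete body could be assembled as follows, with the nested filter from the question used for the indices that do not match the list. The helper name `indices_filter_query` and the index names are placeholders, and the field name custom_data.className is an assumption based on the old document structure shown in the question. The filtered query and the indices filter belong to the older query DSL used in the question; as far as I know, later major versions of Elasticsearch dropped both, so treat this as specific to those older versions.

```python
import json


def indices_filter_query(old_indices, value, old_field="custom_data.className"):
    """Use a plain term filter on the old indices and a nested filter elsewhere."""
    # Nested filter for the new structure, as given in the question.
    nested_filter = {
        "nested": {
            "path": "custom_data",
            "filter": {
                "bool": {
                    "must": [
                        {"term": {"custom_data.key": "className"}},
                        {"term": {"custom_data.val": value}},
                    ]
                }
            },
        }
    }
    return {
        "query": {
            "filtered": {
                "filter": {
                    "indices": {
                        "indices": list(old_indices),
                        # applied to the indices listed above (old structure)
                        "filter": {"term": {old_field: value}},
                        # applied to every other index (new structure)
                        "no_match_filter": nested_filter,
                    }
                }
            }
        }
    }


# Placeholder index names; replace them with the real old indices.
body = indices_filter_query(["old-index-1", "old-index-2"], "SOME_VALUE")
print(json.dumps(body, indent=2))
```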