Solr sorting text in Polish

I have Solr 5.2.1 and the following field type definition, which is used for sorting:
<fieldType name="polishSortVarchar" class="solr.ICUCollationField" locale="pl_PL" strength="secondary" />
After reindexing, sorting almost works the way I want:
{
"responseHeader": {
"status": 0,
"QTime": 2,
"params": {
"fl": "name_varchar",
"sort": "sort_name_varchar asc",
"indent": "true",
"q": "*:*",
"_": "1454575147254",
"wt": "json",
"rows": "10"
}
},
"response": {
"numFound": 5250,
"start": 0,
"docs": [
{
"name_varchar": "\"Europą\" na Antarktydę"
},
{
"name_varchar": "1:0 dla Korniszonka"
},
{
"name_varchar": "1001 faktów o roślinach"
}
]
}
}
As you can see, the first result is a phrase whose first character is a double quote ("). I want to filter out special characters and sort only by letters, so that this phrase is sorted under 'E'.
Does anybody know how to do this?

I couldn't find a solution directly in Solr, so I strip the unwanted characters at indexing time:
$sortValue = preg_replace('/[^A-Za-z0-9- zżźćńółęąśŻŹĆĄŚĘŁÓŃ]/u', '', $sortValue);
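For readers who index from Python rather than PHP, the same cleanup might look roughly like the sketch below. The function name clean_sort_value is hypothetical, and the allowed character set is assumed to match the PHP pattern above (ASCII letters, digits, hyphen, space and the Polish diacritic letters).

```python
import re

# Assumption: keep ASCII letters, digits, hyphen, space and the Polish
# diacritic letters; every other character is dropped, mirroring the
# PHP pattern used in the answer.
_NOT_SORTABLE = re.compile(r"[^A-Za-z0-9\- ąćęłńóśźżĄĆĘŁŃÓŚŹŻ]")


def clean_sort_value(value: str) -> str:
    """Return the value with characters outside the allowed set removed."""
    return _NOT_SORTABLE.sub("", value)


if __name__ == "__main__":
    for title in ['"Europą" na Antarktydę', "1:0 dla Korniszonka"]:
        print(clean_sort_value(title))
```

The cleaned string would be stored in the sort field (sort_name_varchar in the question), while the display field keeps the original title.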

Related

No results being displayed if the character length is more than 15 letters in Solr Search

While working with Solr search, we ran into an issue.
When the search query is a single word longer than 15 characters, we get no results.
If the same query is shortened to 15 characters or fewer, we do get results.
How can we increase the character limit so that we get search results in both cases?
Case 1: More than 15 characters
curl -XGET 'http://localhost:8983/solr/techproducts/query?debug=query&q=katanasmartwatch'
Result:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "katanasmartwatch",
"debug": "query"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
}
}
Case 2: 15 characters or fewer
curl -XGET 'http://localhost:8983/solr/techproducts/query?debug=query&q=katanasmartwatc'
Result:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "katanasmartwatc",
"debug": "query"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [{...}]
}
}

Grouping results in SOLR?

I have a Solr index with a schema that looks like this:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"q.op": "OR",
"_": "1673422604341"
}
},
"response": {
"numFound": 1206,
"start": 0,
"numFoundExact": true,
"docs": [
{
"material_name_s":"MaterialName1",
"company_name_s": "CompanyName1",
"price_per_lb_value_f": 1.11,
"received_date_dt": "2015-01-01T00:00:00Z"
},
{
"material_name_s":"MaterialName1",
"company_name_s": "CompanyName2",
"price_per_lb_value_f": 2.22,
"received_date_dt": "2020-01-01T00:00:00Z"
},
{
"material_name_s":"MaterialName1",
"company_name_s": "CompanyName3",
"price_per_lb_value_f": 3.33,
"received_date_dt": "2021-01-01T00:00:00Z"
},
{
"material_name_s":"MaterialName2",
"company_name_s": "CompanyName1",
"price_per_lb_value_f": 4.44,
"received_date_dt": "2016-01-01T00:00:00Z"
},
{
"material_name_s":"MaterialName2",
"company_name_s": "CompanyName2",
"price_per_lb_value_f": 5.55,
"received_date_dt": "2021-01-01T00:00:00Z"
},
{
"material_name_s":"MaterialName2",
"company_name_s": "CompanyName3",
"price_per_lb_value_f": 6.66,
"received_date_dt": "2022-01-01T00:00:00Z"
}
]
}
}
These are historical prices for different materials from different companies.
I would like to get the lowest price_per_lb_value_f for each material_name_s in the last 2 years, so the results would look like this:
{
"response": {
"numFound": 2,
"start": 0,
"numFoundExact": true,
"docs": [
{
"material_name_s":"MaterialName1",
"company_name_s": "CompanyName3",
"price_per_lb_value_f": 3.33,
"received_date_dt": "2021-01-01T00:00:00Z"
},
{
"material_name_s":"MaterialName2",
"company_name_s": "CompanyName2",
"price_per_lb_value_f": 5.55,
"received_date_dt": "2021-01-01T00:00:00Z"
}
]
}
}
Is this kind of grouping even possible with Solr?
I'm new to Solr, so any help would be appreciated.

Grouping is possible in Solr.
You can get the result you want with either of the following queries.
Field collapsing approach (recommended in your case): https://solr.apache.org/guide/solr/latest/query-guide/collapse-and-expand-results.html
http://localhost:8983/solr/test/select?indent=true&q.op=OR&q=received_date_dt:[NOW-3YEAR%20TO%20*]&fq={!collapse%20field=material_name_s%20min=price_per_lb_value_f}
q: received_date_dt:[NOW-3YEAR TO *] // range query that keeps only documents received in the last 3 years; otherwise the documents received on 2021-01-01 would not be returned
fq: {!collapse field=material_name_s min=price_per_lb_value_f} // returns a single document for each distinct value of material_name_s, namely the one with the lowest price_per_lb_value_f
Grouping approach: https://solr.apache.org/guide/solr/latest/query-guide/result-grouping.html
http://localhost:8983/solr/test/select?indent=true&q.op=OR&q=received_date_dt:[NOW-3YEAR%20TO%20*]&group=true&group.field=material_name_s&group.sort=price_per_lb_value_f%20asc
q: received_date_dt:[NOW-3YEAR TO *] // same filter as before
group: true // enables grouping
group.field: material_name_s // groups by material_name_s
group.sort: price_per_lb_value_f asc // sorts the documents within each group by price_per_lb_value_f in ascending order
group.limit: not specified, because the default value is 1 // sets the number of results returned for each group
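If the request is built from code, letting a library do the URL encoding avoids hand-escaping the braces and spaces. The following is a minimal sketch using only the Python standard library. The core name test and the host come from the URLs above; the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a core named "test" on localhost:8983, as in the URLs above.
BASE_URL = "http://localhost:8983/solr/test/select"
DATE_FILTER = "received_date_dt:[NOW-3YEAR TO *]"


def collapse_url() -> str:
    """URL for the field collapsing approach (lowest price per material)."""
    params = {
        "q": DATE_FILTER,
        "fq": "{!collapse field=material_name_s min=price_per_lb_value_f}",
    }
    return BASE_URL + "?" + urlencode(params)


def grouping_url() -> str:
    """URL for the result grouping approach (group.limit left at its default)."""
    params = {
        "q": DATE_FILTER,
        "group": "true",
        "group.field": "material_name_s",
        "group.sort": "price_per_lb_value_f asc",
    }
    return BASE_URL + "?" + urlencode(params)


def decoded_params(url: str) -> dict:
    """Decode the query string again, to check what Solr will receive."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    print(collapse_url())
    print(decoded_params(grouping_url()))
```

Note that the two approaches return differently shaped responses: collapsing keeps the flat docs list, while grouping nests the documents under a grouped section.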

Showing facet result even though not a single result found for solrQuery

I'm working with Solr 8.0.0 and I'm facing a problem when using excludeTags.
My Solr query looks like this:
http://localhost:8984/solr/HappyDemo202/select?q=*:*&
rows=6&
start=0&
wt=json&
fq={!tag=CATFACET}cat:((desktops))&
fq={!tag=TAGFACET}tag:((cool))&
fq={!tag=Price}Price:[1200 TO 1245]&
json.facet={CatFacet:{type:terms,field:cat,domain:{excludeTags:CATFACET},limit:-1,sort:{count:desc}},TagsFacet:{ type:terms,field:tag,domain:{excludeTags:TAGFACET},limit:-1,sort:{count:desc}}}
The output of the query looks like this:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"json.facet": "{CatFacet:{type:terms,field:cat,domain:{excludeTags:CATFACET},limit:-1,sort:{count:desc}},TagsFacet:{ type:terms,field:tag,domain:{excludeTags:TAGFACET},limit:-1,sort:{count:desc}}}",
"start": "0",
"fq": [
"{!tag=CATFACET}cat:((desktops))",
"{!tag=TAGFACET}tag:((cool))",
"{!tag=Price}Price:[1200 TO 1245]"
],
"rows": "6",
"wt": "json"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
},
"facets": {
"count": 0,
"CatFacet": {
"buckets": []
},
"TagsFacet": {
"buckets": [
{
"val": "new",
"count": 1
},
{
"val": "new1",
"count": 1
}
]
}
}
}
As the output shows, CatFacet returns no buckets because numFound is 0, but TagsFacet still returns two buckets, new and new1. I don't understand what is going wrong: TagsFacet should not return any buckets if numFound is 0.
Can you suggest what is going wrong? Any help would be appreciated.

You're explicitly asking for the fq on tag to be excluded when the tag facet is computed ({excludeTags:TAGFACET}). In other words, you're asking for counts as if that filter were not applied, and without that fq there would be results.
If you want a facet to count only the documents being returned, drop the excludeTags value from every facet that should be limited to documents in the result set.
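The difference between the two behaviours comes down to whether the domain block is present in each facet definition. A small sketch of how the json.facet parameter could be generated either way is shown below. The helper names terms_facet and build_json_facet are hypothetical; the field names and tags are taken from the question.

```python
import json


def terms_facet(field, exclude_tag=None):
    """Build one terms facet; add excludeTags only when a tag is given."""
    facet = {"type": "terms", "field": field, "limit": -1, "sort": {"count": "desc"}}
    if exclude_tag is not None:
        # With this domain, the tagged fq is ignored when counting buckets.
        facet["domain"] = {"excludeTags": exclude_tag}
    return facet


def build_json_facet(exclude_own_filter: bool) -> str:
    """Return the json.facet value, with or without the excludeTags domains."""
    facets = {
        "CatFacet": terms_facet("cat", "CATFACET" if exclude_own_filter else None),
        "TagsFacet": terms_facet("tag", "TAGFACET" if exclude_own_filter else None),
    }
    return json.dumps(facets)


if __name__ == "__main__":
    # Counts restricted to the result set: no excludeTags anywhere.
    print(build_json_facet(exclude_own_filter=False))
    # Multi-select style counts, as in the question.
    print(build_json_facet(exclude_own_filter=True))
```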

Solr removing the 'e' from ace001 search term

Solr is removing the letter 'e' from search queries...
I'm pretty new when it comes to Solr so I don't really know where to start looking to figure this out but whenever I send a search query Solr is stripping out the 'e' character...
As you can see here, when I search for the term ace001:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"q": "_text:ace001",
"indent": "true",
"wt": "json",
"debugQuery": "true",
"_": "1478467316690"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
},
"debug": {
"rawquerystring": "_text:ace001",
"querystring": "_text:ace001",
"parsedquery": "PhraseQuery(_text:\"ac 001 ac 001\")",
"parsedquery_toString": "_text:\"ac 001 ac 001\"",
"explain": {},
"QParser": "LuceneQParser",
"timing": {
"time": 1,
"prepare": {
"time": 1,
"query": {
"time": 1
},
"facet": {
"time": 0
},
"mlt": {
"time": 0
},
"highlight": {
"time": 0
},
"stats": {
"time": 0
},
"spellcheck": {
"time": 0
},
"debug": {
"time": 0
}
},
"process": {
"time": 0,
"query": {
"time": 0
},
"facet": {
"time": 0
},
"mlt": {
"time": 0
},
"highlight": {
"time": 0
},
"stats": {
"time": 0
},
"spellcheck": {
"time": 0
},
"debug": {
"time": 0
}
}
}
}
}
Searching a different term such as 'acb001' doesn't strip the 'b' but I noticed it does separate the numbers from the letters. I'd want Solr to match the term 'acb001' in the text field...
extract:
"rawquerystring": "_text:acb001",
"querystring": "_text:acb001",
"parsedquery": "PhraseQuery(_text:\"acb 001 acb 001\")",
"parsedquery_toString": "_text:\"acb 001 acb 001\"",
"explain": {},
"QParser": "LuceneQParser",
Would really appreciate some direction here as to how I can either further debug or ideally fix this so ace001 returns all the occurrences of just that.
Edit:
Schema is standard/default http://pastebin.com/59LbmJUp

This is happening because of solr.PorterStemFilterFactory. Your default search field is htmltext, which has
<filter class="solr.PorterStemFilterFactory"/>
in its query analysis.
The Porter stemmer stems the word "ace" to "ac".
You can check this at https://tartarus.org/martin/PorterStemmer/voc.txt by searching for the word "ace".
Then look at https://tartarus.org/martin/PorterStemmer/output.txt, which lists the corresponding output after stemming: for "ace" it is "ac".
To solve this, remove the filter from both the query and the index analysis in schema.xml, and reindex.
Also, you are using WordDelimiterFilterFactory, which splits words on alphanumeric boundaries. That is why you see "ac" and "001". If you do not want that, remove that filter from schema.xml as well.
You are using the default schema.xml, which has many filters you might not need. I would suggest stripping it down to a few filters and then adding filters as you need them, rather than the other way around.
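To see why only the 'e' disappears, it can help to mimic the two filters by hand. The sketch below is not Solr's implementation: it approximates the word delimiter split with a regular expression and implements only step 5a of the Porter algorithm (removal of a final "e"), which is the rule that applies to "ace". All function names are hypothetical.

```python
import re

VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def cv_form(stem):
    """Map each letter to 'c' (consonant) or 'v' (vowel)."""
    return "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))


def measure(stem):
    """Porter's measure m: the number of vowel-to-consonant transitions."""
    form = cv_form(stem)
    return sum(1 for a, b in zip(form, form[1:]) if a == "v" and b == "c")


def ends_cvc(stem):
    """True if the stem ends consonant-vowel-consonant, last letter not w, x or y."""
    return len(stem) >= 3 and cv_form(stem)[-3:] == "cvc" and stem[-1] not in "wxy"


def porter_step_5a(word):
    """Drop a final 'e' if m > 1, or if m == 1 and the stem does not end in cvc."""
    if not word.endswith("e"):
        return word
    stem = word[:-1]
    m = measure(stem)
    if m > 1 or (m == 1 and not ends_cvc(stem)):
        return stem
    return word


def analyze(term):
    """Rough stand-in for the chain: split letters from digits, then stem."""
    parts = re.findall(r"[a-z]+|[0-9]+", term.lower())
    return [porter_step_5a(part) for part in parts]


if __name__ == "__main__":
    print(analyze("ace001"))
    print(analyze("acb001"))
```

Under these assumptions "ace001" comes out as "ac" and "001", and "acb001" as "acb" and "001", which matches the terms in the parsed queries from the question.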

Solr Field Alias in Solr Search is not working

The following link says we can use field aliases like id,price:crazy_price_field etc. I am trying to use it but this is not working.
http://wiki.apache.org/solr/CommonQueryParameters#Field_alias
https://issues.apache.org/jira/browse/SOLR-1205
My Query:
http://localhost:8080/solr/ee_core/select?indent=on&version=2.2&q=*%3A*&fq=%2BinstanceId_index_store%3A217&start=0&rows=10&fl=description_index_store%2Cscore&qt=&wt=json
fl=description_index_store,score gives correct result with field names description_index_store and score
{
"responseHeader": {
"status": 0,"QTime": 1,
"params": {
"explainOther": "","fl": "description_index_store,score",
"indent": "on","start": "0","q": "*:*","hl.fl": "","qt": "",
"wt": "json","fq": "+instanceId_index_store:217","rows": "3",
"version": "2.2"
}
},
"response": {
"numFound": 128,"start": 0,"maxScore": 1,
"docs": [
{
"description_index_store": "Apple MacBook - Intel Core 2 Duo",
"score": 1
},
{
"description_index_store": "Apple MacBook - Intel Core 2 Duo",
"score": 1
},
{
"description_index_store": "HP Envy - 17.3\" - Intel Core i7",
"score": 1
}
]
}
}
But when I try to use an alias, such as fl=description:description_index_store,score, in the same query, the field is not returned.
{
"responseHeader": {
"status": 0,"QTime": 0,
"params": {
"explainOther": "","fl": "description:description_index_store,score",
"indent": "on","start": "0","q": "*:*","hl.fl": "","qt": "",
"wt": "json","fq": "+instanceId_index_store:217","rows": "3",
"version": "2.2"
}
},
"response": {
"numFound": 128,"start": 0,"maxScore": 1,
"docs": [
{
"score": 1
},
{
"score": 1
},
{
"score": 1
}
]
}
}

You're referring to a feature which has been added to the 4.0 version of Solr, not yet released. In fact, within the fl section of that wiki page there's an exclamation mark which tells you that the following content (still within the fl section) is available only with Solr 4.0.
The SOLR-1205 issue has been addressed, together with other improvements, within SOLR-2444: Update fl syntax to support: pseudo fields, AS, transformers, and wildcards, which will be released with Solr 4.0. You might want to have a look at the Solr 4.0 roadmap to find out when it should be released.
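On a version without fl aliases, one possible workaround is to rename the fields on the client after the response arrives. This is not something the answer proposes; it is only a sketch, and rename_fields and the alias mapping are hypothetical names.

```python
def rename_fields(docs, aliases):
    """Return copies of the docs with keys renamed according to aliases.

    Keys that are not in the aliases mapping are kept unchanged.
    """
    return [
        {aliases.get(name, name): value for name, value in doc.items()}
        for doc in docs
    ]


if __name__ == "__main__":
    # Documents shaped like the "docs" array in the first response above.
    docs = [
        {"description_index_store": "Apple MacBook - Intel Core 2 Duo", "score": 1},
        {"description_index_store": 'HP Envy - 17.3" - Intel Core i7', "score": 1},
    ]
    print(rename_fields(docs, {"description_index_store": "description"}))
```

The request would then keep fl=description_index_store,score, and only the client-side view of the documents uses the shorter name.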
