Solr full refresh doesn't clear the index - solr

I'm having an issue where Solr won't clear the index during a full import.
All of the servers run Solr 3.4, and the configuration is as vanilla as it can be.
I tried this on our development environment and on an instance on my own computer, and received similar results.
The schema is rather simple; these are the salient points:
<schema name="System" version="1.4">
...
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0" />
<fieldType name="documentKey" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="document_id" type="documentKey" indexed="true" stored="true" required="true" />
<field name="entity_id" type="long" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true" required="true" />
<field name="entity_type" type="string" indexed="true" stored="true" required="false" />
<field name="Timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
</fields>
</schema>
Of note:
- The document_id field is calculated in the materialized view that is used to populate the index. It is a combination of other fields not in this index, but it is independent of the entity_id. It's unique.
- The entity_id field is the key of a couple of tables, and for the same document_id it can change wildly from one refresh to the next.
Before a full refresh, if I query the index as such:
http://localhost:8080/qq-solr/system/select/?rows=10&q=document_id:%22French_Polynesia/Huahine~4034376%22
I get:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">5</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">document_id:"French_Polynesia/Huahine~4034376"</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<date name="Timestamp">2012-03-08T09:47:26.335Z</date>
<str name="document_id">French_Polynesia/Huahine~4034376</str>
<long name="entity_id">22902728</long>
<str name="name">Huahine</str>
<str name="type">LOCATION</str>
</doc>
</result>
</response>
Then I refresh:
http://localhost:8080/qq-solr/system/dataimport?command=full-import&clean=true&commit=true&optimize=true
(I know the clean, commit, and optimize are redundant, but I used them just to make sure) and after a while I get the message that everything is a-ok.
Then I query the index again:
http://localhost:8080/qq-solr/system/select/?rows=10&q=document_id:%22French_Polynesia/Huahine~4034376%22
And I get:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">5</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">document_id:"French_Polynesia/Huahine~4034376"</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<date name="Timestamp">2012-03-09T08:31:07.317Z</date>
<str name="document_id">French_Polynesia/Huahine~4034376</str>
<long name="entity_id">22902728</long>
<str name="name">Huahine</str>
<str name="type">LOCATION</str>
</doc>
</result>
</response>
But in the database the entity_id is different!
I see that the Timestamp has been updated, so that record has been touched, but why is the old value being retained?

I would run your DataImportHandler (DIH) process through the Interactive Development Mode to ensure that your database query retrieves the entity_id you expect. Because the timestamp on the Solr entry is being updated, your DIH process is running, so I am guessing the cause lies in the way the data is being retrieved.

Any time I do an operation like this with Solr, I always clear the index manually first using curl to be 100% sure it's wiped. Here is a tutorial: http://www.alphadevx.com/a/365-Clearing-a-Solr-search-index
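As a minimal sketch of what such a manual wipe sends, these are the two XML update messages that would be POSTed (for example with curl, Content-Type text/xml) to the core's update handler. The handler URL is an assumption based on the URLs in the question, something like http://localhost:8080/qq-solr/system/update, and should be checked against your own setup.

```xml
<!-- Request body 1: delete every document; *:* matches all documents -->
<delete><query>*:*</query></delete>

<!-- Request body 2: sent as a separate POST so the deletion becomes visible to searches -->
<commit/>
```

Querying with q=*:* afterwards should report numFound="0" if the wipe worked, which gives a clean baseline before running the full import again.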

Related

How to use Solr Suggester ContextField with boolean field

I am using Solr 6.0.0
I am trying to filter out unwanted suggestions from the Solr Suggester. In my Solr database I have all my products.
My products all have a boolean field "ShowOnSite". Products that are ready for sale have this value set to true. Products not yet ready have it set to false.
When I try to filter the suggested results from the suggester using this boolean field, I always get 0 results, even though I have plenty of products ready to be shown.
My products look somewhat like this:
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="Name" type="string" indexed="true" stored="true"/>
<field name="ShowOnSite" type="boolean" indexed="true" stored="true" />
<field name="text_autocomplete" type="textSuggest" indexed="true" stored="true"/>
The textSuggest fieldType has the following configuration:
<fieldType class="solr.TextField" name="textSuggest" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
My suggester looks like this
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">20</str>
<str name="wt">json</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="highlight">true</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">text_autocomplete</str>
<str name="weightField">InStock</str>
<str name="contextField">ShowOnSite</str>
<str name="suggestAnalyzerFieldType">textSuggest</str>
<str name="buildOnStartup">true</str>
</lst>
</searchComponent>
My query looks like this:
/suggest?suggest.q={querystring}&suggest.cfq=true
Expected
I receive only the products that have "ShowOnSite" == true
Actual
I receive 0 products from the suggester
I have tried other configurations as well. By negating the value (not true) I get all products:
/suggest?suggest.q={querystring}&suggest.cfq=-true
I have also tried to add the field name in the cfq. This yields 0 products:
/suggest?suggest.q={querystring}&suggest.cfq=ShowOnSite:true
EDIT1
I have also tried using 0 and 1 for false and true respectively. These do not work either.
Initial guess is that this is caused by the boolean type of the field, since no analysis happens as far as I know for the values used by the cfq.
Make a secondary field as a string field and store the false or true value verbatim in that field - and use that for filtering instead.
As suggested by MatsLindh, use a text field instead.
The easiest way is to just copy that field:
Add this to the managed-schema file of your index (in Solr):
<field name="THE_FIELD_TO_BE_USED_BY_THE_SUGGESTER" type="text_general" indexed="true" stored="true" multiValued="false"/>
<copyField source="YOUR_BOOLEAN_FIELD" dest="THE_FIELD_TO_BE_USED_BY_THE_SUGGESTER" maxChars="30000" />
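A minimal sketch of the string-field variant described in the first answer, assuming a hypothetical field name ShowOnSite_s (not part of the original schema) and assuming the flag is sent to Solr as the literal text true or false, since copyField copies the raw input value rather than the analyzed one. The suggester's contextField then points at the copy:

```xml
<!-- Schema: hypothetical string copy of the boolean flag, stored verbatim -->
<field name="ShowOnSite_s" type="string" indexed="true" stored="true"/>
<copyField source="ShowOnSite" dest="ShowOnSite_s"/>

<!-- solrconfig.xml, inside <lst name="suggester">: replaces the existing contextField entry -->
<str name="contextField">ShowOnSite_s</str>
```

After reindexing and rebuilding the suggester with suggest.build=true, the original request /suggest?suggest.q={querystring}&suggest.cfq=true would be expected to filter on the copied value.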

Solr edismax relevancy sorting multiple fields

I use the edismax query parser to handle user queries against our Solr 4.4 server.
I'm getting the correct results, but I need help with the prioritization.
For example, if I give q=ideapad miix 310:
1) It gets all the exact matches, and this works fine. If a result contains only ideapad instead of the full matched phrase, it should be given the lowest priority.
2) Results should be prioritized in this order: field8, keywords, product, marketing, description. Here too, ideapad should have the lowest priority.
My bq:
bq:text:"ideapad miix 310"^20000 OR (text:"miix"^12000 -text:ideapad^-20 -text:thinkpad^-20 -text:ideacentre^-20 -text:thinkcentre^-20 text:"310"^1000 -text:ideapad^-20 -text:thinkpad^-20 -text:ideacentre^-20 -text:thinkcentre^-20)
URL
http://localhost:8983/solr/collection1/select?q=ideapad+miix+310&defType=edismax&bq=text%3A%22ideapad+miix+310%22%5E20000++OR+(text%3A%22miix%22%5E12000+-text%3Aideapad%5E-20+-text%3Athinkpad%5E-20+-text%3Aideacentre%5E-20+-text%3Athinkcentre%5E-20+text%3A%22310%22%5E1000+-text%3Aideapad%5E-20+-text%3Athinkpad%5E-20+-text%3Aideacentre%5E-20+-text%3Athinkcentre%5E-20)
I use the catch-all field "text", populated by copying each field (field8, keywords, etc.), and I boost each of those fields.
<field name="field8" type="text_search" indexed="true" stored="true" omitNorms="true"/>
<field name="description" type="text_search" indexed="true" stored="true" omitNorms="true"/>
<field name="keywords" type="commaDelimited" indexed="true" stored="true" omitNorms="true"/>
<field name="product" type="commaDelimited" indexed="true" stored="true" omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true"/>
<field name="marketing" type="commaDelimited_s" indexed="true" stored="true" omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true"/>
<copyField source="field8" dest="text"/>
<copyField source="field8" dest="text"/>
In my solrconfig, I have boosted the fields for edismax:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="defType">edismax</str>
<str name="qf">
text^100 field8^90 keywords^80 product^70 marketing^60 description^10
</str>
<str name="pf">
text^100 field8^90 keywords^80 product^70 marketing^60 description^10
</str>
</lst>
</requestHandler>

Solr Suggester - Store Lookup build failed

I've exhausted my search efforts as to why this isn't working. I believe I'm correctly following the documentation found at https://cwiki.apache.org/confluence/display/solr/Suggester
However, every time I attempt to build the suggester, I receive the error "SolrSuggester - Store Lookup build failed." in the logs. I can see it creating the directory for the store correctly on disk, however, there is no data within the file.
I've also tried removing the line <str name="storeDir">fuzzy_dir</str>. If I do this and try building, I don't receive the error in the logs, however, I still receive no results.
Can anyone see what I may be doing wrong?
I'm using Solr 6.5.0.
Here is what I have in my schema.xml:
<field name="name" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
<field name="term" type="suggestType" indexed="true" stored="true" />
<copyField source="name" dest="term" />
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Here is what I have in my solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">fuzzySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="storeDir">fuzzy_dir</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">term</str>
<str name="suggestAnalyzerFieldType">suggestType</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.dictionary">fuzzySuggester</str>
<str name="suggest.count">5</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
This is how I'm executing the build:
http://localhost:8983/solr/my_core/suggest?suggest.build=true
You might be missing this in the fuzzySuggester:
<str name="weightField">WEIGHT</str>
Even though the docs say it's an optional param, I think it might be what is tripping you up. If you don't have a good field that you can use, you can just declare one like this:
<field name="WEIGHT" type="tfloat" indexed="true" stored="true" multiValued="false" />
and just don't bother putting any data into it.
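To show where that line would sit, here is a sketch of the question's suggester definition with the weightField entry added. Everything except the added line is copied from the question; WEIGHT is the placeholder field declared above and must exist in the schema for this to load.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">fuzzySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="storeDir">fuzzy_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">term</str>
    <!-- Added line: the weight field proposed in this answer -->
    <str name="weightField">WEIGHT</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```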
Try passing suggest.dictionary=fuzzySuggester in the query.
http://localhost:8983/solr/my_core/suggest?suggest.build=true&suggest.dictionary=fuzzySuggester
After endless hours of scouring the internet and attempting suggestions provided by others on this post, I've come to the conclusion something in my solrconfig.xml or schema.xml file was corrupt.
My fix was to create a completely new core and migrate the pieces I was using in solrconfig.xml and schema.xml to get it to work. Unfortunately I don't have a better answer, but this is what I had to do in order to solve the problem.

How to get Suggestions in Solr 5.3.0

I am trying to implement auto complete feature using Solr 5.3.0
solrconfig.xml looks like this
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">default</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">suggest_ngram</str>
<str name="weightField">price</str>
<str name="suggestAnalyzerFieldType">text_suggest_ngram</str>
<str name="buildOnStartup">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
managed-schema looks like this:
<fieldType name="text_suggest_ngram" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="10" minGramSize="2" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="suggest_ngram" type="text_suggest_ngram" indexed="true" stored="false"/>
<field name="name" type="string" multiValued="false" indexed="true" stored="true"/>
<field name="price" type="tlong" multiValued="false" indexed="true" stored="true"/>
<copyField source="name" dest="suggest_ngram"/>
Now when I use the analyzer from the admin panel of Solr, I can see the indexed ngrams. And it successfully points out the match.
However when I use the query:
http://localhost:8983/solr/products/suggest?suggest=true&suggest.build=true&wt=json&suggest.q=Jind
I get 0 suggestions.
The response is here:
https://api.myjson.com/bins/47r3i
There exists a value "Jindal Panther" for the name key in one of the docs.
Moreover, I have found that if I create a dummy copyfield "suggest" with type as "String", with source as "name", any suggestion that works fine on "name" will not work on "suggest". Can this be any misconfiguration of copyfield to enable suggestions?
Any help would be appreciated.
Thanks in advance.
EDIT:
Got the solution. See the accepted answer and its comments below.
There is a blog that I encountered that beautifully explains Suggesters. It is definitely worth reading for a newbie to Solr Search.
https://lucidworks.com/blog/2015/03/04/solr-suggester/
The field on which you configure the suggester should have stored="true"; it need not be indexed. The suggester builds a dictionary according to the configuration provided in the SuggestComponent. The name field is stored, whereas suggest_ngram is not. You need to update the schema configuration like this:
<field name="suggest_ngram" type="text_suggest_ngram" indexed="false" stored="true"/>
You also need to provide the suggest.dictionary parameter, which names the dictionary to use for suggestions. In your case it is named default.
http://localhost:8983/solr/products/suggest?suggest=true&
suggest.build=true&
wt=json&
suggest.dictionary=default&
suggest.q=Jind
OR you can provide the dictionary configuration in requestHandler of /suggest:
<str name="suggest.dictionary">default</str>
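Placed in context, the second option would look roughly like this. It is the /suggest handler from the question with the one default added and the closing tag included; no other names are introduced.

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <!-- Added: use the suggester named "default" unless the request overrides it -->
    <str name="suggest.dictionary">default</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With this default in place, the request can drop the suggest.dictionary parameter from the URL.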

How do I get solr to return results from all indicies?

I am starting to integrate with Solr and have run across what I perceive as an issue. I uploaded a simple spreadsheet using the Java API (here is an excerpt:
- Document, id, value
- Excel3, name, steelers
- Excel3, subject, pirates
- Excel3, description, penguins
- Excel3, comments, panthers
- Excel3, author, panthers
)
Using this I used the first column as the "document name", second column as the field in the document to index, and the third column as the indexed data. All of these fields already existed in schema.xml, but here is how they are set up:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="comments" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
Now here is where my problem comes into play. I run a search for, say, steelers, and it comes back fine, but if I look for penguins, or values in many of the other fields, it does not pull back any results. However, if I search for description:penguins, the result comes back as expected.
Can anyone please help me understand why the part before the : is required for some fields, but not others?
example searches:
solr/select?indent=on&q=penguins&wt=xml ----Doesn't return any results
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="q">penguins</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
solr/select?indent=on&q=description:penguins&wt=xml
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">18</int>
<lst name="params">
<str name="indent">on</str>
<str name="q">description:penguins</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="author">panthers</str>
<str name="comments">panthers</str>
<str name="description">penguins</str>
<str name="id">Excel3</str>
<str name="name">steelers</str>
<str name="subject">pirates</str>
</doc>
</result>
</response>
The default query parser will query the default field, which can be specified in the schema.xml as seen here: http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field
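As a rough sketch of the two places a default field can be set, assuming description is the field you want unqualified queries to hit (chosen here only for illustration). Which option applies depends on your Solr version, since the schema-level element was deprecated in later releases, so treat this as something to check against your own installation.

```xml
<!-- Option A, schema.xml: schema-wide default search field -->
<defaultSearchField>description</defaultSearchField>

<!-- Option B, solrconfig.xml: per-handler default via the df parameter -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">description</str>
  </lst>
</requestHandler>
```

With either in place, q=penguins is searched against description instead of whatever field was the default before, which would explain why steelers matched without a prefix while penguins did not.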
I think @Frank Famer's comment about using the DisMax parser is a real solution to this problem. That said, here are two workarounds I've seen in practice:
1. Create an additional field, populated via copyField, that is indexed but not stored and contains the values from all the fields you want to search, then specify that field as the default. It would look something like this in your schema.xml file:
<field name="myhugedefaultfield" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="myhugedefaultfield"/>
<copyField source="subject" dest="myhugedefaultfield"/>
<copyField source="description" dest="myhugedefaultfield"/>
<defaultSearchField>myhugedefaultfield</defaultSearchField>
2. Rewrite the user-entered query, turning the query for penguins into a query for (name:penguins) OR (subject:penguins) OR (description:penguins).
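The DisMax approach mentioned at the top of this answer could look roughly like the following, assuming a Solr version that ships the edismax parser (with plain dismax, the same qf parameter applies). The field list is taken from the question's schema; the handler name /select is an assumption.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Parse q with Extended DisMax instead of the default Lucene parser -->
    <str name="defType">edismax</str>
    <!-- qf: the fields an unqualified term such as "penguins" is searched in -->
    <str name="qf">name subject description comments author</str>
  </lst>
</requestHandler>
```

This avoids both the extra catch-all field and rewriting the query by hand, at the cost of changing how the query string is parsed for every request to that handler.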
