Solr not searching (dynamically created) fields - solr

I have imported docs into Solr that have fields dynamically created from a pattern (mostly *_s). In the back-end (/solr/admin), I can see that they exist: the aggregate stats, like term frequency, appear correctly. They are all listed as indexed & stored.
However, they do not appear in queries, even when I search across all fields, for example:
/solr/select/?indent=on&q=myterms&fl=*
This problem seems similar to SOLR not searching on certain fields, and I tried the solution there, which was:
If you want your standard query handler to search against all your fields you can change it in your solrconfig.xml (I always add a second query handler instead of modifying "standard"). The fl field is the list of fields you want to search against. It's a comma separated list or *.
I made that change to the standard solrconfig.xml, but still get no results.
I tried creating a very simple doc:
{'id':5, 'name':'foo'}
And this query returns that doc:
/solr/select/?indent=on&q=foo&fl=*
The full response for a query that returns no results reads:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="echoParams">all</str>
<str name="h1">true</str>
<str name="defType">dismax</str>
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">Foo</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

Is the defType of your "standard" query handler dismax? If not, then it won't work. As the answer to the question you linked says, you have to use dismax to search in multiple fields. If you do not want to use dismax and still want to search in many fields at once, you have to use the copy field feature at index time to gather all the fields you want to search on into one field, and then make that field your default field.
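As a rough sketch of what a dismax request over several fields could look like, the snippet below only builds the query URL. The host, handler path and the field names name and title_s are assumptions for illustration, not taken from the asker's setup. Note that qf lists the fields to search, while fl only controls which stored fields are returned.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_dismax_url(base_url, terms, query_fields):
    """Build a /select URL that asks the dismax parser to search several fields."""
    params = {
        "q": terms,
        "defType": "dismax",           # select the dismax query parser
        "qf": " ".join(query_fields),  # fields to search, space separated
        "fl": "*",                     # fields to return, not fields to search
        "indent": "on",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host and field names, for illustration only.
url = build_dismax_url("http://localhost:8983/solr/select", "foo", ["name", "title_s"])
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["qf"][0])
```

Each dynamic field has to be named explicitly in qf here, which is why the copy field approach is often simpler when there are many of them.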

Since you're using _s you can copy those fields to "text" in solr/collection1/conf/schema.xml like this:
<copyField source="*_s" dest="text" maxChars="3000"/>
It's a slight variation on the solution at Why do dynamic fields not act like normal fields (specifically when querying and displaying in Hue) in solr? which was to uncomment this *_t line:
<!-- Above, multiple source fields are copied to the [text] field.
Another way to map multiple source fields to the same
destination field is to use the dynamic field syntax.
copyField also supports a maxChars to copy setting. -->
<!-- <copyField source="*_t" dest="text" maxChars="3000"/> -->
This made my dynamic fields searchable with:
curl http://localhost:8983/solr/collection1/select?q=foo
Here's where the "catchall" text field is described:
<!-- catchall field, containing all other searchable text fields (implemented
via copyField further on in this schema -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
See also http://wiki.apache.org/solr/SchemaXml#Copy_Fields

I see you're using the query "Foo" while the name value is "foo". You might want to check whether terms are lowercased at both index time and query time in your schema, for the field type you are using for name.

Related

Solr MoreLikeThis handler returns 0 elements

EDIT: After replacing the search field (mlt.fl) with another one that has the same features (type="text_general" indexed="true" stored="true"), it started working as it should. The 'description' field is not empty in any document, so I don't know the reason for this difference, but if I discover more I will write it here. I won't delete this question, in the hope that it will be useful as a list of common solutions to the same problem.
I have some documents indexed with Solr v. 4.0.0 and I'm trying to obtain similar documents to a given one, but I'm getting no results.
What I have done:
1. in schema.xml, I've modified the field I want to search for, adding termVectors="true":
<field name="description" type="text_general" indexed="true" stored="true" termVectors="true" />
2. In solrconfig.xml I've added a RequestHandler:
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
<lst name="defaults">
<str name="mlt.fl">description</str>
</lst>
</requestHandler>
3. I've restarted Solr and I execute this query:
http://localhost:8983/solr/mlt?q=id:123456&mlt.match.include=false
I expect to get some documents with the field 'description' similar to the 'description' field of document with id=123456, but I obtain an empty response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">31</int>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
The description field is populated in every document.
I've tried:
- changing the tokenizer of the field type text_general from StandardTokenizerFactory to WhitespaceTokenizerFactory (as suggested here)
- using the MoreLikeThisComponent instead of MoreLikeThisHandler, with a query like:
http://localhost:8983/solr/select?q=id:123456&mlt=true&mlt.fl=description
- changing the mintf (0,1,..), mindf (0,1,..), qf (description) and boost (true, false) params, both in the default values in schema.xml and in the queries (with both /select and /mlt).
None of the above works. Solr works perfectly with all other query types. I have no clue what could be wrong, as I've followed all the guides and tutorials I've found (and a lot of answers here). Any ideas?

Time of SolrRecord being added to Index from Nutch

I am running Solr 5.4.1 and Nutch 1.11
I am also using Apache Nifi, and particularly the GetSolr processor.
I understand that the tstamp in my SolrRecord is the time at which the value in the index was fetched.
The challenge is that, for the GetSolr processor to work unattended in NiFi, I need to provide a date field to filter on. If I use tstamp, it populates my dataflow only the first time; after that the tstamp filter excludes later records, because tstamp reflects the time the page was fetched, not the time the record was ingested into Solr.
So my question is: how can I include a field in my SolrRecord, at the time of bin\nutch index, that holds the timestamp of insertion into Solr rather than the time of fetching by the crawler?
I think you would have two options...
You could add a new date field in your Solr schema.xml with a default value of NOW:
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
You could use the TimestampUpdateProcessorFactory:
https://lucene.apache.org/solr/5_4_1/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html
In solrconfig.xml you would add this to an update chain:
<updateRequestProcessorChain name="add-timestamp-field">
<processor class="solr.TimestampUpdateProcessorFactory">
<str name="fieldName">timestamp</str>
</processor>
</updateRequestProcessorChain>
If using the update chain, the add-timestamp-field chain needs to be enabled:
<initParams path="/update/**">
<lst name="defaults">
<str name="update.chain">add-timestamp-field</str>
</lst>
</initParams>
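Once such a field exists, a polling client can filter on it. The following is only a sketch of the kind of incremental query that could be issued, not how GetSolr builds its request internally. The core name collection1, the last_run value and the helper names are assumptions; the field name timestamp matches the definition above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode, parse_qs, urlsplit


def solr_date(dt):
    """Format an aware datetime as a UTC timestamp in the form Solr date fields use."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_incremental_query(base_url, last_run, date_field="timestamp"):
    """Build a query for documents inserted since last_run (inclusive)."""
    params = {
        "q": "*:*",
        "fq": f"{date_field}:[{solr_date(last_run)} TO NOW]",  # range filter on insertion time
        "sort": f"{date_field} asc",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical core name and last-run time, for illustration only.
last_run = datetime(2016, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
url = build_incremental_query("http://localhost:8983/solr/collection1/select", last_run)
fq = parse_qs(urlsplit(url).query)["fq"][0]
print(fq)
```

Because the field is filled when the document is added to Solr, re-indexing an old crawl still produces fresh values that the filter picks up.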

Solr Facet Search-Spell check

I'm using Solr facet search on a database column. It successfully returns the data:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="tags">
<int name="lol">58</int>
<int name="scienc">58</int>
<int name="photo">34</int>
<int name="axiom">27</int>
<int name="geniu">14</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
I want to make sure that only complete words are counted. In the above example you can see counts for 'scienc' and 'geniu' that should be for 'science' and 'genius'. How can I achieve this? Can I incorporate the spell checking feature?
This probably has to do with the underlying fieldType that you have associated with your tags field. The field value is most likely being stemmed or having other analyzers associated with it. I would suggest one of two things:
Remove the stemming and/or other processing to prevent the words from appearing as partial.
(Recommended) Create a separate field tags_facet with type="string" in your schema.xml and use a copyField directive to copy the values fed into your original tags field. Then facet on this new tags_facet field.
Use the copyField feature of Solr to copy the original field to one with a string fieldType. If the values are a set of words, instead of string, you could use a whitespace tokenised fieldtype (without ngrams of course.)

Full text search means convert everything to text?

Solr uses Lucene's Full text search. Does it mean I have to convert everything to text?
For example, I have fields like:
<field name="rollno" type="int" indexed="true" stored="true"/>
<field name="name" type="string" indexed="true" stored="true"/>
And a document based on these fields:
<doc>
<field name="id">1</field>
<field name="rollno">32</field>
<field name="name">John Milton</field>
</doc>
And I have to convert them all to text like this?
<copyField source="name" dest="text"/>
<copyField source="rollno" dest="text"/>
And my Search Handler as,
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">100</int>
<str name="df">text</str>
</lst>
</requestHandler>
Please clarify my doubt.
You need not convert everything to text.
It depends on the content of the field.
You would usually use a text field type for fields that hold longer content.
With a text field you can apply a lot of processing to make the content searchable, e.g.
- apply a lower case filter to make your searches case insensitive
- apply synonyms, so that terms like vehicle and auto match each other
- apply stemming to make words searchable by their roots, e.g. bank and banking
- and much more, e.g. a word delimiter filter so that At&t and Att match
You usually don't want the same analysis to be applied to all fields,
e.g. you don't want the stemmer applied to person or author names, as it may produce incorrect matches.
Integer and string fields can still be searched without declaring them as text, as long as the fields are indexed.
A copy field would copy all your content into a field with a single field type.
If you don't want to use a copy field, you can use the edismax parser and still search on multiple fields.
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<str name="qf">rollno name</str>
<str name="q.alt">*:*</str>
</lst>
</requestHandler>
In the schema.xml file you set the defaultSearchField, which is the field searched by default when the query does not name a field.
<defaultSearchField>text</defaultSearchField>
The df parameter in the requestHandler serves a similar purpose. It overrides the default field defined in the schema.xml file.
If you copy multiple fields into this default field using copyField, you can search over all of them regardless of their types.
So, when you create your query as follows, it searches on the default field.
http://localhost:8080/solr/select/?q=searchText
If you want to search in specific field then you should create your query as follows. Following query will search on the rollno field.
http://localhost:8080/solr/select/?q=rollno:32
You got this wrong. copyField doesn't convert anything to text. It copies the value from the field named name to the field named text. This is normally used to have one field that contains all of your values, and that field is normally your default search field. Let me explain why this is done:
If you have the 2 fields posted above, you have to declare which one is your default search field. Let's say name. If you now query the server with plain terms and no field syntax, then only the field name will be searched. But normally you want the field rollno to be searched too. To do this without any query syntax you declare another field, in this case named text. Now you copy the values from field name and field rollno to the field text and define it as the default search field. If you now search for John Milton or 32, the document will be found. Hopefully this helps you a bit.
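To make the idea concrete, here is a toy model in Python of a catch-all field filled by copying. It is not Solr code and does not reproduce Solr's analysis: every value is simply lowercased and split on whitespace, and the function names are made up for this illustration.

```python
def index_documents(docs, copy_sources, dest="text"):
    """Toy inverted index: field -> lowercase token -> set of document ids."""
    index = {}
    for doc in docs:
        fields = {name: [str(value)] for name, value in doc.items()}
        # Mimic copyField: the destination field receives every source value.
        fields[dest] = [str(doc[src]) for src in copy_sources if src in doc]
        for field, values in fields.items():
            for value in values:
                for token in value.lower().split():
                    index.setdefault(field, {}).setdefault(token, set()).add(doc["id"])
    return index


def search(index, term, default_field="text"):
    """Search 'field:value', or fall back to the default field for a bare term."""
    field, _, value = term.rpartition(":")
    field = field or default_field
    return sorted(index.get(field, {}).get(value.lower(), set()))


docs = [{"id": 1, "rollno": 32, "name": "John Milton"}]
index = index_documents(docs, copy_sources=["name", "rollno"])

print(search(index, "milton"))     # bare term, searched in the catch-all field
print(search(index, "32"))         # also found through the catch-all field
print(search(index, "rollno:32"))  # explicit field, no copy needed
```

The original rollno and name fields keep their own types; only the copies in text share one field type.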

Facet a multivalue field (Solr powered Tag Cloud pt II)

This is related to this: Solr powered Tag Cloud
However, I decided to create another question, since it is outside the scope of the first one.
Here's the deal: I've managed to index a multivalued field with multiple words for a tag cloud:
<arr name="words">
<str>builders_NNS</str>
<str>builders_NNS</str>
<str>buildings_NNS</str>
<str>buildings_NNS</str>
<str>construction_NN</str>
<str>construction_NN</str>
<str>green_JJ</str>
<str>green_JJ</str>
</arr>
But when I facet on the query with simple parameters:
&facet=true&facet.field=words&facet.mincount=1
It fails to facet them correctly: it doesn't sum up the values. Do I need to send an extra parameter since it's a multivalued field? This is the response from Solr once I apply the faceting:
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="words">
<int name="builders_NNS">1</int>
<int name="buildings_NNS">1</int>
<int name="construction_NN">1</int>
<int name="green_JJ">1</int>
</lst>
</lst>
</lst>
My field is defined as follows:
<field name="words" type="string" indexed="true" stored="true" multiValued="true" />
I'm using Solr 1.4. Thanks!
Answering my own question in here:
Solr (at least version 1.4; I haven't migrated to 3.2 yet to see whether it's possible there) cannot sum up values that are repeated within a multivalued field of the same document, because a facet count is the number of matching documents that contain the value, not the number of occurrences. It does, however, correctly group repeated values from different multivalued fields. Therefore the approach I was taking isn't possible (just yet).
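The difference between the two ways of counting can be shown with a small Python sketch. This is a toy model of the counting logic only, not Solr code, and the function names are made up. It assumes the stored values of the words field have been fetched to the client, where the occurrences can be summed if that is what the tag cloud needs.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count documents containing each value, the way field faceting counts."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))  # each document counts once per value
    return counts


def occurrence_counts(docs, field):
    """Count every occurrence of each value, including repeats within a document."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.get(field, []))
    return counts


docs = [{
    "id": 1,
    "words": ["builders_NNS", "builders_NNS", "buildings_NNS", "buildings_NNS",
              "construction_NN", "construction_NN", "green_JJ", "green_JJ"],
}]

print(facet_counts(docs, "words")["green_JJ"])       # 1, as in the response above
print(occurrence_counts(docs, "words")["green_JJ"])  # 2, the sum that was expected
```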

Resources