Solr query must match all words/tokens in a field - solr

I have a text-field called name in my schema.xml. A query q=name:(organic) returns the following documents:
<doc>
<str name="id">ontology.category.1483</str>
<str name="name">Organic Products</str>
</doc>
<doc>
<str name="id">ontology.keyword.4896</str>
<str name="name">Organic Stores</str>
</doc>
This is perfectly right for a normal Solr search. However, I would like to construct the query so that it returns nothing here, because 'organic' matches only 1 of the 2 words in the field.
A better way to say it could be this: only return results if all tokens in the field are matched. So if there are two words (tokens) in a field and I match only 1 ('organic', 'organics', 'organ', etc.), I shouldn't get a match, because only 50% of the field has been matched.
Is this possible in Solr? How do I construct the query?

You are probably using StandardTokenizerFactory (or something similar). One solution is to use KeywordTokenizerFactory and issue a phrase query; then only exact matches on the whole field value will work. Remember to add any other filters you might want (such as LowerCaseFilterFactory). Note that "stores organic" will not match your doc either.
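A minimal sketch of the schema change this answer describes, assuming a hypothetical field type text_exact and a hypothetical copy field name_exact (neither name appears in the question). KeywordTokenizerFactory keeps the entire field value as a single token, and LowerCaseFilterFactory makes the comparison case-insensitive:

```xml
<!-- Hypothetical names: "text_exact" and "name_exact" are illustrative only -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- Emit the whole field value as one token instead of splitting on words -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase the single token so "Organic Products" matches "organic products" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Keep the original tokenized "name" field and index an exact-match copy -->
<field name="name_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```

Under these assumptions, a phrase query such as q=name_exact:"organic products" should match the first document, while q=name_exact:"organic" should match neither, because the only indexed token for each document is the full lowercased value.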

Due to time constraints, I had to resort to the following (hacky) solution.
I added the term count to the index via a dynamic field called tc_i.
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
Now at query time I count the terms and append the count to the query, so q=name:(organic) becomes q=name:(organic) AND tc_i:(1). This won't return the documents for "organic stores" / "organic products", because their tc_i fields are set to 2 (two words).
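As an illustration of the indexing side of this workaround, an update message for one of the documents from the question might look like the sketch below. The term count is assumed to be computed by the indexing client before the document is sent; the answer does not say how it was produced.

```xml
<add>
  <doc>
    <field name="id">ontology.category.1483</field>
    <field name="name">Organic Products</field>
    <!-- Term count computed by the client: "Organic Products" has two words -->
    <field name="tc_i">2</field>
  </doc>
</add>
```

With this document indexed, q=name:(organic) AND tc_i:(1) excludes it because tc_i is 2, whereas q=name:(organic products) AND tc_i:(2) would still find it.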

Related

Working with Highlights on Solr 6.4.1

I am running Solr 6.4.1 on a Windows 7 machine, with Chrome for testing query URLs currently.
I have set up and got working an index on a set of test documents: a small number of webpages saved as Docx files in a folder. I can get basic queries working and am now trying to get highlighting working.
I have not modified the schema in any way - simply indexed the folder into a Core called test.
The following query works and highlights as I expect:
http://localhost:8983/solr/test/select?hl=on&hl.fl=meta_author&q=steven&wt=xml&fl=meta_author
and returns
...<lst name="highlighting">
<lst name="C:\Users\steven\Documents\Indexing\Dungeon Arena Building.docx">
<arr name="meta_author">
<str><em>steven</em></str>
</arr>
</lst>...
However, if I change the field to try to highlight where the term is found in the name of the document, it does not work in this way.
http://localhost:8983/solr/test/select?hl=on&hl.fl=dc_title&q=gothic&wt=xml&fl=dc_title
returns
...<lst name="highlighting">
<lst name="C:\Users\steven\Documents\Indexing\Basic Gothic Dungeon.docx"/>
<lst name="C:\Users\steven\Documents\Indexing\Dungeon Arena Building.docx"/>
</lst>...
The results are correct but it does not highlight the identified data fields.
Are there some rules around the available fields that can be highlighted or do I need to amend something in the schema?
For context I aim to bring over all the file content into the index so that I can then present back the match in context of the surrounding text for the users to see.
Check whether the dc_title field is stored.
In your schema the field should look like the following (the field type can differ from what is shown, but set stored="true"). After the modification, reindex the documents and search again.
<field name="dc_title" type="text_general" indexed="true" stored="true"/>

SOLR Updated docs missing from query

(Still a newbie; more questions)
I'm performing atomic updates on some SOLR 4 records via HTTP GET calls. This is working correctly after I fixed up some problems with my URLs.
But my original problem is still present: After I update a document, my search queries are no longer finding my updated docs.
Do I need to re-index an updated document? Do atomic updates cause a document to fall out of the index?
example:
I can search with this:
http://solrfarm.gateway.cco:8983/solr/records/select/?q=firstName:(tomas) recordType:(myrectype)&rows=100
and I get XML that looks like:
<doc>
<str name="id">CollName-7276748</str>
<str name="system">OHM Liens</str>
<long name="_version_">1464208859225653248</long>
<bool name="optout">false</bool>
</doc>
I want to change the optout value to "true" and that is happening with a URL that looks like this:
http://prodsolr01.cco:8983/solr/records/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ECollName-7276748%3C/field%3E%3Cfield%20name=%22optout%22%20update=%22set%22%20%3Etrue%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
Decoded and formatted:
stream.body=
<add>
<doc>
<field name="id">CollName-7276748</field>
<field name="optout" update="set" >true</field>
</doc>
</add>
&commit=true
But, now when I run my original query, my record does not get returned.
If I search for the record explicitly, I get the record:
http://solrfarm.gateway.cco:8983/solr/records/select/?q=id:(%22CollName-7276748%22)%20&rows=100
So I'm confused as to why an updated record is no longer found by my query. Do I need to pass in all the original fields to my update command (i.e. the "firstName" and "lastName" fields that were indexed originally)?
Shouldn't it be enough to just perform the update?
Again, I'm a newbie and I'm probably not "getting" something basic, so all help is appreciated.

solr facet field values appear to be generated by solr

I want to facet on a specific field. The field is defined as
<field name="specials_de" type="textgen" indexed="true" stored="true" multiValued="true"/>
Two of the values in specials_de are "Cunard Hamburg Specials" and "Cunard the New Yorker". I want to use these two values as facets, but the solr query returns facet fields with values like
<int name="cunard">11</int>
<int name="new">9</int>
<int name="yorker">9</int>
<int name="hamburg">5</int>
<int name="hamburgspecialscunard">3</int>
<int name="hamburgspecials">2</int>
What am I doing wrong?
Just to clarify: I'm not referring to the counts (11, 9, etc.), but to the names, i.e. "cunard", "new", etc.
Text fields are not recommended for faceting.
You won't get the desired behavior, because text fields are tokenized and filtered. That produces the multiple tokens you see in the facets returned in the response.
From SolrFacetingOverview:
Because faceting fields are often specified to serve two purposes,
human-readable text and drill-down query value, they are frequently
indexed differently from fields used for searching and sorting:
They are often not tokenized into separate words
They are often not mapped into lower case
Human-readable punctuation is often not removed (other than double-quotes)
There is often no need to store them, since stored values would look much like indexed values and the faceting mechanism is used for
value retrieval.
Try string fields instead; they should be good enough and add no overhead.
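One way to apply this without losing full-text search on specials_de is to facet on an untokenized copy of the field. The sketch below assumes a hypothetical field name specials_de_facet and a string field type (solr.StrField) already defined in the schema, as in the stock example schemas:

```xml
<!-- Existing tokenized field from the question, kept for searching -->
<field name="specials_de" type="textgen" indexed="true" stored="true" multiValued="true"/>

<!-- Hypothetical untokenized copy used only for faceting -->
<field name="specials_de_facet" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="specials_de" dest="specials_de_facet"/>
```

After reindexing, faceting with facet.field=specials_de_facet should return whole values such as "Cunard Hamburg Specials" and "Cunard the New Yorker" rather than individual tokens.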

Searching multi-valued fields in the same position

Let's say we have a document like this:
<arr name="pvt_rate_type">
<str>CORPORATE</str>
<str>AGENCY</str>
</arr>
<arr name="pvt_rate_set_id">
<str>1</str>
<str>2</str>
</arr>
Now I do a search where I want to return the document only if it contains pvt_rate_set_id = 1 AND pvt_rate_type = AGENCY in the same position in their multi-valued fields, so the above document should NOT be returned (because pvt_rate_set_id 1 has a pvt_rate_type of CORPORATE).
Is this possible at all in Solr, or is my schema badly designed? How else would you design that schema to allow for the searching I want?
This may not be available out of the box.
You would need to modify the schema to have fields with the pvt rate type as the field name and the id as its value,
e.g.
CORPORATE=1
AGENCY=2
This can be achieved by having dynamic fields defined.
e.g.
<dynamicField name="*_pvt_rate_type" type="string" indexed="true" stored="true"/>
So you can input data as corporate_pvt_rate_type or agency_pvt_rate_type with the respective values.
Filter queries can then match the exact mappings, e.g. fq=corporate_pvt_rate_type:1
Unfortunately Solr does not seem to support this.
Another way to do this in Solr would be to store a concatenated string field type_and_id with a delimiter (say comma) separating the type and the id and query like:
q=type_and_id:AGENCY%2C1
(where %2C is the URL encoding for comma).
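A sketch of how the concatenated field might be declared and populated, assuming a string field type (solr.StrField) exists in the schema and using a hypothetical document id; the field name type_and_id and the comma delimiter come from the answer above:

```xml
<!-- Schema: one untokenized value per (type, id) pair -->
<field name="type_and_id" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Update message: the id "rate-doc-1" is hypothetical -->
<add>
  <doc>
    <field name="id">rate-doc-1</field>
    <!-- Each value pairs the rate type with the rate set id at the same position -->
    <field name="type_and_id">CORPORATE,1</field>
    <field name="type_and_id">AGENCY,2</field>
  </doc>
</add>
```

Under these assumptions, q=type_and_id:AGENCY%2C1 does not match this document, because only the pairs CORPORATE,1 and AGENCY,2 were indexed, while q=type_and_id:AGENCY%2C2 does.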

Is it necessary for a unique key to be a uuid in Solr?

Can a unique key in a Solr/Lucene schema be text_general instead? I have tried that, but Solr doesn't overwrite the data; it simply adds another row, duplicating the data.
I have commented out the following from solrconfig.xml
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
My schema.xml has
<uniqueKey>_id</uniqueKey>
<field name="_id" type="text_general" indexed="true" stored="true" default="NEW"/>
Any help would be greatly appreciated.
You can use whatever type you want for the uniqueKey field. As you can read from the documentation:
The <uniqueKey> declaration can be used to inform Solr that there is a
field in your index which should be unique for all documents. If a
document is added that contains the same value for this field as an
existing document, the old document will be deleted.
It is not mandatory for a schema to have a uniqueKey field.
Note that if you have enabled the QueryElevationComponent in
solrconfig.xml it requires the schema to have a uniqueKey of type
StrField. It cannot be, for example, an int field.
What's important is that your uniqueKey field really is unique, meaning that the same document always has the same identifier. Only then can the replace-if-existing mechanism work. With a uuid field type you would never replace a document, because each one would automatically get a different id.
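A minimal sketch of a uniqueKey declaration that relies on an identifier supplied by the indexing client. It uses the non-tokenized string type (assumed to be defined in the schema as solr.StrField), which is also the type the quoted documentation says QueryElevationComponent requires. Whether switching away from text_general resolves the duplication reported in the question is an assumption, not something confirmed in this thread.

```xml
<!-- Sketch only: assumes a "string" fieldType (solr.StrField) exists in schema.xml -->
<uniqueKey>_id</uniqueKey>
<!-- No default value: each document must supply its own _id, so adding a
     document with an existing _id replaces the earlier copy -->
<field name="_id" type="string" indexed="true" stored="true" required="true"/>
```

Documents indexed before such a change would need to be reindexed for the new field type to take effect.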

Resources