Working with Highlights on Solr 6.4.1 - solr

I am running Solr 6.4.1 on a Windows 7 machine, currently using Chrome to test query URLs.
I have set up a working index on a set of test documents - a small number of webpages saved as Docx files in a folder. I can get basic queries working and am now trying to get highlighting working.
I have not modified the schema in any way - simply indexed the folder into a Core called test.
The following query works and highlights as I expect:
http://localhost:8983/solr/test/select?hl=on&hl.fl=meta_author&q=steven&wt=xml&fl=meta_author
and returns
...<lst name="highlighting">
<lst name="C:\Users\steven\Documents\Indexing\Dungeon Arena Building.docx">
<arr name="meta_author">
<str><em>steven</em></str>
</arr>
</lst>...
However, if I change the field to try and highlight where the term is found in the name of the document, it does not work in the same way.
http://localhost:8983/solr/test/select?hl=on&hl.fl=dc_title&q=gothic&wt=xml&fl=dc_title
returns
...<lst name="highlighting">
<lst name="C:\Users\steven\Documents\Indexing\Basic Gothic Dungeon.docx"/>
<lst name="C:\Users\steven\Documents\Indexing\Dungeon Arena Building.docx"/>
</lst>...
The results are correct but it does not highlight the identified data fields.
Are there some rules around the available fields that can be highlighted or do I need to amend something in the schema?
For context I aim to bring over all the file content into the index so that I can then present back the match in context of the surrounding text for the users to see.

Check whether the dc_title field is stored.
In your schema the field should look like the line below (the field type can differ from this, depending on what you defined, but set stored="true"). After the modification, reindex the documents and search again.
<field name="dc_title" type="text_general" indexed="true" stored="true"/>
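Once the field is stored and the documents are reindexed, the request itself does not need to change. As a rough sketch, the highlighting URL from the question could be assembled in Python like this. The host, core name (test) and field come from the question; the helper name highlight_url is made up for illustration.

```python
from urllib.parse import urlencode


def highlight_url(base, field, term):
    """Build a select URL that asks Solr to highlight `term` in `field`."""
    params = {
        "q": term,
        "fl": field,     # field returned in the result list
        "hl": "on",      # switch highlighting on
        "hl.fl": field,  # field to highlight (per the answer, it must be stored)
        "wt": "xml",
    }
    return base + "/select?" + urlencode(params)


print(highlight_url("http://localhost:8983/solr/test", "dc_title", "gothic"))
```

Using urlencode rather than string concatenation keeps the URL valid if the search term contains spaces or other reserved characters.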

Related

Solr Authentication

I have my Solr 4.3 instance running on a tomcat server with Nutch crawling my local filesystem and Solr storing the indexes.
When a user searches, I need Solr to filter out certain docs based on the type of user.
Say I have a directory structure like so:
dir1
|------dir_userA----files
|------dir_userB----files
|------Public-------files
So I only want the search to return results from directories that a particular user has access to.
Is this possible?
Solr does not have document-level security, so you would have to retrieve and index access control lists for each document. Then you need to apply a filter query to every search and pass in the user's security group and/or username.
Let's say your document is indexed like this, where the values for the multivalued field "access" are determined at index time by the actual permissions on the file:
<doc>
<field name="id">42</field>
<field name="name">Products.xlsx</field>
<field name="title">Product list</field>
<field name="content">...</field>
<field name="access">COMPANY\Marketing</field>
<field name="access">COMPANY\CustomerService</field>
</doc>
Then you can decorate the query request handler with a default filter query parameter in solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="fq">access:"COMPANY\Everyone"</str>
</lst>
</requestHandler>
Now searches by default will not return the Products.xlsx document, since the default 'user' that is impersonated (namely "COMPANY\Everyone") does not appear in the "access" field. But as soon as you pass in an actual user's group to override the default filter query, Solr will return the document:
/solr/collection1/select?q=content:"product x"&fq=access:"COMPANY\Marketing"
Of course when the permissions change, the index must be updated as soon as possible to reflect this.
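In practice the application in front of Solr would build that filter query from the groups of the logged-in user. A minimal sketch of the idea, assuming the "access" field from the example above; the function names access_filter and search_url are hypothetical. Backslash is the escape character in the standard query syntax, so this sketch doubles it inside the quoted value; check that against how your access values are actually indexed.

```python
from urllib.parse import urlencode


def access_filter(principals):
    """Return an fq clause matching documents readable by any of `principals`."""
    quoted = []
    for name in principals:
        # Escape backslashes first, then double quotes, inside the quoted value.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return "access:(" + " OR ".join(quoted) + ")"


def search_url(base, query, principals):
    """Build a select URL whose fq restricts results to the user's principals."""
    params = {"q": query, "fq": access_filter(principals)}
    return base + "/select?" + urlencode(params)


groups = ["COMPANY\\Marketing", "COMPANY\\Everyone"]
print(access_filter(groups))
print(search_url("/solr/collection1", 'content:"product x"', groups))
```

The filter must be applied on the server side of your application; if the browser can set fq itself, a user can simply ask for another group.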

SOLR Updated docs missing from query

(Still a newbie; more questions)
I'm performing atomic updates on some SOLR 4 records via HTTP GET calls. This is working correctly after I fixed up some problems with my URLs.
But my original problem is still present: After I update a document, my search queries are no longer finding my updated docs.
Do I need to re-index an updated document? Do atomic updates cause a document to fall out of the index?
example:
I can search with this:
http://solrfarm.gateway.cco:8983/solr/records/select/?q=firstName:(tomas) recordType:(myrectype)&rows=100
and I get XML that looks like:
<doc>
<str name="id">CollName-7276748</str>
<str name="system">OHM Liens</str>
<long name="_version_">1464208859225653248</long>
<bool name="optout">false</bool>
</doc>
I want to change the optout value to "true" and that is happening with a URL that looks like this:
http://prodsolr01.cco:8983/solr/records/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3ECollName-7276748%3C/field%3E%3Cfield%20name=%22optout%22%20update=%22set%22%20%3Etrue%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
Decoded and formatted:
stream.body=
<add>
<doc>
<field name="id">CollName-7276748</field>
<field name="optout" update="set" >true</field>
</doc>
</add>
&commit=true
But, now when I run my original query, my record does not get returned.
If I search for the record explicitly, I get the record:
http://solrfarm.gateway.cco:8983/solr/records/select/?q=id:(%22CollName-7276748%22)%20&rows=100
So I'm confused as to why an updated record is no longer found by my query. Do I need to pass in all the original fields to my update command (i.e. the "firstName" and "lastName" fields that were indexed originally)?
Shouldn't it be enough to just perform the update?
Again, I'm a newbie and I'm probably not "getting" something basic, so all help is appreciated.
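For reference, the update request described above could be generated rather than hand-encoded. This is only a sketch: the host, core, id and field names are the ones from the question, and the helper name atomic_set_body is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def atomic_set_body(doc_id, field, value):
    """Build the <add> document for an atomic 'set' update of one field."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name=field, update="set").text = value
    return ET.tostring(add, encoding="unicode")


body = atomic_set_body("CollName-7276748", "optout", "true")
# Percent-encode the whole XML body so it can travel in a GET parameter.
url = ("http://prodsolr01.cco:8983/solr/records/update?stream.body="
       + quote(body, safe="") + "&commit=true")
print(body)
print(url)
```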

Solr Search not working after dataimport successful

I am new to Solr. I have tried DataImport using an Oracle database. The data gets imported successfully. When I try to search with the query:
qt=standard
q=*
I get good results. But when I do a specific search, the results are empty showing no documents. The logger is empty and there are NO errors displayed.
Ok! I got it.
I observed that when I use some of the pre-defined fields of schema.xml, searching on those fields works fine. But when I defined some fields of my own, the result was still nothing.
Then I looked into the "/select" request handler in solrconfig.xml. There is a line
<str name="df">text</str>
which says that "text" is the only field which is searchable. But then how does it search the other fields?
The answer lies in the <copyField> tag in schema.xml. The fields present by default are copied into "text", which makes them searchable. Hence, if you want your own field to be searchable, just define the field and add it in a copyField tag. ;)
TLDR Version: Define your fields as type="text" to start off. If you have a field called "product", add <field name="product" type="text" indexed="true" stored="true" /> to the default schema.xml inside the <fields> tag and you should be done. To search using the select request-handler, use q=<field_name>:<text_to_look_for> or q=*:* to show all documents.
There are a few mistakes you're making here. I'll be explaining using the 'select' request handler.
The format for a query is ?q=<field_name>:<text_to_look_for>. So if you want to return all the values matching all the fields, you'd say q=*:*
And if you were to look for the word "iPod" in the field "product" your query would be q=product:iPod
Another thing to keep in mind: if schema.xml specifies the field product as type="string", which maps to class="solr.StrField", the query (<text_to_look_for>) must match the value in the index exactly, since Solr doesn't tokenize a StrField, i.e., ipod will not return results if your index holds it as iPod. If you still need it to match, you could use type="text" in schema.xml (the fieldType definition is already present in the default schema.xml). The "text" fieldType has several analysis filters (one of which ignores case) and a tokenizer (the tokenizer splits up the words in the field and indexes them, so that a search for a particular word, say "ipod", matches the value "iPod 16GB White").
Regarding your own answer, <str name="df">text</str> specifies the default field to search in, i.e., if you just say q=iPod, Solr looks in this field. The purpose of the field called text is to hold the contents of all the other fields in the document, so that you can search this one field and know that a match in any of the fields will be found, without having to name a specific field when you don't know which one holds the value.
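To make the string-versus-text difference concrete, here is a toy model in Python. It is not how Solr is implemented: a real "text" fieldType runs a configurable analyzer chain, while this sketch only lowercases and splits on whitespace. Both function names are made up.

```python
def matches_string_field(stored, query):
    """StrField-like behaviour: the whole value must match exactly."""
    return stored == query


def matches_text_field(stored, query):
    """Text-like behaviour: lowercase, split into words, match any one word."""
    tokens = stored.lower().split()
    return query.lower() in tokens


value = "iPod 16GB White"
print(matches_string_field(value, "ipod"))             # False: not the full value
print(matches_string_field(value, "iPod 16GB White"))  # True: exact match
print(matches_text_field(value, "ipod"))               # True: case ignored, one token matches
```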

Solr query must match all words/tokens in a field

I have a text-field called name in my schema.xml. A query q=name:(organic) returns the following documents:
<doc>
<str name="id">ontology.category.1483</str>
<str name="name">Organic Products</str>
</doc>
<doc>
<str name="id">ontology.keyword.4896</str>
<str name="name">Organic Stores</str>
</doc>
This is perfectly right in a normal Solr Search, however I would like to construct the query so that it doesn't return anything because 'organic' only matches 1 of the 2 words available in the field.
A better way to say it could be this: Only return results if all tokens in the field are matched. So if there are two words (tokens) in a field and I only match 1 ('organic', 'organics','organ' etc.) I shouldn't get a match because only 50% of the field has been searched on.
Is this possible in Solr? How do I construct the query?
You are probably using StandardTokenizerFactory (or something similar). One solution is to use KeywordTokenizerFactory and issue a phrase query; then only exact matches will work. Remember to add any other filters you might want (like LowerCaseFilterFactory). Note that "stores organic" will not match your doc either.
Due to time constraints, I had to resort to the following (hacky) solution.
I added the term count to the index via a DynamicField field called tc_i.
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
Now at query time I count the terms and append it to the query, so q=name:(organic) becomes q=name:(organic) AND tc_i:(1) and this won't return documents for "organic stores" / "organic products" obviously because their tc_i fields are set at 2 (two words).
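The query-time half of this could look roughly like the sketch below. It assumes terms are counted by splitting on whitespace, which only works if the tc_i value was computed the same way at index time; the function names are made up.

```python
def term_count(text):
    """Count whitespace-separated terms; use the same rule when indexing tc_i."""
    return len(text.split())


def with_term_count(field, user_input):
    """Append a tc_i clause so only fields with the same number of terms match."""
    terms = " ".join(user_input.split())
    return "{0}:({1}) AND tc_i:({2})".format(field, terms, term_count(user_input))


print(with_term_count("name", "organic"))         # name:(organic) AND tc_i:(1)
print(with_term_count("name", "organic stores"))  # name:(organic stores) AND tc_i:(2)
print(term_count("Organic Products"))             # 2, the value stored in tc_i
```

Note that with a default operator of OR, a two-word query would still need every word to match for the result to be meaningful, so this trick is most reliable combined with AND.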

Querying Solr without specifying field names

I'm new to using Solr, and I must be missing something.
I didn't touch much in the example schema yet, and I imported some sample data. I also set up LocalSolr, and that seems to be working well.
My issue is just with querying Solr in general. I have a document where the name field is set to tom. I keep looking at the config files, and I just can't figure out where I'm going awry. A bunch of fields are indexed and stored, and I can see the values in the admin, but I can't get querying to work properly. I've tried various queries (http://server.com/solr/select/?q=value), and here are the results:
**Query:** ?q=tom
**Result:** No results
**Query:** ?q=*:*
**Result:** 10 docs returned
**Query:** ?q=*:tom
**Result:** No results
**Query:** ?q=name:tom
**Result:** 1 result (the doc with name : tom)
I want to get the first case (?q=tom) working. Any input on what might be going wrong, and how I can correct it, would be appreciated.
Set <defaultSearchField> to name in your schema.xml.
The <defaultSearchField> is used by Solr when parsing queries to identify which field name should be searched in queries where an explicit field name has not been used.
You might also want to check out (e)dismax instead.
I just came across a similar problem. Namely, I have defined multiple fields (that did not exist in schema.xml) to describe my documents, and want to search on several fields of the document, not only one of them (like "name" in the example above).
In order to achieve this, I created a new field ("compoundfield") into which I copy my own fields with copyField (just like the "text" field in the schema.xml that ships with the Solr distribution). This results in something like this:
compoundfield definition:
<field name="compoundfield" type="text_general" indexed="true" stored="false" multiValued="true"/>
defaultSearchField:
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>compoundfield</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
<!-- copyField commands copy one field to another at the time a document
is added to the index. It's used either to index the same field differently,
or to add multiple fields to the same field for easier/faster searching. -->
<!-- ADDED Fields -->
<copyField source="field1" dest="compoundfield"/>
<copyField source="field2" dest="compoundfield"/>
<copyField source="field3" dest="compoundfield"/>
This works fine for me, but I am not sure if this is the best way to make such a "multiple field" search...
Cheers!
It seems that a DisMax parser is the right thing to use for this end.
Related stackoverflow thread here.
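A request using the extended DisMax parser might be built like this. defType and qf are the standard parameters for selecting the parser and listing the fields to search; the host is the placeholder from the question, the field list reuses names from the posts above, and the helper name edismax_url is made up.

```python
from urllib.parse import urlencode


def edismax_url(base, text, fields):
    """Build a select URL that searches `text` across several fields."""
    params = {
        "q": text,
        "defType": "edismax",    # select the extended DisMax query parser
        "qf": " ".join(fields),  # fields to search, space separated
    }
    return base + "/select?" + urlencode(params)


print(edismax_url("http://server.com/solr", "tom", ["name", "field1", "field2"]))
```

With this approach ?q=tom can match in any of the listed fields without a compound copyField target.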
The <defaultSearchField> solution is deprecated in newer versions of Lucene/Solr. To change the default search field, either pass the df parameter or change the field that is set in:
<initParams
path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">default_field</str>
</lst>
</initParams>
inside solrconfig.xml.
Note: I am using a non-managed schema and Solr 7.0.0 at the time of writing.
Going through the solr tutorial is definitely worth your time:
http://lucene.apache.org/solr/tutorial.html
My guess is that the "name" field is not indexed, so you can't search on it. You'd need to change your schema to make it indexed.
Also make sure that your XML actually lines up with the schema. So if you are adding a field named "name" in the xml, but the schema doesn't know about it, then Solr will just ignore that field (ie it won't be "stored" or "indexed").
Good luck
Well, although setting a default search field is quite useful, I don't understand why you don't just use the Solr query syntax:
......./?q=name:tom
or
......./?q=*:*&fq=name:tom

Resources