Solr Authentication - solr

I have my Solr 4.3 instance running on a tomcat server with Nutch crawling my local filesystem and Solr storing the indexes.
When a user searches, I need Solr to filter out certain docs based on the type of user.
Say I have a directory structure like so:
dir1
|------dir_userA----files
|------dir_userB----files
|------Public-------files
So I only want the search to return results from directories that a particular user has access to.
Is this possible?

Solr does not have document-level security, so you would have to retrieve and index access control lists for each document. Then you need to apply a filter query to every search and pass in the user's security group and/or username.
Let's say your document is indexed like this, where the values for the multivalued field "access" is determined at index time by the actual permissions on the file:
<doc>
<field name="id">42</field>
<field name="name">Products.xlsx</field>
<field name="title">Product list</field>
<field name="content">...</field>
<field name="access">COMPANY\Marketing</field>
<field name="access">COMPANY\CustomerService</field>
</doc>
Then you can decorate the query request handler with a default filter query parameter in solrconfig.xml:
<requestHandler name="/select" class="solr.SearchHandler">
<defaults>
<str name="fq">access:"COMPANY\Everyone"</str>
</default>
</requestHandler>
Now searches by default will not return the Products.xlsx document, since the default 'user' that is impersonated (namely "COMPANT\Everyone") does not appear in the "access" field. But as soon as you pass in an actual user's group to override the default filter query, Solr will return the document:
/solr/collection1/select?q=content:"product x"&fq=access:"COMPANY\Marketing"
Of course when the permissions change, the index must be updated as soon as possible to reflect this.

Related

Working with Highlights on Solr 6.4.1

I am running Solr 6.4.1 on a Windows 7 machine, with Chrome for testing query URLs currently.
I have set up and got working an index on a set of test documents - a small number of of webpages saved as Docx files in a folder. I can get basic queries working and am now trying to get highlighting working.
I have not modified the schema in any way - simply indexed the folder into a Core called test.
The following query and highlights as I expect:
http://localhost:8983/solr/test/select?hl=on&hl.fl=meta_author&q=steven&wt=xml&fl=meta_author
and returns
...<lst name="highlighting">
<lst name="C:\Users\steven\Documents\Indexing\Dungeon Arena Building.docx">
<arr name="meta_author">
<str><em>steven</em></str>
</arr>
</lst>...
However if I change the fields try and highlight where the term is found in the name of the document it does not work in this way.
http://localhost:8983/solr/test/select?hl=on&hl.fl=dc_title&q=gothic&wt=xml&fl=dc_title
returns
...<lst name="highlighting">
<lst name="C:\Users\steven\Documents\Indexing\Basic Gothic Dungeon.docx"/>
<lst name="C:\Users\steven\Documents\Indexing\Dungeon Arena Building.docx"/>
</lst>...
The results are correct but it does not highlight the identified data fields.
Are there some rules around the available fields that can be highlighted or do I need to amend something in the schema?
For context I aim to bring over all the file content into the index so that I can then present back the match in context of the surrounding text for the users to see.
check whether the field is stored for dc_title .
In your schema your field should look like(field type can be different, as you defined, but set stored=true), after modification, reindex doc and search again.
<field name="dc_title" type="text_general" indexed="true" stored="true"/>

Solr click scoring implementation

after searching and searching over the net, i've found a possible open-source solution for the click-count-popularity in solr (=does not require a payd version of lucid work search).
In my next two answers i will try to solve the problem in a easy way and in a way a little bit complex...
But first some pre-requisites.
We suppose to google-like scenario:
1. the user will introduce some terms in a textfield and push the search button
2. the system (a custom web-app coupled with solr) will produce a web page with results that are clickable
3. the user will select one of the results (e.g. to access to the details) and will inform the system to change the 'popularity' of the selected result
The very easy way.
We define a field called 'popularity' in solr schema.xml
<field name="popularity" type="long" indexed="true" stored="true"/>
We suppose the user will click on the document with id 1234, so we (=the webapp) have to call solr to update the popularity field of the document with id 1234 using the url
http://mysolrappserver/solr/update?commit=true
and posting in the body
<add>
<doc>
<field name="id">**1234**</field>
<field name="popularity" update="inc">1</field>
</doc>
</add>
So, each time the webapp will query something to solr (combining/ordering the solr 'boost' field with our custom 'popularity' field) we will obtain a list ordered also by popularity
The more complex idea is to update the solr index tracing not only the user selection but also the search terms used to obtain the list.
First of all we have to define a history field where to store the search terms used:
<field name="searchHistory" type="text_general" stored="true" indexed="true" multiValued="true"/>
Then we suppose the user searched 'something' and selected from the result list the document with id 1234. The webapp will call the solr instance at the url
http://mysolrappserver/solr/update?commit=true
adding a new value to the field searchHistory
<add>
<doc>
<field name="id">**1234**</field>
<field name="searchHistory" update="add">**something**</field>
</doc>
</add>
finally, using the solr termfreq function in every following query we will obtain a 'score' that combined with 'boost' field can produce a sorted list based of click-count-popularity (and the history of search terms).
This is interesting approach however I see some disadvantages in it:
Overall items storage will grow dramatically with each and every search.
You're assuming that choosing specific item is 100% correct and it wasn't done by mistake or for brief only. In this way you might get wrong search results along the way.
I suggest only to increment the counter or even to maintain relative counter based on the other results that the user didn't click it.

Solr Search not working after dataimport successful

I am new in Solr. I have tried DataImport using a Oracle Database. The data gets successfully imported. When I try to search with query:
qt=standard
q=*
I get good results. But when I do a specific search, the results are empty showing no documents. The logger is empty and there are NO errors displayed.
Ok! I got it.
I observed that when I am using some pre-defined fields of schema.xml, the search on those fields are working fine. But when I defined some fields of my own, the result was still NOTHING.
Then I looked into "solr-config.xml's" "/select" request handler. There is a line
<str name="df">text</str>
which says that "txt" is the only field which is searchable. But then how does it searches the other fields?
Answer lies in "schema.xml's"
"<copyField>"
tag. The fields present by default are copied into "text" which makes them searchable. Hence if you want your defined field as searchable, just define your field and add it in copyField tag. ;)
TLDR Version: Define your fields as type="text" to start off. If you have a field called "product", add <field name="product" type="text" indexed="true" stored="true" /> to the default schema.xml inside the <fields> tag and you should be done. To search using the select request-handler, use q=<field_name>:<text_to_look_for> or q=*:* to show all documents.
There are a few mistakes you're making here. I'll be explaining using the 'select' request handler.
The format for a query is ?q=<field_name>:<text_to_look_for>. So if you want to return all the values matching all the fields, you'd say q=*:*
And if you were to look for the word "iPod" in the field "product" your query would be q=product:iPod
Another thing to keep in mind is that if in schema.xml, say if you specify the field product as type="string" which maps to class="solr.StrField", the query (<text_to_look_for>) should precisely match the value in the index, since Solr doesn't tokenize the StrField by default, i.e., ipod will not return results if your index holds it as iPod. If you need it to return it still, you could use the type="text" in schema.xml (the fieldType definition is present already in the default schema.xml.) The "text" fieldType has several analyzers(one analyzer ignores case) and tokenizers(tokenizer splits up the words in the field and indexes them so that if you search for a particular word, say "ipod", it would match the value "iPod 16GB White").
Regarding your own answer, the <str name="df">text</str> specifies the default field to search in, i.e, if you just said q=iPod, it would look in this field. The objective of this field called text is to hold all the other fields in the document, so that you could just search in this field and know that some or the other field in this document would match your query, thereby you wouldn't need to search in a specific field if you don't know what field you're expecting the value to be in.

Searching multi-valued fields in the same position

Let's say we have a document like this:
<arr name="pvt_rate_type">
<str>CORPORATE</str>
<str>AGENCY</str>
</arr>
<arr name="pvt_rate_set_id">
<str>1</str>
<str>2</str>
</arr>
Now I do a search where I want to return the document only if it contains pvt_rate_set_id = 1 AND pvt_rate_type = AGENCY in the same position in their mutli-valued fields so the above document should NOT be returned (because pvt_rate_set_id 1 has a pvt_rate_type of CORPORATE)
Is this possible at all in SOLR ? or is my schema badly designed ? how else would you design tat schema to allow for the searching I want?
This may not be available Out of the Box.
You would need to modify the schema to have fields with pvt rate type as field name and id as its value
e.g.
CORPORATE=1
AGENCY=2
This can be achieved by having dynamic fields defined.
e.g.
<dynamicField name="*_pvt_rate_type" type="string" indexed="true" stored="true"/>
So you can input data as corporate_pvt_rate_type or agency_pvt_rate_type with the respective values.
The filter queries will be able to match the exact mappings fq=corporate_pvt_rate_type:1
Unfortunately Solr does not seem to support this.
Another way to do this in Solr would be to store a concatenated string field type_and_id with a delimiter (say comma) separating the type and the id and query like:
q=type_and_id:AGENCY%2C1
(where %2C is the URL encoding for comma).

Is it necessary for a unique key to be a uuid in Solr?

Can a unique key in Solr/Lucene schema be text_general instead? I have tried that but Solr doesn't overwrite the data, it simply adds another row hence duplicating the data.
I have commented out following from solrconfig.xml
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
My schema.xml has
<uniqueKey>_id</uniqueKey>
<field name="_id" type="text_general" indexed="true" stored="true" default="NEW"/>
Any help would be greatly appreciated.
You can use whatever type you want for the uniqueKey field. As you can read from the documentation:
The declaration can be used to inform Solr that there is a
field in your index which should be unique for all documents. If a
document is added that contains the same value for this field as an
existing document, the old document will be deleted.
It is not mandatory for a schema to have a uniqueKey field.
Note that if you have enabled the QueryElevationComponent in
solrconfig.xml it requires the schema to have a uniqueKey of type
StrField. It cannot be, for example, an int field.
What's important is that your uniqueKey field is unique, meaning that the same document has the same identifier. Only that way the replace if existing mechanism can work. Using an uuid field type you'd never replace a document because you would have a different id for each one, automatically.

Resources