Can we use regular expressions in the Solr elevate.xml configuration?

We are using the elevate.xml below to get results in the desired order in our Solr configuration.
<query text="hotels">
  <doc id="14421"/>
</query>
<query text="Hotels">
  <doc id="14421"/>
</query>
Now we have a requirement with a list of keywords (more than 50 words). If I hardcode all of these in elevate.xml I can fulfill the requirement, but I want to know whether there is a better approach, such as configuring a regular expression or some other way.

You could try two options:
putting the elevated IDs in the request at query time (see the elevateIds parameter of the Query Elevation Component)
using a plugin that generates elevate.xml from a database which you maintain
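Since elevate.xml is plain XML, a middle ground is to generate it from the keyword list instead of hardcoding 50+ entries by hand. A minimal sketch in Python; the keyword-to-doc-ID mapping below is made up, and in practice it would come from your database or a config file:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of query keywords to the doc IDs to elevate.
KEYWORDS = {
    "hotels": ["14421"],
    "resorts": ["14421", "15872"],
}

def build_elevate_xml(keywords):
    root = ET.Element("elevate")
    for text, doc_ids in keywords.items():
        # Emit one <query> per case variant so "hotels" and "Hotels" both match,
        # mirroring the duplicated entries in the original file.
        for variant in {text.lower(), text.capitalize()}:
            query = ET.SubElement(root, "query", text=variant)
            for doc_id in doc_ids:
                ET.SubElement(query, "doc", id=doc_id)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(build_elevate_xml(KEYWORDS))
```

Depending on how the component's queryFieldType analyzes the incoming query, a single lowercase entry per keyword may already be enough and the case-variant loop could be dropped.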

Related

Solr Result Ranking modifications to a particular query

I am a newbie to Solr and I am trying to build a simple search solution for my website. I am trying to build a feature similar to Swiftype's result ranking feature.
For example, let's say "ice" returns:
Ice
Iceberg
Ice cream
...and so on.
If I want to rank "Ice cream" higher only for the query "ice", but when I search for other terms like "Iceland" the default ranking should be maintained, how can I achieve that?
I am using Solr v7.5.
Thanks in advance.
The Query Elevation Component lets you configure the top results for a given query regardless of the normal Lucene scoring.
To use it, you need to configure an elevate.xml file which looks like this:
<elevate>
  <query text="ice">
    <doc id="1" /> <!-- id=1 is your "Ice cream" document -->
    <doc id="2" />
    <doc id="3" />
  </query>
</elevate>
Later, during searches, you only need to enable it by specifying the HTTP parameter enableElevation=true.
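As a rough illustration of such a request (the host, core name, and query below are placeholders; forceElevation additionally keeps elevated documents on top even when sorting by score):

```python
from urllib.parse import urlencode

# Placeholder query parameters; enableElevation switches the component on,
# forceElevation pins the elevated docs to the top of the results.
params = {
    "q": "ice",
    "enableElevation": "true",
    "forceElevation": "true",
}
url = "http://localhost:8983/solr/your-core/select?" + urlencode(params)
print(url)
```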

Synonyms in Solr Query Results

Whenever I query a string in Solr, I want to get the synonyms of the field value, if any exist, as part of the query result. Is it possible to do that in Solr?
There is no direct way to fetch the synonyms used in the search results. You can get close by looking at how Solr parsed your query via the debugQuery=true parameter and inspecting the parsedQuery value in the response, but it is not straightforward. For example, if you search for "tv" on a text field that uses synonyms you will get something like this:
$ curl "localhost:8983/solr/your-core/select?q=tv&debugQuery=true"
{
  ...
  "parsedquery": "SynonymQuery(Synonym(_text_:television _text_:televisions _text_:tv _text_:tvs))",
  ...
}
Another approach would be to load the synonyms.txt file that Solr uses into your application and do the mapping yourself. Again, not straightforward.
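A minimal sketch of that second approach in Python, assuming the standard synonyms.txt format (comma-separated equivalence sets, => for one-way mappings, # for comments):

```python
def parse_synonyms(lines):
    """Build a term -> set-of-synonyms map from synonyms.txt-style lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # One-way mapping: every left-hand term maps to the right-hand terms.
            left, right = line.split("=>", 1)
            targets = {t.strip() for t in right.split(",")}
            for term in left.split(","):
                mapping.setdefault(term.strip(), set()).update(targets)
        else:
            # Equivalence set: every term maps to all the others.
            terms = [t.strip() for t in line.split(",")]
            for term in terms:
                mapping.setdefault(term, set()).update(t for t in terms if t != term)
    return mapping

SAMPLE = [
    "# sample synonyms file",
    "tv, television, televisions",
    "i-pod => ipod",
]
synonyms = parse_synonyms(SAMPLE)
print(synonyms["tv"])  # e.g. {'television', 'televisions'}
```

Note this only covers the plain file syntax; multi-word synonyms and analyzer-side expansion behave differently inside Solr itself.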

How to search in Clusterpoint DB for full text match only?

I am executing Search query
<query>
  <edu>college</edu>
  <studies>mathematics</studies>
</query>
The result is returned even for documents where the value of <studies> is "mathematics and statistics", "geometry in mathematics", etc. But my purpose is to get documents where the <studies> tag contains the value "mathematics" and nothing else.
Is there a possibility to search by exact text value?
For this purpose you should use the exact-match operator =="" and the rule "exact-match=binary" for the appropriate tag in the policy configuration. Your query could look like:
<query>
  <edu>college</edu>
  <studies>=="mathematics"</studies>
</query>
This will return results where "studies" tag contains only "mathematics" as value and no other words.
Check out their docs. Exact match clause should be the one (I believe): http://docs.clusterpoint.com/wiki/Search_query_syntax#Exact_match

Merging Solr query results through SolrNet

I'm using Solr v3.6.1 and have successfully managed to index data, as well as using Apache Tika to index binary items. I'm using SolrNet to pull this data out. However, I have an issue whereby I want to link two results together.
Now consider the following XML (this is just for illustration purposes):
<doc>
  <id>263</id>
  <title>This is the title</title>
  <summary>This is the summary</summary>
  <binary_id>994832</binary_id>
</doc>
<doc>
  <id>994832</id>
  <title>This is the title</title>
  <summary>This is the summary</summary>
  <text>this is the contents of the binary</text>
</doc>
Is it possible (through SolrNet) to merge the two above results together so when a user searches for This is the contents of the binary it also returns the data in the first item?
In my example you can see the first item contains the id of the binary (994832) so my initial thoughts are that I need to do 2 queries and somehow merge them?
Not really sure about how to do this so any help would be greatly appreciated.
You can try to do something with a join-style query, but beware of the performance impact. Here is my post from some time ago where I was trying to do something similar.
solr grouping based on all index values
Alternatively, a better solution, if and only if you can massage the data a bit before indexing, would be to assign the same ID to all documents that need to be retrieved as a group. Per your example, this would mean adding a binary_id field to the second doc and assigning the value 994832 to it. You would then be able to very cleanly use Solr grouping to group the items as one, and group sorting to return only the item that you want.
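To illustrate the grouping idea with made-up documents (in Solr itself you would request group=true&group.field=binary_id rather than grouping client-side; this sketch only shows the shape of the result):

```python
from collections import defaultdict

# Made-up documents; both carry the same binary_id so they form one group.
docs = [
    {"id": "263", "title": "This is the title", "binary_id": "994832"},
    {"id": "994832", "text": "this is the contents of the binary", "binary_id": "994832"},
]

def group_by(docs, field):
    """Collect documents into lists keyed by the given field's value."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    return dict(groups)

groups = group_by(docs, "binary_id")
print(groups["994832"][0]["id"])  # → 263
```

With group sorting on the Solr side you would then keep only the representative document of each group, which is what the answer above suggests.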

How to update Solr documents on the Solr server side with custom handler / plugin

I have a core with millions of records.
I want to add a custom handler which scans the existing documents and updates one of the fields based on a condition (age > 12, for example).
I prefer doing it on the Solr server side to avoid sending millions of documents to the client and back.
I was thinking of writing a solr plugin which will receive a query and update some fields on the query documents (like the delete by query handler).
I was wondering whether there are existing solutions or better alternatives.
I was searching the web for a while and couldn't find examples of Solr plugins which update documents (I don't need to extend the update handler).
I've written a plugin which uses the following code; it works fine but isn't as fast as I need.
Currently I do:
AddUpdateCommand addUpdateCommand = new AddUpdateCommand(solrQueryRequest);
DocIterator iterator = docList.iterator();
SolrIndexSearcher indexSearcher = solrQueryRequest.getSearcher();
while (iterator.hasNext()) {
    Document document = indexSearcher.doc(iterator.nextDoc());
    SolrInputDocument solrInputDocument = new SolrInputDocument();
    addUpdateCommand.clear();
    addUpdateCommand.solrDoc = solrInputDocument;
    addUpdateCommand.solrDoc.setField("id", document.get("id"));
    addUpdateCommand.solrDoc.setField("my_updated_field", new_value);
    updateRequestProcessor.processAdd(addUpdateCommand);
}
But this is very expensive, since the update handler will fetch the document again, which I already hold at hand.
Is there a safe way to update the Lucene document and write it back while taking into account all the Solr-related machinery such as caches, extra Solr logic, etc.?
I was thinking of converting it to a SolrInputDocument and then just adding the document through Solr, but first I would need to convert all the fields.
Thanks in advance,
Avner
I'm not sure whether the following is going to improve the performance, but thought it might help you.
Look at SolrEntityProcessor
Its description sounds very relevant to what you are searching for.
This EntityProcessor imports data from different Solr instances and cores.
The data is retrieved based on a specified (filter) query.
This EntityProcessor is useful in cases you want to copy your Solr index
and slightly want to modify the data in the target index.
In some cases Solr might be the only place where all data is available.
However, I couldn't find an out-of-the-box way to embed your logic, so you may have to extend the following class:
SolrEntityProcessor (see the link to the source code)
You probably know this already, but a couple of other points:
1) Make the entire process exploit all the CPU cores available; make it multi-threaded.
2) Use the latest version of Solr.
3) Experiment with two Solr apps on different machines with minimal network delay. This is a tough call: same machine with two processes vs. two machines with more cores but network overhead.
4) Tweak the Solr cache in a way that suits your use case and particular implementation.
5) A couple more resources: Solr Performance Problems and SolrPerformanceFactors
Hope it helps. Let me know the stats regardless; I'm curious, and your info might help somebody later.
To point out where to put custom logic, I would suggest having a look at the SolrEntityProcessor in conjunction with Solr's ScriptTransformer.
The ScriptTransformer lets you process each entity after it is extracted from the source of a dataimport, manipulate it, and add custom field values before the new entity is written to Solr.
A sample data-config.xml could look like this:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <script>
    <![CDATA[
      function calculateValue(row) {
        row.put("CALCULATED_FIELD", "The age is: " + row.get("age"));
        return row;
      }
    ]]>
  </script>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://localhost:8080/solr/your-core-name"
            query="*:*"
            wt="javabin"
            transformer="script:calculateValue">
      <field column="ID" name="id" />
      <field column="AGE" name="age" />
      <field column="CALCULATED_FIELD" name="update_field" />
    </entity>
  </document>
</dataConfig>
As you can see, you may perform any data transformation you like that is expressible in JavaScript, so this is a good place to implement your logic and transformations.
You say one constraint may be age > 12. I would handle this via the query attribute of the SolrEntityProcessor. You could write query=age:{12 TO *] so that only records with an age greater than 12 would be read for the update.
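As an aside on the original goal of updating a single field in place: assuming Solr 4.0 or later and a schema where all fields are stored (both assumptions, neither stated in the question), atomic updates let the client send only the document id plus the fields to change, and Solr reconstructs the rest from the stored fields. A rough sketch of the JSON payload:

```python
import json

# Hypothetical atomic-update payload: the "set" modifier tells Solr to replace
# the field value; only "id" and the changed fields travel over the wire.
doc = {"id": "263", "my_updated_field": {"set": "new_value"}}
payload = json.dumps([doc])
print(payload)
```

The payload would be POSTed to the core's /update endpoint; whether this beats a server-side plugin depends on how many documents match the condition.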
