I have a scenario where I have a product database in Solr and a brand database in MySQL. In the Solr product database I have a field named brandid that refers to the MySQL primary key from the brand database. Now I would like to join the brand database for each Solr search query and group the result separately from the product results. I thought about a second Solr core where I store the brand data and join it on every query, but I would like to have each brand only once, not merged with the product results in the same result set. A facet-style result for the brands is my goal. Does anyone have a pointer on how I could achieve this kind of result in my XML/JSON?
The result set as I would like to have it, in pseudo Solr code:
<results>
<products>
<product>
...
</product>
<product>
...
</product>
<product>
...
</product>
<product>
...
</product>
</products>
<brands>
<brand>
...
</brand>
<brand>
...
</brand>
</brands>
</results>
If you only need to serve additional fields from the brand database and you do not need to search/filter on them, then you could apply simple faceting on brandid and populate the presentation fields in a post-processing step directly from the DB, a memory cache, or a key-value store,
and use facet.mincount=1 to eliminate brands without any products in the current query.
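As a sketch, the request for such a facet could look like this (the brandid field is taken from the question; the host, core name, and query are assumptions):

```
http://localhost:8983/solr/products/select
  ?q=<your search terms>
  &facet=true
  &facet.field=brandid
  &facet.mincount=1
```

The facet_counts section of the response then lists each brandid once with its product count, which the post-processing step can resolve to full brand records.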
Can you use a higher-level language?
I currently do something similar, but I use Java as the glue. The Java application takes in requests, queries Solr using SolrJ, retrieves all the results including the facets, takes that response and queries MySQL for additional information, merges all the data in the Java layer, and then constructs the XML/JSON response.
Client libraries for other higher-level languages are also offered: Ruby, PHP, Java, Scala, Python, .NET, Perl, JavaScript.
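The merge step described above can be sketched in plain Java; the data and names here are hypothetical stand-ins for the SolrJ facet response and the MySQL brand rows:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BrandMerge {

    // Builds the separate "brands" section: one entry per brand with at
    // least one matching product, enriched with the brand name from the DB.
    static List<Map<String, Object>> mergeBrands(Map<String, Long> facetCounts,
                                                 Map<String, String> brandNamesFromDb) {
        List<Map<String, Object>> brands = new ArrayList<>();
        for (Map.Entry<String, Long> e : facetCounts.entrySet()) {
            if (e.getValue() == 0) continue; // same effect as facet.mincount=1
            Map<String, Object> brand = new LinkedHashMap<>();
            brand.put("brandid", e.getKey());
            brand.put("name", brandNamesFromDb.get(e.getKey()));
            brand.put("productCount", e.getValue());
            brands.add(brand);
        }
        return brands;
    }

    public static void main(String[] args) {
        // Hypothetical facet counts (brandid -> product count) and DB rows.
        Map<String, Long> facets = new LinkedHashMap<>();
        facets.put("1", 4L);
        facets.put("2", 2L);
        facets.put("3", 0L);
        Map<String, String> db = Map.of("1", "Acme", "2", "Globex", "3", "Initech");
        System.out.println(mergeBrands(facets, db));
    }
}
```

The real code would fill facetCounts from the SolrJ QueryResponse and brandNamesFromDb from a JDBC query or cache; the shape of the merge stays the same.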
I am a newbie to Solr and I am trying to build a simple search solution for my website, with a feature similar to Swiftype's result ranking.
For example, let's say "ice" returns
Ice
Iceberg
Ice cream
...and so on.
If I want to rank "Ice cream" higher only for the query "ice", while for other search terms like "Iceland" the default ranking should be maintained, how can I achieve that?
I am using Solr v7.5.
Thanks in advance...
The Query Elevation Component lets you configure the top results for a given query regardless of the normal Lucene scoring.
To use it, you will need to configure an elevate.xml file, which looks like this:
<elevate>
<query text="ice">
<doc id="1" /> <!-- where id=1 is your "Ice cream" document -->
<doc id="2" />
<doc id="3" />
</query>
</elevate>
Later, during searches, you only need to enable it by specifying the HTTP parameter enableElevation=true.
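For completeness, the component also has to be wired up in solrconfig.xml before elevate.xml is picked up. A typical registration looks roughly like this (handler and component names may differ in your setup):

```xml
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <!-- field type used to analyze the incoming query text -->
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<requestHandler name="/elevate" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>
```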
We are using the elevate.xml below in our Solr configuration to get the desired results in order.
<query text="hotels">
<doc id="14421"/>
</query>
<query text="Hotels">
<doc id="14421"/>
</query>
Now we have a requirement with a list of keywords (more than 50 words). If I hardcode all of these in elevate.xml I can fulfill the requirement, but I want to know whether there is a better approach, like configuring a regular expression or some other way.
You could try two options:
pass the elevated ids in the request at query time, see here
this person created a plugin to generate the elevate.xml from a DB which you maintain, see here
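For the first option, newer Solr versions accept the ids directly as request parameters, so the hotels example could be issued at query time roughly like this (the id is from the question; this assumes the elevator component is enabled on the handler):

```
/select?q=hotels&enableElevation=true&elevateIds=14421
```

There is also a matching excludeIds parameter for demoting documents the same way, which avoids maintaining long lists in elevate.xml.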
An XML document has two sets of similar tags with different data:
<address>
<door_num>100</door_num>
<street>hundred street</street>
<city>XYZ</city>
</address>
<address>
<door_num>200</door_num>
<street>two hundred street</street>
<city>ABC</city>
<active>1</active>
</address>
What is the best way to index this? A search by door_num 100 and city XYZ must return the document, whereas a search by door_num 100 and city ABC must not return any document. Storing the values as multivalued fields does not help here. Also note that the second address with door_num 200 may or may not be present in the XML. Please suggest.
Model this data as nested documents: the address info would be stored in nested docs, and then you can query them so that both door_num and city need to match in the same nested doc.
Regarding how to actually get them into the index, you have several options:
write some Java (or Groovy or any other JVM language) code with SolrJ, build your docs on the client side, and index them.
if you don't like Java, you can still write code in any other language on the client side, build your docs as XML/JSON that Solr can ingest, and index them.
if you don't want to write any code at all, try DIH with the XPathEntityProcessor; you might achieve all you need.
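As a sketch of the nested-document approach (field names are taken from the question; the content_type discriminator field and ids are assumptions), the indexed parent with its address children could look like this:

```xml
<add>
  <doc>
    <field name="id">person-1</field>
    <field name="content_type">parent</field>
    <doc>
      <field name="id">person-1-addr-1</field>
      <field name="door_num">100</field>
      <field name="street">hundred street</field>
      <field name="city">XYZ</field>
    </doc>
    <doc>
      <field name="id">person-1-addr-2</field>
      <field name="door_num">200</field>
      <field name="street">two hundred street</field>
      <field name="city">ABC</field>
      <field name="active">1</field>
    </doc>
  </doc>
</add>
```

A block join query such as q={!parent which="content_type:parent"}(door_num:100 AND city:XYZ) then returns the parent, while (door_num:100 AND city:ABC) matches no single child and therefore returns nothing.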
I just started learning Solr, and the reason is that I want to run advanced search queries (something considered simple in SQL) on a large amount of data. From what I have read so far (using Solarium) I can index, update, select and delete, but only on a single kind of data (relation/table). What I would like to do is perform operations between tables (like SQL would, in its own way). Here is an example scenario of what I could be working on.
Here are samples of data based on the relation above.
<root>
<!-- ID for Solr -->
<id>some_id</id>
<table>house</table>
<house_id>1</house_id>
<house_name>Gryffindor</house_name>
</root>
<root>
<!-- ID for Solr -->
<id>some_other_id</id>
<table>student</table>
<student_id>1</student_id>
<firstname>Albus</firstname>
<lastname>Dumbledore</lastname>
<house_id>1</house_id>
</root>
<root>
<!-- ID for Solr -->
<id>some_different_id</id>
<table>battle</table>
<student_id_1>1</student_id_1>
<student_id_2>3</student_id_2>
</root>
An example search query would be "full names of students from different houses who fought each other, and the name of their respective houses".
In SQL I would do something like:
SELECT studA.firstname, studA.lastname, housA.house_name,
       studB.firstname, studB.lastname, housB.house_name
FROM battles
JOIN students studA ON studA.student_id = battles.student_id_1
JOIN students studB ON studB.student_id = battles.student_id_2
JOIN houses housA ON housA.house_id = studA.house_id
JOIN houses housB ON housB.house_id = studB.house_id
WHERE housA.house_id <> housB.house_id;
And the solution would be every field (all three tables) for Dumbledore vs Snape and Potter vs Who.
Can it be done with Solr?
You have to think about Solr backwards, starting from the queries, and then flatten the information to match your needs. In your case it seems the entity in Solr would be a fight, and then you flatten all the other information (house, name, etc.) into that record. That allows you to answer queries like "which house had the most matches", etc.
Solr also supports nested documents, but their use is not quite the same as with database joins and does not seem to match your use case. I am just mentioning it here for you to be aware of it.
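To illustrate the flattening idea, a single fight document carrying the denormalized student and house data could look roughly like this (all field names and the second house value are hypothetical):

```xml
<doc>
  <field name="id">battle-1</field>
  <field name="student_1_fullname">Albus Dumbledore</field>
  <field name="student_1_house">Gryffindor</field>
  <field name="student_2_fullname">Severus Snape</field>
  <field name="student_2_house">Slytherin</field>
</doc>
```

With this shape, "students from different houses who fought" becomes a plain query on one document type, e.g. -student_1_house_matches_student_2_house style filtering done at index time, and facets on the house fields answer the "which house fought most" kind of question.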
I have a core with millions of records.
I want to add a custom handler which scans the existing documents and updates one of the fields based on a condition (age > 12, for example).
I prefer doing it on the Solr server side to avoid sending millions of documents to the client and back.
I was thinking of writing a Solr plugin which receives a query and updates some fields on the matching documents (like the delete-by-query handler).
I was wondering whether there are existing solutions or better alternatives.
I searched the web for a while and couldn't find examples of Solr plugins which update documents (I don't need to extend the update handler).
I've written a plugin which uses the following code; it works fine but isn't as fast as I need.
Currently I do:
// Iterate over the matching documents and re-add each one with the
// updated field set (new_value is the value computed by the condition).
AddUpdateCommand addUpdateCommand = new AddUpdateCommand(solrQueryRequest);
DocIterator iterator = docList.iterator();
SolrIndexSearcher indexSearcher = solrQueryRequest.getSearcher();
while (iterator.hasNext()) {
    Document document = indexSearcher.doc(iterator.nextDoc());
    SolrInputDocument solrInputDocument = new SolrInputDocument();
    addUpdateCommand.clear();
    addUpdateCommand.solrDoc = solrInputDocument;
    addUpdateCommand.solrDoc.setField("id", document.get("id"));
    addUpdateCommand.solrDoc.setField("my_updated_field", new_value);
    updateRequestProcessor.processAdd(addUpdateCommand);
}
But this is very expensive, since the update handler will fetch the document again, although I already hold it at hand.
Is there a safe way to update the Lucene document and write it back while taking into account all the Solr-related machinery such as caches, extra Solr logic, etc.?
I was thinking of converting it to a SolrInputDocument and then just adding the document through Solr, but I would first need to convert all the fields.
Thanks in advance,
Avner
I'm not sure whether the following will improve performance, but it might help.
Look at the SolrEntityProcessor.
Its description sounds very relevant to what you are searching for:
This EntityProcessor imports data from different Solr instances and cores.
The data is retrieved based on a specified (filter) query.
This EntityProcessor is useful in cases you want to copy your Solr index
and slightly want to modify the data in the target index.
In some cases Solr might be the only place where all data is available.
However, I couldn't find an out-of-the-box way to embed your own logic, so you may have to extend the following class:
SolrEntityProcessor (and the link to its source code)
You probably know this already, but a couple of other points:
1) Make the entire process exploit all the available CPU cores: make it multi-threaded.
2) Use the latest version of Solr.
3) Experiment with two Solr apps on different machines with minimal network delay. This is a tough call:
same machine with two processes vs. two machines with more cores but network overhead.
4) Tweak the Solr cache in a way that suits your use case and particular implementation.
5) A couple more resources: Solr Performance Problems and SolrPerformanceFactors.
Hope it helps. Let me know your stats regardless; I'm curious, and your info might help somebody later.
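Point 1 above (making the process multi-threaded) could be sketched in plain Java like this; processing a batch here is a placeholder for the actual per-document update from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpdater {

    // Split the id list into batches of the given size.
    static List<List<Integer>> partition(List<Integer> ids, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    // Processes all batches concurrently on a fixed thread pool and returns
    // the number of documents handled; the submitted task is where the real
    // update (e.g. the processAdd loop above) would run.
    static int processAll(List<Integer> ids, int batchSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<Integer> batch : partition(ids, batchSize)) {
                futures.add(pool.submit(() -> batch.size())); // placeholder work
            }
            int total = 0;
            for (Future<Integer> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) ids.add(i);
        System.out.println(processAll(ids, 100, 4));
    }
}
```

Whether this pays off depends on where the bottleneck is; if indexing itself is the limit, more threads on the read side will not help much.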
To point out where to put custom logic, I would suggest having a look at the SolrEntityProcessor in conjunction with Solr's ScriptTransformer.
The ScriptTransformer allows you to process each entity after it is extracted from the dataimport source, manipulate it, and add custom field values before the new entity is written to Solr.
A sample data-config.xml could look like this:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<script>
<![CDATA[
function calculateValue(row) {
row.put("CALCULATED_FIELD", "The age is: " + row.get("age"));
return row;
}
]]>
</script>
<document>
<entity name="sep" processor="SolrEntityProcessor"
url="http://localhost:8080/solr/your-core-name"
query="*:*"
wt="javabin"
transformer="script:calculateValue">
<field column="ID" name="id" />
<field column="AGE" name="age" />
<field column="CALCULATED_FIELD" name="update_field" />
</entity>
</document>
</dataConfig>
As you can see, you may perform any data transformation you like that is expressible in JavaScript, so this would be a good place to put your logic and transformations.
You say one constraint may be age > 12. I would handle this via the query attribute of the SolrEntityProcessor: you could write query=age:{12 TO *] so that only records with an age greater than 12 are read for the update.