Newbie question so please be nice. :)
Basically we need to implement editorial boosting for a multi-tenant SOLR environment wherein a pre-defined query from a user would always bring a certain set of documents at the top of the results.
A couple of challenges we have include:
Given a single elevate.xml, we cannot indicate that a certain query text is intended for a particular tenant. Despite the existence of the tenantId in the index, there is no indication of that id in the elevate.xml file. We've thought of concatenating the ID to the query text (i.e. ipod_tenantID1) but I suppose the concatenation would not be traceable in the main query 'q'.
We need to make updates to the elevate.xml seamless to the other active tenants. Is it correct that updating elevate.xml would require a SOLR server restart? If yes, is there a way to work around it?
So you are using a single core/collection, and multitenancy is enforced by fq=customer_id:A, right?
What about enforcing multitenancy with one collection per customer instead? That way each tenant can have its own conf (including the elevate stuff).
About your second question, I did not check but probably a reload would be enough. If you go with the proposed solution, other tenants are not disrupted with the reload, as you deal with different collections.
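A minimal sketch of what the one-collection-per-tenant layout could look like from the client side. The base URL, the `tenant_<id>` naming scheme and the use of the `/select` handler (with the elevation component assumed to be attached to it) are illustrative assumptions, not something stated above. The reload URL uses the Collections API `RELOAD` action, which applies to SolrCloud; on a standalone server the CoreAdmin equivalent would be `/admin/cores?action=RELOAD&core=<name>`.

```python
from urllib.parse import urlencode

# Assumed base URL of the Solr server (hypothetical).
SOLR_BASE = "http://localhost:8983/solr"


def tenant_collection(tenant_id: str) -> str:
    """Map a tenant id to its own collection name (hypothetical naming scheme)."""
    return f"tenant_{tenant_id}"


def reload_url(tenant_id: str) -> str:
    """URL that reloads only this tenant's collection, e.g. after editing its elevate.xml."""
    params = {"action": "RELOAD", "name": tenant_collection(tenant_id)}
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def select_url(tenant_id: str, query: str) -> str:
    """URL that runs the user's query against this tenant's collection only."""
    params = {"q": query, "wt": "json"}
    return f"{SOLR_BASE}/{tenant_collection(tenant_id)}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url("A", "ipod"))
    print(reload_url("A"))
```

Because each tenant's elevate.xml lives in its own collection config, the query text "ipod" needs no tenant suffix, and reloading tenant A leaves the other collections alone.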
I am trying to move my website from SQL server to Azure Search (or at least the core searching functionality). I believe I understand how to get most of the functionality rebuilt but I'm stuck on one feature that is key to my site.
I would like to be able to sort the search results based on the weight on any of a fairly large number of tags. By weight, I mean that I maintain a count of the number of users that have tagged a document with a particular tag.
It looks like you can do this in elasticsearch: (Elastic search - tagging strength (nested/child document boosting)). But that uses features of elasticsearch that aren't exposed in Azure Search.
I don't see a way to use scoring profiles (https://msdn.microsoft.com/en-us/library/azure/dn798928.aspx) to do this either.
The only thing I can see that might work in a limited sense is to add a field for each tag that I want to sort on. This might work for my particular case for now, but in the long run I'd like to make this work for user-defined tags.
Is this possible in the broad sense that is outlined in the elastic search case?
I agree that for right now, the best way to do this would be to have a separate field that is periodically updated with the count of the number of users that have tagged a document. Please note that you can be pretty efficient with this update by posting just this numeric value using merge or mergeOrUpload. If you would like to see this feature added to Azure Search, it would be great if you could cast your vote.
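As a rough illustration of the merge approach, the sketch below builds the JSON body for a batch of `merge` actions that touch only the count field. The key field `id` and the count field `tagCount` are hypothetical names; the real ones depend on your index schema. The body would be POSTed to the index's `docs/index` endpoint with your service name, index name, API version and API key filled in.

```python
import json


def build_merge_batch(tag_counts, key_field="id", count_field="tagCount", action="merge"):
    """Build an indexing batch that updates only the count field of each document.

    tag_counts maps document key -> new count. key_field and count_field are
    hypothetical schema names; action may be "merge" or "mergeOrUpload".
    """
    if action not in ("merge", "mergeOrUpload"):
        raise ValueError("action must be 'merge' or 'mergeOrUpload'")
    actions = [
        {"@search.action": action, key_field: doc_key, count_field: count}
        for doc_key, count in tag_counts.items()
    ]
    return {"value": actions}


if __name__ == "__main__":
    batch = build_merge_batch({"doc1": 12, "doc2": 3})
    print(json.dumps(batch, indent=2))
```

With `merge`, fields that are not listed in an action keep their existing values, so the rest of the document does not have to be resent.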
Is there any way, through Elasticsearch or Lucene metadata, to store a count of how many times a particular document has satisfied queries, even if the document was never retrieved for processing?
For example, say you issue a query and get 100 results. You process the first 10 and do not go any further. We would like to flag ALL the documents (100) that satisfied the search criteria for later analysis.
Thanks
Currently, Azure Search does not expose this information (and neither does Elasticsearch or Lucene). However, we're working on building better ranking models, and we're thinking about capturing (and potentially exposing) this type of data.
We'd be very interested in learning more about your scenario. Could you email me at eugenesh at the usual Microsoft domain? Thanks!
I am using Solr to index reports from a DB, and that part works. However, I also need to track user activity so I can report whether a document has been read by the user or not. I am aware that Solr is not built to index or keep track of user activity, but is there a good approach to this?
Any suggestions?
No, as you say, there is no support for this in Solr. From a Solr perspective it's more related to how you build your web application. I would recommend asking yourself this:
When tracking the reading statistics of my users do I need to index that information into Solr too?
The answer to this question depends on whether you need the information to facet, search, or use in the relevance model. Say, for example, you want a facet that allows your users to filter on read or unread documents; then of course you need to index this into Solr.
If you only want to show whether a document has been read (in the web interface), you might as well store this information in a SQL database and fetch it when presenting the results.
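A minimal sketch of the second option, assuming the read state lives in a SQL table outside Solr and is merged into the result list at display time. SQLite stands in for whatever database you use, and the table name `doc_reads`, its columns and the `id` key on each result are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) read-tracking store and create its table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_reads ("
        " user_id TEXT NOT NULL,"
        " doc_id TEXT NOT NULL,"
        " PRIMARY KEY (user_id, doc_id))"
    )
    return conn


def mark_read(conn, user_id, doc_id):
    """Record that a user opened a document; repeated reads are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO doc_reads (user_id, doc_id) VALUES (?, ?)",
        (user_id, doc_id),
    )
    conn.commit()


def annotate_results(conn, user_id, docs):
    """Return copies of the Solr result docs with a 'read' flag for this user."""
    ids = [d["id"] for d in docs]
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT doc_id FROM doc_reads WHERE user_id = ? AND doc_id IN ({placeholders})",
        [user_id, *ids],
    )
    read_ids = {row[0] for row in rows}
    return [{**d, "read": d["id"] in read_ids} for d in docs]


if __name__ == "__main__":
    conn = open_store()
    mark_read(conn, "u1", "r2")
    print(annotate_results(conn, "u1", [{"id": "r1"}, {"id": "r2"}]))
```

Only the ids on the current result page are looked up, so the extra query stays small. If you later need a read/unread facet, this is the data you would have to push into the index instead.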
I am considering using Solr in a multi-tenant application and I am wondering if there are any best practices or things I should watch out for?
One question in particular: would it make sense to have a Solr core per tenant? Are there any issues with having a large number of Solr cores?
I am considering using a core per tenant because I could secure each core separately.
Thanks
Solr Cores are an excellent idea for multitenant, particularly as they can be managed at runtime (so not requiring a server restart). You shouldn't run into too many problems with performance for having multiple Solr cores, but be aware the performance of one core will be impacted by the work on other cores - they're probably going to be sharing the same disk.
I can see why you might want to give direct API access, for example if each 'user' is a Drupal site or similar in a shared-hosting type environment. The best thing would be to secure the different URLs, e.g. if you had /solr/admin/cores, /solr/client1 for a client core, and /solr/client2 for another, you would have three different authentications: one for your admin, and one each for your tenants. This is done in the container (Jetty, Tomcat etc.); take a look at the general Solr Security page: http://wiki.apache.org/solr/SolrSecurity - you'll want to set up a basic access login for each path in the same way.
You would no more use a separate table in a database for each tenant than you would a solr core for each tenant.
If you think of a core like a database table and organize your project in such a way that each core represents an object in your problem space then you can better leverage solr.
Where Solr shines is when you need to index text and then search it quickly. If you are not doing that, you might as well use a relational database.
Also, regarding your question about securing Solr for each tenant: I hope you're not suggesting allowing your logged-in users to access the Solr output directly? Your users should not be able to directly access your Solr instance.
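One way to read this advice is that the application server builds every Solr request itself and adds the tenant restriction on its own, so nothing the user types can widen the search. The sketch below assumes a shared core with a hypothetical `tenant_id` field; the base URL and core name are placeholders as well. It uses Solr's `{!term}` query parser in the filter query so the tenant value is matched as a literal term and needs no escaping.

```python
from urllib.parse import urlencode


def build_select_params(user_query, tenant_id, rows=10):
    """Build request parameters server-side, always forcing the tenant filter.

    tenant_id must come from the authenticated session, never from user input.
    The field name 'tenant_id' is a hypothetical schema field.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required")
    return [
        ("q", user_query),
        # {!term f=...} treats the value as one raw term, so no escaping is needed.
        ("fq", "{!term f=tenant_id}" + tenant_id),
        ("rows", str(rows)),
        ("wt", "json"),
    ]


def build_select_url(base_url, core, user_query, tenant_id):
    """Full /select URL for the shared core (base_url and core are assumptions)."""
    params = build_select_params(user_query, tenant_id)
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "documents", "ipod", "A"))
```

The browser only ever talks to the application, and the application is the only client allowed to reach Solr.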
Good luck.
That's OK, but you will not be able to use the built-in cache properly for your requirements. You can add a permission bit to each document and change the query component so that results are filtered according to that permission. A bitwise operation is also available for this; make use of it for your needs.
I'd like to have a single instance of Solr, protected by some sort of authentication, that operated against different indexes based on the credentials used for that authentication. The type of authentication is flexible, although I'd prefer to work with open standards (existing or emerging), if possible.
The core problem I'm attempting to solve is that different users of the application (potentially) have access to different data stored in it, and a user should not be able to search over inaccessible data. Building an index for each user seems the easiest way to guarantee that one user doesn't see forbidden data. Is there, perhaps, an easier way? One that would obviate the need for Solr to have a way to map users to indexes?
Thanks.
The Solr guys have a pretty exhaustive overview of what is possible, see http://wiki.apache.org/solr/MultipleIndexes