I am considering using Solr in a multi-tenant application and I am wondering if there are any best practices or things I should watch out for?
One question in particular is whether it would make sense to have a Solr core per tenant. Are there any issues with having a large number of Solr cores?
I am considering a core per tenant because I could secure each core separately.
Thanks
Solr cores are an excellent fit for multi-tenancy, particularly as they can be managed at runtime (so no server restart is required). You shouldn't run into too many performance problems from having multiple Solr cores, but be aware that the performance of one core will be affected by the work on the other cores - they're probably going to be sharing the same disk.
I can see why you might want to give direct API access - for example, if each 'user' is a Drupal site or similar, in a shared-hosting type environment. The best approach would be to secure the different URLs: if you had /solr/admin/cores, /solr/client1 for one client's core, and /solr/client2 for another, you would have three different authentications - one for your admin and one for each tenant. This is done in the servlet container (Jetty, Tomcat, etc.); take a look at the general Solr security page: http://wiki.apache.org/solr/SolrSecurity - you'll want to set up a basic access login for each path in the same way.
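As a sketch of the container-level protection described above, a per-path constraint in the container's web.xml might look like this (the path, role name, and realm name here are invented examples, not part of a stock Solr install):

```xml
<!-- Sketch only: protect one tenant's core path with its own role. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>client1-core</web-resource-name>
    <url-pattern>/client1/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>client1-role</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>
```

You would repeat the security-constraint block once per protected path (/solr/admin/cores, /solr/client2, and so on), each with its own role, and map users to roles in the container's realm configuration.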
You would no more use a Solr core for each tenant than you would use a separate database table for each tenant.
If you think of a core like a database table and organize your project so that each core represents an object in your problem space, then you can better leverage Solr.
Where Solr shines is when you need to index text and then search it quickly. If you are not doing that, you might as well use a relational database.
Also, regarding your question about securing Solr for each tenant: I hope you're not suggesting allowing your logged-in users to access the Solr output directly? Your users should not be able to access your Solr instance directly.
Good luck.
One caveat: with this approach you can't make proper use of Solr's built-in caches for your requirements. You can add a permission bit to each document and adjust the query component so that results are filtered according to those permissions. Bitwise operations are also available for this; make use of them for your needs.
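A minimal sketch of the permission-bit idea mentioned above (the field name perm_mask and the bit layout are assumptions for illustration, not Solr built-ins): each document stores a bitmask of who may read it, and trusted server code builds a filter query from the user's bits. Since stock Solr has no bitwise query operator, this sketch enumerates the allowed mask values into an OR filter.

```java
// Sketch of permission-bit filtering for a multi-tenant index.
// The field name "perm_mask" and the bit layout are illustrative assumptions.
public class PermissionFilter {

    // One permission bit per capability.
    public static final int READ_PUBLIC  = 1 << 0;
    public static final int READ_TENANT  = 1 << 1;
    public static final int READ_PRIVATE = 1 << 2;

    /** True if the user's bits grant access to a document's mask. */
    public static boolean canRead(int userBits, int docMask) {
        return (userBits & docMask) != 0;
    }

    /**
     * Build a Solr filter query ORing together the mask values this user
     * may see. A bitwise-search plugin could do this inside Solr instead;
     * here the enumeration happens in the application layer.
     */
    public static String buildFilterQuery(int userBits, int[] knownMasks) {
        StringBuilder fq = new StringBuilder("perm_mask:(");
        boolean first = true;
        for (int mask : knownMasks) {
            if (canRead(userBits, mask)) {
                if (!first) fq.append(" OR ");
                fq.append(mask);
                first = false;
            }
        }
        return fq.append(")").toString();
    }
}
```

The filter string would be appended as an fq parameter by the server, never by the client, so a tenant cannot widen its own visibility.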
I have a Spring Boot/React application. I have a list of users in my database I will have populated already from LDAP.
As part of a form, I need to allow users to specify a list of users. Since they could be searching from (and technically specifying as well), up to 400,000 users (most will be in the 10k or less range), I'm assuming I'd need to do this both client and server-side.
Does anyone have any recommendations on the approach or technologies?
I'm not using a small amount of data, but I don't want to over-engineer it either (tips are mostly for server-side, but any are welcome).
If you are using hibernate as the ORM in your application, you may also checkout Hibernate Search. This seems to serve your purpose as I feel that searching through a list of users can be done using a normal text based index. Hibernate search leverages Lucene, which is suitable for text based indexing and searching.
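For scale intuition on the "normal text based index" point: even before adding an index, a plain in-memory scan over a few hundred thousand usernames is workable, and the contract stays the same once Hibernate Search/Lucene replaces the scan. A minimal sketch (the names and limit are invented; this is not Hibernate Search API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Naive server-side user lookup: linear scan with a result cap.
// A text index (Hibernate Search / Lucene) replaces the scan with an
// index lookup, but the shape - query in, capped list out - is the same.
public class UserSearch {

    /** Return up to 'limit' usernames containing the query, case-insensitively. */
    public static List<String> search(List<String> users, String query, int limit) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String user : users) {
            if (user.toLowerCase(Locale.ROOT).contains(q)) {
                hits.add(user);
                if (hits.size() == limit) break;
            }
        }
        return hits;
    }
}
```

Capping the result list server-side is what keeps the client-side autocomplete cheap regardless of whether the backing store is 10k or 400k users.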
Another answer here is good and works perfectly fine when you have a small set of data, but be aware of a few design issues with that approach.
Lucene is not distributed, and you can't easily scale it out to multiple machines without duplicating the whole index. That is perfectly fine when you have a small set of data - in fact it's pretty fast, since there is no network call (with Elasticsearch there would be).
If you want to build a stateless application that is easy to scale horizontally, Lucene will not be helpful, as it is stateful: a newly spawned app server cannot serve queries until it has finished building its local Lucene index.
Elasticsearch (ES) is REST-based, is written in Java, and has a very good Java client which you can easily use for simple to complex use cases.
Last but not least, please go through the Stack Overflow answer by none other than Shay Banon, the creator of Elasticsearch, where he explains why he created ES in the first place :) - it will give you more trade-offs and insight for choosing the best solution for your use case.
Newbie question so please be nice. :)
Basically we need to implement editorial boosting for a multi-tenant SOLR environment wherein a pre-defined query from a user would always bring a certain set of documents at the top of the results.
A couple of challenges we have include:
Given a single elevate.xml, we cannot indicate that a certain query text is intended for a particular tenant. Despite the existence of the tenantId in the index, there is no indication of that id in the elevate.xml file. We've thought of concatenating the ID to the query text (i.e. ipod_tenantID1) but I suppose the concatenation would not be traceable in the main query 'q'.
We need to make updates to the elevate.xml seamless to the other active tenants. Is it correct that updating elevate.xml would require a SOLR server restart? If yes, is there a way to work around it?
So you are using a single core/collection, and the multitenancy is enforced by an fq=customer_id:A, right?
Well, what about enforcing the multitenancy via one collection per customer? That way each tenant can have its own configuration (including the elevation setup).
About your second question: I did not check, but a reload would probably be enough (no restart needed). If you go with the proposed solution, other tenants are not disrupted by the reload, since you deal with different collections.
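With one collection per tenant, each collection's conf directory carries its own elevate.xml, so the query text no longer needs the tenant ID concatenated into it. A minimal sketch (the document IDs and query text are invented examples):

```xml
<!-- conf/elevate.xml inside one tenant's collection; ids are examples only -->
<elevate>
  <query text="ipod">
    <doc id="TENANT1-DOC-42" />
    <doc id="TENANT1-DOC-7" />
  </query>
</elevate>
```

Another tenant's collection would carry its own elevate.xml with the same query text but different document IDs, and editing one file only requires reloading that one collection.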
We have a system that enables users to create applications and store data in them. We want to separate the index of each application, so we create a core for each application and search that application's core when a user makes a query. Since there isn't any relation between the applications, this solution could perform better than storing all the indexes together.
I have two questions related to this.
Is this a good solution? If not could you please suggest any better solution?
Is there a limit on the number of cores that I can create in Solr? There will be thousands, maybe more, applications on the system.
Yes, it COULD be a good solution; as always, it depends on the specific use case. Look at this JIRA issue where Erick mentions a 10k-core system, so it seems it could work for you - you'd just need to assess the hardware, etc.
Can Solr be run on Azure?
I know this thread is old, but I wanted to share our two cents. We are running SOLR on Azure with no big problems. We've created a startup task to install Java and create the deployment, and we have a SOLR instance on each web role. From there on, it's a bit of magic figuring out which master/slave configuration, but we've solved that too.
So yes, it can be done with a little bit of work. Most importantly, the startup task is key. The index doesn't have to be stored anywhere but on local disk (a Local Resource), because indexing is part of the role startup. If you need to speed that up and a few minutes' difference is acceptable, you can have the master sync the index to a blob storage copy every once in a while. But in that case you need to implement a voting algorithm so that the SOLR instances don't override each other.
We'll be posting info on our blog, but I don't want to post links in answers for old threads because I'll look like a spammer :o)
A bit of a dated question, but I wanted to provide an updated answer. You can run Apache Solr on Azure. Azure offers IaaS (Infrastructure as a Service), which is raw virtual machines running Linux/Windows. You can choose to set up your entire Solr cluster on a set of VMs and configure SolrCloud and ZooKeeper on them.
If you are interested, you could also check out Solr-as-a-Service or Hosted Solr solutions as they remove the headache of setting up SolrCloud on Azure. There's a lot that goes into running, managing and scaling a search infrastructure and companies like Measured Search help reduce time and effort spent on doing that. You get back that time in developing features and functionality that your applications or products need.
More specifically, if you are doing it yourself, it can take many days to weeks to give the proper love and care it needs. Here's a paper that goes into the details of the comparison between doing it yourself and utilizing a Solr-as-a-Service solution.
https://www.measuredsearch.com/white-papers/why-solr-service-is-better-than-diy-solr-infrastructure/
Full disclosure, I run product for Measured Search that offers Cloud Agnostic Solr-as-a-Service. Measured Search enables you to standup a Solr Cluster on Azure within minutes.
For the new visitor: there are now two Solr instances available. We tested them and they are good, but we ended up using the Azure Search service, which so far looks very solid.
I haven't actually tried, but Azure can run Java, so theoretically it should be able to run Solr.
This article ("Run Java with Jetty in Windows Azure") should be useful.
The coordinator for "Lucene.Net on Azure" also claims it should run.
EDIT : The Microsoft Interop team has written a great guide and config tips for running Solr on Azure!
Azure IaaS allows you to create Linux-based VMs, with flavors including Ubuntu, SUSE, and CentOS. Such a VM comes with local root storage that persists only until the VM is rebooted.
However, you can add additional volumes on which data will persist even through reboots. Your solr data can be stored here.
I'd like to have a single instance of Solr, protected by some sort of authentication, that operates against different indexes based on the credentials used for that authentication. The type of authentication is flexible, although I'd prefer to work with open standards (existing or emerging), if possible.
The core problem I'm attempting to solve is that different users of the application (potentially) have access to different data stored in it, and a user should not be able to search over inaccessible data. Building an index for each user seems the easiest way to guarantee that one user doesn't see forbidden data. Is there, perhaps, an easier way? One that would obviate the need for Solr to have a way to map users to indexes?
Thanks.
The Solr guys have a pretty exhaustive overview of what is possible, see http://wiki.apache.org/solr/MultipleIndexes
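The main alternative to one index per user is a single shared index where every document carries an ACL field and trusted server code appends a filter the client can never alter. A minimal sketch of that filter-building step (the field name allowed_users is an assumption for illustration):

```java
// Sketch: single shared index, per-user restriction applied server-side.
// The field name "allowed_users" is an illustrative assumption; the key
// point is the filter is appended by trusted code, never by the client.
public class UserScopedQuery {

    /** Escape characters with special meaning in Solr query syntax. */
    public static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if ("+-&|!(){}[]^\"~*?:\\/ ".indexOf(c) >= 0) out.append('\\');
            out.append(c);
        }
        return out.toString();
    }

    /** Build the filter query restricting results to documents this user may see. */
    public static String filterFor(String userId) {
        return "allowed_users:" + escape(userId);
    }
}
```

The trade-off versus per-user indexes: one index is simpler to operate and scale, but correctness now depends entirely on this filter being applied to every query path, whereas separate indexes give isolation by construction.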