Is there an easy way to make Solr reference different indexes based on a set of credentials submitted with the request? - solr

I'd like to have a single instance of Solr, protected by some sort of authentication, that operates against different indexes based on the credentials used for that authentication. The type of authentication is flexible, although I'd prefer to work with open standards (existing or emerging), if possible.
The core problem I'm attempting to solve is that different users of the application potentially have access to different data stored in it, and a user should not be able to search over inaccessible data. Building an index for each user seems like the easiest way to guarantee that one user doesn't see forbidden data. Is there, perhaps, an easier way? One that would obviate the need for Solr to map users to indexes?
Thanks.

The Solr guys have a pretty exhaustive overview of what is possible; see http://wiki.apache.org/solr/MultipleIndexes
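For what it's worth, the common alternative to an index per user is a single index in which every document carries an access-control field, with the server appending a filter query for the authenticated user. A minimal SolrJ sketch, assuming a hypothetical acl field and a documents core (both made up for illustration):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Hedged sketch of the single-index alternative: documents are stamped
// with an "acl" field at index time, and the server (never the client)
// appends a filter query for the authenticated user. The field name and
// core name are assumptions for illustration.
public class PerUserSearch {
    public static void main(String[] args) throws Exception {
        String authenticatedUser = "alice"; // resolved from the request's credentials
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/documents").build()) {
            SolrQuery q = new SolrQuery("quarterly report");
            q.addFilterQuery("acl:" + authenticatedUser); // hides inaccessible docs
            solr.query(q).getResults()
                .forEach(doc -> System.out.println(doc.get("id")));
        }
    }
}
```

The important detail is that the fq is appended server-side after authentication, so a client can never widen its own view; a real implementation would also escape the username before embedding it in the filter.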

Related

Storing Application Configuration in AD

I am trying to write a small application that will run on all the domain controllers at my company.
Since all the DCs need to have the same, fairly static config, I thought it might be sane to store the configuration in AD itself. I imagine writing a GUI config editor that manipulates the AD based config.
At first glance, Application Partitions would seem like the right tool for the job.
The first question is: is this just generally a terrible idea? Would pro sysadmins get angry at doing this? Or will this require some high-inertia operation like schema changes?
The second question is: is there a specific object type that would be well suited for storing either JSON blobs or key-value pairs?
And the last question is: Are there better alternatives?
I found a post from a decade ago which touches on this, but things can change rather a lot after 3 major OS releases.
This might only make sense if you were planning to store configuration for a user on the user's AD object. But even then, anyone in your organization who has access to update AD will be able to change those values in ways that your application may not expect.
is there a specific object type that would be well suited for storing either JSON blobs or key-value pairs?
No AD attribute is designed for that. At best, you might be able to find a string attribute that has a big enough max length that you could store some JSON value. But performance would be terrible if you want to search for a JSON value using that attribute.
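If you do decide to shoehorn config into AD anyway, the mechanics are just an LDAP write to a free-text attribute. A minimal JNDI sketch, where the DN, the service account, and the choice of the info attribute are all assumptions for illustration:

```java
import javax.naming.Context;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.ModificationItem;
import java.util.Hashtable;

// Hypothetical sketch: stuffing a JSON blob into a free-text AD attribute
// ("info" here) via plain LDAP. DN, host, and attribute are made up.
public class AdConfigWriter {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://dc01.example.com:389");
        env.put(Context.SECURITY_PRINCIPAL, "CN=svc-config,OU=Service Accounts,DC=example,DC=com");
        env.put(Context.SECURITY_CREDENTIALS, System.getenv("AD_PASSWORD"));

        DirContext ctx = new InitialDirContext(env);
        try {
            String dn = "CN=MyAppConfig,OU=Apps,DC=example,DC=com";
            ModificationItem[] mods = {
                new ModificationItem(DirContext.REPLACE_ATTRIBUTE,
                    new BasicAttribute("info", "{\"pollSeconds\":30,\"mode\":\"active\"}"))
            };
            ctx.modifyAttributes(dn, mods);
        } finally {
            ctx.close();
        }
    }
}
```

Note that this illustrates exactly the weakness described above: the blob is opaque to AD, unsearchable in any useful way, and editable by anyone with write access to that object.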
Are there better alternatives?
Yes. The best solution is to use a dedicated database for your application. You can structure it the way you need, and restrict access to only your application.

What's the Best Way to Query Multiple REST API's and Deliver Relevant Search Results?

My organization has multiple databases that we need to provide search results for. Right now you have to search each database individually. I'm trying to create a web interface that will query all the databases at once and sort the results based upon relevance.
Some of the databases I have direct access to. Others I can only access via a REST API.
My challenge isn't knowing how to query each individual database. I understand how to make API calls. It's how to sort the results by relevance.
On the surface it looks like Elasticsearch would be a good option. Its inverted-index approach seems like a good solution for figuring out which results are going to be the most relevant to our users. It's also super fast.
The problem is that I don't see a way (so far) to include results from an external API into Elasticsearch so it can do its magic.
Is there a better option that I'm not aware of? Or is it possible to have Elasticsearch evaluate the relevance of results from an external API while also including data from its own internal indices?
I did find an answer, although nobody replied. :\
The answer is to use the http_poller plugin with Logstash. This will make an API call and ingest the results into Elasticsearch.
Another option could be some form of microservice orchestration: fan out the various API calls, then merge the responses into a final result set, as sketched below.
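For the orchestration route, here is a hedged Java sketch: query the search endpoints in parallel, then merge all hits into one list ordered by score. The endpoint URLs and the toy "title|score" response format are assumptions; real APIs would return JSON, and their scores would need normalizing onto a common scale before the merge is meaningful.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical federated-search orchestration: fan out to two endpoints
// concurrently, parse each response, and sort the combined hits by score.
public class FederatedSearch {
    record Hit(String title, double score) {}

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        List<URI> endpoints = List.of(
            URI.create("https://api-one.example.com/search?q=widget"),
            URI.create("https://api-two.example.com/search?q=widget"));

        List<CompletableFuture<List<Hit>>> futures = endpoints.stream()
            .map(uri -> client.sendAsync(
                    HttpRequest.newBuilder(uri).GET().build(),
                    HttpResponse.BodyHandlers.ofString())
                .thenApply(resp -> parse(resp.body())))
            .toList();

        List<Hit> merged = futures.stream()
            .flatMap(f -> f.join().stream())
            .sorted(Comparator.comparingDouble(Hit::score).reversed())
            .toList();

        merged.forEach(h -> System.out.println(h.score() + "  " + h.title()));
    }

    // Toy parser for "title|score" lines; a real client would parse JSON.
    static List<Hit> parse(String body) {
        List<Hit> hits = new ArrayList<>();
        for (String line : body.split("\n")) {
            String[] parts = line.split("\\|");
            if (parts.length == 2) hits.add(new Hit(parts[0], Double.parseDouble(parts[1])));
        }
        return hits;
    }
}
```

The hard part in practice is the score normalization: relevance scores from different engines are not comparable out of the box, which is exactly why pushing everything into one Elasticsearch index is attractive.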

SOLR QueryElevationComponent for Multi-tenant Support

Newbie question so please be nice. :)
Basically we need to implement editorial boosting for a multi-tenant SOLR environment, wherein a pre-defined query from a user would always bring a certain set of documents to the top of the results.
A couple of challenges we have include:
Given a single elevate.xml, we cannot indicate that a certain query text is intended for a particular tenant. Despite the existence of the tenantId in the index, there is no indication of that id in the elevate.xml file. We've thought of concatenating the ID to the query text (e.g. ipod_tenantID1), but I suppose the concatenated text would never match the incoming main query 'q'.
We need to make updates to the elevate.xml seamless to the other active tenants. Is it correct that updating elevate.xml would require a SOLR server restart? If yes, is there a way to work around it?
So you are using a single core/collection, and the multitenancy is enforced by an fq=customer_id:A, right?
Well, what about enforcing the multitenancy via one collection per customer? This way each tenant can have its own configuration (including the elevation setup).
About your second question: I did not check, but a reload would probably be enough. And if you go with the proposed solution, other tenants are not disrupted by the reload, as you are dealing with different collections.
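To make that concrete, here is a hedged SolrJ sketch of the collection-per-tenant approach, assuming SolrCloud and a made-up products_<tenantId> naming convention:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

// Sketch of collection-per-tenant elevation, assuming SolrCloud and a
// hypothetical "products_<tenantId>" naming scheme. Each tenant's
// collection carries its own elevate.xml, and reloading one collection
// picks up elevation edits without disturbing the other tenants.
public class TenantElevationDemo {
    public static void main(String[] args) throws Exception {
        String tenantId = "tenantID1";
        String collection = "products_" + tenantId;
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // Query the tenant's own collection; its elevate.xml applies.
            SolrQuery q = new SolrQuery("ipod");
            q.set("enableElevation", true);
            solr.query(collection, q).getResults()
                .forEach(doc -> System.out.println(doc.get("id")));

            // After editing that collection's elevate.xml, reload just it.
            CollectionAdminRequest.reloadCollection(collection).process(solr);
        }
    }
}
```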

Is OData suitable for multi-tenant LOB application?

I'm working on a cloud-based line of business application. Users can upload documents and other types of objects to the application. Users upload quite a number of documents, and altogether there are several million docs stored. I use SQL Server.
Today I have a somewhat-RESTful API which allows users to pass in a DocumentSearchQuery entity where they supply a keyword together with the requested sort order and paging info. They get a DocumentSearchResult back which is essentially a sorted collection of references to the actual documents.
I now want to extend the search API to other entity types than documents, and I'm looking into using OData for this. But I get the impression that if I use OData, I will face several problems:
There's no built-in limit on what fields users can query, which means that either the perf will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I need to do my own parsing of incoming queries again.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
I don't think I want to go down the road of manually parsing incoming expression trees to make sure they only try to access data which they have access to. This seems cumbersome.
My question is: Considering the above, is using OData a suitable protocol in a multi-tenant environment where customers write their own clients accessing the entities?
I think it is suitable here. Let me give you some opinions about the problems you think you will face:
There's no built-in limit on what fields users can query, which means that either the perf will depend on whether they query an indexed field or not, or I will have to implement my own parsing of incoming OData requests to ensure they only query indexed fields. (Since it's a multi-tenant application and tenants share physical hardware, slow queries are not really acceptable, since they affect other customers.)
True. However, you can inspect the fields referenced in the filter and allow or deny the operation accordingly; an allow-list check like the sketch below is usually enough.
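As an illustration of that check, here is a hedged sketch (written in Java, though the same idea applies to a .NET service) that validates a $filter expression against an allow-list of indexed fields. The field names and the regex-based tokenizing are assumptions for illustration, not a real OData parser:

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical allow-list check for an OData $filter: pull out the
// identifiers and deny the request if any referenced field is not on the
// indexed-field list. INDEXED holds made-up field names.
public class FilterGuard {
    private static final Set<String> INDEXED =
        Set.of("Name", "CreatedOn", "OwnerId");
    private static final Set<String> KEYWORDS =
        Set.of("eq", "ne", "gt", "lt", "ge", "le", "and", "or", "not", "true", "false", "null");
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isAllowed(String filter) {
        String noLiterals = filter.replaceAll("'[^']*'", " "); // drop string literals
        Matcher m = IDENT.matcher(noLiterals);
        while (m.find()) {
            String token = m.group();
            if (!KEYWORDS.contains(token) && !INDEXED.contains(token)) {
                return false; // references a non-indexed field: deny
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Name eq 'foo' and CreatedOn gt 2020-01-01")); // true
        System.out.println(isAllowed("Description eq 'bar'"));                      // false
    }
}
```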
Whatever I use to access data in the backend needs to support IQueryable. I'm currently using Entity Framework, which does this, but I will probably use something else in the future. Which means it's likely that I need to do my own parsing of incoming queries again.
Yes, there is a provider for EF, which means that if you use something else in the future, you will need to write your own provider. But if you already expect to move away from EF, you have probably made this decision too early, and I don't recommend WCF DS in that case.
There's no built-in support for limiting what data users can access. I need to validate incoming OData queries to make sure they only access data they actually have permission to access.
Right, there isn't any out-of-the-box support for that in WCF Data Services. However, that is part of the authorization mechanism you will need to implement anyway. The good news is that it's pretty easy with QueryInterceptors: you simply intercept the query and restrict the results based on the user's privileges. This is something you would have to implement regardless of the technology you use.
My answer: considering the above, WCF Data Services is a suitable choice for a multi-tenant environment where customers write their own clients to access the entities, at least as long as you stay with EF. And keep in mind the huge implementation effort it saves you.

openam with database

I have a requirement that OpenAM should access users and groups from a MySQL database. In the OpenAM GUI, under New Data Store -> Database Repository (Early Access), I can see some configuration related to this. But I am not sure how to map fields from two or three MySQL tables (users and groups) to the corresponding OpenAM attributes. Also, what are the mandatory or optional fields for keeping user and group information? Could somebody point me to good documentation on this?
Also I have a couple of basic queries.
Is it possible to keep policy information in the database?
Is it possible to create users and groups and assign policy information from a separately deployed web application (through JSP / servlets)? Do the OpenSSO APIs allow this?
Thanks.
The Database data store is very primitive (hence it's Early Access), and it is very likely that it won't fit your specific needs. In that case you can always implement your own Data Store (which can be tricky, I know..); here's a guide for it:
http://docs.forgerock.org/en/openam/10.0.0/dev-guide/index.html#chap-identity-repo-spi
Per your questions:
1) The policies are stored in the configuration store, and I'm not aware of any way to store them in a database.
2) Yes, there are some APIs which let you make changes to the identities stored in the data store, but OpenSSO/OpenAM is not an identity management tool, so it might not be a perfect fit for that.
This is pretty simple, but sparse documentation makes it a bit tough to do. Check out: http://rahul-ghose.blogspot.com/2014/05/openam-database-connectivity-with-mysql.html
Basically you need to create a MySQL database and two tables for this (a rough sketch follows below). Include all the attributes you see in the User Configuration dropdown in the MySQL user table; it is okay to remove a few attributes.
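As a rough illustration of those two tables, here is a JDBC sketch. The column set mirrors common OpenAM user attributes (uid, cn, sn, userPassword, inetUserStatus), but the exact attributes you need depend on what you map in the data store configuration, so treat the schema as an assumption rather than the canonical layout:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical schema setup for the user/group tables the walkthrough
// describes. Database name, credentials, and columns are assumptions.
public class OpenAmSchemaSetup {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/openam", "root", System.getenv("DB_PASSWORD"));
             Statement st = con.createStatement()) {
            st.executeUpdate("CREATE TABLE IF NOT EXISTS users ("
                + "uid VARCHAR(64) PRIMARY KEY,"
                + "cn VARCHAR(128), sn VARCHAR(128),"
                + "userPassword VARCHAR(256),"
                + "inetUserStatus VARCHAR(16))");
            // Backticks because GROUPS is a reserved word in MySQL 8.
            st.executeUpdate("CREATE TABLE IF NOT EXISTS `groups` ("
                + "cn VARCHAR(128) PRIMARY KEY,"
                + "uniqueMember TEXT)");
        }
    }
}
```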