I'm developing a project where I want to have hierarchical facets.
I have an index with a complex structure, like:
Index
-field1
-List
And othercomplexfield contains another list with anothercomplexfield inside.
I'd like to be able to give to users the possibility to:
Have the facets of field1.
When one is selected, I'd like to give the user the possibility to select one of the values of a certain field of "othercomplexfield" while filtering by the selected field1.
I can do that.
I'd then like to give the user the possibility to select one of the possible values of "anothercomplexfield" while filtering by field1 AND by the selected othercomplexfield.
The difficulty here is that I don't want every possible facet value, but only the ones CONTAINED by the othercomplexfield that I'm filtering for.
So far I had to do this inside of c# and i did not find a way to write a query that gives me back from azure search the distinct values that I want.
Someone has a similar problem?
Did I explain the problem well enough?
I saw no clear guidance online, everything is easy if you only have level 1 facets but when you get into nested objects it's not that clear anymore.
I'm not sure I fully understand the context of your question. What I can tell you is that filters only apply at the document level and not at the complex collection level. What I mean by that is that if a filter matches an item in a complex collection, the entire document will be returned, not just the item in the complex collection that matched. The same is true for facets--facets will count all documents in the result set that match the filter and can't be scoped down just to parts of documents. With that, it seems like having this logic in your application like you mentioned might be the best approach for your current index schema.
We do have this old blog post that talks about one way to implement hierarchical facets with Azure Cognitive Search which may give you some other ideas on how you could implement the functionality you're looking for: https://learn.microsoft.com/en-us/archive/blogs/onsearch/multi-level-taxonomy-facets-in-azure-search
Scenario.
I have a document in the database which has thousands of item in
'productList' as below.
here
All the object in array 'productList' has the same shape and same fields with different values.
Now I want to search in the following way.
when a user writes 'c' against 'Ingrediants' field, the list will show all 'Ingrediants' start with alphabet 'c'.
when a user write 'A' against 'brandName' field, the list will show
all 'brandName' start with alphabet 'A'.
please give an example using this to search for it, either it is by
creating an index(json,text).
creating a Search index (design document) or
using views etc
Note: I don't want to create an index at run-time(I mean index could be defined by Cloudant dashboard) I just want to query it, by this library in the application.
I have read the documentation's, I got the concepts.
Now, I want to implement it with the best approach.
I will use this approach to handle all such scenarios in future.
Sorry if the question is stupid :)
thanks.
CouchDB isn't designed to do exactly what you're asking. You'd need one index for Ingredient, and another for Brand Name - and it isn't particularly performant to do both at once. The best approach I think would be to check out the Mango query feature http://docs.couchdb.org/en/2.0.0/api/database/find.html, try the queries you're interested in and then add indexes as required (it has the explain plan to help make this more efficient).
I have an application that contains a set of text documents that users can search for. Every user must be able to search based on the text of the documents. What is more, users must be able to define custom tags and associate them to a document. Those tags are used in two ways:
1)Users must be able to search for documents based on specific tag ids.
2)There must be facets available for the tags.
My solution was adding a Mutivalued field in each document to pose as an array that contains the tagids that this document has been tagged with. So far so good. I was able to perform queries based on text and tagids ( for example text:hi AND tagIds:56 ).
My question is, would that solution work in production mode in an environment that users add but also remove tags from the documents ? Remember , I have to have the data available in real time, so whenever a user removes/adds a tag I have to reindex that document and commit immediately. If that's not a good solution, what would be an alternative ?
Stackoverflow uses Solr - this is in case if you doubt Solr abilities in production mode.
And although I couldn't find much information on how they have implemented tags, I don't think your approach sounds wrong. Yes, tagged documents will have to be reindexed (that means a slight delay) but other than that I don't see anything wrong with it.
I'm going to write simple news site on redis with supporting followers.
I can't imagine how can I organize users timeline like in twitter. I read about Retwis ( http://redis.io/topics/twitter-clone ), but its feed creating method seems stupid. What if I want to remove entries? I'll should to remove all entry references from followers feeds. What if I already do not follow some users?
There are several ways to attack what you describe using a bit of imagination, here are some examples that address your questions:
What if I want to remove entries?
One could mantain a set such as post:$postid:users for each post, holding all the userids that may have the post in their feeds; when the post is to be deleted one just has to extract all members from this set and iterate through the ids to remove it from each uid:$userid:posts set; speaking of which you would have to turn that last one into a set instead of a list like the original article suggests in order to be able to extract and remove individual items but that is trivial, the logic is pretty similar.
What if I already do not follow some users?
When the feed is being generated for each individual user you have to necessarily iterate and read each post:$postid key, from which you have access to the author userid; so before showing the post you read this id and look it up in the uid:$userid:following set, if it's there we show the post, if it's not we delete it from uid:$userid:posts and don't show it.
In a nutshell, this is what you have to keep in mind in order to build this kind of logic in redis:
You'll need many commands, but that's ok, Redis is supposed to be fast enough to handle it well.
Data will repeat, but that is also ok; it may look insane for someone with a relational DBMS background to store a set of users for each post if each user already has a set with their posts, but this is the only way around building relationships in a non-relational data store like redis.
Generally speaking think of sets and sorted sets when designing something relational in Redis.
With redis you get to do everything yourself, but once you get your head around it it's actually pretty powerful.
For example I have Documents A, B, C. User 1 must only be able to see Documents A, B. User 2 must only be able to see Document C. Is it possible to do it in SOLR without filtering by metadata? If I use metadata filter, everytime there are access right changes, I have to reindex.
[update 2/14/2012] Unfortunately, in the client's case, change is frequent. Data is confidential and usually only managed by the owners which are internal users. Then the specific case is they need to be able to share those documents to certain external users and specify access levels for those users. And most of the time this is an adhoc task, and not identified ahead of time
I would suggest storing the access roles (yes, its plural) as document metadata. Here the required field access_roles is a facet-able multi-valued string field.
Doc1: access_roles:[user_jane, manager_vienna] // Jane and the Vienna branch manager may see it
Doc2: access_roles:[user_john, manager_vienna, special_team] // Jane, the Vienna branch manager and a member of special team may see it
The user owning the document is a default access role for that document.
To change the access roles of a document, you edit access_roles.
When Jane searches, the access roles she belongs to will be part of the query. Solr will retrieve only the documents that match the user's access role.
When Jane (user_jane), manager at vienna office (manager_vienna) searches, her searches go like:
q=mainquery
&fq=access_roles:user_jane
&fq=access_roles:manager_vienna
&facet=on
&facet.field=access_roles
which fetches all documents which contains user_jane OR manager_vienna in access_roles; Doc1 and Doc2.
When Bob, (user_bob), member of a special team (specia_team) searches,
q=mainquery
&fq=access_roles:user_bob
&fq=access_roles:special_team
&facet=on
&facet.field=access_roles
which fetches Doc2 for him.
Queries adapted from http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
Might want to check the Document level Security patches.
https://issues.apache.org/jira/browse/SOLR-1872
https://issues.apache.org/jira/browse/SOLR-1834
I think my approach would be similar to #aitchnyu's answer. I would however NOT use individual users in the meta data.
If you create groups for each document, then you will have to reindex for security reason less often.
For a given document, you might have access_roles: group_1, group_3
In this way, the group_1 and group_3 always retain rights to the document. However, I can vary what groups each user belongs to and adjust the query accordingly.
When the query then is generated, it always passes as a part of the query the user's groups. If I belong to group_1 and group_2, my query will look like this:
q=mainquery
&fq=access_roles:group_1
&fq=access_roles:group_2
Since the groups are dynamically generated in the query, I simply remove a user from the group, and when a new query is issued, they will no longer include the removed group in the query. So removing the user from group_1 would new create a query like this:
q=mainquery
&fq=access_roles:group_2
All documents that require group 1 will no longer be accessible to the user.
This allows most changes to be done in real-time w/out the need to reindex the documents. The only reason you would have to reindex for security reasons is if you decided that a particular group should no longer have access to a document.
In many real-world scenarios, that should be a relatively uncommon occurrence. It seems much more likely that HR documents will always be available to the HR department, however a specific user may not always be part of the HR group.
Hope that helps.
You can implement your security model using Solr's PostFilter. For more information see http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
Note: you should probably cache your access rights otherwise performance will be terrible.
Keeping in mind that solr is pure text based search engine,indexing system,to facilitate fast searching, you should not expect RDMS style capabilities from it. solr does not provide security for documents being indexed, you have to write such an implementation if you want. In that case you have two options.
1)Just index documents into solr and keep authorization details into RDBMS.Now query solr for your search and collect the results returned.Now fire another query to DB for the doc ids returned by solr to see if the user has an access to them or not.Filter out those documents on which user in action has no access.You are done ! But not really, your problem starts from here only.Assume, what if all results returned by solr gets filtered out ? (Assuming you are not accessing all the documents at a time,means you are retrieving top 1000 results only from solr result set,otherwise you can not get fast search) You have to query solr again for next bunch of result set and have to iterate these steps until you get enough results to display.
2)Second approach to this is to index authorization meta data along with document in solr.Same as aitchnyu has explained.But to answer your query for document sharing to an external user,along with usergroup and role detail, you index these external user's userid into access_roles field or you can just add an another field to your schema 'access_user' too. Now you can modify search queries for external user's sharing to include access_user field into your filter query.
e.g
q=mainquery
&fq=access_roles:group_1
&fq=access_user:externaluserid
Now the most important thing, update to an indexed documents.Well its off course tedious task, but with careful design and async processing along with solrs partial document update feature(solr 4.0=>), you can achieve reasonably good TPS with solr. If you are using solr <4.0 you can have separate systems for both searching and updates and with care full use of load balancer and master slave replication strategies you will have smile on your face !
There are no built in mechanisms for Solr that I am aware of that will allow you to control access to documents without maintaining the rights in the metadata. The approach outlined by aitchnyu seems reasonable if you keep it a true role level and not assign user specific permissions to a document. That way you can assign roles to users and this will grant them the ability to see documents in the index. Granted you will still need to reindex documents when the roles change, but hopefully you can identify most of the needed roles ahead if time and reduce the need for frequent reindexing.