Azure Search Design Question: Omit result if User has seen this result - azure-cognitive-search

I'm trying to design a solution where I don't have to use the SQL Server Database to answer a question: Show me Azure Index search results where the user has never seen this search result.
I can keep track of user document "views" in my SQL database, but how do I extend this functionality to Azure Search Index queries?
I mean I could do a $filter where document id is not in (1,2,3,etc), or I could filter the Index results before the user ever sees them from the server.
I'm just wondering if there's a more clever way to do this?
Thanks for your help!

The best way to achieve this is the first option you mentioned: once the first query comes in on that user session, save which document ids were returned, then build a filter that excludes those ids from subsequent queries in the same session.
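A minimal sketch of building such an exclusion filter, assuming the key field is named `id` and is marked filterable (both are assumptions, as is the helper name `build_exclusion_filter`). It uses the OData `search.in` function, which is more compact than chaining many `id ne '...'` clauses:

```python
def build_exclusion_filter(seen_ids, key_field="id"):
    """Return an OData $filter string that excludes already-seen document keys.

    Returns None when nothing has been seen yet, so the caller can omit $filter.
    `key_field` is an assumed field name; it must be filterable in the index.
    """
    ids = [str(i) for i in seen_ids]
    if not ids:
        return None
    for doc_id in ids:
        # Keep the sketch simple: reject ids that would break the
        # comma-delimited list or the quoted OData string literal.
        if "," in doc_id or "'" in doc_id:
            raise ValueError(f"unsupported character in id: {doc_id!r}")
    return f"not search.in({key_field}, '{','.join(ids)}', ',')"


# Example: ids collected from earlier queries in the same session.
print(build_exclusion_filter([1, 2, 3]))
```

The resulting string goes into the `$filter` parameter of the next query. The list grows with every page the user views, so for long sessions it may be worth capping it or falling back to server-side filtering.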

Related

How to filter autocomplete

I am feeding Azure Search with data from multi-tenant database, so every document in the index has a property TenantId. For searching, aggregations, suggestions I always filter by "TenantId eq 'xxx'" depending on the user calling it.
However for autocomplete it is not possible to filter, so if it returns "something", the tenant in context might not have "something" in his data. Any way to overcome this?
This feature is actively being developed and will be completed before the Autocomplete API reaches General Availability. I'll update this thread once we deploy the change so you can try it.

How to know when a put on Cloud Datastore in App Engine reaches Milestone B?

I have an application that uses Cloud Datastore via App Engine to save data.
I need to refresh the clients when an object is put in the database. To do this, after the object is put, the server sends a sync message to the clients. The clients read the sync message and send a query to the server, and the server runs a Query to return the new result.
The problem is that when the Query runs, the newly put object doesn't appear in the query results. Reading the documentation, I suppose the reason is that the put hasn't reached Milestone B yet (see https://cloud.google.com/appengine/articles/transaction_isolation), because the object does appear in a later call.
How can I know when a put reaches a "Milestone B"? If it isn't possible to know it, how can I do this logic (refresh clients after put)?
You can ensure up-to-date query results by using an ancestor query, or, if you know the key of the specific entity you need to retrieve, you can fetch it by key rather than using a query.
This page discusses the trade-offs of using ancestor queries.
The data do not appear in the result of your query because the indexes have not been updated yet.
There is some latency before the indexes will be updated and unfortunately there is no way to know when this will happen.
The only way to handle this case is to use the entity's key, which is the only index guaranteed to be updated as soon as the entity is stored.
https://cloud.google.com/appengine/docs/java/datastore/entities

Best practices - Should I store the URL of the link in the index?

Today I don't store the URL of my pages in the Azure Search index. I am still working out the best way to solve this. So, when a search result comes back from Azure Search, all I receive is the Id and no URL to the pages :/.
So, to retrieve the URL, what would be a proper way to solve this?
Store the urls in the index and retrieve it with the search;
Do an additional query to the database with the data returned from Azure Search and retrieve the URL.
???
THANKS
You definitely want to store enough information in the search index to avoid an extra hit to a different database/store if possible.
If your URLs follow a consistent pattern and only a part changes (e.g. the document id or something like that), you can store only the variable part and construct the final URL when rendering results. If your URLs cannot be turned into a pattern, you can just store the whole URL in a field in the Azure Search index.
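As a rough illustration of the pattern approach, assuming each search result exposes its variable part in an `id` field and that pages live under a URL like the one below (the template, field name, and helper are all hypothetical):

```python
from urllib.parse import quote

# Hypothetical URL pattern; only the {page_id} part varies per document.
URL_TEMPLATE = "https://example.com/pages/{page_id}"


def result_url(doc):
    """Build the page URL for one search result at render time."""
    # Percent-encode the id so unusual characters cannot break the path.
    return URL_TEMPLATE.format(page_id=quote(str(doc["id"]), safe=""))


print(result_url({"id": "abc 1"}))
```

This keeps the index small and lets you change the URL scheme later without reindexing.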
When storing data used for presentation (URLs, external keys, etc.) it's a good idea to ensure you disable all options related to fast search/filtering (searchable, filterable, sortable, facetable, etc.) and only leave retrievable enabled. That way you minimize the use of resources caused by the extra field, but you have the data at hand to avoid an extra roundtrip during results rendering.
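A sketch of what such a field definition could look like when creating the index through the REST API, written here as a Python dict. The field name `url` and the helper are assumptions; the attribute names are the ones listed above:

```python
import json


def presentation_field(name):
    """Field definition for data that is only returned, never searched or filtered."""
    return {
        "name": name,
        "type": "Edm.String",
        "searchable": False,
        "filterable": False,
        "sortable": False,
        "facetable": False,
        "retrievable": True,  # the only capability left enabled
    }


# Would be added to the "fields" array of the index definition.
print(json.dumps(presentation_field("url"), indent=2))
```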

How can I put my custom URLs to SOLR results

I am trying to index my database data with SOLR, and I have successfully indexed it.
What I need is:
I need to include a URL with every result.
The URLs for each result item will be different.
Each result item needs to append its item_id (which is available as a field) to its URL.
I am very new to SOLR configurations and SOLR query, so please help to implement a better search result XML.
Thanks in advance.
You can store the URL in an additional field (stored=true indexed=false) and then simply retrieve it when you're searching.
Even better if you can compose URLs yourself (if they differ only in ID/primary key) by appending document ID to some fixed URL, that's certainly a better way to go.
That would include altering your page which displays search results.
What kind of application is your Solr integrated with?
Where are those documents of yours stored? In a db? How do you get to them through your application?

Data security in result sets from Elastic Search, Solr or

I need to add full-text search capabilities to my existing database. Of course first turn is to something like Solr or Elastic Search. And the blocking point I’ve got to is – how to securely display results returned from underlying search engine (let’s think about Solr or Elastic Search for now, however any other solution or engine that hit the point are also appreciated).
The tricky context is that I have, for example, in my system Personal Profile records that are to be indexed. One of the fields in personal profile is – manager’s feedback. Normally in the system that field is visible only to employee’s direct manager and higher hierarchy, i.e. ‘manager’ from another branch will not be able to see that field. However, I want that field to be searchable via full text search but only for people who actually can see it.
Now I query Solr for ‘stupid’ (that is query string) and it returns me N documents. When returning that to end-user I’ll remove the ‘Manager’s feedback’ field because end-user is not the manager of given people – but just presence of the document in resultset is already the evidence of ‘stupid’ guys …
The question is: what is a workable approach to handle that use case? Is it possible to plug a home-grown security filter for outputs into Solr/ES?
Caveats:
filtering out only fields does not work because of the scenario described above
filtering out complete documents will not work because:
the search engine does not tell which fields matched, so there is no way to manually filter the result set by field http://elasticsearch-users.115913.n3.nabble.com/Best-way-to-return-which-field-matched-td2713071.html
even if this did work, removing documents from the result set would skew the facets (e.g. number of matches by department) returned by the engine. I would have to either recalculate the facets manually, or they would not match the manually filtered records and would reveal what I actually do not want to show to end users
In Solr you can create multiValued fields. In your case you can use one to store de-normalized values of the organization structure.
In the described scenario you would create a multi-valued field ouId (Organization Unit Id) and store the employee's ouId and all parent ouIds. In other words, you save the allowed ouIds into this field.
When searching, you use a filter query (the fq parameter) that filters by the manager's ouId.
Example:
..&fq=ouId:12
where 12 is the organization unit id of the selected manager.
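A small sketch of both halves of this idea: computing the values for the multi-valued ouId field at index time, and building the fq parameter at query time. The organization tree and helper names are made up for illustration; only `q`, `fq`, and the `ouId` field come from the answer above.

```python
from urllib.parse import urlencode

# Hypothetical org tree: unit id -> parent unit id (None for the root).
PARENT_OF = {30: 12, 12: 1, 1: None}


def allowed_ou_ids(ou_id, parent_of):
    """Return the unit itself plus all its ancestors, for the multi-valued ouId field."""
    chain = []
    while ou_id is not None:
        chain.append(ou_id)
        ou_id = parent_of[ou_id]
    return chain


def solr_params(query, manager_ou_id):
    """Build the query string with a filter query restricting results by ouId."""
    return urlencode({"q": query, "fq": f"ouId:{manager_ou_id}"})


# Index time: an employee in unit 30 is visible to units 30, 12 and 1.
print(allowed_ou_ids(30, PARENT_OF))
# Query time: a manager in unit 12 only matches documents that list ouId 12.
print(solr_params("stupid", 12))
```

Because the restriction is applied inside Solr, facet counts are computed over the already-filtered set, which addresses the facet concern raised in the question. For field-level secrecy (the feedback text itself), this sketch assumes the restricted field is indexed in its own document or otherwise only searched under the same filter.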
Maybe this is helpful for you https://github.com/salyh/elasticsearch-security-plugin It adds Document level security to elasticsearch.
"Currently for user based authentication and authorization Kerberos/SPNEGO and NTLM are supported through 3rd party library waffle (only on windows servers). For UNIX servers Kerberos/SPNEGO is supported through tomcat build in SPNEGO Valve (Works with any Kerberos implementation. For authorization either Active Directory and generic LDAP is supported). PKI/SSL client certificate authentication is also supported (CLIENT-CERT method). SSL/TLS is also supported without client authentication.
You can use this plugin also without Kerberos/NTLM/PKI but then only host based authentication is available.
As of now two security modules are implemented:
Actionpathfilter: Restrict actions against Elasticsearch on a coarse-grained level like who is allowed to to READ, WRITE or even ADMIN rest api calls
Document level security (dls): Restrict actions on document level like who is allowed to query for which fields within a document"
