How to filter autocomplete - azure-cognitive-search

I am feeding Azure Search with data from a multi-tenant database, so every document in the index has a TenantId property. For searches, aggregations and suggestions I always filter by "TenantId eq 'xxx'", depending on the user making the call.
However, for autocomplete it is not possible to filter, so if it returns "something", the tenant in context might not have "something" in their data. Is there any way to overcome this?

This feature is actively being developed and will be completed before the Autocomplete API reaches General Availability. I'll update this thread once we deploy the change so you can try it.
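For reference, the per-tenant filter the question already applies to searches and suggestions can be built as in the minimal sketch below. The helper name `tenant_filter` is hypothetical, and the code only constructs the OData filter string; it does not call any Azure API and says nothing about whether autocomplete accepts a filter.

```python
def tenant_filter(tenant_id: str) -> str:
    """Build the OData filter "TenantId eq '<id>'" for one tenant.

    Hypothetical helper: it only builds the filter string.
    """
    # OData string literals escape a single quote by doubling it.
    escaped = tenant_id.replace("'", "''")
    return f"TenantId eq '{escaped}'"


print(tenant_filter("xxx"))
```

Escaping the quote matters when the tenant id comes from user-controlled input, since an unescaped quote could change the meaning of the filter.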

Related

Azure Search Design Question: Omit result if User has seen this result

I'm trying to design a solution where I don't have to use the SQL Server Database to answer a question: Show me Azure Index search results where the user has never seen this search result.
I can keep track of user document "views" in my SQL database, but how do I extend this functionality to Azure Search Index queries?
I mean I could do a $filter where document id is not in (1,2,3,etc), or I could filter the Index results before the user ever sees them from the server.
I'm just wondering if there's a more clever way to do this?
Thanks for your help!

The best way to achieve this is the first option you mentioned. Once the first query arrives in a user session, save the document ids that were returned, then build a filter that excludes those ids for subsequent queries in the same session.
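The exclusion filter described above could be built roughly as follows. This is a sketch under assumptions: the key field is assumed to be called `id`, the helper name is hypothetical, and it relies on the `search.in` OData function that Azure Search provides for membership tests. It only builds the filter string.

```python
def exclude_seen_filter(seen_ids, key_field="id"):
    """Return an OData filter that excludes already-seen documents.

    Returns None when nothing has been seen yet, so the caller can
    simply omit the filter. `key_field` is an assumed field name.
    """
    ids = [str(i) for i in seen_ids]
    if not ids:
        return None
    # Keep the sketch simple: reject ids that would break the
    # comma-delimited list or the quoted string literal.
    if any("," in i or "'" in i for i in ids):
        raise ValueError("ids must not contain commas or single quotes")
    return f"not search.in({key_field}, '{','.join(ids)}', ',')"


print(exclude_seen_filter([1, 2, 3]))
```

The returned string would be passed as the filter of the next query in the same session, with newly returned ids appended to the saved list each time.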

Conditional read access to DynamoDB table with AWS Amplify

I'm building an application with AWS Amplify, where I have three DynamoDB tables: Users, Posts and Subscriptions.
1. users can make posts
2. users subscribe to other users
3. user A can only see posts by user B if user A is subscribed to user B
Points 1. and 2. are easy to implement with standard GraphQL mutations, but I'm stuck on how to implement 3. in an elegant way. Currently I use a Lambda resolver.
Given inputs "user A wants to see user B", the lambda resolver does the following:
1. Query the Subscriptions table to see if there's a document for "user A subscribed to user B"
2. If such a row exists, query the Posts table and return the documents. If not, return nothing.
This logic requires two round trips, but since DynamoDB is fast I'm OK with this trade-off. There are other downsides though, so I'm wondering if there's a more Amplify-native way to do this? Some magic DynamoDB and @auth trickery perhaps?
Thank you!

If you are using multiple tables to store the data, the multiple query approach is your only option.
You can use transactions when mutating items across multiple tables, which is useful when you want to perform an operation based on a condition on an item in another table(s). But when it comes to a read operation, you have no such option.
Aside from re-designing your tables to support this access pattern, I don't think two reads is particularly bad.
If you want to handle authorization logic outside of DDB, you may want to look into AWS IAM and its documentation on Fine-Grained Access Control. Among other features, IAM can restrict access to specific items in a table based on certain primary key values.
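The two-read check from the question can be sketched as below. This is not Amplify or DynamoDB code: the tables are replaced by in-memory stand-ins (a set of subscription pairs and a list of post dicts), and all names are hypothetical. In a real resolver each step would be one DynamoDB read.

```python
def posts_visible_to(viewer, author, subscriptions, posts):
    """Return the author's posts only if the viewer is subscribed.

    `subscriptions` is a set of (subscriber, author) pairs and `posts`
    a list of dicts with an "author" key; both are in-memory stand-ins
    for the Subscriptions and Posts tables.
    """
    # Read 1: is there a row for "viewer subscribed to author"?
    if (viewer, author) not in subscriptions:
        return []
    # Read 2: fetch the author's posts.
    return [post for post in posts if post["author"] == author]


subscriptions = {("A", "B")}
posts = [
    {"author": "B", "text": "hello"},
    {"author": "C", "text": "other"},
]
print(posts_visible_to("A", "B", subscriptions, posts))
print(posts_visible_to("C", "B", subscriptions, posts))
```

Keeping the authorization check as a separate first step means the second read is skipped entirely for users who are not subscribed.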

Intermediate Service between client and SOLR search

I want to create some custom search logic.
The logic is quite custom, so I don't see how it can be implemented by extending SOLR.
More specifically, I want the client to use an item id to search for similar items in the same category. But the returned results need to be filtered with some very custom logic.
For that reason, I think I want to implement some custom service that will expose a REST API to the client and then it will forward the request to SOLR search.
Do you think that I can avoid this option by extending SOLR search implementation?
Which is best practice?

The best practice is to have a layer between Solr and the client anyway. Solr does not have security out of the box, and anybody who can access it can issue delete commands as well as search requests.
So exposing a REST interface to the client and talking to Solr over a secure link (firewall/IP protected) is good practice.
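One piece of such an intermediate layer is deciding which client parameters are forwarded to Solr at all. The sketch below shows that idea only; the allowed parameter set, the row cap, the host and the core name `items` are all assumptions for illustration, and it builds a URL without sending any request.

```python
from urllib.parse import urlencode

# Assumed policy for this sketch: clients may only control these parameters.
ALLOWED_PARAMS = {"q", "start", "rows"}
MAX_ROWS = 100


def build_solr_select_url(base_url, client_params):
    """Build a read-only /select URL from untrusted client parameters.

    Anything outside ALLOWED_PARAMS is dropped, so the client cannot
    reach other handlers or smuggle in extra request parameters.
    """
    safe = {k: v for k, v in client_params.items() if k in ALLOWED_PARAMS}
    safe.setdefault("q", "*:*")
    # Cap the page size regardless of what the client asked for.
    safe["rows"] = min(int(safe.get("rows", 10)), MAX_ROWS)
    return f"{base_url}/select?{urlencode(sorted(safe.items()))}"


url = build_solr_select_url(
    "http://solr.internal:8983/solr/items",  # hypothetical internal host and core
    {"q": "category:books", "rows": "500", "stream.body": "<delete/>"},
)
print(url)
```

The custom result filtering from the question would then run in the same service, on the response Solr returns, before anything is sent back to the client.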

GAE datastore -- proper ways to implement search/data retrieval in response to a user request?

I am writing a web app and I am trying to improve the performance of search/displaying results. I am relatively new to programming this sort of thing, so I apologize in advance if these are simple questions/concepts.
Right now I have a database of ~20,000 sites, each with properties, and I have a search form that (for now) just asks the database to pull all sites within a set distance (for this example, say 50km). I have put the data into an index and use the Search API to find sites.
I am noticing that the database search takes ~2-3 seconds to:
1) Search the index
2) Get a list of key names (this is stored in the search index)
3) Using key names, pull from datastore (in a loop) and extract data properties to be displayed to the user
4) Transmit data to the user via jinja template variables
This is also only getting 20 results (the default maximum for a Search API query.. I haven't implemented cursors here yet, although I will have to).
For whatever reason, it feels quite slow.. I am wondering what websites do to make the process seem faster. Do they implement some kind of "asynchronous" search, where a page loads while in the background the search/data pulls are processed, and then subsequently shown to the user...?
Are there "standard" ways of performing searches here where the processing/loading feels seamless to the user?
Thanks.
edit
Would it work to pass just a "query ID" via the page and then use AJAX to get the data from the datastore as JSON? That is, can App Engine redirect the user to the final page, pass in only a "query ID", run the search in the meantime, and then, once the data is ready, pass the information to the user via JSON?

Make sure you are getting entities from the datastore in parallel. Since you already have the key names, you just have to pass your list of keys to the appropriate method.
For db:
MyModel.get_by_key_name(key_names)
For ndb:
ndb.get_multi([ndb.Key('MyModel', key_name) for key_name in key_names])
If you needed to do datastore queries, you could enable parallel fetches with the query.run (db) and query.fetch_async (ndb) methods.

Data security in result sets from Elastic Search, Solr or

I need to add full-text search capabilities to my existing database. The obvious first choice is something like Solr or Elastic Search. The point I'm blocked on is how to securely display results returned from the underlying search engine (let's assume Solr or Elastic Search for now, but any other solution or engine that addresses this is also appreciated).
The tricky context is that I have, for example, in my system Personal Profile records that are to be indexed. One of the fields in personal profile is – manager’s feedback. Normally in the system that field is visible only to employee’s direct manager and higher hierarchy, i.e. ‘manager’ from another branch will not be able to see that field. However, I want that field to be searchable via full text search but only for people who actually can see it.
Now I query Solr for 'stupid' (that is the query string) and it returns N documents. When returning them to the end user I'll remove the 'Manager's feedback' field because the end user is not the manager of those people, but the mere presence of a document in the result set is already evidence of who the 'stupid' guys are.
The question is – what is workable approach to handle that use-case? Is it possible to plug into Solr/ES with home-grown security filter for outputs?
Caveats:
filtering out only fields does not work because of the scenario described above
filtering out complete documents will not work because:
the search engine does not tell which fields matched, so there is no way to manually filter the result set by field http://elasticsearch-users.115913.n3.nabble.com/Best-way-to-return-which-field-matched-td2713071.html
even if this did work, removing documents from the result set would spoil the facets (e.g. number of matches by department) returned by the engine: I would have to either recalculate the facets manually, or they would not match the manually filtered records and would reveal what I do not want to show to end users

In Solr you can create multiValued fields. In your case you can use one to store de-normalized values of the organization structure.
In the described scenario you would create a multiValued field ouId (Organization Unit Id) and store the employee's ouId and all parent ouIds in it. In other words, the field holds every ouId that is allowed to see the document.
In the search scenario you would use a filter query (the fq parameter), filtering by the ouId of the manager.
Example:
..&fq=ouId:12
where 12 is organization unit id of selected manager.
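The two halves of this approach, collecting the ouId chain at index time and adding the fq at query time, can be sketched as follows. The `parent_of` mapping, the function names and the sample ids are hypothetical; only the `ouId` field and the `fq` parameter come from the answer above. The code builds values and a query string and does not contact Solr.

```python
from urllib.parse import urlencode


def allowed_ou_ids(ou_id, parent_of):
    """Return ou_id followed by all of its ancestors.

    `parent_of` maps an organization unit id to its parent id; a unit
    with no entry is treated as the root. The result is what would be
    stored in the multiValued ouId field of the employee's document.
    """
    chain, seen = [], set()
    current = ou_id
    # The `seen` set guards against cycles in bad hierarchy data.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


def manager_query_params(query, manager_ou_id):
    """Build the query string with an fq restricting results by ouId."""
    return urlencode([("q", query), ("fq", f"ouId:{manager_ou_id}")])


parent_of = {31: 12, 12: 1}  # hypothetical hierarchy: 31 -> 12 -> 1
print(allowed_ou_ids(31, parent_of))
print(manager_query_params("feedback", 12))
```

Because the restriction is applied inside Solr, facet counts are computed only over documents the manager is allowed to see, which addresses the facet caveat raised in the question.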

Maybe this is helpful for you https://github.com/salyh/elasticsearch-security-plugin It adds Document level security to elasticsearch.
"Currently for user based authentication and authorization Kerberos/SPNEGO and NTLM are supported through 3rd party library waffle (only on windows servers). For UNIX servers Kerberos/SPNEGO is supported through tomcat build in SPNEGO Valve (Works with any Kerberos implementation. For authorization either Active Directory and generic LDAP is supported). PKI/SSL client certificate authentication is also supported (CLIENT-CERT method). SSL/TLS is also supported without client authentication.
You can use this plugin also without Kerberos/NTLM/PKI but then only host based authentication is available.
As of now two security modules are implemented:
Actionpathfilter: Restrict actions against Elasticsearch on a coarse-grained level like who is allowed to to READ, WRITE or even ADMIN rest api calls
Document level security (dls): Restrict actions on document level like who is allowed to query for which fields within a document"