Azure Cognitive Search: Join types - azure-cognitive-search

Elasticsearch has a Join field type.
Solr has a Join query parser.
What is the equivalent in Azure Cognitive Search? (if any)
The closest concept I've been able to find is Complex Data Types, but they're more like the equivalent of "nested documents" in the other two technologies. They don't allow for individual management of child records. Instead they require that all children are written together (which doesn't work for a large number of child records).

Related

Fuzzy Matching in Snowflake like EDIT_DISTANCE_SIMILARITY

Is there a function for fuzzy name matching in Snowflake, like UTL_MATCH.EDIT_DISTANCE_SIMILARITY in Oracle? I need to find the difference at row level.
Snowflake has EDITDISTANCE and SOUNDEX functions:
select editdistance('Duningham', 'Cunningham');
-- Result 2
select soundex('McArthur') = soundex('MacArthur');
-- Result TRUE
For EDITDISTANCE, unlike EDIT_DISTANCE_SIMILARITY, lower scores are closer matches. There are many open source JavaScript implementations of fuzzy matching that you could plug into a Snowflake JavaScript UDF.
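To make the "lower is closer" point concrete, here is a minimal Python sketch that computes a Levenshtein edit distance and turns it into a 0 to 100 similarity score. The normalization (dividing by the longer string's length) is an assumption chosen for illustration, not a confirmed reproduction of Oracle's formula, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> int:
    """Assumed normalization: 100 means identical, lower means further apart."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are treated as identical
    return round(100 * (longest - edit_distance(a, b)) / longest)
```

With this sketch, a distance of 2 on a 10-character name maps to a similarity of 80, so higher similarity and lower distance describe the same closeness.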
Interzoid (disclaimer: I work there) has matching capabilities with native Snowflake connectivity. It combines knowledge bases for different data types (name, company, address, etc.), heuristics, soundex, spelling analysis, derivatives, contextual ML, and so on, using a similarity-key technology that works with one or more tables. It calls an underlying API for each record in a table to generate the similarity keys (which can be appended to the table if desired) on which the fuzzy matching is based: https://connect.interzoid.com/matching-data-database. It would work for the scenario above.

Is there any way that a single search can encompass different indices at the same time in Azure Search?

I'm searching through the Azure Search REST APIs. I have a data source, a set of cognitive skills, an index, and an indexer. When I search a data source with a single index, the search works correctly, but when I have several indexes built from other data sources, only the results from one of the indexes are returned.
My question is: how can a single search cover several indexes at the same time and return the results found in each of them?
Unfortunately, Azure Search does not support searching over multiple indexes in a single query.
Please consider upvoting the 'Search multiple indexes at once' feature on Azure Search's UserVoice to help us set feature priorities.
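Since one query cannot span indexes, a possible client-side workaround is to send the same query to each index and merge the results yourself. The Python sketch below assumes a hypothetical `search_one(index_name, query)` callable that wraps whatever REST call you already use and returns a ranked list of hits; it is not part of any Azure SDK. Results are interleaved by rank rather than sorted by score, on the assumption that relevance scores computed in separate indexes are not directly comparable.

```python
from itertools import zip_longest
from typing import Callable, Iterable, List

# Hypothetical signature: (index_name, query) -> ranked list of hit dicts.
SearchFn = Callable[[str, str], List[dict]]


def search_all(indexes: Iterable[str], query: str, search_one: SearchFn) -> List[dict]:
    """Run the same query against every index and interleave the hits by rank."""
    per_index = []
    for name in indexes:
        hits = search_one(name, query)
        # Tag each hit with the index it came from so callers can tell them apart.
        per_index.append([{**hit, "index": name} for hit in hits])

    merged: List[dict] = []
    # Take the top hit of each index, then the second hit of each, and so on.
    for rank_group in zip_longest(*per_index):
        merged.extend(hit for hit in rank_group if hit is not None)
    return merged
```

The cost is one request per index, so this only suits a small number of indexes.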

Recursive design in NoSQL

How could I express the following design?
There are two entities: user and group
Group can have users and other groups
User can't have other users or groups
Efficiently query any group and everything it contains
There are conceptually no depth limits (in practice the current hardware dictates one, e.g. 5 for query speed)
Examples:
I need to use NoSQL and also be able to cache this data (Redis for example, which is NoSQL itself).
---
My current idea:
Every group is a single unit and contains only the IDs of its children (users and groups). I then query all the children by ID. If some of them also have children, I make another round trip, and so on.
As you can imagine, this solution requires multiple queries, and their number grows with every level of depth. The good news is that I query all these items by ID, which should be extremely fast.
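The level-by-level lookup described above can be sketched in Python as a breadth-first expansion that issues one batched fetch per depth level. `fetch_many` is a hypothetical callable standing in for a multi-key read (for example a Redis MGET wrapper), and the `children` field name is an assumption about how the records are stored.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical batched read: IDs -> {id: record}; missing IDs are simply absent.
FetchMany = Callable[[Iterable[str]], Dict[str, dict]]


def expand_group(root_id: str, fetch_many: FetchMany, max_depth: int = 5) -> Dict[str, dict]:
    """Collect a group and everything below it, one round trip per level."""
    seen: Dict[str, dict] = {}
    frontier: List[str] = [root_id]
    depth = 0  # the root is level 0; max_depth counts levels below it
    while frontier and depth <= max_depth:
        records = fetch_many(frontier)  # single batched query for this level
        seen.update(records)
        queued: Set[str] = set()
        next_frontier: List[str] = []
        for record in records.values():
            for child_id in record.get("children", []):
                # Skip IDs already loaded or already queued (guards against cycles).
                if child_id not in seen and child_id not in queued:
                    queued.add(child_id)
                    next_frontier.append(child_id)
        frontier = next_frontier
        depth += 1
    return seen
```

The number of round trips equals the depth of the tree rather than the number of groups, which is the main saving over fetching each group's children separately.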
Can anyone suggest a better way?
I would use a graph database, as they are very powerful when dealing with this kind of query.
Bear in mind that you won't be able to query the "parents" of a node though.
You could use Neo4j for this. They have a community edition that is free. https://neo4j.com/

How to manage multiple stop lists for a single database, where each stop list is created for a specific query result?

I am using MS SQL Server 2008, where I have created two different custom stop lists for my database. The two stop lists contain different stop words. My aim is to use each stop list for a specific full-text query.
For example: on a job portal, candidates search for jobs with some keywords, whereas employers search for the right candidates with some keywords. I would like to manage two different stop lists, one specific to job search and another specific to candidate search.
How can I achieve this in my SQL Query or Stored Procedure?
I don't have much FTS experience but the documentation states that stop lists are applied via full-text indexes, and only one full-text index per table or view is allowed. So applying stop lists dynamically in queries is obviously not possible.
What would be possible on the other hand is to create an indexed view on your table. Then you could put one full-text index on the table and a second one on the view, each with a different stop list. You would have to modify the queries too, of course, so that job searches use the table and candidate searches use the view (or vice versa).
If that approach doesn't work for you, then you would have to look into alternatives that have the functionality you need.

Are DocumentStores (alone) good for searching documents?

I am currently thinking how to best store web-crawling results in a database. In another question document-oriented databases were recommended to use for a web-crawler project: Database for web crawler in python?
Now I am wondering if map/reduce is the right way to do such classification and value generation. At least it seems to be able to do this kind of thing (map alone for classification, such as by year or author, and map/reduce for calculating numerical values, for which I cannot think of an example at the moment).
However, would map-reduce / DocumentStores also be able to give me the right documents for a given word? In a relational database I would have to use a JOIN on some tables and then get documents containing these words:
SELECT * FROM docs d
JOIN doc_words dw ON dw.doc_id = d.id
JOIN words w ON dw.word_id = w.id
WHERE w.word = 'foo'
I guess DocumentStores are not capable of such an operation, as they do not support full-text indexes and are not intended to hold many references or relations.
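For context, the JOIN query above amounts to a lookup in an inverted index, a mapping from each word to the IDs of the documents containing it, which is the structure full-text engines maintain. Below is a minimal in-memory sketch in Python; the function names and the simple `\w+` tokenizer are assumptions for illustration, and a real engine adds stemming, ranking, and persistence.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercased word to the set of document IDs that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def find_docs(index: Dict[str, Set[int]], word: str) -> Set[int]:
    """Return the IDs of documents containing the word (empty set if none)."""
    # .get avoids creating an empty entry for unknown words in the defaultdict.
    return index.get(word.lower(), set())
```

A document store can hold such a mapping as ordinary documents keyed by word, but keeping it up to date as pages change is the work a dedicated full-text index does for you.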
Would the better alternative be mixing several systems? E.g. one for searching by words, one for searching by different values if present (like year of publication, author, …)? I think DocumentStores are not so bad for storing the metadata, as sometimes there are specific values and sometimes not (and DocumentStores are easy to use across multiple servers if wanted, as soon as there are too many documents for one server). Yet I am not sure what would be the best way to implement searching over a collection of documents (including web pages, PDFs, and images, which always have different metadata but often also need a full-text index).
To make the question clear: should I use another database system together with DocumentStores, use DocumentStores alone (how would I search for words quickly?), or use another DB system alone?
PS: Another example for such a problem would be the linking between webpages, which cannot be saved in DocumentStores well either. However, OrientDB might solve this problem as it seems to combine graph database and document-oriented database.
Check out RavenDB. It is a document DB with Map/Reduce queries, using Lucene under the hood, so full-text search is fully supported, also within Map/Reduce queries.
Custom Lucene analyzers are supported as well, so there's a lot of room for further full-text extensions.
Other features like Includes and Live Projections may give you everything else a simple Map/Reduce will be missing.
See MarkLogic - which was designed specifically for searching documents. http://developer.marklogic.com/products/marklogic-server/which-nosql
