Index structure for Azure Search

I'm putting together a query to index medicines. A user should be able to enter a search term into a single search box. The term might be a brand name for a drug, a generic name (the underlying compound on which all brands are based), or an indication, and the user should get back a list of medicines that match it. I'd like to have a category facet for the type: indication, brand, or generic.
To have a category facet, my understanding is that I'd have to send my data through as one row per search term, where that search term might be a brand, an indication, or a generic, rather than one row per brand with columns for the generic list and indication. Is this correct, or is there another way to do what I want?

I hope I understand your question. From the screenshot you provided, I assume you want to make the "MedicineInformationType" field a Facetable field in your Azure Search index, and make the "SearchTerm", "Product", "GenericList", and "ActionList" fields Searchable (although I am not sure why you would want the "SearchTerm" field if its term is already in one of the other fields).
If you structure your index this way, you can search for, say, "phosphate" and facet over the "MedicineInformationType" field to get a count of the results that are generics or brands.
For example (as a REST call):
search=phosphate&facet=MedicineInformationType
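As a rough sketch, the same request could be assembled in Python. The service name, index name, and api-version below are placeholders chosen for illustration, not values from the question; only the search and facet parameters come from the example above.

```python
from urllib.parse import urlencode


def build_facet_query(service, index, api_version, term, facet_field):
    """Build a GET URL that searches for `term` and facets on `facet_field`."""
    params = {
        "api-version": api_version,  # placeholder version string
        "search": term,
        "facet": facet_field,
    }
    base = f"https://{service}.search.windows.net/indexes/{index}/docs"
    return f"{base}?{urlencode(params)}"


# "my-service" and "medicines" are hypothetical names.
url = build_facet_query(
    "my-service", "medicines", "2020-06-30", "phosphate", "MedicineInformationType"
)
print(url)
```

The facet counts come back alongside the matching documents, so one request is enough to render both the result list and the category filter.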

Neo4j - How to use createRelationshipIndex / createNodeIndex in full-text search

So I understand that Neo4j 3.5 and above implements full-text search in cypher query via createNodeIndex(), e.g.:
CALL db.index.fulltext.createNodeIndex("myIndex", ["PersonNode"], ["name"])
where myIndex is an arbitrary name I chose for the index, PersonNode is my node label, and name is the PersonNode property on which I want the full-text search performed.
And to actually perform the search by name, I can do something like the following:
CALL db.index.fulltext.queryNodes("myIndex", "Charlie")
But now assume that PersonNode has a relationship of type PURCHASED_ITEM, which is connected to another node label ProductNode as follows:
PersonNode-[:PURCHASED_ITEM]->ProductNode
And assume further that ProductNode has an attribute called productTitle indicating the display title name for each product.
My question is, I would like to set up an index for this relationship (using, presumably, createRelationshipIndex()), and perform a full-text search by productTitle and return a list of all PersonNode that purchased the given product. How can I do this?
Addendum: I understand that the above could be done by first getting a list of all ProductNode instances matching the given title, then performing a normal cypher query to extract all related PersonNode instances. I also understand that for the above example, a normal cypher query would be all that I need. But the reason I'm asking this question is that I eventually need to implement a single search bar that would allow the user to input any text, including possible misspellings and all, and have it perform a search through multiple attributes and/or relationships of PersonNode, and the results need to be sorted by some kind of relevance score. And in order to do this, I feel I need to first grasp exactly how the relationship queries work in neo4j.
Here is an example of how to create a full-text index for the productTitle property of PURCHASED_ITEM relationships:
CALL db.index.fulltext.createRelationshipIndex("myRelIndex", ["PURCHASED_ITEM"], ["productTitle"])
And here is a snippet showing the use of that index:
CALL db.index.fulltext.queryRelationships("myRelIndex", "Hula Hoop") YIELD relationship, score
...
Note that productTitle is a property of ProductNode, not of the PURCHASED_ITEM relationship.
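Since productTitle lives on the node, one possible approach is to index ProductNode with a node index and then traverse back to the purchasers. The sketch below keeps the Cypher in Python strings. The index name "productIndex" and the run_query callable are assumptions for illustration (run_query stands in for something like a driver session's run method); the procedures themselves are the ones already used in the question.

```python
# One-time setup: a full-text node index on ProductNode.productTitle.
# "productIndex" is a made-up index name.
CREATE_PRODUCT_INDEX = (
    "CALL db.index.fulltext.createNodeIndex("
    '"productIndex", ["ProductNode"], ["productTitle"])'
)

# Search product titles, then follow PURCHASED_ITEM back to the buyers,
# keeping the full-text relevance score for ordering.
FIND_BUYERS = """
CALL db.index.fulltext.queryNodes("productIndex", $title) YIELD node, score
MATCH (person:PersonNode)-[:PURCHASED_ITEM]->(node)
RETURN person, node.productTitle AS productTitle, score
ORDER BY score DESC
"""


def find_buyers(run_query, title):
    """Run the buyer lookup; run_query(cypher, params) is a hypothetical callable."""
    return run_query(FIND_BUYERS, {"title": title})
```

The search string is passed to Lucene, so appending ~ to a term (for example "hula~") asks for fuzzy matching, which may help with misspelled input.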

How to boost AND in a solr query?

Suppose a user enters a two-word search input. Since the default boolean operator is OR, all entries containing either or both words appear.
What I would like to know is whether entries that specifically meet the AND condition can be boosted.
With multiple words, can particular words be made to imply specific search constraints, or to boost certain parameters when they are present? For example, if the input is "with x and y without z", can I make Solr interpret it as (x AND y) AND (NOT z)? Or at least boost the entries that partially or fully meet the requirement?
EDIT:
I have tried using boost with edismax as shown here:
$query = $client->createSelect(); //create search query
$query->setQuery('memberType:'.$searchQuery.' firstName:'.$searchQuery.' gender:'.$searchQuery); // fields to search and the search term(s)
$edismax = $query->getEDisMax();
$edismax->setQueryFields('firstName memberType^3 gender^2'); //boost fields
$query->setStart($start)->setRows($rows); // start offset and number of rows to return; use variables rather than constants
$query->setFields(array('id', 'firstName', 'lastName', 'eid', 'gender', 'memberType')); //set return fields
//$query->addSort('id', $query::SORT_ASC); //sort field and customisations
$resultSet = $client->select($query);
When I search for a name with a particular member type, like "sanjay candidate", I expect the order to be: entries matching both sanjay and candidate, then all users who are candidates, then all users named sanjay. Instead I get entries matching both, then all who are sanjay, then all candidates.
I cannot figure out what the issue is, or whether I can provide more customized boosting.
If you are using eDisMax, you have a whole collection of boosting options: phrase boosts, bigram boosts, a separate boosting query, and so on. Read through the wiki page and experiment. You should not need to do any custom coding for this scenario.
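As one illustration of the "separate boosting query" option, the sketch below builds request parameters in which bq adds a score bonus to documents that match every input word in at least one of the searched fields. The field names and qf weights come from the question; the bq weight of 5 and the helper names are assumptions, and input terms containing Lucene special characters would need escaping first.

```python
FIELDS = ["firstName", "memberType", "gender"]


def all_terms_boost(terms, weight):
    """Lucene-syntax clause matching docs that contain every term in some field."""
    clauses = [
        "(" + " OR ".join(f"{field}:{term}" for field in FIELDS) + ")"
        for term in terms
    ]
    return "(" + " AND ".join(clauses) + f")^{weight}"


def edismax_params(user_input, weight=5):
    """Build eDisMax parameters; bq is only added for multi-word input."""
    terms = user_input.split()
    params = {
        "defType": "edismax",
        "q": user_input,
        "qf": "firstName memberType^3 gender^2",
    }
    if len(terms) > 1:
        params["bq"] = all_terms_boost(terms, weight)
    return params


print(edismax_params("sanjay candidate"))
```

Because bq only adds to the score, documents matching a single word are still returned; they just rank below the ones that satisfy the AND condition, subject to how the other scoring factors play out on real data.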

Difference between Solr Facet Fields and Filter Queries

I am using SolrMeter to test the Apache Solr search engine. The difference between facet fields and filter queries is not clear to me. The SolrMeter tutorial lists this as an example of facet fields:
content
category
fileExtension
and this as an example of Filter queries :
category:animal
category:vegetable
category:vegetable price:[0 TO 10]
category:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?
Facet fields are used to get statistics about the returned documents: specifically, for each value of that field, how many returned documents have that value. For example, if you have 10 products matching a query for "soft rug" and you facet on "origin," you might get 6 documents for "Oklahoma" and 4 for "Texas." The facet field query will give you the numbers 6 and 4.
Filter queries on the other hand are used to filter the returned results by adding another constraint. The thing to remember is that the query when used in filtering results doesn't affect the scoring or relevancy of the documents. So for example, you might search your index for a product, but you only want to return results constrained by a geographic area or something.
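The distinction can be mimicked in plain Python over an in-memory result list. This is only an analogy, not Solr code: the documents and helper names are made up to mirror the "soft rug" example above.

```python
from collections import Counter

# Ten hypothetical results for the query "soft rug".
results = [
    {"id": i, "name": "soft rug", "origin": "Oklahoma" if i < 6 else "Texas"}
    for i in range(10)
]


def facet_counts(docs, field):
    """Facet: count how many returned docs carry each value of `field`."""
    return Counter(doc[field] for doc in docs)


def apply_filter(docs, field, value):
    """Filter: keep only docs meeting the extra constraint; order is untouched."""
    return [doc for doc in docs if doc[field] == value]


counts = facet_counts(results, "origin")
texas_only = apply_filter(results, "origin", "Texas")

print(dict(counts))      # {'Oklahoma': 6, 'Texas': 4}
print(len(texas_only))   # 4
```

A facet reports on the result set without changing it, while a filter shrinks the result set without changing how the remaining documents are ranked.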
A facet is a field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside of a specific field are wrong. It will not search inside of the field only. It should be 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want only is in one field so you can search just that one field.

Is there any workaround for sorting on multiValued field?

Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer)
docs:- http://wiki.apache.org/solr/CommonQueryParameters#sort
My original schema is (You can consider the following is a GROUP-BY) :-
products (id, unique)
users who make some comments(multiValued)
last_comment_date for each user (multiValued, one user can make multiple comments, but only the last comment date is captured)
If sorting on multiValued is allowed,
I can easily get list of the products commented by certain users,
then sort by last_activity_date.
However, it does not work.
The workaround I have currently is to reverse the schema to :-
user + product (as id, unique)
user (single value)
last_comment_date
products
Which means I (sort of) manage to get the list of products commented on by certain users,
ordered by last_comment_date.
Of course, this leads to duplicated products,
as a product will appear once for each user's comment.
Any suggestions for simulating a group-by effect?
By the way, I am using Solr 3.1.
Field collapsing does not apply.
Sorting by a multi-valued field is not something that is merely pending or that can be patched in.
It cannot possibly be done, because it simply doesn't make any sense.
The way to do this is to have a single-valued field per document (populated at index time with the last date), then sort on that. That is, when indexing, traverse the list of users with their last activity dates, find the latest date, and assign it to the document's last-activity-date field.
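A minimal sketch of that index-time step, assuming the per-user dates are available as a mapping when the document is built. The field names follow the question's schema, except last_activity_date, which is the extra single-valued field, and the sample data is invented.

```python
from datetime import date


def build_document(product_id, last_comment_by_user):
    """Build one product document with a sortable single-valued date field."""
    return {
        "id": product_id,
        "users": list(last_comment_by_user),                       # multiValued
        "last_comment_date": list(last_comment_by_user.values()),  # multiValued
        # Single-valued: the latest date across all users, safe to sort on.
        "last_activity_date": max(last_comment_by_user.values()),
    }


docs = [
    build_document("p1", {"alice": date(2011, 5, 1), "bob": date(2011, 8, 30)}),
    build_document("p2", {"alice": date(2011, 9, 2)}),
]

# Equivalent of sorting on last_activity_date in descending order.
newest_first = sorted(docs, key=lambda d: d["last_activity_date"], reverse=True)
print([d["id"] for d in newest_first])  # ['p2', 'p1']
```

Note that this field holds the latest date across all commenters, so it orders products by overall recency rather than by one particular user's last comment.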

SOLR: Is it possible to index multiple timestamp:value pairs per document?

Is it possible in solr to index key-value pairs for a single document, like:
Document ID: 100
2011-05-01,20
2011-08-23,200
2011-08-30,1000
Document ID: 200
2011-04-23,10
2011-04-24,100
and then querying for documents with a specific value aggregation in a specific time range, i.e. "give me documents with sum(value) > 0 between 2011-08-01 and 2011-09-01" would return the document with id 100 in the example data above.
Here is a post from the Solr User Mailing List where a couple of approaches for dealing with fields as key/value pairs are discussed.
1) encode the "id" and the "label" in the field value; facet on it;
require clients to know how to decode. This works really well for simple
things where the id=>label mappings don't ever change, and are
easy to encode (ie "01234:Chris Hostetter"). This is a horrible approach
when id=>label mappings do change with any frequency.
2) have a separate type of "metadata" document, one per "thing" that you
are faceting on containing fields for id and the label (and probably a
doc_type field so you can tell it apart from your main docs) then once
you've done your main query and gotten the results back faceted on id,
you can query for those ids to get the corresponding labels. this works
really well if the labels ever change (just reindex the corresponding
metadata document) and has the added bonus that you can store additional
metadata in each of those docs, and in many use cases for presenting an
initial "browse" interface, you can sometimes get away with a cheap
search for all metadata docs (or all metadata docs meeting a certain
criteria) instead of an expensive facet query across all of your main
documents.
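The first approach could look roughly like this on the client side. The helper names are made up, and the sketch assumes ids never contain a colon, so the first colon can serve as the separator.

```python
def encode_facet_value(item_id, label):
    """Pack id and label into a single field value, e.g. '01234:Chris Hostetter'."""
    if ":" in item_id:
        raise ValueError("id must not contain the separator ':'")
    return f"{item_id}:{label}"


def decode_facet_value(value):
    """Split on the first colon only, so labels may themselves contain colons."""
    item_id, label = value.split(":", 1)
    return item_id, label


encoded = encode_facet_value("01234", "Chris Hostetter")
print(encoded)                      # 01234:Chris Hostetter
print(decode_facet_value(encoded))  # ('01234', 'Chris Hostetter')
```

As the quoted post warns, every document carrying the encoded value has to be reindexed whenever a label changes, which is why the second approach suits labels that change often.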
