Azure Cognitive Search: add multiple suggesters to a single index - azure-cognitive-search

I'm trying to add multiple suggesters to an Azure Search index, but receive this error response:
An index cannot have more than one suggester with searchMode='analyzingInfixMatching'
The property on the index is called suggesters and is of type array.
The documentation states:
The only mode currently supported is analyzingInfixMatching
Is the api defined to support future capabilities? Or am I overlooking an option that will enable me to add multiple suggesters?
And to understand why:
I have fields in my index for different languages. I would like a suggester only to include the field for the language the current user has specified.
With the current implementation I can only provide suggestions or autocomplete based on all available languages.

I work on the Azure Cognitive Search team. As you guessed, the schema is designed to be adaptive to future changes but at the moment only one suggester is supported - analyzingFixMatching.
If I understand correctly, you want to apply suggestions selectively only on fields specified by user at query time. You can achieve that through searchFields property of suggestions API. Define a suggester with all the fields that can potentially be specified by users and then limit the required ones using the above property.
More details here - https://learn.microsoft.com/rest/api/searchservice/suggestions

Related

Is there any specific API to get multiple documents from a single Database in Marklogic DB?

I am trying to find a specific Rest API that can help to get multiple documents from a single database in Marklogic DB.
Kindly share the specific endpoint. Thank you!
If you know the document uris, you can feed them all in a single call to /v1/documents:
https://docs.marklogic.com/REST/GET/v1/documents
uri+ One or more URIs for documents in the database. If you specify multiple URIs, the Accept header must be multipart/mixed.
If you have search criteria, you can use /v1/search instead, which allows paging through a large result set. It doesn't return full documents by default, but you can specify an Accept header of multipart/mixed, use view/format settings, and/or search options to influence its behavior:
https://docs.marklogic.com/REST/GET/v1/search
HTH!

Example for highlighting keywords vespa

In the documentation of features, it is said that search engine like keyword highlighting is supported in vespa. I couldn't find any example on how to implement it.
You control this in the search definition document on a per field basis
See
https://docs.vespa.ai/documentation/reference/search-definitions-reference.html#bolding
You can also instruct it do make a dynamic small snippet with the summary:dynamic option

Get information from publishItem to publish end in sitecore

I got a setup with sitecore and solr.
Im looking to gather information (the different TemplatesIds) in publishItem, and then when the publish has ended, call solr with the names which needs to be reindex.
Ive managed to get all the template IDs both using PublishItemProcessor and as a publish:itemProcessed event, where i store the template ids in the PublishContext.CustomData as a Hashset.
But how can i, when the publishing is done get this information i've gathered during publishing? I want to call solr, once, and only once, after everything is published, with information gathered during the publishing.
Hope this makes sense guys, please help out.
You don't need to make a hack to reindex indexes after a publishing.
Sitecore has out of the box this functionality.
You use index update strategies to maintain indexes. You can configure each index with a unique set of index update strategies. You should not specify more than three update strategies per index for performance reasons.
Sitecore provides a varied set of index update strategies, and you can extend this set with more strategies.
All the strategies that are delivered with Sitecore are defined under the following node in the Sitecore.ContentSearch.Solr.Index.IndexName configuration files:
<configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration" />
<strategies hint="list:AddStrategy">
You need to use of these default strategies:
RebuildAfterFullPublish
OnPublishEndAsync
More information about search, indexing and crawling you can find here:
https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/search_and_indexing

Index content of PDFs with Solr and Tika

The problem briefly: I would like Sitecore to index the contents of PDFs using Solr's built in functionality (supplied by Tika). I'm not sure how to configure Sitecore's indexing to use this feature in Solr(Tika). (I think I need to write a custom indexer.)
I'm working with Sitecore 7 (7.1 Update 1) and want to index content from PDFs (or other rich media types). I'd like to index this data for search purposes.
I have Solr (4.6.1) installed and working with Sitecore 7. When I index my site it saves all of the documents to the correct Solr core, and I can successfully retrieve these documents for display.
Using curl, I can send a PDF to my Solr instance and get it indexed.
curl "http://localhost:8983/solr/update/extract?literal._id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=#sample.pdf"
This works, and I can read this content in my Sitecore web project and display it in views, so I know I can get access to this data. However, I would like the data to be attached to the items that I have uploaded in Sitecore.
I'd like something like this to happen when I upload a PDF to the Sitecore Media Library and publish the item, or at least when I re-index the site.
I'm currently walking through the following tutorial to learn some things about writing custom indexing (here is a link to part 1):
http://www.sitecore.net/Community/Technical-Blogs/Getting-to-Know-Sitecore/Posts/2013/04/Sitecore-7-Search-Provider-Part-1-Manually-Triggered-Indexing.aspx
Thanks for you patience.
For Sitecore, when handling media data, Lucene and Solr needed to index the content in a consistent way (so that you could switch between them if needed and still get data indexed in the same way). As Tika integration is very much a Solr thing it was decided that both should use the general windows concept of IFilters for indexing (http://en.wikipedia.org/wiki/IFilter)
This means that as long as you have the correct IFilter for that mime type installed on the machine that is doing the indexing, then the '_content' computedfield will be populated with the output.
This doesn't mean you can't use the Solr Tika integration but it is not supported by default and would be a customization.
It would be very simple to:
Disable the '_content' computed field
Set up a publish pipeline processor that looks at each item being published
Check if it is a media item
Check to see if it is a PDF
Issues a command to push the content to your Solr server for indexing by Tika.
You may want to see what results you get by using an IFilter, if the results are close enough to what you want then you can go with that, if Tika is producing better results for you then you should be able to switch to that, although you would probably have your media content indexed in a separate Solr core so you would lose any Sitecore specific metadata around the document.
Some blog posts that might be helpful:
http://www.samjgriffin.com/blog/2013/11/06/sitecore-7-pdf-and-document-content-search/
http://www.sitecore.net/Community/Technical-Blogs/John-West-Sitecore-Blog/Posts/2013/04/Sitecore-7-Indexing-Media-with-IFilters.aspx
I second the recommendation to use Sitecore's built-in MediaItemContentExtractor + IFilter approach, unless you've already ruled that out for some reason (IFilter difficulties, perhaps). If an IFilter is not an option, or if you're interested in the other approach regardless, I would integrate Tika a little differently than Stephen suggested, though.
The tutorial you referenced deals with writing your own search provider - i.e. replacing the built-in provider entirely. You should be able to leverage Sitecore's Solr provider and accomplish what you need with something lighter: a computed index field. The built-in media extractor mentioned above uses this approach, which enables you to put just about anything into your index during the normal indexing process. Here is a blog post from John West that walks through creating a basic computed index field: Sitecore 7: Computed Index Fields.
In short, write a class that implements IComputedIndexField and represents content extracted from PDFs or other rich documents. In your implementation of the ComputeFieldValue method:
Call GetMediaStream() on the document.
Pass the stream to Solr in an extract-only command and capture the result.
Return that result to store it in the computed index field.
If you need the media content to be in a particular existing index field, then configure the computed field as a copy field (see 3.3.3) into the existing field. Otherwise, configure your search to reference the computed field.
The main drawback here is the expense of passing the extracted content back and forth, rather than committing it directly to the index in a single step. Depending on your index size and contents, that may not be an issue for you.
One other potential option would be a post-rebuild task to add media content to the existing indexed documents. I am not certain this would work. It depends on knowing the IDs of the media items' documents and committing the rich document content in partial document updates, which this person was unsuccessful in attempting. If you try this, be sure to execute it in the indexing:end event prior to HTML cache clearing.
Whatever approach you take, if you want to work with Tika on a higher level than cURL, have a look at SolrNet's implementation of ExtractCommand and related classes.
If you could upgrade your site into sitecore 7.2 in which the media items content will be indexed automatically and there is no need to install the related IFilter, You should read the following:
Media Content Indexing in sitecore 7.2

appengine objectify text search

Maybe I am missing something but is there any way to use the new text search features as described in the 2011 presentation http://www.youtube.com/watch?v=7B7FyU9wW8Y (approx. 30min mark) with Objectify, Entities, and Java? I realize it is an experimental release but the text search features that are present don't seem to cover the full extent they discussed in the presentation. I don't want to have to write my own code to manage the creation, updates to documents. But I don't currently see another way??
Currently, you can't use Full Text Search to search through entities in Datastore; you'll need to create search documents in a search index to use the Full Text Search API, as described in these docs.

Resources