Select a subset of matched content? - azure-cognitive-search

I'm using search to index PDFs stored in Azure blob containers. I'd like to return approximately 500-1000 characters of matched text from my searched content so I can highlight it for the user in my web app. The content itself may be up to 200 000 characters, which is too large to send over the network. Is it possible to take a substring of the matched content on the server side before sending results to the client? I'm using SearchParameters to filter matched data, identify facets, and select specific fields to return, as follows:
`SearchParameters sp = new SearchParameters()
{
    SearchMode = SearchMode.Any,
    Top = 10,
    Skip = currentPage - 1,
    // select specific fields
    Select = new List<String>() {"metadata_storage_path", "PubYear", "PubMonth", "PubDay", "NewspaperName", "content"},
    IncludeTotalResultCount = true,
    // Add facets
    Facets = new List<String>() { "PubYear", "PubMonth", "PubDay", "NewspaperName" },
};`
Thanks

The closest I can think of is using the hit highlighting API:
https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents#highlightstring-optional
It returns a few text fragments that match your query for each document. You currently cannot configure the size of the text fragments, but each fragment should include at least the full sentence in which the match is found.
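As a rough illustration of how hit highlighting could stand in for returning the whole content field, the sketch below builds a request body for the Search Documents REST API and then trims the returned fragments on the client. The `highlight`, `highlightPreTag`, `highlightPostTag`, and `select` parameters and the `@search.highlights` response property belong to that API; the helper names, the `<em>` tags, the 1000-character budget, and the shape of the sample hit are assumptions made for the example.

```python
def build_search_body(search_text, page, page_size=10):
    """Build a Search Documents POST body that asks for highlights on
    "content" instead of returning the whole field."""
    return {
        "search": search_text,
        "searchMode": "any",
        "top": page_size,
        # skip counts results, not pages
        "skip": (page - 1) * page_size,
        "count": True,
        # "content" is deliberately left out of select
        "select": "metadata_storage_path,PubYear,PubMonth,PubDay,NewspaperName",
        "facets": ["PubYear", "PubMonth", "PubDay", "NewspaperName"],
        "highlight": "content",
        "highlightPreTag": "<em>",
        "highlightPostTag": "</em>",
    }


def snippet_from_hit(hit, field="content", max_chars=1000):
    """Join highlight fragments until the character budget is used up.

    The budget counts fragment text only (not the separators), and the
    first fragment is always kept even if it is longer than max_chars.
    """
    fragments = hit.get("@search.highlights", {}).get(field, [])
    picked, used = [], 0
    for fragment in fragments:
        if picked and used + len(fragment) > max_chars:
            break
        picked.append(fragment)
        used += len(fragment)
    return " ... ".join(picked)
```

The fragment size is still decided by the service, so the budget here only limits how many fragments are shown, not how long each one is.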

Related

Firebase Flutter - Request with multiple where arrayContains

In my firebase documents, I have a field named "tags" that is a List, for example tags = ["Amazing", "Great", "Disappointing"].
I want to filter the documents returned by the query: the user selects a list of tags, for example filteredTags = ["Amazing", "Great"].
In my request, I want to retrieve all documents that have all elements of filteredTags in their tags list.
This query does not work because it looks for a list within tags, which is just a list of strings:
query = query.where(KeyTags, whereIn: filteredTags);
And this query returns an error, because Flutter does not allow multiple arrayContains in the same query (it works if I have only 1 tag in the filteredTags list):
for(dynamic tag in filteredTags){
    query = query.where(KeyTags, arrayContains: tag);
}
Finally, this one works but is not what I am looking for (it retrieves documents that have any one of the filteredTags, whereas I want documents that have all of them):
query = query.where(KeyTags, arrayContainsAny: filteredTags);
Any idea how to do this?
Thanks!
What you're describing is an arrayContainsAll type operator, which doesn't exist at the moment.
The only way to implement this now is to store the tags as a map with subfields for each tag and then a value, and then query for those values with equality checks in your query. For example:
tags: {
    "Amazing": true,
    "Great": true,
    "Disappointing": true
}
And:
query
    .where("tags.Amazing", isEqualTo: true)
    .where("tags.Great", isEqualTo: true)
Also see:
Firestore search array contains for multiple values
Firestore array contains query for list of elements
Firestore query - array contains all
Firestore: Multiple 'array-contains'
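The map-based workaround described in the answer can be sketched in plain Python, without any Firebase SDK, to show the data shape and the "contains all" logic. The function names are hypothetical, and the sketch assumes tag names contain no dots or other characters that would need escaping in a field path.

```python
def tags_to_map(tags):
    """Turn ["Amazing", "Great"] into {"Amazing": True, "Great": True}."""
    return {tag: True for tag in tags}


def equality_filters(filtered_tags, field="tags"):
    """One equality condition per selected tag, e.g. ("tags.Amazing", "==", True)."""
    return [(f"{field}.{tag}", "==", True) for tag in filtered_tags]


def matches_all(document, filtered_tags, field="tags"):
    """Local equivalent of chaining those equality conditions with AND."""
    tag_map = document.get(field, {})
    return all(tag_map.get(tag) is True for tag in filtered_tags)
```

Each tuple from `equality_filters` corresponds to one chained `.where(..., isEqualTo: true)` call in the answer, so the number of conditions grows with the number of selected tags.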

OrientDB - find "orphaned" binary records

I have some images stored in the default cluster of my OrientDB database. I stored them using the code from the documentation for the case of multiple ORecordBytes (for large content): http://orientdb.com/docs/2.1/Binary-Data.html
So I have two types of objects in my default cluster: binary records, and ODocument records whose 'data' field lists the corresponding binary records.
The RIDs of some of the ODocument records are used in other classes, but the remaining ODocument records are orphaned and I would like to be able to retrieve them.
My idea was to use
select from cluster:default where @rid not in (select myField from MyClass)
The problem is that this also returns the binary records, and I only want the records that have the 'data' field.
In addition, I would prefer a cleaner query, because I don't think the "not in" clause is something that should be encouraged. Is there something like a JOIN that returns the records that are not joined to anything?
Can you help me please?
I resolved my problem as follows, although I don't know whether it is the right (most optimized) way to do it.
I used the following SQL query:
SELECT rid FROM (FIND REFERENCES (SELECT FROM CLUSTER:default)) WHERE referredBy = []
In Java, I execute it with the OCommandSQL/OCommandRequest pair and get back an OrientDynaElementIterable. I iterate over it to get an OrientVertex that contains another OrientVertex, from which I retrieve the RID of the orphan.
Here is some code in case it helps someone, assuming that the 'graph' variable is an OrientGraphNoTx or an OrientGraph :)
String cmd = "SELECT rid FROM (FIND REFERENCES (SELECT FROM CLUSTER:default)) WHERE referredBy = []";
List<String> orphanedRid = new ArrayList<String>();
// Run the query through the graph API
OCommandRequest request = graph.command(new OCommandSQL(cmd));
OrientDynaElementIterable objects = request.execute();
Iterator<Object> iterator = objects.iterator();
while (iterator.hasNext()) {
    // Each result row wraps the orphaned record in its "rid" property
    OrientVertex obj = (OrientVertex) iterator.next();
    OrientVertex orphan = obj.getProperty("rid");
    orphanedRid.add(orphan.getIdentity().toString());
}
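To make the logic of the `referredBy = []` filter easier to follow, here is a small in-memory sketch in Python. It is not OrientDB code: `find_references` is a hypothetical stand-in that only mimics the shape of the rows (a `rid` plus a `referredBy` list), and the sample RIDs are made up.

```python
def find_references(rids, documents):
    """For each rid, list the ids of the documents that link to it.

    `documents` maps a document id to the collection of rids it links to.
    """
    rows = []
    for rid in rids:
        referred_by = [doc_id for doc_id, links in documents.items() if rid in links]
        rows.append({"rid": rid, "referredBy": referred_by})
    return rows


def orphaned(rows):
    """Keep only the rids that nothing refers to (referredBy is empty)."""
    return [row["rid"] for row in rows if not row["referredBy"]]


# Usage with made-up RIDs: "#3:1" is not linked from any document.
rows = find_references(["#3:0", "#3:1"], {"#12:0": ["#3:0"]})
print(orphaned(rows))
```

This is effectively an anti-join: every candidate record is kept unless some other record points at it.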

Projection in query

In Solr, there is a parameter "fl" which provides projection in a query. How can I achieve the same feature in Liferay?
Thank you in advance for your help and suggestions.
fl is not a projection in a SOLR query, it simply selects the result fields.
First of all: Liferay is using Lucene as search engine - not SOLR.
If "in liferay" means "in the UI": You can't select the result fields in the UI. The results are objects from the database that were filtered by Lucene (for some of the search forms and some configurations the result is retrieved even without Lucene directly from the database using SQL).
If "in liferay" means "in the API": You can select the result fields if you access the Lucene indexer at a low level (the ...ServiceUtil.search methods won't help) and use a FieldSelector:
IndexSearcher indexSearcher = LuceneHelperUtil.getIndexSearcher(companyId);
IndexReader indexReader = indexSearcher.getIndexReader();
FieldSelector fieldSelector = new FieldSelector() {
    public FieldSelectorResult accept(String fieldName) {
        // Only return "my-field"
        if ("my-field".equals(fieldName)) {
            return FieldSelectorResult.LOAD_AND_BREAK;
        }
        return FieldSelectorResult.NO_LOAD;
    }
};
// Run the query and keep at most maxDocuments hits
TopDocs topDocs = indexSearcher.search(luceneQuery, maxDocuments);
// Retrieve only the selected fields for the hits
List<Document> results = new ArrayList<Document>();
for (int i = 0; i < topDocs.scoreDocs.length; i++) {
    results.add(indexReader.document(topDocs.scoreDocs[i].doc, fieldSelector));
}
You can use any of the other search methods as well.
Have a look at com.liferay.portal.search.lucene.LuceneIndexSearcher to find out how to build your query correctly.

Sitecore Solr Search by Multilist with Values from Another Multilist

I have a set of product items. Each product item has a multilist field that points to a set of product type items. When on a product page, I want to show a paged list of related items. These should be items that share a product type with the currently selected item. I'm running into some trouble because products can have multiple types. I need to split the type list on the current item and check that against the list of products in an expression. For some reason split and contains are throwing runtime exceptions and I can't really figure out why. I saw some things about the predicate builder being used for dynamic queries and I will try to use that with what I currently have but I'd like to know why this can't be done straight in the where clause.
Another issue I ran into is that the IDs stored in Solr are being stripped of their '{', '}', and '-' characters.
If you are already on the product page, I assume you already have the product item, and that product item should have a "ProductType" multilist field. You can use Sitecore.Data.Fields.MultilistField to avoid having to split the raw values yourself.
You can then use Sitecore's Predicate Builder to build your search predicate, which I assume should find all products that share at least one product type. Adjust this search logic as needed. I am using the ObjectIndexerKey (see more here -> http://www.sitecore.net/Learn/Blogs/Technical-Blogs/Sitecore-7-Development-Team/Posts/2013/05/Sitecore-7-Predicate-Builder.aspx) to go after a named field, but you should build a proper search model and define ProductTypes as a List<ID> or something similar. You may also need to add other conditions to the search predicate, such as path or templateid, to limit your results. After that you can just execute the search and consume the results.
As far as Solr stripping the special characters, this is expected behavior based on the Analyzer used on the field. Sitecore and Solr will apply the proper query time analyzers to match things up so you shouldn't have to worry about formatting as long as the proper types are used.
// Start from False so that OR-ing the conditions does not match every item
var pred = PredicateBuilder.False<SearchResultItem>();
Sitecore.Data.Fields.MultilistField multilistField = Sitecore.Context.Item.Fields["ProductTypes"];
if (multilistField != null)
{
    foreach (ID id in multilistField.TargetIDs)
    {
        pred = pred.Or(x => ((ID)x[(ObjectIndexerKey)"ProductType"]).Contains(id));
    }
}
ISearchIndex _searchIndex = ContentSearchManager.GetIndex("sitecore_master_index"); // change to proper search index
using (var context = _searchIndex.CreateSearchContext())
{
    // The queryable type must match the type the predicate was built for
    var relatedProducts = context.GetQueryable<SearchResultItem>().Where(pred);
    foreach(var relatedProduct in relatedProducts)
    {
        // do something here with search results
    }
}
Just an improvement to @Matt Gartman's code:
The error that keeps popping up (ID doesn't contain a definition for Contains) occurs because .Contains is not a method of the ID type. I recommend converting the value to a string, as below:
foreach (ID id in multilistField.TargetIDs)
{
    pred = pred.Or(x => (Convert.ToString((ID)x[(ObjectIndexerKey)"ProductType"]).Contains(id.ToString())));
}
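Regarding the stripped '{', '}', and '-' characters mentioned in the question: if the raw indexed values ever need to be compared by hand, one option is to normalize both sides to the same form first. The Python sketch below illustrates the idea only. The helper names are hypothetical, and lowercasing is an assumption about how the values are stored in the index.

```python
def normalize_id(raw_id):
    """Strip braces and dashes and lowercase (lowercasing is an assumption)."""
    return raw_id.replace("{", "").replace("}", "").replace("-", "").lower()


def shares_product_type(current_types, indexed_types):
    """True when the two ID lists have at least one product type in common."""
    current = {normalize_id(t) for t in current_types}
    indexed = {normalize_id(t) for t in indexed_types}
    return not current.isdisjoint(indexed)
```

As the answer notes, this should not be necessary when the proper field types are used, because matching analyzers are applied at query time.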

Query Data Using LINQ C# With Filter string[] array?

Based on the documentation here:
http://docs.composite.net/Data/AccessingDataWithCSharp/How-to-Query-Data-Using-LINQ#_How_to_Query
I need to query data in a table using a filter in type string[int1,int2,int3..] and can't work out how to go about it.
The string[] comes from a different table field which stores id values of a multiselect element on a form:
Table 'Profile' (AGlobal.profile) contains columns:
Id   Types(profile_accomtypes)
1    1,2
2    4,7
3    12,4,6
4    3,6,9
Then I have a static table 'TypeDesc' (ALocal.proptype) listing a total of 12 'Type' values:
Id   Description(proptype_names)
1    The first description
2    The second description
........
12   The twelfth description
I created a strongly typed class enabling me to easily handle the form content on submit from the client. Within the form were a couple of multiselects (one of them being 'Types' above in the Profile datatype table). Each of the multiselects is passed to the server in serialized JSON format, where I string.Join the 'Types' values with a comma separator to save them into the Profile.Types column.
Now I want to serve the selections in a profile page to the client by loading the Types string[] of Profile Id and using the int id values to filter the TypeDesc table to only select the Type values with Description so that I can render the descriptions as a bullet list on the client.
The filter Types in the Profile table are always id integers
My code I'm using is:
var myProftype =
    (from d in connection.Get<AGlobal.profile>() // find multiselected type string values
     where d.Id == StUserSet.utoken
     select d).First();
string sProftype = myProftype.profile_accomtypes;
string[] sTypes = sProftype.Split(',');
// now filter proptypes to sTypes
var myTAccomtypes =
    (from d in connection.Get<ALocal.proptype>() // get all the types from the DB
     where(r => sTypes.Contains(r.Field<int>("Id"))) //Lambda ?
     select d).All;
StringBuilder sb = new StringBuilder(0); //create a bullet list string
// Loop over strings
foreach (string s in myTAccomtypes)
{
    sb.append("<dd>"+ s +"</dd>");
}
TuAccomtypes = sb.ToString(); // pass string to JQuery Taconite as part of AJAX response to alter DOM.
I have an error on the Lambda trying to filter my types.
In VS2010:
Error = Cannot convert lambda expression to type bool because its not a delegate type.
I also do not know how to go about parsing the sTypes variables to int (if I need to) so that the filter works :(
Where am I going wrong? Is there a cleaner way to filter a dataset against a comma separated list queried from a column field within a db table?
Thank you for any help/ideas in advance.
Martin.
I'm not entirely sure about your model, but I think this will work for you. I changed your linq, and combined some statements. I'm also casting your Id field to string so that it can be found correctly in the array.Contains() function. You might want to do the reverse of casting your strings to ints and comparing that way, but that's up to you.
var myProftype = profiles.First(p => p.Id == StUserSet.utoken);
string sProftype = myProftype.profile_accomtypes;
string[] sTypes = sProftype.Split(',');
var myTAccomtypes = propTypes.Where(r => sTypes.Contains(r.Field<int>("Id").ToString()));
StringBuilder sb = new StringBuilder(0);
foreach (PropType s in myTAccomtypes)
{
    sb.Append("<dd>" + s.Description + "</dd>");
}
After splitting the string from the original LINQ query (which identified a single field holding a joined string of comma-separated id numbers), I wasn't able to use "Contains" properly.
I called ToList() on the second LINQ query, which evaluates the collection.
Then, instead of working with the full result, I limited the result to just the id and name fields.
Relying on an article posted by Vimal Lakhera:
http://www.c-sharpcorner.com/uploadfile/VIMAL.LAKHERA/convert-a-linq-query-resultset-to-a-datatable/
I converted the result set into a DataTable which allowed easy looping and selection of fields to output as an html string as part of a JQuery Taconite callback.
Here's what works for me...
// now filter proptypes to selected Types
var myTAccomtypes = from d in connection.Get<ALocal.proptype>().ToList()
                    // ToList() evaluates the collection; the sTypes array of id strings cannot be passed to a SQL query, at least not in that way
                    where sTypes.Contains(d.proptype_id.ToString())
                    select new { d.proptype_id, d.proptype_name };
DataTable AcomType = LINQToDataTable(myTAccomtypes);
StringBuilder sb = new StringBuilder();
// Loop over table rows
foreach (var row in AcomType.Rows.OfType<DataRow>().Take(19)) // will .Take up to a maximum of x rows from above
{
    sb.Append("<dd>");
    sb.Append(row["proptype_name"].ToString());
    sb.Append("</dd>");
}
HldUserSet.TuAccomtypes = sb.ToString();
//HldUserSet.TuAccomtypes = string.Join(",", myTAccomtypes); //Check query content
Using Vimal's 'LINQToDataTable' with the tweak in the LINQ request means that I can reuse the class in numerous places on the site very quickly.
This works a treat for anyone who has a single string of joined IDs in the form "2,7,14,16" that needs to be split and then used to filter a wider collection, matching the IDs from the string against record id numbers in a different collection.
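The overall pattern (split the joined id string, parse the ids, filter the wider collection, render the matches) can be sketched independently of LINQ. The Python version below is only an illustration. The function names are hypothetical, and the records are assumed to be dictionaries with `proptype_id` and `proptype_name` keys, mirroring the column names used above.

```python
from html import escape


def parse_ids(csv_ids):
    """Split "2,7,14,16" into a set of ints, ignoring empty parts."""
    return {int(part) for part in csv_ids.split(",") if part.strip()}


def render_descriptions(csv_ids, prop_types, limit=None):
    """Return <dd> items for the records whose id appears in csv_ids.

    Items keep the order of prop_types, not the order of the id string.
    """
    wanted = parse_ids(csv_ids)
    names = [p["proptype_name"] for p in prop_types if p["proptype_id"] in wanted]
    if limit is not None:
        names = names[:limit]
    return "".join(f"<dd>{escape(name)}</dd>" for name in names)
```

Parsing the ids to integers once, rather than converting every record id to a string, is the "reverse" comparison suggested in the first answer.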
