In the new GAE API for Full Text Search, I can't find any option to activate stemming. I have tried searching for singular and plural words in my application, and searching for "document" does not return the same result set as searching for "documents". The same goes for accented characters: searching for "vehicule" and "véhicule" does not return the same result set.
Is there an option somewhere, either in the API or in the query language syntax, that I can use to activate stemming? Or do I have to build my own stemming by pre-processing the query and translating, for example, "document" into "(document OR documents)"?
In this other SO question they discuss the same thing. You should use the now-documented ~ operator.
you should assume charset type.
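If the ~ operator is not enough, the query pre-processing idea from the question can be sketched roughly as below. This is only an illustration: the helper names `strip_accents` and `expand_plural` are made up for this example, the plural rule is deliberately naive (it just adds or removes a trailing "s"), and stripping accents from the query only helps if the indexed text was folded the same way.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks, e.g. 'véhicule' -> 'vehicule'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_plural(term):
    """Build an OR group covering a naive singular/plural pair (hypothetical helper)."""
    term = strip_accents(term.lower())
    # Naive rule: treat a trailing "s" as the plural marker.
    singular = term[:-1] if term.endswith("s") and len(term) > 3 else term
    return "({} OR {}s)".format(singular, singular)


print(expand_plural("document"))
print(expand_plural("véhicule"))
```

Irregular plurals and words that naturally end in "s" (such as "class") are not handled; a real stemmer would be needed for those.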
I'm trying to implement an Azure suggester feature in our pilot Azure search app and am running into issues. The content I'm indexing is PDF files, so my suggester definition is based on the content field itself, which can be thousands of lines of text. Following examples online, when I implement the suggester, I get back the entire body of text from the PDF file. What I'd really like is to get back just a phrase found in the text.
For instance, suppose I'm indexing a Harry Potter book and I type "Dum" into my search field. I'd like to see suggested results like "Dumbledore", "Dementor", etc. rather than the whole book. Is this possible?
Tks
If we want to search for words sharing the same prefix, Autocomplete is the right API for this job. https://learn.microsoft.com/en-us/rest/api/searchservice/autocomplete
In contrast, the Suggester API helps users find the documents containing words with that prefix. It returns text snippets containing those words.
If you still believe the Suggester API does not behave as expected and Autocomplete is not suitable, let me know your source document, query and expected results.
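As a rough sketch of what an Autocomplete call looks like, the snippet below only builds the request URL and JSON body; it does not send anything. The service name, index name, suggester name and the `api-version` value are placeholders for illustration, and the real request also needs an `api-key` header. Check the linked reference for the exact fields supported by your API version.

```python
import json


def build_autocomplete_request(service, index, suggester, prefix,
                               api_version="2020-06-30"):
    """Return (url, json_body) for a POST to the Autocomplete endpoint.

    All argument values are supplied by the caller; the defaults here are
    assumptions for illustration, not values from the original question.
    """
    url = (
        "https://{}.search.windows.net/indexes/{}/docs/autocomplete"
        "?api-version={}".format(service, index, api_version)
    )
    body = {
        "search": prefix,              # the partial term typed so far, e.g. "Dum"
        "suggesterName": suggester,    # suggester defined on the index
        "autocompleteMode": "oneTerm"  # complete a single term
    }
    return url, json.dumps(body)


url, body = build_autocomplete_request("myservice", "books", "sg", "Dum")
print(url)
print(body)
```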
I am trying to implement case-insensitive search for my title and content fields, but to no avail. I have tried the following methods:
Adding <filter class="solr.LowerCaseFilterFactory" /> to 'text_general' field type in schema.xml / managed-schema.xml, to both 'index' and 'analyze' tokenizers.
My title and content field each will be of 'text_general' type.
I tried searching the following:
*abc* : No results containing 'ABC' appear.
*ABC* : Only results containing 'ABC' appear.
This clearly shows that the lowercase filters are not working. The debug results of the first query are pasted below.
Also below is a screenshot of the title field when analyzing a sample text. The output seems OK, but search does not work as expected. Is this a search query issue?
Thanks in advance for any help!
No, it doesn't clearly show that the lowercase filtering doesn't work - what you're experiencing is that most filters or tokenizers aren't applied when you're doing a wildcard search (since they really can't be applied cleanly for a wildcard search where they don't have the whole term to work with).
The solution, if you want to perform a wildcarded, lowercased search, is to lowercase or otherwise process the field before actually indexing it, and to use only a tokenizer to split the text as necessary (where LowerCaseTokenizer seems to be the only one that is a MultiTermAwareComponent). Otherwise, if you don't want to perform any tokenization or splitting of the string, use a string field.
You can do this either in your own code that sends content to Solr or in an update processor.
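For the "in your own code" route, a minimal sketch could look like the following. The function names are hypothetical, the field names `title` and `content` are taken from the question, and escaping of Solr query special characters is left out for brevity.

```python
def lowercase_fields(doc, fields=("title", "content")):
    """Return a copy of doc with the given text fields lowercased before indexing."""
    prepared = dict(doc)
    for name in fields:
        if isinstance(prepared.get(name), str):
            prepared[name] = prepared[name].lower()
    return prepared


def wildcard_query(field, fragment):
    """Build a wildcard query whose fragment is lowercased the same way as the index."""
    return "{}:*{}*".format(field, fragment.lower())


doc = lowercase_fields({"id": "1", "title": "ABC Report", "content": "Some ABC text"})
print(doc)
print(wildcard_query("title", "ABC"))
```

The point is simply that the same lowercasing is applied on both sides, so it no longer matters that the analysis chain is skipped for wildcard terms. Note that lowercasing the source field also changes what is stored and returned; keeping the original in a separate stored field is one way around that.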
I just tried GAE (1.7.7 Java) Full Text Search and found that if the search string is "work", surprisingly it will not match "working", "worked", "hardworking", or "homework". I'd like to know if I missed something in the API. I read the tutorial but did not find any documentation about this except plural matching.
Thanks.
P.S. I tried this in a unit test for the search service, not in a working environment.
Tucked away in the docs (but unfortunately not in the table of operators), there is a '~' operator:
To search for plural variants of an exact query, use the ~ operator:
~"car" # searches for "car" and "cars"
Not sure how far that will get you. Unfortunately, that's about it.
See https://developers.google.com/appengine/docs/java/search/overview#Queries_on_Fields
There is so little documentation on this, but having just tried it, it only works on plurals.
One approach would be to do your own stemming on the words in the document (though you wouldn't return that as the text ;-). Then you could perform stemming on your search term and be able to match "worked", "working", etc.
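A very rough sketch of that do-it-yourself stemming, applied to both the indexed text and the search term, might look like this. The suffix list and function names are assumptions made up for the example; this is a crude suffix stripper, not a proper stemming algorithm such as Porter's.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order; illustrative only


def naive_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def stem_text(text):
    """Stem every whitespace-separated word; store the result in a separate field."""
    return " ".join(naive_stem(w) for w in text.split())


print(naive_stem("working"), naive_stem("worked"), naive_stem("works"))
print(stem_text("She worked hard"))
```

The stemmed text would go into an extra searchable field, while the original text is kept for display. This matches "worked" and "working" against "work", but not "homework" or "hardworking", which need partial matching instead.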
This is a late answer, but to follow up the previous answer, what you want to do is not possible with the basic API functions. The search API works on full-text searching principles. To get around this you can tokenise your searchable data pre-index and store this in a field with the relevant document.
See: Partial matching GAE search API
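The pre-index tokenising mentioned above could be sketched as follows: every substring of each word (above a minimum length) is generated and stored in an extra field, so that a plain full-word search for "work" finds "homework". The function name and the `min_len` default are assumptions for illustration.

```python
def partial_tokens(text, min_len=3):
    """Return the set of all substrings (length >= min_len) of each word in text."""
    tokens = set()
    for word in text.lower().split():
        for start in range(len(word)):
            for end in range(start + min_len, len(word) + 1):
                tokens.add(word[start:end])
    return tokens


tokens = partial_tokens("homework hardworking")
print("work" in tokens)
print(" ".join(sorted(partial_tokens("work"))))
```

The number of tokens grows quadratically with word length, so this trades index size for matching flexibility; generating only prefixes is a cheaper variant if suffix and infix matches are not needed.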
My text is like this: The searched word WildCard shall be partially highlighted
I search using wildcard expression "wild*".
What I need is to have the highlight snippet to be [tag]Wild[/tag]Card. What I got was [tag]WildCard[/tag], and I spent lots of time researching it, but could not find an answer.
This behavior can be illustrated on linkedin.com, where you type other people's name at the top right corner.
Once this is figured out, I will have follow-up questions.
Thanks,
I am not sure if you can achieve what you want directly in Solr. The obvious solution is to parse the returned doc yourself, searching for [tag]WildCard[/tag], and find out what part of the term you need to highlight.
This is not possible with Solr. What you need to do is change the characters Solr uses to highlight found words to something you can easily remove (maybe an html comment) and then build a highlight yourself. You are likely already storing your query in a variable, so just scan the search return docs and highlight all instances of your query.
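Building the highlight yourself, as suggested, can be sketched like this. The function name is hypothetical; it assumes a simple trailing-wildcard query such as "wild*" and highlights only the typed prefix at the start of each matching word.

```python
import re


def highlight_prefix(text, query, pre="[tag]", post="[/tag]"):
    """Wrap only the matched prefix of each word in pre/post markers."""
    prefix = query.rstrip("*")  # "wild*" -> "wild"
    if not prefix:
        return text
    # \b anchors the match at the start of a word; the match is case-insensitive.
    pattern = re.compile(r"\b" + re.escape(prefix), re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


print(highlight_prefix(
    "The searched word WildCard shall be partially highlighted", "wild*"))
```

The original casing of the text is preserved because the replacement reuses the matched characters rather than the query string.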
I'm very new to Solr, and I'd really like a step-by-step guide to making my Solr search results look like Google's.
To give you an idea, when you search for 'PHP' at http://wiki.apache.org/solr/FindPage, the word 'php' shows up in bold. This is the result I want to have.
It should show only a short excerpt, even if the PDF is a very large one.
You can use highlighting to show a matching snippet in the results.
http://wiki.apache.org/solr/HighlightingParameters
By default, it will wrap matching words with <em> tags, but you can change this by setting the hl.simple.pre/hl.simple.post parameters.
You may be looking at the wrong part of the returned data. Try looking at the 'highlighting' component of the returned data structure (i.e. don't look at the response docs). This should give you the snippets you want.
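To make this concrete, here is a small sketch that builds the highlighting query parameters and then reads snippets out of the 'highlighting' section of a JSON response. The helper names, the field name `content`, and the sample response are made up for illustration; the parameters `hl`, `hl.fl`, `hl.simple.pre` and `hl.simple.post` are the ones described on the wiki page above.

```python
from urllib.parse import urlencode


def highlight_params(query, field, pre="<b>", post="</b>"):
    """Build a query string that asks Solr to highlight matches in one field."""
    return urlencode({
        "q": query,
        "wt": "json",
        "hl": "true",
        "hl.fl": field,
        "hl.simple.pre": pre,
        "hl.simple.post": post,
    })


def snippets(response, field):
    """Map document id -> list of highlighted snippets for the given field."""
    highlighting = response.get("highlighting", {})
    return {doc_id: fields.get(field, []) for doc_id, fields in highlighting.items()}


# Hypothetical response, shaped like Solr's JSON output with highlighting enabled.
sample = {
    "response": {"docs": [{"id": "doc1"}]},
    "highlighting": {"doc1": {"content": ["a page about <b>php</b> scripts"]}},
}
print(highlight_params("content:php", "content"))
print(snippets(sample, "content"))
```

The snippets live next to the normal results, keyed by document id, which is why they are easy to miss when only the response docs are inspected.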