Need help in constructing code for GAE Search Api - google-app-engine

I have created Document, added it to Index and used the GAE Search API to search for a text successfully. Please find the sample code below.
search.Document(
    fields=[search.TextField(name='id', value=id),
            search.TextField(name='search', value=searchT)])
options = search.QueryOptions(returned_fields=['id'])
results = search.Index(name=_D_INDEX_NAME).search(search.Query(searchTxt, options=options))
Now I am unable to understand how to achieve the following. Some sample code would be really appreciated.
To search for plural variants of an exact query, use the ~ operator:
~"car" # searches for "car" and "cars"
To build queries that reference specific fields, use both field and value in your query, separated by a colon:
field:value
field:"value as a string"

When you add a document, you specify its schema by defining the fields of the document. In your case id and search.
To search for a term that only appears in a specific field you use the notation field:term
search.Index(name=_D_INDEX_NAME).search('search:programming')
For searching plural variants of a term you use the operator ~
search.Index(name=_D_INDEX_NAME).search('~car')
Note however that this won't work in the dev_appserver.
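If the field name or term comes from a variable, the query string can be assembled first and then handed to search(). This is only a minimal sketch: the helper names field_query and plural_query are made up for illustration, and it refuses values with embedded double quotes rather than guessing at escaping rules.

```python
def field_query(field, value):
    """Return a query string restricted to one field, e.g. search:programming."""
    if '"' in value:
        # Escaping of embedded quotes is deliberately not handled in this sketch.
        raise ValueError('value must not contain double quotes')
    # Quote values that contain whitespace so they are treated as one string.
    if any(ch.isspace() for ch in value):
        value = '"%s"' % value
    return '%s:%s' % (field, value)


def plural_query(term):
    """Return a query string that also matches plural variants of term."""
    return '~' + term


print(field_query('search', 'programming'))        # search:programming
print(field_query('search', 'value as a string'))  # search:"value as a string"
print(plural_query('car'))                         # ~car
```

The resulting string is what you would pass on, e.g. search.Index(name=_D_INDEX_NAME).search(field_query('search', 'programming')).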

Related

Solr OR query on a text field

How to perform a simple query on a text field with an OR condition? Something like name:ABC OR name:XYZ so the resulting set would contain only those docs where name is exactly "XYZ" or "ABC"
Dug tons of manuals, cannot figure this out.
I use Solr 5.5.0
Update: Upgraded to Solr 6.6.0, still cannot figure it out. Below are illustrations to demonstrate my issue:
This works:
This works too:
This still works:
But this does not! Omg why!?
There are many ways to perform OR query. Below I have listed some of them. You can select any of it.
[Simple Query]
q=name:(XYZ OR ABC)
[Lucene Query Parser]
q={!lucene q.op=OR df=name v="XYZ ABC"}
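If you build the request from code, the first form can be generated and URL-encoded like this. It is a rough sketch using only the Python standard library; the helper names and the core name mycore are placeholders, not anything from the question.

```python
from urllib.parse import urlencode


def or_query(field, values):
    """Build field:(v1 OR v2 ...) for the standard Lucene query parser."""
    return '%s:(%s)' % (field, ' OR '.join(values))


def select_url(base_url, query):
    """Return a /select URL with the query string URL-encoded."""
    return '%s/select?%s' % (base_url, urlencode({'q': query}))


q = or_query('name', ['XYZ', 'ABC'])
print(q)  # name:(XYZ OR ABC)
# 'mycore' is a hypothetical core name.
print(select_url('http://localhost:8983/solr/mycore', q))
```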
Your syntax is right, but what you're asking for isn't what text fields are made for. A text field is tokenized (split into multiple tokens), and each token is searched by itself. So if the text inserted is "ABC DEF GHI", it will be split into three separate tokens, namely "ABC", "DEF" and "GHI". So when you're searching field:ABC, you're really asking for any document that has the token "ABC" somewhere.
Since you want to perform an exact match, you want to query against a field that is defined as a string field, as this will keep the value verbatim (including casing, so the matching will be case sensitive). You can tell Solr to index the same content into multiple fields by adding a copyField instruction, telling it to take the content submitted for field foo and also copy it into field bar, allowing you to perform both an exact match if needed and a more general search if necessary.
If you need to perform exact, but case insensitive, searches, you can use a text field with a KeywordTokenizer. The KeywordTokenizer does nothing to the input, keeping the whole string as a single token, while still allowing you to add filters to the analysis chain. By adding a LowerCaseFilter you tell Solr to lowercase the string as well before storing it (or querying for it).
You can use the "Analysis" page under the Solr admin page to experiment and see how content for your field is being processed for each step.
After that, querying as string_field:ABC OR string_field:XYZ should do what you want (or string_field:(ABC OR XYZ), or a few other ways to express the same).
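To make the difference between the field types concrete, here is a toy model in plain Python. It is not Solr code and the function names are invented; it only mimics the idea that a tokenized field matches on individual tokens, a string field matches on the whole value, and a keyword tokenizer plus lowercase filter matches on the whole value ignoring case.

```python
def analyze_text(value):
    """Toy stand-in for a tokenized text field: split on whitespace."""
    return value.split()


def analyze_string(value):
    """Toy stand-in for a string field: the whole value is one token."""
    return [value]


def analyze_keyword_lowercase(value):
    """Toy stand-in for a keyword tokenizer followed by a lowercase filter."""
    return [value.lower()]


def matches(analyzer, indexed_value, query_term):
    """A document matches if any analyzed query token equals an indexed token."""
    indexed = set(analyzer(indexed_value))
    return any(tok in indexed for tok in analyzer(query_term))


doc = 'ABC DEF GHI'
print(matches(analyze_text, doc, 'ABC'))                 # True
print(matches(analyze_string, doc, 'ABC'))               # False
print(matches(analyze_string, 'ABC', 'ABC'))             # True
print(matches(analyze_string, 'ABC', 'abc'))             # False
print(matches(analyze_keyword_lowercase, 'ABC', 'abc'))  # True
```

Real text field types usually apply more filters than this, so treat it purely as an illustration of why name:ABC matches a document whose name is "ABC DEF GHI" on a text field but not on a string field.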
A wacky workaround I've just come up with:

Solr highlighting gives field/snippets with ANY term, instead of those that satisfy the query fully

I'm using Solr 5.x with the standard highlighter, and I'm getting snippets that match only one of the search terms, even if I specify q.op=AND.
I need ONLY the fields and snippets that match ALL the terms (unless I say q.op=OR or just omit it), i.e. the field/snippet must satisfy the query. Solr does return the field/snippet that has all the terms, but it also returns many others.
I'm using hl.fl=* to get only the fields that contain the terms, and searching against the default field ('text', containing the full doc). I need to use * since I have multiple dynamic fields. Most fields are of type 'text_general' (for search and highlighting), and some are of type 'string' for faceting.
If it's not possible for snippets to have all the terms, I MUST at least get only the fields that satisfy the query fully (this question talks about matching all the terms, but the search query can become arbitrarily complex, so the fields/snippets should match the query).
Next, I also need snippets highlighted for proximity-based searches. What should I do/use for this? The fields returned in highlighting in this scenario should also satisfy the proximity query (currently I get fields that contain any term, without regard to proximity constraints, other query terms, etc.).
Thanks for your help.
I've also encountered the same problem with highlighting. In my case, the query like
(foo AND bar) OR eggs
highlighted eggs and foo even though bar was not present in the document. I didn't manage to come up with a proper solution, but I devised a dirty workaround.
I use the following query:
id:highlighted_document_id AND text:(my_original_query)
with debugQuery set to true. Then I parse the explain text for highlighted_document_id. The text contains the terms from the query that contributed to the score. The terms that should not be highlighted are not present in the explanation.
The Python regular expressions I use to extract the terms (valid for Solr 5.2.1):
import re
term_regex = re.compile(r'weight\(text:(.+) in')
wildcard_term_regex = re.compile(r'text:(.+), product')
Then I simply search for the markings in the highlighted text and remove them if the term doesn't match any of the terms found by term_regex and wildcard_term_regex.
The solution is probably pretty limited, but works for me.
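Roughly, the post-processing step could look like the sketch below. It assumes the default <em>...</em> highlight markers, and the explain line and snippet are made-up samples, not real Solr output. Wildcard terms are compared with fnmatch, which is my own simplification; stemmed terms (e.g. "egg" vs "eggs") would need extra handling.

```python
import re
from fnmatch import fnmatchcase

# Regexes from the answer above (reported to work with Solr 5.2.1).
term_regex = re.compile(r'weight\(text:(.+) in')
wildcard_term_regex = re.compile(r'text:(.+), product')
# Assumption: highlighting uses the default <em>...</em> markers.
em_regex = re.compile(r'<em>(.*?)</em>')


def contributing_terms(explain_text):
    """Collect the query terms that show up in the explain output."""
    terms = term_regex.findall(explain_text)
    terms += wildcard_term_regex.findall(explain_text)
    return {t.lower() for t in terms}


def strip_unmatched(snippet, terms):
    """Remove highlight markers around words that match none of the terms."""
    def keep_or_strip(match):
        word = match.group(1).lower()
        if any(fnmatchcase(word, t) for t in terms):
            return match.group(0)  # keep the markers
        return match.group(1)      # drop the markers, keep the word
    return em_regex.sub(keep_or_strip, snippet)


# Hypothetical explain line for a document containing "eggs" and "foo" but not "bar".
explain = '0.35 = weight(text:eggs in 7) [DefaultSimilarity], result of:'
snippet = 'some <em>foo</em> served with <em>eggs</em>'
print(strip_unmatched(snippet, contributing_terms(explain)))
# some foo served with <em>eggs</em>
```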

Solr Suggester: Return multiple fields in response

I am using Solr version 3.5. I want to implement an auto-suggest feature in my application through the Suggester approach. http://wiki.apache.org/solr/Suggester.
Can someone please help me with the following:
How can I return more than one field in the query response? For example, I am trying to create an index based on the 'name' field, but I also want to return an 'id' field, where these two fields are the product attributes I am searching for [say movie titles]. Hence, the response should include both the 'id' and the 'title' of the product.
How can I do a case-insensitive search using Suggester? For example, a search term "abc" should return documents containing the name as "ABC", "Abc" etc.
Please help.
Regards.
If you're looking to get suggestions on a particular field but also return other fields in the document, you can use the 'Payload' tag. Only one payload field is allowed, but you can get around this by using a json format in the field.
https://cwiki.apache.org/confluence/display/solr/Suggester
https://stackoverflow.com/a/32558487/578582
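As a sketch of the JSON-in-payload idea: pack the extra fields into one string when indexing, and unpack them when reading the suggestions. The response layout below is what I'd expect from the suggest component in newer Solr versions (not 3.5), and the suggester name, id and title are invented sample values, so check it against your actual response.

```python
import json


def make_payload(doc_id, title):
    """Pack several values into the single payload field as JSON."""
    return json.dumps({'id': doc_id, 'title': title}, sort_keys=True)


def read_suggestions(response, suggester, query):
    """Yield (term, payload dict) pairs from a parsed suggester response."""
    entry = response['suggest'][suggester][query]
    for suggestion in entry['suggestions']:
        yield suggestion['term'], json.loads(suggestion['payload'])


# Hypothetical parsed JSON response; 'movieSuggester' is a made-up suggester name.
response = {
    'suggest': {
        'movieSuggester': {
            'abc': {
                'numFound': 1,
                'suggestions': [
                    {'term': 'ABC Murders', 'weight': 0,
                     'payload': make_payload('42', 'ABC Murders')},
                ],
            },
        },
    },
}
print(list(read_suggestions(response, 'movieSuggester', 'abc')))
# [('ABC Murders', {'id': '42', 'title': 'ABC Murders'})]
```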
I think you're not quite getting the point of the suggester. It is not designed to return suggestions for exactly one search result per entry (this is the only scenario where returning the ID would make sense).
You could, however, do normal wildcard searches on the title field and use the returned titles as suggestions. This way you could also get the ID (and any other index field) with the results. I imagine this could be implemented fairly easily with jQuery UI. It may be much slower than the suggest API, depending on your index schema design.
If you are not really interested in the order of the suggestions, I found that the weight_field can be [ab]used to return the document id for each suggestion.

How to configure SOLR queries in the conf files

My requirement is simple.
I need to search by keyword in a way similar to SQL LIKE.
Currently the search matches whole words rather than partial strings.
Ex:-
Search query: "test"
Expected results: "test%" - which gives "test", "tested", "testing", etc...
Actual result: "test"
I found many query suggestions for SOLR. But I need to find the exact mechanism for putting that in the conf XML files.
Thanks in advance.
The quick and dirty solution is to use a wildcard, an asterisk (*), in your search query. For example: test*
The more proper solution would be to use stemming to remove common word endings when you index and query the data. In the default schema, the text_en_splitting field type would do this for you. Just define your field as text_en_splitting.
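If you go the wildcard route and the search term comes from user input, it is worth escaping query syntax characters before appending the asterisk. A small sketch; the helper names and the field name title are just examples, and the character list is the usual set of Lucene/Solr query operators as far as I know.

```python
import re

# Characters that have a special meaning in the Lucene/Solr query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term):
    """Backslash-escape query syntax characters in user input."""
    return _SPECIAL.sub(r'\\\1', term)


def prefix_query(field, term):
    """Build a query matching values of field that start with term."""
    return '%s:%s*' % (field, escape_term(term))


print(prefix_query('title', 'test'))  # title:test*
print(prefix_query('title', 'a:b'))   # title:a\:b*
```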
Are you building auto-complete?
If so, use Suggester. It's part of Solr, and it does what you're talking about extremely efficiently using either a dictionary file, or a field in your index you've designated.
http://wiki.apache.org/solr/Suggester

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and written a whole bunch of docs into it. I can see from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering *:* into the query (in Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration has been deprecated and is no longer supported; the df request parameter takes its place.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
import pysolr

solr = pysolr.Solr('http://localhost:8983/solr/techproducts')
results = solr.search('search term', **{
    'defType': 'dismax',
    'qf': 'features text'
})
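If you'd rather see exactly what ends up in the URL, the same parameters (including weights) can be built with the standard library. This is just a sketch; dismax_params is a made-up helper name, and the field names and boosts are the ones from the example URLs above.

```python
from urllib.parse import urlencode


def dismax_params(query, field_weights):
    """Return DisMax request parameters; qf is 'field^boost' joined by spaces."""
    qf = ' '.join('%s^%s' % (field, boost) for field, boost in field_weights)
    return {'defType': 'dismax', 'q': query, 'qf': qf}


params = dismax_params('video', [('features', 20.0), ('text', 0.3)])
print(params['qf'])       # features^20.0 text^0.3
print(urlencode(params))  # defType=dismax&q=video&qf=features%5E20.0+text%5E0.3
```

Note that urlencode turns the space into + and percent-encodes the ^ as %5E; the dict itself can also be passed to pysolr as **params.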
In my case the problem was the format of the query. It seems that my setup, by default, was looking for an exact match against the entire value of the field. So, to get results when searching for sit, I had to query *sit*, i.e. use wildcards to get the expected result.
With Solr 4, I had to solve this as per Mauricio's answer by setting type="text_en" on the field.
With Solr 6, use text_general.
