Solr schemaless mode: query to match any word in a text_general field - solr

I'm trying to run a Solr query in schemaless mode, but no matching document is found in the result.
q: i m from india
numResult: 0
q: parapfa551d3aef764ddca9e6e421fe8d50e8:i m from india
numResult: 1
My data set (600+ documents) is in Solr, and all fields of each document were added dynamically using Solr schemaless mode. See my Solr documents below.
The first query is the one I tried to run in schemaless mode, but numResult is zero. [The query works in Solr standalone mode, but there the dynamic fields are not added.]
How can I find the best-matching document in Solr schemaless mode?
{
"id":"d9263e11",
"titleh4cd06d47basdsa6d14ed8838a123":["User _ name"],
"parapfa551d3aef764ddca9e6e421fe8d50e8":[" My name is XYZ "],
"parapffe001011d4346ad9ce9edb67b3b85e4":[" i m from USA ...."],
"_version_":1748577992052834304},
{
"id":"d9263e20",
"titleh4cd06d47b6d14ed883842ae4cedab224":[" User_name "],
"parap759981766b644e229bda2b0cc5bd0bd9":[" my name is ...."],
"parapfa551d3aef764ddca9e6e421fe8d50e8":[" i m from INDIA"],
"_version_":1748577992544616448},
{
"id":"d9263e45",
"titlehdd4a37c0b21e4d9bbd03a56ba0120f01":[" User_name"],
"parapa2aa4798c7fc44aab5e4f6447c529f83":["my name is .... "],
"parap8ee9090e8e054d78b8dc7ff06a7fb702":[" i m from Germany"],
"_version_":13204902384923489909}
I'm trying to find the best-matching document in Solr schemaless mode.

Your first query
q: i m from india
does not specify a field to search on, so Solr falls back to a default field (usually _text_) when searching. I suspect your index is not populating this default field, and hence there is no match.
Your second query
q: parapfa551d3aef764ddca9e6e421fe8d50e8:i m from india
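is searching in the parapfa551d3aef764ddca9e6e421fe8d50e8 field, and in this case the match is found. Strictly, with the standard query parser only the first term (i) is bound to that field and the remaining terms go to the default field; to search all terms in the named field, group them as parapfa551d3aef764ddca9e6e421fe8d50e8:(i m from india).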
You can use Solr's debugQuery parameter to see how Solr handles each of these searches on your particular configuration.
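To compare the two searches side by side, the request URLs can be built with the debugQuery and df parameters. The following is only a rough sketch: the host, port, and collection name (mycollection) are assumptions, not taken from the question, and the helper build_select_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL and collection name; adjust for your installation.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_select_url(query, default_field=None, debug=False):
    """Build a /select URL, optionally setting df and debugQuery."""
    params = {"q": query}
    if default_field is not None:
        # df tells Solr which field to use for terms without a field prefix.
        params["df"] = default_field
    if debug:
        # debugQuery=true adds the parsed query and scoring details to the response.
        params["debugQuery"] = "true"
    return SOLR_SELECT + "?" + urlencode(params)


# Query without a field: Solr uses its configured default field.
unfielded = build_select_url("i m from india", debug=True)

# Same query, but with the default field pointed at a field known to hold the text.
fielded = build_select_url(
    "i m from india",
    default_field="parapfa551d3aef764ddca9e6e421fe8d50e8",
    debug=True,
)

print(unfielded)
print(fielded)
```

Requesting both URLs and comparing the parsed query in the debug section of each response should show which field each term is actually searched in.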

Related

Elements getting added in Solr index but not able to search elements as desired

I'm working with solr to store web crawling search results to be used in a search engine. The structure of my documents in solr is the following:
{
word: The word received after tokenizing the body obtained from the html.
url: The url where this word was found.
frequency: The no. of times the word was found in the url.
}
When I go to the Solr dashboard on my system, which is http://localhost:8983/solr/#/CrawlerSearchResults/query, I'm able to find a word, say "Amazon", with the query "word: Amazon", but when I search directly for Amazon I get no results. Could you please help me out with this issue?
[Images not included: first case; second case (no results).]
Thanks,
Nilesh.
In your second example, the value is searched against the default search field (since you haven't provided a field name). This is by default a field named _text_.
To support just typing a query into the q parameter without field names, you can either set the default field name to search in with df=word in your URL, or use the edismax query parser (defType=edismax) and the qf parameter (query fields). qf allows multiple fields and giving each of them a weight, but in your case it'd just be qf=word.
Second - what you're doing seems to replicate what Lucene is doing internally, so I'm not sure why you'd do it this way (each word is what's called a "token", and each count is what's called a term frequency). You can write a custom similarity to add custom scoring based on these parameters.
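The two options mentioned in this answer (df, or edismax with qf) could be expressed as query parameters roughly as follows. This is a minimal sketch; the helper names df_params and edismax_params are made up for illustration, and only the parameter names (q, df, defType, qf) come from the answer.

```python
from urllib.parse import urlencode


def df_params(query, field):
    """Standard parser, with the default search field set via df."""
    return {"q": query, "df": field}


def edismax_params(query, fields):
    """edismax parser, with the fields to search listed in qf (space-separated)."""
    return {"q": query, "defType": "edismax", "qf": " ".join(fields)}


# Either query string can be appended to .../CrawlerSearchResults/select?
print(urlencode(df_params("Amazon", "word")))
print(urlencode(edismax_params("Amazon", ["word"])))
```

With a single field the two are close to equivalent; qf becomes more useful once several fields need to be searched with different weights.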

Querying Solr: get request handler returns result but select handler does not

I have an id field in Solr that uniquely identifies a Solr document.
When querying Solr using the get handler:
solr/{collection}/get?id=p_1266762970&fl=*
Result:
"doc":
{
"lastIndexed":"2014-12-25T09:48:56.509Z",
"id":"1266762970",
"solrId":"p_1266762970",
.....
}
But when querying using solr admin - selectHandler - no documents are returned.
Solr query looks like:
solr/{collection}/select?q=*:*&fq=solrId:p_1266762970
solr/{collection}/select?q=*:*&fq=id:1266762970
I tried doing a hard commit, and it returned successfully, but the results are still the same.
I have other documents in Solr as well that show up correctly. This issue exists for only some of the ids (8 out of 2.3 million).
Updated: UniqueKey is
<uniqueKey>solrId</uniqueKey>

Avoiding wildcards when querying a Solr string field value

I found that using a wildcard in a Solr query makes it very slow, so I don't want to use a wildcard (*) query. For example, I want to search for the mail address (gosling.abc#gmail.com); if I enter the keyword 'abc', my q parameter might look like q:abc
How should I handle this case?
I am a newbie in Solr; can anyone help me?
(Currently, I have 10,937,547 documents in my SolrCloud. My Solr version is 4.1)

Solr Ngram Synonyms Dismax

I have ngram-indexed 2 fields (columns in the database), and the third one is my full text field. My default text field is the full text field, and while querying I use the dismax handler and specify in it both ngrammed fields with certain boost values and the full text field with a certain boost value.
My problem: if I don't use dismax and just search the full text field (i.e. the default field specified in the schema), synonyms work correctly, i.e. ca returns all results containing california. If I use dismax, however, ca is also searched in the ngrammed fields, which returns partial matches of ca, and the synonym part is not applied at all.
I want to use synonyms in every case, so how should I go about it?
Ensure you have correctly configured the SynonymFilterFactory filter in your ngram field's query analyzer.
If it still doesn't work, the Solr admin's Analysis interface gives more detail on the tokenizer/filter steps, which lets you check whether the synonym part works as expected.

How to index a URL in Solr so I can boost results by website

I have thousands of documents indexed in my SOLR which represents data crawled from different websites. One of the fields of a document is SourceURL which contains the url of a webpage that I crawled and indexed into this Document.
I want to boost results from a specific website using boost query.
For example I have 4 documents each containing in SourceURL the following data
https://meta.stackoverflow.com/page1
http://www.stackoverflow.com/page2
https://stackoverflow.com/page3
https://stackexchange.com/page1
I want to boost all results that are from stackoverflow.com, and not subdomains (in this case result 2 and 3 ).
Do you know how I can index the url field and then use a boost query to identify all the documents from a specific website, as in the case above?
One way would be to parse the url prior to index time and record whether it is a primary domain (for example, in a primarydomain boolean field in your schema.xml file).
Then you can boost the primarydomain field in your query results. See the DisMaxQParserPlugin page on the Solr Wiki for an example of how to boost fields at query time.
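The index-time parsing suggested here might look something like the sketch below. It assumes that "primary domain" means the bare domain or its www. form (matching results 2 and 3 in the question); the helper is_primary_domain is hypothetical, and the boost value 10 in the query parameters is an arbitrary illustration.

```python
from urllib.parse import urlparse


def is_primary_domain(url, domain):
    """True if the URL's host is the bare domain or its 'www.' form (assumed rule)."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host == "www." + domain


urls = [
    "https://meta.stackoverflow.com/page1",
    "http://www.stackoverflow.com/page2",
    "https://stackoverflow.com/page3",
    "https://stackexchange.com/page1",
]

# Documents as they might be sent to Solr, with the extra boolean field.
docs = [
    {"SourceURL": u, "primarydomain": is_primary_domain(u, "stackoverflow.com")}
    for u in urls
]

# Query-time boost on the flag using the dismax/edismax bq parameter.
# The query text "solr" and the boost factor are placeholders.
boost_params = {"q": "solr", "defType": "edismax", "bq": "primarydomain:true^10"}

for d in docs:
    print(d)
```

Because the flag is computed once at index time, the boost query stays a cheap exact match on a boolean field rather than a pattern match against the full URL.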
