Querying Solr without specifying field names - solr

I'm new to using Solr, and I must be missing something.
I didn't touch much in the example schema yet, and I imported some sample data. I also set up LocalSolr, and that seems to be working well.
My issue is just with querying Solr in general. I have a document where the name field is set to tom. I keep looking at the config files, and I just can't figure out where I'm going awry. A bunch of fields are indexed and stored, and I can see the values in the admin, but I can't get querying to work properly. I've tried various queries (http://server.com/solr/select/?q=value), and here are the results:
**Query:** ?q=tom
**Result:** No results
**Query:** q=\*:\*
**Result:** 10 docs returned
**Query:** ?q=*:tom
**Result:** No results
**Query:** ?q=name:tom
**Result:** 1 result (the doc with name : tom)
I want to get the first case (?q=tom) working. Any input on what might be going wrong, and how I can correct it, would be appreciated.

Set <defaultSearchField> to name in your schema.xml
The <defaultSearchField> Is used by
Solr when parsing queries to identify
which field name should be searched in
queries where an explicit field name
has not been used.
You might also want to check out (e)dismax instead.

I just came across to a similar problem... Namely I have defined multiple fields (that did not exist in the schema.xml) to describe my documents, and want to search/query on the multiple fields of the document, not only one of them (like the "name" in the above mentioned example).
In order to achieve this, I have created a new field ("compoundfield"), where I then put/copyField my defined fields (just like the "text" field on the schema.xml document that comes with Solr distribution). This results in something like this:
coumpoundfield definition:
<field name="compoundfield" type="text_general" indexed="true" stored="false" multiValued="true"/>
defaultSearchField:
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>compoundfield</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="OR"/>
<!-- copyField commands copy one field to another at the time a document
is added to the index. It's used either to index the same field differently,
or to add multiple fields to the same field for easier/faster searching. -->
<!-- ADDED Fields -->
<copyField source="field1" dest="compoundfield"/>
<copyField source="field2" dest="compoundfield"/>
<copyField source="field3" dest="compoundfield"/>
This works fine for me, but I am not sure if this is the best way to make such a "multiple field" search...
Cheers!

It seems that a DisMax parser
is the right thing to use for this end.
Related stackoverflow thread here.

The current solution is deprecated in newer versions of lucene/solr. To change the default search field either use the df parameter or change the field that is in:
<initParams
path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
<lst name="defaults">
<str name="df">default_field</str>
</lst>
</initParams>
inside the solrconfig.xml
Note I am using a non-managed schema and solr 7.0.0 at the time of writing

Going through the solr tutorial is definitely worth your time:
http://lucene.apache.org/solr/tutorial.html
My guess is that the "name" field is not indexed, so you can't search on it. You'd need to change your schema to make it indexed.
Also make sure that your XML actually lines up with the schema. So if you are adding a field named "name" in the xml, but the schema doesn't know about it, then Solr will just ignore that field (ie it won't be "stored" or "indexed").
Good luck

Well, despite of setting a default search field is quite usefull i don't understand why don't you just use the solr query syntax:
......./?q=name:tom
or
......./?q=:&fq=name:tom

Related

Indexing PDF files with Solr 6.6 while allowing highlighting matched text with context

I am new to Solr and I need to implement a full-text search of some PDF files. The indexing part works out of the box by using bin/post. I can see search results in the admin UI given some queries, though without the matched texts and the context.
Now I am reading this post for the highlighting part. It is for an older version of Solr when managed schema was not available. Before fully understand what it is doing I have some questions:
He defined two fields:
<field name="content" type="text_general" indexed="false" stored="true" multiValued="false"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
But why are there two fields needed? Can I define a field
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
to capture the full text?
How are the fields filled? I don't see relevant information in TikaEntityProcessor's documentation. The current text extractor should already be Tika (I can see
"x_parsed_by":
["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.pdf.PDFParser"]
in the returned JSON of some query). But even I define the fields as he said I cannot see them in the search results as keys in JSON.
The _text_ field seems a concatenation of other fields, does it contain the full text? Though it does not seem to be accessible by default.
To be brief, using The Elements of
Statistical Learning as an example, how to highlight the relevant texts for the query "SVM"? And if changing the file name into "The Elements of Statistical Learning - Trevor Hastie.pdf" and post it, how to highlight "Trevor Hastie" for the query "id:Trevor Hastie"?
Before I get started on the questions let me just give a brief how solr works. Solr in its core uses lucene when simply put is a matching engine. It creates inverted indexes of document with the phrases. What this means is for each phrase it has a list of documents which makes it so fast. Getting to your questions:
Solr does not convert your pdf to text,well its the update processor configured in the handler which does it ,again this can be configured in solrconfig.xml or write your own handler here.
Coming back why are there two fields. To simply put the first one(content) is a stored field which stores the data as it is. And the second one is a copyfield which copies the data for each document as per the configuration in schema.xml.
We do this because we can then choose the indexing strategy such as we add a lowercase filter factory to text field so that everything is indexed in lower case. Then "Sam" and "sam" when searched returns the same results.Or remove certain common occurring words such as "a","the" which will unnecessarily increase your index size. Which uses a lot of memory when you are dealing with millions of records, then you want to be careful which fields to index to better utilise the resources.
The field "text" is a copyfield which copies data from certain fields as mentioned in the schema to text field. Then when searching in general one does not need to fire multiple queries for each field. As everything thing is copied into "text" field and you get the result. This is the reason it's "multivaled". As it can stores an array of data. Content is a stored field and text is not,and opposite for indexed because when you return your result to the end user you show him what ever you saved not the stripped down data that you just did with the text field applying multiple filters(such as removing stop words and applying case filters,stemming etc).
This is the reason you do not see "text" field in the search result as this is used solr.
For highlighting see this.
For more these are some great blog yonik and joel.
Hope this helps. :)

Solr: how do I use dismax instead of using copyField?

I've been trying to figure this out for a bit now. If I create a schema without the directive:
<copyField source="*" dest="text" />
I can't seem to pull anything up. But when I add that directive, things magically appear. I'm trying my query with ?defType=dismax, but that doesn't seem to help.
Am I missing something? Do I need something special in my schema? I'm indexing all the fields I need to search against.
Thoughts?
Thanks!
If you use defType=lucene you need to specify the field before your search query like this:
q=title:test
If you don't specify a field solr will use the default field specified in solrconfig.xml. This field is text by default. As all the fields are copied to text the search works well.
If you decide to use dismax the query structure changes. You need to put your search term like that:
q=test
and specify the fields to search in other parameter like that:
<str name="qf">field1 field2</str>
Where field1 and field2 are the fields you want to search the terms.

Solr dynamicField not searched in query without field name

I'm experimenting with the Example database in Solr 4.10 and not understanding how dynamicFields work. The schema defines
dynamicField name="*_s" type="string" indexed="true" stored="true"
If I add a new item with a new field name (say "example_s":"goober" in JSON format), a query like
?q=goober
returns no matches, while
?q=example_s:goober
will find the match. What am I missing?
I would like to see the SearchHandler from solrconfig.xml file that you are using to execute the above mentioned query.
In SearchHandler we generally have Default Query Field i.e. qf parameter.
Check that your dynamic field example_s is present in that query field list of solrconfig file else you can pass it while sending query to search handler.
Hope this will help you in resolving your problem.
If you are using the default schema, here's what's happening:
You are probably using default end-point (/select), so you get the definition of search type and parameters from that. Which means, it is default (lucene) search and the field searched is text.
The text field is an aggregate and is populated by copyField instruction from other fields.
Your dynamic field definition for *_s allows you to index the text with any name ending in _s, such as example_s. It's indexed (so you could search against it directly) and stored (so you can see it when you ask for all fields). It will not however search it as a general text. Notice that (differently from ElasticSearch), Solr strings have to be matched fully and completely. If you have some multi-word text in it, there is barely any point searching it. "goober" is one word so it's not a very good example to understand the difference here.
The easiest solution for you is add another copyField instruction:
<copyField source="*_s" dest="text"/>, then all your *_s dynamic fields would also be searchable. But notice that the search analyzers will not be the ones for *_s definition, but the ones for the text field's definition, which is not string, but text_general, defined elsewhere in the file.
As to Solr vs. ElasticSearch, they both err on the different sides of magic. Solr makes you configure the system and makes it very easy to see the exact current configuration. ElasticSearch hides all of the configuration, but you have to rediscover it the second you want to change away from the default behaviour. In the end, the result is probably similar and meets somewhere in the middle.

Solr Search not working after dataimport successful

I am new in Solr. I have tried DataImport using a Oracle Database. The data gets successfully imported. When I try to search with query:
qt=standard
q=*
I get good results. But when I do a specific search, the results are empty showing no documents. The logger is empty and there are NO errors displayed.
Ok! I got it.
I observed that when I am using some pre-defined fields of schema.xml, the search on those fields are working fine. But when I defined some fields of my own, the result was still NOTHING.
Then I looked into "solr-config.xml's" "/select" request handler. There is a line
<str name="df">text</str>
which says that "txt" is the only field which is searchable. But then how does it searches the other fields?
Answer lies in "schema.xml's"
"<copyField>"
tag. The fields present by default are copied into "text" which makes them searchable. Hence if you want your defined field as searchable, just define your field and add it in copyField tag. ;)
TLDR Version: Define your fields as type="text" to start off. If you have a field called "product", add <field name="product" type="text" indexed="true" stored="true" /> to the default schema.xml inside the <fields> tag and you should be done. To search using the select request-handler, use q=<field_name>:<text_to_look_for> or q=*:* to show all documents.
There are a few mistakes you're making here. I'll be explaining using the 'select' request handler.
The format for a query is ?q=<field_name>:<text_to_look_for>. So if you want to return all the values matching all the fields, you'd say q=*:*
And if you were to look for the word "iPod" in the field "product" your query would be q=product:iPod
Another thing to keep in mind is that if in schema.xml, say if you specify the field product as type="string" which maps to class="solr.StrField", the query (<text_to_look_for>) should precisely match the value in the index, since Solr doesn't tokenize the StrField by default, i.e., ipod will not return results if your index holds it as iPod. If you need it to return it still, you could use the type="text" in schema.xml (the fieldType definition is present already in the default schema.xml.) The "text" fieldType has several analyzers(one analyzer ignores case) and tokenizers(tokenizer splits up the words in the field and indexes them so that if you search for a particular word, say "ipod", it would match the value "iPod 16GB White").
Regarding your own answer, the <str name="df">text</str> specifies the default field to search in, i.e, if you just said q=iPod, it would look in this field. The objective of this field called text is to hold all the other fields in the document, so that you could just search in this field and know that some or the other field in this document would match your query, thereby you wouldn't need to search in a specific field if you don't know what field you're expecting the value to be in.

Solr only search in default field

I can't search in other field than the default. With q=field:search doesn't return nothing (but documents exists obsviouly). Also didn't find it with defType=dismax.
In the schema.xml I have the field with indexed="true" and stored="true" just like the default one.
What I am missing ?
Thanks in advance.
So you are issuing a GET?
Have you checked your encoding the ':', as it should be %3A.
Without the right encoding you maybe trying to search the default field for 'field:search'.
Can you test your query in /solr/admin ?
just checking, if you changed your schema, you need to delete the data in /var/data and restart solr, otherwise you may be searching old not indexed stuff.
If you are using a DisMax search make sure that you have the fields you are searching added to the query fields parameter solr search. It might just be that you are declaring a qf parameter in your search statement and your field is missing from the list.
This may be a bit too plain, but did you - besides restarting your solr instance - remember reindex? Otherwise changes in the schema.xml won't apply.

Resources