SOLR term frequency - solr

I am using Solr, and so far everything is going great. When I run a search, I want to get back how many times the search term occurs in each document, along with the document itself. I have found a lot of information, but after going through it I still don't understand how to do this. Is it really that hard?
Can anyone help me out?
Although I do get results, the returned fl value (my alias for termfreq) is always 0:
http://localhost:8983/solr/collection1/select?q=text:*mySearchTerm*&fl=*,fl:termfreq(text,*mySearchTerm*)

OK, I found out that termfreq doesn't work on multiValued fields. So I used a copy field and added termVectors="true" termPositions="true" termOffsets="true". Now it works.
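For illustration, here is a minimal Python sketch of the request parameters for the approach described above. The field name text_single (standing in for the single-valued copy field) and the alias freq are hypothetical and do not come from the original post. As far as I know, termfreq counts one literal indexed term and does not expand wildcards, so the sketch passes the term without asterisks.

```python
from urllib.parse import urlencode

def termfreq_params(field, term):
    """Build select parameters that return each document plus a term count.

    `field` is assumed to be a single-valued field (hypothetical name below);
    `term` is passed as a literal term, without wildcards.
    """
    return {
        "q": f"{field}:{term}",
        # "freq" is an illustrative alias for the function result
        "fl": f"*,freq:termfreq({field},'{term}')",
        "wt": "json",
    }

params = termfreq_params("text_single", "mysearchterm")
# Base URL taken from the question; urlencode percent-escapes the parameters
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

Naming the alias something other than fl also avoids confusing it with the fl parameter itself.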

You might want to check out http://wiki.apache.org/solr/LukeRequestHandler - in the XML it returns you should see a "numTerms" tag for each field, which you can use where required.

Related

Solr query wildcard problem, mismatch in results number vs real document count

OK, so the problem is that I get some strange results when running Solr queries from the admin console.
I am trying to search for documents that have an Alfresco property with a specific name.
The field name is "edm:uid".
So if I pass the following to "q":
edm:uid:FULL_NAME_OF_THE_DOCUMENT
everything works perfectly.
But if I use wildcards, everything breaks.
If I query, for example, "edm:uid:DOC_01_20190202*", I get, let's say, 5000 results, which might be right. If I query "edm:uid:DOC_01*", I get around 1000 results, which makes no sense: since I removed characters from the prefix, the number of results should increase. If I query "edm:uid:DOC*", I still get around 1000 results, and I should have millions.
I really don't know how Solr works. Does anyone know why this happens?
I tried several variations too, and the results don't change:
edm:uid:"DOC*"
edm:uid:DOC*
edm:uid:"DOC*"
So quoting the value, escaping the ":", or both didn't change anything.
Also, I found the schema with the fields, and "edm:uid" is indexed and tokenized.
I also ticked the "debugQuery" option, but I don't understand anything there; it just shows some scores.
Thanks in advance for any suggestions.

Extending Solr Tutorial with custom fields/core

After standing up the basic Jetty Solr example, I tried to make my own core to represent the data my company will be seeing. I made a directory structure with conf and data directories and copied core.properties, schema.xml, and solrconfig.xml from the collection1 example.
I've edited core.properties to change the core name, and I've added 31 fields (most of type text_general, indexed, stored, not required or multivalued) to the schema.
I'm pretty sure I've set it up correctly, as I can see my core in the admin page drop-down and interact with it. The problem is that when I feed a document designed for the new fields, I cannot get a successful query for any of the values. I believe the data is fed, as I got the same command line response:
"POSTing file incidents.xml...
1 file indexed. ....
COMMITting..."
I thought the indexing process might just need more time, but when I copy a field node out of an example doc (e.g. <field name="name">Apple 60 GB iPod with Video Playback Black</field> from ipod_video.xml) into a copy of my file (incidents2.xml), searches on any of those strings instantly succeed.
The best example of my issue is that both files have the field:
<field name="Brand" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="Brand">APPLE</field>
However, only the second document (with the aforementioned name field) is returned with a query for apple.
Thanks for reading this far; my questions are:
1) Is there a way to dump the analysis/tokenization phase of document ingestion? Either I don't understand it or the Analysis tab isn't designed for this. The debugQuery=true parameter gives relevance score data but no explanation of why a document was excluded.
2) Once I solve my overall issue, we would like to have large text fields included in the index. Can I wrap long-form text in CDATA blocks in Solr?
Thanks again.
To debug any query issues in Solr, there are a few useful things to check. You might also want to add the output of your analysis page and the field you're having issues with from your schema.xml to your question. It's also a good idea to have a smaller core to work with (use three or four fields just to get started and get it to work) when trying to debug any indexing issues.
Are the documents actually in the index? - Perform a search for *:* (q=*:*) to make sure that there are documents present in the index. *:* is a shortcut that means "give me all documents regardless of value". If no documents are returned, there is no content in the index, and any attempt to search it will give zero results.
Check the logs - Make sure that SolrLogging is set up, so that any errors thrown end up in your log. That way you can see if anything in particular goes wrong while the query or indexing is taking place, something which would result in the query never being performed or documents never being added to the index.
Use the Analysis page - If you have documents in the index, but they're not returned for the queries you're making, select the field you're querying on the analysis page and add both the value given when indexing (in the index column) and the value used when querying (in the query field). The page will then generate all the steps taken both when indexing and querying, and show you the token stream at each step. If the tokens match, they will be highlighted with a different background color. Depending on your settings, you might require all tokens present on the query side to be present on the indexing side (i.e. every token AND-ed together), so start by searching for a single token on the query side.
If you still don't have any hits, but have the documents in the index, be more specific. :-)
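As a rough sketch, the first and third checks above can also be scripted rather than clicked through in the admin UI. The core name incidents is hypothetical, and the second function assumes the /analysis/field request handler is enabled in solrconfig.xml (I believe the stock example config registers it, but check yours). The code only builds the URLs; fetch them with a browser, curl, or any HTTP client.

```python
from urllib.parse import urlencode

# Hypothetical core URL; substitute your own host, port, and core name
SOLR_CORE_URL = "http://localhost:8983/solr/incidents"

def match_all_count_url(core_url):
    """URL that matches every document; rows=0 asks only for the count."""
    return f"{core_url}/select?" + urlencode({"q": "*:*", "rows": 0, "wt": "json"})

def field_analysis_url(core_url, field, indexed_value, query_value):
    """URL for the field analysis handler, covering index and query sides."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": indexed_value,  # text as it was sent for indexing
        "analysis.query": query_value,         # text as typed in the query
        "wt": "json",
    }
    return f"{core_url}/analysis/field?" + urlencode(params)

count_url = match_all_count_url(SOLR_CORE_URL)
analysis_url = field_analysis_url(SOLR_CORE_URL, "Brand", "APPLE", "apple")
print(count_url)
print(analysis_url)
```

If the count request reports zero documents, the problem is on the indexing side, and the analysis output is not worth reading yet.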
And yes, you can use CDATA.
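CDATA is handled by the XML parser before the content reaches the field, so the field simply receives the text inside the block. The following small Python sketch shows that behaviour with the standard library parser; the document id and the Description field are made-up examples, not fields from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical update document; the Description field is illustrative only
update_xml = """<add>
  <doc>
    <field name="id">incident-1</field>
    <field name="Description"><![CDATA[Long text with <angle brackets> & ampersands]]></field>
  </doc>
</add>"""

root = ET.fromstring(update_xml)
# Collect field name -> text; the CDATA wrapper is gone after parsing
fields = {f.get("name"): f.text for f in root.iter("field")}
print(fields["Description"])
```

Without the CDATA wrapper, the same text would need < and & escaped as &lt; and &amp; to remain well-formed XML.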

Solr single query with AND and OR option

[Sorry if it is a duplicate Question]
I want to extract results from Solr that obey multiple conditions on the same field, with both AND and OR operations. Is it possible to do something like this?
q=_word:* AND _link:0 OR !_link:*
If I run this query, I don't get any response.
Can anyone help me achieve what I want? If possible, share a link; I searched for it but was not able to find a solution.
(_word:* AND !_link:*) OR (_word:* AND _link:0)
This works
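The working query differs from the failing one in two ways: every group is parenthesised explicitly, and the negated clause sits next to a positive clause inside its group. A small Python sketch of composing such a query string follows; the helper names all_of and any_of are invented for this example.

```python
def all_of(*clauses):
    """Join clauses with AND and wrap them in parentheses."""
    return "(" + " AND ".join(clauses) + ")"

def any_of(*groups):
    """Join already-parenthesised groups with OR."""
    return " OR ".join(groups)

query = any_of(
    all_of("_word:*", "!_link:*"),  # has a word, no link value at all
    all_of("_word:*", "_link:0"),   # has a word, link equals 0
)
print(query)
```

Building the string from grouped pieces makes the intended grouping explicit instead of relying on how the query parser treats mixed AND/OR.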

/select with 'q' parameter does not work

Whenever I query with q=*:* it shows all the documents, but when I query with q=programmer, 0 docs are found (contents is the default search field).
My schema has these fields: id (unique), author, title, contents.
The query also works fine for:
q=author:"Value" or q=title:"my book" etc.; only the contents field gives no results.
Also, when I query using the spell checker (/spell?q=programmer), the output shows spelling suggestions for this word, even though 'programmer' is the right word and is present in many documents.
I followed the example docs for the configuration.
This started happening all of a sudden; initially it worked fine.
I guess there is some problem only in the contents field, but I cannot figure it out.
Is it because the index is not created properly for the contents field?
(I am using Solr 4.2 on Windows 7 with Tomcat as the web server.)
Please help. Thanks a lot in advance.
Are you sure you set the default search field? The reason you have this problem might be that you didn't set the <defaultSearchField> element in your schema.xml file. That would explain why "q=author:value" works while q=WHATEVER doesn't.
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used.
But also consider this:
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used. It is preferable to not use or
rely on this setting; instead the request handler or query LocalParams
for a search should specify the default field(s) to search on. This
setting here can be omitted and it is being considered for
deprecation.
Do you have any data in your instance? Try q=*:* and see what it returns. "for" is a stop word, so maybe it was filtered out. Try some other value to test.
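Following the quoted advice to specify the default field in the request rather than in the schema, here is a minimal Python sketch that passes Solr's df parameter alongside the query. The field name contents comes from the question; the function name select_params is made up for this example.

```python
from urllib.parse import urlencode

def select_params(query, default_field=None):
    """Build select parameters, optionally naming the default field via df."""
    params = {"q": query, "wt": "json"}
    if default_field is not None:
        params["df"] = default_field  # field searched when q names no field
    return params

params = select_params("programmer", default_field="contents")
query_string = urlencode(params)
print(query_string)
```

If the query returns hits with df set but not without it, the default field configuration is the likely culprit rather than the index itself.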

field listing in solr with "fl" parameter for a field having space in between

I have a field in my Solr schema named "Post Date" (without the quotes). When I fire a query with the "fl" (field list) parameter to view only the Post Date of the search results, I get nothing in the docs of the response, because the field name contains a space. I tried using + and %20, but I still get no results. Please help.
I have found a solution to this. After some experimenting, I came up with using \+ as the substitute for the white space in the query. Hence the query should be Post\+Date:[ranges]
I couldn't afford to change my schema, as many teams depend on it and we are upgrading our system to a new search engine.
You can specify (what Solr deems crazy) fields by wrapping them like this:
field(Post Date)
This actually changes the returned results fieldname too so you'll get back something like:
"field(Post Date)" : "2010-01-01"
And not just the name as you might imagine.
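As a sketch only, this is how the field(...) wrapper from this answer could be percent-encoded for the fl parameter in Python. I have not verified it against a running Solr instance; it shows only the URL encoding step, with the space sent as %20.

```python
from urllib.parse import quote, urlencode

# fl value as suggested in the answer above; not verified against Solr
params = {"fl": "field(Post Date)"}

# quote_via=quote encodes the space as %20 instead of the default "+"
fl_query_string = urlencode(params, quote_via=quote)
print(fl_query_string)
```

Whichever encoding is used, the server should decode it back to the same field(Post Date) string, so the wrapper is what matters here rather than the escape style.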
As a possible workaround, you might be able to use a wildcard to achieve your results. According to the Solr wiki (http://wiki.apache.org/solr/CommonQueryParameters#glob), you may be able to specify fl=Post*Date, which would possibly get around your problem. I have not verified this, but it might work.
Update: This doesn't seem to work on either version of Solr I tried (1.4.0 and 3.6.1). It looks like this may have been discussed at http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams, but it does not appear to be implemented.
