Find the total number of terms indexed in a particular document in Solr

I have read extensively about Solr, and it lets me find termfreq, i.e. the number of times a given term occurs in a document. But I need to know the total number of terms that have been indexed for a particular document. The query I am trying is:
/solr/live/select?qt=albumsearch&q=pak%20pak&fl=%2Cscore&wt=json&indent=true&defType=edismax&q.alt=as&qf=a%5E10+l%5E10&bf=12234&boost=termfreq(song,.)
Any help will be appreciated.

You can use either the Luke Request Handler with a docId parameter, or the Stats Component with a query / fq that returns the document you're interested in.
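A minimal sketch of what those two requests could look like, written in Python only to build the URLs. The host, the docId value 42 and the filter id:123 are placeholders, and the core name live and the field song are taken from the question. Whether the stats output is useful depends on the field type, so treat the second URL as a starting point rather than a confirmed recipe.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance; adjust to your own host and port.
SOLR_BASE = "http://localhost:8983/solr"


def luke_doc_url(core, doc_id, base=SOLR_BASE):
    """Luke request for one document, addressed by its internal Lucene docId."""
    params = {"docId": doc_id, "wt": "json"}
    return f"{base}/{core}/admin/luke?{urlencode(params)}"


def stats_url(core, doc_filter, field, base=SOLR_BASE):
    """Stats Component request narrowed to a single document through fq."""
    params = {
        "q": "*:*",
        "fq": doc_filter,      # e.g. "id:123" (hypothetical id value)
        "rows": 0,             # only the stats section is of interest
        "stats": "true",
        "stats.field": field,
        "wt": "json",
    }
    return f"{base}/{core}/select?{urlencode(params)}"


print(luke_doc_url("live", 42))
print(stats_url("live", "id:123", "song"))
```

Note that docId is the internal Lucene document number, not your uniqueKey value, so it can change when the index is rebuilt.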

Related

How to debug the score value calculated for a document and query string which is displayed on solr admin console?

I have indexed a few documents, and when I query a string from the Solr admin console I can retrieve the score of each result by adding score to the field list. But I need to see the doc score, termfreq and the other parameters that go into calculating this score, so that I can debug and understand it. Can anyone help me with possible ways to do this? Are there any keywords, fields or query parameters I need to specify when querying from the Solr admin console? The Solr version I am using is 7.6.0.
Add debug=all (the new version of debugQuery=true) to your query string. It'll include a detailed explanation of how each part of your query contributes to the score.
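As a rough illustration, here is how such a request could be built and how the per-document explanations could be pulled out of the JSON response. The host, the core name mycore and the sample query are assumptions; the helper only relies on the explanations sitting under debug -> explain in the response.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance; "mycore" below is a hypothetical core name.
SOLR_BASE = "http://localhost:8983/solr"


def debug_query_url(core, query, base=SOLR_BASE):
    """Select request that also asks Solr for full debug output."""
    params = {"q": query, "fl": "*,score", "debug": "all", "wt": "json"}
    return f"{base}/{core}/select?{urlencode(params)}"


def explanations(response):
    """Return the {document key: score explanation} map, or {} if absent."""
    return response.get("debug", {}).get("explain", {})


print(debug_query_url("mycore", "title:solr"))
```

Pass the parsed JSON body of the response to explanations() to get one explanation per returned document.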

Solr Custom Boosting if a specific field matches the query

We are trying to implement a very interesting search logic with custom boosting and I am wondering if Solr can support this.
We have the following fields in our index:
Name
Description
Keywords (array)
Each keyword will have an amount(int value) paired to it.
A search is run across Name, description and keywords field. If a keyword matches the search text, the corresponding index must be boosted based on the amount of the matching keyword only.
I've read through the Solr DisMax documentation, and it can only boost a field by a fixed amount.
My scenario will be to boost the result by X amount based on matching keywords only.
Thanks in advance
The only viable solution I see to this problem (assuming, of course, that you do NOT know the number of keywords in advance) is to run the query as a filter query (to skip the scoring stage), fetch all matching documents (a bit problematic), and then sort them on your side, using the matched term to build a Java Comparator.
Problems may arise when you get a particularly large number of documents, but you could probably sidestep this issue with pagination.
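A small sketch of the client-side sorting step, in Python rather than Java for brevity. It assumes each fetched document carries its keywords as a keyword -> amount mapping under a "keywords" key; that layout is purely hypothetical and depends on how you store the pairs.

```python
def sort_by_matched_keyword(docs, search_term):
    """Order docs by the amount of the keyword that equals the search term.

    Documents without a matching keyword get an amount of 0 and end up last.
    """
    term = search_term.lower()

    def matched_amount(doc):
        amounts = [
            amount
            for keyword, amount in doc.get("keywords", {}).items()
            if keyword.lower() == term
        ]
        return max(amounts, default=0)

    # sorted() is stable, so documents with equal amounts keep Solr's order.
    return sorted(docs, key=matched_amount, reverse=True)


docs = [
    {"name": "A", "keywords": {"phone": 5}},
    {"name": "B", "keywords": {"phone": 50, "case": 1}},
    {"name": "C", "keywords": {"case": 100}},
]
print([d["name"] for d in sort_by_matched_keyword(docs, "phone")])
```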
If you don't have too many different amounts, maybe you can try this at index time:
Store the keywords in different fields (dynamic fields -> boost-*) based on their amount:
boost-1 = keyword1,keyword4,keyword6
boost-10 = keyword2
boost-100 = keyword5
You can then search across all your boost fields (edismax) and boost every dynamic field with its amount in your (e)dismax configuration (boost-1^1,boost-10^10,boost-100^100).
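To make the idea concrete, here is a sketch that groups keywords into boost-N fields at index time and derives the matching qf value. The input mapping keyword -> amount is an assumed representation of your data; the field naming follows the boost-* pattern above.

```python
from collections import defaultdict


def keywords_to_boost_fields(keyword_amounts):
    """Group keywords into dynamic fields named boost-<amount>."""
    fields = defaultdict(list)
    for keyword, amount in keyword_amounts.items():
        fields[f"boost-{amount}"].append(keyword)
    return dict(fields)


def qf_for_boost_fields(field_names):
    """Build a qf string that boosts each boost-<amount> field by its amount."""
    def amount_of(name):
        return int(name.split("-", 1)[1])

    ordered = sorted(field_names, key=amount_of)
    return " ".join(f"{name}^{amount_of(name)}" for name in ordered)


doc_fields = keywords_to_boost_fields(
    {"keyword1": 1, "keyword2": 10, "keyword4": 1, "keyword5": 100, "keyword6": 1}
)
print(doc_fields)
print(qf_for_boost_fields(doc_fields))
```

The qf value has to cover every amount that occurs in the index, so this works best when the set of distinct amounts is small and known.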

Solr Fuzzy search in multiValued field with max distance between terms

Hello stackOverflowers
I have a Solr document collection with a field called
names_txt - this is a multiValued="true" field.
This field contains the names of all persons associated with a document.
I want to be able to do a fuzzy search and at the same time limit the number of terms between the two matching terms.
The query
names_txt:("markus foss"~2)
will return all documents containing the terms markus and foss with at most 2 terms between them.
But when I search in a fuzzy way AND also want to specify the maximum number of terms between the matches, I can't get the syntax right.
The query:
names_txt:(markus~0.7 foss~0.7)
This does work, but it returns false positives, since it will match a document with "markus something" in one value and "foss somethingElse" in another.
What I would like to write is:
(markus~0.7 foss~0.7)~2
but this syntax is illegal in solr.
Anyone out there have a solution for my problem?
Since a single query clause in Solr can apply either a word distance constraint or a fuzzy search constraint, but not both, we need two clauses for this:
names_txt:("markus foss"~2) AND names_txt:(markus~0.7 foss~0.7)
Note that quantifying fuzziness with a float is deprecated. Internally, Lucene converts the float to an int between 0 and 2 anyway, so we should use this integer (Damerau-Levenshtein) edit distance in our search terms from the start. So my final proposal is:
names_txt:("markus foss"~2) AND names_txt:(markus~1 foss~1)
(For those who are interested: The deprecated, somewhat quirky function that converts the similarity float to an edit distance int can be found at the end of this code file.)
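If you need to generate this query for arbitrary names, a helper along these lines could assemble it. It only reproduces the query string proposed above and does not escape special characters in the terms, so treat it as a sketch; the function name and parameters are made up for illustration.

```python
def fuzzy_proximity_query(field, terms, slop=2, max_edits=1):
    """Combine a proximity phrase clause with a fuzzy clause on one field."""
    if max_edits not in (0, 1, 2):
        # Lucene only accepts integer edit distances from 0 to 2.
        raise ValueError("max_edits must be 0, 1 or 2")
    phrase = " ".join(terms)
    fuzzy = " ".join(f"{term}~{max_edits}" for term in terms)
    return f'{field}:("{phrase}"~{slop}) AND {field}:({fuzzy})'


print(fuzzy_proximity_query("names_txt", ["markus", "foss"]))
```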
I think you could do that using SpanQuery. The issue is that the usual query parsers in Solr don't support span queries. Look at this article, which mentions the parsers that do support spans: Surround, Xml-Query-Parser and Qsol. But check the status of each in the current Solr version.

Solr get calculated distance while using dismax

I'm starting to think that what I want to do is not possible but thought I would give this a try.
I'm running Solr 3.5.
I currently have two types of search:
A basic spatial query which returns the calculated distance between two points in the score field.
Sample Query from my Solr logs:
?fl=*,score&sort=score+asc&start=0&q={!func}geodist()&sfield=coordinates&pt=59.2363514,18.092783&version=2
A dismax query which allows free text queries on a number of fields.
Sample Query from Solr log:
mm=1&d=100.0&sfield=coordinates&qf=field1^5.0+fields2^3.0&defType=edismax&version=2&fl=*,score&start=1&q=monkeyhopper&pt=59.2363514,18.0927830000&fq={!geofilt}}
I want to replace my first query with the dismax query, but I really need to get the calculated distance in the response. Yes, I can calculate the distance programmatically, but I would prefer not to, as Solr has already done it for me.
I still want to be able to sort my dismax query "by relevance", distance or any other field so the score given by my boosts could be interesting for sorting but I don't need it to be returned.
If I understood correctly, you want to have the result of a function in your Solr response. The SOLR-2444 issue is what you're looking for, I guess: it allows you to include pseudo-fields, functions etc. in the fl parameter. The only problem is that it has been committed only on trunk, so it isn't available in the current Solr release, nor will it be in the coming 3.6 release. You have to wait for the 4 release, but I don't think it will take long. Maybe you can already start playing around with a snapshot of the last successful Jenkins build.
Pseudo-fields are now available in Solr 4+ which allow you to do just this.
http://localhost:8983/solr/collection1/browse?q=*:*&rows=1000&wt=xml&pt=37.763649,-122.24313&sfield=store&fl=dist:geodist()
For instance, this request allows me to return a field "dist" which contains the distance of each entry to the stated point.
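Applied to the edismax query from the question, the request could be put together roughly like this. The parameter values are copied from the question's log lines; the host, the core name collection1 and the alias dist are only examples.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a local Solr 4+ instance.
SOLR_BASE = "http://localhost:8983/solr"


def distance_query_url(core, query, lat, lon, sfield, qf,
                       sort_by_distance=False, base=SOLR_BASE):
    """edismax request that returns geodist() as the pseudo-field "dist"."""
    params = {
        "q": query,
        "defType": "edismax",
        "qf": qf,
        "sfield": sfield,
        "pt": f"{lat},{lon}",
        "fl": "*,dist:geodist()",   # all stored fields plus the distance
        "wt": "json",
    }
    if sort_by_distance:
        params["sort"] = "geodist() asc"   # omit to keep relevance ordering
    return f"{base}/{core}/select?{urlencode(params)}"


url = distance_query_url(
    "collection1", "monkeyhopper", 59.2363514, 18.092783,
    "coordinates", "field1^5.0 fields2^3.0",
)
print(url)
print(parse_qs(urlsplit(url).query)["fl"])
```

Leaving score out of fl keeps it out of the response while relevance sorting still works.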

Solr statistical information

Is it possible to get some kind of stats from Solr, e.g. the most frequently used words (unigrams) or phrases (bigrams, trigrams)?
Take a look at the schema browser (e.g. http://localhost:8983/solr/admin/schema.jsp), it gives you the top terms for any given field. You can also access this information with the LukeRequestHandler (e.g. http://localhost:8983/solr/admin/luke).
The TermsComponent also gives you information about indexed terms in a field and the number of documents that match each term.
The StatsComponent gives you statistics about numeric fields.
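For the TermsComponent, a request and a small response helper might look like the sketch below. It assumes a /terms request handler is available on your core and that the JSON response uses the flat [term, count, term, count, ...] layout; the core name mycore and the field name text are placeholders.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with a /terms handler on the core.
SOLR_BASE = "http://localhost:8983/solr"


def terms_url(core, field, limit=10, base=SOLR_BASE):
    """TermsComponent request for the top terms of one field."""
    params = {"terms": "true", "terms.fl": field, "terms.limit": limit, "wt": "json"}
    return f"{base}/{core}/terms?{urlencode(params)}"


def pair_terms(flat):
    """Turn [term, count, term, count, ...] into [(term, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


print(terms_url("mycore", "text", limit=5))
print(pair_terms(["solr", 12, "lucene", 7]))
```

For bigrams or trigrams you would need a field whose analysis chain produces shingles, since the component only reports the terms that were actually indexed.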
