Solr More Like This (MLT) using a unique identifier other than the default id

I'm trying to use MLT, but my unique identifier is doc_id instead of id. If I request this:
http://localhost:8983/solr/mlt/?q=doc_id:question#11
I get no results, whereas if I request this:
http://localhost:8983/solr/mlt/?q=id:11
I get results. This is my request handler:
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
<lst name="defaults">
<str name="mlt.fl">title,text</str>
<str name="mlt.mintf">1</str>
<str name="mlt.mindf">2</str>
<str name="mlt.minwl">2</str>
<str name="mlt.boost">true</str>
<int name="rows">5</int>
<str name="fl">id,doc_id,title,content_type,user_id,topic_id,score</str>
</lst>
</requestHandler>
How can I use MLT with doc_id as my unique identifier?

What you have looks fine. MLT just uses the query to find a document and, if one is found, uses it as the source document. Are you sure a document is returned for the query doc_id:question#11? Put the value in quotes and see whether that gets you the document back, e.g. doc_id:"question#11". What is the data type of doc_id?
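One more thing worth checking alongside the quoting: in a URL, an unencoded # starts the fragment, which is normally not sent to the server, so Solr may only be receiving doc_id:question. The following is a minimal Python sketch, not a definitive fix, that quotes the value and percent-encodes the whole query. The host, port, and handler path come from the question; the helper name mlt_url is hypothetical.

```python
from urllib.parse import urlencode, urlsplit


def mlt_url(base, field, value):
    """Build an MLT request URL with the value quoted and percent-encoded."""
    # Escape backslashes and double quotes so the value is safe inside a quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{escaped}"'
    # urlencode percent-encodes ':', '"' and '#', so '#' no longer starts a fragment.
    return f"{base}?{urlencode({'q': query})}"


# The unencoded URL from the question: everything after '#' is parsed as a fragment.
raw_parts = urlsplit("http://localhost:8983/solr/mlt/?q=doc_id:question#11")
print(raw_parts.query)     # the query string a server would receive
print(raw_parts.fragment)  # the part that stays on the client

url = mlt_url("http://localhost:8983/solr/mlt/", "doc_id", "question#11")
print(url)
```

If the encoded request still returns nothing, the field type of doc_id is the next thing to look at, as asked above.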

Related

Solr query with repeating parameter

I'm trying to extend one of the examples from the Solr Query Parsing presentation ( http://www.slideshare.net/erikhatcher/sa-22830939 ) so that I can retrieve multiple Solr documents at once with a query string such as http://.../solr/docs?id=1&id=2&id=3
The original configuration for the requestHandler looks like this:
<requestHandler name="/docs" class="solr.SearchHandler">
<lst name="defaults">
<str name="q">{!term f=id v=$id}</str>
</lst>
<arr name="components">
<str>query</str>
<str>highlight</str>
<str>debug</str>
</arr>
</requestHandler>
But this works only for a single id parameter ( http://.../solr/docs?id=1 ). Which query parser or configuration would I have to use to match against multiple id parameters?
Thanks for your help.

Facet Name Based on the result in Solr

I want to get the names of the facets in the output based on their counts. For example, if I have this output:
<lst name="facet_fields">
<lst name="state">
<int name="kerala">3312</int>
<int name="andaman">10</int>
<int name="andhra">0</int>
<int name="arunachal">0</int>
<int name="assam">0</int>
</lst>
</lst>
I want the result to be kerala,andaman, since both of them have a count > 0.
Is there any way to do this? Please help.
I guess you want to specify a minimum count of 1 in your query. This can be achieved using facet.mincount.
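As a rough illustration, the request could be built like this in Python. The base URL and the q=*:* query are assumptions (the question does not show the request); facet.field=state matches the output in the question.

```python
from urllib.parse import urlencode

# Assumed base URL and match-all query; only the facet parameters matter here.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.field", "state"),
    ("facet.mincount", "1"),  # omit facet values whose count is below 1
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With this parameter set, values such as andhra, arunachal, and assam, which have a count of 0, should no longer appear in the facet list.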

Solr: Facet one field with two outputs

I'm using Solr for indexing products and organising them into several categories. Each document has a taxon_names multi value field, where the categories are stored as human readable strings for a product.
Now I want to fetch all the categories from Solr and display them with clickable links to the user, without hitting the database again. At index time, I get the permalinks for every category from the MySQL database, which is stored as a multi value field taxon_permalinks. For generating the links to the products, I need the human readable format of the category and its permalink (otherwise you would have such ugly URLs in your browser, when just using the plain human readable name of the category, e.g. %20 for space).
When I do a facet search with http://localhost:8982/solr/default/select?q=*%3A*&rows=0&wt=xml&facet=true&facet.field=taxon_names, I get a list of human readable taxons with their counts. Based on this list, I want to create the links, so that I don't have to hit the database again.
So, is it possible to retrieve the matching permalinks from Solr for the different categories? For example, I get an XML response like this:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<result name="response" numFound="6580" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="taxon_names">
<int name="Books">2831</int>
<int name="Music">984</int>
...
</lst>
</lst>
</lst>
</response>
And inside the taxon_names array I would need the name of the permalink.
Maybe it's possible by defining a custom field type in the config XMLs. But for this, I don't have enough experience with Solr.
From your description, it appears that the permalinks are stored in the taxon_permalinks field and that its values correspond to the category names in the taxon_names field. Solr allows you to facet on multiple fields, so you can facet on both fields and walk the two facet results, taking the display name from the taxon_names facet values and the permalink from the taxon_permalinks facet values.
Query:
http://localhost:8982/solr/default/select?q=*%3A*&rows=0&wt=xml
&facet=true&facet.field=taxon_names&facet.field=taxon_permalinks
Your output should then look similar to the following:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<result name="response" numFound="6580" start="0"/>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="taxon_names">
<int name="Books">2831</int>
<int name="Music">984</int>
...
</lst>
<lst name="taxon_permalinks">
<int name="permalink1">2831</int>
<int name="permalink2">984</int>
...
</lst>
</lst>
</lst>
</response>
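Walking the two facet lists side by side could look roughly like the Python sketch below. It pairs values by position, which is only an assumption: it holds when both lists come back in the same order, and categories with equal counts may be ordered differently in each field. The sketch therefore checks that the counts agree and stops otherwise. The sample response is a trimmed, made-up stand-in using the placeholder values from above, with the permalink field named taxon_permalinks as in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for a Solr response; values are the placeholders used above.
SAMPLE = """<response>
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="taxon_names">
<int name="Books">2831</int>
<int name="Music">984</int>
</lst>
<lst name="taxon_permalinks">
<int name="permalink1">2831</int>
<int name="permalink2">984</int>
</lst>
</lst>
</lst>
</response>"""


def facet_values(root, field):
    """Return [(value, count), ...] for one facet field, in response order."""
    path = f".//lst[@name='facet_fields']/lst[@name='{field}']/int"
    return [(node.get("name"), int(node.text)) for node in root.findall(path)]


def pair_facets(xml_text, name_field, link_field):
    """Pair display names with permalinks by position, verifying the counts."""
    root = ET.fromstring(xml_text)
    names = facet_values(root, name_field)
    links = facet_values(root, link_field)
    pairs = []
    for (name, name_count), (link, link_count) in zip(names, links):
        # Positional pairing is only plausible when the counts agree.
        if name_count != link_count:
            raise ValueError(f"count mismatch for {name!r} and {link!r}")
        pairs.append((name, link, name_count))
    return pairs


pairs = pair_facets(SAMPLE, "taxon_names", "taxon_permalinks")
print(pairs)
```

Matching counts do not prove that a name and a permalink belong together, so this is best treated as a convenience check rather than a guarantee.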

Solr : How can I group on two different fields?

My schema looks like this:
product_id
category_id
A category contains products.
In Solr 3.6, I group results on category_id and it works well.
I just added a new field:
group_id
A group contains products that vary on size or color.
Example: shoes in blue, red and yellow are three different products and have the same group_id.
In addition to grouping results on category_id, I would like only one product per group_id in my results, bearing in mind that group_id can be null (for products that aren't part of a group).
To follow the example of the shoes, it means that for the request "shoe", only one of the three products should be in the results.
I thought of doing a second result grouping on group_id, but it doesn't seem possible to do it that way.
Any ideas?
EDIT: For now, I process the results in PHP to delete documents that have a group_id that is already in the results. I'm leaving this question open in case someone finds out how to group on two fields.
If your aim is to get grouping counts based on multiple "group by" fields, you can use pivot faceting to achieve this.
&facet.pivot=category_id,group_id
Solr will give you back a hierarchy of grouped result counts, following the page of search results, under the facet_pivot element.
http://wiki.apache.org/solr/SimpleFacetParameters?highlight=%28pivot%29#Pivot_.28ie_Decision_Tree.29_Faceting
It is not possible to group query results on two fields.
If you only need counts, you can use facet.field (for a single field) or facet.pivot (for multiple fields).
This is not actual grouping, but it gives you the count for each combination of values across multiple fields.
Example Output:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">306</int>
</lst>
<result name="response" numFound="667" start="0" maxScore="0.70710677">
<doc>
<int name="idField">7393</int>
<int name="field_one">12</int>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields"/>
<lst name="facet_ranges"/>
<lst name="facet_intervals"/>
<lst name="facet_heatmaps"/>
<lst name="facet_pivot">
<arr name="field_one,field_two">
<lst>
<str name="field">field_one</str>
<int name="value">3</int>
<int name="count">562</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">347</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">215</int>
</lst>
</arr>
</lst>
<lst>
<str name="field">field_one</str>
<int name="value">12</int>
<int name="count">105</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">97</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">8</int>
</lst>
</arr>
</lst>
</arr>
</lst>
</lst>
</response>
Example Query :
http://192.168.100.145:7983/solr/<collection>/select?facet.pivot=field_one,field_two&facet=on&fl=idField,field_one&indent=on&q=field_one:(3%2012)&rows=1&wt=xml
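To consume a facet_pivot result like the one above, the nested structure can be flattened into rows. Below is a minimal Python sketch using only the standard library; the sample is the first pivot entry from the example output, and the function name flatten_pivot is hypothetical.

```python
import xml.etree.ElementTree as ET

# First entry of the facet_pivot array from the example output above.
SAMPLE = """<arr name="field_one,field_two">
<lst>
<str name="field">field_one</str>
<int name="value">3</int>
<int name="count">562</int>
<arr name="pivot">
<lst>
<str name="field">field_two</str>
<bool name="value">true</bool>
<int name="count">347</int>
</lst>
<lst>
<str name="field">field_two</str>
<bool name="value">false</bool>
<int name="count">215</int>
</lst>
</arr>
</lst>
</arr>"""


def flatten_pivot(arr, prefix=()):
    """Yield (values, count) for every node of a facet_pivot <arr> element."""
    for node in arr.findall("lst"):
        value = node.find("*[@name='value']").text
        count = int(node.find("*[@name='count']").text)
        path = prefix + (value,)
        yield path, count
        # Descend into the nested pivot level, if this node has one.
        child = node.find("arr[@name='pivot']")
        if child is not None:
            yield from flatten_pivot(child, path)


rows = list(flatten_pivot(ET.fromstring(SAMPLE)))
for path, count in rows:
    print(",".join(path), count)
```

Each row carries the chain of values from the outer field to the inner one, so a one-element path is a count for field_one alone and a two-element path is a count for the combination.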
If you can change the data that you are posting to Solr, then I suggest that you create a string field that holds a concatenation of category_id and group_id. For example, if category_id = 5 and group_id = 2, then your string field can be '5,2' (using ',' or any other character as a delimiter). You can then group on this string field.
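A sketch of how such a key could be computed at index time, in Python. The function name and the handling of a missing group_id are assumptions, not part of the suggestion above: products without a group get a key derived from their own product_id (with a p prefix so it cannot collide with a real group_id), so that each of them ends up in its own group.

```python
def composite_group_key(category_id, group_id, product_id, sep=","):
    """Build the value of the concatenated string field to group on."""
    if group_id is None:
        # Assumption: an ungrouped product forms a group of its own,
        # keyed by its product_id with a 'p' prefix to avoid collisions.
        return f"{category_id}{sep}p{product_id}"
    return f"{category_id}{sep}{group_id}"


grouped = composite_group_key(5, 2, 101)
ungrouped = composite_group_key(5, None, 102)
print(grouped, ungrouped)
```

Note that grouping on this field returns one group per category and group combination, rather than groups nested inside categories.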

Can I restrict the search to a specific date range?

I want to get all results AFTER a given date. Can you do this with Solr?
(http://lucene.apache.org/solr/)
Right now the search covers the entire result set; I want to filter for anything after a given date.
Update
This isn't working for me yet. I'm trying:
http://www.example.com:8085/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on&indexed_at:2009-08-27T13%3A15%3A27.73Z
My returned doc:
<doc>
<str name="apptype">Forum</str>
<str name="collapse">forum:334</str>
<str name="content"> testing </str>
<str name="contentid">357</str>
<str name="createdby">some_user</str>
<str name="date">20090819</str>
<str name="dummy_id">1</str>
<int name="group">5</int>
<date name="indexed_at">2009-08-25T16:48:45.121Z</date>
<str name="rating">000.0</str>
<str name="rawcontent"><p>testing</p></str>
<arr name="roles">
<str>1</str>
<str>2</str>
<str>3</str>
<str>4</str>
<str>14</str>
<str>15</str>
<str>16</str>
</arr>
<int name="section">79</int>
<int name="thread">334</int>
<str name="title">testing</str>
<str name="titlesort">testing</str>
<str name="type">forum</str>
<str name="unique_id">
BLAHBLAH|357
</str>
<str name="url">/blahey/f/79/p/334/357.aspx#357</str>
<str name="user">21625</str>
<str name="username">some_user</str>
</doc>
Yes, you can. I assume you have a field with the date value you want to filter on. Then you do
yourdatefield:[2008-08-27T23:59:59.999Z TO *]
A sample URL would be localhost:8983/solr/select?q=yourdatefield:[2008-08-27T23:59:59.999Z TO *]
You want to submit the date restriction as part of the query, that is, in the value of q, like
localhost:8983/solr/select?q=(text:test+AND+indexed_at:[2009-08-27T13:15:27.73Z TO *])
So the entire query is contained within the q query string parameter.
The format of the date is ISO 8601.
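A minimal Python sketch of building such a request, with the date formatted in UTC and the whole query percent-encoded so that the colons, brackets, and spaces survive in the URL. The field names text and indexed_at come from the example above; the helper names are hypothetical, and the sketch drops fractional seconds for simplicity.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def solr_date(dt):
    """Format an aware datetime in UTC, e.g. 2009-08-27T13:15:27Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def after_date_url(base, text, field, since):
    """Build a select URL matching `text` in documents dated from `since` onward."""
    # Open-ended range: from the given date to any later value.
    query = f"text:{text} AND {field}:[{solr_date(since)} TO *]"
    return f"{base}?{urlencode({'q': query})}"


since = datetime(2009, 8, 27, 13, 15, 27, tzinfo=timezone.utc)
url = after_date_url("http://localhost:8983/solr/select", "test", "indexed_at", since)
print(url)
```

Square brackets make the range inclusive of the start date itself.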
You can add an automatic timestamp to the documents as they are indexed using:
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
in schema.xml. The default schema has this commented out, so if you copied the default, you just need to uncomment it.
You could add that and use olle's suggested search pattern to find the documents indexed after a certain date. (You'd have to replace yourdatefield with timestamp, or whatever you name the field in the XML.)
You will need to create a query that compares dates, here is the syntax for queries:
http://wiki.apache.org/solr/SolrQuerySyntax
And here is how you can make date comparisons in the query:
http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html
