Solr search on concatenated text - solr

I would like to make a Solr query, which would for data like
{ "date": ..., "project": ..., "text": ... }
do in order:
1. Filter them by date range.
2. Group by project, so that I get a single row per project with the text concatenated.
3. On top of this, do a full-text search, e.g. requiring that some word does or does not occur in the concatenated text.
As a result, I want to get the projects whose related texts (for each project, all texts from the given date range concatenated) contain or don't contain some phrases (depending on a query parameter).
I need to keep the documents separate so I can filter them by date range, but after filtering I need to concatenate them by the project field, so I can run a full-text search on them as if I had kept the whole text for a given project together.
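To pin down the intended semantics, here is a minimal sketch in plain Python (not Solr). The sample documents, the function name `projects_matching`, and the naive whitespace tokenization are all assumptions made for illustration only.

```python
from datetime import date

# Hypothetical sample documents in the shape described above.
docs = [
    {"date": date(2013, 7, 17), "project": "A", "text": "mission to mars"},
    {"date": date(2013, 7, 18), "project": "A", "text": "launch window"},
    {"date": date(2013, 7, 18), "project": "B", "text": "mars and venus"},
    {"date": date(2013, 7, 25), "project": "C", "text": "mars rover"},
]


def projects_matching(docs, start, end, must, must_not):
    """Return projects whose concatenated in-range text satisfies the word rules."""
    # Step 1: filter by date range (inclusive on both ends).
    in_range = [d for d in docs if start <= d["date"] <= end]

    # Step 2: group by project and collect the texts of each group.
    texts_by_project = {}
    for d in in_range:
        texts_by_project.setdefault(d["project"], []).append(d["text"])

    # Step 3: naive word match against the concatenated text of each project.
    result = []
    for project, texts in texts_by_project.items():
        words = set(" ".join(texts).lower().split())
        if all(w in words for w in must) and not any(w in words for w in must_not):
            result.append(project)
    return sorted(result)


matches = projects_matching(
    docs, date(2013, 7, 17), date(2013, 7, 20), must=["mars"], must_not=["venus"]
)
print(matches)
```

Project B is excluded because "venus" appears somewhere in its combined text, and project C is excluded because its only document falls outside the date range.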
So far I have found that some of these steps can be expressed like this:
1. &fq=date:[2013-07-17T00:00:00Z TO 2013-07-20T00:00:00Z]
2. -
3. &q=+text:mars -text:venus
I don't know how to do step 2, or how to do step 3 so that it's applied to the concatenated texts (at the end). I found that there's a grouping feature, but I don't know how to concatenate the text in each group so that I get a single entry per group to apply step 3 to.
Is it possible to do such a query in Solr? If so, how should it be done properly? If not, is it possible to do it effectively with something other than Solr?
Thanks for help.

Related

How to search documents in within date range but based on text format info in Solr?

All,
I wonder if there is a way to solve this problem:
I have a lot of Solr documents with a pub_date field, but unfortunately it is stored as text, like "20180901". If I want to search by pub_date within a range, how should I do that in a Solr query?
Thanks,
Assuming that all your values have the same format, i.e. 20180901 mapping to YYYYMMDD, you can use a regular range search. The lexicographic ordering of the strings is the same as the chronological ordering of the dates.
pub_date:[20180820 TO 20180901]
will give you any entries between (and including) those two dates. This assumes that no other formats are used that would cause other entries to sort between those two values.
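The ordering claim can be checked outside Solr. The following is a small sketch in Python, with made-up `pub_date` values, that compares plain string comparison against real date comparison for the same inclusive range; the helper names are hypothetical.

```python
from datetime import date

# Hypothetical pub_date values stored as YYYYMMDD strings.
pub_dates = ["20180815", "20180820", "20180825", "20180901", "20180902"]


def in_string_range(value, low, high):
    # Plain string comparison, inclusive on both ends like [low TO high].
    return low <= value <= high


def to_date(text):
    # Parse a YYYYMMDD string into a real date.
    return date(int(text[:4]), int(text[4:6]), int(text[6:8]))


def in_date_range(value, low, high):
    # The same range check, done on parsed dates.
    return to_date(low) <= to_date(value) <= to_date(high)


by_string = [d for d in pub_dates if in_string_range(d, "20180820", "20180901")]
by_date = [d for d in pub_dates if in_date_range(d, "20180820", "20180901")]
print(by_string)
```

Both filters select the same values here because every string has the same fixed-width, zero-padded layout. A value in another format (for example "2018-9-1") would break that equivalence.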

Multiple Full Text Search SQL Queries Merged and Scored (Ranked Search Results)

I have a bunch of articles in one table that I'd like to query for search results. Using Full Text Search I can return a list of items that have the search keywords "near" each other.
Full text search does not seem to allow thesaurus (FORMSOF) with the NEAR delimiter.
What I'd like to do, in SQL, is create a query, or a number of queries, that search the same data in different ways and each return a score (or RANK if using Full Text Search). I would then like to merge these results so there are no duplicates and total up the ranks/scores, so that I can ORDER BY those totals.
In addition, I would like to search a separate link table of "tags" that the documents have been assigned, and give extra score to documents with corresponding tags.
What is the best practice way of fulfilling these requirements?
Full-text search can run a query like ('"word*" near "another*"') in a CONTAINSTABLE statement. The asterisk matches any words starting with 'word' and 'another', so this finds such words near each other, with ranking.
On the other hand, you can run a FORMSOF(Thesaurus, word) AND FORMSOF(Thesaurus, another) search with a CONTAINSTABLE statement.
Then MERGE the results and use ORDER BY to sort by both given RANKs.
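The merge-and-total step can be illustrated independently of SQL. This is only a sketch in Python under assumed data: the `(article_id, rank)` rows, the tag bonus values, and the function name `merge_ranked` are hypothetical and stand in for the rows each ranked query would return.

```python
from collections import defaultdict

# Hypothetical (article_id, rank) rows returned by each search.
near_results = [(1, 40), (2, 25), (3, 10)]
thesaurus_results = [(2, 30), (4, 15)]
# Hypothetical extra score for articles with a matching tag.
tag_bonus = [(3, 20)]


def merge_ranked(*result_sets):
    """Sum the ranks per article across all result sets, highest total first."""
    totals = defaultdict(int)
    for rows in result_sets:
        for article_id, rank in rows:
            totals[article_id] += rank
    # Sort by total rank descending, then by id for a stable order on ties.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


merged = merge_ranked(near_results, thesaurus_results, tag_bonus)
print(merged)
```

Each article appears once in the output, and an article found by more than one search accumulates the ranks from all of them.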

Solr - How do I get the number of documents for each field containing the search term within that field in Solr?

Imagine an index like the following:
id  partno    name          description
1   1000.001  Apple iPod    iPod by Apple
2   1000.123  Apple iPhone  The iPhone
When the user searches for "Apple" both documents would be returned. Now I'd like to give the user the possibility to narrow down the results by limiting the search to one or more fields that have documents containing the term "Apple" within those fields.
So, ideally, the user would see something like this in the filter section of the UI after their first query:
Filter by field
name (2)
description (1)
When the user applies the filter for field "description", only documents which contain the term "Apple" within the field "description" would be returned. So the result set of that second request would be the iPod document only. For that I'd use a query like ?q=Apple&qf=description (I'm using the Extended DisMax Query Parser)
How can I accomplish that with Solr?
I already experimented with faceting, grouping and highlighting components, but did not really come to a decent solution to this.
[Update]
Just to make that clear again: The main problem here is to get the information needed for displaying the "Filter by field" section. This includes the names of the fields and the hits per field. Sending a second request with one of those filters applied already works.
Solr just plain Doesn't Do This. If you absolutely need it, I'd try the multiple-requests solution and benchmark it. Solr tends to be a lot faster than what people put in front of it, so a few extra requests might not be that big of a deal.
You could achieve this with two different search requests/queries:
name:apple -> 2 hits
description:apple -> 1 hit
EDIT:
You could also implement your own SearchComponent that executes multiple queries in the background and put it in the SearchHandler processing chain, so that you only need a single query from the frontend.
If you want the term to be searched over the same fields every time, you have 2 options that don't break the "single query" requirement:
1) copyField: at index time, you group all the fields that should match together. With just one copyField your problem doesn't exist; if you need more than one, you're back in the same spot.
2) You could filter the query each time by dynamically adding the "fq" parameter at the end:
http://<your_url_and_stuff>/?q=Apple&fq=name:Apple ...
This works if you'll always be searching on the same two fields (or you can set them up before querying); otherwise you'll always need at least a second query.
Since I said "you have 2 options" but you actually have 3 (and I rushed my answer), here's the third:
3) The DisMax plugin, which its documentation describes like this:
The DisMaxQParserPlugin is designed to process simple user entered phrases
(without heavy syntax) and search for the individual words across several fields
using different weighting (boosts) based on the significance of each field.
So, if you can use it, you may want to take a look at it and start with the qf parameter (which is what option 2 was meant to be about, but I changed it in favor of fq... don't ask me why...).
SolrFaceting should solve your problem.
Have a look at the Examples.
This can be achieved with Solr faceting, but it's not neat. For example, I can issue this query:
/select?q=*:*&rows=0&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
to find the number of documents containing donkey in the title and text fields. I may get this response:
{
  "responseHeader":{"status":0,"QTime":1,"params":{"facet":"true","facet.query":["title:donkey","text:donkey"],"q":"*:*","wt":"json","rows":"0"}},
  "response":{"numFound":3365840,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{
      "title:donkey":127,
      "text:donkey":4108
    },
    "facet_fields":{},
    "facet_dates":{},
    "facet_ranges":{}
  }
}
Since you also want the documents back for the field-disjunctive query, something like the following works:
/select?q=donkey&defType=edismax&qf=text+title&rows=10&facet=true&facet.query=title:donkey&facet.query=text:donkey&wt=json
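For a client that builds this request for arbitrary fields, a minimal sketch in Python might look like the following. The helper names `field_count_params` and `counts_per_field` are hypothetical, the sample response is trimmed to the part shown above, and the term is not escaped for Solr's query syntax, which real code would need to handle.

```python
import json
from urllib.parse import urlencode


def field_count_params(term, fields):
    """Build a query string with one facet.query per field."""
    params = [
        ("q", term),
        ("defType", "edismax"),
        ("qf", " ".join(fields)),
        ("rows", "10"),
        ("facet", "true"),
        ("wt", "json"),
    ]
    # Repeating the facet.query key yields one count per field.
    params += [("facet.query", f"{field}:{term}") for field in fields]
    return urlencode(params)


def counts_per_field(response_text):
    """Extract the facet.query counts from a JSON response body."""
    data = json.loads(response_text)
    return data["facet_counts"]["facet_queries"]


query_string = field_count_params("donkey", ["title", "text"])
print(query_string)

# Trimmed sample response using the counts shown above.
sample = '{"facet_counts": {"facet_queries": {"title:donkey": 127, "text:donkey": 4108}}}'
counts = counts_per_field(sample)
print(counts)
```

The keys of the returned mapping are the facet queries themselves, so the "Filter by field" labels can be derived by splitting each key at the first colon.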

How to return column that matched the query in Solr..?

I am using Apache Solr to search my database.
Suppose I have indexed 4 columns from one of my tables. I want only those columns that contain my query term to be returned in the response. Is that possible?
For example:
I have a table cars with columns: name, displayName, description, extra.
Now I make a query, something like:
localhost:8983/solr/select?q=maruti&wt=json
Now, in some rows only name may contain the word "maruti".
So, in return, I want only name (along with some other fixed fields like ID).
Similarly, if description contains this word, then only description should be returned, and not the other columns.
How can I achieve this?
You may be able to do this with Solr 4 and a custom transformer - my reading of the documentation would seem to indicate as much. But it would be quite a bit of work, I think. Ultimately you may have to write a front-end filter, but that would be difficult with complex queries.
Update:
Here's how to do this in Solr without custom transformers, etc. Enable highlighting for all four columns:
hl=on&hl.fl=name,displayName,description,extra
Solr will return a "highlighting" structure containing the key and the field(s) that match the query. You will also get highlighted snippets; whether you use them is up to you. See here for additional params: http://wiki.apache.org/solr/HighlightingParameters
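On the client side, the matching fields can then be read off that structure. Below is a rough sketch in Python; the sample response is hypothetical and trimmed to the relevant parts, and `matched_fields` is a made-up helper name.

```python
import json

# Hypothetical JSON response, trimmed to the parts that matter here.
sample_response = json.dumps({
    "response": {"docs": [{"id": "1"}, {"id": "2"}]},
    "highlighting": {
        "1": {"name": ["<em>Maruti</em> Swift"]},
        "2": {"description": ["A compact car by <em>Maruti</em>"], "extra": []},
    },
})


def matched_fields(response_text):
    """Map each document key to the fields that produced a highlight snippet."""
    data = json.loads(response_text)
    highlighting = data.get("highlighting", {})
    return {
        doc_id: sorted(field for field, snippets in fields.items() if snippets)
        for doc_id, fields in highlighting.items()
    }


matches = matched_fields(sample_response)
print(matches)
```

Fields with an empty snippet list are skipped defensively, so only fields that actually produced a highlight are reported for each document.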

Solr: Query and return x number of types

I have a large index of files. One of the fields I have is "content_type". This field stores the basic type for a file (e.g. pdf, image, video, document, spreadsheet, etc).
I'm running a search on file names (my "title" field). How can I structure the query so that it returns only a certain number of each type?
For example, say I have 1000 files with the word "work" in the title. I want to search for "work" in the title, but I want 5 results from each "content_type" returned first (assuming that each specific content_type has 5 or more items). So on my search results page I can say:
1,000 items were found for "work"
Then I start listing the items, 5 for each type.
Can anyone help me build a query that will do this? I'm pretty new to Solr, but I'm hoping this can be done.
It seems that you basically want to limit and group the results per content type.
Check out Solr's Field Collapsing and grouping feature.
It lets you group the results per content type using group.field=content_type.
The number of results in each group can be limited with group.limit=5.
For the complete list of options, refer to the link above.
You can use the normal query parameters to search the results, e.g. q=work.
This feature is only available from Solr 3.3 onward.
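Put together, the request and the handling of the grouped response could be sketched like this in Python. The sample response data and the helper name `docs_by_type` are assumptions for illustration; note that `group=true` is also needed to switch grouping on.

```python
from urllib.parse import urlencode

# Parameters from the answer above, plus group=true to enable grouping.
params = urlencode({
    "q": "title:work",
    "group": "true",
    "group.field": "content_type",
    "group.limit": 5,
    "wt": "json",
})
print(params)

# Hypothetical grouped response, trimmed to two groups with one document each.
sample = {
    "grouped": {
        "content_type": {
            "matches": 1000,
            "groups": [
                {"groupValue": "pdf",
                 "doclist": {"numFound": 400, "start": 0,
                             "docs": [{"title": "work plan"}]}},
                {"groupValue": "image",
                 "doclist": {"numFound": 250, "start": 0,
                             "docs": [{"title": "work site"}]}},
            ],
        }
    }
}


def docs_by_type(data, field="content_type"):
    """Return the total match count and the documents of each group."""
    grouped = data["grouped"][field]
    per_type = {g["groupValue"]: g["doclist"]["docs"] for g in grouped["groups"]}
    return grouped["matches"], per_type


total, per_type = docs_by_type(sample)
print(total, list(per_type))
```

The `matches` value gives the overall hit count for the "1,000 items were found" line, while each group carries at most `group.limit` documents. In grouped mode, `rows` caps the number of groups returned, so it may need raising if there are many content types.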
