I am building an application with Solr that should give users the first N results of a search, using cursorMark to paginate through R rows at a time.
The problem is that in the client/server relationship, the client knows both the page number and the cursorMark, but the server is only told the cursorMark, and it isn't safe to trust a page number sent by the client.
Is there any way to determine the offset of a given cursorMark server-side without also storing a list of page number + cursorMark combinations for every search?
For example, I'd like to be able to reject a request whose cursorMark would take the results past 10,000 for a given search.
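The only stateless idea I've come up with so far is to stop handing the raw cursorMark to the client and instead wrap it, together with the offset the server already knows, in a signed token, so the offset can be trusted without storing anything per search. A rough sketch of what I mean (the token scheme is my own invention, not anything Solr provides):

import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # hypothetical key, never sent to the client

def issue_token(cursor_mark: str, offset: int) -> str:
    # Wrap the cursorMark and the offset it corresponds to in an HMAC-signed token.
    payload = base64.urlsafe_b64encode(
        json.dumps({"cursorMark": cursor_mark, "offset": offset}).encode()
    ).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig

def check_token(token: str, max_offset: int = 10000) -> dict:
    # Verify the signature, then enforce the deep-paging limit.
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered pagination token")
    state = json.loads(base64.urlsafe_b64decode(payload))
    if state["offset"] > max_offset:
        raise ValueError("request rejected: past the 10,000-result limit")
    return state  # contains the trusted cursorMark and offset

I'd rather avoid even this if Solr can derive the offset from the cursorMark itself.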
We have a customer web application which uses the Apache Solr APIs internally. We do not have access to the Solr UI at the customer site.
Due to recent changes in Solr at the customer's end, whenever we search the data like
NAME:PRANAV AND AGE:1, Solr does not return any results (shows numFound:0),
whereas when we search
NAME=PRANAV AND AGE:1, it returns results (shows a numFound value greater than 0).
So string searches work with = and numeric searches work with :.
But when we search
NAME=PRA*, we do not get any results in Solr (shows numFound:0).
Can someone please advise what should be changed on the Solr side to correct the searches?
We want wildcard searches (*, ?) to work, and string searches should work with : instead of =.
I'm looking for a solution to my very long query strings returning a 414 HTTP response. Some queries can reach up to 10,000 characters. I could change how many characters Apache/Jetty allows, but I'd rather not let anyone send 10,000 characters to my web server.
Is there a way in Solr to save a large query string in a document and use it in a filter query?
select?q=*:*&fq=id:123 would return a whole document, but is there a way to use the value of a field from document 123 in the query?
The field queryValue in the document with id 123 would contain Intersects((LONGSTRING)).
So is there a way to do something like select?q=*:*&fq=foo:{id:123.queryValue},
which would be the same as select?q=*:*&fq=foo:Intersects((LONGSTRING))?
Two possibilities:
Joining
You can use the Join query parser to fetch the result from one collection / core and use that to filter results in a different core (see the sketch below), but there are several limitations that become relevant with larger installations and data sizes. You'll have to experiment to see whether it works for your use case.
The Join Query Parser
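For reference, a cross-core join filter looks something like this (the core and field names are placeholders, not from your setup):

fq={!join from=foo_id to=id fromIndex=othercore}id:123

This selects the documents in othercore that match id:123, collects their foo_id values, and keeps the documents in the current core whose id matches one of those values. Note that a join matches on field values; it does not execute a query string stored inside a field.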
Hashing
As long as you're only doing exact matches, hash the string on the client side both when indexing and when querying. Exactly how you do this depends on your language of choice. In Python you'd get the hash of the long string using hashlib; with sha256 the resulting string you index and query on is 64 bytes in hex form, or 44 in base64.
Example:
>>> import hashlib
>>> hashlib.sha256(b"long_query_string_here").hexdigest()
'19c9288c069c47667e2b33767c3973aefde5a2b52d477e183bb54b9330253f1e'
You would then store the 19c92... value in Solr, and apply the same transformation to the value you're querying for.
fq=hashed_id:19c9288c069c47667e2b33767c3973aefde5a2b52d477e183bb54b9330253f1e
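Putting the two halves together, a minimal sketch using pysolr (the core name and field names are assumptions):

import hashlib

import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/mycore", timeout=10)

def digest(value: str) -> str:
    # The same transformation must run at index time and at query time.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

long_query_string = "Intersects((LONGSTRING))"  # stands in for the real 10,000-char value

# Index time: store the hash alongside the document.
solr.add([{"id": "123", "hashed_id": digest(long_query_string)}], commit=True)

# Query time: hash the incoming value and filter on it; the fq stays 64 chars.
results = solr.search("*:*", fq="hashed_id:" + digest(long_query_string))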
There might be alternatives worth trying before implementing the literal solution you seek:
You can POST the query to Solr instead of using GET; there is no URL length limit on that (see the sketch after this list).
If you are sending a long list of IDs in an OR construct, there are alternative query parsers that handle this more efficiently (e.g. TermsQueryParser).
If you have constant (or semi-constant) query parameters, you could factor them out into defaults on request handlers (in solrconfig.xml). You can create as many request handlers as you want, and defaults can be overridden, so this effectively allows you to pre-define classes/types of queries.
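A minimal sketch combining the first two points, using the requests library (the collection URL and field name are assumptions):

import requests

ids = ["1", "2", "3"]  # imagine thousands of these

# Form-encoded POST body, so the URL stays short no matter how long the
# parameters get; {!terms f=id} replaces a long "id:1 OR id:2 OR ..." chain.
resp = requests.post(
    "http://localhost:8983/solr/mycore/select",
    data={
        "q": "*:*",
        "fq": "{!terms f=id}" + ",".join(ids),
        "wt": "json",
    },
)
print(resp.json()["response"]["numFound"])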
I have read extensively about Solr, and it gives me the ability to find termfreq, i.e. the number of times a given term appears in a document. But I need to know the total number of terms that have been indexed in a particular document. The query I am trying is
/solr/live/select?qt=albumsearch&q=pak%20pak&fl=%2Cscore&wt=json&indent=true&defType=edismax&q.alt=as&qf=a%5E10+l%5E10&bf=12234&boost=termfreq(song,.)
Any help will be appreciated.
You can use either the Luke Request Handler with a docId parameter, or the Stats Component with a query / fq that selects the document you're interested in.
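For example, with the live core from your query (the docId value and [yourfield] are placeholders):

/solr/live/admin/luke?docId=1234

/solr/live/select?q=*:*&fq=id:1234&stats=true&stats.field=[yourfield]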
I am using LucidWorks and Solr to implement search in a large, diverse web app which has many different types of pages. The spec calls for a single search results page grouped by page type, with pagination of search results within each group.
I can group easily enough with something like this
q=[searchterm]&group=true&group.field=[pagetypefield]
which returns nicely grouped results.
I can also do:
q=[searchterm]&group=true&group.field=[pagetypefield]&group.offset=[x]&group.limit=[y]
which will get me y results per group starting at result x.
However, what I want to be able to do is supply an offset and limit per group, because I might want to get results 0-4 for group 1 and results 5-9 for group 2.
The values for [pagetypefield] are a list of known values, so I could run multiple queries like:
q=[searchterm]&group=true&group.query=[pagetypefield]:[value]&group.offset=[x]&group.limit=[y]
one for each known value of [pagetypefield],
or skip group.offset, get results 0-9 for both groups in my example, and just discard the results I don't need.
I don't really like either option, but I can't find a way in the documentation to specify offset and limit on a per-group basis.
Any advice would be most appreciated.
I have confirmation from LucidWorks that what I want to do is not possible. They recommended the multiple-query solution, since the first search will be cached, so the subsequent searches will be really fast.
What I think I'm going to end up doing is grouping the search results, taking the first n results for each group, then using Ajax to paginate each group.
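For the Ajax pagination within a single group, a plain filtered query per page type should be enough, since start/rows provide exactly the per-group offset and limit that grouping lacks (same placeholder notation as above):

q=[searchterm]&fq=[pagetypefield]:[value]&start=[x]&rows=[y]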
Hi,
I am using the GAE Search API, and it seems to be a really great feature which adds vital functionality that standard datastore queries lack. But I have faced a problem implementing standard pagination, namely getting the total number of documents matching a query. Certainly, I can implement a list with a "show more" button using a Cursor, but it would also be great to be able to obtain the total count.
Any ideas on how to do this?
Thank you very much in advance!
Step 1: set your accuracy
QueryOptions options = QueryOptions.newBuilder()
        // ...set other options here
        .setNumberFoundAccuracy(1000)
        .build();
Sets the accuracy requirement for Results.getNumberFound(). If set,
getNumberFound() will be accurate up to at least that number. For
example, when set to 100, any getNumberFound() <= 100 is accurate.
This option may add considerable latency / expense, especially when
used with setFieldsToReturn(String...).
Step 2: run the query
Query query = Query.newBuilder().setOptions(options).build(queryString);
Results<ScoredDocument> results = getIndex().search(query);
Step 3: call getNumberFound()
results.getNumberFound();
The number of results found by the search. If the value is less than or equal to the corresponding QueryOptions.getNumberFoundAccuracy(), then it is accurate; otherwise it is an approximation.