I am working with DataStax Cassandra and Apache Solr to support multiple partial searches.
The issue is that every query returns only 10 rows, even though a count(*) query shows that 1300 rows match the same query.
nandan#cqlsh:testo> select id from empo where solr_query = 'isd:9*';
id
--------------------------------------
5ee5fca6-6f48-11e6-8b77-86f30ca893d3
27e3e3bc-6f48-11e6-8b77-86f30ca893d3
f3156e76-6f47-11e6-8b77-86f30ca893d3
f315ac74-6f47-11e6-8b77-86f30ca893d3
f315bc82-6f47-11e6-8b77-86f30ca893d3
27e3058c-6f48-11e6-8b77-86f30ca893d3
4016eee4-6f47-11e6-8b77-86f30ca893d3
1bd33e34-6f47-11e6-8b77-86f30ca893d3
8f0a9168-6f47-11e6-8b77-86f30ca893d3
6669cc42-6f47-11e6-8b77-86f30ca893d3
(10 rows)
After reading a few links, I made the following changes to the solrconfig.xml file:
<requestHandler class="solr.SearchHandler" default="true" name="search">
<!-- default values for query parameters can be specified, these
will be overridden by parameters in the request
-->
<lst name="defaults">
<int name="rows">1000000</int>
</lst>
</requestHandler>
<!-- SearchHandler for CQL Solr queries:
this handler doesn't support any additional components, only default parameters
-->
<requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query">
<lst name="defaults">
<int name="rows">1000000</int>
</lst>
</requestHandler>
But I still get the same result. Please let me know what the solution is.
Thanks.
I don't think this should be managed from the schema. The query takes a rows and a start parameter; use those. rows sets the maximum number of items to return, and start sets the offset of the first item to return:
q=isd:9*&rows=22&start=17&wt=json
isd:9* returns all items where isd starts with 9.
start=17 says begin at the 18th item in the list.
rows=22 returns 22 items, the 18th through the 39th.
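To make the start/rows arithmetic concrete, here is a minimal Python sketch. The helper names page_params and page_bounds are made up for illustration and are not part of any Solr client; the sketch only builds the parameter string shown above and works out which items a page covers.

```python
from urllib.parse import urlencode


def page_params(query, start, rows):
    """Build the query-string parameters for one page of results.

    safe=":*" only keeps ':' and '*' unescaped so the output is easy to read.
    """
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    return urlencode(params, safe=":*")


def page_bounds(start, rows):
    """Return the 1-based positions of the first and last item on the page.

    start is a 0-based offset, so the first item is start + 1 and the
    last one is start + rows.
    """
    return start + 1, start + rows


print(page_params("isd:9*", start=17, rows=22))
print(page_bounds(17, 22))
```

To walk through a large result set, you would keep rows fixed and increase start by rows on each request.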
Try this:
select id from empo where solr_query = 'isd:9*' limit 1300;
This returns all 1300 rows. By default, Solr limits the number of rows it returns to 10.
Is it possible to tell Solr to use a specific filter value if no other filter is defined for that field?
Example:
If there is no other fq entry present for a field age then search by default for age > 18.
Yes, you can add these to the requestHandler definition:
<lst name="defaults">
<str name="fq">age:[18 TO *]</str>
</lst>
(or if you really meant larger than 18 and not 18 or older, {18 TO *] or [19 TO *]).
You can also use appends and invariants instead of defaults to add a filter query to all queries or to set a parameter to a static value that a URL parameter can't override.
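As a rough mental model of how the three sections differ, here is a small Python sketch. It is not Solr's code; effective_params is a hypothetical function that mimics the documented behaviour: defaults apply only when the request does not send that parameter, appends are added to whatever the request sends, and invariants replace it. Note that under this model a default fq is dropped as soon as the request carries any fq at all, not only one for the same field, so it is worth testing whether defaults really match the "no other filter for that field" requirement.

```python
def effective_params(request, defaults=None, appends=None, invariants=None):
    """Model how handler-level parameter sections combine with a request.

    Every argument maps a parameter name to a list of values.
    """
    merged = {}
    # defaults: used only for parameters the request does not send
    for name, values in (defaults or {}).items():
        merged[name] = list(values)
    for name, values in request.items():
        merged[name] = list(values)
    # appends: added on top of whatever is already there
    for name, values in (appends or {}).items():
        merged[name] = merged.get(name, []) + list(values)
    # invariants: always win over the request
    for name, values in (invariants or {}).items():
        merged[name] = list(values)
    return merged


age_filter = {"fq": ["age:[18 TO *]"]}

# No fq in the request: the default filter applies.
print(effective_params({"q": ["*:*"]}, defaults=age_filter))
# Any fq in the request replaces the default one.
print(effective_params({"fq": ["city:Oslo"]}, defaults=age_filter))
# With appends, both filters are active.
print(effective_params({"fq": ["city:Oslo"]}, appends=age_filter))
```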
I have reviewed the Solr documentation. According to it, Solr returns 100 records for facets by default. What if I want to fetch more records? How can I change the default number of facet records?
I do not want to set it in the Solr query. Is there any way to change the default value?
In the requestHandler with name="/select" you can set any value you want. If you are using a different requestHandler, set it in that requestHandler instead.
<lst name="defaults">
....
<str name="facet.limit">1000</str>
....
</lst>
I have a situation where all my queries have some sub filter queries which are added each time and are very long.
The query filters are the same each time so it is a waste of time sending them over and over to Solr server and parsing them on the other side just to find them in the cache.
Is there a way I can send filter query definition once to the Solr server and then reference it in following queries?
You can add a static configuration directive in your solr config (solrconfig.xml):
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="appends">
<str name="fq">foo:value</str>
</lst>
</requestHandler>
This always appends an fq= term to the request parameters before the SearchHandler processes the query. Other options are invariants or defaults. See Request Handlers and Search Handlers on the community wiki for more information.
I am trying to count issues 1 to 5 with this range facet query:
...&facet.range=issue&facet.range.start=1&q=magid:abc&facet.range.end=5&facet.range.gap=1
It returns:
<lst name="issue">
<lst name="counts">
<int name="1">5</int>
<int name="2">7</int>
<int name="3">9</int>
<int name="4">7</int>
</lst>
There is no issue 5? Also, the count for issue 1 should be 3; 5 is the count for issue 2. (Then I thought, "It can't be the array-index-starts-at-0 problem, can it?") I changed facet.range.start to 0 and ran the query again. This time it returns:
<lst name="issue">
<lst name="counts">
<int name="0">3</int>
<int name="1">5</int>
<int name="2">7</int>
<int name="3">9</int>
<int name="4">7</int>
</lst>
Oh my! It should be issues 1~5, but I get 0~4 instead. Why is Solr doing this? It is really confusing me!
I am sure that these are not 0-based index values. The values you see are the actual values that were indexed as tokens, so if you indexed values from 1 to 5, you should see values from 1 to 5.
So, if you want to check whether you have documents with the value 5, the best way to debug this is from Schema Browser -> Term info.
Go to the Solr Admin interface, select the core, click on Schema Browser, choose the field whose term info you want to see, then click on Load term info.
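One more thing worth checking, as far as I know: in a range facet each label is the lower bound of a bucket, and by default the lower bound is inclusive while the upper bound is exclusive. With start=1, end=5 and gap=1 that gives the buckets [1,2), [2,3), [3,4) and [4,5), so a value of 5 is never counted unless facet.range.end is raised to 6. The Python sketch below only illustrates that bucketing rule; facet_range_counts and the sample values are made up for the example and are not Solr code or your data.

```python
def facet_range_counts(values, start, end, gap):
    """Count integer values per bucket; lower bound inclusive, upper exclusive."""
    # One bucket per gap step, labelled by its lower bound.
    counts = {lower: 0 for lower in range(start, end, gap)}
    for value in values:
        if start <= value < end:
            label = start + ((value - start) // gap) * gap
            counts[label] += 1
    return counts


issues = [1, 2, 2, 3, 4, 5, 5]  # hypothetical sample values

print(facet_range_counts(issues, start=1, end=5, gap=1))  # no bucket for 5
print(facet_range_counts(issues, start=1, end=6, gap=1))  # bucket 5 appears
```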
I recently started playing around with Apache Solr and currently trying to figure out the best way to benchmark the indexing of a corpus of XML documents. I am basically interested in the throughput (documents indexed/second) and index size on disk.
I am doing all this on Ubuntu.
Benchmarking Technique
* Run the following 5 times and take the average total time taken *
Index documents [curl http://localhost:8983/solr/core/dataimport?command=full-import]
Get 'Time taken' name attribute from XML response when status is 'idle' [curl http://localhost:8983/solr/core/dataimport]
Get size of 'data/index' directory
Delete Index [curl http://localhost:8983/solr/core/update --data '<delete><query>*:*</query></delete>' -H 'Content-type:text/xml; charset=utf-8']
Commit [curl http://localhost:8983/solr/w5/update --data '<commit/>' -H 'Content-type:text/xml; charset=utf-8']
Re-index documents
Questions
I intend to calculate my throughput by dividing the number of documents indexed by average total time taken; is this fine?
Are there tools (like SolrMeter for query benchmarking) or standard scripts already available that I could use to achieve my objectives? I do not want to re-invent the wheel...
Is my approach fine?
Is there an easier way of getting the index size as opposed to performing a 'du' on the data/index/ directory?
Where can I find information on how to interpret the XML response attributes (see sample output below)? For instance, I would want to know the difference between the QTime and Time taken values.
* XML Response Used to Get Throughput *
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">w5-data-config.xml</str>
</lst>
</lst>
<str name="status">idle</str>
<str name="importResponse"/>
<lst name="statusMessages">
<str name="Total Requests made to DataSource">0</str>
<str name="Total Rows Fetched">3200</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-12-11 14:06:19</str>
<str name="">Indexing completed. Added/Updated: 1600 documents. Deleted 0 documents.</str>
<str name="Total Documents Processed">1600</str>
<str name="Time taken">0:0:10.233</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
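For illustration, the throughput calculation from question 1 could be scripted roughly like this in Python. This is only a sketch: it assumes the status response keeps the shape shown above (which, as the WARNING says, may change), SAMPLE is a trimmed copy of that response, and the function names are made up.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the status response, kept inline so the sketch is self-contained.
SAMPLE = """<response>
<str name="status">idle</str>
<lst name="statusMessages">
<str name="Total Documents Processed">1600</str>
<str name="Time taken">0:0:10.233</str>
</lst>
</response>"""


def parse_time_taken(text):
    """Convert a 'Time taken' value such as '0:0:10.233' to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def throughput(status_xml):
    """Return documents per second from a data import status response."""
    root = ET.fromstring(status_xml)
    entries = root.findall("./lst[@name='statusMessages']/str")
    messages = {entry.get("name"): entry.text for entry in entries}
    documents = int(messages["Total Documents Processed"])
    return documents / parse_time_taken(messages["Time taken"])


print(round(throughput(SAMPLE), 1), "documents/second")
```

Averaging over the 5 runs would then just be the mean of the per-run values, or the total documents divided by the total time.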
To question 1:
I would suggest indexing more than one XML file (with different datasets) and comparing the results. That way you will know whether it is OK to simply divide the number of documents by the time taken.
To question 2:
I didn't find any such tools; I did it on my own by writing a short Java application.
To question 3:
Which approach do you mean? I would refer you to my answer to question 1...
To question 4:
The size of the index folder gives you the correct size of the whole index. Why don't you want to use it?
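If you want the folder size inside a benchmark script rather than from 'du', a small Python sketch like the one below would do. directory_size_bytes is a made-up helper name, and the result is the sum of file sizes, which can differ slightly from what 'du' reports because 'du' counts disk blocks.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files under path, including subfolders."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total
```

You would call it with the path of your core's data/index directory after each import, for example directory_size_bytes("data/index") when run from the core's folder.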
To question 5:
The results you get in the posted XML are transformed through an XSL file. You can find it in the /bin/solr/conf/xslt folder. There you can look up what the terms mean exactly, and you can write your own XSL to display the results and information.
Note: If you create a new XSL file, you have to change the settings in your solrconfig.xml. If you don't want to make any changes, edit the existing file.
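edit: As far as I know, QTime is the time in milliseconds that Solr took to process the request itself (here, the status request), while Time taken is the duration of the import run. That would explain why the sample shows a QTime of 0 next to a Time taken of 0:0:10.233.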
Best regards