SOLR query runs OK only on 1 node out of the 3 nodes with SOLR installed - solr

Here's our problem. We're running DSE Enterprise 4.8 with the following configuration:
6 servers with Cassandra
on 3 of them Spark is installed (DSE Analytics)
on the other 3 SOLR is installed (DSE Search)
We want to do "query stats" with SOLR on a table using and excluding certain filter queries.
When we try to execute a simple query like this:
/select?q=*:*&wt=json&indent=true&fq={!tag=fq1}test:100&stats=true&stats.field={!ex=fq1}test&rows=1
it runs OK only on 1 node out of the 3 with SOLR installed.
For the other 2 nodes we get this exception:
{
"responseHeader":{
"status":400,
"QTime":2},
"error":{
"msg":"undefined field: \"{!ex=fq1}test\"",
"code":400},
"params":"q=*:*&indent=true&stats=true&fq={!tag%3Dfq1}test:100&rows=1&wt=json&stats.field={!ex%3Dfq1}test"}
Could you help us identify the reason for the "undefined field" exception when excluding filter queries inside the stats.field parameter?
It would also help us to compute only a subset of the stats functions (for example, only count):
stats.field={!count=true}test
But parameters of this kind seem to be ignored, and the whole set of stats functions is computed.
Many Thanks
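Since the same request behaves differently per node, it can help to send a byte-for-byte identical, properly encoded URL to each search node. The following is a minimal sketch that only builds those URLs. The host names, the core name, the port 8983 and the /solr/<core>/select path are assumptions for illustration, not values taken from the cluster described above.

```python
from urllib.parse import urlencode

# Hypothetical host names and core name; replace with the real DSE Search nodes.
SEARCH_NODES = ["search-node1", "search-node2", "search-node3"]
CORE = "keyspace.table"


def build_stats_url(host, core, port=8983):
    """Return the select URL for the tagged-filter stats query from the question."""
    params = [
        ("q", "*:*"),
        ("wt", "json"),
        ("indent", "true"),
        ("fq", "{!tag=fq1}test:100"),       # filter tagged as fq1
        ("stats", "true"),
        ("stats.field", "{!ex=fq1}test"),   # stats computed with fq1 excluded
        ("rows", "1"),
    ]
    # urlencode percent-encodes the local-params braces, "!" and "=".
    return f"http://{host}:{port}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    for node in SEARCH_NODES:
        print(build_stats_url(node, CORE))
```

Requesting each printed URL in turn and comparing the responses shows whether the difference is tied to a node rather than to how the request was typed.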

Related

Getting started with Cassandra Solr Search

I have an existing Datastax Cassandra cluster that I am just experimenting with currently. Cassandra itself was very easy to get going and is working very well. However, I honestly can't seem to figure out how to get Solr Searching working.
I am supposed to have a solrconfig.xml file; however, I don't seem to have one anywhere on the machine. Solr and Cassandra certainly appear to be installed correctly.
I tried a solr_query request which does not work. I tried it connected to a normal node and a Solr node with the same results.
test.user#cqlsh:Datafyer> select "Title" from "Table" where solr_query = 'title:test*';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Undefined name solr_query in where clause ('solr_query = 'title:test*'')"
I have indeed verified that on the search node SOLR_ENABLED=1.
And the node itself is part of the system as you can see below.
administrator#dse-search-qa01:/usr/share/dse$ nodetool ring
Datacenter: Analytics
==========
Address Rack Status State Load Owns Token
10.10.98.7 rack1 Up Normal 325.86 KB ? -7438423332917368512
Datacenter: Cassandra
==========
Address Rack Status State Load Owns Token
6175281243369380764
10.10.98.3 rack1 Up Normal 441.55 KB ? 4412916390327649050
10.10.98.5 rack1 Up Normal 442.44 KB ? 4563214312080485226
10.10.98.1 rack1 Up Normal 451.64 KB ? 6175281243369380764
Datacenter: Solr
==========
Address Rack Status State Load Owns Token
10.10.98.9 rack1 Up Normal 447.89 KB ? -8974470140210234803
It looks like you didn't create the indices for 'solr_query' to work. If you're just experimenting, you can simply run:
dsetool create_core <keyspace>.<table> generateResources=true reindex=true
(for more options, see: https://docs.datastax.com/en/datastax_enterprise/5.0/datastax_enterprise/tools/dsetool.html )
This will create the Solr config and schema XML files for you, as well as index the data already at rest. By default, this will index all columns and do auto-type detection to create the respective indices.
When you're ready to get more in depth with DSE Search, I recommend checking out this course: https://academy.datastax.com/resources/ds310-datastax-enterprise-search
Best,
Marc

Syntax for Solr Query with Cassandra Datastax integration

I'm trying to use the DataStax Cassandra/Solr integration to do a facet query with both pivot facets and interval facets.
My query looks like this:
select * from data where solr_query='{"facet":{"limit":5,"pivot":"event_type,key","interval":"past_visits","f.past_visits.facet.interval.set":["{!key=visit_13_month}[NOW-13MONTH/MONTH,NOW]","{!key=visit_1_month}[NOW-1MONTH/DAY,NOW]"]},"q":"*:*"}']
The error that I am getting back seems to show that the required parameter is not being set (but it is)
08:30:38.244 [New I/O worker #4] WARN c.d.driver.core.RequestHandler - /10.239.133.151:9042 replied with server error (Missing required parameter: f.past_visits.facet.interval.set (or default: facet.interval.set)), trying next host.
When I run an equivalent query directly against Solr (using query params), it works as expected.
/data/select?q=*:*&facet=true&facet.pivot=event_type,key&facet.limit=5&facet.interval=past_visits&f.past_visits.facet.interval.set=%7B!key=visit_13_month%7D[NOW-13MONTH/MONTH,NOW]&f.past_visits.facet.interval.set=%7B!key=visit_1_month%7D[NOW-1MONTH/DAY,NOW]"
I'm trying to follow the Datastax documentation at this link:
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchJSON.html
There seems to be something wrong with the way that I am creating the JSON for the Datastax Solr query, but I do not see what I should change.
We don't currently support facet intervals via CQL. Anyway, you don't need to repeat "facet", so it should eventually be something like the following:
select * from data where solr_query='{"facet":{"limit":5,"pivot":"event_type,key","interval":"past_visits","f.past_visits.interval.set":["{!key=visit_13_month}[NOW-13MONTH/MONTH,NOW]","{!key=visit_1_month}[NOW-1MONTH/DAY,NOW]"]},"q":"*:*"}'
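Hand-written JSON inside a CQL string literal is easy to get subtly wrong, so one option is to build it from a dictionary and serialize it. The sketch below does that for the pivot facet only, since the answer above states that interval facets are not supported via CQL. The helper name solr_query_cql is hypothetical, and the statement is only built, not executed.

```python
import json


def solr_query_cql(table, query):
    """Serialize `query` to JSON and embed it in a CQL solr_query predicate."""
    payload = json.dumps(query, separators=(",", ":"))
    # CQL string literals escape a single quote by doubling it.
    escaped = payload.replace("'", "''")
    return f"select * from {table} where solr_query='{escaped}'"


# Pivot facet only; keys mirror the JSON shown in the question.
query = {
    "q": "*:*",
    "facet": {"limit": 5, "pivot": "event_type,key"},
}

if __name__ == "__main__":
    print(solr_query_cql("data", query))
```

Building the JSON this way guarantees balanced braces and correct key/value separators; it does not validate that the server accepts each facet parameter.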

Solr Query Performance

We are using Solr 4.3 for search functionality. We have configured 2 shards and 2 replicas.
We have a total of 132905 Solr documents in the index.
Our search query is very long and takes around 3 seconds (from the Solr Admin Console). The query is below.
id:(FOLD5002861 FOLD5002890 FOLD5219963 FOLD4105003 FOLD4105005 FOLD4105006 FOLD4105007 FOLD4105008 FOLD4105009 FOLD4105010 FOLD4105011 FOLD4105012 FOLD4105013 FOLD4105014 FOLD4105018 FOLD4105019 FOLD4105020 FOLD4105021 FOLD4105022 FOLD4105023 FOLD4105024 FOLD4105025 FOLD4105026 FOLD4105027 FOLD5220166 FOLD5220168 FOLD5220169 FOLD5220170 FOLD5220171 FOLD5220172 FOLD5220173 FOLD5220174 FOLD5220175 FOLD5220176 FOLD5220177 FOLD5220178 FOLD5220179 FOLD5220180 FOLD5220181 FOLD4100876 FOLD4100877 FOLD4100878 FOLD4100879 FOLD4100880 FOLD4100881 FOLD4655426 FOLD4655428 FOLD4655429 FOLD4655430 FOLD4655431 FOLD4655432 FOLD4655433 FOLD4655434 FOLD4655435 FOLD4655436 FOLD4655437 FOLD4655438 FOLD4655439 FOLD4655483 FOLD4655487 FOLD4655523 FOLD4655874 FOLD4655884 FOLD4655856 FOLD4655858 FOLD4655859 FOLD4655860 FOLD4655861 FOLD4655862 FOLD4655863 FOLD4655864 FOLD4655865 FOLD4655866 FOLD4655867 FOLD4655868 FOLD4655869 FOLD4655870 FOLD4655871 FOLD4655872 FOLD4655882 FOLD4655892 FOLD4649510 FOLD4649512 FOLD4649513 FOLD4649514 FOLD4649515...50000 times)
We want to trace where the time is spent. We tried the debugQuery option in the Solr Admin Console but did not get useful information.
Is there any way to improve the query? How can we track detail timing?
If you transform this query into a filter query, it will give better performance most of the time, because the filter is applied on top of a query (that is, to a subset of your index, and it can be cached) and not to the entire index.
The best would be if you had another piece of the query that you could run with this, and then apply this as a filter on top of that; but if you don't have one, running a query for field:[* TO *] should still give better performance.
Have a look at this question, which explains the difference between a filter query and a normal query:
SOLR filter-query vs main-query
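The filter-query suggestion can be sketched as follows: the long ID list moves into fq, and the main query becomes a match-all. This is a minimal sketch, not a tuned setup. The helper name is hypothetical, the use of *:* as the main query is an assumption (the answer suggests field:[* TO *]), and the body is form-encoded on the assumption that a list this long is sent with POST rather than in the URL.

```python
from urllib.parse import urlencode


def build_filter_request(ids, field="id", main_query="*:*"):
    """Return a form-encoded body with the ID list as fq instead of q."""
    id_clause = " ".join(ids)
    params = [
        ("q", main_query),                 # cheap main query (assumption)
        ("fq", f"{field}:({id_clause})"),  # ID list applied as a cacheable filter
        ("debugQuery", "true"),            # same debug option the question mentions
        ("wt", "json"),
    ]
    return urlencode(params)


if __name__ == "__main__":
    body = build_filter_request(["FOLD5002861", "FOLD5002890", "FOLD5219963"])
    print(body)
```

The resulting string can be posted to the collection's select handler with the content type application/x-www-form-urlencoded.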
Take a look at some factors that affect Solr performance.
Try to optimize your index with:
curl http://<solr_host>:<port>/solr/<core_name>/update -F stream.body=' <optimize />'

Solr 4.3.1 error "you must pass fillFields=true to IndexSearcher.search" on using group.query

I'm using solr-4.3.1 on Ubuntu and start Solr over Jetty. I have a custom schema.xml and all fields of the query are in it. My collection "collection1" consists of 8 shards.
I try grouping data by some field and I use:
http://solr-node1:8983/solr/collection1/select/?q=*:*&group=true&group.field=rgn_str
Solr answers correctly and provides the results, but when I try to use group.query
http://solr-node1:8983/solr/collection1/select/?q=*:*&group=true&group.query=rgn_str:test
I get the error "shard 7 did not set sort field values (FieldDoc.fields is null); you must pass fillFields=true to IndexSearcher.search on each shard"
I could not find in the Solr documentation how to specify this parameter.
How do I do it?
To reproduce the problem, do the following:
Start node1 of SolrCloud (4.3.1, default configs) (java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -jar start.jar)
Import some data into collection1 -> shard1
Try group.query, e.g. node1:8983/solr/collection1/select?q=*:*&group=true&group.query=someField:someValue. It is important that the query hits indexed data.
The result: there is no error
Start node2 of SolrCloud (java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar)
On node2, add a new core for collection1 -> shard2 and unload the default core "collection1". We now have one collection over two shards: shard1 has data, shard2 has no data.
Try group.query again: node1:8983/solr/collection1/select?q=*:*&group=true&group.query=someField:someValue.
Error: shard 0 did not set sort field values (FieldDoc.fields is null); you must pass fillFields=true to IndexSearcher.search on each shard
"Wait, you can't do this. You're changing the number
of shards? Your original startup specified a single shard,
bringing up another node and calling it "shard2" isn't
consistent.
If you'd brought up a single shard while telling SolrCloud that
there were 2 shards, you shouldn't have been able to index
anything.
So what are you trying to do? Create your cluster with the
number of shards you intend it to have. Or split shards. Or
something, but just bringing up a second node and calling
it "shard2" isn't supported." - Erick Erickson.
Many thanks to him for that.

Solr 4 - query with both group.facet and group.func fails

When doing a search like
http://localhost:8983/solr/select?group=true&group.func=product(fieldname1,fieldname2)&group.facet=true&facet=true&facet.field=fieldname3
an error is returned in response where facets are normally returned:
java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:358) ...
The function used can be any function, not product only. There is no such error if group.facet is omitted or group.field is used instead of group.func. It seems that group.field parameter is expected to be defined when calculating grouped facets.
The question: is there another way to use both query functions, or an appropriate work-around, or a tip on where in Solr source to look into this?
This question is cross-posted from Solr Jira issue 3742. The issue relates to the Solr 4 beta, which launched very recently.

Resources