SOLR CLOUD - Uploading LTR features

We have a SOLR Cloud cluster with 4 nodes. Collections are created with 4 shards and 2 replicas.
I was using REST endpoints (pointing to a single instance for all operations) to create feature(s) and model(s):
http://{{SOLRCLOUD-HOST}}:8983/solr/{{ACTIVE_INDEX_NAME}}/schema/feature-store
http://{{SOLRCLOUD-HOST}}:8983/solr/{{ACTIVE_INDEX_NAME}}/schema/model-store
When I execute the REST endpoints to fetch the existing feature(s) and model(s):
http://{{SOLRCLOUD-HOST}}:8983/solr/{{ACTIVE_INDEX_NAME}}/schema/feature-store
http://{{SOLRCLOUD-HOST}}:8983/solr/{{ACTIVE_INDEX_NAME}}/schema/model-store
Sometimes I see the feature/model I created, and other times it says they don't exist.
At this point, if I restart my cluster, the GET calls always return the created features and models.
A couple of questions -
Like config sets, is there a way to upload features and models without using the REST endpoints?
Is a restart required after uploading features and models?
Should the feature/model upload be executed against all collections in the cluster? (Assume I have more than one collection with the same data, created for different purposes; please don't ask why, I have them.)
Are the features/models available to collections created in the future with the same config set? I ask because the uploaded feature/model is visible inside the config set as _schema_model-store.json and _schema_feature-store.json.
Please advise. Thanks!

Did you find any answers?
I was stuck with feature-store not being available on all shards. Your suggestion of restarting solr helped. Is that the permanent solution?
To answer your Question #3:
You need to upload the features/models for each collection, since the collection is part of the upload URL; notice the "techproducts" in the feature upload example from the Solr docs:
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' --data-binary "@/path/myFeatures.json" -H 'Content-type:application/json'
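A minimal sketch of scripting that upload across several collections with Python requests (the collection names and file paths here are assumptions):

import requests

SOLR = "http://localhost:8983/solr"
collections = ["techproducts", "othercollection"]  # your collection names

for coll in collections:
    for store, path in [("feature-store", "/path/myFeatures.json"),
                        ("model-store", "/path/myModel.json")]:
        # Same PUT as the curl example, once per collection and store
        with open(path) as f:
            r = requests.put("%s/%s/schema/%s" % (SOLR, coll, store),
                             data=f.read(),
                             headers={"Content-type": "application/json"})
        r.raise_for_status()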

Just reload the collection to make the feature and model JSON files available on all shards of the collection. A restart of Solr is not required.
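For example, via the Collections API (the collection name here is an assumption):

import requests

# RELOAD re-opens every core of the collection on all nodes,
# picking up the updated feature-store/model-store files
requests.get("http://localhost:8983/solr/admin/collections",
             params={"action": "RELOAD", "name": "mycollection"})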

Related

How to make config changes take effect in Solr 7.3

We are using solr.SynonymFilterFactory with synonyms.txt in Solr during querying. I realized there was an error in synonyms.txt, corrected it, and uploaded the new file. I can see the modified synonyms.txt from the Admin UI, but it looks like queries are still using the old synonyms.txt. I am executing test queries from the Admin UI with debugQuery=true and can see the old synonyms being used. How can this be fixed? It is a production environment with 3 nodes using ZooKeeper for management.
You'll need to reload your core for the changes to take effect.
In a single-node Solr you can do that from the Admin page: go to Core Admin, select your core, and hit Reload. This will slow down some queries but it shouldn't drop queries or connections.
You can also reload the core via the API:
curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=your-core'
I am not sure how this works in an environment with 3 nodes, though.
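In a multi-node SolrCloud setup like that, the usual equivalent is a collection-level RELOAD through the Collections API, which reloads the cores on all nodes rather than a single core; a sketch in Python (collection name assumed):

import requests

# Reloads every replica of the collection across the 3 nodes
requests.get("http://localhost:8983/solr/admin/collections",
             params={"action": "RELOAD", "name": "your-collection"})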

WRRCSR42 Create solr cluster

I followed the instructions listed in Getting started with the Retrieve and Rank service to create a Solr cluster; however, I received the following message: WRRCSR42: The requesting service instance may not create any more free Solr clusters (current limit: 1)
My questions: What does this message mean? And what should I do to get the cluster ID?
Thank you,
The error tells you that you've already created a Solr Cluster. IBM Watson R&R only provides one free cluster.
To retrieve the list of existing clusters, you can use the same endpoint as when you attempt to create the cluster, but issue a regular GET request instead of a POST request.
https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters
The response lists your existing Solr clusters and includes your solr_cluster_id.
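For example, in Python, using the service credentials from your service instance (placeholders shown):

import requests

resp = requests.get(
    "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/solr_clusters",
    auth=("{username}", "{password}"))  # Retrieve and Rank service credentials
print(resp.json())  # each cluster entry includes its solr_cluster_id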

REST API for SOLR analyzer

I'm going to test my SOLR analyzer and I've found instructions how to do it here: https://cwiki.apache.org/confluence/display/solr/Running+Your+Analyzer.
But I need to check several thousand words, so I'm going to do it programmatically, not manually. Does Solr have any REST API to run the analyzer?
Thank you!
The Solr Admin page is just a set of static HTML files that uses the REST API offered by Solr behind the scenes. If you watch the Network tab in your browser's developer tools while navigating it, you'll see all the endpoints it talks to.
After doing this on the Analysis page, you can see that it makes three requests: one to fetch the HTML, one to get the schema (for the field list), and one to perform the actual analysis:
http://localhost:8983/solr/corename/analysis/field?wt=json&analysis.showmatch=true&analysis.fieldvalue=asd&analysis.query=asd&analysis.fieldname=content
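So checking a few thousand words programmatically is just a loop over that endpoint; a sketch in Python (the core name, field name, and word file are assumptions):

import requests

with open("words.txt") as f:
    words = [line.strip() for line in f if line.strip()]

for word in words:
    # Same call the Analysis page makes, one word at a time
    resp = requests.get(
        "http://localhost:8983/solr/corename/analysis/field",
        params={"wt": "json",
                "analysis.fieldname": "content",
                "analysis.fieldvalue": word})
    print(word, resp.json()["analysis"])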

Solr Cloud : no servers hosting shard

We have a cluster of standalone Solr cores (Solr 4.3) for which we had built some custom plugins. I'm now trying to prototype converting the cluster to a Solr Cloud cluster. This is how I am trying to deploy the cores (in 4.7.2):
Start Solr with ZooKeeper embedded:
java -DzkRun -Djetty.port=8985 -jar start.jar
Upload a config into ZooKeeper (same config as the standalone cores):
zkcli.bat -zkhost localhost:9985 -cmd upconfig -confdir myconfig -confname myconfig
Create a new collection (mycollection) with 2 shards using the Collections API:
http://localhost:8985/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconfig
So at this point I have two shards under my solr directory with the appropriate core.properties
But when I go to http://localhost:8985/solr/#/~cloud, I see that the two shards' status is "Down" when they are supposed to be active by default.
And when I try to index documents in them using SolrJ (via the CloudSolrServer API), I get the error "No live SolrServers available to handle this request". I restarted Solr, but the issue persists.
private CloudSolrServer cloudSolr;
// zkHOST is the ZooKeeper address, e.g. "localhost:9985" for the embedded setup above
cloudSolr = new CloudSolrServer(zkHOST);
cloudSolr.setZkClientTimeout(zkClientTimeout);
cloudSolr.setDefaultCollection(collectionName);
cloudSolr.connect();
cloudSolr.add(doc);
What am I doing wrong? I did a lot of digging around and saw an old Jira bug saying that Solr Cloud shards won't be active until there are some documents in the index. If that is the reason, that's kind of like a catch-22 isn't it?
So anyways, I also tried adding some test documents manually and committing, to see if things improved. Now the shard statistics page correctly gives me the numDocs count, but when I try to query, it says "no servers hosting shard". I next tried passing shards.tolerant=true as a query parameter and searching, but no cigar: it says 0 documents found.
Any help would be appreciated. My main objective is to rebuild the old standalone cores using SolrCloud and test whether our custom request handlers still work as expected. At this point, I can't index documents into the 4.7 Solr Cloud collection I have created.
Thanks and Regards

Search support for Google App Engine Go runtime

There is (experimental) search support for Python and Java, and eventually Go may be supported as well. Until then, how can I do minimal search on my records?
Through the mailing list, I got the idea of proxying search requests to a Python backend. I am still evaluating GAE and haven't used backends yet. To set up search with a Python backend, do I have to send all requests (from Go) to the datastore through this backend? How practical is that, and what are the disadvantages? Any tutorial on this?
thanks.
You could make a RESTful Python app with a few handlers, and your Go app would make urlfetches to the Python app. You can then run the Python app as either a backend or a frontend (with a different version than your Go app). The first handler would receive a key as input, fetch that entity from the datastore, and store the relevant info in the search index. The second handler would receive a query, do a search against the index, and return the results. You would also need a handler for removing documents from the search index, plus any other operations you want.
Instead of the first handler receiving a key and fetching from the datastore, you could also just send it the entity data in the fetch.
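A minimal sketch of that Python app (GAE Python runtime with webapp2, ndb, and the Search API; the index name, entity field, and routes are assumptions):

import json
import webapp2
from google.appengine.api import search
from google.appengine.ext import ndb

INDEX = search.Index(name="records")  # assumed index name

class IndexHandler(webapp2.RequestHandler):
    def post(self):
        # Receive a urlsafe datastore key, fetch the entity, index the relevant info
        entity = ndb.Key(urlsafe=self.request.get("key")).get()
        INDEX.put(search.Document(
            doc_id=self.request.get("key"),
            fields=[search.TextField(name="title", value=entity.title)]))

class SearchHandler(webapp2.RequestHandler):
    def get(self):
        # Run the query against the index and return the matching doc ids
        results = INDEX.search(self.request.get("q"))
        self.response.write(json.dumps([d.doc_id for d in results]))

app = webapp2.WSGIApplication([("/index", IndexHandler),
                               ("/search", SearchHandler)])

The Go app would then urlfetch /index with a key after writes, and /search?q=... for queries.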
You could also use a service like IndexDen for now (especially if you don't have many entities to index):
http://indexden.com/
When making urlfetches, keep in mind that the quotas currently apply even when requesting URLs from your own app. There are two issues in the tracker requesting that these quotas be removed/increased when communicating with your own apps, but there is no guarantee that will happen. See here:
http://code.google.com/p/googleappengine/issues/detail?id=8051
http://code.google.com/p/googleappengine/issues/detail?id=8052
There is full text search coming for the Go runtime very very very soon.
