How to index data in a specific shard using solrj - solr

I am using SolrJ as a client to index documents into SolrCloud (Solr 4.5).
I need to store documents based on tenant_id, so I am trying to use document routing, which is possible only if the collection is created with the numShards parameter (http://searchhub.org/2013/06/13/solr-cloud-document-routing/).
I have two Solr instances in SolrCloud (example1/solr and example2/solr) and an external ZooKeeper running on port 2181.
Both instances contain a collection called collection1.
I created one more collection called newCollection (with two shards and two replicas) using:
http://localhost:8501/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=2&maxShardsPerNode=2&router.field=id
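The CREATE request above needs a `?` between `/admin/collections` and `action=CREATE`; without it the request goes to an unknown path rather than the Collections API. A minimal, stdlib-only Java sketch (assuming Java 10+ for the `URLEncoder.encode(String, Charset)` overload) that assembles the request with the separator in place. The class and method names are illustrative; host, port and parameters are copied from the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CollectionsApiUrl {
    // Builds <solrBaseUrl>/admin/collections?k1=v1&k2=v2...
    // The '?' separates the handler path from the query string.
    static String build(String solrBaseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URL-encode keys and values (Java 10+ Charset overload)
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return solrBaseUrl + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        // Parameters copied from the question; LinkedHashMap keeps their order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", "newCollection");
        params.put("numShards", "2");
        params.put("replicationFactor", "2");
        params.put("maxShardsPerNode", "2");
        params.put("router.field", "id");
        System.out.println(build("http://localhost:8501/solr", params));
    }
}
```

Running `main` prints the question's URL with `?action=CREATE` directly after `/admin/collections`.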
So example1/solr has newCollection_shard1_replica1 and newCollection_shard2_replica1,
and example2/solr has newCollection_shard1_replica2 and newCollection_shard2_replica2.
I copied example1/solr/collection1/conf to all shards and replicas.
I restarted the ZooKeeper server as well as the Solr instances:
zookeeper->zkServer.cmd
example1/solr-> java -Dbootstrap_confdir=./solr/newCollection_shard1_replica1/conf -Dcollection.configName=myconf -DzkHost=localhost:2181 -jar start.jar
example2/solr->java -DzkHost=localhost:2181 -jar start.jar
(The two instances run on different ports, one on 8081 and the other on 8051.)
I am using the SolrJ client to index documents.
Here is my sample code:
String url = "http://localhost:8081/solr";
// ConcurrentUpdateSolrServer(url, queueSize, threadCount)
ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 4);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shard1!513");
doc.addField("name", "Santhosh");
solrServer.add(doc);
solrServer.commit();
But the document is saved in collection1 with id shard1!513. Are any configuration changes required in solrconfig.xml? (I am using the default solrconfig.xml that ships with Solr 4.5.)
How do I save documents in newCollection, and how do I do document routing?
Please help me with this issue.
Thanks!

You can use CloudSolrServer together with UpdateRequest:
SolrServer solrServer = new CloudSolrServer(zkHost); // zkHost is your ZooKeeper host string, e.g. "localhost:2181"
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shard1!513");
// Send the add to newCollection
UpdateRequest add = new UpdateRequest();
add.add(doc);
add.setParam("collection", "newCollection");
add.process(solrServer);
// Commit newCollection (waitFlush = true, waitSearcher = true)
UpdateRequest commit = new UpdateRequest();
commit.setAction(UpdateRequest.ACTION.COMMIT, true, true);
commit.setParam("collection", "newCollection");
commit.process(solrServer);

I appended the core name of the new collection to the URL, so it works fine now.
Instead of:
String url = "http://localhost:8081/solr";
I used:
String url = "http://localhost:8081/solr/newCollection_shard1_replica1";
ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 4);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shard1!513");
doc.addField("name", "Santhosh");
solrServer.add(doc);
solrServer.commit();

You should use CloudSolrServer http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html
In SolrCloud, updates should be sent through a client that reads the cluster state from ZooKeeper, because ZooKeeper knows which node is currently the leader of each shard. One more thing: you don't need to append the collection name to the URL; just call the setDefaultCollection(collectionName) method of CloudSolrServer to send your updates to the 'collectionName' collection.
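For the tenant_id routing the question is after, the linked article describes the compositeId router, which, to my understanding, is what Solr uses when a collection is created with numShards: the id is written as `prefix!docId`, and Solr hashes the prefix to pick the shard. A small id helper under that assumption; `TenantIds`, `compositeId` and the value `tenantA` are hypothetical names, not SolrJ API.

```java
final class TenantIds {
    // Builds an id for the compositeId router: "<routeKey>!<docId>".
    // The part before '!' is hashed to choose the shard, so all documents
    // sharing a tenant prefix should end up on the same shard.
    static String compositeId(String tenantId, String docId) {
        if (tenantId.isEmpty() || tenantId.contains("!")) {
            throw new IllegalArgumentException("tenantId must be non-empty and must not contain '!'");
        }
        return tenantId + "!" + docId;
    }

    public static void main(String[] args) {
        // "tenantA" is a hypothetical tenant_id value. With SolrJ the result
        // would be used as the unique key, e.g. doc.addField("id", ...).
        System.out.println(compositeId("tenantA", "513"));
    }
}
```

Note that the prefix is only a hash input: `shard1!513` from the question keeps all `shard1!` documents together, but it does not necessarily place them on the shard named shard1.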

Related

Solr Cloud - node sync not happening when document added using CloudSolrClient

I have a SolrCloud setup with 2 nodes and 2 replicas. I am using CloudSolrClient (Java) to add a document. After adding the document programmatically, when I search for it through the Solr console UI, I find it on only one node. On the other node, it is not returned in searches until I reload the collection.
String zkHosts = "zookeperhostname:9983";
String solrCollectionName = "my_coll";
SolrClient solrClient = new CloudSolrClient.Builder().withZkHost(zkHosts).build();
((CloudSolrClient)solrClient).setDefaultCollection(solrCollectionName);
DocVO docVO = createDocumentVO();
List<DocVO> newDocVOList = new ArrayList<DocVO>();
newDocVOList.add(docVO);
solrClient.addBeans(newDocVOList);
solrClient.commit();
Kindly let me know what I am missing here.

Solr Cloud: How to disable document (pdf, office) metadata as fields

I am new to Solr and am using Solr 7.3.1 in SolrCloud mode.
I am trying to index PDF and Office documents using Solr's content extraction.
I created a collection with
bin\solr create -c tsindex -s 2 -rf 2
In SolrJ, my code looks like this:
public static void main(String[] args) {
System.out.println("Solr Indexer");
final String solrUrl = "http://localhost:8983/solr/tsindex/";
HttpSolrClient solr = new HttpSolrClient.Builder(solrUrl).build();
String filename="C:\\iSampleDocs\\doc-file.doc";
ContentStreamUpdateRequest solrRequest = new ContentStreamUpdateRequest("/update/extract");
try {
solrRequest.addFile(new File(filename), "application/msword");
solrRequest.setParam("litral.ts_ref", "ts-456123");
//solrRequest.setParam("defaultField", "text");
solrRequest.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList<Object> result= solr.request(solrRequest);
System.out.println(result);
} catch (IOException e) {
e.printStackTrace();
}catch ( SolrServerException e) {
e.printStackTrace();
}
}
I am running into several issues:
Although I created the field ts_ref as text_general in the Solr Admin UI, this field does not get set at all.
My goal is to index the complete document, including its metadata, in one field, and then set a couple of additional fields that reference the document in another system, e.g. the ts_ref field. But what actually happens is that Solr extracts the file metadata and creates a separate field for each metadata value.
I have tried disabling the data-driven schema functionality with bin\solr config -c tsindex -zkHost localhost:9983 -property update.autoCreateFields -value false
When the line solrRequest.setParam("defaultField", "text"); is uncommented from the beginning, no separate fields are created for the extracted metadata. But as soon as I comment this line out and upload files, the metadata ends up in separate fields again (even after I uncomment it again).
In "litral.ts_ref" there is a typo: it is missing an "e" and should be "literal.ts_ref".
You can ignore all metadata fields by using the uprefix parameter together with a matching dynamic field. The Solr Cell documentation shows exactly this case.
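Putting the answer's two points together, the extract request would carry the corrected `literal.ts_ref` plus `uprefix`. A sketch of those parameters, assuming the schema defines an `ignored_*` dynamic field whose type is neither indexed nor stored, and a `text` field for the extracted body (the name is borrowed from the question's commented-out `defaultField` line). `ExtractParams` is a hypothetical helper, and `fmap.content` is an addition aimed at the question's goal of keeping the document content in one field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ExtractParams {
    // Request parameters for /update/extract following the uprefix approach.
    // Assumes the schema has a dynamic field "ignored_*" (not indexed, not
    // stored) and a field named "text".
    static Map<String, String> params(String tsRef) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("literal.ts_ref", tsRef); // "literal." (with the e) sets ts_ref to this value
        p.put("uprefix", "ignored_");   // fields not in the schema get this prefix
        p.put("fmap.content", "text");  // map the extracted body to the "text" field
        return p;
    }

    public static void main(String[] args) {
        // With SolrJ, each entry would become solrRequest.setParam(key, value).
        params("ts-456123").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```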

Trouble connecting to Solr from Java

I am trying to connect to Solr 6.5.0 from Java. I have added the following .jar files to the library:
commons-io-2.5
httpclient-4.4.1
httpcore-4.4.1
httpmime-4.4.1
jcl-over-slf4j-1.7.7
noggit-0.6
slf4j-api-1.7.7
stax2-api-3.1.4
woodstox-core-asl-4.4.1
zookeeper-3.4.6
solr-solrj-6.5.0
But when I try to use the following code to connect to Solr:
import org.apache.http.impl.bootstrap.HttpServer;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
public class SolrQuery {
public static void main(String[] args) throws SolrServerException {
HttpSolrServer solr = new HttpServer("http://localhost:8983/solr/collection1");
SolrQuery query = new SolrQuery();
query.setQuery("*");
QueryResponse response = solr.query(query);
SolrDocumentList results = response.getResults();
for (int i = 0; i < results.size(); ++i) {
System.out.println(results.get(i));
}
}
}
Even before compiling, I get errors on these lines:
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.SolrQuery;
HttpSolrServer solr = new HttpServer("http://localhost:8983/solr/collection1");
Can anyone help me solve this?
The code in your question was written for an old version of Solr (before 5.0). You'll find many sources and examples written for old Solr versions, but in most cases all you have to do is replace the old SolrServer class with the new (and now correct) SolrClient class.
Both classes represent the Solr instance you want to use.
Read the Solr Documentation - Using SolrJ
I strongly suggest not giving your classes the same name as an existing class (in your example, your class is named SolrQuery).
The catch-all query for Solr is *:*, which matches all documents. So change the query.setQuery statement to:
query.setQuery("*:*");
I suppose you're using a Solr client for a standalone instance, so, as you're already aware, the correct way to instantiate a SolrClient is:
String urlString = "http://localhost:8983/solr/gettingstarted";
SolrClient solr = new HttpSolrClient.Builder(urlString).build();
And this is an easier way to iterate through all returned documents:
for (SolrDocument doc : response.getResults()) {
System.out.println(doc);
}
Have a look at the documentation of the SolrDocument class, which explains how to use it and how to read field values correctly.
I found that I needed to add a .jar file that is not included in the /dist directory, slf4j-simple-1.7.25. Also, these lines:
HttpSolrServer solr = new HttpServer("http://localhost:8983/solr/gettingstarted");
SolrQuery query = new SolrQuery();
needed to be changed to:
String urlString = "http://localhost:8983/solr/gettingstarted";
SolrClient solr = new HttpSolrClient.Builder(urlString).build();
After that, it finally runs!

how to solve org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException

I got the following error:
This is my code:
At first glance, I see one problem: you never tell the SolrServer which core or collection to address. As stated in the SolrJ docs:
There are two ways to use an HttpSolrClient:
1) Pass a URL to the constructor that points directly at a particular core
SolrClient client = new HttpSolrClient("http://my-solr-server:8983/solr/core1");
In this case, you can query the given core directly, but you cannot query any other cores or issue CoreAdmin requests with this client.
2) Pass the base URL of the node to the constructor
SolrClient client = new HttpSolrClient("http://my-solr-server:8983/solr");
QueryResponse resp = client.query("core1", new SolrQuery("*:*"));
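In both cases the query ends up at the same core URL; only the place where the core name comes from differs. A stdlib-only sketch of that URL arithmetic, roughly as HttpSolrClient assembles it (`SolrUrls`, `coreUrl` and `selectUrl` are illustrative names, not SolrJ methods). With only the base URL and no core name, the request would target `/solr/select`, which is the problem the answer points out.

```java
final class SolrUrls {
    // Joins a Solr node's base URL and a core/collection name.
    static String coreUrl(String baseUrl, String core) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return base + "/" + core;
    }

    // URL of the /select handler that a query for that core is sent to.
    static String selectUrl(String baseUrl, String core) {
        return coreUrl(baseUrl, core) + "/select";
    }

    public static void main(String[] args) {
        // Option 1: the core is already part of the client URL.
        String option1 = "http://my-solr-server:8983/solr/core1" + "/select";
        // Option 2: base URL on the client, core named per request.
        String option2 = selectUrl("http://my-solr-server:8983/solr", "core1");
        System.out.println(option1.equals(option2)); // true
    }
}
```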
BTW: you should use the SolrClient object that you commented out. The SolrServer classes are deprecated.

Make CloudSolrServer run dataimports only on the leader

I set up a two-server Solr cluster with SolrCloud. Currently I have a master and a replica.
I want data imports to go to the leader, since it doesn't make sense to run delta-imports on the slave (the updates wouldn't be distributed to the leader).
From the documentation I understand that CloudSolrServer knows the cluster state (obtained from ZooKeeper) and by default sends all updates only to the leader.
What I want is to make CloudSolrServer send all dataimport commands to the master. I have the following code:
SolrServer solrServer = new CloudSolrServer("localhost:2181");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "delta-import");
QueryResponse response = solrServer.query(params);
But I see that the requests still go to both of my servers,
localhost:8080 and localhost:8983. Is there any way to fix this?
Just replace your Solr server initialization with the following:
SolrServer solrServer = new CloudSolrServer("zkHost1:port,zkHost2:port");
This causes the Solr client to consult ZooKeeper for the SolrCloud state.
For more details, read the CloudSolrServer documentation on initializing it from a ZooKeeper ensemble.
try {
    CloudSolrServer css = new CloudSolrServer("host1:2181,host2:2181");
    css.connect();
    ZkStateReader zkSR2 = css.getZkStateReader();
    String leader = zkSR2.getLeaderUrl("collection_name", "shard1", 10); // core URL of the shard1 leader
} catch (KeeperException | IOException | InterruptedException e) {
    e.printStackTrace();
}
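The leader lookup only yields a URL; the import then has to be sent to that node directly rather than through CloudSolrServer, which (as the question observes) may send a request to any replica. A minimal sketch, assuming the DataImportHandler is registered at /dataimport on the core (as the question's qt=/dataimport suggests) and using a hypothetical leader URL; `LeaderDataImport` is an illustrative name. In SolrJ 4.x, the same effect could be had by pointing an HttpSolrServer at the leader URL and sending the question's ModifiableSolrParams to it.

```java
final class LeaderDataImport {
    // Builds a delta-import request for one specific core URL, such as the
    // leader URL returned by ZkStateReader.getLeaderUrl(...). Sending the
    // command to this URL means only that node runs the import.
    static String deltaImportUrl(String leaderCoreUrl) {
        String base = leaderCoreUrl.endsWith("/") ? leaderCoreUrl : leaderCoreUrl + "/";
        return base + "dataimport?command=delta-import";
    }

    public static void main(String[] args) {
        // Hypothetical leader core URL, for illustration only.
        String leader = "http://localhost:8983/solr/collection_name_shard1_replica1/";
        System.out.println(deltaImportUrl(leader));
    }
}
```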
