Solr Cloud - node sync not happening when document added using CloudSolrClient

I have a SolrCloud setup with 2 nodes and 2 replicas. I am using CloudSolrClient (Java) to add a document. After adding the document programmatically, when I search for the document through the Solr console UI, I find the document in only 1 node. On the other node, until I reload the collection, it is not returned in the search.
String zkHosts = "zookeperhostname:9983";
String solrCollectionName = "my_coll";
SolrClient solrClient = new CloudSolrClient.Builder().withZkHost(zkHosts).build();
((CloudSolrClient)solrClient).setDefaultCollection(solrCollectionName);
DocVO docVO = createDocumentVO();
List<DocVO> newDocVOList = new ArrayList<DocVO>();
newDocVOList.add(docVO);
solrClient.addBeans(newDocVOList);
solrClient.commit();
Kindly let me know what I am missing here.

Related

Solr Cloud: How to disable document (pdf, office) metadata as fields

I am new to Solr and am using Solr 7.3.1 in SolrCloud mode. I am trying to index PDF and Office documents using Solr's content extraction.
I created a collection with:
bin\solr create -c tsindex -s 2 -rf 2
My SolrJ code looks like this:
public static void main(String[] args) {
    System.out.println("Solr Indexer");
    final String solrUrl = "http://localhost:8983/solr/tsindex/";
    HttpSolrClient solr = new HttpSolrClient.Builder(solrUrl).build();
    String filename = "C:\\iSampleDocs\\doc-file.doc";
    ContentStreamUpdateRequest solrRequest = new ContentStreamUpdateRequest("/update/extract");
    try {
        solrRequest.addFile(new File(filename), "application/msword");
        solrRequest.setParam("litral.ts_ref", "ts-456123");
        //solrRequest.setParam("defaultField", "text");
        solrRequest.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        NamedList<Object> result = solr.request(solrRequest);
        System.out.println(result);
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
}
I am running into several issues:
Although I created the field ts_ref as text_general in the Solr Admin UI, this field does not get set at all.
My goal is to index the complete document, including its metadata, in one field and then set a couple more fields that reference the document in another system, e.g. the ts_ref field. What actually happens is that Solr extracts the file's metadata and creates a separate field for each metadata value.
I have tried disabling the data-driven schema functionality with bin\solr config -c tsindex -zkHost localhost:9983 -property update.autoCreateFields -value false
When the line solrRequest.setParam("defaultField", "text"); is uncommented from the beginning, no separate fields are created for the extracted metadata. But as soon as I comment this line out and upload files, the metadata ends up in separate fields again, even if I uncomment the line again afterwards.

"litral.ts_ref" there is a typo here, missing an e
You can ignore all metadata fields by using the uprefix parameter together with a dynamic field that matches the prefix. See the documentation, which shows exactly that case.
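To make the two points of this answer concrete, here is a minimal sketch in plain Java (no SolrJ dependency) of the parameters one might send to /update/extract. The class and method names (ExtractParams, build, toQueryString) are made up for this illustration. literal.<field>, uprefix and commit are ExtractingRequestHandler parameters; the ignored_ prefix is an assumption and only helps if the schema defines a matching dynamic field (for example ignored_*) with a type that is neither indexed nor stored.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: collects the extraction parameters discussed above.
class ExtractParams {

    // Builds the parameter map for one upload; tsRef is the external reference value.
    static Map<String, String> build(String tsRef) {
        Map<String, String> params = new LinkedHashMap<>();
        // Note the spelling "literal." (the question used "litral.").
        params.put("literal.ts_ref", tsRef);
        // Assumption: the schema has a dynamic field matching "ignored_*".
        // Extracted fields that are not in the schema get this prefix.
        params.put("uprefix", "ignored_");
        params.put("commit", "true");
        return params;
    }

    // Renders the parameters as a URL query string, preserving insertion order.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(build("ts-456123")));
    }
}
```

In the SolrJ code from the question, each entry would be passed with solrRequest.setParam(name, value) instead of being rendered as a query string.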

How to create a new core in solr using java code?

I am new to Solr and want to create a new core from Java code rather than from the terminal or the Solr admin GUI. This is the code I am using, with Solr 6.2.1. Please help. Thanks in advance.
coreName="metademo";
String solrDir = "/home/manish/Downloads/solr-6.2.1/server/solr/";
String baseSolrUrl ="http://localhost:8983/solr/";
CoreAdminRequest.Create create = new CoreAdminRequest.Create();
create.setCoreName("metademo");
create.setInstanceDir(solrDir +File.separator );
SolrClient client2=new HttpSolrClient.Builder(baseSolrUrl).build();
create.setDataDir(solrDir + File.separator + coreName + File.separator + "data");
HttpSolrServer solrServer1 = new HttpSolrServer(solrDir,client);
CoreAdminRequest.createCore(coreName, solrDir, client2);
create.createCore(coreName, solrDir, client2);
System.out.println("Created core with name: " + coreName);

First of all, you have to create the core folder in the Solr directory (in your case: /home/manish/Downloads/solr-6.2.1/server/solr/metademo).
This folder must have the same name as the core name you use in your Java code.
Then copy the conf directory from /.../solr-6.2.1/server/solr/configsets/basic_configs into this new core directory (in your case named "metademo").
Once copied, inside the /.../solr-6.2.1/server/solr/metademo/conf folder, rename the managed-schema file to schema.xml.
I tried this:
String coreName = "metademo";
String solrDir = "/.../solr-6.2.1/server/solr/metademo";
String baseSolrUrl = "http://localhost:8983/solr/";
SolrClient client = new HttpSolrClient.Builder(baseSolrUrl).build();
CoreAdminRequest.Create createRequest = new CoreAdminRequest.Create();
createRequest.setCoreName(coreName);
createRequest.setInstanceDir(solrDir);
createRequest.process(client);
and it works. Without the preparation steps described first, your code can only throw exceptions.
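The preparation steps in this answer (create the core folder, copy conf, rename managed-schema) could also be done from Java with java.nio.file. The following is only a sketch under assumptions: the class name CoreDirPreparer and its prepare method are hypothetical, all paths are supplied by the caller, and the rename to schema.xml simply mirrors the step described in this answer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper that prepares <solrHome>/<coreName>/conf before a CoreAdmin CREATE call.
class CoreDirPreparer {

    // Copies configsetConf recursively into <solrHome>/<coreName>/conf and returns the core directory.
    static Path prepare(Path solrHome, String coreName, Path configsetConf) throws IOException {
        Path coreDir = solrHome.resolve(coreName);
        Path targetConf = coreDir.resolve("conf");
        Files.createDirectories(targetConf);

        // Files.walk visits a directory before its children, so parents exist when files are copied.
        try (Stream<Path> paths = Files.walk(configsetConf)) {
            paths.forEach(src -> {
                Path dest = targetConf.resolve(configsetConf.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dest);
                    } else {
                        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }

        // Rename managed-schema to schema.xml, as the answer describes.
        Path managed = targetConf.resolve("managed-schema");
        if (Files.exists(managed)) {
            Files.move(managed, targetConf.resolve("schema.xml"), StandardCopyOption.REPLACE_EXISTING);
        }
        return coreDir;
    }
}
```

The returned directory is what the answer's code passes to createRequest.setInstanceDir(...), provided the Java program runs on the same machine as Solr and can write to the Solr home.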

The following code snippet works with Solr 8.5.2:
String core = "test";
CoreAdminRequest.Create createRequest = new CoreAdminRequest.Create();
createRequest.setCoreName(core);
createRequest.setInstanceDir("./" + core);
createRequest.setConfigSet("_default");
createRequest.process(solrClient);
The call to setConfigSet() is necessary, so that the server can know how to initialize configurations for the new core based on the specified configset. Otherwise, some exception message like "Unable to create core [test] Caused by: Can't find resource 'solrconfig.xml' in classpath" would be thrown.

trouble in solr connect java

I am trying to connect to Solr 6.5.0 from Java. I have added the following .jar files to the library:
commons-io-2.5
httpclient-4.4.1
httpcore-4.4.1
httpmime-4.4.1
jcl-over-slf4j-1.7.7
noggit-0.6
slf4j-api-1.7.7
stax2-api-3.1.4
woodstox-core-asl-4.4.1
zookeeper-3.4.6
solr-solrj-6.5.0
But when I try to use the following code to connect to Solr:
import org.apache.http.impl.bootstrap.HttpServer;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
public class SolrQuery {
public static void main(String[] args) throws SolrServerException {
HttpSolrServer solr = new HttpServer("http://localhost:8983/solr/collection1");
SolrQuery query = new SolrQuery();
query.setQuery("*");
QueryResponse response = solr.query(query);
SolrDocumentList results = response.getResults();
for (int i = 0; i < results.size(); ++i) {
System.out.println(results.get(i));
}
}
}
Before I even compile it, I get errors on these lines:
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.SolrQuery;
HttpSolrServer solr = new HttpServer("http://localhost:8983/solr/collection1");
Can anyone help me solve this?

The code in your question was written for an old version of Solr, before version 5.0. You will find many sources and examples written for old Solr versions, but in most cases all you have to do is replace the old SolrServer class with the new (and now correct) SolrClient class.
Both represent the Solr instance you want to use.
Read the Solr documentation: Using SolrJ.
I strongly suggest not giving your own classes the same name as an existing class (in your example your class is named SolrQuery).
The catch-all string for Solr queries is *:*, which means: match any value in any field, i.e. return all documents. So change the query.setQuery statement to:
query.setQuery("*:*");
I suppose you're using a Solr client for a standalone instance, so, as you're already aware, the correct way to instantiate a SolrClient is:
String urlString = "http://localhost:8983/solr/gettingstarted";
SolrClient solr = new HttpSolrClient.Builder(urlString).build();
And this is an easier way to iterate through all returned documents:
for (SolrDocument doc : response.getResults()) {
System.out.println(doc);
}
Have a look at the documentation of the SolrDocument class, which explains how to use it and how to read field values correctly.

I found that I needed to add a .jar file that is not included in the /dist directory, named slf4j-simple-1.7.25, and also that
HttpSolrServer solr = new HttpServer("http://localhost:8983/solr/gettingstarted");
SolrQuery query = new SolrQuery();
needs to be changed to
String urlString = "http://localhost:8983/solr/gettingstarted";
SolrClient solr = new HttpSolrClient.Builder(urlString).build();
After that it finally runs!

how to solve org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException

I got the following error:
This is my code:

At first glance, I see one problem: You never tell the SolrServer which core or collection to address. As is stated in the SolrJ docs:
There are two ways to use an HttpSolrClient:
1) Pass a URL to the constructor that points directly at a particular core
SolrClient client = new HttpSolrClient("http://my-solr-server:8983/solr/core1");
In this case, you can query the given core directly, but you cannot query any other cores or issue CoreAdmin requests with this client.
2) Pass the base URL of the node to the constructor
SolrClient client = new HttpSolrClient("http://my-solr-server:8983/solr");
QueryResponse resp = client.query("core1", new SolrQuery("*:*"));
BTW: You should use the SolrClient object that you commented out. The SolrServer classes are deprecated.

How to index data in a specific shard using solrj

I am using SolrJ as the client to index documents into SolrCloud (Solr 4.5).
I have a requirement to store documents based on tenant_id, so I am trying to use document routing, which is possible only if the collection is created with the numShards parameter (http://searchhub.org/2013/06/13/solr-cloud-document-routing/).
I have two Solr instances in SolrCloud (example1/solr and example2/solr) and an external ZooKeeper running on port 2181.
Both instances contain a collection called collection1.
I created one more collection called newCollection (with two shards and two replicas) using
http://localhost:8501/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=2&maxShardsPerNode=2&router.field=id
So in example1/solr-> I have newCollection_shard1_replica1 & newCollection_shard2_replica1,
In example2/solr -> I have newCollection_shard1_replica2 & newCollection_shard2_replica2
I copied example1/solr/collection1/conf to all shards and replicas
I restarted zookeeper server as well as solr instances:
zookeeper->zkServer.cmd
example1/solr-> java -Dbootstrap_confdir=./solr/newCollection_shard1_replica1/conf -Dcollection.configName=myconf -DzkHost=localhost:2181 -jar start.jar
example2/solr->java -DzkHost=localhost:2181 -jar start.jar
(The two instances are running on different ports, one on 8081 and the other on 8051.)
I am using solrj client to index documents
Here is my sample code
String url = "http://localhost:8081/solr";
ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 4);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shard1!513");
doc.addField("name", "Santhosh");
solrServer.add(doc);
solrServer.commit();
But it is saving the document in collection1 with id shard1!513. Are any configuration changes required in solrconfig.xml? (I am using the default solrconfig.xml that came with Solr 4.5.)
How do I save documents in my newCollection, and how do I do document routing?
Please help me out with this issue.
Thanks!

You can use CloudSolrServer and UpdateRequest:
SolrServer solrServer = new CloudSolrServer(zkHost); // zkHost is your ZooKeeper host string
SolrInputDocument doc = new SolrInputDocument();
UpdateRequest add = new UpdateRequest();
add.add(doc);
add.setParam("collection", "newCollection");
add.process(solrServer);
UpdateRequest commit = new UpdateRequest();
commit.setAction(UpdateRequest.ACTION.COMMIT, true, true);
commit.setParam("collection", "newCollection");
commit.process(solrServer);

I appended the core name of the new collection to the URL, and it is working fine now.
Instead of:
String url = "http://localhost:8081/solr";
I used:
String url = "http://localhost:8081/solr/newCollection_shard1_replica1";
ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 4);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shard1!513");
doc.addField("name", "Santhosh");
solrServer.add(doc);
solrServer.commit();

You should use CloudSolrServer: http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html
In SolrCloud, CloudSolrServer reads the cluster state from ZooKeeper, which tracks the current shard leaders, so updates are routed to the right leader. Also, you do not need to append the collection name to the URL; just call the setDefaultCollection(collectionName) method of CloudSolrServer to send your updates to the 'collectionName' collection.
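Since the goal in the question is routing by tenant_id, here is a small plain-Java sketch of how composite document IDs for the compositeId router could be built. With that router, the part of the ID before the "!" is hashed to choose the shard, so it acts as a routing key (for example a tenant ID) rather than as a shard name such as shard1. The class and method names below are hypothetical, and the sketch deliberately ignores the multi-level (a!b!c) and bit-count (a/2!b) forms of the prefix.

```java
// Hypothetical helper for building and inspecting compositeId-style document IDs.
class TenantRouting {

    private static final char SEPARATOR = '!';

    // Builds "<tenantId>!<docId>"; all documents sharing a tenantId hash to the same shard.
    static String compositeId(String tenantId, String docId) {
        if (tenantId == null || tenantId.isEmpty()) {
            throw new IllegalArgumentException("tenantId must not be empty");
        }
        if (tenantId.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("tenantId must not contain '!'");
        }
        return tenantId + SEPARATOR + docId;
    }

    // Returns the routing prefix of an ID, or null if the ID has no "!" separator.
    static String routeKey(String id) {
        int idx = id.indexOf(SEPARATOR);
        return idx < 0 ? null : id.substring(0, idx);
    }

    public static void main(String[] args) {
        String id = compositeId("tenantA", "513");
        System.out.println(id + " routes by key " + routeKey(id));
    }
}
```

The resulting string would go into the document's id field, as in doc.addField("id", TenantRouting.compositeId(tenantId, "513")), where tenantId is whatever value identifies the tenant in your application.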
