I am using the following code to index documents in a Solr server.
String urlString = "http://localhost:8080/solr";
SolrServer solr = new CommonsHttpSolrServer(urlString);
java.io.File file = new java.io.File("C:\\Users\\Guruprasad\\Desktop\\Search\\47975832.doc");
if (file.canRead()) {
    System.out.println("adding " + file);
    try {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        String parts[] = file.getName().split("\\.");
        String type = "text";
        if (parts.length > 1) {
            type = parts[1];
        }
        req.addFile(file);
        req.setParam("literal.id", file.getAbsolutePath());
        req.setParam("literal.name", file.getName());
        req.setParam("literal.content_type", type);
        req.setParam("uprefix", "attr_");
        req.setParam("fmap.content", "attr_content");
        req.setAction(ACTION.COMMIT, true, true);
        solr.request(req); // line 36: this is where I get the exception
While executing this code I am getting the following exception.
Exception: org.apache.solr.common.SolrException
Exception message:
Internal Server Error Internal Server Error request:
http://localhost:8080/solr/update/extract?literal.id=C:\Users\Guruprasad\Desktop\Search\47975832.doc&literal.name=47975832.doc&literal.content_type=doc&uprefix=attr_&fmap.content=attr_content&commit=true&waitFlush=true&waitSearcher=true&wt=javabin&version=2
Exception trace:
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at com.solr.search.test.IndexFiles.indexDocs(IndexFiles.java:36)
Any help will be useful.
I don't suggest using DIH to index your database data; you can use SolrJ to index it instead. SolrJ is simple: if you can use JDBC, then things are easy. You can use SolrJ to build Solr documents and commit the data to the Solr server in batches. There is a SolrJ wiki that may help you: solrj wiki
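For example, here is a minimal sketch of that approach: read rows over JDBC, build SolrInputDocuments, add them in batches, and commit once at the end. The JDBC URL, credentials, SQL query, column names, and Solr field names below are placeholders for illustration only, not taken from the question; it assumes the SolrJ 4.x HttpSolrServer client and the Solr URL used earlier in this thread.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class JdbcBatchIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8080/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

        // Hypothetical database and table, purely for illustration
        Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "password");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM documents");
        while (rs.next()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rs.getString("id"));
            doc.addField("title", rs.getString("title"));
            doc.addField("content", rs.getString("body"));
            batch.add(doc);
            if (batch.size() == 1000) {   // send documents in batches rather than one at a time
                solr.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        solr.commit();                    // a single commit at the end

        rs.close();
        stmt.close();
        conn.close();
        solr.shutdown();
    }
}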
Solr 5.0 comes with a built-in utility, the DIH handler, for indexing data from a database, which is what you are using, but its configuration is important and tricky. Could you please post your DIH handler configuration or share the import logs? It looks like a configuration problem to me.
I was connecting to Solr to index some database entries that I fetch within a Spring Hibernate transaction, but I am facing the exception below.
Error while calling watcher
java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.String
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:173)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
at org.apache.solr.common.cloud.SolrZkClient$3.process(SolrZkClient.java:261)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper x.x.x.x:8000 within 10000 ms
at org.apache.solr.common.cloud.SolrZkClient.&lt;init&gt;(SolrZkClient.java:181)
at org.apache.solr.common.cloud.SolrZkClient.&lt;init&gt;(SolrZkClient.java:115)
at org.apache.solr.common.cloud.SolrZkClient.&lt;init&gt;(SolrZkClient.java:105)
at org.apache.solr.common.cloud.ZkStateReader.&lt;init&gt;(ZkStateReader.java:207)
at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:465)
at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:822)
at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
CloudSolrClient cloudSolrClient = null;
TransactionStatus transactionStatus = null;
List levelDefinitions = new LinkedList();
try
{
    DefaultTransactionDefinition transactionDefinition = new DefaultTransactionDefinition();
    transactionStatus = getTransactionManager().getTransaction(transactionDefinition);
    // database query
    cloudSolrClient = new CloudSolrClient("x.x.x.x:8000");
    cloudSolrClient.setDefaultCollection("CMD_OUSTR_NEW_dev");
    SolrInputDocument solrInputDocument = new SolrInputDocument();
    // field setting of SolrInputDocument
    LinkedList<SolrInputDocument> solrInputDocuments = new LinkedList<SolrInputDocument>();
    solrInputDocuments.add(solrInputDocument);
    cloudSolrClient.add(solrInputDocuments);
    cloudSolrClient.commit();
    getTransactionManager().commit(transactionStatus);
NOTE: the same Solr code works if the Spring Hibernate transaction is not opened in the thread.
Issue resolved.
The issue was due to an incompatible log4j MDC implementation inside the ActiveMQ jar, version 5.8; we use MDC for transaction logging. We were putting a map of transaction info into MDC, and ExecutorUtil (line 172) of the SolrJ jar casts that value to String to get the submitter context.
Replaced the jar with version 5.10, and now it is working.
I am using the following SolrJ code to index a document.
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
ContentStreamBase.FileStream fs = new FileStream(new File(filename));
req.setWaitSearcher(false);
req.setMethod(METHOD.POST );
//req.addFile(new File(filename), null);
req.addContentStream(fs);
req.setParam("literal.id", filename);
req.setParam("resource.name", filename);
//req.setParam("uprefix", "attr_");
//req.setParam("fmap.content", "attr_content");
//req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList<Object> result = this.mHttpSolrClient.request(req);
However, I always get a SolrException right after the request, which says:
4767120 [qtp559670971-21] INFO org.apache.solr.update.processor.LogUpdateProcessor [ test] – [test] webapp=/solr path=/update/extract params={waitSearcher=false&resource.name=/tmp/RUNNING.txt&literal.id=/tmp/RUNNING.txt&wt=javabin&version=2} {} 0 6
4767121 [qtp559670971-21] ERROR org.apache.solr.core.SolrCore [ test] – org.apache.solr.common.SolrException: ERROR: [doc=/tmp/RUNNING.txt] Error adding field 'stream_size'='null' msg=For input string: "null"
I am using the latest version of Solr, 5.1.0. Any ideas?
I fixed the issue by setting the HttpSolrClient to enable multipart POST. Ask me how, I don't know; the Solr documentation just does not explain things well.
mHttpSolrClient.setUseMultiPartPost(true);
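For reference, a minimal sketch of the working setup with that flag enabled, assuming a Solr 5.x core named "test" on localhost. The URL and core name are placeholders, and the file path simply reuses the one from the log above:

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.ContentStreamBase;

public class ExtractWithMultipart {
    public static void main(String[] args) throws Exception {
        String filename = "/tmp/RUNNING.txt";                                          // path from the log above
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/test"); // placeholder URL/core

        client.setUseMultiPartPost(true); // the fix: send the extract request as a multipart POST

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addContentStream(new ContentStreamBase.FileStream(new File(filename)));
        req.setParam("literal.id", filename);
        req.setParam("resource.name", filename);
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        client.request(req);

        client.close();
    }
}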
I use Solr 4.3 for my website. While querying some data with the MoreLikeThis function, this exception is sometimes thrown:
org.apache.solr.common.SolrException: parsing error
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
Parsing error? What does that mean, and why does it occur?
Thanks for your reply.
Here is the code:
public MoreLikeThisQueryResponse(QueryResponse queryResponse) {
    this.queryResponse = queryResponse;
    NamedList<Object> res = this.queryResponse.getResponse();
    for (int i = 0; i < res.size(); i++) {
        String name = res.getName(i);
        if ("match".equals(name)) {
            this.matchResult = (SolrDocumentList) res.getVal(i);
        }
    }
}
This type of "parsing error" in the BinaryResponseParser.processResponse method typically indicates that the server is returning a raw HTML text response rather than a Solr javabin response. Usually that means the servlet container (Tomcat or Jetty) is detecting and reporting an error before Solr gets a chance to handle it and honor the wt parameter, which sets the response format to javabin.
Check the server log for the actual error.
I had a similar problem.
It seems that there might be a compatibility error if you are not using EXACTLY compatible versions of Solr and SolrJ.
To resolve this, you have to specify
server.setParser(new XMLResponseParser());
Quoting from Solrj
SolrJ generally maintains backwards compatibility, so you can use a newer SolrJ with an older Solr, or an older SolrJ with a newer Solr. There are some minor exceptions to this general rule:
If you're mixing 1.x and a later major version, you must set the response parser to XML, because the two versions use incompatible versions of javabin.
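As an illustration of that fix (not the asker's exact code), a small sketch that forces the XML response format on the 4.x HttpSolrServer client; the URL and core name are placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;

public class XmlParserExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder URL
        server.setParser(new XMLResponseParser()); // request and parse XML instead of javabin
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println(rsp.getResults().getNumFound());
        server.shutdown();
    }
}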
I am using basic authentication. My Solr version is 4.1. I can get query results, but when I try to index documents I get the following error message:
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://192.168.0.1:8983/solr/my_core
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:416)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
at warningletter.Process.run(Process.java:128)
at warningletter.WarningLetter.parseListPage(WarningLetter.java:81)
at warningletter.WarningLetter.init(WarningLetter.java:47)
at warningletter.WarningLetter.main(WarningLetter.java:21)
Caused by: org.apache.http.client.ClientProtocolException
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:822)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
... 8 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:625)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
... 11 more
This is my code piece:
DefaultHttpClient httpclient = new DefaultHttpClient();
httpclient.getCredentialsProvider().setCredentials(AuthScope.ANY, new UsernamePasswordCredentials("user", "password"));
HttpSolrServer server = new HttpSolrServer("http://192.168.0.1:8983/solr/warning_letter/", httpclient);
SolrInputDocument solrDoc = new SolrInputDocument();
solrDoc.addField("id", "id1");
solrDoc.addField("letter", "letter");
server.add(solrDoc);
server.commit();
What am I doing wrong?
The trick is to use preemptive authentication, so the request does not have to be repeated after the "unauthorized" response is sent.
Here is an example:
Preemptive Basic authentication with Apache HttpClient 4
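For instance, here is one minimal sketch of preemptive Basic authentication with HttpClient 4. Instead of the AuthCache approach from the linked answer, it simply attaches the Authorization header to every request with an interceptor, so the update entity never has to be resent after a 401 challenge. It assumes commons-codec is on the classpath and reuses the URL and placeholder credentials from the question:

import org.apache.commons.codec.binary.Base64;
import org.apache.http.HttpRequest;
import org.apache.http.HttpRequestInterceptor;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.protocol.HttpContext;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PreemptiveAuthIndexing {
    public static void main(String[] args) throws Exception {
        DefaultHttpClient httpclient = new DefaultHttpClient();
        // Send the credentials with every request up front instead of waiting for a 401 challenge.
        httpclient.addRequestInterceptor(new HttpRequestInterceptor() {
            public void process(HttpRequest request, HttpContext context) {
                String credentials = Base64.encodeBase64String("user:password".getBytes());
                request.addHeader("Authorization", "Basic " + credentials);
            }
        });

        HttpSolrServer server = new HttpSolrServer("http://192.168.0.1:8983/solr/warning_letter/", httpclient);
        SolrInputDocument solrDoc = new SolrInputDocument();
        solrDoc.addField("id", "id1");
        solrDoc.addField("letter", "letter");
        server.add(solrDoc);
        server.commit();
        server.shutdown();
    }
}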
I am using Solr 4.0 and DIH (DataImportHandler) with a TikaProcessor to extract text from PDF files stored in a database. When I run indexing, it fails to parse some PDF files and produces the stack trace below.
Since Solr 4.0 uses Tika 1.2, I wrote a unit test that parses the same PDF file with the Tika 1.2 API, and I got the same error.
The same problem occurs with the Tika 1.3 jars, but when I tried the Tika 1.1 jars it works fine. Please let me know if any of you have seen this error and how to fix it.
(I have posted the same question on the Tika mailing list, but without much luck.)
When I open the PDF file it shows PDF/A mode; not sure if this is related to the problem.
Here is the exception:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@1fbfd6
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at com.pc.TikaWithIndexing.main(TikaWithIndexing.java:53)
Caused by: java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSDictionary
at org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink.getAction(PDAnnotationLink.java:93)
at org.apache.tika.parser.pdf.PDF2XHTML.endPage(PDF2XHTML.java:148)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:444)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:66)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:153)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 3 more
Here is the code snippet in JAVA:
String fileString = "C:/Bernard A J Am Coll Surg 2009.pdf";
File file = new File(fileString);
URL url = file.toURI().toURL();
ParseContext context = new ParseContext();
Detector detector = new DefaultDetector();
Parser parser = new AutoDetectParser(detector);
Metadata metadata = new Metadata();
context.set(Parser.class, parser); // ppt, word, xlsx, pdf, html
ByteArrayOutputStream outputstream = new ByteArrayOutputStream();
InputStream input = TikaInputStream.get(url, metadata);
ContentHandler handler = new BodyContentHandler(outputstream);
parser.parse(input, handler, metadata, context);
input.close();
outputstream.close();