Lucene IndexWriter update does not affect Solr search

I implemented some code to extract keywords from a Lucene index; I did this as a Solr search component. My problem is that when I update the index through a Lucene IndexWriter, the Solr index that sits on top of it is not affected: the changes do not show up in Solr searches. As you can see, I do call commit.
BooleanQuery query = new BooleanQuery();
for (String fieldName : keywordSourceFields) {
    TermQuery termQuery = new TermQuery(new Term(fieldName, "N/A"));
    query.add(termQuery, Occur.MUST_NOT);
}
TermQuery termQuery = new TermQuery(new Term(keywordField, "N/A"));
query.add(termQuery, Occur.MUST);
try {
    // Query q = new QueryParser(keywordField, new StandardAnalyzer()).parse(query.toString());
    TopDocs results = searcher.search(query, maxNumDocs);
    ScoreDoc[] hits = results.scoreDocs;
    IndexWriter writer = getLuceneIndexWriter(searcher.getPath());
    for (int i = 0; i < hits.length; i++) {
        Document document = searcher.doc(hits[i].doc);
        List<String> keywords = keyword.getKeywords(hits[i].doc);
        if (keywords.size() > 0) document.removeFields(keywordField);
        for (String word : keywords) {
            document.add(new StringField(keywordField, word, Field.Store.YES));
        }
        String uniqueKey = searcher.getSchema().getUniqueKeyField().getName();
        writer.updateDocument(new Term(uniqueKey, document.get(uniqueKey)), document);
    }
    writer.commit();
    writer.forceMerge(1);
    writer.close();
} catch (IOException | SyntaxError e) {
    throw new RuntimeException();
}
private IndexWriter getLuceneIndexWriter(String indexPath) throws IOException {
    FSDirectory directory = FSDirectory.open(new File(indexPath).toPath());
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    return new IndexWriter(directory, iwc);
}
Please help me solve this problem.
Update:
I did some investigation and found that retrieving the documents works fine as long as Solr is not restarted, but searching for them does not. After I restarted Solr, the core appears to be corrupted and fails to start! Here is the corresponding log:
org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:896)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:662)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:513)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:278)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:272)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1604)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1716)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:868)
... 9 more
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in NRTCachingDirectory(MMapDirectory@C:\Users\Ali\workspace\lucene_solr_5_0_0\solr\server\solr\document\data\index lockFactory=org.apache.lucene.store.SimpleFSLockFactory@3bf76891; maxCacheMB=48.0 maxMergeSizeMB=4.0): files: [_2_Lucene50_0.doc, write.lock, _2_Lucene50_0.pos, _2.nvd, _2.fdt, _2_Lucene50_0.tim]
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:821)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:78)
at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:65)
at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:272)
at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:115)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1573)
... 11 more
4/7/2015, 6:53:26 PM
ERROR
SolrIndexWriter
SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
4/7/2015, 6:53:26 PM
ERROR
SolrIndexWriter
Error closing IndexWriter
java.lang.NullPointerException
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2959)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2927)
at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:965)
at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1010)
at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:130)
at org.apache.solr.update.SolrIndexWriter.finalize(SolrIndexWriter.java:183)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:101)
at java.lang.ref.Finalizer.access$100(Finalizer.java:32)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:190)
Therefore my guess would be that there is a problem with how the keywordField is indexed, and also a problem related to closing the IndexWriter.
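For reference, below is a rough, untested sketch of routing the same update through Solr's own UpdateHandler (using org.apache.solr.update.AddUpdateCommand and CommitUpdateCommand) instead of a standalone IndexWriter. It assumes a SolrQueryRequest named req is available inside the search component and reuses the uniqueKey, keywordField and keywords variables from the code above:
// Sketch only, not a verified fix: let Solr's own IndexWriter, transaction log and
// searcher lifecycle handle the write instead of opening a second IndexWriter.
SolrInputDocument sdoc = new SolrInputDocument();
sdoc.addField(uniqueKey, document.get(uniqueKey));
for (String word : keywords) {
    sdoc.addField(keywordField, word);
}
// Note: addDoc replaces the whole document, so a real implementation would have to
// copy the remaining stored fields (or use atomic-update syntax) before sending it.
AddUpdateCommand add = new AddUpdateCommand(req);
add.solrDoc = sdoc;
req.getCore().getUpdateHandler().addDoc(add);
// Commit through Solr as well, instead of writer.commit() / writer.forceMerge(1).
req.getCore().getUpdateHandler().commit(new CommitUpdateCommand(req, false));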

Related

MultipartRequest.calculateContentLength error

I am trying to make a multipart upload with some binary data. However, I am getting a very strange error which I have never encountered before. Here is my code:
MultipartRequest request = new MultipartRequest();
request.setPost(true);
request.setUrl(URL+"/agenci/worker/addWorker/");
..,
request.addData("photoFile", dir + "user.jpg", "image/jpeg");
I get this exception at runtime
[EDT] 0:0:0,1 - Exception: java.lang.NullPointerException - null
at com.codename1.io.MultipartRequest.calculateContentLength(MultipartRequest.java:295)
at com.codename1.io.MultipartRequest.initConnection(MultipartRequest.java:128)
at com.codename1.io.ConnectionRequest.performOperationComplete(ConnectionRequest.java:798)
at com.codename1.io.NetworkManager$NetworkThread.run(NetworkManager.java:340)
at com.codename1.impl.CodenameOneThread.run(CodenameOneThread.java:176)
user.jpg is present in the path supplied. Is this a new bug?
[EDIT]
I think the path to the file keeps returning null, even though the file is present in FileSystemStorage. This is the exception I get when I try to create the image to be uploaded from an InputStream:
[EDT] 0:0:0,1 - Exception: java.lang.NullPointerException - null
at com.codename1.io.Storage.createInputStream(Storage.java:172)
at com.jajitech.agenci.webservice.SignUpService.saveWorker(SignUpService.java:49)
at com.jajitech.agenci.login.signup.SignUp.completeSignUp(SignUp.java:333)
at com.jajitech.agenci.login.signup.SignUp.lambda$completeSignUpForm$3(SignUp.java:304)
at com.codename1.ui.util.EventDispatcher.fireActionSync(EventDispatcher.java:459)
at com.codename1.ui.util.EventDispatcher.fireActionEvent(EventDispatcher.java:362)
at com.codename1.ui.Button.fireActionEvent(Button.java:687)
at com.codename1.ui.Button.released(Button.java:728)
at com.codename1.ui.Button.pointerReleased(Button.java:835)
at com.codename1.ui.LeadUtil.pointerReleased(LeadUtil.java:153)
at com.codename1.ui.Form.pointerReleased(Form.java:3694)
at com.codename1.ui.Component.pointerReleased(Component.java:4691)
at com.codename1.ui.Display.handleEvent(Display.java:2352)
at com.codename1.ui.Display.edtLoopImpl(Display.java:1244)
at com.codename1.ui.Display.mainEDTLoop(Display.java:1162)
at com.codename1.ui.RunnableWrapper.run(RunnableWrapper.java:120)
at com.codename1.impl.CodenameOneThread.run(CodenameOneThread.java:176)
Please, what could be wrong that my calls to an existing file in the FileSystemStorage return null?
[EDIT 2]
Here is my complete method, using an InputStream. I had tried using the plain path with the same error. The createInputStream line is the one that fails.
try {
    MultipartRequest request = new MultipartRequest();
    request.setPost(true);
    request.setUrl(URL + "/agenci/worker/addWorker/");
    request.addArgument("name", getName());
    request.addArgument("dob", getDob());
    request.addArgument("address", getAddress());
    request.addArgument("email", getEmail());
    request.addArgument("phone", getPhone());
    request.addArgument("agencyId", getPass());
    request.addArgument("gender", getGender());
    try {
        InputStream is = Storage.getInstance().createInputStream(dir + "a.jpg");
        Image img = Image.createImage(is);
        EncodedImage em = EncodedImage.createFromImage(img, true);
        byte[] data = em.getImageData();
        is.close();
        request.addData("photoFile", data, "image/jpeg");
    } catch (Exception er) {
        System.out.println("error3");
        er.printStackTrace();
    }
    NetworkManager.getInstance().addToQueueAndWait(request);
    if (request.getResponseData() == null) {
        return "error1";
    }
The Exception
Exception: java.lang.NullPointerException - null
at com.codename1.io.Storage.createInputStream(Storage.java:172)
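For comparison, here is a rough sketch that reads the file through FileSystemStorage instead of Storage; Storage is a flat key/value store, while FileSystemStorage works with real file paths, so this assumes dir is a file-system path (for example one built from FileSystemStorage.getAppHomePath()):
// Sketch only: load the photo via FileSystemStorage, which expects file paths.
String path = dir + "user.jpg";
if (FileSystemStorage.getInstance().exists(path)) {
    InputStream is = FileSystemStorage.getInstance().openInputStream(path);
    Image img = Image.createImage(is);
    EncodedImage em = EncodedImage.createFromImage(img, true);
    is.close();
    request.addData("photoFile", em.getImageData(), "image/jpeg");
}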

RestEasy client does not use correct encoding

I'm using RestEasy as a client to read news from a service.
ResteasyClient listClient = new ResteasyClientBuilder().build();
ResteasyWebTarget listTarget = listClient.target("https://someservice.com/file.xml");
Response r = listTarget.request().get();
final HexMl feedList = r.readEntity(HexMl.class);
The service does not return an encoding or media type in the response headers, only an encoding declaration in the XML itself:
<?xml version="1.0" encoding="windows-1252"?>
RestEasy does not seem to evaluate this, so I get an exception:
javax.ws.rs.ProcessingException: org.jboss.resteasy.plugins.providers.jaxb.JAXBUnmarshalException: javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 116; columnNumber: 30; Invalid byte 2 of 3-byte UTF-8 sequence.]
at org.jboss.resteasy.client.jaxrs.internal.ClientResponse.readFrom(ClientResponse.java:300)
at org.jboss.resteasy.client.jaxrs.internal.ClientResponse.readEntity(ClientResponse.java:196)
at org.jboss.resteasy.specimpl.BuiltResponse.readEntity(BuiltResponse.java:218)
at com.roche.services.NewsImportService.importFeed(NewsImportService.java:72)
at com.roche.commands.NewsImportCommand.execute(NewsImportCommand.java:26)
at com.roche.commands.NewsImportCommand$execute.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:110)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:122)
at Script1.run(Script1.groovy:4)
at info.magnolia.module.groovy.console.MgnlGroovyConsole$1.call(MgnlGroovyConsole.java:154)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.jboss.resteasy.plugins.providers.jaxb.JAXBUnmarshalException: javax.xml.bind.UnmarshalException
Is there a way to override the encoding RestEasy uses, or to intercept the response before the entity is read?
I tried
Response r = listTarget.request().accept(APPLICATION_XML + ";charset=windows-1252").get();
and
Response r = listTarget.request(APPLICATION_XML + ";charset=windows-1252").get();
and
@Consumes(APPLICATION_XML + ";charset=windows-1252")
public class HexMl { ... }
without success. The XML itself seems to be correctly encoded in windows-1252.
For now I'm using a ReaderInterceptor, but this doesn't seem right, so I'd still be glad about better suggestions.
ResteasyClientBuilder clientBuilder = new ResteasyClientBuilder();
ResteasyProviderFactory providerFactory = new ResteasyProviderFactory();
RegisterBuiltin.register(providerFactory);
providerFactory.getClientReaderInterceptorRegistry().registerSingleton(new ReaderInterceptor() {
    @Override
    public Object aroundReadFrom(ReaderInterceptorContext context) throws IOException, WebApplicationException {
        InputStream is = context.getInputStream();
        String responseBody = IOUtils.toString(is, "windows-1252");
        LOGGER.debug("received response:\n{}\n\n", responseBody);
        context.setInputStream(new ByteArrayInputStream(responseBody.getBytes()));
        return context.proceed();
    }
});
clientBuilder.providerFactory(providerFactory);
Then I use this clientBuilder to create my client.
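For completeness, creating and using the client from that builder then looks roughly like this (a sketch reusing the target URL from above):
// Sketch: build the client from the configured builder and read the feed as before.
ResteasyClient client = clientBuilder.build();
ResteasyWebTarget target = client.target("https://someservice.com/file.xml");
Response r = target.request().get();
HexMl feedList = r.readEntity(HexMl.class);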

How to handle exception while parsing JSON in Flink

I am reading data from Kafka using Flink 1.4.2 and parsing it to ObjectNode using JSONDeserializationSchema. If an incoming record is not valid JSON, my Flink job fails. I would like to skip the broken record instead of failing the job.
FlinkKafkaConsumer010<ObjectNode> kafkaConsumer =
        new FlinkKafkaConsumer010<>(TOPIC, new JSONDeserializationSchema(), consumerProperties);
DataStream<ObjectNode> messageStream = env.addSource(kafkaConsumer);
messageStream.print();
I am getting the following exception if the data in Kafka is not valid JSON.
Job execution switched to status FAILING.
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'This': was expecting ('true', 'false' or 'null')
 at [Source: [B@4f522623; line: 1, column: 6]
Job execution switched to status FAILED.
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
The easiest solution is to implement your own DeserializationSchema and wrap JSONDeserializationSchema. You can then catch the exception and either ignore it or perform a custom action.
As suggested by @twalthr, I implemented my own DeserializationSchema by copying JSONDeserializationSchema and adding exception handling.
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.IOException;

public class CustomJSONDeserializationSchema extends AbstractDeserializationSchema<ObjectNode> {
    private ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        ObjectNode objectNode;
        try {
            objectNode = mapper.readValue(message, ObjectNode.class);
        } catch (Exception e) {
            ObjectMapper errorMapper = new ObjectMapper();
            ObjectNode errorObjectNode = errorMapper.createObjectNode();
            errorObjectNode.put("jsonParseError", new String(message));
            objectNode = errorObjectNode;
        }
        return objectNode;
    }

    @Override
    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }
}
In my streaming job:
messageStream
    .filter((event) -> {
        if (event.has("jsonParseError")) {
            LOG.warn("JsonParseException was handled: " + event.get("jsonParseError").asText());
            return false;
        }
        return true;
    }).print();
Flink has since improved null-record handling in the FlinkKafkaConsumer.
There are two possible design choices when the DeserializationSchema encounters a corrupted message: it can either throw an IOException, which causes the pipeline to be restarted, or it can return null, in which case the Flink Kafka consumer silently skips the corrupted message.
For more details, you can see this link.
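Following that second approach, a minimal sketch of a null-returning schema might look like this (the class name is illustrative; on a corrupted record the consumer then drops it instead of restarting the pipeline):
import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.node.ObjectNode;

import java.io.IOException;

// Sketch only: the class name is illustrative, not part of the Flink API.
public class NullOnErrorJSONDeserializationSchema extends AbstractDeserializationSchema<ObjectNode> {
    private transient ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        try {
            return mapper.readValue(message, ObjectNode.class);
        } catch (IOException e) {
            // Returning null tells the Flink Kafka consumer to skip this record
            // instead of failing the job.
            return null;
        }
    }

    @Override
    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }
}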

Flink Queryable State Error

I am trying to use queryable state on Flink (version 1.4.2) but unfortunately I keep getting the following error:
INFO my.test.flink.QueryableState - Params are a96438fa12879b7598c9cf32684e2669, kafka-cluster_jobmanager_1, 6123
INFO my.test.flink.QueryableState - Before the call java.util.concurrent.CompletableFuture@26aa12dd[Not completed]
java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: readerIndex(0) + length(4) exceeds writerIndex(0): PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 0)
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at my.test.flink.QueryableState.main(QueryableState.java:67)
Caused by: java.lang.IndexOutOfBoundsException: readerIndex(0) + length(4) exceeds writerIndex(0): PooledUnsafeDirectByteBuf(ridx: 0, widx: 0, cap: 0)
at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.checkReadableBytes(AbstractByteBuf.java:1166)
at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.readInt(AbstractByteBuf.java:619)
at org.apache.flink.queryablestate.network.messages.MessageSerializer.deserializeHeader(MessageSerializer.java:231)
at org.apache.flink.queryablestate.network.ClientHandler.channelRead(ClientHandler.java:76)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
On the client side I am using flink-queryable-state-client-java_2.11.jar, and the relevant part of the queryable-state client code is:
QueryableStateClient client = new QueryableStateClient(jobManagerHost, jobManagerPort);
TypeInformation<MyEvent> typeInformation = TypeInformation.of(new TypeHint<MyEvent>() {});
ListStateDescriptor<MyEvent> descriptor = new ListStateDescriptor<MyEvent>("myEvents",
        typeInformation.createSerializer(new ExecutionConfig()));
CompletableFuture<ListState<MyEvent>> resultFuture =
        client.getKvState(JobID.fromHexString(jobIdParam), "myEvents", "1",
                BasicTypeInfo.STRING_TYPE_INFO, descriptor);

logger.info("Before the call " + resultFuture);
try {
    logger.info("Finished" + resultFuture.get());
} catch (Exception ex) {
    ex.printStackTrace();
}
Finally, the job running on Flink has a ListState configured as can be seen below. Note that the data in the ListState is keyed by String.
TypeInformation<MyEvent> typeInformation = TypeInformation.of(new TypeHint<MyEvent>() {});
ListStateDescriptor<MyEvent> eventState =
        new ListStateDescriptor<MyEvent>("myEvents", typeInformation);
eventState.setQueryable("myEvents");
eventListState = getRuntimeContext().getListState(eventState);
It seems to me like a serialization error, but I do not know what I need to do to fix it. Does anybody have an idea what might be wrong with the code above? Am I missing something?
I ran into that exact same problem when updating this queryable state demo for Flink 1.4. If I recall correctly, the important part is dealing with the CompletableFuture correctly -- you can't just call get() straightaway.
See the code for a working example, the key part of which looks something like this:
try {
    CompletableFuture<FoldingState<BumpEvent, Long>> resultFuture =
            client.getKvState(jobId, EventCountJob.ITEM_COUNTS, key,
                    BasicTypeInfo.STRING_TYPE_INFO, countingState);

    resultFuture.thenAccept(response -> {
        try {
            Long count = response.get();
            // now we could do something with the value
        } catch (Exception e) {
            e.printStackTrace();
        }
    });

    resultFuture.get(5, TimeUnit.SECONDS);
} catch (Exception e) {
    e.printStackTrace();
}

Why does this code - adding wordnet synonyms to index - fail?

I am writing this code as part of my CustomAnalyzer:
public class CustomAnalyzer extends Analyzer {
    SynonymMap mySynonymMap = null;

    CustomAnalyzer() throws IOException {
        SynonymMap.Builder builder = new SynonymMap.Builder(true);
        FileReader fr = new FileReader("/home/watsonuser/Downloads/wordnetSynonyms.txt");
        BufferedReader br = new BufferedReader(fr);
        String line = "";
        while ((line = br.readLine()) != null) {
            String[] synset = line.split(",");
            for (String syn : synset)
                builder.add(new CharsRef(synset[0]), new CharsRef(syn), true);
        }
        br.close();
        fr.close();
        try {
            mySynonymMap = builder.build();
        } catch (IOException e) {
            System.out.println("Unable to build synonymMap");
            e.printStackTrace();
        }
    }

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new PorterStemFilter(
                new SynonymFilter(
                        new StopFilter(true,
                                new LowerCaseFilter(
                                        new StandardFilter(
                                                new StandardTokenizer(Version.LUCENE_36, reader))),
                                StopAnalyzer.ENGLISH_STOP_WORDS_SET),
                        mySynonymMap, true));
        return result;
    }
}
Now, if I use the same CustomAnalyzer when querying as well, then if I enter the query
myFieldName: manager
it expands the query with synonyms for manager.
But I want the synonyms to be part of my index only; I don't want my query to be expanded with synonyms.
So, when I removed the SynonymFilter from my CustomAnalyzer only at query time, the query remains
myFieldName: manager
but it fails to retrieve documents that contain the synonyms of manager.
How do we solve this problem?
If you do not have your synonym filter during query processing, then the only terms the query will match are the ones you mapped to during indexing. And you are not showing that part here.
The best way to troubleshoot this is to look at the Admin/Core/Analysis screen (in Solr 4+) and put your text in. It shows what happens to the text after each stage of the index and query analysis chains.
You don't even need to reindex: you can define a bunch of different field types you are trying to figure out and then run the analysis of sample sentences directly against those types.
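Since the code in the question is plain Lucene rather than Solr, the same check can be done in code by printing the tokens the analyzer emits for a sample value; a rough sketch using the Lucene 3.6 TokenStream API:
// Sketch: print the terms CustomAnalyzer produces for a sample input,
// the code-level equivalent of Solr's Analysis screen.
Analyzer analyzer = new CustomAnalyzer();
TokenStream ts = analyzer.tokenStream("myFieldName", new StringReader("manager"));
CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
ts.reset();
while (ts.incrementToken()) {
    System.out.println(term.toString());
}
ts.end();
ts.close();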
