Apache Camel: How to stream large files from AWS?

I have a route as follows. It works for small files, but for large ones (around 6 GB or more), my program runs out of memory. How do I stream the content without buffering it all in memory?
public void configure() throws Exception {
    S3LastModifiedFilter lastModifiedFilter =
            new S3LastModifiedFilter(s3Properties.getLastModifiedWithinSeconds(), s3Properties.getPrefix());

    from(inboundS3Uri)
            .filter().method(lastModifiedFilter, "accept")
            .idempotentConsumer(header(S3Constants.KEY),
                    MemoryIdempotentRepository.memoryIdempotentRepository(inboundCacheSize))
            .convertBodyTo(byte[].class, UTF_8.name())
            .process(new ForHttpMessageProcessor(httpProperties))
            .to(outboundHttpUri);
}
Error:
Caused by: org.apache.camel.TypeConversionException: Error during type conversion from type: java.lang.String to the required type: byte[] with value [Body is instance of java.io.InputStream] due java.lang.OutOfMemoryError: Java heap space
at org.apache.camel.impl.converter.BaseTypeConverterRegistry.createTypeConversionException(BaseTypeConverterRegistry.java:629)
at org.apache.camel.impl.converter.BaseTypeConverterRegistry.mandatoryConvertTo(BaseTypeConverterRegistry.java:190)
at org.apache.camel.impl.MessageSupport.getMandatoryBody(MessageSupport.java:108)
... 23 common frames omitted
Caused by: org.apache.camel.RuntimeCamelException: java.lang.OutOfMemoryError: Java heap space
at org.apache.camel.util.ObjectHelper.wrapRuntimeCamelException(ObjectHelper.java:1774)

It turned out the line convertBodyTo(byte[].class, UTF_8.name()) was the problem. It was trying to buffer the entire content in memory and convert it to a string. I commented it out, and now the code works.
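For reference, here is a sketch of the route with that line removed, so the S3 object body stays an InputStream and is streamed through to the HTTP endpoint (this assumes ForHttpMessageProcessor and the outbound endpoint can handle a streamed body):
public void configure() throws Exception {
    S3LastModifiedFilter lastModifiedFilter =
            new S3LastModifiedFilter(s3Properties.getLastModifiedWithinSeconds(), s3Properties.getPrefix());

    from(inboundS3Uri)
            .filter().method(lastModifiedFilter, "accept")
            .idempotentConsumer(header(S3Constants.KEY),
                    MemoryIdempotentRepository.memoryIdempotentRepository(inboundCacheSize))
            // no convertBodyTo(byte[].class, ...): the body remains the S3 InputStream
            .process(new ForHttpMessageProcessor(httpProperties))
            .to(outboundHttpUri);
}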

Related

Camel-JCIFS large file (94 MB) fails

I am copying files using camel-jcifs. When the files are large, it starts failing. Below are the options I am passing:
smb://server/Reports?bufferSize=4280&delay=60000&delete=true&include=.*.xls&localWorkDirectory=/tmp&moveFailed=.failed&readLock=changed&readLockCheckInterval=60000&readLockLoggingLevel=WARN&readLockMinLength=0&readLockTimeout=3600000
Error message :
Caused by: jcifs.smb.SmbException: Transport1 timedout waiting for response to SmbComWriteAndX[command=SMB_COM_WRITE_ANDX
Any help is much appreciated.
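For context, a minimal sketch of how such an endpoint is typically consumed in a route (the target file endpoint here is an assumption, not from the original post); with localWorkDirectory set, the consumer is meant to spool the remote file to local disk rather than keep it in memory:
public void configure() throws Exception {
    from("smb://server/Reports?bufferSize=4280&delay=60000&delete=true&include=.*.xls"
            + "&localWorkDirectory=/tmp&moveFailed=.failed&readLock=changed"
            + "&readLockCheckInterval=60000&readLockLoggingLevel=WARN"
            + "&readLockMinLength=0&readLockTimeout=3600000")
        .to("file:/data/reports");  // destination chosen only for illustration
}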

SHFileOperation: randomly raises exceptions when deleting files

I am using SHFileOperation() to delete directories from a specific path. It is done in multiple threads and the directory that's deleted is always different.
From time to time, it throws exceptions:
Exception thrown at 0x00007FF8AF5D9D2A (ntdll.dll) in del.exe:
0xC0000008: An invalid handle was specified
and this one:
Exception thrown at 0x00007FF8ACC90A36 (shell32.dll) in del.exe:
0xC0000005: Access violation reading location 0x0000000000000001.
modules:
shell32.dll 00007FF8ACBD0000-00007FF8AE0D8000
ntdll.dll 00007FF8AF530000-00007FF8AF701000
This is the code:
SHFILEOPSTRUCTW tFileOptions = { 0 };
/* Initialize the file options structure for the deletion process */
tFileOptions.pFrom = pwstrPath;
tFileOptions.wFunc = FO_DELETE;
tFileOptions.fFlags = FOF_NOCONFIRMATION | FOF_SILENT | FOF_NOERRORUI;
/* Execute the deletions with the Shell operation */
iResult = SHFileOperationW(&tFileOptions);
if (0 != iResult)
{
    printf("WTF\n");
    goto lbl_cleanup;
}
SHChangeNotify(SHCNE_RMDIR, SHCNF_PATHW, pwstrPath, NULL);
The pwstrPath has a double null terminator at the end.
What can be the reason for these exceptions?
EDIT
Stack trace:
From the stack trace (even without PDB symbols; with them it would be much clearer) it is visible that the exception is not inside the Windows shell itself but in a third-party product, dragext64.dll (not a native Windows image), which implements a Copy Hook Handler. I advise uninstalling it, or disabling it via the registry key
HKEY_CLASSES_ROOT
  Directory
    shellex
      CopyHookHandlers
        MyCopyHandler
          (Default) = {MyCopyHandler CLSID GUID}
and testing after that. I think the exceptions will go away.
It also looks like some other shell extensions have bugs here; search Google for SHELL32_CallFileCopyHooks. For example, there is a bug in TortoiseGit.dll; note shell32.dll!SHELL32_CallFileCopyHooks() in that stack trace as well.
So all of these bugs are inside implementations of the ICopyHook::CopyCallback method.

CamelFileName vs. message body, file operation

I have implemented a bz2 decompressor using the Apache commons-compress library to decompress bz2 files with Camel below a certain point in the directory structure on the file system. I pick up the name of the file to decompress from the CamelFileName header, open the file with my decompressor, and put the decompressed file back into the same directory. It works fine. The process() method that calls the decompressor is copied here in shortened form; this processor is invoked for all relevant files by a Camel route:
public void process(Exchange exchange) throws Exception {
    LOG.info(" #### BZ2Processor ####");
    BZ2 bz2 = new BZ2();
    String camelFileName = exchange.getIn().getHeader("CamelFileName", String.class);
    bz2.uncompress(camelFileName);
}
I think it would have been nicer to take the file from the message body. How would you have implemented it that way?
The body would be of type InputStream, and you can work with this Java type directly. Camel reads the file on demand, i.e. when you try to access it in the route or in your bean:
String text = exchange.getIn().getBody(String.class); //or
byte[] bytes = exchange.getIn().getBody(byte[].class); //or
InputStream is = exchange.getIn().getBody(InputStream.class);
Use whichever of the above fits your needs. As for closing the stream, don't worry: Camel will take care of it.
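For example, a minimal sketch of a processor that decompresses the body as a stream with commons-compress instead of going through the file name (the output-path handling here is a simplifying assumption):
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.utils.IOUtils;

public class StreamingBZ2Processor implements Processor {

    @Override
    public void process(Exchange exchange) throws Exception {
        // Access the body as a stream; Camel opens the file on demand.
        InputStream body = exchange.getIn().getBody(InputStream.class);

        // Derive the output file from the CamelFileName header, e.g. "data.bz2" -> "data".
        // Real code would resolve this against the target directory.
        String fileName = exchange.getIn().getHeader("CamelFileName", String.class);
        Path target = Paths.get(fileName.replaceFirst("\\.bz2$", ""));

        try (InputStream bzIn = new BZip2CompressorInputStream(body);
             OutputStream out = Files.newOutputStream(target)) {
            IOUtils.copy(bzIn, out);  // copies in chunks, never holding the whole file in memory
        }
    }
}
This keeps the decompression streaming end to end, and Camel still closes the original stream for you.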

What makes an invalid core name?

While devising a naming scheme for cores, I tried naming a core "search/live" and received this exception when trying to start Solr:
java.lang.RuntimeException: Invalid core name: search/live
at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:411)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:499)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Evidently using / in a core name makes it invalid. What are the restricted characters that make a core name invalid? I can't seem to find any documentation on this.
The valid characters for a core name appear to be undocumented. According to the source of org.apache.solr.core.CoreContainer#registerCore(String, SolrCore, boolean) in Solr 4.10.4, the only invalid characters are:
Forward-slash: /
Back-slash: \
The following character is also problematic, causing issues in the admin interface and when performing general queries:
Colon: :
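If you want to fail fast before handing a name to Solr, you can mirror that rule in your own code; a small illustrative check (this method is mine, not part of the Solr API):
/** Rejects '/' and '\' outright and flags ':' as problematic, per the observations above. */
public static void validateCoreName(String name) {
    if (name == null || name.isEmpty()) {
        throw new IllegalArgumentException("Core name must not be empty");
    }
    if (name.contains("/") || name.contains("\\")) {
        throw new IllegalArgumentException("Invalid core name: " + name);
    }
    if (name.contains(":")) {
        throw new IllegalArgumentException("Core name should avoid ':' (breaks admin UI and queries): " + name);
    }
}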

Indexing Wikipedia with Solr doesn't work

I'm trying to index the English Wikipedia, around 40 GB, but it's not working. I've followed the tutorial at http://wiki.apache.org/solr/DataImportHandler#Configuring_DataSources and other related Stack Overflow questions like Indexing wikipedia with solr and Indexing wikipedia dump with solr.
I was able to import the Simple English Wikipedia (about 150k documents) and the Portuguese Wikipedia (more than 1 million documents) using the configuration explained in the tutorial. The problem happens when I try to index the English Wikipedia (more than 8 million documents). It gives the following error:
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:410)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:539)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:34)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:254)
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:279)
at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:307)
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:324)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:165)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:248)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:253)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:453)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1520)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:217)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:569)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:705)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:504)
... 6 more
I'm using a MacBook Pro with 4 GB of RAM and more than 120 GB of free space on the HD. I've already tried to change the 256 value in solrconfig.xml, but no success so far.
Could anyone help me, please?
Edited
Just in case someone has the same problem: I used the command java -Xmx1g -jar start.jar, suggested by Cheffe, to solve my problem.
Your Java VM is running out of memory. Give it more memory, as explained in this SO question: Increase heap size in Java
java -Xmx1024m myprogram
Further detail on the -Xmx parameter can be found in the docs; just search for -Xmxsize
Specifies the maximum size (in bytes) of the memory allocation pool in bytes. This value must be a multiple of 1024 and greater than 2 MB. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, g or G to indicate gigabytes. The default value is chosen at runtime based on system configuration. For server deployments, -Xms and -Xmx are often set to the same value. For more information, see Garbage Collector Ergonomics at http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html
The following examples show how to set the maximum allowed size of allocated memory to 80 MB using various units:
-Xmx83886080
-Xmx81920k
-Xmx80m
The -Xmx option is equivalent to -XX:MaxHeapSize.
If you have Tomcat 6, you can increase the Java heap size in the file
/etc/default/tomcat6
change the -Xmx parameter in the line (e.g. from -Xmx128m to -Xmx256m):
JAVA_OPTS="-Djava.awt.headless=true -Xmx256m -XX:+UseConcMarkSweepGC"
During the import, watch the Admin Dashboard web page, where you can see actual JVM-memory allocated.
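If you also want to double-check from inside the JVM that the new limit took effect, a quick illustrative snippet is enough:
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() roughly reflects the -Xmx setting
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}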
