I want to use Camel 2.12.1 to decrypt some potentially large pgp files. The following flow results in an out of memory exception and the call stack shows that the PGPDataFormat.unmarshal() function is trying to build a ByteArray which is destined to fail if the file is large. Is there a way to pass streams around during unmarshalling?
My route:
from("file:///home/cps/camel/sftp-in?"
+ "include=.*&" // find files using this pattern
+ "move=/home/cps/camel/sftp-archive&" // after done adding records to queue, move file to archive
+ "delay=5000&"
+ "readLock=rename&" // readLock parameters prevent picking up file which is currently changing
+ "readLockCheckInterval=5000")
.choice()
.when(header(Exchange.FILE_NAME_ONLY).regex(".*pgp$|.*PGP$|.*gpg$|.*GPG$")).to("direct:decrypt")
.otherwise()
.to("file:///home/cps/camel/input");
from("direct:decrypt").unmarshal().pgp("file:///home/cps/.gnupg/secring.gpg", "developer", "set42now")
.setHeader(Exchange.FILE_NAME).groovy("request.headers.get('CamelFileNameOnly').replace('.gpg', '')")
.to("file:///home/cps/camel/input/")
.to("log:done");
The exception which shows the converter trying to create a ByteArray:
java.lang.OutOfMemoryError: Java heap space
at org.apache.commons.io.output.ByteArrayOutputStream.needNewBuffer(ByteArrayOutputStream.java:128)
at org.apache.commons.io.output.ByteArrayOutputStream.write(ByteArrayOutputStream.java:158)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1026)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:218)
at org.apache.camel.converter.crypto.PGPDataFormat.unmarshal(PGPDataFormat.java:238)
at org.apache.camel.processor.UnmarshalProcessor.process(UnmarshalProcessor.java:65)
Try with 2.13 or 2.12-SNAPSHOT as we have improved data format and streaming recently. So likely to be better in next release.
Related
Background :
I have been trying to setup BATCH + STREAMING in the same flink application which is deployed on kinesis analytics runtime. The STREAMING part works fine, but I'm having trouble adding support for BATCH.
Flink : Handling Keyed Streams with data older than application watermark
Apache Flink : Batch Mode failing for Datastream API's with exception `IllegalStateException: Checkpointing is not allowed with sorted inputs.`
The logic is something like this :
The logic is something like this :
streamExecutionEnvironment.setRuntimeMode(RuntimeExecutionMode.BATCH);
streamExecutionEnvironment.fromSource(FileSource.forRecordStreamFormat(new TextLineFormat(), path).build(),
WatermarkStrategy.noWatermarks(),
"Text File")
.process(process function which transforms input)
.assignTimestampsAndWatermarks(WatermarkStrategy
.<DetectionEvent>forBoundedOutOfOrderness(orderness)
.withTimestampAssigner(
(SerializableTimestampAssigner<Event>) (event, l) -> event.getEventTime()))
.keyBy(keyFunction)
.window(TumblingEventWindows(Time.of(x days))
.process(processWindowFunction);
On doing this I'm getting the below exception :
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:582)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:562)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:764)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:571)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_90bea66de1c231edf33913ecd54406c1_(1/1) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)
... 10 more
Caused by: java.io.IOException: Failed to acquire shared cache resource for RocksDB
at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:306)
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:426)
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:90)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
... 12 more
Caused by: java.lang.IllegalArgumentException: The fraction of memory to allocate should not be 0. Please make sure that all types of managed memory consumers contained in the job are configured with a non-negative weight via `taskmanager.memory.managed.consumer-weights`.
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:160)
at org.apache.flink.runtime.memory.MemoryManager.validateFraction(MemoryManager.java:672)
at org.apache.flink.runtime.memory.MemoryManager.computeMemorySize(MemoryManager.java:653)
at org.apache.flink.runtime.memory.MemoryManager.getSharedMemoryResourceForManagedMemory(MemoryManager.java:521)
at org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:302)
... 17 more
Seems like kinesis-analytics does not allow clients to define a flink-conf.yaml file to define taskmanager.memory.managed.consumer-weights. Is there any way around this ?
It's not clear to me what the underlying cause of this exception is, nor how to make batch processing work on KDA.
You can try this (but I'm not sure KDA will allow it):
Configuration conf = new Configuration();
conf.setString("taskmanager.memory.managed.consumer-weights", "put-the-value-here");
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment(conf);
I am new to Flink Streaming framework and I am trying to understand the components and the flow. I am trying to run the basic wordcount example using the DataStream. I am trying to run the code on my IDE. The code runs with no issues when I feed data using the collection as
DataStream<String> text = env.fromElements(
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",z
"Or to take arms against a sea of troubles,"
);
But, it fails everytime when I am trying to read data either from the socket or from a file as below:
DataStream<String> text = env.socketTextStream("localhost", 9999);
or
DataStream<String> text = env.readTextFile("sample_file.txt");
For both the socket and textfile, I am getting the following error:
Exception in thread "main" java.lang.reflect.InaccessibleObjectException: Unable to make field private final byte[] java.lang.String.value accessible: module java.base does not "opens java.lang" to unnamed module
Flink currently (as of Flink 1.14) only supports Java 8 and Java 11. Install a suitable JDK and try again.
My requirement is, I get input xml and based on some condition checking(using choice) I need to send into 2 different files. As I'm getting big file(100-400MB) I'm sending in stream mode(by enable streaming in file and datamapper components).
It is working fine for small size input xml(10-20MB). But when I give large input xml. Condition checking and XML to CSV conversion is working fine but while writing CSVB-data I'm getting error message.,
INFO 2015-09-08 12:03:49,227 [[simplebatch_1].simplebatchFlow.stage1.02] org.mule.api.processor.LoggerMessageProcessor: default logger java.io.PipedInputStream#1bca8e6
INFO 2015-09-08 12:03:49,258 [[simplebatch_1].File1.dispatcher.01] org.mule.lifecycle.AbstractLifecycleManager: Initialising: 'File1.dispatcher.29118412'. Object is: FileMessageDispatcher
INFO 2015-09-08 12:03:49,258 [[simplebatch_1].File1.dispatcher.01] org.mule.lifecycle.AbstractLifecycleManager: Starting: 'File1.dispatcher.29118412'. Object is: FileMessageDispatcher
INFO 2015-09-08 12:03:49,258 [[simplebatch_1].File1.dispatcher.01] org.mule.transport.file.FileConnector: Writing file to: D:\MulePOC's\output\myoutput1
ERROR 2015-09-08 12:03:54,999 [XML_READER0_0] org.jetel.graph.Node: java.lang.OutOfMemoryError: Java heap space
ERROR 2015-09-08 12:03:55,000 [WatchDog_0] org.jetel.graph.runtime.WatchDog: Component [XML READER:XML_READER0] finished with status ERROR.
Java heap space
Please suggest me on this., Thanks..,
You need increase your JVM memory for Mule. You can found the config file in $MULE_HOME/conf/wrapper.conf
You will found something like this:
# Increase Permanent Generation Size from default of 64m
# Increase this value if you get "Java.lang.OutOfMemoryError: PermGen space error"
# This property is not used when running java 8 and may cause a warning.
wrapper.java.additional.7=-XX:PermSize=256m
wrapper.java.additional.8=-XX:MaxPermSize=256m
# GC settings
wrapper.java.additional.9=-XX:+HeapDumpOnOutOfMemoryError
wrapper.java.additional.10=-XX:+AlwaysPreTouch
wrapper.java.additional.11=-XX:+UseParNewGC
wrapper.java.additional.12=-XX:NewSize=512m
wrapper.java.additional.13=-XX:MaxNewSize=512m
wrapper.java.additional.14=-XX:MaxTenuringThreshold=8
You can change this configurations as you like.
We are using camel 2.13.2 - I have a multicast route with an AggregationStrategy.
And in each multicast branch, we have a custom camel component that returns huge data (around 4 MB) and writes to Stream Cache (Cached Output Stream) and we need to aggregate the data in the multicast (Aggregation Strategy).
In the Aggregation strategy, I need to do XPath evaluation using camel XPathBuilder.
Hence, I try to read the body and convert from StreamCache to byte[] to avoid 'Error during type conversion from type: org.apache.camel.converter.stream.InputStreamCache.' in the XPathBuilder.
When I try to read the body in the beginning of the Aggregation Strategy, I get the following error.
*/tmp/camel/camel-tmp-4e00bf8a-4a42-463a-b046-5ea2d7fc8161/cos6047774870387520936.tmp (No such file or directory), cause: FileNotFoundException:/tmp/camel/camel-tmp-4e00bf8a-4a42-463a-b046-5ea2d7fc8161/cos6047774870387520936.tmp (No such file or directory).
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.(FileInputStream.java:138)
at org.apache.camel.converter.stream.FileInputStreamCache.createInputStream(FileInputStreamCache.java:123) at org.apache.camel.converter.stream.FileInputStreamCache.getInputStream(FileInputStreamCache.java:117)
at org.apache.camel.converter.stream.FileInputStreamCache.writeTo(FileInputStreamCache.java:93)
at org.apache.camel.converter.stream.StreamCacheConverter.convertToByteArray(StreamCacheConverter.java:102)
at com.sap.it.rt.camel.aggregate.strategies.MergeAtXPathAggregationStrategy.convertToByteArray(MergeAtXPathAggregationStrategy.java:169)
at com.sap.it.rt.camel.aggregate.strategies.MergeAtXPathAggregationStrategy.convertToXpathCompatibleType(MergeAtXPathAggregationStrategy.java:161)
*
Following is the line of code where it is throwing an error:
Object body = exchange.getIn().getBody();
if( body instanceof StreamCache){
StreamCache cache = (StreamCache)body;
xml = new String(convertToByteArray(cache,exchange));
exchange.getIn().setBody(xml);
}
By disabling stream cache to write to file by setting a threshold of 10MB in multicast related routes, we were able to work with the aggregation strategy. But we do not want to do that, as we may have incoming data that maybe bigger.
<camel:camelContext id="multicast_xml_1" streamCache="true">
<camel:properties>
<camel:property key="CamelCachedOutputStreamCipherTransformation" value="RC4"/>
<camel:property key="CamelCachedOutputStreamThreshold" value="100000000"/>
</camel:properties>
....
</camel:camelContext>
Note: The FileNotFound issue does not appear if we have the StreamCache based camel component in the route with other processors, but without Multicast + Aggregation.
After debugging, I could understand the issue with aggregating huge data from StreamCache with MulticastProcessor.
In MulticastProcessor.java: doProcessParallel() is called and on completion of the branch exchange of multicast, the CachedOutputStream deletes / cleans up the temporary file.
This happens even before the multicast branch exchange reaches the aggregation Strategy, which tries to read the data from the branch exchange. In case of huge data in StreamCache, the temporary file is already deleted, leading to FileNotFound issues.
public CachedOutputStream(Exchange exchange, boolean closedOnCompletion) {
this.strategy = exchange.getContext().getStreamCachingStrategy();
currentStream = new CachedByteArrayOutputStream(strategy.getBufferSize());
if (closedOnCompletion) {
// add on completion so we can cleanup after the exchange is done such as deleting temporary files
exchange.addOnCompletion(new SynchronizationAdapter() {
#Override
public void onDone(Exchange exchange) {
try {
if (fileInputStreamCache != null) {
fileInputStreamCache.close();
}
close();
} catch (Exception e) {
LOG.warn("Error deleting temporary cache file: " + tempFile, e);
}
}
#Override
public String toString() {
return "OnCompletion[CachedOutputStream]";
}
});
}
}
public void close() throws IOException {
currentStream.close();
cleanUpTempFile();
}
I was able to circumvent the issue, if I try to set closedOnCompletion= false, while writing to CachedOutputStream in any component in any Multicast branch.
But this is a leaky solution, because the streamcache temporary file(s) may then never get cleaned up... hence I try to close + clean up the cachestream, after reading the data in the AggregationStrategy.
Can the MulticastProcessor be adjusted so that the multicast branch exchanges reach 'completion' status only, after they have been aggregated at the end of multicast?
Please help / advise on the issue, as I am new to using camel Multicast.
Thanks,
Lakshmi
I have similar exception thrown when trying to send larger than 1MB JSON response to Restlet request (yes, I know 1MB JSON is too big):
java.io.FileNotFoundException: C:\Users\me\AppData\Local\Temp\camel\camel-tmp-7ad6e098-538d-4d4c-9357-2b7addb1f19d\cos6725022584818060586.tmp (The system cannot find the file specified)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.apache.camel.converter.stream.FileInputStreamCache.createInputStream(FileInputStreamCache.java:123)
at org.apache.camel.converter.stream.FileInputStreamCache.getInputStream(FileInputStreamCache.java:117)
at org.apache.camel.converter.stream.FileInputStreamCache.read(FileInputStreamCache.java:112)
at java.io.InputStream.read(InputStream.java:170)
at java.io.InputStream.read(InputStream.java:101)
at org.restlet.engine.io.BioUtils.copy(BioUtils.java:81)
at org.restlet.representation.InputRepresentation.write(InputRepresentation.java:148)
at org.restlet.engine.adapter.ServerCall.writeResponseBody(ServerCall.java:510)
at org.restlet.engine.adapter.ServerCall.sendResponse(ServerCall.java:454)
at org.restlet.ext.servlet.internal.ServletCall.sendResponse(ServletCall.java:426)
at org.restlet.engine.adapter.ServerAdapter.commit(ServerAdapter.java:196)
at org.restlet.engine.adapter.HttpServerHelper.handle(HttpServerHelper.java:153)
at org.restlet.ext.servlet.ServerServlet.service(ServerServlet.java:1089)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:102)
Same workaround works for me:
getContext().getProperties().put(CachedOutputStream.THRESHOLD, "" + THREE_MEGABYTE_TRESHOLD_BEFORE_FILE_CACHE);
I don't use multicast in this route, just plain
restlet request -> Service -> Jackson marshall => error
I use Camel 2.14.0 & Restlet 2.2.2 with JDK 7 and Spring-boot 1.0.2 / Jetty
This Camel reverse proxy - no response stream caching might be related to my issue.
I've fairly simple looking route
sftp://hostname:22//incoming/folder/location/?username=username&password=xxxxx
&localWorkDirectory=/tmp&readLock=changed&readLockCheckInterval=2000
&move=processed/$simple{date:now:yyyy}/$simple{date:now:MM}/$simple{date:now:dd}${file:name}
&consumer.delay=450000&stepwise=false&streamDownload=true&disconnect=true
I also have an onException clause
onException(ValidationException.class)
.handled(true)
.logStackTrace(true)
.filter(header("VALIDATION_ERROR").isEqualTo(true))
.choice()
.when(header("CamelFileName").contains("Param1"))
.to(sftp://hostname:22//One/error/folder?password=xxxxxx&username=username)
.when(header("CamelFileName").contains("Param2"))
.to(sftp://hostname:22//Two/error/folder?password=xxxxxx&username=username)
.endChoice();
When I have single file, the route seems to work as expected. When more than one file and exception occurs, I get many different exceptions like
org.apache.camel.component.file.GenericFileOperationFailedException: Cannot list directory: incoming/folder/location
Caused by: java.lang.IndexOutOfBoundsException
I tried using all the attributes mentioned in the route viz. streamDownload, stepwise, readLock, localWorkDirectory et al. However, the error handling while multiple files is not working. I see the first file getting processed. However, it doesn't move to the processed folder once the exception occurs and then incoming/folder/location becomes non listable. I tried using continued(true) as well instead of handled(true)
The problem was with multiple files being handled in the same exchange. On exception, the route was trying to FTP back the error file on the same server. The solution was to split body into multiple exchanges so that each file has its own exchange and process them separately.
from(sftp://hostname:22//incoming/folder/location/?username=username&password=xxxxx
&localWorkDirectory=/tmp&readLock=changed&disconnect=true&stepwise=false
&move=processed/$simple{date:now:yyyy}/$simple{date:now:MM}/$simple{date:now:dd}${file:name}
&consumer.delay=450000).split(body()).processRef("incomingProcessor").end();