Camel reverse proxy - no response stream caching - apache-camel

I am trying to implement a memory efficient http reverse proxy that is only working on streams. The Jetty consumer places the input stream into the exchange and I can hook it up with a http producer to forward the request. No problem there.
However, all HTTP producers that I am aware of (Jetty, http4, netty-http) read the response stream into heap memory and place its contents into the exchange in some form or another, instead of a handle to the stream. None of them seems to offer an option to pass the stream through untouched.
I found this thread which describes the same problem and also suggests a solution. But looking at the code of the http4 HttpProducer in Camel 2.13.1, it does not look like the proposed change made it into the Camel code base after all.
Is there any way to achieve the stream-only approach with Camel? With a minimal memory footprint, I would like to do something along these lines:
<route id="reverse_proxy" streamCache="false">
<from ref="jetty.http.server"/>
<bean ref="streamHolder" method="enableCaching"/>
<bean ref="streamHolder" method="parsePayloadHeaderInfoAndDoStuff"/>
<bean ref="streamHolder" method="resetStream"/>
<to ref="http.client"/> <!-- Register completion synchronization hook to close stream. -->
<bean ref="streamHolder" method="enableCaching"/>
<bean ref="streamHolder" method="parsePayloadResponseHeaderAndDoStuff"/>
<bean ref="streamHolder" method="resetStream"/>
</route>
EDIT - Additional info on where exactly the input stream ends up in memory:
http4: Everything happens in org.apache.camel.component.http4.HttpProducer::process() -> populateResponse(..) -> extractResponseBody(..) -> doExtractResponseBodyAsStream(); and here the original stream is copied into an instance of CachedOutputStream.
Jetty: org.eclipse.jetty.client.AsyncHttpConnection::handle() -> org.eclipse.jetty.http.HttpParser::parseNext() will fill a byte array in org.eclipse.jetty.client.ContentExchange which is a CachedExchange which is a HttpExchange.
netty-http: Builds a pipeline that assembles the HttpResponse content as a composite ChannelBuffer. The wrapped channel buffers make up the complete response stream.
I have debugged all three clients and did not stumble across a branch not taken that would leave me with the original input stream as the exchange body.
This is reproducible with a route as simple as this:
<camelContext id="pep-poc">
<endpoint id="jetty.http.server" uri="jetty:http://{{http.host.server}}:{{http.port.server}}/my/frontend?disableStreamCache=true"/>
<endpoint id="http.client" uri="jetty:http://{{http.host.client}}:{{http.port.client}}/large_response.html?bridgeEndpoint=true&amp;throwExceptionOnFailure=false&amp;disableStreamCache=true"/>
<route id="reverse_proxy" startupOrder="10" streamCache="false">
<from ref="jetty.http.server"/>
<to ref="http.client"/>
</route>
</camelContext>
I have an Apache2 server return a 750 MB file as large_response.html.
EDIT 2
Indeed this is a problem with all available HTTP producers. See this thread on the Camel mailing list and the corresponding JIRA ticket.

They do not read the stream into memory unless you access the message body and tell Camel to convert it to an in-memory type such as String.
See this cookbook example of how to build a stream-based proxy:
http://camel.apache.org/how-to-use-camel-as-a-http-proxy-between-a-client-and-server.html

Related

Locking file while appending data on Camel

I am writing 2 routes to process files in a directory. The files could have any name, but I need 2 routes because the processing is complex.
First route:
<route id="Init">
<from uri="file:{{file.path}}?move=.done&amp;moveFailed=.error&amp;readLock=changed&amp;readLockCheckInterval=1500&amp;charset=UTF-8"/>
<transacted/>
<split streaming="true" stopOnException="true" shareUnitOfWork="true" parallelProcessing="false">
<tokenize token="\r\n"/>
<choice>
<when>
<simple>${body.substring(0,4)} == 4000</simple>
[...]
<to uri="file:{{file.path}}/tmp?fileName=${date:now:yyyyMMddss}.txt&amp;fileExist=append&amp;charset=UTF-8"/>
</when>
<when>
<simple>${body.substring(0,4)} == 4002</simple>
[...]
<to uri="file:{{file.path}}/tmp?fileName=${date:now:yyyyMMddss}.txt&amp;fileExist=append&amp;charset=UTF-8"/>
</when>
</choice>
</split>
</route>
Second route, which consumes the file produced by the first route:
<route id="End">
<from uri="file:{{file.path}}/tmp?delete=true&amp;moveFailed=.error&amp;readLock=changed&amp;readLockCheckInterval=1500&amp;charset=UTF-8"/>
<transacted/>
<split streaming="true" stopOnException="true" shareUnitOfWork="true" parallelProcessing="false">
<tokenize token="\r\n4000"/>
[...]
<to uri="[...]"/>
</split>
</route>
I am trying to make sure the file produced by route Init won't be consumed by route End until Init has finished processing the first file.
I thought of using a temp file extension and then an exclude on the second route, but that doesn't work with fileExist.
Any ideas?
Thanks!
Use a done file
You need a mechanism to make sure the second route only consumes files that have been completely processed by the first route.
A simple method is to let the first route emit a done file as a signal to the second route that the file has been processed completely and is ready for pickup.
To use a done file, add the doneFileName parameter to the first route so the done file is written when processing completes, and add it to the second route using the same file name pattern.
For more details, read the section "Using 'done' Files" in the Camel File component documentation.
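The done-file idea above can be sketched roughly as follows. This is a simplified illustration, not a drop-in fix: because the first route appends line by line with fileExist=append, the sketch assumes the marker should be written once, after the split has finished, instead of setting doneFileName on the appending producer (which would likely emit the marker after every single write). The output name `${file:name}.out`, the marker body `done`, and the `log:end` endpoint are hypothetical and replace the date-based name and the elided steps from the question.

```xml
<route id="Init">
  <from uri="file:{{file.path}}?move=.done&amp;moveFailed=.error&amp;charset=UTF-8"/>
  <split streaming="true" stopOnException="true">
    <tokenize token="\r\n"/>
    <!-- hypothetical output name derived from the input file name -->
    <to uri="file:{{file.path}}/tmp?fileName=${file:name}.out&amp;fileExist=append&amp;charset=UTF-8"/>
  </split>
  <!-- reached only after the split completed without an exception:
       write a small marker file next to the output file -->
  <setBody>
    <constant>done</constant>
  </setBody>
  <to uri="file:{{file.path}}/tmp?fileName=${file:name}.out.done"/>
</route>

<route id="End">
  <!-- a file X is only picked up once X.done exists in the same directory -->
  <from uri="file:{{file.path}}/tmp?doneFileName=${file:name}.done&amp;delete=true&amp;moveFailed=.error&amp;charset=UTF-8"/>
  <to uri="log:end"/>
</route>
```

With this layout the second route ignores the output file while the first route is still appending to it, since the marker does not exist yet.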
You cannot use readLock=changed with the file component as it's only available for FTP/SFTP from Camel 2.8 onwards.
changed is using file length/modification timestamp to detect whether the file is currently being copied or not. Will at least use 1 sec. to determine this, so this option cannot consume files as fast as the others, but can be more reliable as the JDK IO API cannot always determine whether a file is currently being used by another process. The option readLockCheckInterval can be used to set the check frequency. This option is only avail for the FTP component from Camel 2.8 onward. Note: from Camel 2.10.1 onward the FTP option fastExistsCheck can be enabled to speedup this readLock strategy, if the FTP server support the LIST operation with a full file name (some servers may not).
Try one of the other mechanisms such as markerFile, fileLock, or rename

Exchange id in camel request ends with even number

I am using Apache Camel in OSGI scenario using Karaf in version 2.15.1. I am using the exchange.getExchangeId() to print the exchange id in a request/reply. The exchange pattern is set to InOnly. The route looks like this:
<route id="ip_client_rpc">
<from uri="restlet:http://localhost:7070/lsp/patron/id?restletMethod=POST&amp;synchronous=true"/>
<to uri="log:${headers}"/>
<setExchangePattern pattern="InOnly"/>
<process ref="rabbit_client"/>
<to uri="log:${headers}"/>
</route>
However when I print the exchange id sent to the rabbitmq queue it always ends with an even number.
Request from client:ID-VirtualDev-49301-1443430754519-5-6
Request from client:ID-VirtualDev-49301-1443430754519-5-8
Request from client:ID-VirtualDev-49301-1443430754519-5-10
Request from client:ID-VirtualDev-49301-1443430754519-5-12
Request from client:ID-VirtualDev-49301-1443430754519-5-14
Is there a reason why the final digit is always even? Is there another exchange being created that I am missing?
Thanks
Camel uses the same id generator to generate unique ids for different things, so it's just by chance that the number is even in this case. It could be that a breadcrumb id or message id was also generated and took the odd number.

Apache Camel - from netty to file

I'm using the netty component for socket communication between two systems, request and response.
This is the route:
<from uri="netty:tcp://localhost:61616?encoder=#encoder"/>
<to uri="netty:tcp://localhost:61618?decoder=#decoder"/>
<log message="Receive ${body}"/>
<to uri="file://data?fileName=data2&amp;charset=utf-8"/>
Everything works fine. The data I send is a buffer type, as is the response received. I can see this data as a String using the log ${body}, but there's nothing in the file where this data is supposed to be stored.
I'm guessing that Camel uses a converter (from buffer to String) to log the body as plain text, but why does it not save anything to the file using a default converter?
I'd appreciate any comments on how to resolve this. Thank you!
Since your payload is a ByteBuffer, you need to explicitly convert it to either String or byte[]:
<from uri="netty:tcp://localhost:61616?encoder=#encoder"/>
<to uri="netty:tcp://localhost:61618?decoder=#decoder"/>
<convertBodyTo type="byte[]"/>
<log message="Receive ${body}"/>
<to uri="file://data?fileName=data2&amp;charset=utf-8"/>
You can even use type="java.lang.String"
Please refer to the link http://camel.apache.org/type-converter.html
Hope it helps...

Shutdown of camel routes with time consuming process

We have a camel route that looks at a file and processes potentially hundreds of records on this file, almost like a batch routine (yet there will only be one message in camel). Thus the message will take potentially minutes or maybe hours to complete. We want to shut down the queue once this message (and any others waiting) are complete.
We have the following to consider:
The shutdown strategy defines the time to wait for a route to stop before a forced shutdown
<bean id="shutdown" class="org.apache.camel.impl.DefaultShutdownStrategy">
<property name="timeout" value="#[bpf.defaultShutdownStrategy.timeout]"/>
</bean>
The route has a parameter shutdownRunningTask="CompleteAllTasks" which should wait until all messages have been processed.
We are not sure which is going to take precedence. Once the timeout is exceeded, the shutdown is not graceful but forced, and in our scenario we are likely to exceed any timeout because we cannot predict how long processing will take.
Any ideas/considerations?
Thanks in advance.
You should look at the onCompletion functionality. It adds a new route that runs in a separate thread when the Exchange is complete.
Here are some examples from the Camel documentation:
Java DSL
// define a global on completion that is invoked when the exchange is complete
onCompletion().to("log:global").to("mock:sync");
from("direct:start")
.process(new MyProcessor())
.to("mock:result");
XML DSL
<!-- this is a global onCompletion route that is invoked when any exchange is complete
as a kind of after callback -->
<onCompletion>
<to uri="log:global"/>
<to uri="mock:sync"/>
</onCompletion>
<route>
<from uri="direct:start"/>
<process ref="myProcessor"/>
<to uri="mock:result"/>
</route>
Then, here is documentation on how to stop a route in Camel.
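One way the two pieces might be combined is sketched below, assuming the Control Bus component (available from Camel 2.11) is acceptable for stopping the route. The route id `batchRoute`, the `{{batch.inbox}}` placeholder, and the `myProcessor` bean are hypothetical. The stop is requested asynchronously so that the route does not block while trying to stop itself from within its own completion callback.

```xml
<route id="batchRoute" shutdownRunningTask="CompleteAllTasks">
  <from uri="file:{{batch.inbox}}"/>
  <!-- route-scoped completion callback, runs after the exchange has finished -->
  <onCompletion>
    <!-- ask Camel to stop this route; async=true avoids waiting on the calling thread -->
    <to uri="controlbus:route?routeId=batchRoute&amp;action=stop&amp;async=true"/>
  </onCompletion>
  <!-- the long-running batch processing -->
  <process ref="myProcessor"/>
</route>
```

Note that this sketch requests the stop after the first completed exchange. If further messages may be waiting, an additional check before the control bus call would be needed, and the shutdown strategy timeout still applies to whatever is in flight at that moment.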

Throttling based on content

I would like to know if it's possible with Camel to do throttling based on the content of the exchange.
The situation is the following: I have to call a web service via SOAP. Among the parameters sent to that web service there is a customerId. The problem is that the web service sends back an error if there is more than 1 request per minute for a given customerId.
I'm wondering if it would be possible to implement throttling per customerId with Camel. So the throttling should not be implemented for all messages but only for messages with the same customerId.
Let me know how I could implement this or if I need to clarify my question.
ActiveMQ Message Groups is designed to handle this case. So, if you can introduce a JMS queue hop in your route, then just set the JMSXGroupID header to the customerId. Then in another route, you can consume from this queue and send to your web service to get the behavior you described.
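A minimal sketch of that queue hop, assuming the customer id is already available as a message header named `customerId` and that an `activemq` component is configured. The `direct:requests` endpoint, the queue name `pending`, and `mock:endpoint` are hypothetical placeholders. The ActiveMQ property is spelled `JMSXGroupID`.

```xml
<route>
  <from uri="direct:requests"/>
  <!-- messages with the same group id are delivered to the same consumer, in order -->
  <setHeader headerName="JMSXGroupID">
    <simple>${header.customerId}</simple>
  </setHeader>
  <to uri="activemq:queue:pending"/>
</route>

<route>
  <from uri="activemq:queue:pending"/>
  <!-- placeholder for the call to the web service -->
  <to uri="mock:endpoint"/>
</route>
```

This only groups and orders messages per customer. It does not by itself limit how often a given customer's messages are forwarded.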
also see http://camel.apache.org/parallel-processing-and-ordering.html for more information...
While ActiveMQ Message Groups would definitely address the parallel processing of unique customer ID's, in my assessment Claus is correct that introducing a throttle for each unique group represents an unimplemented feature for Camel/ActiveMQ.
Message Groups alone will not meet the SLA described. While each group of messages (correlated by the customer ID) will be processed in order with one thread per group, as long as requests take less than a minute to receive a response, the requirement of one request per minute per customer would not be enforced.
That said, I would be very interested to know if it would be possible to combine Message Groups and a throttle strategy in a way that would simulate the feature request in JIRA. My attempts so far have failed. I was thinking something along these lines:
<route>
<from uri="activemq:pending?maxConcurrentConsumers=10"/>
<throttle timePeriodMillis="60000">
<constant>1</constant>
<to uri="mock:endpoint"/>
</throttle>
</route>
However, the throttle seems to be applied to the entire set of requests moving to the endpoint, and not to each individual consumer. I have to admit, I was a bit surprised to find that behavior. My expectation was that the throttle would apply to each consumer individually, which would satisfy the SLA in the original question, provided that the messages include the customer ID in the JMSXGroupID header.
I came across a similar problem and finally came up with the solution described here.
My assumptions are:
Order of messages is not important (though it can be solved by re-sequencer)
Total volume of messages per customer ID is not great so the runtime is not saturated.
The solution approach:
Run aggregator for 1 minute while using customerID to assemble messages with the same customer ID into a list
Use Splitter to split the list into individual messages
Send the first message from the splitter to the actual service
Re-route the rest of the list back into the aggregator.
Java DSL version is a bit easier to understand:
final AggregationStrategy aggregationStrategy = AggregationStrategies.flexible(Object.class)
.accumulateInCollection(ArrayList.class);
from("direct:start")
.log("Receiving ${body}")
.aggregate(header("customerID"), aggregationStrategy).completionTimeout(60000)
.log("Aggregate: releasing ${body}")
.split(body())
.choice()
.when(header(Exchange.SPLIT_INDEX).isEqualTo(0))
.log("*** Processing: ${body}")
.to("mock:result")
.otherwise()
.to("seda:delay")
.endChoice();
from("seda:delay")
.delay(0)
.to("direct:start");
Spring XML version looks like the following:
<!-- this is our aggregation strategy defined as a spring bean -->
<!-- see http://stackoverflow.com/questions/27404726/how-does-one-set-the-pick-expression-for-apache-camels-flexibleaggregationstr -->
<bean id="_flexible0" class="org.apache.camel.util.toolbox.FlexibleAggregationStrategy"/>
<bean id="_flexible2" factory-bean="_flexible0" factory-method="accumulateInCollection">
<constructor-arg value="java.util.ArrayList" />
</bean>
<camelContext xmlns="http://camel.apache.org/schema/spring">
<route>
<from uri="direct:start"/>
<log message="Receiving ${body}"/>
<aggregate strategyRef="_flexible2" completionTimeout="60000" >
<correlationExpression>
<xpath>/order/@customerID</xpath>
</correlationExpression>
<log message="Aggregate: releasing ${body}"/>
<split>
<simple>${body}</simple>
<choice>
<when>
<simple>${header.CamelSplitIndex} == 0</simple>
<log message="*** Processing: ${body}"/>
<to uri="mock:result"/>
</when>
<otherwise>
<log message="--- Delaying: ${body}"/>
<to uri="seda:delay" />
</otherwise>
</choice>
</split>
</aggregate>
</route>
<route>
<from uri="seda:delay"/>
<to uri="direct:start"/>
</route>
</camelContext>
