Apache Camel Aggregation - out of memory issue

I have to create a large XML file using Camel. I build it from smaller messages and aggregate them all into the final XML with the Camel aggregator.
My issue is that it throws an out-of-memory error when I create a large XML file. Smaller XML files are created without any issues.
I looked at persistent aggregation repositories such as LevelDB, but they help with recovering the aggregated message after a crash rather than with memory usage. We tried the LevelDB repository with the aggregator as well, and it did not resolve the memory issue.
I cannot simply append, because the output is an XML file. If it were a CSV, I could just append to an existing file.
Can anyone please help? Thanks in advance.
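One way to look at the "cannot append to XML" constraint: the root element can be treated as a wrapper that is opened once, with each fragment appended as it arrives and the closing tag written at the end, so the whole document never has to sit in memory. Below is a minimal sketch of that idea in plain Java, outside Camel. The class `XmlFragmentFile` and its methods are hypothetical names, and it assumes every fragment is already well-formed XML without its own XML declaration.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a large XML file piece by piece.
final class XmlFragmentFile implements AutoCloseable {
    private final Writer out;
    private final String rootName;

    // Opens the target file and writes the declaration plus the opening root tag.
    XmlFragmentFile(Path target, String rootName) throws IOException {
        this.out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        this.rootName = rootName;
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<" + rootName + ">\n");
    }

    // Appends one fragment; assumed to be well-formed XML with no declaration.
    void append(String fragment) throws IOException {
        out.write(fragment);
        out.write('\n');
    }

    // Writes the closing root tag and closes the file.
    @Override
    public void close() throws IOException {
        out.write("</" + rootName + ">\n");
        out.close();
    }
}
```

Only one fragment is held in memory at a time, so heap usage depends on the fragment size rather than on the size of the final file. How this would be wired into a route (for example, where the opening and closing tags get written) is left open here.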

Related

How to write to different files based on content for batch processing in Flink?

I am trying to process some files on HDFS and write the results back to HDFS as well. The files are already prepared before the job starts. I want to write to different paths and files based on the file content. I am aware that BucketingSink is provided to achieve this in Flink streaming. However, it seems that DataSet does not have a similar API. I have found some related Q&As on Stack Overflow. Now I think I have two options:
Use Hadoop API: MultipleTextOutputFormat or MultipleOutputs;
Read files as stream and use BucketingSink.
My question is how to choose between them, or is there another solution? Any help is appreciated.
EDIT: This question may be a duplicate of this.
We faced the same problem. We were also surprised that DataSet does not support addSink().
I recommend not switching to streaming mode. You might give up some optimizations (e.g. memory pools) that are available in batch mode.
You may have to implement your own OutputFormat to do the bucketing.
Rather than switching to streaming, you can extend OutputFormat[YOUR_RECORD] (or RichOutputFormat[]) and still use a BucketAssigner[YOUR_RECORD, String] inside it to open, write to, and close the output streams.
That's what we did and it's working great.
I hope Flink will support this in batch mode soon.
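The bucketing logic itself is small: derive a bucket from each record, keep one open writer per bucket, and close them all at the end. Here is a rough sketch of that logic in plain Java with no Flink dependency. `BucketingWriter`, the `bucketOf` function, and the `part-0` file name are all made-up names for illustration; in a custom OutputFormat the same steps would presumably be spread across its open, writeRecord and close methods.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the bucketing part of a custom output format.
final class BucketingWriter<T> implements AutoCloseable {
    private final Path baseDir;
    private final Function<T, String> bucketOf;
    private final Map<String, BufferedWriter> writers = new HashMap<>();

    BucketingWriter(Path baseDir, Function<T, String> bucketOf) {
        this.baseDir = baseDir;
        this.bucketOf = bucketOf;
    }

    // Routes the record to the writer for its bucket, opening the writer on first use.
    void write(T record) throws IOException {
        String bucket = bucketOf.apply(record);
        BufferedWriter writer = writers.get(bucket);
        if (writer == null) {
            Path dir = Files.createDirectories(baseDir.resolve(bucket));
            writer = Files.newBufferedWriter(dir.resolve("part-0"), StandardCharsets.UTF_8);
            writers.put(bucket, writer);
        }
        writer.write(String.valueOf(record));
        writer.newLine();
    }

    // Closes every bucket writer that was opened.
    @Override
    public void close() throws IOException {
        for (BufferedWriter writer : writers.values()) {
            writer.close();
        }
        writers.clear();
    }
}
```

This writes to the local file system through java.nio; writing to HDFS would need the corresponding file system API instead, and a parallel job would need a distinct file name per task.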

Processing large compressed files in apache camel

I am trying to get a single .zip-compressed file from an FTP server and store it in S3 with gzip compression using Camel.
Following is the route I currently have.
from("sftp://username@host/file_path/?password=<password>&noop=true&streamDownload=true")
    .routeId("route_id")
    .setExchangePattern(ExchangePattern.InOut)
    .unmarshal().zipFile()
    .marshal().gzip()
    .to("aws-s3://s3_bucket_name?amazonS3Client=#client");
This works fine for smaller files. But I have files that are ~700 MB in size when compressed. For files of that size I get an OutOfMemoryError for Java heap space.
I know there is a streaming option in Camel (.split(body().tokenize("\n")).streaming()) but I am not sure if I can unmarshal and marshal while streaming. (I see a similar solution here, but in that case the source file is plain text / CSV.)
The second part of the problem is streaming the file back to S3. I am aware of the multiPartUpload option in the camel-aws component, but it seems to require the source to be a file. I do not know how to achieve that.
Can this be achieved without processing (unzipping and then gzipping) the file using Java code in a custom processor?
Environment: Camel 2.19.3, Java 8
Thanks
I solved it using streamCaching(). The way I would do it is:
from("xyz")
    .streamCaching()
    .unmarshal().gzip()
    .to("abc")
    .end();
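For comparison, the zip-to-gzip conversion itself can be done as a plain stream copy with a fixed-size buffer, so heap usage does not grow with the file size. This is only a sketch using java.util.zip, independent of Camel. The names `ZipToGzip` and `recompress` are hypothetical, and it assumes the archive holds a single entry, as described in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: re-compresses the first ZIP entry as GZIP without buffering the whole file.
final class ZipToGzip {
    private ZipToGzip() {}

    // Copies the first entry of the ZIP stream into a GZIP stream.
    // Returns the number of uncompressed bytes copied. Both streams are closed on return.
    static long recompress(InputStream zipped, OutputStream target) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(zipped);
             GZIPOutputStream gout = new GZIPOutputStream(target)) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null) {
                throw new IOException("ZIP archive contains no entries");
            }
            byte[] buffer = new byte[8192];
            long total = 0;
            int read;
            // Reads are scoped to the current entry and return -1 at its end.
            while ((read = zin.read(buffer)) != -1) {
                gout.write(buffer, 0, read);
                total += read;
            }
            return total;
        }
    }
}
```

Something like this could be called from a custom processor with the SFTP input stream and a temporary file as the target, though that is exactly the custom code the question was hoping to avoid.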

How to export a certain row from a table when it changes the value as an XML using Mule?

Apologies if this question has been asked before.
I am new to Mule and I need a bit of help on how to export a certain row from a table as XML. Is it a good idea to use the Poll scope to handle this?
I need the XML to feed into an external program. Any ideas or simple examples I can play with?
Thanks, and have a good day.
You can use a Poll scope with a Database connector in it to select the records you want to transform. Then you can use DataWeave to transform them to XML, or the Object to XML transformer to convert them to XML as-is.
Whether to use Poll or an HTTP request depends on your requirement. If you need to fetch the records on a particular schedule, use Poll (it just acts as a scheduler for the database call). If you want this to be triggered externally, go for HTTP.
Then use the Database message processor from the palette, configure your database, and in its configuration write the select query with whatever fields you require.
Then you can work with the returned structure and map its values using the XML transformer.
Hope this helps. Others have given a similar answer, but I thought this would be a useful addition.
Use a Database component to poll records from the database.
To my knowledge, there is no connector that transforms database results to XML the way you need.
Use a custom transformer to convert the database result set to XML.
See this link:
DB to XML transformation in Mule
Once you have the required XML, it is up to you to decide where to push that output.
Kindly share the XML config so we can help further.
Cheers!

Can I read AND write to a db in my main bundle using core data?

With the particular app I am working on, I have a significant amount of data that I need to have in my db so I can read it in. I also have the need to write a few things to the db. I took a copy of the sqlite db out of the documents folder and put it into my main bundle and can read my manually inserted data without problems.
I am now trying to insert data, but I am running into difficulty. I remember reading somewhere that you can't write to a db in the main bundle, only in the documents folder. Is that correct? What are my options if I need to have custom data in a Core Data db that I also need to write to?
Should I move it out of the main into the documents folder?
Thanks!
I can't find documentation to back this up, but it is my understanding that the application bundle is read-only. I have read that if you have a pre-populated Core Data store in the app bundle, you need to copy it to the Documents directory and then make modifications to that copy.
Check out this.

Simplest way to convert XML file and/or web service to a flat file

I am starting a project in which I need to get two sets of data into flat files. Source A is a web service that is updated daily, so I only need to hit this once a day. Source B is an XML file on an FTP site that I will also retrieve daily.
I have never converted XML to a flat file before. I'm a SQL Server guy, so my initial thought was to work on getting the data into SQL Server, then export to flat files. However, am I wasting my time doing that? Should I just use a conversion tool such as XMLConvert and skip SQL Server entirely? The data files are fairly small, so performance is not an issue. I need to get this done as quickly as possible. What suggestions do you folks have? Thank you!
I have used Stylus Studio to create XSLT and was very happy with the features.
http://www.stylusstudio.com/
I have also used XML Spy, but not the XSLT features.
http://www.altova.com/xmlspy.html
Once you have the XSLT created, the code to transform the XML is fairly straightforward.
http://msdn.microsoft.com/en-us/library/ms757854(v=VS.85).aspx
I've used this method to convert XML into HTML, but not a flat file, but it should work.
Converting XML to other (text-based) formats is probably best done using something like XSLT. http://www.w3schools.com/xsl/
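To give an idea of what the XSLT route looks like, here is a small sketch that turns XML into comma-separated lines. It is written in Java with the standard javax.xml.transform API rather than the MSXML approach linked above. The element names (`rows`, `row`, `id`, `name`), the stylesheet and the class `XmlToFlatFile` are all invented for the example; the real stylesheet depends on the actual structure of the source XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: XML to a flat (comma-separated) text format via XSLT.
final class XmlToFlatFile {
    private XmlToFlatFile() {}

    // Made-up stylesheet: emits "id,name" followed by a line break for every <row>.
    static final String CSV_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/rows\">"
      + "<xsl:for-each select=\"row\">"
      + "<xsl:value-of select=\"id\"/><xsl:text>,</xsl:text>"
      + "<xsl:value-of select=\"name\"/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML and returns the flat text.
    static String toCsv(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(CSV_XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Given `<rows><row><id>1</id><name>Ann</name></row></rows>`, this should produce a line reading `1,Ann`. For files, the StringReader and StringWriter can be replaced with file-based StreamSource and StreamResult objects.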
