Read file locations from table and copy to specific folder using pollEnrich() - apache-camel

I am trying to write a Camel route that reads a database table to get a list of absolute file paths and then copies those files to another folder. However, the files written to the target folder contain only the file path as their content instead of the original file content.
from("timer://testDataGen?repeatCount=1")
.to("sql:" + positionSql + "?dataSource=dataSource")
.split(body())
.to("file://" + positionlistDir )
.log("Finished copying the list of Files.")
Please let me know what I am missing here to turn an absolute file path into the actual file content.
Update #1.
The snippet below invokes pollEnrich(). However, instead of picking up the file named in the previous exchange, pollEnrich() copies a number of files equal to the number of rows returned by the SQL query.
String positionListSqlOptions = "?dataSource=dataSource";
// String positionSrcDirOptions = "?noop=true&delay=500&readLockMarkerFile=false&fileName=${header.positionFileToBeCopied}";
String positionSrcDirOptions = "?noop=true&delay=500&readLockMarkerFile=false&fileName=${body}";
String positionStagingDirOptionsForWriting = "?doneFileName=${file:name}.DONE";
from("timer://testDataGen?repeatCount=1")
.to("sql:" + positionListSql + positionListSqlOptions)
.split(body())
// Getting the column value from the result set (a LinkedCaseInsensitiveMap) and storing it in the body
.process(new positionFeederProcessor())
.setHeader("positionFileToBeCopied", body())
.pollEnrich("file://" + positionSrcDir + positionSrcDirOptions)
// .pollEnrich().simple("file://" + positionSrcDir + positionSrcDirOptions)
.to("file://" + positionStagingDir + positionStagingDirOptionsForWriting)
.log("Finished copying the list of Files.");
I am still unable to get the actual file name passed to the pollEnrich() endpoint. I tried extracting it from the body as well as through a header. What could have gone wrong?

Well, finally I was able to do this without using pollEnrich() at all.
String positionListSqlOptions = "?dataSource=dataSource";
String positionSrcDirOptions = "?noop=true&delay=500&readLockMarkerFile=false&fileName=${header.CamelFileName}";
String positionStagingDirOptionsForWriting = "?fileName=${header.position.file.name}&doneFileName=${file:name}.DONE";
from("timer://testDataGen?repeatCount=1")
.to("sql:" + positionListSql + positionListSqlOptions)
.routeId("Copier:")
.setHeader("positionFileList", body())
.log("Creating the list of position Files ...")
.split(body())
.process(new PositionListProcessor())
.setHeader("position.file.name", body())
.setHeader("position.dir.name", constant(positionSrcDir))
.process(new PositionFileProcessor())
.choice()
.when(body().isNull())
.log("Position File not found. ${header.position.file.name}")
.otherwise()
.to("file://" + positionStagingDir + positionStagingDirOptionsForWriting)
.log("Position File Copied from Src to : " + "${header.CamelFileNameProduced} ... ${headers} ...");
And here are the processors.
public class PositionListProcessor implements Processor {
public void process(Exchange exchange) throws Exception {
LinkedCaseInsensitiveMap positionFilesResultSet = (LinkedCaseInsensitiveMap) exchange.getIn().getBody();
try {
String positionFileStr = positionFilesResultSet.get("PF_LOCATION_NEW").toString();
exchange.getOut().setBody(positionFileStr.trim());
} catch (Exception e) { }
} }
public class PositionFileProcessor implements Processor {
public void process(Exchange exchange) throws Exception {
String filename = exchange.getIn().getBody(String.class);
String filePath = exchange.getIn().getHeader("position.dir.name", String.class);
URI uri = new URI("file:///".concat(filePath.concat(filename)));
File file = new File(uri);
if (!file.exists()) {
logger.debug((String.format("File %s not found on %s", filename, filePath)));
exchange.getIn().setBody(null);
} else {
exchange.getIn().setBody(file);
}
} }

The file component, when used in a to() definition, produces a file whose content is the exchange body; it does not read a file. You can use a pollEnrich, for example:
from("timer://testDataGen?repeatCount=1")
.to("sql:" + positionSql + "?dataSource=dataSource")
.split(body())
.pollEnrich().simple("file:folder?fileName=${body}")
.to("file://" + positionlistDir )
.log("Finished copying the list of Files.")

Related

Apache Camel: route using custom processor, splitter and aggregator doesn't output anything

I'm cutting my teeth on Camel using the following use case:
Given a GitHub username, I want to fetch a certain number of public
repos in descending order of activity, then for each repo I want to
fetch a certain number of commits, and finally, for each commit, I
want to print some information.
To achieve this, I wrote a Producer and the following route. The Producer works (I have tests), and so does the route without the aggregator. When using the aggregator, nothing comes out (my tests fail).
public void configure() throws Exception {
from("direct:start")
.id("gitHubRoute")
.filter(and(
isNotNull(simple("${header." + ENDPOINT + "}")),
isNotNull(simple("${body}")))
)
.setHeader(USERNAME, simple("${body}"))
.toD("github:repos?username=${body}")
.process(e -> {
// some processing
})
.split(body())
.parallelProcessing()
.setHeader(REPO, simple("${body.name}"))
.toD("github:commits" +
"?repo=${body.name}" +
"&username=${header." + USERNAME + "}"
)
.process(e -> {
// some processing
})
.split(body())
.toD("github:commit" +
"?repo=${header." + REPO + "}" +
"&username=${header." + USERNAME + "}" +
"&sha=${body.sha}"
)
.process(e -> {
// some processing
})
.aggregate(header(REPO), new GroupedExchangeAggregationStrategy()).completionTimeout(10000l)
.toD("${header." + ENDPOINT + "}");
from("direct:end")
.process().exchange(this::print);
}
During testing, I set the header ENDPOINT to mock:result. In reality, it's set to direct:end.
What am I doing wrong? There are no errors but the print method, or the mock during testing, is never invoked.
I solved it myself. A couple of things I had to change:
Completion check: I used a completionPredicate as shown below.
eagerCheckCompletion(): Without this, the exchange passed into the completionPredicate is the aggregated exchange, not the incoming exchange.
I also took the opportunity to do a little refactoring to improve readability.
public void configure() throws Exception {
from("direct:start")
.id("usersRoute")
.filter(isNotNull(simple("${header." + ENDPOINT + "}")))
.setHeader(USERNAME, simple("${body}"))
.toD("github:users/${body}/repos")
.process(e -> this.<GitHub.Repository>limitList(e))
.to("direct:reposRoute1");
from("direct:reposRoute1")
.id("reposRoute1")
.split(body())
.parallelProcessing()
.setHeader(REPO, simple("${body.name}"))
.toD("github:repos/${header." + USERNAME + "}" + "/${body.name}/commits")
.process(e -> this.<GitHub.Commit>limitList(e))
.to("direct:reposRoute2");
from("direct:reposRoute2")
.id("reposRoute2")
.split(body())
.toD("github:repos/${header." + USERNAME + "}" + "/${header." + REPO + "}" + "/commits/${body.sha}")
.process(e -> {
GitHub.Commit commit = e.getIn().getBody(GitHub.Commit.class);
List<GitHub.Commit.File> files = commit.getFiles();
if (!CollectionUtils.isEmpty(files) && files.size() > LIMIT) {
commit.setFiles(files.subList(0, LIMIT));
e.getIn().setBody(commit);
}
})
// http://camel.apache.org/aggregator2.html
.aggregate(header(REPO), new AggregateByRepoStrategy())
.forceCompletionOnStop()
.eagerCheckCompletion()
.completionPredicate(header("CamelSplitComplete").convertTo(Boolean.class).isEqualTo(TRUE))
.toD("${header." + ENDPOINT + "}");
from("direct:end")
.process().exchange(this::print);
}
The AggregationStrategy I used is as follows:
private static final class AggregateByRepoStrategy extends AbstractListAggregationStrategy<GitHub.Commit> {
@Override
public GitHub.Commit getValue(Exchange exchange) {
return exchange.getIn().getBody(GitHub.Commit.class);
}
}
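For completeness, a hedged sketch of what the print method wired to direct:end might look like (it is not shown in the original post). AbstractListAggregationStrategy sets the collected values as a java.util.List on the aggregated exchange body by default, so the endpoint receives a List<GitHub.Commit>; the getSha() accessor is assumed from the ${body.sha} expression used in the route, and the method would live in the same RouteBuilder with java.util.List and org.apache.camel.Exchange imported:
private void print(Exchange exchange) {
    // The aggregated body is the List built by AbstractListAggregationStrategy.
    @SuppressWarnings("unchecked")
    List<GitHub.Commit> commits = exchange.getIn().getBody(List.class);
    commits.forEach(commit -> System.out.println(commit.getSha()));
}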

How do I get a mixed multipart in a RESTEasy response?

I am trying to use RESTEasy. While I am able to send a mixed multipart request to a web service, I am unable to get a mixed multipart in the response.
For example: requesting a file (byte[] or stream) and its file name in a single response.
Following is what I have tested:
Service code:
#Path("/myfiles")
public class MyMultiPartWebService {
#POST
#Path("/filedetail")
#Consumes("multipart/form-data")
#Produces("multipart/mixed")
public MultipartOutput fileDetail(MultipartFormDataInput input) throws IOException {
MultipartOutput multipartOutput = new MultipartOutput();
//some logic based on input to locate a file(s)
File myFile = new File("samplefile.pdf");
multipartOutput.addPart("fileName:"+ myFile.getName(), MediaType.TEXT_PLAIN_TYPE);
multipartOutput.addPart(myFile, MediaType.APPLICATION_OCTET_STREAM_TYPE);
return multipartOutput;
}
}
Client code:
public void getFileDetails(/*input params*/){
HttpClient client = new DefaultHttpClient();
HttpPost postRequest = new HttpPost("urlString");
MultipartEntity multiPartEntity = new MultipartEntity();
//prepare the request details
postRequest.setEntity(multiPartEntity);
HttpResponse response = client.execute(postRequest);
HttpEntity returnEntity = response.getEntity();
//extracting data from the response
Header header = returnEntity.getContentType();
InputStream is = returnEntity.getContent();
if (is != null) {
byte[] bytes = IOUtils.toByteArray(is);
//Can we see the 2 parts that were added?
//Able to get a single InputStream only, and hence unable to differentiate two objects in the response
//Trying to see the contents - printing as string
System.out.println("Output from Response :: " + new String(bytes));
}
}
The output is as follows: I am able to see two parts with different content types, but unable to extract them separately.
Output from Response ::
--af481055-4e4f-4860-9c0b-bb636d86d639
Content-Type: text/plain
fileName: samplefile.pdf
--af481055-4e4f-4860-9c0b-bb636d86d639
Content-Length: 1928
Content-Type: application/octet-stream
%PDF-1.4
<<pdf content printed as junk chars>>
How can I extract the 2 objects from the response?
UPDATE:
I tried the following approach to extract the different parts: use the 'boundary' to split the MultipartStream, and use the content type string to extract the appropriate object.
private void getResponeObject(HttpResponse response) throws IllegalStateException, IOException {
HttpEntity returnEntity = response.getEntity();
Header header = returnEntity.getContentType();
String boundary = header.getValue();
boundary = boundary.substring("multipart/mixed; boundary=".length(), boundary.length());
System.out.println("Boundary" + boundary); // --af481055-4e4f-4860-9c0b-bb636d86d639
InputStream is = returnEntity.getContent();
splitter(is, boundary);
}
//extract subsets from the input stream based on content type
private void splitter(InputStream is, String boundary) throws IOException {
ByteArrayOutputStream boas = null;
FileOutputStream fos = null;
MultipartStream multipartStream = new MultipartStream(is, boundary.getBytes());
boolean nextPart = multipartStream.skipPreamble();
System.out.println("NEXT PART :: " + nextPart);
while (nextPart) {
String header = multipartStream.readHeaders();
if (header.contains("Content-Type: "+MediaType.APPLICATION_OCTET_STREAM_TYPE)) {
fos = new FileOutputStream(new File("myfilename.pdf"));
multipartStream.readBodyData(fos);
} else if (header.contains("Content-Type: "+MediaType.TEXT_PLAIN_TYPE)) {
boas = new ByteArrayOutputStream();
multipartStream.readBodyData(boas);
String newString = new String( boas.toByteArray());
} else if (header.contains("Content-Type: "+ MediaType.APPLICATION_JSON_TYPE)) {
//extract string and create JSONObject from it
} else if (header.contains("Content-Type: "+MediaType.APPLICATION_XML_TYPE)) {
//extract string and create XML object from it
}
nextPart = multipartStream.readBoundary();
}
}
Is this the right approach?
UPDATE 2:
The logic above seems to work, but I hit another blocker when receiving the RESPONSE from the web service. I could not find any references on handling such issues in the response.
The logic assumes that there is ONE part per part type. If there are, say, 2 JSON parts in the response, it would be difficult to identify which part is which. In other words, though we can add a part with a key name while creating the response, we are unable to extract the key names on the client side.
Any clues?
You can try the following approach...
At the server side...
Create a wrapper object that can encapsulate all types. For example, it could have one Map for TEXT entries and another Map for binary data.
Convert the TEXT content to bytes (octet stream).
Create a MetaData part which contains references to the key names and their types, e.g. STR_MYKEY1, BYTES_MYKEY2. This metadata can also be converted into an octet stream.
Add the metadata and the wrapped entity as parts to the multipart response.
At the Client side...
Read the MetaData to get the key names.
Use the key name to interpret each part. Since the Keyname from the metadata tells if the original data is a TEXT or BINARY, you should be able to extract the actual content with appropriate logic.
The same approach can be used for upstream, from client to service.
On top of this, you can compress the TEXT data which will help in reducing the content size...
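A minimal server-side sketch of that idea, assuming RESTEasy's MultipartOutput as used in the question; the STR_/BYTES_ prefixes, the comma-separated metadata format and the class name are illustrative only:
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.ws.rs.core.MediaType;
import org.jboss.resteasy.plugins.providers.multipart.MultipartOutput;

public class WrappedMultipartBuilder {

    // Part 0 is the metadata (prefixed key names, comma separated);
    // the data parts follow in the same order the metadata lists them.
    public MultipartOutput build(Map<String, String> textParts, Map<String, byte[]> binaryParts) {
        MultipartOutput output = new MultipartOutput();

        StringBuilder metadata = new StringBuilder();
        for (String key : textParts.keySet()) {
            metadata.append("STR_").append(key).append(',');
        }
        for (String key : binaryParts.keySet()) {
            metadata.append("BYTES_").append(key).append(',');
        }
        output.addPart(metadata.toString(), MediaType.TEXT_PLAIN_TYPE);

        // TEXT content is converted to bytes (octet stream), as suggested above.
        for (String value : textParts.values()) {
            output.addPart(value.getBytes(StandardCharsets.UTF_8), MediaType.APPLICATION_OCTET_STREAM_TYPE);
        }
        for (byte[] value : binaryParts.values()) {
            output.addPart(value, MediaType.APPLICATION_OCTET_STREAM_TYPE);
        }
        return output;
    }
}
On the client side, read the first part, split it on ',' to recover the key names and their types, then interpret the remaining parts in order, reusing the MultipartStream loop from the update above.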

Does flyway migrations support PostgreSQL's COPY?

Having performed a pg_dump of an existing PostgreSQL schema, I have an SQL file containing a number of table population statements using COPY.
COPY test_table (id, itm, factor, created_timestamp, updated_timestamp, updated_by_user, version) FROM stdin;
1 600 0.000 2012-07-17 18:12:42.360828 2012-07-17 18:12:42.360828 system 0
2 700 0.000 2012-07-17 18:12:42.360828 2012-07-17 18:12:42.360828 system 0
\.
Though not standard SQL, this is part of PostgreSQL's SQL dialect.
Performing a flyway migration (via the maven plugin) I get:
[ERROR] Caused by org.postgresql.util.PSQLException: ERROR: unexpected message type 0x50 during COPY from stdin
Am I doing something wrong, or is this just not supported?
Thanks.
The short answer is no.
The one definite problem is that the parser is currently not able to deal with this special construct.
The other question is JDBC driver support. Could you try and see if this syntax is generally supported by the JDBC driver with a single createStatement call?
If it is, please file an issue in the issue tracker and I'll extend the parser.
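A minimal sketch of such a probe, assuming the standard PostgreSQL JDBC driver and placeholder connection details; it sends the whole COPY ... FROM stdin block (data rows and the terminating \.) through a single createStatement()/execute() call and reports whether the driver accepts it:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CopyFromStdinProbe {
    public static void main(String[] args) throws Exception {
        String copyBlock =
                "COPY test_table (id, itm, factor, created_timestamp, updated_timestamp, updated_by_user, version) FROM stdin;\n"
                + "1\t600\t0.000\t2012-07-17 18:12:42.360828\t2012-07-17 18:12:42.360828\tsystem\t0\n"
                + "\\.";
        // URL and credentials are placeholders.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.execute(copyBlock);
            System.out.println("COPY FROM stdin was accepted by the driver");
        }
    }
}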
Update: This is now supported
I have accomplished this for Postgres using
public abstract class SeedData implements JdbcMigration {
protected static final String CSV_COPY_STRING = "COPY %s(%s) FROM STDIN HEADER DELIMITER ',' CSV ENCODING 'UTF-8'";
protected CopyManager copyManager;
@Override
public void migrate(Connection connection) throws Exception {
log.info(String.format("[%s] Populating database with seed data", getClass().getName()));
copyManager = new CopyManager((BaseConnection) connection);
Resource[] resources = scanForResources();
List<Resource> res = Arrays.asList(resources);
for (Resource resource : res) {
load(resource);
}
}
private void load(Resource resource) throws SQLException, IOException {
String location = resource.getLocation();
InputStream inputStream = getClass().getClassLoader().getResourceAsStream(location);
if (inputStream == null) {
throw new FlywayException("Failure to load seed data. Unable to load from location: " + location);
}
if (!inputStream.markSupported()) {
// Sanity check. We have to be able to mark the stream.
throw new FlywayException(
"Failure to load seed data as mark is not supported. Unable to load from location: " + location);
}
// set our mark to something big (note: 1 << 32 overflows to 1 for an int)
inputStream.mark(Integer.MAX_VALUE);
String filename = resource.getFilename();
// Strip the prefix (e.g. 01_) and the file extension (e.g. .csv)
String table = filename.substring(3, filename.length() - 4);
String columns = loadCsvHeader(location, inputStream);
// reset to the mark
inputStream.reset();
// Use Postgres COPY command to bring it in
long result = copyManager.copyIn(String.format(CSV_COPY_STRING, table, columns), inputStream);
log.info(format(" %s - Inserted %d rows", location, result));
}
private String loadCsvHeader(String location, InputStream inputStream) {
try {
return new BufferedReader(new InputStreamReader(inputStream)).readLine();
} catch (IOException e) {
throw new FlywayException("Failure to load seed data. Unable to load from location: " + location, e);
}
}
private Resource[] scanForResources() throws IOException {
return new ClassPathScanner(getClass().getClassLoader()).scanForResources(getSeedDataLocation(), "", ".csv");
}
protected String getSeedDataLocation() {
return getClass().getPackage().getName().replace('.', '/');
}
}
To use it, extend the class in the appropriate package on the classpath:
package db.devSeedData.dev;
public class v0_90__seed extends db.devSeedData.v0_90__seed {
}
All that is needed then is to have CSV files in your classpath under db/devSeedData that follow the format 01_tablename.csv. Columns are extracted from the header line of the CSV.

Silverlight - How to get Webresponse string from WebClient.UploadStringAsync

public void Register(string email, string name, string hash)
{
string registerData = "{\"email\":\"" + email + "\",\"name\":\"" + name + "\",\"hash\":\"" + hash + "\"}";
WebClient webClient = new WebClient();
webClient.Headers["Content-Type"] = "application/json";
webClient.UploadStringCompleted += new UploadStringCompletedEventHandler(HandleRegisterAsyncResult);
webClient.UploadStringAsync(new Uri(registerUrl), registerData);
}
void HandleRegisterAsyncResult(object sender, UploadStringCompletedEventArgs e)
{
}
I'm basically trying to call a web service with an https:// POST command that takes a data string. It works well, except that when I get an error I can't seem to find the actual WebResponse content. If I cast the e.Error that was returned to a WebException, there is a Response property that is a BrowserHttpWebResponse, but its ContentLength is 0 (even though I can see in Fiddler that the content length is not 0).
Is there a way to get the response content with this method? And if not, is there another way to do a POST that does allow me to get the response content?

Most effective way to transfer almost 400k images to S3

I am currently in charge of transferring a site from its current server to EC2. That part of the project is done and fine; the part I am struggling with is this: the site currently has almost 400K images, all sorted into different folders within a main userimg folder, and the client wants all of these images to be stored on S3. The main problem is how to transfer almost 400,000 images from the server to S3. I have been using http://s3tools.org/s3cmd, which is brilliant, but transferring the userimg folder with s3cmd would take almost 3 days solid, and if the connection breaks or a similar problem occurs, I would end up with some images on S3 and some not, with no way to continue the process...
Can anyone suggest a solution, has anyone come up against a problem like this before?
I would suggest you write (or get someone to write) a simple Java utility that:
Reads the structure of your client directories (if needed)
For every image creates a corresponding key on S3 (according to the file structure read in step 1) and starts a multi-part upload in parallel using the AWS SDK or the jets3t API.
I did it for our client. It is less than 200 lines of Java code and it is very reliable.
Below is the part that does the multi-part upload; the part that reads the file structure is trivial (a sketch of such a walk follows the code).
/**
* Uploads file to Amazon S3. Creates the specified bucket if it does not exist.
* The upload is done in chunks of CHUNK_SIZE size (multi-part upload).
* Attempts to handle upload exceptions gracefully up to MAX_RETRY times per single chunk.
*
* @param accessKey - Amazon account access key
* @param secretKey - Amazon account secret key
* @param directoryName - directory path where the file resides
* @param keyName - the name of the file to upload
* @param bucketName - the name of the bucket to upload to
* @throws Exception - in case that something goes wrong
*/
public void uploadFileToS3(String accessKey
,String secretKey
,String directoryName
,String keyName // that is the file name that will be created after upload completed
,String bucketName ) throws Exception {
// Create a credentials object and service to access S3 account
AWSCredentials myCredentials =
new BasicAWSCredentials(accessKey, secretKey);
String filePath = directoryName
+ System.getProperty("file.separator")
+ keyName;
log.info("uploadFileToS3 is about to upload file [" + filePath + "]");
AmazonS3 s3Client = new AmazonS3Client(myCredentials);
// Create a list of UploadPartResponse objects. You get one of these
// for each part upload.
List<PartETag> partETags = new ArrayList<PartETag>();
// make sure that the bucket exists
createBucketIfNotExists(bucketName, accessKey, secretKey);
// delete the file from bucket if it already exists there
s3Client.deleteObject(bucketName, keyName);
// Initialize.
InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, keyName);
InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest);
File file = new File(filePath);
long contentLength = file.length();
long partSize = CHUNK_SIZE; // Set part size to 5 MB.
int numOfParts = 1;
if (contentLength > CHUNK_SIZE) {
if (contentLength % CHUNK_SIZE != 0) {
numOfParts = (int)((contentLength/partSize)+1.0);
}
else {
numOfParts = (int)((contentLength/partSize));
}
}
try {
// Step 2: Upload parts.
long filePosition = 0;
for (int i = 1; filePosition < contentLength; i++) {
// Last part can be less than 5 MB. Adjust part size.
partSize = Math.min(partSize, (contentLength - filePosition));
log.info("Start uploading part[" + i + "] of [" + numOfParts + "]");
// Create request to upload a part.
UploadPartRequest uploadRequest = new UploadPartRequest()
.withBucketName(bucketName).withKey(keyName)
.withUploadId(initResponse.getUploadId()).withPartNumber(i)
.withFileOffset(filePosition)
.withFile(file)
.withPartSize(partSize);
// repeat the upload until it succeeds or reaches the retry limit
boolean anotherPass;
int retryCount = 0;
do {
anotherPass = false; // assume everything is ok
try {
log.info("Uploading part[" + i + "]");
// Upload part and add response to our list.
partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
log.info("Finished uploading part[" + i + "] of [" + numOfParts + "]");
} catch (Exception e) {
log.error("Failed uploading part[" + i + "] due to exception. Will retry... Exception: ", e);
anotherPass = true; // repeat
retryCount++;
}
}
while (anotherPass && retryCount < CloudUtilsService.MAX_RETRY);
filePosition += partSize;
log.info("filePosition=[" + filePosition + "]");
}
log.info("Finished uploading file");
// Complete.
CompleteMultipartUploadRequest compRequest = new
CompleteMultipartUploadRequest(
bucketName,
keyName,
initResponse.getUploadId(),
partETags);
s3Client.completeMultipartUpload(compRequest);
log.info("multipart upload completed.upload id=[" + initResponse.getUploadId() + "]");
} catch (Exception e) {
s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
bucketName, keyName, initResponse.getUploadId()));
log.error("Failed to upload due to Exception:", e);
throw e;
}
}
/**
* Creates new bucket with the names specified if it does not exist.
*
* @param bucketName - the name of the bucket to retrieve or create
* @param accessKey - Amazon account access key
* @param secretKey - Amazon account secret key
* @throws S3ServiceException - if something goes wrong
*/
public void createBucketIfNotExists(String bucketName, String accessKey, String secretKey) throws S3ServiceException {
try {
// Create a credentials object and service to access S3 account
org.jets3t.service.security.AWSCredentials myCredentials =
new org.jets3t.service.security.AWSCredentials(accessKey, secretKey);
S3Service service = new RestS3Service(myCredentials);
// Create a new bucket named after a normalized directory path,
// and include my Access Key ID to ensure the bucket name is unique
S3Bucket zeBucket = service.getOrCreateBucket(bucketName);
log.info("the bucket [" + zeBucket.getName() + "] was created (if it was not existing yet...)");
} catch (S3ServiceException e) {
log.error("Failed to get or create bucket[" + bucketName + "] due to exception:", e);
throw e;
}
}
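Since the question is about a whole tree of images, here is a minimal sketch of the directory walk mentioned at the top of this answer. It assumes the class holding uploadFileToS3(...) above is called S3MultipartUploader (the name is hypothetical) and uses each file's path relative to the root as both the keyName and the S3 key, so the folder structure under userimg is preserved in the bucket:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ImageTreeWalker {

    // Wraps the uploadFileToS3(...) method shown above; the class name is assumed.
    private final S3MultipartUploader uploader = new S3MultipartUploader();

    public void uploadTree(String rootDir, String bucketName, String accessKey, String secretKey)
            throws IOException {
        Path root = Paths.get(rootDir);
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                // The relative path doubles as the S3 key, e.g. "2012/07/photo.jpg".
                String keyName = root.relativize(p).toString().replace('\\', '/');
                try {
                    uploader.uploadFileToS3(accessKey, secretKey, rootDir, keyName, bucketName);
                } catch (Exception e) {
                    // Log and carry on; one failed image should not stop a 400k-file run.
                }
            });
        }
    }
}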
Sounds like a job for Rsync. I've never used it in combination with S3, but S3Sync seems like what you need.
If you don't want to actually upload all of the files (or indeed, manage it), you could use AWS Import/Export which basically entails just shipping Amazon a hard-disk.
You could use Super Flexible File Synchronizer. It is a commercial product, but the Linux version is free.
It can compare and sync folders, and multiple files can be transferred in parallel. It's fast. The interface is perhaps not the simplest, but that's mainly because it has a million configuration options.
Note: I am not affiliated in any way with this product, but I have used it.
Consider Amazon S3 Bucket Explorer.
It allows you to upload files in parallel, so that should speed up the process.
The program has a job queue, so that if one of the uploads fails it will retry the upload automatically.
