I am using Apache HttpComponents in a bean inside Camel to write a job that downloads Apple's metadata database files. This is a list of every song in iTunes, so obviously it is big: 3.5+ GB. I am trying to use Apache HttpComponents to make an asynchronous GET request. However, it seems that the file being returned is too large.
try {
    httpclient.start();
    FileOutputStream fileOutputStream = new FileOutputStream(download);
    //Grab the archive.
    URIBuilder uriBuilder = new URIBuilder();
    uriBuilder.setScheme("https");
    uriBuilder.setHost("feeds.itunes.apple.com");
    uriBuilder.setPath("/feeds/epf-flat/v1/full/usa/" + iTunesDate + "/song-usa-" + iTunesDate + ".tbz");
    String endpoint = uriBuilder.build().toURL().toString();
    HttpGet getCall = new HttpGet(endpoint);
    String creds64 = new String(Base64.encodeBase64((user + ":" + password).getBytes()));
    log.debug("Auth: " + "Basic " + creds64);
    getCall.setHeader("Authorization", "Basic " + creds64);
    log.debug("About to download file from Apple: " + endpoint);
    Future<HttpResponse> future = httpclient.execute(getCall, null);
    HttpResponse response = future.get();
    fileOutputStream.write(EntityUtils.toByteArray(response.getEntity()));
    fileOutputStream.close();
Every time it returns this:
java.util.concurrent.ExecutionException: org.apache.http.ContentTooLongException: Entity content is too long: 3776283429
at org.apache.http.concurrent.BasicFuture.getResult(BasicFuture.java:68)
at org.apache.http.concurrent.BasicFuture.get(BasicFuture.java:77)
at com.decibly.hive.songs.iTunesWrapper.getSongData(iTunesWrapper.java:89)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.camel.component.bean.MethodInfo.invoke(MethodInfo.java:407)
So, the size of the file in bytes is too big for a Java integer, which HttpComponents is using to track the response size. I get that; I'm wondering if there are any workarounds aside from dropping back a layer and calling the java.net libraries directly.
Use HttpAsyncClient, which is built on top of HttpComponents and supports zero-copy transfer.
See an example here: https://hc.apache.org/httpcomponents-asyncclient-4.1.x/httpasyncclient/examples/org/apache/http/examples/nio/client/ZeroCopyHttpExchange.java
Or simply, in your case:
CloseableHttpAsyncClient httpclient = HttpAsyncClientBuilder.create()....
ZeroCopyConsumer<File> consumer = new ZeroCopyConsumer<File>(new File(download)) {
    @Override
    protected File process(
            final HttpResponse response,
            final File file,
            final ContentType contentType) throws Exception {
        if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
            throw new ClientProtocolException("Connection to host failed: " + response.getStatusLine());
        }
        return file;
    }
};
httpclient.execute(HttpAsyncMethods.createGet(endpoint), consumer, null, null).get();
The body of the response is streamed directly to a file as it arrives, so it is never buffered in memory and the int-based entity size check never comes into play. The only size limit is the one imposed by the file system.
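Since the original request also needs the Basic Authorization header, the already-built HttpGet can be handed to the async client with HttpAsyncMethods.create(...) instead of createGet(endpoint). A rough sketch, reusing getCall and log from the question and the consumer defined above (the default builder is just an example; configure it to your needs):

CloseableHttpAsyncClient httpclient = HttpAsyncClients.createDefault();
httpclient.start();
try {
    // getCall already carries the "Authorization: Basic ..." header.
    Future<File> future = httpclient.execute(HttpAsyncMethods.create(getCall), consumer, null);
    File downloaded = future.get();
    log.debug("Downloaded " + downloaded.length() + " bytes to " + downloaded.getAbsolutePath());
} finally {
    httpclient.close();
}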
My problem statement goes like this:
"I want to leverage Apache Solr 8.6.1's streaming capability to send millions of records as part of a Spring Boot REST API call. I cannot call the Solr endpoints directly due to security restrictions, and there is also some business logic in place. So I have written code through which I read the data as a stream and push it to the Spring Boot output stream."
Every time I make the API call, it goes through the following code:
StreamFactory factory = new StreamFactory().withCollectionZkHost(COLLECTION_NAME, ZK_HOST);
SolrClientCache solrClientCache = new SolrClientCache(httpClient);
StreamContext streamContext = new StreamContext();
streamContext.setSolrClientCache(solrClientCache);
String expressionStr = String.format(SEARCH_EXPRESSION, COLLECTION_NAME);
StreamExpression expression = StreamExpressionParser.parse(expressionStr);
TupleStream stream;
try {
    stream = new CloudSolrStream(expression, factory);
    stream.setStreamContext(streamContext);
    stream.open();
    Tuple tuple = stream.read();
    int count = 0;
    while (!tuple.EOF) {
        String jsonStr = ++count + " " + tuple.jsonStr() + "\r\n";
        outputStream.write(jsonStr.getBytes());
        outputStream.flush();
        tuple = stream.read();
    }
    stream.close();
} catch (IOException e) {
    e.printStackTrace();
}
It tries to connect to ZooKeeper at stream.open(), and that is taking some time.
Is it possible to optimize this code so that it doesn't have to connect to ZooKeeper on every call, i.e. keep the connection ready beforehand? Because this is a stream, we have to open and close it with every call.
Also, how will this behave in a multi-user scenario?
Can anyone throw some light on this and on how to optimize it further?
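One possible optimization (a sketch, not a verified answer; it assumes the enclosing class is a singleton Spring bean and reuses the question's COLLECTION_NAME, ZK_HOST and SEARCH_EXPRESSION constants): build the StreamFactory and SolrClientCache once and share them across requests. SolrClientCache caches one CloudSolrClient, and with it the ZooKeeper connection, per zkHost, so only the cheap TupleStream is created, opened and closed per call.

@Service
public class SolrStreamingService {

    private StreamFactory factory;
    private SolrClientCache solrClientCache;

    @PostConstruct
    public void init() {
        // Built once per application; later stream.open() calls reuse the cached client.
        factory = new StreamFactory().withCollectionZkHost(COLLECTION_NAME, ZK_HOST);
        solrClientCache = new SolrClientCache();
    }

    public void streamTo(OutputStream outputStream) throws IOException {
        // A StreamContext is lightweight, so each request gets its own,
        // but all of them share the SolrClientCache (the expensive, ZK-connected part).
        StreamContext streamContext = new StreamContext();
        streamContext.setSolrClientCache(solrClientCache);

        StreamExpression expression = StreamExpressionParser.parse(
                String.format(SEARCH_EXPRESSION, COLLECTION_NAME));
        TupleStream stream = new CloudSolrStream(expression, factory);
        stream.setStreamContext(streamContext);
        try {
            stream.open();
            for (Tuple tuple = stream.read(); !tuple.EOF; tuple = stream.read()) {
                outputStream.write((tuple.jsonStr() + "\r\n").getBytes(StandardCharsets.UTF_8));
                outputStream.flush();
            }
        } finally {
            stream.close();
        }
    }

    @PreDestroy
    public void shutdown() {
        // Releases the cached CloudSolrClient and its ZooKeeper connection.
        solrClientCache.close();
    }
}

For the multi-user case, each request still gets its own TupleStream and StreamContext, so concurrent calls should not step on each other; only the client cache is shared.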
I am trying to call an external web service (not mine) from my GWT application running on App Engine.
I know it's impossible to do it from the client due to the SOP (Same Origin Policy), and RequestBuilder is not a solution on the server. I followed the tutorial on the web site and used java.net as well.
Here is the client:
AsyncCallback<CustomObject> callback = new AsyncCallback<CustomObject>() {
    @Override
    public void onFailure(Throwable caught) {
        caught.printStackTrace();
    }

    @Override
    public void onSuccess(CustomObject result) {
        // code omitted
    }
};
service.callMethod(aString, callback);
And this is the server:
try {
    String xmlRequest = "xmlToSend";
    URL url = new URL("https://www.externalWebService.com");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setAllowUserInteraction(false);
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/soap+xml");
    conn.setRequestProperty("Content-length", Integer.toString(xmlRequest.length()));
    conn.setRequestProperty("charset", "utf-8");
    conn.setConnectTimeout(10000);
    OutputStream rawOutStream = conn.getOutputStream();
    PrintWriter pw = new PrintWriter(rawOutStream);
    pw.print(xmlRequest);
    pw.flush();
    pw.close();
    if (conn.getResponseCode() != 200) {
        // Something...
    }
I keep getting the same error at conn.getResponseCode():
java.lang.ClassCastException: com.google.appengine.repackaged.org.apache.http.message.BasicHttpRequest cannot be cast to com.google.appengine.repackaged.org.apache.http.client.methods.HttpUriRequest
Without making a real request, the remote service works well: it's able to serialize and return objects to the client. So the issue is not in the communication between the client and the server; it's more that App Engine doesn't seem to support HttpURLConnection. But it should on the server, shouldn't it?
Any thoughts would be highly appreciated! Thanks in advance.
Your problem has nothing to do with GWT: as long as you are running on the server, you can use any 'normal' Java and it will work, unless App Engine has restrictions.
It seems you have imported the repackaged version of Apache HttpClient in your class. You should not do that: download your own HttpClient .jar, add it to the dependencies, and use that one.
App Engine also has some issues with HttpClient. There's an adapter available here that fixes most of the issues.
Thanks @Marcelo, you were right!
Here is the solution I found.
I added httpcore.jar and httpclient.jar to my build path and wrote the code below for the server (the client is the same):
String xmlRequest = "xmlToSend";
CloseableHttpClient httpclient = HttpClients.custom().build();
//RequestConfig requestConfig = RequestConfig.custom()
//        .setConnectionRequestTimeout(10000)
//        .build();
try {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Writer writer = new OutputStreamWriter(out);
    writer.write(xmlRequest);
    writer.flush();
    writer.close();

    HttpPost request = new HttpPost("https://www.externalWebService.com/path");
    request.setEntity(new ByteArrayEntity(out.toByteArray()));
    //request.setConfig(requestConfig);

    CloseableHttpResponse response = httpclient.execute(request);
    if (response.getStatusLine().getStatusCode() == 200) {
        // retrieve content with a BufferedReader
        // from response.getEntity().getContent()
        ...
    }
The code works and is up to date.
Edit
Here is the rest of the solution when using a proxy. Mine has to use NTCredentials, but UsernamePasswordCredentials can be used instead.
HttpHost proxy = new HttpHost("addresse.proxy.com", port);

CredentialsProvider credsProvider = new BasicCredentialsProvider();
credsProvider.setCredentials(
        new AuthScope(proxy),
        new NTCredentials(System.getProperty("user.name") + ":" + password));

RequestConfig requestConfig = RequestConfig.custom()
        .setProxy(proxy)
        .build();

CloseableHttpClient httpclient = HttpClients.custom()
        .setDefaultCredentialsProvider(credsProvider)
        .setDefaultRequestConfig(requestConfig)
        .build();
Thanks again for your help, I really appreciate it!
I have successfully implemented calling GAE -> Azure Mobile Services -> Azure Notification Hub.
But I want to skip the Mobile Services step and call the Notification Hub directly, and I can't figure out how to send the authorization token. The returned error is:
Returned response: <Error><Code>401</Code><Detail>MissingAudience: The provided token does not specify the 'Audience'..TrackingId:6a9a452d-c3bf-4fed-b0b0-975210f7a13c_G14,TimeStamp:11/26/2013 12:47:40 PM</Detail></Error>
Here is my code:
URL url = new URL("https://myapp-ns.servicebus.windows.net/myhubbie/messages/?api-version=2013-08");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setConnectTimeout(60000);
connection.setRequestMethod("POST");
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "application/json;charset=utf-8");
connection.setRequestProperty("Authorization", "WRAP access_token=\"mytoken_taken_from_azure_portal=\"");
connection.setRequestProperty("ServiceBusNotification-Tags", tag);

byte[] notificationMessage = new byte[0];
try
{
    notificationMessage = json.getBytes("UTF-8");
}
catch (UnsupportedEncodingException e)
{
    e.printStackTrace();
    log.warning("Error encoding toast message to UTF8! Error=" + e.getMessage());
}
connection.setRequestProperty("Content-Length", String.valueOf(notificationMessage.length));

OutputStream ostream = connection.getOutputStream();
ostream.write(notificationMessage);
ostream.flush();
ostream.close();

int responseCode = connection.getResponseCode();
The Authorization header has to contain a token specially crafted for each individual request. The data you are using is the key you have to use to generate such a token.
Please follow the instructions at http://msdn.microsoft.com/en-us/library/dn495627.aspx to create a token for your requests.
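In short, those instructions amount to computing an HMAC-SHA256 signature over the URL-encoded resource URI plus an expiry timestamp, using the key from one of the hub's access policies (for example DefaultFullSharedAccessSignature). A minimal sketch (the class name is hypothetical; Base64 here is Apache commons-codec):

import java.net.URLEncoder;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64;

public final class SasTokenGenerator {

    // resourceUri is typically the hub URI, e.g. "https://myapp-ns.servicebus.windows.net/myhubbie"
    public static String generate(String resourceUri, String keyName, String key) throws Exception {
        long expiry = System.currentTimeMillis() / 1000 + 3600; // token valid for one hour
        String encodedUri = URLEncoder.encode(resourceUri, "UTF-8");
        String stringToSign = encodedUri + "\n" + expiry;

        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(key.getBytes("UTF-8"), "HmacSHA256"));
        String signature = URLEncoder.encode(
                Base64.encodeBase64String(hmac.doFinal(stringToSign.getBytes("UTF-8"))), "UTF-8");

        return "SharedAccessSignature sr=" + encodedUri
                + "&sig=" + signature + "&se=" + expiry + "&skn=" + keyName;
    }
}

The request then sends this value in the Authorization header in place of the WRAP token, e.g. connection.setRequestProperty("Authorization", SasTokenGenerator.generate(hubUri, keyName, key)).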
Final note: if you are using Java, you can use the code in this public repo: https://github.com/fsautomata/notificationhubs-rest-java. It contains a fully functional REST wrapper for Notification Hubs. It is not Microsoft official, but it works and implements the above specs.
I have a form with a FormPanel, a FileUpload, and a Button:
final FormPanel formPanel = new FormPanel();
formPanel.setAction("uploadServlet");
formPanel.setMethod(FormPanel.METHOD_POST);
formPanel.setEncoding(FormPanel.ENCODING_MULTIPART);
formPanel.setSize("100%", "100%");
setWidget(formPanel);

AbsolutePanel absolutePanel = new AbsolutePanel();
formPanel.setWidget(absolutePanel);
absolutePanel.setSize("249px", "70px");

final FileUpload fileUpload = new FileUpload();
fileUpload.setName("uploadFormElement");
absolutePanel.add(fileUpload, 0, 0);

Button btnOpen = new Button("Open");
absolutePanel.add(btnOpen, 10, 30);

Button btnCancel = new Button("Cancel");
absolutePanel.add(btnCancel, 63, 30);

this.setText("Open...");
this.setTitle(this.getText());
this.setAnimationEnabled(true);
this.setGlassEnabled(true);

btnOpen.addClickHandler(new ClickHandler() {
    public void onClick(ClickEvent event) {
        formPanel.submit();
    }
});
The servlet gets called, but the request contains an error message, "error post".
When I try it on the local server it works and the request contains the file, but on the App Engine server I only get the error.
public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    FileItemFactory factory = new DiskFileItemFactory();
    ServletFileUpload upload = new ServletFileUpload(factory);
    List<?> items = null;
    String json = null;
    try {
        items = upload.parseRequest(request);
    }
    catch (FileUploadException e) {
        e.printStackTrace();
    }
    Iterator<?> it = items.iterator();
    while (it.hasNext()) {
        System.out.println("while (it.hasNext()) {");
        FileItem item = (FileItem) it.next();
        json = item.getString();
    }
    response.setContentType("text/html");
    ServletOutputStream out = response.getOutputStream();
    response.setContentLength(json.length());
    out.write(json.getBytes());
    out.close();
}
DiskFileItemFactory is the default implementation for the commons-fileupload library, and according to its Javadoc:
This implementation creates FileItem instances which keep their content either in memory, for smaller items, or in a temporary file on disk, for larger items. The size threshold, above which content will be stored on disk, is configurable, as is the directory in which temporary files will be created.
If not otherwise configured, the default configuration values are as follows:
Size threshold is 10KB. Repository is the system default temp directory, as returned by System.getProperty("java.io.tmpdir").
So, as you can see, this implementation writes to the filesystem when an item does not fit within the in-memory threshold.
In GAE there are many constraints, such as the amount of memory you are allowed to use and the prohibition on writing to the filesystem.
Your code should also fail in GAE development mode, but maybe you have not reached the memory limit there; the dev server tries to emulate the same constraints as the production server, but it is not identical.
That said, I would take a look at the gwtupload library; it has a servlet for GAE which can save files in different ways: BlobStore, FileApi and MemCache.
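If you prefer to stay on commons-fileupload, another option (a sketch, assuming the uploaded file is small enough to hold in memory) is its streaming API, which never uses DiskFileItemFactory and therefore never writes temporary files:

public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    String json = null;
    try {
        // Streaming API: items are read directly from the request, no temp files on disk.
        ServletFileUpload upload = new ServletFileUpload();
        FileItemIterator iter = upload.getItemIterator(request);
        while (iter.hasNext()) {
            FileItemStream item = iter.next();
            if (!item.isFormField()) {
                json = Streams.asString(item.openStream(), "UTF-8");
            }
        }
    } catch (FileUploadException e) {
        throw new ServletException(e);
    }

    response.setContentType("text/html");
    byte[] body = (json == null) ? new byte[0] : json.getBytes("UTF-8");
    response.setContentLength(body.length);
    response.getOutputStream().write(body);
}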
Problem Background
I am currently working on a Camel-based ETL application that processes groups of files as they appear in a dated directory. The files need to be processed together as a group, determined by the beginning of the file name. The files can only be processed once the done file (".flag") has been written to the directory. I know the Camel file component has a done-file option, but that only allows you to retrieve files with the same name as the done file. The application needs to run continuously and start polling the next day's directory when the date rolls.
Example Directory Structure:
/process-directory
    /03-09-2011
    /03-10-2011
        /GROUPNAME_ID1_staticfilename.xml
        /GROUPNAME_staticfilename2.xml
        /GROUPNAME.flag
        /GROUPNAME2_ID1_staticfilename.xml
        /GROUPNAME2_staticfilename2.xml
        /GROUPNAME2_staticfilename3.xml
        /GROUPNAME2.flag
Attempts Thus Far
I have the following route (names obfuscated) that kicks off the processing:
@Override
public void configure() throws Exception
{
    getContext().addEndpoint("processShare", createProcessShareEndpoint());

    from("processShare")
        .process(new InputFileRouter())
        .choice()
            .when()
                .simple("${header.processorName} == '" + InputFileType.TYPE1 + "'")
                .to("seda://type1?size=1")
            .when()
                .simple("${header.processorName} == '" + InputFileType.TYPE2 + "'")
                .to("seda://type2?size=1")
            .when()
                .simple("${header.processorName} == '" + InputFileType.TYPE3 + "'")
                .to("seda://type3?size=1")
            .when()
                .simple("${header.processorName} == '" + InputFileType.TYPE4 + "'")
                .to("seda://type4?size=1")
            .when()
                .simple("${header.processorName} == '" + InputFileType.TYPE5 + "'")
                .to("seda://type5?size=1")
            .when()
                .simple("${header.processorName} == '" + InputFileType.TYPE6 + "'")
                .to("seda://type6?size=1")
            .when()
                .simple("${header.processorName} == '" + InputFileType.TYPE7 + "'")
                .to("seda://type7?size=1")
            .otherwise()
                .log(LoggingLevel.FATAL, "Unknown file type encountered during processing! --> ${body}");
}
My problems are around how to configure the file endpoint. I'm currently trying to programmatically configure the endpoint without a lot of luck. My experience with Camel thus far has been predominantly with the Spring DSL rather than the Java DSL.
I went down the route of trying to instantiate a FileEndpoint object, but whenever the route builds I get an error saying that the file property is null. I believe this is because I should be creating a FileComponent and not an endpoint. I'm creating the endpoint without a URI because I am not able to specify the dynamic date in the directory name using the URI.
private FileEndpoint createProcessShareEndpoint() throws ConfigurationException
{
    FileEndpoint endpoint = new FileEndpoint();

    //Custom directory "ready to process" implementation.
    endpoint.setProcessStrategy(getContext().getRegistry().lookup(
            "inputFileProcessStrategy", MyFileInputProcessStrategy.class));

    try
    {
        //Controls the number of files returned per directory poll.
        endpoint.setMaxMessagesPerPoll(Integer.parseInt(
                PropertiesUtil.getProperty(
                        AdapterConstants.OUTDIR_MAXFILES, "1")));
    }
    catch (NumberFormatException e)
    {
        throw new ConfigurationException(String.format(
                "Property %s is required to be an integer.",
                AdapterConstants.OUTDIR_MAXFILES), e);
    }

    Map<String, Object> consumerPropertiesMap = new HashMap<String, Object>();
    //Controls the delay between directory polls.
    consumerPropertiesMap.put("delay", PropertiesUtil.getProperty(
            AdapterConstants.OUTDIR_POLLING_MILLIS));
    //Controls which files are included in directory polls.
    //Regex that matches file extensions (eg. {SOME_FILE}.flag)
    consumerPropertiesMap.put("include", "^.*(." + PropertiesUtil.getProperty(
            AdapterConstants.OUTDIR_FLAGFILE_EXTENSION, "flag") + ")");
    endpoint.setConsumerProperties(consumerPropertiesMap);

    GenericFileConfiguration configuration = new GenericFileConfiguration();
    //Controls the directory to be polled by the endpoint.
    if (CommandLineOptions.getInstance().getInputDirectory() != null)
    {
        configuration.setDirectory(CommandLineOptions.getInstance().getInputDirectory());
    }
    else
    {
        SimpleDateFormat dateFormat = new SimpleDateFormat(PropertiesUtil.getProperty(AdapterConstants.OUTDIR_DATE_FORMAT, "MM-dd-yyyy"));
        configuration.setDirectory(
                PropertiesUtil.getProperty(AdapterConstants.OUTDIR_ROOT) + "\\" +
                dateFormat.format(new Date()));
    }
    endpoint.setConfiguration(configuration);

    return endpoint;
}
Questions
1. Is implementing a GenericFileProcessStrategy the correct thing to do in this situation? If so, is there an example of this somewhere? I have looked through the Camel file unit tests and didn't see anything that jumped out at me.
2. What am I doing wrong with configuring the endpoint? I feel like the answer to cleaning up this mess is tied in with question 3.
3. Can you configure the file endpoint to roll to dated folders when polling and the date changes?
As always, thanks for the help.
You can refer to a custom ProcessStrategy from the endpoint URI using the processStrategy option, e.g. file:xxxx?processStrategy=#myProcess. Notice how we prefix the value with # to indicate it should be looked up from the registry. So in Spring XML you just add a <bean id="myProcess" ...> tag.
In Java it's probably easier to grab the endpoint from the CamelContext API:
FileEndpoint file = context.getEndpoint("file:xxx?aaa=123&bbb=456", FileEndpoint.class);
This allows you to pre-configure the endpoint, and afterwards you can use the API on FileEndpoint to set other configurations.
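For example, a rough sketch of what this could look like inside a RouteBuilder's configure() (the directory, the options, and the inputFileProcessStrategy variable are illustrative, assuming the strategy instance has already been looked up or injected):

// Illustrative only: pre-configure the endpoint via URI options, then refine it through the API.
FileEndpoint endpoint = getContext().getEndpoint(
        "file:/process-directory/03-10-2011?maxMessagesPerPoll=1&include=^.*\\.flag$",
        FileEndpoint.class);
endpoint.setProcessStrategy(inputFileProcessStrategy);

from(endpoint)
    .process(new InputFileRouter());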
In Java, this is how to use a GenericFileProcessStrategy:
@Component
public class CustomGenericFileProcessingStrategy<T> extends GenericFileProcessStrategySupport<T> {

    public CustomGenericFileProcessingStrategy() {
    }

    @Override
    public boolean begin(GenericFileOperations<T> operations, GenericFileEndpoint<T> endpoint, Exchange exchange, GenericFile<T> file) throws Exception {
        super.begin(operations, endpoint, exchange, file);
        ...
    }

    @Override
    public void commit(GenericFileOperations<T> operations, GenericFileEndpoint<T> endpoint, Exchange exchange, GenericFile<T> file) throws Exception {
        super.commit(operations, endpoint, exchange, file);
        ...
    }

    @Override
    public void rollback(GenericFileOperations<T> operations, GenericFileEndpoint<T> endpoint, Exchange exchange, GenericFile<T> file) throws Exception {
        super.rollback(operations, endpoint, exchange, file);
        ...
    }
}
And then create your RouteBuilder class:
public class MyRoutes extends RouteBuilder {

    private final CustomGenericFileProcessingStrategy customGenericFileProcessingStrategy;

    public MyRoutes(CustomGenericFileProcessingStrategy customGenericFileProcessingStrategy) {
        this.customGenericFileProcessingStrategy = customGenericFileProcessingStrategy;
    }

    @Override
    public void configure() throws Exception {
        FileEndpoint fileEndPoint = getContext().getEndpoint("file://mySourceDirectory", FileEndpoint.class);
        fileEndPoint.setProcessStrategy(customGenericFileProcessingStrategy);
        from(fileEndPoint).setBody(...).process(...).toD(...);
        ...
    }
}