I've got the following method, which allows me to upload files to containers on Rackspace Cloud Files:
/**
 * Uploads a file to the storage.
 *
 * @param file the <code>File</code> which is to be uploaded to the storage.
 * @param fileContainer a <code>String</code> representing the container
 * which the provided <code>File</code> is to be uploaded to.
 * @throws StorageException if an attempt to upload the provided file to
 * the storage failed.
 */
public static void upload(File file, String fileContainer) throws StorageException {
    if (!file.exists()) {
        throw new StorageException("The file '" + file.getName() + "' does not exist.");
    }
    try {
        BlobStoreContext cb = ContextBuilder.newBuilder("cloudfiles-uk")
                .credentials(USERNAME, PASSWORD)
                .buildView(BlobStoreContext.class);
        Blob blob = cb.getBlobStore().blobBuilder(file.getName())
                .payload(file)
                .build();
        cb.getBlobStore().putBlob(fileContainer, blob);
    } catch (Exception e) {
        throw new StorageException(e);
    }
}
Right now, I'm creating a new context every time the method is called. As far as I understand, the code will only authenticate on the first call and from then on use a key issued during that first authentication for all subsequent calls. However, I'm not sure if that is correct. Will I be re-authenticating if I throw away the BlobStoreContext instance and instantiate a new one every time upload() is invoked? Would it be a better idea to keep the BlobStoreContext instance?
As you have your code now, you will be reauthenticating on each call to the 'upload' function.
Instead, you'll probably want to create a global context variable, call an authentication function to set your credentials, and then use the context in your upload function.
See this example:
https://github.com/jclouds/jclouds-examples/blob/master/rackspace/src/main/java/org/jclouds/examples/rackspace/Authentication.java
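The "create it once, reuse it" idea can be sketched with plain JDK classes. This is only a sketch: SharedContext is a hypothetical helper name, not part of jclouds, and the Supplier stands in for whatever builds the real context.

```java
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not a jclouds class): builds an expensive object once
 * and hands out the same instance on every later call.
 */
final class SharedContext<T> {
    private final Supplier<T> factory;
    private T instance; // guarded by "this"

    SharedContext(Supplier<T> factory) {
        this.factory = Objects.requireNonNull(factory, "factory");
    }

    /** Returns the shared instance, creating it on the first call only. */
    synchronized T get() {
        if (instance == null) {
            instance = Objects.requireNonNull(factory.get(), "factory returned null");
        }
        return instance;
    }
}
```

In the upload code from the question, the factory would presumably be the existing ContextBuilder.newBuilder("cloudfiles-uk").credentials(USERNAME, PASSWORD).buildView(BlobStoreContext.class) call, and upload() would ask the holder for the context instead of building a new one each time.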
I have a FlatMapFunction that lists items in S3. I want to register each item in the distributed file cache.
Is that even possible?
For example, in my job:
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
...
... = myDataSet.flatMap(new S3Lister(env));
and in the S3Lister file:
...
String id = os.getKey().substring(os.getKey().lastIndexOf('/') + 1);
env.registerCachedFile("s3://" + bucket + os.getKey(), id);
...
and then later access it from the distributed cache in another custom coGroup function.
Could this work? Are you even allowed to pass the ExecutionEnvironment around like that?
Update:
If not, what's the best way to get an entire S3 bucket into a distributed file cache for use in a flink job?
Essentially, the registerCachedFile method uploads the files when the job is submitted, so it is not possible to call it from inside a deployed program.
But from your description, why not read the S3 files directly?
You can use rich functions instead of normal ones and load your distributed cache in them.
First, register your file:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// register a file from HDFS
env.registerCachedFile("hdfs:///path/to/your/file", "hdfsFile");
// register a local executable file (script, executable, ...)
env.registerCachedFile("file:///path/to/exec/file", "localExecFile", true);
// define your program and execute
...
DataSet<String> input = ...
DataSet<Integer> result = input.map(new MyMapper());
...
env.execute();
and then use it in your RichFunction class:
// extend a RichFunction to have access to the RuntimeContext
public final class MyMapper extends RichMapFunction<String, Integer> {
    @Override
    public void open(Configuration config) {
        // access cached file via RuntimeContext and DistributedCache
        File myFile = getRuntimeContext().getDistributedCache().getFile("hdfsFile");
        // read the file (or navigate the directory)
        ...
    }

    @Override
    public Integer map(String value) throws Exception {
        // use content of cached file
        ...
    }
}
You can find these examples in the Flink documentation.
I'm a bit confused about Storage and FileSystemStorage. I wrote the following methods, but I'm sure that they don't work as expected, because .contains("/") is not enough to distinguish if we are using Storage or FileSystemStorage.
Could you please help me to fix them? Thank you
/**
 * Get an InputStream for the given sourceFile, it automatically chooses
 * FileSystem API or Storage API
 *
 * @param sourceFile the name or path of the file to read
 * @return an InputStream for the file
 * @throws java.io.IOException if the stream cannot be opened
 */
public static InputStream getInputStream(String sourceFile) throws IOException {
    if (sourceFile.contains("/")) {
        return FileSystemStorage.getInstance().openInputStream(sourceFile);
    } else {
        // Storage is a flat file system
        return Storage.getInstance().createInputStream(sourceFile);
    }
}

/**
 * Get an OutputStream for the given destFile, it automatically chooses
 * FileSystem API or Storage API
 *
 * @param destFile the name or path of the file to write
 * @return an OutputStream for the file
 * @throws java.io.IOException if the stream cannot be opened
 */
public static OutputStream getOutputStream(String destFile) throws IOException {
    if (destFile.contains("/")) {
        return FileSystemStorage.getInstance().openOutputStream(destFile);
    } else {
        // Storage is a flat file system
        return Storage.getInstance().createOutputStream(destFile);
    }
}
Actually, they should be pretty good. In theory, Storage would allow you to use / as part of the file name, but honestly it isn't something we've tested and I'm not sure if that's the right thing to do.
FileSystemStorage requires an absolute path and as such will always include a slash character, so this should work fine. Technically, a FileSystemStorage path should start with file://, but APIs often work without it to make native code integration easier, so that prefix is not a great way to distinguish the API.
I am creating a file that is user specific. This file is basically a results csv that is created with the option for the user to download or not. When the user leaves the page, or ends their session I want to be able to delete this file. What is the best way to handle this?
Currently I am using the File class for Java.
Thanks!
You don't have to write a file in the first place. Create the content on the fly and stream it back to the client. Wicket has a few classes in the package org.apache.wicket.request.resource to help with that.
As a starting point, look at Wicket 6 resource management and Wicket 1.5 Mounting resources
You basically mount a resource in the WicketApplication.init():
mountResource("somePath/${param1}/${param2}", new SomeResourceReference());
Then the SomeResourceReference:
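public class SomeResourceReference extends ResourceReference {
    public SomeResourceReference() {
        // ResourceReference has no no-arg constructor; the name used here is arbitrary
        super(SomeResourceReference.class, "someResource");
    }

    @Override
    public IResource getResource() {
        return new SomeResource();
    }
}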
And finally in SomeResource:
public class SomeResource extends AbstractResource {
    @Override
    public AbstractResource.ResourceResponse newResourceResponse(Attributes attributes) {
        // get the parameters
        PageParameters parameters = attributes.getParameters();
        final String param1 = parameters.get("param1").toStringObject();
        AbstractResource.ResourceResponse response
                = new AbstractResource.ResourceResponse();
        // text/csv is the registered media type for CSV
        response.setContentType("text/csv");
        response.setCacheDuration(Duration.NONE);
        response.setCacheScope(WebResponse.CacheScope.PRIVATE);
        response.setContentDisposition(ContentDisposition.INLINE);
        response.setWriteCallback(new AbstractResource.WriteCallback() {
            @Override
            public void writeData(final Attributes attributes) throws IOException {
                // create your data here
                attributes.getResponse().write(dataAsString);
            }
        });
        return response;
    }
}
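To make the "create your data here" step a bit more concrete, here is a minimal sketch of building the CSV text in memory, so that no file is ever written. CsvText, escape and toCsv are hypothetical names; the quoting rule (quote a field that contains a comma, a double quote or a line break, and double any embedded quotes) is the usual CSV convention, not something Wicket provides.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that turns rows of strings into CSV text in memory. */
final class CsvText {
    private CsvText() {}

    /** Quotes a field when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        String value = field == null ? "" : field;
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    /** Joins rows into CSV text, one CRLF-terminated line per row. */
    static String toCsv(List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(row.stream().map(CsvText::escape).collect(Collectors.joining(",")));
            out.append("\r\n");
        }
        return out.toString();
    }
}
```

The returned string would play the role of dataAsString in the write callback above.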
Wicket doesn't control destroying the session. It is the concern of the servlet container you are using.
If you want to create a file in Wicket and delete it when the session is destroyed or the user logs out, there are two parts:
User logout (in Wicket):
Store the file path or the file reference in the WebSession (Wicket).
Override the invalidate() method of your WebSession or AuthenticatedWebSession, see http://ci.apache.org/projects/wicket/apidocs/6.x/org/apache/wicket/protocol/http/WebSession.html#invalidate%28%29
Session destroyed:
Store the file path or the file reference in the container session, write your own listener, and add it to your servlet context (e.g. in Tomcat, using the web.xml file).
See http://docs.oracle.com/javaee/7/api/javax/servlet/http/HttpSessionListener.html
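The bookkeeping behind both parts could look roughly like the sketch below. SessionFiles, register and cleanUp are hypothetical names, and the class deliberately uses only the JDK so that it does not depend on Wicket or the servlet API; the overridden invalidate() or an HttpSessionListener's sessionDestroyed(...) would be the place to call cleanUp with the session id.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry that remembers one generated file per session id. */
final class SessionFiles {
    private final Map<String, Path> filesBySession = new ConcurrentHashMap<>();

    /** Remembers the file that belongs to a session, replacing any earlier entry. */
    void register(String sessionId, Path file) {
        filesBySession.put(sessionId, file);
    }

    /** Deletes the session's file if there is one; returns true when a file was removed. */
    boolean cleanUp(String sessionId) {
        Path file = filesBySession.remove(sessionId);
        if (file == null) {
            return false;
        }
        try {
            return Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException("could not delete " + file, e);
        }
    }
}
```

Note that register does not delete a file it replaces; if a user can generate several result files per session, the earlier one would need to be cleaned up first.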
I have a form with a FormPanel, a FileUpload and a Button:
final FormPanel formPanel = new FormPanel();
formPanel.setAction("uploadServlet");
formPanel.setMethod(FormPanel.METHOD_POST);
formPanel.setEncoding(FormPanel.ENCODING_MULTIPART);
formPanel.setSize("100%", "100%");
setWidget(formPanel);
AbsolutePanel absolutePanel = new AbsolutePanel();
formPanel.setWidget(absolutePanel);
absolutePanel.setSize("249px", "70px");
final FileUpload fileUpload = new FileUpload();
fileUpload.setName("uploadFormElement");
absolutePanel.add(fileUpload, 0, 0);
Button btnOpen = new Button("Open");
absolutePanel.add(btnOpen, 10, 30);
Button btnCancel = new Button("Cancel");
absolutePanel.add(btnCancel, 63, 30);
this.setText("Open...");
this.setTitle(this.getText());
this.setAnimationEnabled(true);
this.setGlassEnabled(true);
btnOpen.addClickHandler(new ClickHandler() {
    @Override
    public void onClick(ClickEvent event) {
        formPanel.submit();
    }
});
The servlet gets called, but the request contains an error message, "error post".
When I try it on the local server it works and the request contains the file, but on the App Engine server I only get the error:
public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    FileItemFactory factory = new DiskFileItemFactory();
    ServletFileUpload upload = new ServletFileUpload(factory);
    List<?> items;
    try {
        items = upload.parseRequest(request);
    } catch (FileUploadException e) {
        // without this early return, items stays null and the loop below throws a NullPointerException
        e.printStackTrace();
        response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, "Could not parse the upload");
        return;
    }
    String json = "";
    Iterator<?> it = items.iterator();
    while (it.hasNext()) {
        FileItem item = (FileItem) it.next();
        json = item.getString();
    }
    response.setContentType("text/html");
    // the content length is counted in bytes, not characters
    byte[] body = json.getBytes();
    response.setContentLength(body.length);
    ServletOutputStream out = response.getOutputStream();
    out.write(body);
    out.close();
}
DiskFileItemFactory is the default implementation for the commons-fileupload library. According to its Javadoc:
This implementation creates FileItem instances which keep their content either in memory, for smaller items, or in a temporary file on disk, for larger items. The size threshold, above which content will be stored on disk, is configurable, as is the directory in which temporary files will be created.
If not otherwise configured, the default configuration values are as follows:
Size threshold is 10KB. Repository is the system default temp directory, as returned by System.getProperty("java.io.tmpdir").
So, as you can see, this implementation writes to the filesystem as soon as an item is larger than the in-memory size threshold.
GAE has many constraints, such as a limit on the memory you are allowed to use and a prohibition on writing to the filesystem.
Your code should also fail in GAE development mode, but maybe you have not hit the limit there; the GAE dev server tries to emulate the constraints of the production server, but it is not identical.
That said, you could take a look at the gwtupload library; it has a servlet for GAE which can save files in different ways: BlobStore, FileApi and MemCache.
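If the uploads are small, another option is to keep them entirely in memory and never touch the disk. Below is a minimal sketch of the buffering step only, using plain JDK streams; InMemoryUpload and readBounded are hypothetical names, and where the InputStream comes from depends on the upload API you end up using.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper that buffers an upload in memory up to a fixed limit. */
final class InMemoryUpload {
    private InMemoryUpload() {}

    /**
     * Reads the whole stream into a byte array and fails with an IOException
     * as soon as more than maxBytes have been read.
     */
    static byte[] readBounded(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            if (buffer.size() + read > maxBytes) {
                throw new IOException("upload exceeds " + maxBytes + " bytes");
            }
            buffer.write(chunk, 0, read);
        }
        return buffer.toByteArray();
    }
}
```

The limit matters here: without it, a single large upload could exhaust the memory the instance is allowed to use.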
So I want to create a java.io.File so that I can use it to generate a multipart-form POST request. I have the file in the form of a com.google.api.services.drive.model.File, so I'm wondering: is there a way I can convert this Google File to a Java File? This is a web app that uses the Google App Engine SDK, which prohibits every approach I've tried to make this work.
No, it doesn't seem like you can convert from com.google.api.services.drive.model.File to java.io.File. But it should still be possible to generate a multipart-form POST request using your data in Drive.
The com.google.api.services.drive.model.File class is used for storing metadata about the file. It does not store the file contents.
If you want to read the contents of your file into memory, this code snippet from the Drive documentation shows how to do it. Once the file is in memory, you can do whatever you want with it.
/**
 * Download the content of the given file.
 *
 * @param service Drive service to use for downloading.
 * @param file File metadata object whose content to download.
 * @return String representation of file content. String is returned here
 * because this app is set up for text/plain files.
 * @throws IOException Thrown if the request fails for whatever reason.
 */
private String downloadFileContent(Drive service, File file)
        throws IOException {
    GenericUrl url = new GenericUrl(file.getDownloadUrl());
    HttpResponse response = service.getRequestFactory().buildGetRequest(url)
            .execute();
    try {
        return new Scanner(response.getContent()).useDelimiter("\\A").next();
    } catch (java.util.NoSuchElementException e) {
        return "";
    }
}
https://developers.google.com/drive/examples/java
This post might be helpful for making your multi-part POST request from Google AppEngine.
In the Google Drive API v3 you can download the file content into an OutputStream. For that you need the file ID, which you can get from your com.google.api.services.drive.model.File:
String fileId = "yourFileId";
OutputStream outputStream = new ByteArrayOutputStream();
driveService.files().get(fileId).executeMediaAndDownloadTo(outputStream);
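Once the content is in memory, the multipart body can be built from the bytes directly, so no java.io.File is needed. Here is a minimal sketch using only the JDK; MultipartBody, contentType and singleFile are hypothetical names, the boundary is whatever unique string you choose, and the sketch assumes the field name and file name contain no double quotes or line breaks.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a multipart/form-data body in memory. */
final class MultipartBody {
    private MultipartBody() {}

    /** Value for the request's Content-Type header. */
    static String contentType(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    /** Builds a body with a single file part from bytes already held in memory. */
    static byte[] singleFile(String boundary, String fieldName, String fileName,
                             String partContentType, byte[] content) {
        // Part headers; assumes fieldName and fileName need no escaping.
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + partContentType + "\r\n\r\n";
        // Closing delimiter that ends the multipart body.
        String tail = "\r\n--" + boundary + "--\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);
        byte[] tailBytes = tail.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.write(headBytes, 0, headBytes.length);
        body.write(content, 0, content.length);
        body.write(tailBytes, 0, tailBytes.length);
        return body.toByteArray();
    }
}
```

With the snippet above, the bytes could come from the download if outputStream is declared as a ByteArrayOutputStream, via toByteArray(). The POST request then needs the header value returned by contentType(boundary).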