How to read byte by byte from appengine datastore Entity Object - google-app-engine

In a nutshell, since GAE cannot write to the filesystem, I have decided to persist my data in the datastore (using JDO). Now, I would like to retrieve the data byte by byte and pass it to the client as an input stream. There is code from the gwtupload library (http://code.google.com/p/gwtupload/) (see below) which breaks on GAE because it writes to the filesystem. I'd like to provide a GAE-ported solution.
public static void copyFromInputStreamToOutputStream(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[100000];
    while (true) {
        synchronized (buffer) {
            int amountRead = in.read(buffer);
            if (amountRead == -1) {
                break;
            }
            out.write(buffer, 0, amountRead);
        }
    }
    in.close();
    out.flush();
    out.close();
}
One workaround I have tried (it didn't work) is to retrieve the data from the datastore as a resource, like this:
InputStream resourceAsStream = null;
PersistenceManager pm = PMF.get().getPersistenceManager();
try {
    Query q = pm.newQuery(ImageFile.class);
    lf = q.execute();
    resourceAsStream = getServletContext().getResourceAsStream((String) pm.getObjectById(lf));
} finally {
    pm.close();
}
if (lf != null) {
    response.setContentType(receivedContentTypes.get(fieldName));
    copyFromInputStreamToOutputStream(resourceAsStream, response.getOutputStream());
}
I welcome your suggestions.
Regards

Store data in a byte array, and use a ByteArrayInputStream or ByteArrayOutputStream to pass it to libraries that expect streams.
If by 'client' you mean 'HTTP client' or browser, though, there's no reason to do this - just deal with regular byte arrays on your end and send them to/from the user as you would any other data. The only reason to mess around with streams like this is if you have some library that expects them.
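For example, a minimal sketch (untested) of serving the stored bytes with the copy method above: getData() and getContentType() are hypothetical accessors on the ImageFile entity for the persisted byte[] and its content type, and key is the entity's datastore key.
PersistenceManager pm = PMF.get().getPersistenceManager();
try {
    ImageFile imageFile = pm.getObjectById(ImageFile.class, key);
    // wrap the persisted byte[] so it can be handed to stream-based code
    InputStream in = new ByteArrayInputStream(imageFile.getData());
    response.setContentType(imageFile.getContentType());
    copyFromInputStreamToOutputStream(in, response.getOutputStream());
} finally {
    pm.close();
}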

Related

Dart - HTTPClient download file to string

In the Flutter/Dart app that I am currently working on, I need to download large files from my servers. However, instead of storing the file in local storage, what I need to do is parse its contents and consume them in one go. I thought the best way to accomplish this was by implementing my own StreamConsumer and overriding the relevant methods. Here is what I have done thus far:
import 'dart:io';
import 'dart:async';

class Accumulator extends StreamConsumer<List<int>>
{
  String text = '';

  @override
  Future<void> addStream(Stream<List<int>> s) async
  {
    print('Adding');
    //print(s.length);
    return;
  }

  @override
  Future<dynamic> close() async
  {
    print('closed');
    return Future.value(text);
  }
}
Future<String> fileFetch() async
{
  String url = 'https://file.io/bse4moAYc7gW';
  final HttpClientRequest request = await HttpClient().getUrl(Uri.parse(url));
  final HttpClientResponse response = await request.close();
  return await response.pipe(Accumulator());
}

Future<void> simpleFetch() async
{
  String url = 'https://file.io/bse4moAYc7gW';
  final HttpClientRequest request = await HttpClient().getUrl(Uri.parse(url));
  final HttpClientResponse response = await request.close();
  await response.pipe(File('sample.txt').openWrite());
  print('Simple done!!');
}

void main() async
{
  print('Starting');
  await simpleFetch();
  String text = await fileFetch();
  print('Finished! $text');
}
When I run this in VSCode here is the output I get
Starting
Simple done!! //the contents of the file at https://file.io/bse4moAYc7gW are duly saved in the file sample.txt
Adding //clearly addStream is being called
Instance of 'Future<int>' //I had expected to see the length of the available data here
closed //close is clearly being called BUT
Finished! //back in main()
My understanding of the underlying issues here is still rather limited. My expectation was that I would use addStream to accumulate the contents being downloaded until there is nothing more to download, at which point close would be called and the program would exit.
Why is addStream showing Instance of 'Future<int>' rather than the length of the available content?
Although the VSCode debug console does display exited, this happens several seconds after closed is displayed. I thought this might be an issue with having to call super.close(), but that is not it. What am I doing wrong here?
I was going to delete this question but decided to leave it here, with an answer, for the benefit of anyone else trying to do something similar.
The key point to note is that the call to Accumulator.addStream does just that - it furnishes a stream to be listened to; there is no actual data to be read yet. What you do next is this:
void whenData(List<int> data)
{
  //you will typically get a sequence of one or more bytes here
  for (int value in data)
  {
    //accumulate the incoming data here
  }
}

void whenDone()
{
  //now that you have all the file data *accumulated*, do what you like with it
}

@override
Future<void> addStream(Stream<List<int>> s) async
{
  //wait for the subscription to finish so close() is not called too early
  await s.listen(whenData, onDone: whenDone).asFuture<void>();
  //you can optionally add a handler for `onError`
}

Send and receive a w3c.dom.Document over socket as byte[] Java

I send a document over a socket like this:
sendFXML(asByteArray(getRequiredScene(fetchSceneRequest())));

private void sendFXML(byte[] requiredFXML) throws IOException, TransformerException {
    dataOutputStream.write(requiredFXML);
    dataOutputStream.flush();
}

private Document getRequiredScene(String requiredFile) throws IOException, ParserConfigurationException, SAXException, TransformerException {
    return new XMLLocator().getDocumentOrReturnNull(requiredFile);
}

private String fetchSceneRequest() throws IOException, ClassNotFoundException {
    return dataInputStream.readUTF();
}
On the XMLLocator side it finds the correct document and parses it correctly; I can see this by printing the whole doc to the console.
But I cannot handle it on the client's side, where it's fetched by:
public static void receivePage() throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] data = new byte[989898];
    int bytesRead = -1;
    while ((bytesRead = dataInputStream.read(data)) != -1) { //stops here
        baos.write(data, 0, bytesRead);
    }
    Files.write(Paths.get(FILE_TO_RECEIVED), data);
}
After the first iteration of the while() loop it just stops at the commented place.
I don't know whether I have an error on the server side and am sending the doc in an incorrect format, or whether I'm reading the sent byte array incorrectly. Where is the problem?
Edit:
For debugging purposes, in the receivePage() method, I've chosen a different way of reading the byte array from the server, which goes like this:
int count = inputStream.available();
byte[] b = new byte[count];
int bytes = dataInputStream.read(b);
System.out.println(bytes);
for (byte by : b) {
    System.out.print((char) by);
}
And now I'm able to print the fetched FXML to the console, but a new problem has appeared.
In debug mode it receives the byte[] from the server normally, prints 2024 for count and displays the content of the file, but if I run the app normally via Shift + F10 it fetches nothing and just prints 0 to the console.
Edit2:
For some reason, once again only in debug mode, it's even able to write into a file:
for (byte by : b) {
    Files.write(Paths.get(FILE_TO_RECEIVED), b);
    System.out.print((char) by);
}
But when I try to return this fxml in debug mode and then show it like this:
Parent fxmlToShow = FXMLLoader.load(getClass().getResource("/network/gui.fxml"));
Scene childScene = new Scene(fxmlToShow);
Stage window = (Stage) ((Node) ae.getSource()).getScene().getWindow();
window.setScene(childScene);
return window;
it shows only previous files. On the first debug attempt it showed a blank page when I asked for the first one from the server. On the second debug attempt, when I asked for the third page from the server, it showed me the previously requested one, and so on.
To me, this seems absolutely insane because the fxml file actually refreshes before the line
Parent fxmlToShow = FXMLLoader.load(getClass().getResource("/network/gui.fxml"));
is invoked.
Yeah, thanks everybody for participating.
So, the issue of FXML files displaying incorrectly was caused by an incorrect FILE_TO_RECEIVED path.
When FXMLLoader.load(getClass().getResource("/network/gui.fxml")); loads gui.fxml it takes it not from D:\\JetBrains\\IdeaProjects\\Client\\src\\network\\gui.fxml, in my case, but from D:\\JetBrains\\IdeaProjects\\Client\\OUT\\PRODUCTION\\Client\\network\\gui.fxml.
That doesn't seem obvious to me.
As for the different behaviour in debug and in a normal run: in the method receivePage() you need to wait until data is available.
int count = inputStream.available();
If you read the docs for this method you will see:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream ...
The available method for class InputStream always returns 0...
So, you just need to wait for data to become available:
while (inputStream.available() == 0) {
    Thread.sleep(100);
}
Otherwise it just prepares byte[] b = new byte[count]; for 0 bytes and you can write nothing into it.
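A minimal sketch (untested) of how that wait-then-read idea could slot into the question's receivePage(), assuming the same dataInputStream and FILE_TO_RECEIVED fields:
public static void receivePage() throws IOException, InterruptedException {
    // wait until the server has actually sent something
    while (dataInputStream.available() == 0) {
        Thread.sleep(100);
    }
    int count = dataInputStream.available();
    byte[] data = new byte[count];
    int bytesRead = dataInputStream.read(data);
    // write only the bytes that were actually read (java.util.Arrays)
    Files.write(Paths.get(FILE_TO_RECEIVED), Arrays.copyOf(data, bytesRead));
}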

StackOverflowError while reading file from reactive GridFS

I have a problem with the ReactiveGridFsTemplate. I am trying to read a GridFS file written with the old GridFS (com.mongodb.gridfs) instead of the new GridFS (com.mongodb.client.gridfs.model.GridFS), with a UUID as the ID instead of an ObjectId. Reading the GridFS file's meta info goes fine, but as soon as I want to get the ReactiveGridFsResource it blows up with a nice new MongoGridFSException("Custom id type used for this GridFS file").
The culprit is the code below from ReactiveGridFsTemplate, which calls getObjectId() instead of getId(). Should it call this method, or can it be rewritten to use the getId() method?
public Mono<ReactiveGridFsResource> getResource(GridFSFile file) {
    Assert.notNull(file, "GridFSFile must not be null!");
    return Mono.fromSupplier(() -> {
        GridFSDownloadStream stream = this.getGridFs().openDownloadStream(file.getObjectId());
        return new ReactiveGridFsResource(file, BinaryStreamAdapters.toPublisher(stream, this.dataBufferFactory));
    });
}
I hacked the ReactiveGridFsTemplate to use getId() instead of getObjectId(), but now it gives me a StackOverflowError. Can someone tell me what I'm doing wrong?
ReactiveGridFsTemplate reactiveGridFsTemplate = new ReactiveGridFsTemplate(mongoDbDFactory, operations.getConverter(), "nl.loxia.collectie.buitenlandbladen.dgn", 1024) {
    public Mono<ReactiveGridFsResource> getResource(GridFSFile file) {
        Assert.notNull(file, "GridFSFile must not be null!");
        return Mono.fromSupplier(() -> {
            GridFSDownloadStream stream = this.getGridFs().openDownloadStream(file.getId());
            return new ReactiveGridFsResource(file, BinaryStreamAdapters.toPublisher(stream, this.dataBufferFactory));
        });
    }
};

var q = Query.query((Criteria.where("_id").is("5449d9e3-7f6d-47b7-957d-056842f190f7")));
List<DataBuffer> block = reactiveGridFsTemplate
    .findOne(q)
    .flatMap(reactiveGridFsTemplate::getResource)
    .flux()
    .flatMap(ReactiveGridFsResource::getDownloadStream)
    .collectList()
    .block();
The stacktrace: https://gist.github.com/nickstolwijk/fa77681572db1d91941d85f6c845f2f4
Also, this code hangs due to the StackOverflowError. Is that correct?

Google cloud storage using stream instead of bytebuffer - java

I'm using the following code:
GcsService gcsService = GcsServiceFactory.createGcsService();
GcsFilename filename = new GcsFilename(BUCKETNAME, fileName);
GcsFileOptions options = new GcsFileOptions.Builder()
    .mimeType(contentType)
    .acl("public-read")
    .addUserMetadata("myfield1", "my field value")
    .build();

@SuppressWarnings("resource")
GcsOutputChannel outputChannel = gcsService.createOrReplace(filename, options);
outputChannel.write(ByteBuffer.wrap(byteArray));
outputChannel.close();
The problem is that when I try to store video files, I have to hold the whole file in the byteArray, which could cause memory issues.
But I cannot find any interface to do the same with a stream.
Questions:
Should I worry about memory issues in the App Engine server, or is it capable of keeping a 1 min video in memory?
Is it possible to use a stream instead of a byte array? How?
I'm reading the bytes as byte[] byteArray = IOUtils.toByteArray(stream); should I use the byte array as a real buffer and just read chunks and upload them to GCS? How do I do that?
The amount of memory available depends on the App Engine instance type you've configured. Streaming this data seems like a good idea if you can.
Not sure about the GcsService API, but it looks like you can do this using the gcloud Storage API:
https://github.com/GoogleCloudPlatform/gcloud-java/blob/master/gcloud-java-storage/src/main/java/com/google/cloud/storage/Storage.java
This code might work (untested)...
final BlobInfo info = BlobInfo.builder(bucket.getBucketName(), "name").contentType("image/png").build();
final ReadableByteChannel src = Channels.newChannel(stream);
final WriteChannel dst = gcsStorage.writer(info);
fastChannelCopy(src, dst);

private void fastChannelCopy(final ReadableByteChannel src, final WritableByteChannel dest) throws IOException {
    final ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
    while (src.read(buffer) != -1) {
        buffer.flip();      // prepare the buffer to be drained
        dest.write(buffer); // write to the channel, may block
        // If partial transfer, shift remainder down
        // If buffer is empty, same as doing clear()
        buffer.compact();
    }
    // EOF will leave buffer in fill state
    buffer.flip();
    // make sure the buffer is fully drained
    while (buffer.hasRemaining()) {
        dest.write(buffer);
    }
}
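If you would rather stay with the GcsService API from the question, a chunked copy from the InputStream also avoids buffering the whole video in memory. A minimal sketch (untested), reusing the gcsService, filename, options and stream objects assumed above:
GcsOutputChannel outputChannel = gcsService.createOrReplace(filename, options);
byte[] chunk = new byte[64 * 1024];
int read;
while ((read = stream.read(chunk)) != -1) {
    // write only the bytes actually read in this iteration
    outputChannel.write(ByteBuffer.wrap(chunk, 0, read));
}
outputChannel.close(); // the object is only finalized when the channel is closed
stream.close();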

read cloud storage content with "gzip" encoding for "application/octet-stream" type content

We're using "Google Cloud Storage Client Library" for app engine, with simply "GcsFileOptions.Builder.contentEncoding("gzip")" at file creation time, we got the following problem when reading the file:
com.google.appengine.tools.cloudstorage.NonRetriableException: java.lang.RuntimeException: com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1@1c07d21: Unexpected cause of ExecutionException
at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:87)
at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:129)
at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:123)
at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl.read(SimpleGcsInputChannelImpl.java:81)
...
Caused by: java.lang.RuntimeException: com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1@1c07d21: Unexpected cause of ExecutionException
at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:101)
at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:81)
at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:75)
... 56 more
Caused by: java.lang.IllegalStateException: com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2@1d8c25d: got 46483 > wanted 19823
at com.google.common.base.Preconditions.checkState(Preconditions.java:177)
at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:418)
at com.google.appengine.tools.cloudstorage.oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:398)
at com.google.appengine.api.utils.FutureWrapper.wrapAndCache(FutureWrapper.java:53)
at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:90)
at com.google.appengine.tools.cloudstorage.SimpleGcsInputChannelImpl$1.call(SimpleGcsInputChannelImpl.java:86)
... 58 more
What else should be added to read files with "gzip" compression so the content can be read in App Engine? (curl of the Cloud Storage URL from the client side works fine for both compressed and uncompressed files.)
This is the code that works for uncompressed object:
byte[] blobContent = new byte[0];
try
{
GcsFileMetadata metaData = gcsService.getMetadata(fileName);
int fileSize = (int) metaData.getLength();
final int chunkSize = BlobstoreService.MAX_BLOB_FETCH_SIZE;
LOG.info("content encoding: " + metaData.getOptions().getContentEncoding()); // "gzip" here
LOG.info("input size " + fileSize); // the size is obviously the compressed size!
for (long offset = 0; offset < fileSize;)
{
if (offset != 0)
{
LOG.info("Handling extra size for " + filePath + " at " + offset);
}
final int size = Math.min(chunkSize, fileSize);
ByteBuffer result = ByteBuffer.allocate(size);
GcsInputChannel readChannel = gcsService.openReadChannel(fileName, offset);
try
{
readChannel.read(result); <<<< here the exception was thrown
}
finally
{
......
It is now compressed by:
GcsFilename filename = new GcsFilename(bucketName, filePath);
GcsFileOptions.Builder builder = new GcsFileOptions.Builder().mimeType(image_type);
builder = builder.contentEncoding("gzip");
GcsOutputChannel writeChannel = gcsService.createOrReplace(filename, builder.build());

ByteArrayOutputStream byteStream = new ByteArrayOutputStream(blob_content.length);
try
{
    GZIPOutputStream zipStream = new GZIPOutputStream(byteStream);
    try
    {
        zipStream.write(blob_content);
    }
    finally
    {
        zipStream.close();
    }
}
finally
{
    byteStream.close();
}
byte[] compressedData = byteStream.toByteArray();
writeChannel.write(ByteBuffer.wrap(compressedData));
The blob_content is compressed from 46483 bytes to 19823 bytes.
I think it is a bug in Google's code:
https://code.google.com/p/appengine-gcs-client/source/browse/trunk/java/src/main/java/com/google/appengine/tools/cloudstorage/oauth/OauthRawGcsService.java, L418:
Preconditions.checkState(content.length <= want, "%s: got %s > wanted %s", this, content.length, want);
The HTTPResponse has already decoded the blob, so the precondition is wrong here.
If I understand correctly, you have to set the mimeType:
GcsFileOptions options = new GcsFileOptions.Builder().mimeType("text/html")
Google Cloud Storage does not compress or decompress objects:
https://developers.google.com/storage/docs/reference-headers?csw=1#contentencoding
I hope that's what you want to do.
Looking at your code, it seems like there is a mismatch between what is stored and what is read. The documentation specifies that compression is not done for you (https://developers.google.com/storage/docs/reference-headers?csw=1#contentencoding). You will need to do the actual compression manually.
Also, if you look at the implementation of the class that throws the exception (https://code.google.com/p/appengine-gcs-client/source/browse/trunk/java/src/main/java/com/google/appengine/tools/cloudstorage/oauth/OauthRawGcsService.java?r=81&spec=svn134), you will notice that you get the original contents back, but you're actually expecting compressed content. Check the method readObjectAsync in the above-mentioned class.
It looks like the content persisted might not be gzipped or the content length is not set properly. What you should do is verify the length of the compressed stream just before writing it into the channel. You should also verify that the content length is set correctly when making the HTTP request. It would be useful to see the actual HTTP request headers and make sure the content-length header matches the actual content length in the HTTP response.
Also, it looks like contentEncoding could be set incorrectly. Try using .contentEncoding("Content-Encoding: gzip") as used in this TCK test. Still, the best thing to do is inspect the HTTP request and response. You can use Wireshark to do that easily.
Also, you need to make sure that the GcsOutputChannel is closed, as that's when the file is finalized.
Hope this puts you on the right track. To gzip your contents you can use Java's GZIPOutputStream (and GZIPInputStream to decompress).
I'm seeing the same issue, easily reproducible by uploading a file with "gsutil cp -Z", then trying to open it with the following:
ByteArrayOutputStream output = new ByteArrayOutputStream();
try (GcsInputChannel readChannel = svc.openReadChannel(filename, 0)) {
    try (InputStream input = Channels.newInputStream(readChannel))
    {
        IOUtils.copy(input, output);
    }
}
This causes an exception like this:
java.lang.IllegalStateException:
....oauth.OauthRawGcsService$2@1883798: got 64303 > wanted 4096
at ....Preconditions.checkState(Preconditions.java:199)
at ....oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:519)
at ....oauth.OauthRawGcsService$2.wrap(OauthRawGcsService.java:499)
The only workaround I've found is to read the entire file into memory using readChannel.read:
int fileSize = 64303;
ByteBuffer result = ByteBuffer.allocate(fileSize);
try (GcsInputChannel readChannel = gcs.openReadChannel(new GcsFilename("mybucket", "mygzippedfile.xml"), 0)) {
    readChannel.read(result);
}
Unfortunately, this only works if the size of the ByteBuffer is greater than or equal to the uncompressed size of the file, which is not possible to get via the API.
I've also posted my comment to an issue registered with google: https://code.google.com/p/googleappengine/issues/detail?id=10445
This is my function for reading compressed gzip files
public byte[] getUpdate(String fileName) throws IOException
{
    GcsFilename fileNameObj = new GcsFilename(defaultBucketName, fileName);
    try (GcsInputChannel readChannel = gcsService.openReadChannel(fileNameObj, 0))
    {
        maxSizeBuffer.clear();
        readChannel.read(maxSizeBuffer);
    }
    byte[] result = maxSizeBuffer.array();
    return result;
}
The core issue is that you cannot rely on the size of the saved file, because Google Storage returns the original (uncompressed) content while the expected size is the stored (compressed) size, so the check between the size you expected and the real size fails:
Preconditions.checkState(content.length <= want, "%s: got %s > wanted %s", this, content.length, want);
So I solved it by allocating the biggest amount possible for these files using BlobstoreService.MAX_BLOB_FETCH_SIZE. maxSizeBuffer is only allocated once, outside of the function:
ByteBuffer maxSizeBuffer = ByteBuffer.allocate(BlobstoreService.MAX_BLOB_FETCH_SIZE);
And with maxSizeBuffer.clear(); the buffer is reset for the next read.
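One caveat: maxSizeBuffer.array() returns the entire backing array, so the result above is padded with unused trailing bytes up to MAX_BLOB_FETCH_SIZE. A minimal sketch of trimming it to the bytes actually read, assuming the same maxSizeBuffer:
maxSizeBuffer.flip();                                 // limit = number of bytes actually read
byte[] trimmed = new byte[maxSizeBuffer.remaining()];
maxSizeBuffer.get(trimmed);                           // copy only the filled portion
maxSizeBuffer.clear();                                // ready for the next call
return trimmed;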
