Google App Engine DataStore Text UTF-8 Encoding Problem - google-app-engine

I'm building a GWT app that stores the text of random webpages in a datastore Text field. The text is often UTF-8 encoded. All the files of my app are saved as UTF-8, and when I run the application on my local machine the entire process works fine: UTF-8 text is stored as such and can be retrieved from the local version of App Engine as UTF-8. However, when I deploy the app to Google App Engine, somewhere between storing the text and retrieving it, it stops being UTF-8, which causes non-ASCII characters to be displayed as ?.
When I view the datastore in the App Engine control panel, all the special characters appear as ?, which leads me to believe that the problem happens when writing to the database.
Does anyone know how to fix this?
The app itself is fairly large, so here's some pseudocode:
Text webPageText = new Text(<STRING THAT CONTAINS UNICODE CHARACTERS>);
// Some code to store the Text object in the datastore
// (specifically, I'm using javax.jdo.PersistenceManager to do this).
// Some code to retrieve the text from the datastore.
String retrievedText = webPageText.getValue();
The problem is that retrievedText comes back with ? instead of the Unicode characters.
Here's a similar problem in Python that I found: Trying to store Utf-8 data in datastore getting UnicodeEncodeError. My app, however, isn't throwing any errors.
Unfortunately, I think Java strings are UTF-8 by default, and I can't find any code that lets me declare them explicitly as UTF-8.
Edit: I've now built a small webapp that takes in Unicode text, stores it in the datastore, and then retrieves it with no problems. I still have no idea where the problem is in my original source code, but I'm going to change the way my code handles webpage retrieval to match the smaller app I just built. Thank you everyone for your help.
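On the belief that Java strings are UTF-8 by default: as a general Java fact (not something stated in this thread), a String holds UTF-16 code units internally, so there is no encoding to declare on the String itself. A charset only comes into play when text is converted to or from bytes. If that conversion uses a charset that cannot represent a character, String.getBytes(Charset) silently substitutes the charset's replacement byte, which is ? for US-ASCII and ISO-8859-1, matching the symptom described above. A minimal sketch (the class name is hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a lossy text-to-bytes conversion produces '?'.
final class CharsetRoundTrip {

    // Encode the text with the given charset, then decode it with the same charset.
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) replaces unmappable characters with the charset's replacement ('?').
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe, \u65e5\u672c\u8a9e"; // "Grüße, 日本語"
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1));         // Grüße, ???
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));           // Gr??e, ???
    }
}
```

The practical takeaway is to pass an explicit charset (usually StandardCharsets.UTF_8) at every point where text becomes bytes, rather than relying on the platform default.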

I fixed the same issue by setting both the request and the response encoding to UTF-8.
Setting the request encoding results in a valid string being stored in the datastore; without it, values are stored as "????...".
Requests: if you use Apache HttpClient, this is done as follows:
Get request:
NameValuePair... params;
...
String url = urlBase + URLEncodedUtils.format(Arrays.asList(params), "UTF-8");
HttpGet httpGet = new HttpGet(url);
Post request:
NameValuePair... params;
...
HttpPost httpPost = new HttpPost(url);
httpPost.setEntity(new UrlEncodedFormEntity(Arrays.asList(params), "UTF-8"));
Response: if you build your response in an HttpServlet, this is done as follows:
HttpServletResponse resp;
...
resp.setContentType("text/html; charset=utf-8");
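If you are not using Apache HttpClient, the JDK's java.net.URLEncoder can also percent-encode parameters as UTF-8. This is a minimal sketch, not what the answer above used; it assumes Java 10+ for the URLEncoder.encode(String, Charset) overload, and the buildQuery helper and example.com URL are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds an application/x-www-form-urlencoded string using UTF-8.
final class QueryBuilder {

    static String buildQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Each key and value is converted to UTF-8 bytes, then percent-escaped.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("title", "caf\u00e9 au lait");           // "café au lait"
        params.put("lang", "fr");
        System.out.println("http://example.com/save?" + buildQuery(params));
        // http://example.com/save?title=caf%C3%A9+au+lait&lang=fr
    }
}
```

The same encoded string can serve as a POST body sent with Content-Type: application/x-www-form-urlencoded; charset=UTF-8. Note that URLEncoder encodes spaces as + (form encoding), not %20.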

I tried converting the String to a byte array and storing it as a datastore Blob.
//Save String as Blob
Blob webPageText = new Blob(<STRING THAT CONTAINS UNICODE CHARACTERS>.getBytes());
//Retrieve Blob as String
String retrievedText = new String(webPageText.getBytes());
I originally thought this had solved the problem, but I had mistakenly tested it only on my local server. This code still returns ? instead of the Unicode characters, which leads me to believe that the problem isn't in the datastore but in the transfer from App Engine to the client.

Encoding solution: the cause was that the browser uses the "8859_1" charset.
Before: when saving to the datastore, I converted the charset:
new String(req.getParameter("title").getBytes("8859_1"),"utf-8")
When I ran this application on my local machine, it was fine. But when I deployed it, I faced the same issue you saw. I solved the problem by changing it:
After: the datastore save code is now:
new String(req.getParameter("title").getBytes("utf-8"),"utf-8")
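What the two conversions in this answer actually do: the "Before" line is the classic repair for text that was sent as UTF-8 but decoded as ISO-8859-1. Re-encoding with ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. The "After" line encodes and decodes with the same charset, so for well-formed text it returns the string unchanged; in effect, no conversion is needed once the parameter is decoded as UTF-8 in the first place. A minimal sketch (class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: what the "Before" and "After" conversions do to a string.
final class LatinOneRepair {

    // Simulates a server that decodes UTF-8 request bytes as ISO-8859-1 (mojibake).
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The "Before" repair: recover the raw bytes via ISO-8859-1, then decode them as UTF-8.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // The "After" version: the same charset both ways, so the string comes back unchanged.
    static String sameCharsetRoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String title = "caf\u00e9";                               // "café"
        String garbled = misdecode(title);                        // "cafÃ©"
        System.out.println(repair(garbled).equals(title));        // true
        // Applying the repair to text that was already decoded correctly corrupts it:
        System.out.println(repair(title).equals(title));          // false
        System.out.println(sameCharsetRoundTrip(title).equals(title)); // true
    }
}
```

This would also explain why the "Before" code behaved differently locally and when deployed: the repair only helps if the server decoded the parameter as ISO-8859-1, and it corrupts the text if the server already decoded it as UTF-8. That the two environments differed in this way is an inference, not something the answer confirms.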

These links may prove useful after all:
How to set Google App Engine java Content-Type to UTF-8
http://code.google.com/appengine/docs/python/tools/webapp/buildingtheresponse.html

Related

Corrupted PDF when downloading through React

I have a PDF uploaded to Azure Blob Storage, and I'm facing some problems during the download routine.
My application runs on Spring Boot, and I use the OutputStream provided by an HttpServletResponse in a @RestController method to stream the bytes from Blob Storage in response to the request.
On the frontend I have a React application that receives the data and triggers the download through the browser.
Every time I stream the bytes to the frontend, I get a corrupted file. It works just fine when I send the request through Insomnia or Postman.
I tried comparing the files as text, and I could see some differences between them.
Differences:
The size of the corrupted file is almost double the size of the original.
When I open the files in Notepad++, they seem to be in different encodings. [Screenshots: corrupted vs. consistent file]
Some characters appear to be misinterpreted. [Screenshots: corrupted vs. consistent file]
My frontend uses FileSaver 2.0.2 to persist the file on disk:
const blobParts = [];
const blobOptions = {
  type: axiosResponse.headers['content-type'],
};
blobParts.push(axiosResponse.data);
const file = new File(
  blobParts,
  axiosResponse.headers['content-disposition'].split('=')[1],
  blobOptions,
);
return FileSaver.saveAs(file);
I'm wondering whether there's a way to keep the ANSI encoding through the persistence process, or if there is a way
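The thread has no answer here, so the following is only a plausible cause, offered as an assumption: if the HTTP client reads the response as text rather than as binary, every byte sequence that is not valid UTF-8 is replaced by U+FFFD, which takes three bytes when the text is written back out. With axios in the browser, this is controlled by the responseType request option; 'blob' or 'arraybuffer' keeps the bytes intact. This would fit both symptoms: the file grows to roughly double its size and looks like a different encoding. A Java sketch of the effect (class name hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: binary data that is decoded as UTF-8 text and then re-encoded.
final class BinaryAsText {

    static byte[] throughText(byte[] binary) {
        String asText = new String(binary, StandardCharsets.UTF_8); // invalid sequences -> U+FFFD
        return asText.getBytes(StandardCharsets.UTF_8);              // each U+FFFD -> 3 bytes (EF BF BD)
    }

    public static void main(String[] args) {
        byte[] binary = new byte[256];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) i; // every possible byte value once, like binary PDF content
        }
        byte[] mangled = throughText(binary);
        // ASCII bytes survive unchanged; bytes >= 0x80 are each inflated to three bytes.
        System.out.println(binary.length + " bytes -> " + mangled.length + " bytes");
    }
}
```

Plain ASCII parts of a PDF (such as the %PDF header) pass through untouched, which is why the damaged file still looks superficially similar to the original.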

How to send and store image in a Flutter+Spring Boot+PostgreSQL+Heroku structure?

I am developing a mobile application with the Flutter framework.
On the backend API side, I use the Spring Boot framework and deploy it to Heroku (free plan).
On the database side, I use the PostgreSQL add-on in Heroku.
Everything was fine until I started working with images. I'm confused about how to send an image to the server and store it. What is the best practice? After some searching, I found two options:
First option
1. On the Flutter side, take the image from the user.
2. On the Flutter side, convert the image to a BASE64 string.
3. On the Flutter side, POST it to the backend as a JSON object.
4. On the Spring Boot side, get the BASE64 string and store it in the PostgreSQL db.
Second option
1. On the Flutter side, take the image from the user.
2. On the Flutter side, convert the image to a BASE64 string.
3. On the Flutter side, POST it to the backend as a JSON object.
4. On the Spring Boot side, get the BASE64 string and convert it back into a real image file.
5. On the Spring Boot side, save the image file to the file system of the hosting machine and store the image's path in the PostgreSQL db. (But Heroku doesn't allow writes to its file system, and even if it did, the images would be gone after every new deployment.)
If I choose the second option, how should I handle saving images to Heroku's file system?
Which option should I use?
Are there any other good options?
Saving images in your database is usually not recommended. Instead, you could host on another platform (for example AWS) that does allow file storage.
However, if you don't have too many images and won't access them very often, you can store them in the database. Rather than the first option, though, I recommend letting Spring Boot convert your BASE64 string into an actual image, which you can then store as a BLOB in your database. This makes sure the database optimizes for BLOBs and doesn't create indexes and other optimizations meant for text entries.
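A minimal sketch of the conversion step this answer recommends, assuming the client sends a plain Base64 string (without a data:image/...;base64, prefix). java.util.Base64 is part of the JDK; the class and method names are hypothetical. The resulting byte[] is what you would persist in a binary column (for example bytea in PostgreSQL):

```java
import java.util.Base64;

// Hypothetical helper: converts between the Base64 text sent as JSON and raw image bytes.
final class ImageCodec {

    // Decode the Base64 payload into the actual image bytes to store in a binary column.
    static byte[] fromBase64(String base64) {
        return Base64.getDecoder().decode(base64); // throws IllegalArgumentException on invalid input
    }

    // Encode stored bytes back to Base64 so they can be sent to the Flutter client as JSON.
    static String toBase64(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        String payload = "iVBORw0KGgo=";                     // the 8-byte PNG signature, Base64-encoded
        byte[] bytes = fromBase64(payload);
        System.out.println(bytes.length);                    // 8
        System.out.println(toBase64(bytes).equals(payload)); // true
    }
}
```

Storing the decoded bytes instead of the Base64 text also saves roughly a quarter of the space, since Base64 represents every 3 bytes as 4 characters.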
I think you should:
take the image from the user in the Flutter app;
convert the image to base64;
send it to the backend via the REST API in JSON format;
in the Spring app, convert the string into blob data and save it in the database.
When you need to read it:
retrieve the blob from the db;
convert it to a string;
send it to Flutter;
convert it back to a real image.
I usually handle this requirement with this approach, and it works really well.
Below is a simple pair of functions for converting between Blob and String that you can use:
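// Imports needed by both helpers:
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import java.util.Objects;
import javax.sql.rowset.serial.SerialBlob;

toBlob
public static Blob toBlob(String s) throws SQLException {
    if (Objects.nonNull(s)) {
        return new SerialBlob(s.getBytes(StandardCharsets.UTF_8));
    } else return null;
}
toString
public static String toString(Blob b) throws SQLException {
    if (Objects.nonNull(b)) {
        // Blob positions are 1-based, so read the whole blob starting at position 1.
        return new String(b.getBytes(1L, (int) b.length()), StandardCharsets.UTF_8);
    } else return null;
}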

Reading PlayStore csv review files from Google storage bucket using Java App Engine

This problem has stumped me for most of my day.
BACKGROUND
I am attempting to read the Play Store reviews for my apps via my own Google App Engine Java project.
I am able to get the list of all the files using the Google Cloud Storage client API (Java).
I can also read the metadata for each of the CSV files in that bucket and print it to the logs:
PROBLEM
I simply can't find a way to read the actual object and get the CSV data.
My Java code snippet:
BUCKET_NAME = "pubsite_prod_rev_*******";
objectFileName = "reviews/reviews_*****_***.csv";
Storage.Objects.Get obj = client.objects().get(BUCKET_NAME, objectFileName);
InputStream is = obj.executeMediaAsInputStream();
When I print this InputStream, it tells me it's a GZIPInputStream (java.util.zip.GZIPInputStream@f0be2c). Converting this InputStream to a byte[] or a String (which is what I want) does not work.
And if I try to wrap it in a GZIPInputStream using:
zis = new GZIPInputStream(is);
it throws ZipException: Not in GZIP format.
Metadata of the file:
"contentType": "text/csv; charset=utf-16le",
"contentEncoding": "gzip",
What am I doing wrong?
Sub-question: In the past I have successfully read text data from Google Cloud Storage using GcsService, but it does not seem to work with the buckets that hold the Play Store review CSV files. Does anybody know whether my Google App Engine project (connected to the same Google developer account) can read these buckets?
Solved it using executeMedia() and parseAsString
HttpResponse response = obj.executeMedia();
response.parseAsString(); //works!!
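A likely explanation, which the thread does not confirm: because the object is stored with contentEncoding gzip, the stream returned to the code was already being decompressed by the HTTP layer (hence it printing as a GZIPInputStream), so wrapping it in a second GZIPInputStream fails with "Not in GZIP format". The metadata also says the CSV is UTF-16LE, so decoding the bytes as UTF-8 or with the platform default would garble them. The JDK-only sketch below reproduces both points; it assumes Java 9+ for InputStream.readAllBytes(), and the class name and CSV content are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

// Hypothetical demo: decompress exactly once, then decode with the charset from the metadata.
final class GzipCsvDemo {

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "package,rating\ncom.example.app,5\n"; // hypothetical CSV content
        byte[] stored = gzip(csv.getBytes(StandardCharsets.UTF_16LE)); // as stored in the bucket

        byte[] once = gunzip(stored); // what a transparently decompressing client hands you
        System.out.println(new String(once, StandardCharsets.UTF_16LE).equals(csv)); // true

        try {
            gunzip(once); // decompressing a second time
        } catch (ZipException e) {
            System.out.println(e.getMessage()); // Not in GZIP format
        }
    }
}
```

If you read the InputStream yourself rather than using parseAsString(), decoding it with StandardCharsets.UTF_16LE (and without a second GZIPInputStream) should give the same text.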

How do I determine the stream size of a file uploaded from a website that I want to insert into Google Drive?

I'm trying to upload files to Google Drive with a ProgressListener and a chunk size set (and therefore with DirectUploadEnabled disabled). This gives me a more reliable upload and the possibility of showing the user a progress indication.
I transfer the files from the GWT website to GAE with a FormPanel and a FileUploadField, which POSTs the file to GAE on submit(). On GAE I receive the file with an UploadServlet that uses org.apache.commons.fileupload to receive the documents as a stream. I don't want to receive the complete documents on GAE because they are too big, so I start the upload (insert) to Google Drive with the stream received from the incoming request.
Now there's a problem: for the insert I need to know the size of the stream:
int lContentLength = getRequest().getContentLength();
FileItemStream lFileItemStream = getFileItemStream();
InputStream lInputStream = lFileItemStream.openStream();
BufferedInputStream lBufferedInputStream = new BufferedInputStream(lInputStream);
InputStreamContent lInputStreamContent = new InputStreamContent(pContentType, lBufferedInputStream);
lInputStreamContent.setLength(lContentLength);
My first guess was the Content-Length of the incoming servlet request. But this is not correct, because it covers the complete request (which also contains other fields used as parameters). Without the Drive option DirectUploadEnabled, I need the exact stream size of the uploaded document; otherwise the upload stalls at the end...
How do I get this document size? The Google example doesn't help because it uses a local file:
https://code.google.com/p/google-api-java-client/wiki/MediaUpload
Yes, from a local file it is easy to get the file size (mediaFile.length()). But from a website ... Several sites state that it is not possible to get the file size from the website before submit(), and it also seems impossible to determine the stream size on GAE without loading the complete file...
How do I determine this stream size? Is there another solution to this problem?

webapp2 - blobstore or request.get adding =\r\n every 75 chars for long parameters?

I have an iOS app that sends fairly large JSON POST parameters (a few hundred characters long), in addition to an image, to my App Engine instance via a blobstore URL.
For some reason, the JSON string returned by self.request.get('foo') has carriage returns (i.e., characters with decimal value 13) inserted every 76 characters. This causes the JSON parser to throw errors about control characters. Why is this happening, and is there a way to stop it?
I am fairly certain that my app is not adding these characters, as I used a proxy to inspect the HTTP requests and the JSON string was formatted correctly.
Thanks!
EDIT:
I discovered that it is actually adding =\r\n every 75 characters, which led me to another SO question with a pointer to a bug in App Engine's blobstore.
This is a duplicate of (except it is in webapp2 instead of django): Data gets corrupted on form send, =\r\n introduced in the data every 75 characters?
And the solution is at: Encoding problem in app engine when submitting multipart/form-data forms
This is fixed in webob 1.2.3, which will be available in the next App Engine release: 1.7.4.
In the meantime you can deploy webob 1.2.3 with your application by copying the webob subdirectory contained in their release tarball to your application directory.
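For background: =\r\n is a quoted-printable "soft line break" (RFC 2045), a transfer encoding that caps encoded lines at 76 characters, so the break appears after every 75 data characters plus the = sign. The bug is that the form field arrives quoted-printable encoded and is never decoded. Upgrading webob as described above is the real fix. Purely to illustrate what the missing decoding step does, here is a minimal quoted-printable decoder in Java; the class name is hypothetical and it assumes the decoded bytes are UTF-8:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal quoted-printable decoder (RFC 2045), for illustration only.
final class QuotedPrintable {

    static String decode(String encoded) {
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII); // QP text is 7-bit ASCII
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b != '=') {
                out.write(b); // ordinary character, copied as-is
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 2; // soft line break "=\r\n": drop it entirely
            } else if (i + 2 < in.length
                    && Character.digit(in[i + 1], 16) >= 0
                    && Character.digit(in[i + 2], 16) >= 0) {
                // "=XX" escape: one byte given as two hex digits
                out.write(Character.digit(in[i + 1], 16) * 16 + Character.digit(in[i + 2], 16));
                i += 2;
            } else {
                out.write(b); // malformed '=': keep it literally
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("{\"foo\": \"abc=\r\ndef\"}")); // {"foo": "abcdef"}
        System.out.println(decode("caf=C3=A9"));                 // café
    }
}
```

A JSON parser given the decoded string no longer sees the stray carriage returns.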
