Appengine not encoding request body in UTF-8 - google-app-engine

Appengine is not respecting req.setCharacterEncoding('UTF-8') when reading the request body.
This is how I read the request body:
    StringBuilder sb = new StringBuilder();
    BufferedReader reader;
    req.setCharacterEncoding("UTF-8");
    reader = req.getReader();
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line).append('\n');
    }
    reader.close();
    // parse body as JSON
    data = new JSONObject(sb.toString());
Requests with non-English characters are parsed properly when running the local test server (mvn appengine:devserver), but the version pushed to production (mvn appengine:update) does not parse them; they are read as ?. This discrepancy is what really confuses me.
I also tried setting environment variables like
<env-variables>
    <env-var name="DEFAULT_ENCODING" value="UTF-8" />
</env-variables>
in appengine-web.xml, but that doesn't change anything.
What could be causing the prod server to not parse non-English characters?

I still don't know why it wouldn't parse the body properly. I needed to parse the body to validate it before passing it on to my backend for further processing. So, instead of parsing it in GAE, I relayed the body as a byte array to the backend and let the backend handle the validation. This was the only working solution I could find.

Make sure you set the content-type header on your request correctly - on the client side, as in:
requestBuilder.setHeader("Content-type", "application/json; charset=utf-8");

I had a similar problem and this is the solution that worked for me. What I learned is that by the time the string is completely built (or appended to the string builder), it's too late: you need to specify the charset while reading the bytes and building the string.
request.setCharacterEncoding doesn't work well in this regard, for reasons I'm unsure of.
The alternative I used was:
    StringBuilder stringBuilder = new StringBuilder();
    BufferedReader bufferedReader = null;
    try {
        InputStream inputStream = request.getInputStream();
        if (inputStream != null) {
            // Decode the raw bytes as UTF-8 while reading
            bufferedReader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));
            char[] charBuffer = new char[128];
            int charsRead; // read() returns a count of chars, or -1 at end of stream
            while ((charsRead = bufferedReader.read(charBuffer)) != -1) {
                stringBuilder.append(charBuffer, 0, charsRead);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (bufferedReader != null) {
            try {
                bufferedReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    String body = stringBuilder.toString();
I got the input stream of bytes directly from the request and used a BufferedReader to read characters from this stream. I specified the charset here and this allowed me to build the string, while decoding in the respective charset.
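The same idea can be condensed into a small helper. This is only a sketch: the class name RequestBodyReader and the method readBody are made up for illustration, and in a servlet the caller would pass request.getInputStream() as the stream. It shows that the charset has to be chosen at the point where bytes become characters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the servlet API or App Engine.
class RequestBodyReader {
    // Reads the whole stream and decodes it with the given charset.
    static String readBody(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream)
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int charsRead;
            while ((charsRead = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, charsRead);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Calling readBody(request.getInputStream(), StandardCharsets.UTF_8) decodes the body as UTF-8 regardless of the platform default. Decoding the same bytes with a different charset would produce replacement characters for anything outside ASCII.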

Related

Compress an existing string in DynamoDB table to a string : ZipException: Not in GZIP format

We want to compress a big string in our DynamoDB table, which is a JSON object.
I simply want to replace it with a compressed string. I looked into the DynamoDB documentation, which stores a ByteBuffer directly, as mentioned here.
But since I don't want to save a byte array, and instead want to store a compressed string version of the original string, I have modified it.
Here is what I've done:
public class GZIPStringCompression {
public static String compress(String data) throws IOException {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(data.length());
GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
gzipOutputStream.write(data.getBytes());
gzipOutputStream.close();
return byteArrayOutputStream.toString();
}
public static String decompress(String compressed) throws IOException {
ByteArrayInputStream bis = new ByteArrayInputStream(compressed.getBytes());
GZIPInputStream gis = new GZIPInputStream(bis);
BufferedReader br = new BufferedReader(new InputStreamReader(gis, StandardCharsets.UTF_8));
StringBuilder sb = new StringBuilder();
String line;
while((line = br.readLine()) != null) {
sb.append(line);
}
br.close();
gis.close();
bis.close();
return sb.toString();
}
}
This gives out the exception:
Exception in thread "main" java.util.zip.ZipException: Not in GZIP format
at java.base/java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
at java.base/java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
at GZIPStringCompression.decompress(MyClass.java:41)
at MyClass.main(MyClass.java:16)
I am not sure whether what I want to do is even possible, which is why I want to confirm it here.
Changed this to:
class GZIPStringCompression {
public static String compress(String data) throws IOException {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream(data.length());
GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
gzipOutputStream.write(data.getBytes());
gzipOutputStream.close();
return Base64.getEncoder().encodeToString(byteArrayOutputStream.toByteArray());
}
public static String decompress(String compressed) throws IOException {
ByteArrayInputStream bis = new ByteArrayInputStream(Base64.getDecoder().decode(compressed));
GZIPInputStream gis = new GZIPInputStream(bis);
BufferedReader br = new BufferedReader(new InputStreamReader(gis, StandardCharsets.UTF_8));
StringBuilder sb = new StringBuilder();
String line;
while((line = br.readLine()) != null) {
sb.append(line);
}
br.close();
gis.close();
bis.close();
return sb.toString();
}
}
This somehow worked. Would this be a dependable solution?
Your first solution didn't work because you took a byte array (an array of 8-bit bytes) and assigned it to a String attribute (basically an array of Unicode characters). That doesn't make sense and can result in all sorts of unwanted manipulation of your bytes that makes them unusable when you read them back.
Your approach of converting the byte array into base-64 encoding, which is basically a subset of ASCII, works because ASCII characters can be represented in the String without any manipulation and can be read back just as they were written.
Since you mentioned this is for DynamoDB, I should add that DynamoDB does have a "binary" type in addition to the "string" type, and you could just use that. In Java, you can assign the byte array directly to an attribute of this type, and don't need to try to "convert" it into a String.
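For reference, the gzip-plus-base-64 approach can be sketched as below. It is a minimal illustration, not the asker's exact code: the class name GzipBase64 is made up, readAllBytes assumes Java 9 or later, and the charset is spelled out as UTF-8 on both sides so the result does not depend on the platform default. Reading all bytes instead of line by line also keeps line breaks in the original string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class for illustration.
class GzipBase64 {
    // Gzips the UTF-8 bytes of the input and returns them as base-64 text.
    static String compress(String data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the gzip stream writes the trailer, so it must happen before toByteArray()
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(data.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverses compress(): base-64 text -> gzip bytes -> UTF-8 string.
    static String decompress(String compressed) {
        byte[] gzipped = Base64.getDecoder().decode(compressed);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Base-64 adds roughly a third to the size of the compressed bytes, which is one reason the binary attribute type may be the better fit when the table allows it.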

Using Database as Alfresco ContentStore

I'm working with Alfresco 4.2 and I need to use a table in my database as document content store.
From information collected here and there on the internet, I read that I just have to implement my own DBContentStore, DBContentWriter and DBContentReader classes. Someone told me to take the FileContentStore class as a reference.
I need some help mapping FileContentStore onto my new classes.
For example:
The DBContentWriter has to extend AbstractContentWriter, and in the API docs I read that the only methods I have to override are:
getReader() to create a reader to the underlying content
getDirectWritableChannel() to write content to the repository.
What about the second method?
protected WritableByteChannel getDirectWritableChannel()
This is called by getContentOutputStream():
public OutputStream getContentOutputStream() throws ContentIOException
{
try
{
WritableByteChannel channel = getWritableChannel();
OutputStream is = new BufferedOutputStream(Channels.newOutputStream(channel));
// done
return is;
}
The main method is putContent(InputStream is), which should write the content into a DB table.
public final void putContent(InputStream is) throws ContentIOException
{
try
{
OutputStream os = getContentOutputStream();
copyStreams(is, os);
Where copyStreams does something like this:
public final int copyStreams(InputStream in, OutputStream out, long sizeLimit) throws IOException
{
int byteCount = 0;
IOException error = null;
long totalBytesRead = 0;
try
{
byte[] buffer = new byte[BYTE_BUFFER_SIZE];
int bytesRead = -1;
while ((bytesRead = in.read(buffer)) != -1)
{
// We are able to abort the copy immediately upon limit violation.
totalBytesRead += bytesRead;
if (sizeLimit > 0 && totalBytesRead > sizeLimit)
{
StringBuilder msg = new StringBuilder();
msg.append("Content size violation, limit = ")
.append(sizeLimit);
throw new ContentLimitViolationException(msg.toString());
}
out.write(buffer, 0, bytesRead);
byteCount += bytesRead;
}
out.flush();
}
finally
{
try
{
in.close();
}
catch (IOException e)
{
error = e;
logger.error("Failed to close output stream: " + this, e);
}
try
{
out.close();
}
catch (IOException e)
{
error = e;
logger.error("Failed to close output stream: " + this, e);
}
}
if (error != null)
{
throw error;
}
return byteCount;
}
}
The main goal is to write some code that writes to and reads from the DB using these methods.
When out.flush() is called, I should write into the BLOB field.
Thanks
Without looking at the example implementation in FileContentStore, it is difficult to determine everything that getDirectWritableChannel() needs to do. That said, actually creating a WritableByteChannel to your database should be relatively easy.
Assuming you are using the BLOB type and JDBC to get at your database, you just need to get an output stream for your BLOB and turn it into a channel.
OutputStream stream = myBlob.setBinaryStream(1);
WritableByteChannel channel = Channels.newChannel(stream);
Will you need to override other methods? Maybe. If you have specific issues with those, feel free to raise them.
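How bytes written to such a channel end up in the underlying stream can be sketched without a database. The class ChannelSketch and the method writeThroughChannel below are hypothetical names used only for illustration. In the database case, the target stream would be the one returned by myBlob.setBinaryStream(1), and whether Alfresco needs anything beyond this is not covered here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

// Hypothetical sketch, not part of the Alfresco API.
class ChannelSketch {
    // Wraps the target stream in a channel and writes the content through it.
    // With JDBC, 'target' would come from blob.setBinaryStream(1).
    static void writeThroughChannel(OutputStream target, byte[] content) {
        // Closing the channel also closes the wrapped stream
        try (WritableByteChannel channel = Channels.newChannel(target)) {
            ByteBuffer buffer = ByteBuffer.wrap(content);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```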

Apache Camel - protocol buffer endpoint

I am trying to create a camel endpoint that listens on a tcp port to receive a message encoded using protocol buffers. [https://code.google.com/p/protobuf/]
I am trying to use netty to open the tcp port but I cannot get it to work.
My camel route builder is:
from("netty:tcp://localhost:9000?sync=false").to("direct:start");
from("direct:start").unmarshal(format)
.to("log:protocolbuffers?level=DEBUG")
.to("mock:result");
I have tried the textline codec, but this just causes the error com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
I think I need to use a byte array codec rather than a String, but I can't see a way to do it. I think I could write a custom endpoint to do it, but I'd rather not. Any pointers would be much appreciated.
I sent the message to the camel endpoint using the code below:
public static void main(String[] args) {
try {
TestProtos.Person me = TestProtos.Person.newBuilder().setId(2).setName("Alan").build();
//set up socket
SocketChannel serverSocket;
serverSocket = SocketChannel.open();
serverSocket.socket()
.setReuseAddress(true);
serverSocket.connect(new InetSocketAddress("127.0.0.1", 9000));
serverSocket.configureBlocking(true);
//create BAOS for protobuf
ByteArrayOutputStream baos = new ByteArrayOutputStream();
//mClientDetails is a protobuf message object, dump it to the BAOS
me.writeDelimitedTo(baos);
//copy the message to a bytebuffer
ByteBuffer socketBuffer = ByteBuffer.wrap(baos.toByteArray());
//keep sending until the buffer is empty
while (socketBuffer.hasRemaining()) {
serverSocket.write(socketBuffer);
}
serverSocket.close();
} catch (Exception e) {
System.out.println("error....");
}
}
}
I also ran a test using a file endpoint, which worked as expected. I created the file with:
    @Test
    public void fileTest() throws Exception {
        TestProtos.Person me = TestProtos.Person.newBuilder().setId(2).setName("Chris").build();
        File file = new File("/tmp/test.txt");
        FileOutputStream out = new FileOutputStream(file);
        me.writeTo(out);
        out.close();
    }

JavaMail MimeBodyPart.SaveFile provide corrupted files

I'm using the JavaMail library to parse an email MIME message.
I'm trying to extract the attached files and save them to the local disk, but the saved files are not valid and their size differs from the original. Only *.txt files are saved correctly; *.pdf and *.xlsx files are not.
Can you please help me fix the code?
My code is:
private static void Test3() {
String email_string = File_Reader.Read_File_To_String("D:\\8.txt");
MimeMessage mm = Email_Parser.Get_MIME_Message_From_Email_String(email_string);
Email_Parser.Save_Email_Attachments_To_Folder(mm,"D:\\TEST");
}
public static String Read_File_To_String(String file_path) {
byte[] encoded = new byte[0];
try {
encoded = Files.readAllBytes(Paths.get(file_path));
} catch (IOException exception) {
Print_To_Console(exception.getMessage(), true,false);
}
return new String(encoded, m_encoding);
}
public static MimeMessage Get_MIME_Message_From_Email_String(String email_string) {
MimeMessage mm = null;
try {
Session s = Session.getDefaultInstance(new Properties());
InputStream is = new ByteArrayInputStream(email_string.getBytes());
mm = new MimeMessage(s, is);
} catch (MessagingException exception) {
Print_To_Console(exception.getMessage(), true, false);
}
return mm;
}
public static void Save_Email_Attachments_To_Folder(MimeMessage mm, String output_folder_path) {
ArrayList<Pair<String, InputStream>> attachments_InputStreams = Get_Attachments_InputStream_From_MimeMessage(mm);
String attachment_filename;
String attachment_filename_save_path;
InputStream attachment_InputStream;
MimeBodyPart mbp;
for (Pair<String, InputStream> attachments_InputStream : attachments_InputStreams) {
attachment_filename = attachments_InputStream.getKey();
attachment_filename = Get_Encoded_String(attachment_filename);
attachment_filename_save_path = String.format("%s\\%s", output_folder_path, attachment_filename);
attachment_InputStream = attachments_InputStream.getValue();
try {
mbp = new MimeBodyPart(attachment_InputStream);
mbp.saveFile(attachment_filename_save_path);
} catch (MessagingException | IOException exception) {
Print_To_Console(exception.getMessage(), true, false);
}
}
}
You're doing something very strange in Save_Email_Attachments_To_Folder. (Not to mention the strange naming convention using both camel case and underscores. :-)) I don't know what InputStreams you're collecting, but constructing new MimeBodyParts based on them and then using the new MimeBodyPart to save the attachment to the file is almost certainly not what you want to do.
What exactly is Get_Attachments_InputStream_From_MimeMessage doing? Why iterate over the message to collect a bunch of InputStreams, then iterate over the InputStreams to save them? Why not iterate over the message to find the attachments and save them as you find them using the MimeBodyPart.saveFile method? Have you seen the msgshow.java sample program?

Different results reading text file from Eclipse local server and Google App Engine

From GWT I read a text file "myFile.txt" as per below. The issue is that I get different results in the "input" string depending on the server:
If I run it from the Eclipse Indigo local server (debugging), "input" ends with the characters "\r" and "\n".
If I run it from Google App Engine, "input" ends with the character "\n" only, so input.length is one character shorter.
Why does this happen, and how can I have the same behaviour?
Thanks
String input=readFromFile("myFile.txt");
public String readFromFile(String fileName) {
File file = new File(fileName);
StringBuffer contents = new StringBuffer();
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader(file));
String text = null;
while ((text = reader.readLine()) != null) {
contents.append(text).append(System.getProperty("line.separator"));
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
contents.deleteCharAt(contents.length()-2); // Remove to make it work in GAE
contents.deleteCharAt(contents.length()-1);
return contents.toString();
}
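Because line separators differ between operating systems, and System.getProperty("line.separator") returns the separator of the OS the code is running on. Since readLine() strips the original line endings, the separator you append is what ends up in the string.
On Windows it's \r\n (two chars), while on Linux it's \n (one char). See here.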
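One way to get the same result on both servers is to append a fixed separator instead of the platform one. The sketch below assumes "\n" is the separator you want. The class name LineNormalizer and the method readWithFixedSeparator are made up for illustration, and the caller would pass something like new FileReader(file).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration.
class LineNormalizer {
    // Reads all lines and joins them with '\n', whatever the OS or the file's own line endings.
    static String readWithFixedSeparator(Reader source) {
        StringBuilder contents = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() strips "\n", "\r" or "\r\n", so only the appended '\n' remains
            while ((line = reader.readLine()) != null) {
                contents.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents.toString();
    }
}
```

With this approach the two deleteCharAt calls are no longer needed to compensate for the platform. If the trailing separator is unwanted, a single one-character removal behaves the same everywhere.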
