JAX-WS: best way to transfer a file, byte[] or MTOM?

I am trying to find the best way to transfer files in a JAX-WS web service.
MTOM and byte[] are the options I found. Can someone tell me which is the best way, and why?

There is no universally best way to transfer files; it depends on your needs.
As a byte array (Base64)
It is more accurate to call this Base64 encoding than byte[]: the file is sent as Base64 text inside the SOAP message, and you handle it as a byte[] in your Java code. This method is simple and fast, since it only needs to encode the data to Base64 text and write it into the SOAP message, but the encoded data is about 33% larger than the original file.
So this method is only recommended for small files.
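The 33% figure follows from Base64 mapping every 3 input bytes to 4 output characters. The sketch below checks that arithmetic in Python (used here only for illustration, the same ratio applies to the Java side); the 300 KB size and the helper name base64_size are arbitrary choices for the example.

```python
import base64
import os


def base64_size(raw_size: int) -> int:
    """Length in characters of the padded Base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


raw = os.urandom(300_000)        # stand-in for a 300 KB file
encoded = base64.b64encode(raw)  # roughly what ends up inside the SOAP body

# Ratio of encoded size to raw size; it comes out at 4/3.
print(len(raw), len(encoded), len(encoded) / len(raw))
```

For a 3 MB file that is about 1 MB of extra data on the wire, which is why the overhead matters more as files grow.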
Using MTOM
This is the recommended method for bigger files because it does not increase the file size: the file is sent as a MIME attachment rather than as encoded text. It involves a few more steps than just encoding the data, so it takes a little more processing time, although the difference may not be large.
Best of both worlds
Most web services frameworks allow you to specify a threshold that indicates the minimum file size needed to use MTOM. If the file does not reach that size, the data will be sent as Base64 encoded text.
Example in JAX-WS:
import javax.jws.WebService;
import javax.xml.ws.soap.MTOM;

// threshold is expressed in bytes
@WebService
@MTOM(threshold = 3072)
public class MyWebService {
}
This means that a file smaller than 3 KB (the threshold is given in bytes, so 3072 bytes) is sent as Base64 text, and a larger one is sent using MTOM. This is a very common approach and will probably suit your needs.
Whether a file counts as small or big depends on your hardware, the number of concurrent clients using the service, and so on.


Exceeding size limit with Sagemaker endpoint

I have set up a SageMaker inference endpoint for processing images. I am sending a JSON request like this to the endpoint:
import json

data = {
    'inferenceType': 'SINGLE_INSTANCE',
    'productType': productType,
    'images': [encoded_image_bytes_as_string],
    'content_type': "application/json",
}
payload = json.dumps(data)
response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=payload)
where images is an array of Base64-encoded images. The endpoint works fine, except that when I send large images I exceed SageMaker's inference payload size limit:
Maximum payload size for endpoint invocation 6 MB
What other data formats can I use that are smaller than JSON? Is it possible to use something like gzip to compress my payload before sending it? I know that SageMaker asynchronous endpoints and batch transform allow larger payloads; however, I require real-time inference. Thanks!
You're currently sending the image bytes inefficiently as Base64, which is about 1.33x bigger than the raw bytes. If you send raw bytes instead of JSON, the maximum image size grows from about 6/1.33 ≈ 4.5 MB to 6 MB.
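A minimal sketch of that idea, under stated assumptions: the 6 MB limit is treated as 6 MiB, the content type "application/x-image" and the use of CustomAttributes to carry productType are choices made for this example, and the model container would have to be changed to accept both. The client argument is expected to be a boto3 "sagemaker-runtime" client.

```python
import base64
import json

LIMIT_BYTES = 6 * 1024 * 1024  # assumption: the 6 MB limit counted as 6 MiB


def json_body_size(image: bytes) -> int:
    """Size in bytes of a JSON request carrying one Base64-encoded image."""
    body = json.dumps({"images": [base64.b64encode(image).decode("ascii")]})
    return len(body.encode("utf-8"))


def invoke_with_raw_bytes(client, endpoint_name: str, image: bytes, product_type: str):
    """Send the image itself as the request body instead of wrapping it in JSON.

    Assumes the container accepts this content type and reads productType
    from CustomAttributes (both are hypothetical conventions for this sketch).
    """
    return client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",
        CustomAttributes=f"productType={product_type}",
        Body=image,
    )


image = bytes(5 * 1024 * 1024)               # placeholder for a 5 MiB image
print(len(image) <= LIMIT_BYTES)             # the raw bytes fit under the limit
print(json_body_size(image) <= LIMIT_BYTES)  # the Base64 JSON body does not
```

Only one image fits per request this way, so a batch of images would need several invocations or a different framing.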
You could also contact AWS support and ask them to increase the maximum payload size.
If you need more than that, you'll need to write the file to some storage (such as S3 or EFS) and send a reference to the image to the endpoint, which then reads the image back from that storage. Overall, that is quite hard to pull off reliably, end to end, in under 500 ms.
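The storage-reference approach could look roughly like the sketch below. The key layout and the image_refs field are hypothetical, and the inference container would need its own code (not shown) to download the object. The two client arguments are expected to be boto3 "s3" and "sagemaker-runtime" clients.

```python
import json
import uuid


def invoke_with_s3_reference(s3_client, runtime_client, bucket: str,
                             endpoint_name: str, image: bytes, product_type: str):
    """Upload the image, then send only its location to the endpoint."""
    key = f"inference-inputs/{uuid.uuid4()}.bin"  # hypothetical key layout
    s3_client.put_object(Bucket=bucket, Key=key, Body=image)

    payload = json.dumps({
        "inferenceType": "SINGLE_INSTANCE",
        "productType": product_type,
        "image_refs": [f"s3://{bucket}/{key}"],   # hypothetical field name
    })
    return runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload,
    )
```

The request body stays tiny regardless of image size, but every invocation now pays for one upload and one download, which is where the latency concern comes from.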
Asynchronous is technically a Real Time hosting option in SageMaker. Depending on the kind of latency requirements you have for your invocations, I would recommend exploring Asynchronous Inference as that is designed for large payloads. I would suggest running some load tests with Asynchronous endpoints.

UTFDataFormatException - encoded string too long

In my mobile app I have an object implementing PropertyBusinessObject which contains numerous other objects also implementing this interface. This object structure is populated by JSON data I am getting back from my server. When I try to write this object to Storage with writeObject() I get the above error. The stacktrace shows it originating in the com.codename1.io.Util.writeObject() method where it is writing UTF-8 (limited to 64k). The developer guide does not reference any potential issues with Storage and recommends it over FileSystemStorage. My question is, is there a workaround/update for this? Would I have to revert to writing out the object structure to the filesystem? Thanks.
If you have a ridiculously long string, e.g. one representing the contents of a file, I would suggest rethinking that. Strings are inefficient in Codename One since we need to copy their representation into the iOS native layer. Writing them as UTF is also very wasteful if what you need is a binary representation. I suggest using a byte array.
Serializing to storage is a simple approach. It works great for small objects. If you have larger objects you might want to store them individually. You can also serialize to/from JSON so your storage data is readable.
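The 64k limit in the question most likely comes from the record format: a writeUTF-style string is stored with a 2-byte length prefix, so the encoded text cannot exceed 65535 bytes, whereas a byte array is usually written with a wider length field. The Python sketch below only mimics the two framings to show where the limit comes from; it is not Codename One's actual serialization code, and both function names are made up for the example.

```python
import struct


def write_utf_like(text: str) -> bytes:
    """Frame a string with a 2-byte big-endian length, as writeUTF-style formats do."""
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError(f"encoded string too long: {len(data)} bytes")
    return struct.pack(">H", len(data)) + data


def write_byte_array(data: bytes) -> bytes:
    """Frame raw bytes with a 4-byte big-endian length, which allows far larger payloads."""
    return struct.pack(">I", len(data)) + data


big_text = "x" * 70_000
try:
    write_utf_like(big_text)
except ValueError as err:
    print(err)

# The same content stored as a byte array fits without trouble.
print(len(write_byte_array(big_text.encode("utf-8"))))
```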

How to implement a lossless URL shortening

First, a bit of context:
I'm trying to implement a URL shortening on my own server (in C, if that matters). The aim is to avoid long URLs while being able to restore a context from a shortened URL.
Currently I have an implementation that creates a session on the server, identified by a certain ID. This works, but it consumes memory on the server, which is undesirable: it's an embedded server with limited resources, and the main purpose of the device isn't providing web pages but doing other cool stuff.
Another option would be to use cookies or HTML5 webstorage to store the session information in the client.
But what I'm searching for is the possibility to store the shortened URL parameters in one parameter that I attach to the URL and be able to re-construct the original parameters from that one.
First thought was to use a Base64-encoding to put all the parameters into one, but this produces an even larger URL.
Currently, I'm thinking of compressing the URL parameters (using some compression algorithm like zip, bz2, ...), Base64-encoding the compressed binary blob, and using the result as the context. When I receive the parameter, I can Base64-decode it, decompress the result, and recover the original URL.
The question is: is there any other possibility I'm overlooking that would let me losslessly compress a large list of URL parameters into a single smaller one?
Update:
After the comments from home, I realized that I had overlooked the overhead that compression itself adds (for example, the headers that zipping attaches to the content), which can make the compressed data even larger than the original.
So (as home states in his comments), I'm starting to think that compressing the whole list of URL parameters is only really useful if the parameters exceed a certain length; otherwise I could end up with an even larger URL than before.
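The compress-then-Base64 idea and its break-even point can be tried out quickly. The sketch below uses Python's zlib and URL-safe Base64 purely as an illustration (the server in question is written in C, where the zlib library offers the same algorithm); the function names and sample query strings are invented for the example.

```python
import base64
import zlib


def pack_params(query: str) -> str:
    """Compress a query string and return it as a URL-safe token."""
    compressed = zlib.compress(query.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")


def unpack_params(token: str) -> str:
    """Reverse pack_params: restore the padding, decode, decompress."""
    padded = token + "=" * (-len(token) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")


short_query = "a=1&b=2"
long_query = "&".join(f"sensor{i}=enabled" for i in range(50))

for query in (short_query, long_query):
    token = pack_params(query)
    assert unpack_params(token) == query  # lossless round trip
    print(len(query), "->", len(token))
```

The short query grows because the zlib header, checksum, and Base64 expansion outweigh any savings, while the long, repetitive one shrinks considerably. A length check before packing, falling back to the plain parameters, covers both cases.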
You can always roll your own compression. If you simply apply some Huffman coding, the result will usually be smaller for typical input (but Base64-encoding it afterwards makes it grow again, so the net effect may not be optimal).
I'm using a custom compression strategy on an embedded project I work with, where I first use LZJB (a Lempel-Ziv derivative with a really tight implementation from OpenSolaris) and then Huffman-code the compressed result.
The LZJB algorithm doesn't perform too well on very short inputs, though (around 16 bytes, in which case I leave the input uncompressed).
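To get a feel for what Huffman coding alone buys, the sketch below builds a Huffman code from the byte frequencies of an input and reports the encoded size in bits. It is a generic textbook construction, not the answerer's implementation, and it deliberately ignores the cost of storing the code table, which a real scheme must either transmit or fix in advance on both sides.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict:
    """Return {byte value: code length in bits} for a Huffman code built from data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}  # a lone symbol still needs one bit
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(weight, sym, {sym: 0}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    tie = 256  # byte values are 0..255, so tie-breakers stay unique
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_bits(data: bytes) -> int:
    """Total number of bits needed to encode data with its own Huffman code."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[b] for b in data)


params = b"mode=auto&mode=auto&mode=auto"
print(8 * len(params), "bits raw ->", huffman_bits(params), "bits Huffman-coded")
```

URL parameters draw on a small alphabet, so the per-character cost drops well below 8 bits, but Base64 then spends 8 bits on the wire for every 6 bits of payload.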

Silverlight for WP7: Trim an existing media file

WP7 Mango makes it possible to save custom ringtones from apps. That's great and all, but not if your source material is too long (ringtones must be < 40 seconds or so).
I'm hoping it is possible to take an existing audio file (WMA, let's say) and trim it by setting a start/end point, so you can export just part of the audio for ringtone use.
I gather from other SO questions that audio encoding directly in Silverlight is not really feasible. But I don't really want full encoding capabilities, just the ability to trim an existing, already encoded file. Any pointers?
I was thinking about doing this as well (until I discovered that we have no access to the music already on the phone).
An mp3 should be pretty easy to do by checking the header (see here: http://www.mpgedit.org/mpgedit/mpeg_format/mpeghdr.htm) and then using the bit rate and frame size to calculate the number of bytes to copy using BinaryReader and BinaryWriter.
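The frame-walking approach described above can be sketched as follows. This is written in Python rather than C# for brevity, handles only MPEG-1 Layer III, and assumes the data starts directly at a frame header (no ID3 tag or other metadata in front); the function names are made up for the example. The same loop maps onto BinaryReader and BinaryWriter.

```python
# Lookup tables for MPEG-1 Layer III only; other versions and layers are not handled.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]
SAMPLES_PER_FRAME = 1152


def frame_info(header: bytes):
    """Return (frame_length_bytes, sample_rate) for a 4-byte MPEG-1 Layer III header."""
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xFE) != 0xFA:
        raise ValueError("not an MPEG-1 Layer III frame header")
    bitrate = BITRATES_KBPS[header[2] >> 4]
    sample_rate = SAMPLE_RATES[(header[2] >> 2) & 0x3]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved header values are not supported")
    padding = (header[2] >> 1) & 0x1
    return 144 * bitrate * 1000 // sample_rate + padding, sample_rate


def trim_mp3(data: bytes, start_s: float, end_s: float) -> bytes:
    """Copy the whole frames whose start time lies in [start_s, end_s)."""
    out = bytearray()
    pos = 0
    samples = 0  # samples played before the current frame
    while pos + 4 <= len(data):
        length, sample_rate = frame_info(data[pos:pos + 4])
        t = samples / sample_rate
        if t >= end_s:
            break
        if t >= start_s:
            out += data[pos:pos + length]
        pos += length
        samples += SAMPLES_PER_FRAME
    return bytes(out)
```

Cuts land on frame boundaries (about 26 ms apart at 44.1 kHz). Because Layer III frames can borrow bits from earlier frames, the first frame or two after a cut may decode with a brief glitch.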
I haven't looked into wma but after glancing over the specifications it looks like it may be more complicated (specs: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=14995).

Java Project Modules - use InputStream/OutputStream or .tmpFile/byte[]

I found myself passing InputStream/OutputStream objects around my application modules.
I'm wondering if it's better to:
- save the content to disk and pass something like a Resource between the various method calls
- use a byte[] array instead of having to deal with streams every time.
What's your approach in these situations? Thanks
Edit:
I have a controller that receives a file uploaded by the user, and a utility module that provides some functionality to render a file.
utilityMethod(InputStream is, OutputStream os)
The file in the InputStream is the one uploaded by the user; os is the stream associated with the response. I'm wondering if it's better to have the utility method save the generated file in a .tmp file and return the file path, or return a byte[], etc., and have the controller deal with the OutputStream directly.
I try to keep as much in RAM as possible (mostly for performance reasons, and because RAM is cheap). So I'm using a FileBackedBuffer to "save" data of unknown size. It has a limit: when fewer than limit bytes are written to it, it keeps them in an internal buffer; if more data is written, it creates the actual file. The class has methods to get an InputStream and an OutputStream from it, so the calling code isn't bothered with the petty details.
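The same memory-first, file-on-overflow idea exists in Python's standard library as tempfile.SpooledTemporaryFile, which makes for a compact illustration of the behaviour described above. This is not the answerer's FileBackedBuffer class; the 1 MB limit and the buffer_stream helper are assumptions for the sketch.

```python
import tempfile

LIMIT = 1024 * 1024  # hypothetical threshold: keep up to 1 MB in memory


def buffer_stream(source, limit: int = LIMIT, chunk_size: int = 64 * 1024):
    """Copy a readable binary stream into a buffer that stays in RAM up to `limit` bytes.

    Once more than `limit` bytes have been written, the buffer transparently
    moves its contents to a temporary file on disk.
    """
    buf = tempfile.SpooledTemporaryFile(max_size=limit)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        buf.write(chunk)
    buf.seek(0)  # rewind so the caller can read from the start
    return buf
```

The caller reads from the returned object like any other file and never needs to know whether the data lives in memory or on disk.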
The answer actually depends on the context of the problem, which we don't know.
So, imagining the most generic case, I would create two abstractions. The first abstraction would take InputStream/OutputStream as parameters, whereas the other would take byte[].
The one that takes streams can read the data and pass it to the byte[] implementation. Your users can then use either the stream abstraction or the byte[] abstraction, based on their needs and comfort.
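A minimal sketch of that layering, in Python for brevity: the byte-oriented function holds the logic and the stream-oriented one is a thin wrapper around it. The upper-casing stands in for whatever rendering the real utility performs, and both function names are invented for the example.

```python
import io


def render_bytes(data: bytes) -> bytes:
    """Core implementation working on an in-memory buffer (placeholder logic)."""
    return data.upper()


def render_stream(source, sink) -> None:
    """Stream-facing wrapper: read everything, delegate, write the result."""
    sink.write(render_bytes(source.read()))


# Usage: callers holding streams and callers holding bytes share one implementation.
response = io.BytesIO()
render_stream(io.BytesIO(b"uploaded file"), response)
print(response.getvalue())
```

The wrapper reads the whole input into memory, so this layout suits inputs that comfortably fit in RAM; very large files would call for a chunked implementation behind the stream interface instead.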
