Write large files in Jackrabbit repository using stream - jackrabbit

I need to write large files to a Jackrabbit repository. To avoid memory problems I want to send the data from the client in small byte arrays, and have a stateful bean write them to the repository through some kind of stream.
P.S. Sorry for my poor English.

In JCR 2.0 you create a binary property via the
Node.setProperty(java.lang.String name, Binary value)
method. The Binary value is created from an InputStream via ValueFactory.createBinary(InputStream), so you can supply the content as a stream (and read it back later with Binary.getStream()).
This means that you should be able to stream directly from the client to the repository, if the repository implementation supports streaming and if you set up the whole chain correctly. I assume Apache Jackrabbit does support streaming in such a scenario, but you might need to check that depending on which version you use.
I don't think JCR supports appending streams to existing Properties, so if you absolutely need to send data from the client across several requests, you'll need to keep a long-lived JCR Session around across multiple client requests, and feed the Binary's stream from the data that you get from the client, in small chunks. This looks more complicated than direct streaming, but should work as well.
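For illustration, a minimal sketch of the direct-streaming approach, assuming a JCR 2.0 Session and the standard nt:file / nt:resource node types; the helper class name and error handling are placeholders, not part of the Jackrabbit API:

import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Session;
import java.io.InputStream;

public class JcrFileStore {

    // Stores the given stream as an nt:file node under 'parent'.
    // Jackrabbit spools the Binary to its data store instead of
    // buffering the whole file in memory.
    public static void storeFile(Session session, Node parent,
                                 String fileName, InputStream in) throws Exception {
        Node file = parent.addNode(fileName, "nt:file");
        Node content = file.addNode("jcr:content", "nt:resource");
        // Optionally also set jcr:mimeType on the content node.
        Binary binary = session.getValueFactory().createBinary(in);
        try {
            content.setProperty("jcr:data", binary);
            session.save();
        } finally {
            binary.dispose();
        }
    }
}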

Related

Using Python Processors in Java Flink Application

I have a use case where I want to implement an AWS Kinesis Data Application with Flink in Java. It will listen to multiple Kinesis streams via the Data Streams API. However, the analysis of those streams will be done in Python (since our data scientists prefer Python).
From this answer, there appears to be support for calling Python UDFs from Java. However, I want to be able to convert an incoming stream to a table, via
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
Table sessionsTable = tableEnv.fromDataStream(inputStream);
...and then have a Python processor that is invoked to process that stream.
I really have 3 questions here:
Is this a supported use case?
If so, is there documentation that describes how to do so?
If so, will this add significant overhead to the application?
The starting point in the Flink documentation for learning about using Python with Tables and Datastreams is at https://ci.apache.org/projects/flink/flink-docs-stable/docs/dev/python/overview/.
The Python APIs only provide a subset of what's available from Java; you'll have to look and see if what you need is included.
Not sure about performance, but you can, for example, convert back and forth between Flink Tables and Pandas dataframes.
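If the Java Table API route covers your case, registering a Python UDF from the Java side could look roughly like this. The module name, function name, and config key are assumptions for illustration, and it presumes flink-python is on the classpath plus a matching Python environment on the workers:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class SessionsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Stand-in for your Kinesis source.
        DataStream<String> inputStream = env.fromElements("session-1", "session-2");

        // Ship the Python module with the job and register its UDF for SQL.
        tableEnv.getConfig().getConfiguration()
                .setString("python.files", "/path/to/sessions_udf.py");
        tableEnv.executeSql(
                "CREATE TEMPORARY FUNCTION score_session AS 'sessions_udf.score_session' LANGUAGE PYTHON");

        // Convert the stream to a table (single column named f0) and call the Python UDF.
        Table sessionsTable = tableEnv.fromDataStream(inputStream);
        tableEnv.createTemporaryView("sessions", sessionsTable);
        Table scored = tableEnv.sqlQuery(
                "SELECT f0, score_session(f0) AS score FROM sessions");
        scored.execute().print();
    }
}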

Apache Flink load ML model from file

I'd like to know if there is a way (or some sort of code example) to load an encoded pre-trained model (written in Python) inside a Flink streaming application,
so I can apply the model, with the weights loaded from the file system, to the data coming from the stream.
Thank you in advance
You can do this in a number of different ways. Generally, the simplest way is to invoke the code that downloads the model from some external storage, like S3 for example, in the open method of your function. Then you can use the library of your choice to load the pre-trained weights and process the data. You can look for some inspiration here; this is the code for loading a model serialized with protobuf and read from Kafka, but you can use it to understand the principles.
Normally I wouldn't recommend reading the model from the file system, as it's much less flexible and troublesome to maintain. But that is possible too, depending on your infrastructure setup. The only thing in that case is to make sure that the file with the model is available on every machine the pipeline will run on.
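A bare-bones sketch of the open() approach described above; Model and ModelLoader are hypothetical placeholders for whatever library deserializes your pre-trained weights, and the model URI is made up:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Loads the serialized model once per parallel task instance, then scores
// each element of the stream with it.
public class ScoringFunction extends RichMapFunction<double[], Double> {

    private final String modelUri;      // e.g. "s3://my-bucket/models/model.bin" (placeholder)
    private transient Model model;      // hypothetical model type

    public ScoringFunction(String modelUri) {
        this.modelUri = modelUri;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // Download or read the weights from external storage, or from a path
        // that is reachable from every task manager.
        byte[] weights = ModelLoader.fetch(modelUri);     // hypothetical helper
        model = ModelLoader.deserialize(weights);          // hypothetical helper
    }

    @Override
    public Double map(double[] features) {
        return model.predict(features);
    }
}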

Can we use JSON as a database?

I'm looking for fast and efficient data storage for my PHP-based web site. I'm aware of MySQL. Can I use a JSON file in my server root directory instead of a MySQL database? If yes, what is the best way to do it?
You can use any single file, including a JSON file, like this:
Lock it somehow (look up PHP file locking, e.g. flock(); it can be as simple as adding a parameter to the file-open function or switching to the locking variant of a function).
Read the data from the file and parse it into an internal data structure.
Optionally modify the data in the internal data structure.
If you modified the data, truncate the file to 0 length and write the new data to it.
Unlock the file as soon as you can; other requests may be waiting...
You can keep using the data in the internal structures to render the page, just remember it may be outdated as soon as you release the file lock and another HTTP request modifies it.
Also, if you modify the data from a user's web form, remember that it may have been modified in between. For example: one user loads a page with user details for editing, another user deletes that record, and then the editor tries to save the changed details and should probably get an error instead of re-creating the deleted record.
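The same lock / read / modify / truncate / write / unlock cycle, sketched in Java since the other snippets in this thread are Java; in PHP the equivalent pieces would be flock(), file_get_contents()/json_decode(), and a truncate-and-rewrite. The updateJson callback is a placeholder for whatever actually changes the document:

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

public class JsonFileStore {

    // Locks the file, reads the current JSON, applies 'updateJson', truncates
    // and rewrites the file, and releases the lock when the try-block exits.
    public static void readModifyWrite(String path, UnaryOperator<String> updateJson) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel channel = raf.getChannel();
             FileLock lock = channel.lock()) {            // exclusive lock; concurrent requests block here
            byte[] bytes = new byte[(int) raf.length()];
            raf.readFully(bytes);
            String updated = updateJson.apply(new String(bytes, StandardCharsets.UTF_8));
            raf.seek(0);
            raf.setLength(0);                              // truncate before writing the new document
            raf.write(updated.getBytes(StandardCharsets.UTF_8));
        }                                                  // unlocked as soon as possible
    }
}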
Note: this is very inefficient. If you are building a site where you expect more than, say, 10 simultaneous users, you have to use a more sophisticated scheme, or just use an existing database... Also, you can't have too much data, because parsing the JSON and generating the modified JSON takes time.
As long as you have just one user at a time, it'll just get slower and slower as the amount of data grows. But as the user count increases (and more users means both more requests and more data), things start to get exponentially slower, and you very soon hit a limit where HTTP requests start to expire before the file is available for handling the request...
At that point, do not try to hack it to make it faster; instead pick an existing database (SQL, NoSQL, or file-based). If you start hacking together your own, you just end up re-inventing the wheel, usually poorly :-). Well, unless it is just a programming exercise, but even then it might be better to learn to use an existing framework instead.
I wrote an Object Document Mapper for JSON files called JSON ODM. This may be a bit late, but if it is still needed, it is open source under the MIT licence.
It provides a query language and some GeoJSON tools.
The new version of IBM Informix, 12.10 xC2, now supports JSON.
Check the link: http://pic.dhe.ibm.com/infocenter/informix/v121/topic/com.ibm.json.doc/ids_json_007.htm
The manual says it is compatible with MongoDB drivers.
About the Informix JSON compatibility
Applications that use the JSON-oriented query language, created by MongoDB, can interact with data stored in Informix® databases. The Informix database server also provides built-in JSON and BSON (binary JSON) data types.
You can use MongoDB community drivers to insert, update, and query JSON documents in Informix.
Not sure, but I believe you can use the Innovator-C edition (free for production) to test it and use it at no cost, even in a production environment.
One obvious case where you might prefer JSON (or another file format) over a database is when all your (relatively small) data is kept in the application cache.
When the application server (re)starts, the application reads the data from the file(s) and stores it in its data structures.
When the data changes, the application updates the file(s).
Advantage: no database.
Disadvantage: for a number of reasons this can be used only for systems with relatively small data. For example, a very specific product site with a few hundred products.

Java EE, EJBs File handling

I'm developing a web application in which users are allowed to upload pictures; the system will then generate thumbnails for them.
My problem stems from the fact that EJBs can be distributed across several servers and thus are not allowed to handle files directly. I could store the images in the database, but I was hoping to store them as files on one of the servers. How can I do this? Is there any way to centralize the storage of files? Or any approach for dealing with files in Java EE with EJBs?
Currently, I'm storing my files in a database. So I have centralized access and I don't need a dedicated file server. I'm doing this because I don't know how to integrate FTP servers and EJBs. Is this a good alternative, though?
What I want is: Using Stateless EJBs, store the uploaded images as files and the path to them in the database. So I can display them using
<h:graphicImage ... />
You actually have four aspects here,
Receiving and sending files
Creating thumbnails
Storing the files somewhere every node can access
Storing the original + thumbnail entities in a common database
Since you already have a Java EE server, you probably also already have an (HTTP) servlet server, for which there are numerous ways of doing load balancing and caching, not to mention the obvious potential for web-based interaction. If anything, support FTP transfer with a directory watcher as a bonus.
You should not create the thumbnails in stateless session beans; that means your servers will struggle at peak time, because the server will give priority to business logic over accepting new connections. Rather, first receive and store the file plus the original entity in the database, and then use a service bean to queue up thumbnail creation (maybe with n worker threads or message queues if you want). You can also use native tools in some cases; we do on Linux.
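One way to push the thumbnail work out of the upload request, sketched with an @Asynchronous EJB method (a JMS queue would serve the same purpose); Thumbnailer is a hypothetical placeholder for your actual image-scaling code:

import java.util.concurrent.Future;
import javax.ejb.AsyncResult;
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;

@Stateless
public class ThumbnailService {

    // Returns immediately to the caller; the container runs the body on a
    // separate thread, so the upload request is not blocked by image scaling.
    @Asynchronous
    public Future<String> createThumbnail(long imageId) {
        String thumbnailPath = Thumbnailer.generate(imageId);  // hypothetical helper
        return new AsyncResult<>(thumbnailPath);
    }
}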
You should use a shared file system on a SAN, which is the right tool for sharing files across several machines. And structure your files according to your file system's limits - like the number of files per directory and read/write capacity.
And a single database will be good enough for at least a small cluster, as long as you are not killing it with big binary blobs.
If in doubt, buy more RAM ;) The thumbnails in particular are very cacheable and will give good performance also in Tomcat - if you are not familiar with multi-threading, look for an existing cache library. Also cache the entities themselves, naturally, not only the files.
You might want to use a (private) FTP server for this. Your EJB beans can contact this server for storing and retrieving files.
There are various libraries in Java for accessing FTP servers. Specifically well suited for use in an EJB environment would be JCA based FTP connectors, but 'normal' ones will usually work fine too.
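As a rough illustration of the "normal" library route, a stateless bean could push uploads to a central FTP server with Apache Commons Net; the host, credentials, and error handling here are placeholders:

import java.io.InputStream;
import javax.ejb.Stateless;
import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

@Stateless
public class FtpImageStore {

    // Uploads the image bytes to the shared FTP server under 'remotePath'.
    public void store(String remotePath, InputStream imageData) throws Exception {
        FTPClient ftp = new FTPClient();
        try {
            ftp.connect("ftp.internal.example.com");       // placeholder host
            ftp.login("appuser", "secret");                 // placeholder credentials
            ftp.setFileType(FTP.BINARY_FILE_TYPE);
            if (!ftp.storeFile(remotePath, imageData)) {
                throw new IllegalStateException("Upload failed: " + ftp.getReplyString());
            }
        } finally {
            if (ftp.isConnected()) {
                ftp.logout();
                ftp.disconnect();
            }
        }
    }
}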
You can also look into using a clustered file system. RedHat Global File System and Symantec's Veritas Clustered File System are two I have experience with. These products allow you to mount the same file system across several servers for read/write access. All your application sees is just another directory. If your environment already has the requisite components (SAN and a good Sys Admin), this might be the best performing solution in a lot of use cases.
However, there are drawbacks to this approach:
You shift complexity from your app to the OS. These products aren't trivial to set up.
Scalability might become an issue if you have a large server farm. And when scaling problems arise, finding the bottleneck is not as straightforward as with arjan's FTP solution.
You need a SAN.
If you can make reasonable assumptions about "where" your EJB instance is, direct handling of a file is no problem. In your case (since you want to have files) I would read the image into a local temp folder and upload it to a remote destination.
A possible way to do that is http://txconnect.sourceforge.net/, a JCA transaction adapter that handles (among others) FTP connections. Configure the factory in XML, inject the connection into your bean, and you are ready to go.
Depending on your application server there might be a special connector available (e.g. Oracle or IBM systems).
I'd suggest you stick to your current solution. FTP access (if needed for purposes other than just keeping files together) can be built on top of your EJB layer. Displaying images stored in the DB is not a problem; a simple servlet will do the trick.
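The "simple servlet" mentioned above could look roughly like this; ImageDao stands in for whatever EJB or DAO actually reads the blob from the database, and the fixed MIME type is a simplification:

import java.io.IOException;
import java.io.OutputStream;
import javax.ejb.EJB;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/images/*")
public class ImageServlet extends HttpServlet {

    @EJB
    private ImageDao imageDao;   // hypothetical DAO/EJB that loads the image bytes

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String id = req.getPathInfo().substring(1);       // e.g. /images/42 -> "42"
        byte[] bytes = imageDao.findImageBytes(id);
        if (bytes == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        resp.setContentType("image/jpeg");                // or look up the stored MIME type
        resp.setContentLength(bytes.length);
        try (OutputStream out = resp.getOutputStream()) {
            out.write(bytes);
        }
    }
}

A page could then point h:graphicImage at a URL such as /images/42 served by this servlet.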
You can:
Create a WebDAV-based file share. This can be done using one of the many libraries available for Java or other languages. One such library is: http://milton.ettrema.com/index.html
All EJB instances can read/write images from this file share. They would need to use WebDAV client libraries.
Do set up backups of the directories behind this file share.

Core Data syncing

Is there a way to automatically sync my Core Data model with a server (preferably REST)?
Thanks
Apple has shared their Sync Services framework; it is documented here:
http://developer.apple.com/documentation/Cocoa/Conceptual/SyncServices/SyncServices.html
This section is specifically related to syncing managed objects:
http://developer.apple.com/documentation/Cocoa/Conceptual/SyncServices/Articles/UsingCoreData.html#//apple_ref/doc/uid/TP40005232
As for which style of data transfer is used, I'm not sure if it uses REST; it isn't immediately obvious.
Typically the data transferred to and from REST services doesn't include large binary objects or complex data structures. If REST is a requirement you may need to do something custom, but if you search through the available documentation you might find everything you are looking for.
