Can I store file chunks on different systems in the IPFS network?

I am using IPFS for file storage in my application, and I have created a private network swarm of 4 nodes. I initially assumed that the file would be chunked and the chunks stored on different devices (which is my requirement). But I found out from the blog post "Where does IPFS store all the data?" that, after chunking, the file is stored locally. Now I am wondering whether it is possible, when running ipfs add filename, to chunk the data and store the chunks on different systems.
If so, how can it be achieved?

IPFS is not a file storage service but a p2p protocol.
Files are still stored locally when you run ipfs add filename.
Other peers can store a copy of your published file by requesting it and then pinning it.
Instead of thinking about changing the IPFS protocol, what you can do is build your application on top of IPFS so that it behaves as distributed storage, as Filecoin does.
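For example, here is a minimal sketch (the CID string is a placeholder) of what each of your other 3 nodes could run, driving the IPFS CLI from Python, to fetch and keep a full copy of the file:

import subprocess

# Placeholder: use the CID printed by `ipfs add filename` on the publishing node.
cid = "<cid returned by ipfs add>"

# Pinning makes this node download the file's DAG (all of its chunks)
# from peers that have them and protects the blocks from garbage collection.
subprocess.run(["ipfs", "pin", "add", cid], check=True)

# Optional: confirm the pin exists on this node.
subprocess.run(["ipfs", "pin", "ls", "--type=recursive", cid], check=True)

Note that this gives each pinning node a full copy; distributing data across nodes automatically needs an additional layer on top of plain IPFS (for example IPFS Cluster, or a Filecoin-style approach as suggested above).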

Related

How to restore Tensorflow model from Google bucket without writing to filesystem?

I have a 2 GB Tensorflow model that I'd like to add to a Flask project I have on App Engine, but I can't seem to find any documentation stating that what I'm trying to do is possible.
Since App Engine doesn't allow writing to the file system, I'm storing my model's files in a Google Bucket and attempting to restore the model from there. These are the files there:
model.ckpt.data-00000-of-00001
model.ckpt.index
model.ckpt.meta
checkpoint
Working locally, I can just use
with tf.Session() as sess:
    logger.info("Importing model into TF")
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
The model is loaded into memory using Flask's @before_first_request.
Once it's on App Engine, I assumed I could do this:
blob = bucket.get_blob('blob_name')
filename = os.path.join(model_dir, blob.name)
blob.download_to_filename(filename)
Then do the same restore. But App Engine won't allow it.
Is there a way to stream these files into Tensorflow's restore functions so the files don't have to be written to the file system?
After some tips from Dan Cornilescu and digging into it, I found that Tensorflow builds the MetaGraphDef with a function called ParseFromString, so here's what I ended up doing:
import tensorflow as tf
from google.cloud import storage
from tensorflow import MetaGraphDef

client = storage.Client()
bucket = client.get_bucket(Config.MODEL_BUCKET)  # bucket name from the app's config
blob = bucket.get_blob('model.ckpt.meta')
model_graph = blob.download_as_string()

# Build the MetaGraphDef from the raw bytes instead of a file on disk.
mgd = MetaGraphDef()
mgd.ParseFromString(model_graph)

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(mgd)
I didn't actually use Tensorflow; the answer is based on its docs and GAE-related knowledge.
In general, using GCS objects as files in GAE, to work around the lack of writable filesystem access, relies on one of two alternative approaches instead of just passing a filename to be read/written directly by your app code (and/or any 3rd-party utility/library it may be using), which can't be done with GCS objects:
using an already open file-like handle for reading/writing the data from/to GCS, which your app would obtain from either:
the open call of a GCS client library, instead of the generic one typically used for a regular filesystem; see, for example, Write a CSV to store in Google Cloud Storage or pickling python objects to google cloud storage
some in-memory faking of a file, using something like StringIO; see How to zip or tar a static folder without writing anything to the filesystem in python?. The in-memory fake file also gives easy access to the raw data in case it needs to be persisted in GCS (see below; a short sketch follows this list)
directly using or producing just the respective raw data, which your app would be entirely responsible for reading from/writing to GCS (again using a GCS client library's open calls); see How to open gzip file on gae cloud?
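As an illustration of the in-memory/raw-data approaches above (bucket and object names are placeholders), the object's bytes can be wrapped in a file-like object without ever touching the filesystem:

import io
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-model-bucket')  # placeholder bucket name
blob = bucket.get_blob('model.ckpt.meta')

data = blob.download_as_string()  # raw bytes, no disk involved
fake_file = io.BytesIO(data)      # file-like object for libraries that expect a handle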
In your particular case it seems the tf.train.import_meta_graph() call supports passing a MetaGraphDef protocol buffer (i.e. raw data) instead of the filename from which it should be loaded:
Args:
meta_graph_or_file: MetaGraphDef protocol buffer or filename (including the path) containing a MetaGraphDef.
So restoring models from GCS should be possible, something along these lines:
import cloudstorage
from tensorflow import MetaGraphDef

with cloudstorage.open('gcs_path_to_meta_graph_file', 'r') as fd:
    meta_graph = MetaGraphDef()
    meta_graph.ParseFromString(fd.read())

# and later:
saver = tf.train.import_meta_graph(meta_graph)
However, from a quick doc scan, saving/checkpointing the models back to GCS may be tricky: save() seems to want to write the data to disk itself. But I didn't dig too deep.
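If you do need to checkpoint back, one possible workaround (just a sketch, assuming the runtime exposes a writable /tmp, which depends on the GAE environment; the bucket name is a placeholder and sess/saver are assumed to exist already) is to let save() write locally and then copy the produced files to GCS:

import glob
import os
from google.cloud import storage

# Write the checkpoint files to the (assumed writable) /tmp directory.
ckpt_prefix = saver.save(sess, '/tmp/model.ckpt')

client = storage.Client()
bucket = client.get_bucket('my-model-bucket')  # placeholder bucket name
for path in glob.glob(ckpt_prefix + '*') + ['/tmp/checkpoint']:
    if os.path.exists(path):
        bucket.blob(os.path.basename(path)).upload_from_filename(path)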

Split large SQL backup

I have a large SQL database (~1TB) that I'm trying to backup.
I can back it up fine but we want to store it offsite on Amazon S3, where the maximum object size is 5GB.
I thought I could split it by using multiple files, but it seems the maximum is 64 files, so I'm still ending up with 16 GB chunks, which are too big for S3.
Is there any other way to do it?
The maximum blob size for S3 is 5TB, not 5GB. 5GB is only the largest object that can be uploaded with a single HTTP PUT.
All cloud providers follow the same pattern: instead of uploading one huge file and storing it as a single blob, they break it apart into blocks that they replicate across many disks. When you ask for data, the provider retrieves it from all these blocks. To the client though, the blob appears as a single object.
Uploading a large file requires blocks too. Instead of uploading a large file with a single upload operation (HTTP PUT) all providers require that you upload individual blocks and finally notify the provider that these blocks constitute one object. This way, you can re-upload only a single failed block in case of failure, the provider can commit each block while you send the next, they don't have to track and lock a large blob (on a large disk) waiting for you to finish uploading etc.
In your case, you'll have to use an uploader that understands cloud storage and uses multiple blocks, perhaps something like Cyberduck or S3-specific command-line tools. Or write a utility that uses Amazon's SDK to upload the backup file in parts.
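For example, a rough sketch of such a utility in Python with boto3 (bucket name, key, local file path and part size are all placeholders; the SDK's high-level helpers shown below do the same bookkeeping for you):

import boto3

s3 = boto3.client('s3')
bucket, key = 'my-backup-bucket', 'backup.bak'  # placeholders
part_size = 100 * 1024 * 1024                   # 100 MB per part, well under the 5 GB PUT limit

upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
with open('backup.bak', 'rb') as f:             # placeholder local backup file
    part_number = 1
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=upload['UploadId'], Body=chunk)
        parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})
        part_number += 1

s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload['UploadId'],
                             MultipartUpload={'Parts': parts})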
Amazon's documentation site offers examples for multipart uploads at Uploading Objects Using Multipart Upload API. The high-level examples demonstrate various ways to upload a large file. All calls, though, use multipart uploads; e.g. the simplest call:
var client = new AmazonS3Client(Amazon.RegionEndpoint.USEast1);
var fileTransferUtility = new TransferUtility(client);
fileTransferUtility.Upload(filePath, existingBucketName);
will upload the file using multiple parts and use the file's path as its key. The most advanced example allows you to specify the part size, a different key, redundancy options etc:
var fileTransferUtilityRequest = new TransferUtilityUploadRequest
{
    BucketName = existingBucketName,
    FilePath = filePath,
    StorageClass = S3StorageClass.ReducedRedundancy,
    PartSize = 6291456, // 6 MB.
    Key = keyName,
    CannedACL = S3CannedACL.PublicRead
};
fileTransferUtilityRequest.Metadata.Add("param1", "Value1");
fileTransferUtilityRequest.Metadata.Add("param2", "Value2");
fileTransferUtility.Upload(fileTransferUtilityRequest);

Store file reference in LDAP

In the initial phase, the developer started by storing small files in an LDAP attribute. Later, as file sizes grew, this became a problem. Now I am planning to change it so that the file content is stored on disk and only the file path in an attribute. My doubt: is it possible for the OpenLDAP server to automatically serve the file content when the client reads that attribute?
I saw reference attributes like labeledURI. Is there any specific attribute to handle this situation?
Nope, it's not possible and a bad idea.
An LDAP directory should never be treated as a file store; it is designed to host many small objects. To be performant, requests should be as short as possible.
A NAS would be better suited to host those files.
You'll have to modify your code to access those files based on a filename stored in the directory.
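For example, here is a minimal sketch with the Python ldap3 library (the host, DNs, search filter and path handling are placeholders/assumptions): the app reads the path or URI from the directory and opens the file itself, since the LDAP server never serves the bytes.

from ldap3 import Server, Connection, ALL

server = Server('ldap://ldap.example.com', get_info=ALL)   # placeholder host
conn = Connection(server, user='cn=admin,dc=example,dc=com',
                  password='secret', auto_bind=True)       # placeholder credentials

conn.search('dc=example,dc=com', '(uid=jdoe)', attributes=['labeledURI'])
uri = str(conn.entries[0].labeledURI)  # placeholder value, e.g. file:///srv/files/jdoe.pdf

# The client resolves the reference and reads the file from disk/NAS itself.
with open(uri.replace('file://', '', 1), 'rb') as f:
    content = f.read()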
Certainly storing the URI to a file (or any other resource) is possible and often done.
Serving a file depends on the LDAP server implementation and the size of the file. Certificates and photos are often stored in LDAP.
eDirectory, as an example, streams data over a certain size to a file in the DIB store. However, the LDAP protocol is not very efficient at streaming large blocks of data.
-jim

Play! + GAE + File Upload

Usually with the Play framework, when you upload a file, it appears as a File object to the controller, and the file itself is stored in a tmp folder. In GAE this won't work because GAE does not allow writing to the filesystem.
How would one upload a file and access the stream directly in the controller?
So I figured out the solution. In the controller, instead of passing in a File object, you just pass in a byte[] and use a ByteArrayInputStream to get that into a more usable form. In my case I needed to pass the file data to a CSV parser which takes an InputStream.
I'm not familiar with the Play framework either, but generally, for multipart requests (e.g. file uploads):
the data from the input stream is written to a temporary file on the local filesystem if the input size is large enough
the request is then dispatched to your controller
your controller gets a File object from the framework (this File object points to the temporary file)
For Apache Commons FileUpload, you can use the DiskFileItemFactory to set the size threshold at which the framework decides whether to write the file to disk or keep it in memory. If kept in memory, the framework copies the data to a DataOutputStream (this is done transparently, so your servlet will still be working with the File object without having to know whether the file is on disk or in memory).
Perhaps there is a similar configuration for the Play framework.

grails file upload

Hey. I need to upload some files (images/pdf/pp) to my SQLS database and thereafter download them again. I'm not sure what the best solution is: store them as bytes, or store them as files (not sure if that is possible). Later I need to databind multiple domain classes together with that file upload.
Any help would be very much appreciated,
JM
Saving files in the file system or in the DB is a general question that has been asked here several times.
Check this: Store images(jpg,gif,png) in filesystem or DB?
I recommend saving the files in the file system and just saving the path in the DB.
(If you want to work with Google App Engine, though, you have to save the file as a byte array in the DB, as saving files in the file system is not possible with Google App Engine.)
To upload files with Grails, check this: http://www.grails.org/Controllers+-+File+Uploads
