Store file reference in LDAP

In the initial phase, the developers stored small files directly in an LDAP attribute. Later, as the files grew, this became a problem. I am now planning to store the file content on disk and only the file path in an attribute. My question: is it possible for the OpenLDAP server to automatically serve the file content when a client reads that attribute?
I have seen reference attributes such as labeledURI. Is there a specific attribute meant for this situation?

No, that's not possible, and it would be a bad idea anyway.
An LDAP directory should never be treated as a file store; it is designed to hold many small objects. To perform well, requests should be as short as possible.
A NAS would be better suited to host those files.
You'll have to modify your code to access those files based on a filename stored in the directory.
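To make the client-side lookup concrete, here is a minimal Python sketch. It assumes the attribute value has already been fetched with whatever LDAP client library you use (not shown), that the value follows the labeledURI format from RFC 2079 (a URI, optionally followed by a space and a label), and that it points at a file:// location readable from the client. The names parse_labeled_uri and read_referenced_file are hypothetical. The server only ever hands back the string; opening the file is the application's job.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def parse_labeled_uri(value: str) -> tuple[str, str]:
    """Split a labeledURI value (RFC 2079: URI, optional space, label)."""
    uri, _, label = value.strip().partition(" ")
    return uri, label.strip()


def read_referenced_file(value: str) -> bytes:
    """Read the file a directory attribute points to (client side, not the server).

    Assumption: only file:// URIs are handled here; http(s) URIs would need
    an HTTP client instead.
    """
    uri, _label = parse_labeled_uri(value)
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    # url2pathname undoes percent-encoding and maps the URI path to a local path
    return Path(url2pathname(parsed.path)).read_bytes()
```

A value such as "file:///srv/files/report.pdf Quarterly report" would be read from /srv/files/report.pdf, with "Quarterly report" available as a display label.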

Storing the URI of a file (or any other resource) is certainly possible, and it is often done.
Serving the file itself depends on the LDAP server implementation and on the size of the file. Certificates and photos, for example, are often stored directly in LDAP.
eDirectory, for example, streams values above a certain size to a file in the DIB store. However, the LDAP protocol is not very efficient at transferring large blocks of data.
-jim

Related

Server backend: how to generate file paths for uploaded files?

I am trying to create a site where users can upload images, videos and other types of files.
I did some research, and people seem to suggest that saving the files as BLOBs in the database is a bad idea, and that you should save the file paths in the database instead.
My questions are, if I save the file paths in a database:
1. How do I generate the file names?
I thought about computing the MD5 hash of the file name, but what if two files have the same name? Should I add the username, a timestamp, etc. to the file name? Does that even make sense?
2. What is the best directory structure?
If a user uploads images on 12/17/2013 and 12/18/2013, can I just put them in user_ABC/images/ and then create timestamped sub-directories 20131217, 20131218, etc.? What is the best structure for all of this?
3. How do all these come together?
It seems like maintaining this system is a pain, because the file system manipulation scripts are tightly coupled with the database operations. Do I also need to worry about database transactions? Say that in one transaction I update the database but fail to modify the file system; do I then need to roll back the database?
And I think this system doesn't scale. What if my machine runs out of disk space and I need to upload the files to a second machine? What if my content is on a cluster?
I think my real question is:
4. Is there any existing framework/design pattern/db that handles this problem?
What is the standard way of handling this kind of problem?
Thanks in advance for your answers.
I asked myself this same question when I was designing a social website for chefs. I decided to store the URL of the image in a MySQL database along with the recipe. If you plan to store multiple images for one recipe, a comma-separated value might work. When a recipe loaded on the page, I fetched the image associated with that recipe and displayed it.
Since it was a hackathon project and wasn't meant for production, I didn't encode the file name into something unique. However, if I were developing for production, I would append a timestamp to the media file name when storing it on the server and in the database/backend.
I believe what I've proposed is the best way to handle this scenario. Storing the image as a file on the server is not only faster, it should also take less space. I found that when I converted a standard JPEG of reasonable resolution to base64, the encoded text took about 30% more space. There is also the time spent encoding the file for storage and decoding it when serving it, if you use some BLOB-type data format instead of simply storing the file on the server.
With backend server scripting such as PHP, you can do a lot with the information you have available: fetch the result from the database and load the image into the page using HTML.
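A minimal Python sketch of the timestamp idea. It assumes the timestamp goes between the base name and the extension and uses UTC with microsecond precision; both choices, and the helper name timestamped_name, are illustrative rather than taken from the answer.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional


def timestamped_name(original_name: str, now: Optional[datetime] = None) -> str:
    """Return e.g. 'pasta_20131217T153000123456.jpg' for 'pasta.jpg'."""
    now = now or datetime.now(timezone.utc)  # UTC avoids daylight-saving ambiguity
    p = PurePosixPath(original_name)         # stem/suffix use only the last path component
    return f"{p.stem}_{now:%Y%m%dT%H%M%S%f}{p.suffix}"
```

A timestamp alone does not guarantee uniqueness: two uploads with the same name in the same microsecond would still collide, so a random component or an existence check before writing is a sensible addition.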
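The measured overhead matches how base64 works: every 3 input bytes become 4 output characters, so the encoded text is about 4/3 the size of the original (roughly 33% larger, plus padding). Note that this applies when the file is stored as encoded text; a binary BLOB column stores the raw bytes. A small Python sketch that checks the ratio on random data (encoded_size is a hypothetical helper, and random bytes stand in for a JPEG):

```python
import base64
import os


def encoded_size(n_bytes: int) -> int:
    """Length of the base64 text produced for n_bytes of arbitrary binary data."""
    raw = os.urandom(n_bytes)          # stand-in for the bytes of an image file
    return len(base64.b64encode(raw))  # standard alphabet, '=' padding, no line breaks


if __name__ == "__main__":
    for n in (3, 100_000, 2_000_000):
        size = encoded_size(n)
        print(f"{n:>9} bytes -> {size:>9} chars ({size / n - 1:.1%} larger)")
```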
As far as I know, there isn't a standard way of fetching media from a database yet. Perhaps there will be one day.
There is no standard way to do this; it differs from application to application. The idea is that you need to generate a distinct path and file name for every upload. Here is one way:
HashId   = sha1(current_time_in_microseconds + random(1, 1000000))
Path     = /[user_id]/[first 2 chars of HashId]/[last 2 chars of HashId]/
FileName = HashId
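One way to turn that pseudocode into Python, as a sketch. It assumes that "+" means string concatenation, that the two sub-directories are the first and last two hex characters of the hash, and that paths use forward slashes; upload_location is a hypothetical name. Two hex characters per level give 256 * 256 = 65,536 possible leaf directories per user, which keeps any single directory small.

```python
import hashlib
import random
import time
from pathlib import PurePosixPath


def upload_location(user_id: int) -> PurePosixPath:
    """Return /<user_id>/<first 2 hex chars>/<last 2 hex chars>/<HashId> for a new upload."""
    microseconds = time.time_ns() // 1_000                    # current time in microseconds
    seed = f"{microseconds}{random.randint(1, 1_000_000)}"   # random(1, 1000000), inclusive
    hash_id = hashlib.sha1(seed.encode("ascii")).hexdigest()  # 40 lowercase hex characters
    return PurePosixPath("/", str(user_id), hash_id[:2], hash_id[-2:], hash_id)
```

The random part makes collisions unlikely but not impossible. If that matters, checking for an existing file before writing, or using secrets.token_hex(20) instead of time plus random, is a common alternative.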

Basic File storage

To prevent file storage problems, such as two people uploading files with the same name:
Is it better to give each user a separate folder, or to keep all files for all users in one folder but change the file names to keep them unique?
It depends on what you are trying to achieve.
What kind of service do you want to provide? A general file storage service? Then use different folders, since the number of files in a directory may be limited (depending on the file system), and very large directories can have a major influence on performance.
Do you provide an upload area for a simple blog? Use a single directory and change the file names.
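A sketch of the two options in Python, under these assumptions: uploads live under a hypothetical root /srv/uploads, client file names use '/' as the separator, and the helper names are made up for illustration. The per-user layout keeps the original name; the single-directory layout replaces it with a random UUID and keeps only the extension.

```python
import uuid
from pathlib import PurePosixPath

UPLOAD_ROOT = PurePosixPath("/srv/uploads")  # hypothetical storage root


def per_user_path(user_id: int, original_name: str) -> PurePosixPath:
    """General file storage: one folder per user, original file name kept.

    Same-named uploads by different users no longer clash; the same user
    re-uploading a name still needs a policy (overwrite, rename, ...).
    """
    safe_name = PurePosixPath(original_name).name  # drop any '/'-separated client directories
    return UPLOAD_ROOT / str(user_id) / safe_name


def single_dir_path(original_name: str) -> PurePosixPath:
    """Simple blog uploads: one shared folder, file renamed to a random unique name."""
    suffix = PurePosixPath(original_name).suffix.lower()  # keep the extension, e.g. ".jpg"
    return UPLOAD_ROOT / f"{uuid.uuid4().hex}{suffix}"
```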
Sorry, an absolute answer can only be given if you provide more information.

Should data files be stored on the same computer (server) the database is stored in?

Currently in our research group, we have many "data files" stored on three servers and a couple of personal computers running different operating systems.
We want to build a database that would store some information in addition to the URLs of those various "data files". My question is: do we have to copy all the data files into a directory on the same server as the database, or can they be left where they are on the different computers? If the second case is OK, what would be the format of the URL of the "data files"?
It really depends on what your intended goal is and what your current setup is like.
If the files are currently sitting somewhere on the network, and you need a path that the application can use to access them, you just need to store the network path (\\server\share\file for Windows environments) in the database, then read it and access that path to access the files. You'll need to make sure everyone has read access to them.
If the files are currently accessible through a website URL, internal or external, then again, you just need to store that URL (or some portion thereof) (http://mywebsite.com/myfile or http://servername/myfile) and access that.
If neither of the above is currently true but you want it to be, you'll need to set up a new share or web server and put the files there. There's no requirement that this be the same server as the database, but it would make for better backups if it were.
If you want the files themselves to be in the database, you should check out Bob Fanger's link.
Not sure what you're asking here but...
If you want your database engine to read files filled with data, it probably doesn't matter where they are stored - though this may depend on the database you are using. Are you using MySQL? MS-SQL Server? Oracle?
Many database vendors provide relatively easy-to-use admin tools that let you choose a file to be loaded, and the file chooser dialog usually lets you browse the network, so you could load a file over the network. The details vary, so consult your database engine's manual on loading data from a pre-existing file.
Be aware that if the database is on Computer A and the data is being loaded from Computer B over the network, it will probably be slower than if the data was on the same computer as the database.
It doesn't really matter if the files are stored outside the database anyway.
See Storing Images in DB - Yea or Nay? for more thoughts on that one.
If the files are accessible by a URL, you can store that URL with the metadata, for example:
http://server1/folder/file.ext, file://server1/folder/file.ext (for the UNC path \\server1\folder\file.ext) or file:///P:/folder/file.ext
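Windows paths need a little care before they can be stored as file: URIs. The following Python sketch (windows_path_to_uri is a hypothetical helper) converts an absolute drive-letter or UNC path; it relies only on pathlib.PureWindowsPath and urllib.parse.quote, so it runs on any OS.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_path_to_uri(path: str) -> str:
    """Turn an absolute Windows path (drive letter or UNC share) into a file: URI."""
    p = PureWindowsPath(path)
    if not p.is_absolute():
        raise ValueError(f"not an absolute path: {path!r}")
    posix = p.as_posix()  # backslashes -> '/': '//server1/folder/...' or 'P:/folder/...'
    if posix.startswith("//"):
        # UNC share: the server name becomes the host part of the URI
        return "file:" + quote(posix)
    # Drive letter: empty host, keep the ':' after the drive unescaped
    return "file:///" + quote(posix, safe="/:")
```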
Things to consider:
Backups
Performance
Synchronisation between the meta-data and the data
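For the synchronisation point, a simple periodic check can catch rows whose file has gone missing. This Python sketch assumes a hypothetical table data_files(id, description, location) in which location is a path reachable from the machine running the check (local disk or a mounted share); sqlite3 stands in for whatever database you actually use.

```python
import sqlite3
from pathlib import Path


def find_missing_files(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Return (id, location) for every row whose referenced file no longer exists."""
    rows = conn.execute("SELECT id, location FROM data_files ORDER BY id")
    return [(row_id, location) for row_id, location in rows if not Path(location).exists()]
```

It does not detect the opposite problem (files on disk that no row points to); that needs a directory walk compared against the table.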

Plone 4 data is stored on the file system rather than in the database?

According to this post:
http://ifpeople.wordpress.com/2010/10/20/plone-4-best-yet-of-the-best-cms/
It says this about data storage:
"Plone 4's capacity to handle very large files has improved drastically since all file data is now stored on the file system rather than in the database. This enhances the ability of Plone to scale to handle huge content repositories out of the box!"
I'm not a Plone user. What do these words mean? Is it a flat-file database?
Instead of being stored in the database, uploaded PDFs and similar files are now stored in a regular file system folder.
So they're stored as regular files on the regular filesystem. Plone's database itself handles those files transparently, so the application code doesn't need to know whether the files are on the filesystem or inside the database. (The technical term is "BLOB storage": binary large objects).
And, yes, it helps a lot with performance :-)
For another explanation, see point 4 on http://jstahl.org/archives/2010/09/01/5-things-that-rock-about-plone-4/ .
By default, files and images uploaded to a Plone 4 site are no longer stored in the traditional 'filestorage' file (e.g. Data.fs), but in a specially organised 'blob' storage area on the file system. This is a tremendous help in preventing huge Data.fs files. Everything else is stored in the filestorage as before. The only thing you need to worry about is how to do backups properly, as repozo doesn't support this :-)
No, this quote refers to the inclusion of ZODB "blob support" (http://en.wikipedia.org/wiki/Binary_large_object) in Plone 4. Prior to this release, objects like files and images were stored in the (flat file) Data.fs file (which is part of the ZODB).
Now, they are stored on the filesystem in files (still managed by the ZODB) that look like this:
var/blobstorage
var/blobstorage/.layout
var/blobstorage/0x00
var/blobstorage/0x00/0x00
var/blobstorage/0x00/0x00/0x00
var/blobstorage/0x00/0x00/0x00/0x00
var/blobstorage/0x00/0x00/0x00/0x00/0x00
var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x00
var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x00/0x3b
var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x00/0x3b/0xa5
var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x00/0x3b/0xa5/0x038ba9d72acbdcdd.blob
var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x00/0x3b/0xa9
var/blobstorage/0x00/0x00/0x00/0x00/0x00/0x00/0x3b/0xa9/0x038ba9d836b5cdaa.blob
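The listing follows a regular pattern: one directory per byte of an 8-byte object id, then a file named after a transaction id. The Python sketch below rebuilds paths in that shape. It is inferred from the listing above, not taken from ZODB's source, and the zero-padding of the file name is a guess that happens to match these two entries; blob_path is a hypothetical name.

```python
from pathlib import PurePosixPath


def blob_path(oid: int, tid: int, root: str = "var/blobstorage") -> PurePosixPath:
    """Rebuild a blob file path in the shape shown in the listing (inferred layout)."""
    oid_bytes = oid.to_bytes(8, "big")                    # 8-byte object id, most significant first
    directories = [f"0x{byte:02x}" for byte in oid_bytes]  # one directory level per byte
    return PurePosixPath(root, *directories, f"0x{tid:016x}.blob")
```

Spreading objects over nested per-byte directories keeps each directory small, the same idea as the hashed sub-directories used for user uploads.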

dsofile c# API / NTFS custom file properties

I'm searching for a good way to add metadata to a file. dsofile.dll works fine on NTFS, but the metadata is lost when a copy is dropped on a FAT32 share (I guess it uses NTFS hidden streams). Microsoft Word documents contain metadata that is not lost; how do they do it? As with FAT, sending the file via e-mail strips off all metadata created with dsofile (and also metadata created by hand with Windows Explorer). Separate metadata files are not an option. It must be compatible with standard Windows techniques: if I send someone a file with Outlook and they send it back, the metadata should not be lost.
(the required meta data is actually only an ID)
The issue is that all file systems provide a single-stream view of the file as the greatest common denominator. Through this interface, which exposes the file's "contents", you can read or store properties and have them transported with the "contents" by naive system (or user) utilities. For example, CopyFile in Windows will carefully lose alternate data streams and has no notion of "shadow files".
The question is whether or not the format of the "contents" allows for arbitrary addition of properties.
Some formats allow arbitrary content (e.g., MSFT's docfile aka .doc/.xls/etc). Some allow limited content (.mp3, .jpg, .exe).
Some are completely SOL (.txt, .bmp).
Any solution would be format-dependent. MS Office files are (all) compound files, and there's a place for properties there. In some formats (PE files, for example) it's safe to just append data to the end of the file, if you know how to read it back later. In a ZIP file you can probably find a place in the directory, or just add a helper file with your data to the archive. Other formats can't tolerate this, and you'd need to find your own way of solving the problem.
Actually, file name can also be a good placeholder for your ID.
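For the ZIP case, the archive-level comment field is one such place. This Python sketch stores an ID there with the standard zipfile module; tag_zip, read_zip_tag and the ID format are hypothetical. Because the comment is part of the file's bytes, it should travel with the file through copies to FAT32 or e-mail attachments, unlike an alternate data stream. ZIP comments are limited to 65,535 bytes.

```python
import io
import zipfile


def tag_zip(data: bytes, doc_id: str) -> bytes:
    """Return a copy of a ZIP archive with doc_id stored in the archive comment."""
    buf = io.BytesIO(data)
    with zipfile.ZipFile(buf, "a") as zf:      # append mode keeps existing members
        zf.comment = doc_id.encode("utf-8")    # rewritten in the end-of-central-directory record on close
    return buf.getvalue()


def read_zip_tag(data: bytes) -> str:
    """Read back the ID stored by tag_zip."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.comment.decode("utf-8")
```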
If you need to store the files somewhere but don't need them to remain readable by outside applications, you can pack them into a ZIP archive or use something like our SolFS library.
What about the standard properties rather than custom DSOFile properties, i.e. Comments, Author, etc.? Do they get wiped?
Not sure if it's ideal, but one way we've gotten around it is a tool that takes the DSOFile properties and saves them to a text file, which is then e-mailed along with the file; at the other end, the user runs a tool to re-import the DSOFile properties from the text.
