dsofile C# API / NTFS custom file properties

I'm searching for a good way to add metadata to a file. dsofile.dll works fine on NTFS, but the metadata is lost when someone drops a copy on a FAT32 share (I guess it uses hidden NTFS streams). Microsoft Word documents contain metadata that is not lost; how do they do it? Similar to the FAT case, sending the file via e-mail strips off all metadata created with dsofile (and also metadata created by hand in Windows Explorer). Separate metadata files are not an option. It must be compatible with standard Windows techniques: if I send someone a file with Outlook and he sends it back, the metadata should not be lost.
(the required metadata is actually only an ID)

The issue is that all file systems provide, as a greatest common denominator, a single-stream view of the file. Only properties stored through this interface, i.e. inside the file's "contents", are transported along with the "contents" by naive system (or user) utilities. For example, CopyFile on Windows will silently lose alternate data streams and has no notion of "shadow files".
The question is whether or not the format of the "contents" allows for arbitrary addition of properties.
Some formats allow arbitrary content (e.g., MSFT's docfile aka .doc/.xls/etc). Some allow limited content (.mp3, .jpg, .exe).
Some are completely SOL (.txt, .bmp).
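If the questioner's guess is right and dsofile falls back to an NTFS alternate data stream for non-Office files, the loss is easy to reproduce. Here is a minimal C# sketch (the file.txt:StreamName syntax is accepted by recent .NET runtimes; older .NET Framework versions reject the colon and you would have to P/Invoke CreateFile instead):

    // Demonstrates that the "extra" data lives in a second NTFS stream, which is
    // exactly what disappears on FAT32 or after an e-mail round trip.
    using System;
    using System.IO;

    class AdsDemo
    {
        static void Main()
        {
            File.WriteAllText("demo.txt", "the visible contents");
            File.WriteAllText("demo.txt:MyId", "12345");          // alternate data stream, NTFS only

            Console.WriteLine(File.ReadAllText("demo.txt"));      // survives any copy
            Console.WriteLine(File.ReadAllText("demo.txt:MyId")); // lost once the file leaves NTFS
        }
    }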

Any solution will be format-dependent. MS Office files are (all) compound files, and there's a place for properties there. In some formats (PE files, for example) it's safe to just append data to the end of the file, if you know how to read it back later. In a ZIP file you can probably find a place in the directory, or just add a helper file with your data to the archive. Other formats can't tolerate this, and you'd need to find your own way to solve the problem.
Actually, the file name can also be a good place to carry your ID.
If you need to store the files somewhere but don't need them to remain readable by outside applications, you can pack them into a ZIP archive or use something like our SolFS library.
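To make the ZIP idea concrete, here is a hedged C# sketch that stores an ID as an extra archive entry; because the ID lives inside the file's own contents, plain copies and e-mail round trips keep it. The entry name metadata.id is made up for the example.

    // Tag a ZIP-based file with an ID by adding/reading a helper entry.
    // Requires references to System.IO.Compression and System.IO.Compression.FileSystem.
    using System;
    using System.IO;
    using System.IO.Compression;

    class ZipIdTagger
    {
        static void TagZip(string zipPath, string id)
        {
            using (var archive = ZipFile.Open(zipPath, ZipArchiveMode.Update))
            {
                archive.GetEntry("metadata.id")?.Delete();          // replace any old tag
                using (var writer = new StreamWriter(archive.CreateEntry("metadata.id").Open()))
                    writer.Write(id);
            }
        }

        static string ReadTag(string zipPath)
        {
            using (var archive = ZipFile.OpenRead(zipPath))
            using (var reader = new StreamReader(archive.GetEntry("metadata.id").Open()))
                return reader.ReadToEnd();
        }

        static void Main()
        {
            TagZip("sample.zip", "ID-12345");
            Console.WriteLine(ReadTag("sample.zip"));   // -> ID-12345
        }
    }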

What about the standard properties rather than custom DSOFile properties, i.e. Comments, Author, etc.? Do they get wiped too?
Not sure if it's ideal, but the way we've gotten around it is a tool that takes the DSOFile properties and saves them to a text file, which is then emailed along with the file; at the other end the user runs a tool to re-import the DSOFile properties from the text.
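A rough C# sketch of that shuttle tool is below. It assumes the COM interop surface of Microsoft's dsofile sample (DSOFile.OleDocumentProperties with Open, CustomProperties, Save and Close); the exact member names and signatures may differ in your interop assembly, so treat this as pseudocode to check against the type library.

    // Export DSOFile custom properties to a text file and re-import them later.
    using System;
    using System.IO;

    class DsoPropertyShuttle
    {
        static void ExportToText(string docPath, string textPath)
        {
            var props = new DSOFile.OleDocumentProperties();
            props.Open(docPath, true);                               // open read-only
            using (var writer = new StreamWriter(textPath))
                foreach (DSOFile.CustomProperty p in props.CustomProperties)
                    writer.WriteLine($"{p.Name}={p.get_Value()}");   // "Name=Value" per line
            props.Close(false);
        }

        static void ImportFromText(string docPath, string textPath)
        {
            var props = new DSOFile.OleDocumentProperties();
            props.Open(docPath, false);
            foreach (var line in File.ReadAllLines(textPath))
            {
                var parts = line.Split(new[] { '=' }, 2);
                object value = parts[1];
                props.CustomProperties.Add(parts[0], ref value);
            }
            props.Save();
            props.Close(false);
        }
    }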

Related

Why is there no program-data independence in traditional file processing?

"In traditional file processing, the structure of data files is embedded in the application programs, so any changes to the structure of a file may require changing all programs that access that file. By contrast, DBMS access programs do not require such changes in most cases. The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence."
The text above is taken from the book Fundamentals of Database Systems. I don't get the part about traditional file processing; can somebody please explain (an example would be appreciated)?
I'll give you a simple example.
Microsoft Excel used to save its files in a proprietary binary format. In practical terms, this meant that you could only work on those files using Excel.
But now Excel also saves files in an open, XML-based format that is text-based and documented, which allows other programs and libraries (the OpenOffice SDK, for example) to interact with them. So you no longer need to rely on Excel to work with Excel files saved in that format.
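A small, hedged C# illustration of what the book means by program-data dependence: the flat file's record layout is baked into the program as constants, so changing the file structure means changing (and recompiling) every program that reads it, whereas a DBMS keeps that structure in its catalog.

    // Reads a fixed-width flat file whose layout is hard-coded in the program:
    // NAME is 20 characters, BALANCE is 10. Widen NAME or add a field and this
    // program (and every other program that reads accounts.dat) must be edited.
    using System;
    using System.IO;

    class FlatFileReader
    {
        const int NameWidth = 20;
        const int BalanceWidth = 10;

        static void Main()
        {
            foreach (var line in File.ReadLines("accounts.dat"))
            {
                string name    = line.Substring(0, NameWidth).Trim();
                string balance = line.Substring(NameWidth, BalanceWidth).Trim();
                Console.WriteLine($"{name}: {balance}");
            }
        }
    }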

Is it possible to manipulate pdf files in Visual Basic without an external library/SDK?

I am looking at how to implement PDF merging with raw VB code so that the code may be invoked by a bot for business process automation.
The software used to create the bot provides a function to invoke VB code, but I don't believe it can access any externally imported libraries because it expects plain source, so I essentially need to produce code that one could run in a VB shell environment without anything fancy (or convenient, it seems).
All the research I've done so far points me in the direction of external packages I would need to install, such as iText; this is what I'm looking to avoid.
(previous iText employee here)
PDF is not an easy (binary) format.
Essentially, blobs of information (text that has to be rendered, fonts, images, vector graphics, etc) are compressed and gathered into objects.
Each object gets a number. Objects are allowed to reference each other (a piece of text might say 'I want to be rendered with font 4433').
All object numbers and their byte offsets in the file are gathered in the cross-reference (often called XREF) table.
A PDF includes a 'Pages' dictionary object that tells the viewer which objects belong on which page.
In order to merge PDF files, you would need to:
- read all XREF tables of all files
- adjust all of those offsets so they point to the objects' new locations in the merged file
- update various dictionary objects within the PDF file that tell it where all the objects per page are kept
This is by no means a trivial task, but it can be done using only VB.
If you are serious about implementing a robust, scalable version of this tool, perhaps it's better to look at the iText source code and try to port it to VB?
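To give a feel for the first of those steps, here is a hedged sketch (in C#, which translates mechanically to VB.NET) that locates and dumps a classic cross-reference table. It deliberately ignores newer PDFs that use cross-reference streams (PDF 1.5+) and files with incremental updates chained via /Prev, so it is a starting point, not a merger.

    // Dump the entries of a classic PDF xref table: find "startxref" at the end
    // of the file, jump to that byte offset, and read the subsections.
    using System;
    using System.IO;
    using System.Text;

    class XrefDump
    {
        static void Main(string[] args)
        {
            // Read as Latin-1 so byte offsets and string indexes line up.
            string pdf = File.ReadAllText(args[0], Encoding.GetEncoding("ISO-8859-1"));

            // The file ends with: startxref <offset> %%EOF
            int sx = pdf.LastIndexOf("startxref");
            long xrefOffset = long.Parse(pdf.Substring(sx + "startxref".Length)
                .Split(new[] { '\r', '\n', ' ' }, StringSplitOptions.RemoveEmptyEntries)[0]);

            var lines = pdf.Substring((int)xrefOffset).Split('\n');
            int i = 1;                                        // lines[0] is the "xref" keyword
            while (!lines[i].StartsWith("trailer"))
            {
                var header = lines[i++].Trim().Split(' ');    // "<first object number> <count>"
                int first = int.Parse(header[0]), count = int.Parse(header[1]);
                for (int k = 0; k < count; k++, i++)
                {
                    var entry = lines[i].Trim().Split(' ');   // "<byte offset> <generation> n|f"
                    Console.WriteLine($"object {first + k}: offset {entry[0]}, in use: {entry[2] == "n"}");
                }
            }
        }
    }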

FileMaker Scripting - Sending Different Attachments

Is there a way to send mail with a different PDF file to different contacts using FileMaker?
I am aware of sending batch emails with one attachment, but I would like to send a personalized PDF to each contact, which seems not so simple.
Also
Can I add PDF files to the table itself or would I have to use the path to the file?
Example:
Table 1
**Name** [James Brown] [James Blue]
**Email** [brown.j#gmail.com] [blue.j#gmail.com]
**PDFfileAttachment** [folder/PDF/JamesBrown.pdf] [folder/PDF/JamesBlue.pdf]
So an Email for James Brown would look like:
Dear James Brown, please see the attached file.
Attachment [JamesBrown.pdf] {actual file}
and
Dear James Blue, please see the attached file.
Attachment [JamesBlue.pdf] {actual file}
I think you can solve it by creating a container field in your database and importing the PDFs into it.
Then you can use Export Field Contents [] to export the PDF and send it by email.
Hope it's useful.
I would like to send a personalized PDF to each contact, which seems not so simple.
Find the records of contacts you want to include and loop among them, sending mail to each one individually (i.e. without selecting the 'Collect addresses across found set' option).
Can I add PDF files to the table itself or would I have to use the path to the file?
You can do either, it's up to you. If the path to the file can be calculated (as in your example), you can calculate it right there in the Send Mail script step.
Note that you can also generate the PDF files during the process itself.
Do I understand correctly that you would actually like to personalize the PDF document(s)?
This is possible; maybe not trivial, but quite doable. The trick is to prepare the PDF as a form, and then fill the form fields to personalize it.
PDF has a native forms data format (called FDF), which is described in ISO 32000 (as well as the older PDF specification documents provided by Adobe, as you can find in the Acrobat SDK, downloadable from the Adobe website).
FDF is a simple structured text file, which can easily be assembled using FileMaker (I have done that routinely for several catalog projects). The easiest way to get going is to open the form in Acrobat, fill in the fields, and then export the data as FDF. This gives you the pattern to "fill in the blanks".
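For illustration, a minimal FDF file for a form with a single, hypothetical field called Name could be generated like this (a sketch only; take the exact field names and pattern from Acrobat's own FDF export):

    // Write a minimal FDF that fills the "Name" field and points at the form PDF.
    using System.IO;

    class FdfWriter
    {
        static void Main()
        {
            string fdf =
                "%FDF-1.2\n" +
                "1 0 obj\n" +
                "<< /FDF << /Fields [ << /T (Name) /V (James Brown) >> ]\n" +
                "           /F (OrderForm.pdf) >> >>\n" +   // hypothetical form file name
                "endobj\n" +
                "trailer\n" +
                "<< /Root 1 0 R >>\n" +
                "%%EOF\n";
            File.WriteAllText("JamesBrown.fdf", fdf);
        }
    }

The same text can just as well be assembled in a FileMaker calculation and written out with Export Field Contents.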
So, you create the FDF files using FileMaker. With them you can fill the blank form and feed the saved document to the email system.
Which tool to use to fill the blank form depends on the volume you have to process. Acrobat is not very powerful (and you may end up in a bit of a legal gray zone, because Acrobat is not set up to be used as a service). There are applications made specifically for filling out forms on a server (such as FDFMerge by Appligent), and there are also several libraries that can fill out forms (iText or pdflib come to mind). These tools also allow you to flatten the PDF, which means the form fields are gone and their contents become part of the page content itself.
The resulting file can then either be attached to an email, or you can make it available on a server and send an email with a link to the file (which method you use may depend on security and privacy regulations).

Server backend: how to generate file paths for uploaded files?

I am trying to create a site where users can upload images, videos and other types of files.
I did some research and people seem to suggest that saving the files as BLOBs in the database is a bad idea; instead, save the file paths in the database.
My questions are, if I save the file paths in a database:
1. How do I generate the file names?
I thought about computing the MD5 value of the file name, but what if two files have the same name? Adding the username, a time-stamp, etc. to the file name? Does that even make sense?
2. What is the best directory structure?
If a user uploads images on 12/17/2013 and 12/18/2013, can I just put them in user_ABC/images/, then create time-stamped sub-directories 20131217, 20131218, etc.? What is the best structure for all this stuff?
3. How do all these come together?
It seems like maintaining this system is such a pain, because the file system manipulation scripts are tightly coupled with the database operations (I may also need to worry about database transactions: say in one transaction I updated the database but failed to modify the file system, so I need to roll back my database?).
And I think this system doesn't scale (what if my machine runs out of disk space and I need to store the files on a second machine? What if my content is on a cluster?).
I think my real question is:
4. Is there any existing framework/design pattern/db that handles this problem?
What is the standard way of handling this kind of problem?
Thanks in advance for your answers.
I actually asked this same question when I was designing a social website for food chefs. I decided to store the URL of the image in a MySQL database along with the recipe. If you plan on storing multiple images for one recipe, as in my example, maybe a comma-separated value would work. When the recipe loaded on the page, I would fetch the image associated with that recipe onto the screen.
Since it was a hackathon and wasn't meant for production purposes, I didn't encode the file name into something unique. However, if I were developing for production purposes, I would append a time-stamp to the media file name when storing it on the server and in the database/backend.
I believe what I've proposed is the best structure for handling this scenario. Storing the image on the server is not only faster, but it should also take less space. I have found that when converting a standard JPG file of reasonable resolution to base64 encoding, the encoded text representation took about 30% more space. There is also the cost of encoding and decoding the file when using some BLOB type of data format instead of storing the file directly on the server.
Using some sort of backend server scripting like PHP, you'll be able to do some pretty neat stuff with the information you have available. Fetch the result from the database, and load it into the page using HTML.
As far as I know, there isn't a standard way of fetching media from a database yet. Perhaps there will be one day.
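The mechanics of the "make the stored name unique" part are straightforward, though. Here is a hedged C# sketch of one possible scheme (the folder layout user_ABC/20131217/... mirrors the question, and every name here is illustrative):

    // Keep the original file name only for display; derive the on-disk name
    // from a timestamp plus a random ID, bucketed into per-user, per-day folders.
    using System;
    using System.IO;

    class UploadPathBuilder
    {
        static string BuildStoredPath(string uploadRoot, string userId, string originalName)
        {
            string day = DateTime.UtcNow.ToString("yyyyMMdd");
            string unique = $"{DateTime.UtcNow:yyyyMMddHHmmssfff}_{Guid.NewGuid():N}";
            string extension = Path.GetExtension(originalName);   // keep .jpg/.mp4/...
            string dir = Path.Combine(uploadRoot, userId, day);
            Directory.CreateDirectory(dir);
            return Path.Combine(dir, unique + extension);
        }

        static void Main()
        {
            // Store the returned path in the database next to the original name and owner.
            Console.WriteLine(BuildStoredPath("/var/uploads", "user_ABC", "cat photo.jpg"));
        }
    }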
There is no standard way to do this; it differs from application to application. The idea is that you need to generate a different path + file name for every upload. Here is one way (PHP-style):
$hashId = sha1(microtime(true) . mt_rand(1, 1000000));
$path = '/' . $userId . '/' . substr($hashId, 0, 2) . '/' . substr($hashId, -2) . '/';
$fileName = $hashId;

File Management for Large Quantity of Files

Before I begin, I would like to express my appreciation for all of the insight I've gained on stackoverflow and everyone who contributes. I have a general question about managing large numbers of files. I'm trying to determine my options, if any. Here it goes.
Currently, I have a large number of files and I'm on Windows 7. What I've been doing is categorizing the files by copying them into folders based on what needs to be processed together. So, I have one set that contains the files by date (for long term storage) and another that contains the copies by category (for processing and calculations). Of course this doubles my data each time. Now I'm having to create more than one set of categories; 3 copies to be exact. This is quadrupling my data.
For the processing side of things, the data ends up in Excel. Originally, all the data was brought into Excel, and all organization and filtering was performed there. This was time-consuming and not easily maintainable over the long term. Later the workload was shifted to the file system itself, which lightened the work in Excel.
The long and short of it is that this is an extremely inefficient use of disk space. What would be a better way of handling this?
Things that have come to mind:
Overlapping Folders
Is there a way to create a folder that only holds references to files, rather than copies of the files? This way I could have two folders reference the same file.
To my understanding, a folder is essentially a file listing references to the files inside it, but on Windows a file can only be contained in one folder.
Microsoft SQL Server
Not sure what could be done here.
Symbolic Links
I'm not an administrator, so I cannot execute the mklink command.
Also, I'm uncertain about any performance issues with this.
A Junction
Apparently not allowed for individual files, only folders, in Windows.
Search folders (*.search-ms)
Maybe I'm missing something, but to my knowledge there is no way to specify individual files to be listed.
Hashing the files
Creating hashes for all the files would allow the files to be stored once. But then I have no idea how I would manage the hashes.
XML
Maybe I could use XML files to attach metadata to the files and somehow search using them.
Database File System
I recently came across this concept in my search. Not sure how it would apply to Windows.
I have found a partial solution. First, I discovered that the laptop I'm using is actually logged in as Administrator. As an alternative to options 3 and 4, I have decided to use hard-links, which are part of the NTFS file system. However, due to the large number of files, this is unmanageable using the following command from an elevated command prompt:
mklink /h <source\file> <target\file>
Luckily, Hermann Schinagl has created the Link Shell Extension application for Windows Explorer, along with a very insightful write-up of how Junctions, Symbolic Links, and Hard Links work. The only reason this is currently a partial solution is a separate problem with Windows Explorer, which I intend to post as a separate question. Thank you, Hermann.
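If running mklink once per file is the unmanageable part, the link creation can also be scripted. Here is a hedged C# sketch using the Win32 CreateHardLink API (both paths must be on the same NTFS volume; the folder names are just an example of the by-date/by-category layout described above):

    // Batch-create NTFS hard links so a second "category" folder references the
    // same files that are stored once in the by-date archive.
    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    class HardLinker
    {
        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        static extern bool CreateHardLink(string newLinkPath, string existingFilePath, IntPtr securityAttributes);

        static void Main()
        {
            string archive  = @"C:\Data\ByDate\20131217";    // files stored once, by date
            string category = @"C:\Data\ByCategory\Set1";    // second view of the same files
            Directory.CreateDirectory(category);

            foreach (string source in Directory.GetFiles(archive))
            {
                string link = Path.Combine(category, Path.GetFileName(source));
                if (!CreateHardLink(link, source, IntPtr.Zero))
                    Console.WriteLine($"failed: {link} (error {Marshal.GetLastWin32Error()})");
            }
        }
    }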
