Signable, streamable, "readable" archive format? - file

Is there any archive format that offers the following:
1) be digitally signable with a digital certificate from a trusted source like Verisign, to detect changes to the file (I am not referring to read-only; if the file is changed, it should no longer verify as signed, telling the user this is not the original file)
2) be streamable: able to be opened even if not all of the content has been transferred (and not strictly linearly)
3) be "readable": able to read the data without extracting to a temporary folder (AFAIK if you open a file in a zip archive it is extracted first, and this stays true even for zip-based formats like OOXML. This is not what I want)
4) be portable: support on at least Windows, Linux and Mac OS X is a must, or at least future support
5) be free of patents and open source, preferably with a license that allows commercial use (as far as I know the GPL is a share-alike license, so it doesn't allow commercial use; BSD, on the other hand, allows it)
Note: Though it may come in handy eventually, I cannot think right now of a scenario that would require both point 1 and point 2 simultaneously. Or let's leave it at being able to check the signature only once the whole file has been downloaded.
I am not interested in:
being able to be compressed
being supported on legacy systems
Does any existing archive format fit this description (tar evolutions like DAR and pax come to mind)?
If there is, are there programming libraries available for the above-mentioned OSs?
If not, would it be hard to create such a thing?
Usage scenario:
I want to use this to create a new media container.
Current media containers contain the audio, video and subtitle streams directly.
Matroska, currently the most advanced container, has supplementary features like attachments and menus.
The menu functionality however is not implemented and very limited.
What I want to create is one level higher.
I want to create a file similar in a way to OOXML.
Also all of the menuing should be done in web technologies like HTML5 (as it is now, the <video> tag allows any kind of codec to be used) and CSS.
Also, just like you have holograms on DVDs to prove authenticity, I want to create a signable file.
Research notes:
Before asking this question I stumbled upon this:
Whats the best way digitally sign a zip file for download using .Net
While detached signing would be feasible for the individual files contained in this archive, it is not an elegant solution for the archive file itself. It is not end-user friendly. End users should be able to double-click the file to open it in a media player like VLC, and see a message that the file is legit (just like you see in a browser whether or not the page is transmitted with SSL through HTTPS).
EDIT: clarified point 5
EDIT 2: added a note to clarify point 1 and 2
EDIT 3: added usage scenario
EDIT 4: added research notes section
P.S.: This is my first question on StackOverflow

I doubt you will find such a format out of the box. I understand how such a solution can be built with the help of our SolFS, but SolFS doesn't have built-in signing (you can add signing easily).
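To illustrate two of the requirements with an existing format: a member of a ZIP archive can be read as a stream straight from the archive, without extracting it to a temporary folder, and a per-entry digest manifest gives a signature something to cover. The sketch below uses Python's standard `zipfile` and `hashlib` modules. The helper names `build_archive` and `digest_manifest` are made up for illustration, and the actual signing step (which would need a cryptography library and a certificate) is left out. Note that ZIP keeps its index at the end of the file, so this does not address the streaming requirement.

```python
import hashlib
import io
import zipfile


def build_archive(files):
    """Pack {name: bytes} into an uncompressed ZIP held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def digest_manifest(archive_bytes):
    """Return {name: SHA-256 hex digest}, reading each member as a stream."""
    manifest = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in zf.namelist():
            h = hashlib.sha256()
            # zf.open() gives a file-like object; nothing is written to disk.
            with zf.open(name) as member:
                for chunk in iter(lambda: member.read(65536), b""):
                    h.update(chunk)
            manifest[name] = h.hexdigest()
    return manifest
```

A signature over the serialized manifest would then cover every entry; a player could recompute the digests and refuse to show the "legit" message if any of them differ.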

Related

Is there any way to get exactly which part of file has been changed on Linux

I want to build file sync software. Is there any way to get the exact file changes (or at least the size of the changes) with kernel facilities like inotify or others?
EDIT:
I'm interested in the following scenario with inotify:
When getting an IN_MODIFY event on a file, I want to retrieve in some way the changed lines of the file (some kind of file diff format). Are there any Linux kernel tools to achieve this?
Even if there were such a kernel feature, it would not work in practice. You see, most editors modify files by creating a copy, then renaming it over the original one. This way the user is assured of getting either the old contents or the new contents, never a mix between the two.
The only real option is to take snapshots of the file (e.g. when the file is closed after being open for writing, or when the file is replaced with a new one), and compare the snapshots to find which part was changed.
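A minimal sketch of the snapshot idea, in Python for brevity. The names `Snapshot`, `take_snapshot` and `describe_change` are hypothetical. It records the inode along with the size and a content digest, on the assumption that an editor which renames a new copy over the original will leave the path pointing at a different inode.

```python
import hashlib
import os
from collections import namedtuple

Snapshot = namedtuple("Snapshot", "inode size digest")


def take_snapshot(path):
    """Record the inode, size and SHA-256 digest of the file at path."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return Snapshot(st.st_ino, st.st_size, digest)


def describe_change(old, new):
    """Summarise what differs between two snapshots of the same path."""
    return {
        # A different inode means a new file was renamed over the old one.
        "replaced": old.inode != new.inode,
        "size_delta": new.size - old.size,
        "content_changed": old.digest != new.digest,
    }
```

This only says that something changed and by how many bytes; finding which part changed still requires comparing the contents of the two versions.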
Comparing two versions of a file to see which part of it was changed is itself a difficult question, as it definitely depends on the file format. For source code, unified diffs work well, but for other types (including plain text files that are not line-oriented), it's not that simple.
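For line-oriented text, the comparison can be sketched with Python's standard `difflib` module. The function name `diff_snapshots` and the `.old`/`.new` labels are just illustrative choices; this is not suitable for binary or non-line-oriented formats.

```python
import difflib


def diff_snapshots(old_text, new_text, name="file"):
    """Return the unified diff between two text snapshots as a list of lines."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=name + ".old", tofile=name + ".new"
        )
    )
```

An empty list means the two snapshots are identical; otherwise lines starting with `-` and `+` mark what was removed and added.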
Could you please refine your question? The inotify API on Linux does monitor such changes, as well as similar events such as a file being opened, a file inside a directory (or the directory itself) being moved, file deletions, etc.
For more, see here:
(http://man7.org/linux/man-pages/man7/inotify.7.html)
EDIT:
I believe I misread the question the first time around. If I did: yes, such facilities exist, and the inotify API is the primary one within the Linux kernel. See the above link for a comprehensive guide to the different functions it provides.

Trying to find information on how to build a simple file version control system

I want to build a file system for non-techies (they don't care about old versions of a file, so no merging or svn/git). The thought is that a user should be able to download a file, and at the same instant the file should be locked for other users. When the first user is done editing, the file should automatically upload to the server. When he closes the file, the lock should then be released.
Is this even possible? I'm thinking of a sort of browser plugin, but I can't find anyone who has done the same thing (besides Microsoft, but who wants to go down that road).
That would be: Sharepoint, Alfresco, (almost every WIKI), ...
Actually that is a basic feature of most document management systems. Even SVN has that already and IIRC you can set that up with mod_dav_svn without a line of code (considering configuration is not code).
Also the interesting question is, IMHO, not TheHappyCase where the described unit of work goes well but what about this*:
1) I check out 50 random documents you need
2) (get some popcorn and wait for your stress level to go up)
3) ?????
4) I get bored and forget about it (everything still being checked out)
*: Points (1) and (2) may change order
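One common way to soften the forgotten-checkout problem is to give every lock a lease that expires on its own. This is an assumption added here for illustration, not something the systems named above are claimed to do in this form. A minimal in-memory sketch in Python, with the hypothetical class name `LockRegistry`:

```python
import time


class LockRegistry:
    """In-memory check-out registry; each lock expires after lease_seconds."""

    def __init__(self, lease_seconds=3600):
        self.lease_seconds = lease_seconds
        self._locks = {}  # document name -> (user, expiry timestamp)

    def holder(self, doc, now=None):
        """Return the user currently holding doc, or None if it is free."""
        now = time.time() if now is None else now
        entry = self._locks.get(doc)
        if entry is None or entry[1] <= now:
            self._locks.pop(doc, None)  # drop a missing or expired lock
            return None
        return entry[0]

    def checkout(self, doc, user, now=None):
        """Lock doc for user; returns False if someone else holds it."""
        now = time.time() if now is None else now
        if self.holder(doc, now) not in (None, user):
            return False
        self._locks[doc] = (user, now + self.lease_seconds)
        return True

    def checkin(self, doc, user, now=None):
        """Release doc if user holds it; returns True on success."""
        if self.holder(doc, now) != user:
            return False
        del self._locks[doc]
        return True
```

With a 60-second lease, a second user who is refused at first can check the document out once the first user's lease has run out. What should happen to the first user's unsaved edits at that point is a policy question this sketch does not answer.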

How to use Sphinx3 in an application

I used Sphinx4 for some time which really fits my needs. I load a recognizer, pass the audio data to it and use the recognized String in my application.
Right now I'm working on a C application (C++ is unfortunately not an option) where I need something similar and thought that I could use Sphinx3 which is written in C.
The problem is that I don't really know how it is used inside an application and there is no "Hello World"-example as Sphinx4 provides it.
I already compiled and installed sphinxbase and sphinx3 and now I can include the sphinx header files in my application.
Now to my questions:
Is there a "simple" and well documented example application that uses sphinx3 from a C environment?
How can I load up the sphinx3 engine and call a recognizer with my binary audio data?
OR: Do I need to start an application like "sphinx3_decode" and call it from my own application? If so, is there an example application for that?
Thank you in advance!
Best regards,
Robert
It's not recommended to use Sphinx3. From the website:
Sphinx-3 is CMU’s large vocabulary speech recognition system. It’s
older C based decoder that we continue to maintain. It’s planned to
make it obsolete in the future, it’s still most accurate decoder for
large vocabulary tasks. We are using it as a baseline to check the
recognizer accuracy. This decoder is only intended for researchers who
want to evaluate bleeding edge methods in ASR like tree search method.
If you need to use a decoder you should use pocketsphinx. You can find the tutorial and the API documentation on the website
http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx
http://cmusphinx.sourceforge.net/api/pocketsphinx/pocketsphinx_8h.html
I recently worked on an integrated project on the Punjabi language.
Here are some steps that we used...
First we recorded the Punjabi audio data in a vacuumed room at a 16000 Hz sample rate.
Then we took the recorded data and segmented it using the Praat software into small wav and raw files of 2 to 30 seconds and saved them in a folder named train.
Then we took a system running Linux, i.e. Ubuntu, installed the required tools such as autoconf, automake etc., and untarred Sphinx 3 along with 4 packages: cmuclmtk, pocketsphinx, sphinxbase and sphinxtrain.
Then, according to the small wav files, we made many files such as transcription, dic, phone, filler, file id, ccs etc.
Then we opened the terminal and typed "sphinx_fe" to check whether Sphinx was functional.
Then we created a folder named "man" and changed to its path in the terminal.
Then we ran the command "sphinxtrain -t man setup". Running this command creates a folder named "etc" inside the "man" folder, containing the files "feat_paramas" and "config".
Changes were made in the config file according to our data.
Then we moved all the files that we created before, i.e. transcription, dic etc., into the etc folder that is located in the man folder.
Then we placed the "lang1.sh" script in the etc folder and the remaining 4 scripts in the man folder.
Then we opened the path of the etc folder in the terminal and ran the command "lang1.sh".
Then we ran a series of commands in the terminal: "mfcgen2.sh", then "verify3.sh", then "hmm4.sh" and at last "end-test.sh" to get the final result.
As for the rest, if you have worked with Sphinx 4 then you may know about the files mentioned in the steps above. I hope this helps you.

Legacy dos system with flat file data store (ISAM-Files)

I have a legacy system which used to run on DOS. It is an ERP system for retail stores (fashion). I think it stores its data in flat files.
I have files ending with *.KEY and other files ending with *.D00 (counting up).
I think the key files hold the key information and the D-files hold some data... there are a lot of D77 files...
As far as my investigation goes, this is not DBF or FoxPro; it could be proprietary...
The company who wrote it is out of business, of course, so there is no chance of support or any hints.
When I open these files in vim or other editors I get some binary characters and some text... I tried it in hex mode but still got nothing usable...
Is there any chance I can dump out the data... as CSV, ASCII, XML?
I am pretty sure that this is not a standard format. Can someone point me in a direction: how was such data stored back in the day, and how could I make it readable...
Any tools, tips or tricks?
// EDIT
After some time I made some progress and can now post some details which I did not know of back then, and whose absence made a good answer impossible.
I assume that the DOS system was written in Visual COBOL and that the files could be B-tree files stored in ISAM format. The closest thing I can offer is that the format is possibly C-ISAM.
How can I access, view or modify these files... C#, Java, Ruby... any modern language would be cool... I am not sure if I can handle COBOL... It would be great to have a converter or a viewer tool, preferably open source...
I hope this clarifies my question =)
OpenCOBOL has a very active user group. The language itself is free and runs on Linux and Windows and perhaps MacOSX. Have a chat to the user group there; they may be able to help.
Peachtree Accounting Software used those file extensions back in 1992.
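As a first step before any real ISAM tooling, it can help to pull the readable text out of the data files together with its byte offsets, since regularly spaced offsets often hint at a fixed record length. The sketch below is a small Python equivalent of the Unix `strings` tool. The function name `printable_runs` is hypothetical, and it does not parse C-ISAM or any other format.

```python
import re


def printable_runs(data, min_len=4):
    """Return (offset, text) for each run of printable ASCII in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

To use it on one of the data files, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in. Old DOS systems often used a code page other than ASCII, so accented characters will break up the runs.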

Free server side anti virus / security / trojan protection for file uploads?

I am allowing users to upload photos (as in photo albums) and also attach files (documents for now) as mail attachments. So I assume I need some anti-virus/security tool in place to scan the files first, in case people upload infected stuff. So two questions:
1) Are there any 'free' or open source tools for this that I can use or integrate into my environment (CodeIgniter, PHP)?
2) How do I secure the upload area from the rest of the system? Say the virus scanner fails to catch a virus and it is uploaded: how do I prevent it from infecting other files? For example, can the upload area be sandboxed, with that file path always used for users to access the content, so that it does not spread to other parts of the system?
ClamAV is a free virus scanner. Install it and you could do something like:
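function virus_detected($filename)
{
    // Path to the clamscan binary; adjust for your installation.
    $clamscan = "/usr/local/bin/clamscan";
    // Escape the file name so it cannot inject shell commands.
    $result = exec($clamscan . " -i --no-summary " . escapeshellarg($filename));
    // With -i and --no-summary, clamscan prints only infected files.
    return strlen($result) > 0;
}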
As for security, make sure the temporary files are uploaded to a directory outside of your web root. You should then verify the file type, rename the file to something other than its original file name and append the appropriate extension (gif, jpg, bmp, png). I believe this should keep you fairly safe aside from exploits in PHP itself.
For more information about verifying file types in PHP check out:
http://www.php.net/manual/en/function.finfo-file.php
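The verify-then-rename step can be sketched as follows, shown in Python rather than PHP purely for illustration. The helper names `detect_image_type` and `safe_name` are hypothetical. The check looks only at the leading "magic number" bytes of the four image types mentioned above, which is a weak test: a file can start with a valid signature and still carry other content.

```python
import secrets

# Leading bytes ("magic numbers") of the image types mentioned above.
MAGIC = {
    "gif": (b"GIF87a", b"GIF89a"),
    "jpg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "bmp": (b"BM",),
}


def detect_image_type(data):
    """Return the extension matching the file's leading bytes, or None."""
    for ext, signatures in MAGIC.items():
        if data.startswith(signatures):
            return ext
    return None


def safe_name(data):
    """Return a random file name with a verified extension, or None to reject."""
    ext = detect_image_type(data)
    if ext is None:
        return None
    # The user-supplied name is discarded entirely.
    return secrets.token_hex(16) + "." + ext
```

Because the stored name never contains anything the user typed, the original file name cannot be used to smuggle in a path or an executable extension.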
I know this topic hasn't been active for three years, but in case anyone else is looking for a PHP-based anti-virus solution: for those who have no anti-virus daemon, program or utility installed on their host machine and no ability to install one, phpMussel may be a viable option. It is a PHP script that I've written, based on ClamAV, and it fits what Rohit (the original poster) was looking for: a PHP-based anti-virus to protect a CMS against malicious file uploads. It certainly isn't perfect and I can't guarantee that it'll catch everything, but it's certainly better than using nothing at all.
As Matt already suggested above, calling out to the shell to have ClamScan scan the uploads is the ideal solution, and if a hostmaster, webmaster or anyone in Rohit's situation is able to do that, I'd second the suggestion wholly. Because what I've written is a PHP script, it has the limitations inherent to anything that relies wholly on PHP. But where that suggestion and similar ones aren't possible (for example, the host machine has no anti-virus installed and shell access is disabled, which is common with cheaper shared hosting), this is where it could step in: it only requires PHP (with the PCRE extension, which is standard with PHP nowadays anyhow) and nothing more.
Also remember, as Matt has already suggested, to always upload outside of your web root directory, so that uploaded files can't be exploited by attackers (such as an attacker attempting to compromise your system by uploading backdoors or trojans). Viruses are not the only threat you need to worry about, and the vast majority of anti-virus solutions nowadays do not focus solely on viruses. Matt is also entirely correct in pointing out that no anti-virus solution is perfect, and for that reason anyone allowing file uploads to their website or server needs to remain vigilant. An anti-virus solution is a must-have for anyone in that situation, but no holy grail of internet security that covers every possible threat exists. Also, renaming files isn't only about ensuring that they can't execute (as may be inferred from the original poster's reply comment regarding EXEs). Renaming also reduces the risk of directory traversal attacks, as well as the risk of an attacker overwriting an already existing file on the targeted system as a means of hiding their dirty work.
Regarding the threat of malicious files being missed by an anti-virus solution and then infecting the system they are uploaded to: what a hostmaster or webmaster could do in this situation is employ some quick and simple encoding process that renders the file non-executable by the system itself, but which can be easily reversed by the PHP script responsible for serving that file on request, such as base64_encode(), bin2hex(), or even just rotating a few characters and adding a salt to displace the file's magic number.
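The encode-on-store, decode-on-serve idea can be sketched like this, again in Python for illustration, with `base64.b64encode` standing in for PHP's base64_encode(). The function names are hypothetical. Note that this is an obfuscation step rather than a security boundary, and base64 grows the stored file by roughly a third.

```python
import base64


def encode_for_storage(data):
    """Encode uploaded bytes so the stored file no longer starts with its magic number."""
    return base64.b64encode(data)


def decode_for_delivery(stored):
    """Reverse encode_for_storage when the file is requested."""
    return base64.b64decode(stored, validate=True)
```

For example, an uploaded Windows executable begins with the bytes `MZ`; after encoding, the stored file begins with ordinary base64 text instead, so the operating system will not recognise it as a program.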

Resources