Signature files for document retrieval

I was wondering if you know where I can find information on how to build a signature file for document retrieval.
Do you know if there is some code out there that I can use or look at?
I have to create a signature file in C++ on a Linux platform.
UPDATE: Sorry, I appreciate the help, but I was referring to signature files not as a way to validate documents but as a way of indexing documents.
http://en.wikipedia.org/wiki/Signature_files
Any help will be greatly appreciated.
Thanks,
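For readers unfamiliar with the indexing sense of the term, here is a minimal sketch of one common signature-file scheme, superimposed coding: every word sets a few bits in a fixed-width bit string, the word bit strings are OR-ed into one document signature, and a query can only match documents whose signature contains all of the query's bits. The signature width, the number of bits per word, the FNV-1a hash and all function names are illustrative assumptions, not taken from any particular library.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_BITS 256        /* signature width in bits (assumed) */
#define BITS_PER_WORD 3     /* bit positions set per word (assumed) */

typedef struct { uint64_t w[SIG_BITS / 64]; } signature;

/* Case-insensitive FNV-1a hash of a word, perturbed by a seed so that one
   word can yield several different bit positions. */
static uint32_t hash_word(const char *s, size_t n, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    for (size_t i = 0; i < n; i++) {
        h ^= (uint32_t)tolower((unsigned char)s[i]);
        h *= 16777619u;
    }
    return h;
}

/* Set this word's bits in the signature. */
void sig_add_word(signature *sig, const char *s, size_t n)
{
    for (uint32_t k = 0; k < BITS_PER_WORD; k++) {
        uint32_t bit = hash_word(s, n, k * 0x9e3779b9u) % SIG_BITS;
        sig->w[bit / 64] |= (uint64_t)1 << (bit % 64);
    }
}

/* Split text on non-alphanumeric characters and superimpose (OR together)
   the bits of every word. */
signature sig_of_text(const char *text)
{
    signature sig;
    memset(&sig, 0, sizeof sig);
    const char *p = text;
    while (*p) {
        while (*p && !isalnum((unsigned char)*p)) p++;
        const char *start = p;
        while (*p && isalnum((unsigned char)*p)) p++;
        if (p > start) sig_add_word(&sig, start, (size_t)(p - start));
    }
    return sig;
}

/* Returns 1 if every query bit is set in the document signature. A 1 means
   "possibly contains the words"; false positives are expected. */
int sig_may_contain(const signature *doc, const signature *query)
{
    for (size_t i = 0; i < SIG_BITS / 64; i++)
        if ((doc->w[i] & query->w[i]) != query->w[i]) return 0;
    return 1;
}
```

A signature file is then just these fixed-size signatures stored one after another, one per document. A search scans them with sig_may_contain() and re-checks the surviving candidates against the real text, since matching bits do not guarantee that the word is present.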

Firstly, let's clarify some terminology.
A Digital Signature is intended to be equivalent to a handwritten signature (see http://en.wikipedia.org/wiki/Digital_signature for a better description and overview).
When a digital signature is applied to a document you get a higher level of assurance of the authenticity of the document (you have a better idea if the document was forged or not).
The answers from Adam and Robert both refer to methods for verifying document integrity (that the document is unchanged). While a digital signature also provides this, a checksum (hash) does not provide authenticity.
So it's important that we establish the needs of your "Signature file". I will assume that you are talking about Digital Signatures, rather than checksums as the other answers address checksums.
You will want to compose a PKCS#7 detached signature (jargon: a standard-format signature that does not contain the data, so it can be stored separately). To achieve this I recommend you use a standard library such as OpenSSL (which is portable).
For more information on PKCS#7 see http://www.rsa.com/rsalabs/node.asp?id=2129
For more information on OpenSSL see http://www.openssl.org/

You might look at Semantic Hacker or Yahoo Term Extraction.

md5sum might be what you are looking for. Source code for generating md5 signatures is available if you Google around.
From Wikipedia:
Because almost any change to a file will cause its MD5 hash to also change, the MD5 hash is commonly used to verify the integrity of files (i.e., to verify that a file has not changed as a result of file transfer, disk error, meddling, etc.). The md5sum program is installed by default in most Unix, Linux, and Unix-like operating systems or compatibility layers. BSD variants (including Mac OS X) have a similar utility called md5. Versions for Microsoft Windows do exist.

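Similarly to Adam's suggestion, if you're working with a very large number of documents, it might be a good idea to check out SHA-1 and sha1sum. Its digest is longer (160 bits versus 128 for MD5), so accidental collisions are less likely. Note that both are hash functions rather than encryption.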

Related

How to use SHA-256 crypt with musl?

I am using musl: https://www.musl-libc.org
If I browse the repository I can see that there are a bunch of crypt-related source files (including crypt_sha256.c).
The problem is that there are no header files for them. How am I supposed to use them?
These are the symbols from the lib on my system:
I also could not find any code samples on Google showing how to use the SHA-256 features of musl.
Thanks!
You are correct that the three functions
sha256_init()
sha256_update()
sha256_sum()
are indeed generic SHA256 hash functions.
Unfortunately, these symbols are not exported publicly, but only used internally to generate a salted password hash of the form $5$0rXgD0/KkyyT0$5PPj3bke0vPxsMDlSXzBz2D3TFNahLrXSs7.elU3u2/
For that reason, no public header files for these functions are provided. Only the higher-level function crypt_sha256() is exported.
Why they decided not to export the generic interface can only be speculated about; at least I could not find an explanation for that.
That's not the generic SHA-256 message digest algorithm but a specific algorithm used by the crypt(3) password hashing function. See the documentation for that function on how it is used.
As Ctx wrote and commented about not knowing the reason for, these functions are not exported. I can fill in that reason.
Generally, musl does not unilaterally invent new interfaces that will almost surely end up differing in subtle ways from similar interfaces other libc providers end up inventing. We are in the process of launching a cross-libc collaboration group less formal than the POSIX standardization process that might make it reasonable to offer some interfaces like this in the future, and that might eventually funnel some of the consensus that emerges upstream to POSIX.
Short of that, anyone wanting to use these implementations is welcome to copy the code and use it under the terms of the license. They're small, self-contained, and permissively-licensed, and by using them this way you don't lock in the signature for any external interface boundary. As usual with cryptographic code, though, you should be careful of any risk of side channels leaking secrets. As used in musl I don't believe that's an issue, but I haven't analyzed other possible uses, and it might be safer to pick an implementation designed for use in arbitrary cryptographic settings.

Small portable digital signing and verification library

I'm looking for a library that allows me to authenticate data sent to embedded modules. Due to the hardware constraints, it needs to be of small footprint (both code and memory wise) and yet have security comparable to RSA-1024.
The requirements are as follows
Verification on embedded modules (custom CPUs, with only a C89 compiler available)
Signing and verification in Windows (C/C++ code)
Signing in Java (some data needs to be generated via a webpage, so Java would be a big perk)
I would very much like to not have to implement a PKCS #1 v1.5/PSS-like system myself, but I haven't been able to find any good libraries that match the above requirements. Open source would be nice, but commercial solutions are of equal interest. Note that I need access to the C-code, since it has to be recompiled for the custom CPUs.
NaCl looks promising, but it seems to be in development still.
I've had a look at OpenSSL, but it does a lot more than digital signatures and stripping out just the signature verification code was non-trivial.
Am I looking at it the wrong way?
I tried implementing SHA+RSA first, but I wasn't sure if the padding step was correct (which means that it probably wasn't secure), so I decided to post here instead for help.
EDIT: Clarification: only the verification part has the tough constraints on it. Signature and key generation will run on normal PCs.
Take a look at mbed TLS (formerly known as PolarSSL):
mbed TLS (formerly known as PolarSSL) makes it trivially easy for developers to include cryptographic and SSL/TLS capabilities in their (embedded) products, facilitating this functionality with a minimal coding footprint.
How to implement this kind of solution depends on the CPU and memory architecture you have available, so you would have to tell me more about your system. One option would be to do this in the cloud. Another alternative would be SCL. You can also find some answers under "Small RSA or DSA lib without dependencies".

Embedded systems file encryption library

I've got a project and a part of it is incorporating encryption into a FAT file system.
The goal of the project is not the encryption, so I'm free to use open-source pre-done libraries.
Ideally what I'm after is a C library which uses RSA, that already has the methods for computing keys and encrypting/decrypting files.
You might want to check out NaCl (pronounced as "salt"), especially since this is for an embedded system.
It has CPU-specific tunings and doesn't require any dynamic memory allocation.
As for licensing, the page (linked above) says "All of the NaCl software is in the public domain".
Regarding a library: check Cryptlib. It is dual-licensed and includes quite a lot of functionality.
However, the ability to encrypt files correctly depends on how you write the data and how you expect to do the encryption.
Streaming encryption for streams with random access (i.e. when you need to encrypt and decrypt file data on the fly as it is written or read) is not a trivial task, and it requires some knowledge of cryptography to choose the correct encryption mode and do this right.
On the other hand, if you have a file and want it encrypted, Cryptlib has a PKCS7/CMS implementation to do the job.
You might want to give Blowfish a try. It's royalty free and there are several open source C implementations. It was created by Bruce Schneier. Here is an article about using it with embedded systems.

Public key implementation in C for Linux

I'm trying to use public key crypto to sign and later verify a file. The file is a simple plaintext file that contains user information for authoring purposes.
I tried different sites for a C implementation of a public key crypto algorithm but I haven't found anything. A lot of sites point to using certificates (X.509, etc.) but that is way beyond what I need. I am just looking for a way to generate public and private keys and use a relatively well known algorithm to sign and verify a file.
Any pointers to a pure C implementation out there? The focus is on code that I can reuse and not external libs. The main problem being that I don't want to have to link against a full lib and its dependencies in order to have a very basic public key system.
Thanks.
OpenSSL is a very good package. You can just use the crypto library portion, which provides basic RSA implementations. That might be in line with what you are looking for.
Cryptlib is another alternative that could work for you. It has some strange licensing issues though, so consider those depending on how you will be using it.
Crypto++ is a set of different crypto technologies, and includes RSA, so you might try that.
Finally, RSA is not terribly complex to implement, so you could even implement it yourself using GMP, which provides the necessary mathematical functions you would need.
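To illustrate the arithmetic behind that last remark, here is a toy sketch of textbook RSA signing and verification using the classic small example key (p = 61, q = 53, n = 3233, e = 17, d = 2753). It uses plain 64-bit integers instead of GMP, where the same modular exponentiation would be done on big integers with mpz_powm. The function names are made up for this sketch, and a key this small with no padding or hashing is in no way secure.

```c
#include <stdint.h>

/* Modular exponentiation by repeated squaring: (base ^ exp) mod m.
   Only valid while m < 2^32, so that the products fit in 64 bits. */
uint64_t modpow(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Toy key, for illustration only: n = 61 * 53, and e * d = 1 (mod 780). */
enum { TOY_N = 3233, TOY_E = 17, TOY_D = 2753 };

/* "Sign" a value smaller than n with the private exponent d. */
uint64_t toy_sign(uint64_t digest)
{
    return modpow(digest, TOY_D, TOY_N);
}

/* Verify with the public exponent e: returns 1 if sig^e mod n
   recovers the digest, 0 otherwise. */
int toy_verify(uint64_t digest, uint64_t sig)
{
    return modpow(sig, TOY_E, TOY_N) == digest % TOY_N;
}
```

For example, toy_verify(65, toy_sign(65)) returns 1, while toy_verify(66, toy_sign(65)) returns 0. A real implementation would sign a padded hash of the file (for instance PKCS #1 v1.5 or PSS) with a key of at least 2048 bits, which is where most of the difficulty and most of the pitfalls are.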
You may want to look at the well-respected, debugged, and tested OpenSSL libraries. Although OpenSSL is primarily for SSL/TLS networking, it contains extremely good implementations of many cryptographic protocols, which are often used by themselves for general cryptography.
Hope this helps!
DJ Bernstein's curve25519 lets you create public/private key pairs. It does not have functions for signing, but you should be able to figure that part out with not too much hassle.
Update: In the meantime, there's also Ed25519, which already has the signature generation stuff figured out, without you having to jump through hoops. Same author, same availability of software (also e.g. the "Donna" implementation and a Python binding), same ease of use, comparable speed.
The original implementation as well as the "Donna" implementation are both available under very liberal licenses.
You need to compile one file and call exactly one function to generate a key pair, and it's very fast. No obscure requirements for the public key. All one ever needs for some "cheap, fast, easy public key crypto".
I think there was an answer[1] that fits your question on "Small RSA or DSA lib without dependencies":
You may find LibTomCrypt useful. It's written in C, supports RSA and DSA (along with a host of other algorithms), and is public domain software. You can read about its features here: http://libtom.org/?page=features
[1] https://stackoverflow.com/a/1735526/68338 ( courtesy of https://stackoverflow.com/users/33837/emerick-rogul )
The answers on this question contain some interesting links to other libraries.
However, I remember that there exists some reference source code in C for RSA and private key cryptography. I will add a link as soon as I have found it ;-)
EDIT
I just found "this link" (http://www.hackchina.com/en/cont/93068 - open at your own risk) - not sure about the source and details of that code. However, in the past a link to the original RSA reference implementation was contained somewhere in the OpenSSL source or its documentation, which is based on cryptsoft.com's library. I am sure the source can still be found somewhere on www.rsa.com/rsalabs/ - but I could not find it, and I am running out of time for now. Good luck ;-)

What's the best way to serialize data in a language-independent binary format?

I'm looking into a mechanism to serialize data to be passed over a socket or shared memory in a language-independent way. I'm reluctant to use XML since this data is going to be very structured, and encoding/decoding speed is vital. Having a good C API that's liberally licensed is important, but ideally there should be support for a ton of other languages. I've looked at Google's Protocol Buffers and ASN.1. Am I on the right track? Is there something better? Should I just implement my own packed structure and not look for some standard?
Given your requirements, I would go with Google Protocol Buffers. It sounds like it's ideally suited to your application.
You could consider XDR. It has an RFC. I've used it and never had any performance problems with it. It was used in ONC RPC and comes with a tool called rpcgen. It is also easy to create a generator yourself when you just want to serialize data (which is what I ended up doing for portability reasons; it took me half a day).
There is an open source C implementation, but it may already be in a system library, in which case you wouldn't need the sources.
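To give an idea of why writing your own XDR serializer is quick, here is a minimal hand-rolled sketch of two of its basic encodings as described in the RFC: a 32-bit integer is four bytes in big-endian order, and a string is a 4-byte length followed by the bytes, zero-padded to a multiple of four. The demo_ function names are invented for this sketch and are not the API of the system XDR library or of rpcgen output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode a 32-bit unsigned integer as 4 big-endian bytes.
   Returns the number of bytes written (always 4). */
size_t demo_xdr_put_u32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
    return 4;
}

/* Decode 4 big-endian bytes into a 32-bit unsigned integer. */
uint32_t demo_xdr_get_u32(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Encode a NUL-terminated string: 4-byte length, then the bytes, then zero
   padding up to the next multiple of 4. The caller must provide a buffer of
   at least 4 + strlen(s) + 3 bytes. Returns the number of bytes written. */
size_t demo_xdr_put_string(unsigned char *out, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;
    size_t n = demo_xdr_put_u32(out, (uint32_t)len);
    memcpy(out + n, s, len);
    memset(out + n + len, 0, padded - len);
    return n + padded;
}
```

Encoding "hello" this way produces 12 bytes: the length 5, the five characters, and three zero bytes of padding. Because everything is byte-oriented and big-endian, the same buffer can be decoded on any platform regardless of its native endianness.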
ASN.1 always seemed a bit baroque to me, but depending on your actual needs might be more appropriate, since there are some limitations to XDR.
Just wanted to throw ASN.1 into this mix. ASN.1 is a format standard, but there are libraries for most languages, and the C interface via asn1c is much cleaner than the C interface for protocol buffers.
JSON is really my favorite for this kind of stuff. I have no prior experience with binary stuff in it though. Please post your results if you are planning on using JSON!
Thrift is a binary format created by Facebook. Here's a comparison with google protocol buffers.
Check out Hessian
There is also Binary XML but it seems not stabilized yet. The article I link to gives a bunch of links which might be of interest.
Another option is SNAC/TLV, which is used by AOL in its OSCAR/AIM protocol.
Also check out Muscle. While it does quite a bit, it serializes to a binary format.
A few things you need to consider:
1. Storage
2. Encoding style (1-byte or 2-byte)
3. TLV standards
An ASN.1 parser is good for binary representations. The best part is that ASN.1 is a well-established technology that is widely used both within ITU-T and outside of it. The notation is supported by a number of software vendors.
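As a rough illustration of the TLV (tag-length-value) idea mentioned above, here is a minimal sketch of writing and reading one record. The layout (1-byte tag, 2-byte big-endian length) and the function names are assumptions chosen for this example; real TLV-based protocols and ASN.1 encodings such as BER define their own field sizes and rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: 1-byte tag, 2-byte big-endian length, value. */
#define TLV_HEADER ((size_t)3)

/* Write one record into out (capacity cap).
   Returns the number of bytes written, or 0 if the record does not fit. */
size_t tlv_write(unsigned char *out, size_t cap, uint8_t tag,
                 const void *val, uint16_t len)
{
    if (cap < TLV_HEADER + len)
        return 0;
    out[0] = tag;
    out[1] = (unsigned char)(len >> 8);
    out[2] = (unsigned char)(len & 0xff);
    if (len > 0)
        memcpy(out + TLV_HEADER, val, len);
    return TLV_HEADER + len;
}

/* Parse one record from in (avail bytes). On success *val points into the
   input buffer (no copy is made). Returns the number of bytes consumed,
   or 0 if the input is truncated. */
size_t tlv_read(const unsigned char *in, size_t avail, uint8_t *tag,
                const unsigned char **val, uint16_t *len)
{
    if (avail < TLV_HEADER)
        return 0;
    uint16_t n = (uint16_t)(((unsigned)in[1] << 8) | in[2]);
    if (avail - TLV_HEADER < n)
        return 0;
    *tag = in[0];
    *val = in + TLV_HEADER;
    *len = n;
    return TLV_HEADER + n;
}
```

Because each record carries its own length, a reader can skip tags it does not understand by advancing by the returned byte count, which is the main reason TLV formats are easy to extend across languages and versions.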

Resources