Novice to AES here. In reading http://en.wikipedia.org/wiki/AES_implementations, I am a bit surprised. I should need just one function:
char16 *aes128(char16 key, char16 *secrets, int len);
where char16 is an 8*16 = 128-bit character type, and, presumably, ignoring memory leaks,
assert( bcmp( anystring, aes128(anykey, aes128(anykey, anystring, len), len), len * sizeof(char16) ) == 0 );
I am looking over the description of the algorithm on Wikipedia, and although I can see myself making enough coding mistakes to take me a few days to debug my own implementation, it does not seem too complex. Maybe 100 lines? I did see versions in C#, such as Using AES encryption in C#, that themselves seem almost as long as the algorithm itself. Earlier recommendations on Stack Overflow mostly suggest using individual functions inside larger libraries, but it would be nice to have a go-to function for this task that one could compile into one's own code.
So, is an AES implementation too complex for the faint of heart? Or is it reasonably short and simple?
How many lines does a C implementation take? Is there a self-contained aes128() C function already freely available somewhere for the taking?
Another question: is each block encoded independently? Presumably, it would strengthen the encryption if the first block created a salt that the second block would then use. On the other hand, this would mean that disk corruption of one block would make every subsequent block undecryptable.
/iaw
You're not seeing a single function like you expect because there are so many options. For example, the block encoding mechanism you described (CBC) is just one option or mode in AES encryption. See here for more information: http://www.heliontech.com/aes_modes_basic.htm
The general rule of thumb in any language is: Don't reinvent something that's already been done and done well. This is especially true in anything related to cryptography.
Well, using just the AES function by itself is basically insecure: with key K, any plaintext block X will always be encrypted to the same ciphertext block Y, which (according to cryptographers) gives an attacker too much information.
So you use some method to make the encryption of each block differ: you can use a nonce, Cipher Block Chaining, or some other method. There is a pretty good illustration on Wikipedia (the penguin picture): http://en.wikipedia.org/wiki/Electronic_code_book#Electronic_codebook_.28ECB.29
So, in short, you can implement AES in one function that is secure as a block cipher, but used on its own it isn't secure for data longer than one 16-byte block.
Also, AES is fairly complex because of the key schedule that produces all the round keys. I wouldn't really want to implement it, especially with so many good implementations around, but I guess it wouldn't be so bad if you had a good reason to do it.
So, in short, to construct a secure stream cipher from a block cipher, you need some strategy (a mode of operation) that changes what the block cipher effectively encrypts along the stream, for example by chaining in the previous ciphertext block or by encrypting a nonce plus a counter; the key itself stays the same.
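To make the "use a nonce" suggestion concrete, here is a minimal sketch of counter (CTR) mode, one standard way to turn a single-block function like the aes128() in the question into something usable for longer data. The names block_fn, toy_block and ctr_xcrypt are hypothetical; toy_block is a placeholder XOR so the sketch runs and is NOT a cipher. A real program would plug in a vetted AES-128 block-encrypt call with its own signature adapted to block_fn.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed signature for "encrypt one 16-byte block with this key". */
typedef void (*block_fn)(const void *key, const uint8_t in[16], uint8_t out[16]);

/* Placeholder so the sketch runs: XOR with a 16-byte key. NOT a cipher. */
void toy_block(const void *key, const uint8_t in[16], uint8_t out[16])
{
    const uint8_t *k = key;
    for (int i = 0; i < 16; i++) out[i] = in[i] ^ k[i];
}

/* CTR mode: keystream block i = E(key, nonce || counter_i), data ^= keystream.
   The same call encrypts and decrypts, in place, for any length.
   Never reuse a nonce with the same key. */
void ctr_xcrypt(block_fn enc, const void *key, const uint8_t nonce[8],
                uint8_t *data, size_t len)
{
    uint8_t ctr_block[16], stream[16];
    uint64_t counter = 0;
    for (size_t off = 0; off < len; off += 16, counter++) {
        memcpy(ctr_block, nonce, 8);
        for (int i = 0; i < 8; i++)           /* big-endian 64-bit counter */
            ctr_block[15 - i] = (uint8_t)(counter >> (8 * i));
        enc(key, ctr_block, stream);
        size_t n = len - off < 16 ? len - off : 16;   /* last block may be short */
        for (size_t i = 0; i < n; i++) data[off + i] ^= stream[i];
    }
}
```

Because each keystream block depends only on the nonce and the block's position, blocks are handled independently: a corrupted ciphertext byte garbles only the same plaintext byte, which speaks to the disk-corruption worry in the question. Identical plaintext blocks still produce different ciphertext blocks, since the counter differs.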
OK, so I found a reasonable standalone implementation:
http://www.literatecode.com/aes256
About 400 lines. I will probably use this one.
Hope it helps others, too.
Related
For some time now, I've been looking to build a logging framework in C (not C++!) for small microcontrollers or other devices with a small footprint. For this, I've had the idea of hashing the strings being logged and saving just the hash value with the timestamp instead of the complete ASCII string. The hash can then be correlated with a 'database' file generated by an external process that parses the strings out of the C source files and saves the logged strings along with their hash values.
After a little research, I found that this idea is not new, but I could not find an implementation of it in C. In other languages it has been worked out, but that is not the goal of my exercise. One example is this talk, where the same concept is worked out in C++: youtube.com/watch?v=Dt0vx-7e_B0
Some of the requirements that I've set myself for this library are the following:
C code that is as portable as possible.
COMPILE-TIME optimization/hashing for the string-to-hash conversion: a single log statement should be equivalent to just printf("%d\n", hashed_value) (assuming no parameters/arguments for that particular logging statement).
Arguments can be passed to the logging statement, similar to printf.
The user can define their own output function (console, file descriptor, sending the data directly over a UART connection, ...).
Fast to run! Fast to compile is nice to have, but it should not be terribly slow.
Very easy to use, with no complicated API.
But to achieve this in C, what is a good approach? I've tried several things now, but do not seem to have found a good method of achieving this.
An overview of things I've tried so far, along with the drawbacks are:
Full preprocessor string hashing: I did get it working, but compile time is terribly slow. Also, this code does not feel very portable across C compilers.
Semi-preprocessor string hashing: the idea was to generate a hash for each string and write an external header file with a define for each string and its hash value. The problem here is that I cannot figure out a way to map the string to the correct preprocessor define.
Dropping the default logging macro with a string pointer: instead of the most commonly used form, LOG_DEBUG("Some logging statement"), an external parser converts it to /*LOG_DEBUG("Some logging statement") */ LOG_RAW(45). This solves the hashing problem, since the external parser substitutes the correct hash, but it is not the cleanest to read, because the original statement becomes a comment.
Extending this idea to handle arguments also proved tricky. How can multiple variable types be handled as efficiently as possible?
I've tried some other methods but all without success. Especially when I want to add arguments to log the value of a variable, for example, it gets very complicated, and I do not get the required result...
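A minimal sketch of the runtime half of the third approach, assuming the external parser has already rewritten each LOG_DEBUG("...") into LOG_RAW(hash, args...) and that every argument fits in 32 bits (floats or 64-bit values would need extra handling). The names fnv1a32, log_set_output, log_pack, log_emit and LOG_RAW are hypothetical; FNV-1a is just one simple hash the parser could use, and the same function could generate the 'database' file that maps hashes back to strings.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a (32-bit): cheap string hash the external parser could use for IDs. */
uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;                /* FNV offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;                      /* FNV prime */
    }
    return h;
}

/* User-supplied sink: console, file descriptor, UART, ... */
typedef void (*log_output_fn)(const uint8_t *data, size_t len);
static log_output_fn log_output;             /* NULL until set: logging is a no-op */

void log_set_output(log_output_fn fn) { log_output = fn; }

#define LOG_MAX_VALUES 9                     /* ID plus up to 8 arguments */

/* vals[0] is the string ID, vals[1..count-1] the arguments.
   Writes each value as 4 little-endian bytes; returns bytes written. */
size_t log_pack(uint8_t *out, const uint32_t *vals, size_t count)
{
    size_t len = 0;
    if (count > LOG_MAX_VALUES) count = LOG_MAX_VALUES;
    for (size_t v = 0; v < count; v++)
        for (int i = 0; i < 4; i++)
            out[len++] = (uint8_t)(vals[v] >> (8 * i));
    return len;
}

void log_emit(size_t count, const uint32_t *vals)
{
    uint8_t buf[4 * LOG_MAX_VALUES];
    size_t len = log_pack(buf, vals, count);
    if (log_output) log_output(buf, len);
}

/* C99 compound literal gathers the ID and arguments into one array;
   sizeof does not evaluate its operand, so arguments are evaluated once. */
#define LOG_RAW(...) \
    log_emit(sizeof((uint32_t[]){__VA_ARGS__}) / sizeof(uint32_t), \
             (const uint32_t[]){__VA_ARGS__})
```

For example, LOG_RAW(45u, adc_value) (adc_value being any integer variable) hands 8 bytes to the sink: the ID followed by one argument. No format string is stored or parsed on the device; the decoder on the host side looks up the ID and applies the original format.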
Question
I am wondering why we connect sockets using functions like htons/htonl to take care of endianness, when we could have sent the IP as a plain char array.
Say we want to connect to 184.54.12.169.
There is an explanation for this, but I cannot figure out why we use integers instead of chars, and so get ourselves into endianness hell.
I think char out_ip[] = "184.54.12.169" could have theoretically made it.
Please explain the subtleties I'm missing here.
The basic networking APIs are low-level functions, very thin wrappers around kernel system calls. Removing these low-level functions and forcing everything to use strings would be rather bad for a low-level API like that, especially considering how tedious string handling is in C. As a concrete hurdle, IP strings are not even fixed length, so handling them is a lot more complex than handling plain 32-bit integers. And moving string handling into the kernel goes against what the kernel is supposed to do; handling arbitrary user strings is really a user-space problem.
So you would want higher-level functions which accept strings and do the conversion in the library. But adding such higher-level "convenience" functions all over the place in the core libraries would bloat them, because passing IP numbers is certainly not the only place where such convenience would be wanted. Once they became part of standard (official, like POSIX, or de facto) libraries, these functions would need to be maintained forever and included everywhere.
So, removing the low-level functions is not really an option, and adding more functions for higher-level API in the same library is not a good option either.
So the solution is to use another library that provides a higher-level networking API, which could, for example, handle address strings directly. I'm not sure what's out there for C, but it's almost a given for other languages, which also have "real" built-in strings, so using them is not a hassle.
Because that's how an IP is transmitted in a packet. The "www.xxx.yyy.zzz" string form is really just a human readable form of a 4 byte integer that allows us to see the hierarchical nature a little easier. Sending a whole string would take up a lot more space as well.
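The point that the dotted string is only a human-readable view of a 4-byte integer can be checked directly with the POSIX inet_pton() and ntohl() calls. In this small sketch, ipv4_to_wire and ipv4_host_order are illustrative names: the first parses the text into the 4 bytes that actually travel in the packet (network byte order, most significant byte first), and the second rebuilds the host-order integer by hand.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Parse "a.b.c.d" into its 4 on-the-wire bytes.
   Returns 1 on success, 0 if the text is not a valid IPv4 address. */
int ipv4_to_wire(const char *text, unsigned char out[4])
{
    struct in_addr addr;
    if (inet_pton(AF_INET, text, &addr) != 1) return 0;
    memcpy(out, &addr.s_addr, 4);   /* s_addr is already in network byte order */
    return 1;
}

/* Network byte order is big-endian: the first byte is the most significant. */
uint32_t ipv4_host_order(const unsigned char wire[4])
{
    return ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
           ((uint32_t)wire[2] << 8) | (uint32_t)wire[3];
}
```

For 184.54.12.169 the wire bytes are 184, 54, 12, 169, i.e. the integer 0xB8360CA9: four bytes, versus 14 bytes for the NUL-terminated string, and a malformed string like "184.54.12.999" is rejected once at parse time instead of being sent.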
Take the number 127536: as a string it needs 7 bytes (six digits plus the terminating NUL) rather than four, and in addition you need to parse it.
In other words, an integer is more efficient, and you do not have to deal with invalid values.
The Wikipedia page on Twofish points at this reference implementation in C (and code), which is fine, but it lacks a main, and my first few attempts at writing one didn't correctly process any of the known test vectors I tried. I suspect I'm not using the API correctly, but I have no idea where to start looking for the error. Rather than beat my head against that one, I'd rather start with a codebase that:
Runs out of the box
Has tests
Is self contained
Is written for clarity
I also have a strong preference for C or C-like C++ code.
Note: I'm more interested in code readability than anything else at this point. Small, simple code that can encrypt and decrypt a single block and a main function that hard codes a call or three would be ideal. Most anything beyond that (like any user interface) will just be noise for my use case.
Also, anything with a licence more restrictive than Boost's will be useful to me only as a source of known-good values and states to compare against.
I took an implementation by Niels Ferguson, one of the designers of Twofish, and wrapped it (very lightly, with very few changes) in C++, and it works well. I must stress that I have done almost no work here, and I don't claim to understand how Twofish works (even after reading up on it; it's too hard for me to follow).
The constructor does comprehensive testing, and aborts if the tests fail, so once you have a fully constructed object you know it's going to work.
I've put the sources here: https://www.cartotype.com/assets/downloads/twofish/.
There are various configurable things in the files; one you might want to change is the abort function, Twofish_fatal, which in my version attempts to write to address 0 to force an exit, but that doesn't work on some platforms.
Like the code mentioned above, all this does is encrypt single 16-byte blocks (ECB = Electronic Code Book mode). But it's very easy to implement a better mode on top of it, like cipher block chaining, in which each block of plaintext is XORed with the previous block of ciphertext before encrypting (use a random 16-byte 'initialisation vector' for the first block, and transmit it along with the encrypted data).
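The CBC construction just described fits in a few lines of C. This sketch assumes a single-block function with the hypothetical signature block_fn; a real program would write small wrappers around, for example, the Twofish encrypt and decrypt calls to fit it. toy_block is a self-inverse XOR placeholder so the sketch runs and is NOT a cipher. The caller supplies a random 16-byte IV and pads the data to a multiple of 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*block_fn)(const void *key, const uint8_t in[16], uint8_t out[16]);

/* Placeholder so the sketch runs (its own inverse). NOT a cipher. */
void toy_block(const void *key, const uint8_t in[16], uint8_t out[16])
{
    const uint8_t *k = key;
    for (int i = 0; i < 16; i++) out[i] = in[i] ^ k[i];
}

/* CBC encrypt in place: C_i = E(K, P_i ^ C_{i-1}), with C_{-1} = IV.
   Returns -1 if len is not a multiple of 16 (padding is the caller's job). */
int cbc_encrypt(block_fn enc, const void *key, const uint8_t iv[16],
                uint8_t *data, size_t len)
{
    uint8_t prev[16], tmp[16];
    if (len % 16 != 0) return -1;
    memcpy(prev, iv, 16);
    for (size_t off = 0; off < len; off += 16) {
        for (int i = 0; i < 16; i++) tmp[i] = data[off + i] ^ prev[i];
        enc(key, tmp, data + off);
        memcpy(prev, data + off, 16);        /* this ciphertext chains into the next block */
    }
    return 0;
}

/* CBC decrypt in place: P_i = D(K, C_i) ^ C_{i-1}. */
int cbc_decrypt(block_fn dec, const void *key, const uint8_t iv[16],
                uint8_t *data, size_t len)
{
    uint8_t prev[16], cur[16], tmp[16];
    if (len % 16 != 0) return -1;
    memcpy(prev, iv, 16);
    for (size_t off = 0; off < len; off += 16) {
        memcpy(cur, data + off, 16);         /* keep ciphertext before overwriting it */
        dec(key, cur, tmp);
        for (int i = 0; i < 16; i++) data[off + i] = tmp[i] ^ prev[i];
        memcpy(prev, cur, 16);
    }
    return 0;
}
```

Identical plaintext blocks now encrypt to different ciphertext blocks, and on decryption a corrupted ciphertext block garbles only that block and the one after it, not the rest of the message.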
Another implementation can be found in the source code to Bruce Schneier's open-source password database program, PasswordSafe: the relevant sources are here: http://passwordsafe.git.sourceforge.net/git/gitweb.cgi?p=passwordsafe/pwsafe.git;a=tree;f=pwsafe/pwsafe/src/core;hb=HEAD. I haven't tried it so I can't comment on how easy it is to integrate.
The cryptcat package on Ubuntu and Debian provides nc(1)-like functionality with Twofish built in.
The twofish support is provided in twofish2.cc and twofish2.h in the source package. farm9crypt.cc provides a layer between C-style read() and write() functionality and the twofish algorithm -- it's in a style that I'd call C-like C++.
If you had taken just a minute to read the reference implementation provided by libObfuscate, you would have found a cut-and-paste example of using Twofish:
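// Encrypt : outBuf [16] = Twofish ECB ( inBuf [16] )
// Needs libObfuscate's Twofish header, plus <string.h> for memset
TWOFISH_STATIC_DATA twofish;
BYTE passw [32];                  // 256-bit key: fill this in before Twofish_set_key
BYTE inBuf [16] , outBuf [16];    // one 16-byte plaintext block in, one ciphertext block out
memset( &twofish , 0 , sizeof( TWOFISH_STATIC_DATA ) );
Twofish_set_key( &twofish.key , ( DWORD * ) passw , 256 );    // key length in bits
Twofish_encrypt( &twofish.key , ( DWORD * ) inBuf , ( DWORD * ) outBuf );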
No serious reference implementation would be anything other than a single-block ECB implementation. If you wish to encrypt more data, you need to choose a mode of operation (CBC, etc.) and apply it on top of the single-block cipher.
I eventually found this Python implementation derived from the C implementation I listed above. The root cause of my issues turned out to be that the words of the key were in the wrong order.
I am currently working on a command-line interface for a particle simulator. Its parser reads input in the following format:
[command] [argument]* (-[flag] [flag argument])
Currently, the command is sent through a conditional block, compared to various known commands and its corresponding data packet is sent to the matching function. This, however, seems clunky, inefficient and inelegant.
I am thinking about using a hashmap instead, with a string representation of a command as the key and a function pointer as the value. The function referenced would then be sent a data packet containing arguments, flags, etc.
Is a hash map overkill in this situation? Does the extra infrastructure required to implement one outweigh the potential benefits? I am aiming for speed, elegance, function, and, since this is an open-source project, extensibility.
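As a baseline for comparison, here is a minimal sketch of the string-to-function-pointer idea using a sorted table and the standard bsearch() instead of a hash map (a swapped-in technique, not the design in the question). cmd_packet, cmd_entry, cmd_find and the three handlers are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical data packet handed to every command handler. */
typedef struct {
    int argc;
    char **argv;            /* already-tokenized arguments and flags */
} cmd_packet;

typedef int (*cmd_handler)(const cmd_packet *pkt);

typedef struct {
    const char *name;
    cmd_handler fn;
} cmd_entry;

static int cmd_quit(const cmd_packet *p) { (void)p; return 0; }
static int cmd_run(const cmd_packet *p)  { return p->argc; }
static int cmd_set(const cmd_packet *p)  { return p->argc > 0 ? 1 : -1; }

/* Must stay sorted by name for bsearch; adding a command means adding one row. */
static const cmd_entry commands[] = {
    { "quit", cmd_quit },
    { "run",  cmd_run  },
    { "set",  cmd_set  },
};

static int cmd_compare(const void *key, const void *elem)
{
    const cmd_entry *e = elem;
    return strcmp(key, e->name);
}

/* Returns the handler for name, or NULL if the command is unknown. */
cmd_handler cmd_find(const char *name)
{
    const cmd_entry *e = bsearch(name, commands,
                                 sizeof commands / sizeof commands[0],
                                 sizeof commands[0], cmd_compare);
    return e ? e->fn : NULL;
}
```

With a few dozen commands, a lookup costs only a handful of strcmp calls, and the table doubles as documentation of the available commands; a hash map mainly pays off when the command set is large or built at run time.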
Thanks for the help.
You might want to consider a ternary search tree. It has good performance and uses storage efficiently, and you don't need a hash function or a collision strategy.
The linked Bentley/Sedgewick article is a very thorough yet readable explanation of the accompanying C source.
I've been using a TST for name lookup in the past 3 versions of my PostScript interpreter. The only changes needed have been due to changes in memory management. Here's a version I modified (lightly) to use explicit pointers. I use yet another version in my PostScript interpreter (any of the xpost2*.zip versions, in the file core.c), which uses byte offsets instead of pointers (these have to be added to the user-memory byte pointer to yield a real pointer).
The speed gained will probably be minimal, but you could hash the command to convert it to a number and then use a switch statement. That is faster than a hash map.
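For readers without the linked article, here is a minimal sketch of a Bentley/Sedgewick-style ternary search tree with insert and search for non-empty C strings. tst_node, tst_insert, tst_search and tst_free are hypothetical names, not the article's code. For a command-line parser, value could point to a command descriptor holding the handler.

```c
#include <stdlib.h>

/* Each node holds one character: lo/hi branch to smaller/larger characters
   at the same position, eq advances to the next character of the key. */
typedef struct tst_node {
    char ch;
    struct tst_node *lo, *eq, *hi;
    void *value;                 /* non-NULL marks the end of a stored key */
} tst_node;

/* Insert a non-empty key with a non-NULL value; returns the (possibly new)
   subtree root. On allocation failure the key is simply not inserted. */
tst_node *tst_insert(tst_node *n, const char *key, void *value)
{
    if (*key == '\0') return n;              /* empty keys are not supported */
    if (n == NULL) {
        n = calloc(1, sizeof *n);
        if (n == NULL) return NULL;
        n->ch = *key;
    }
    if (*key < n->ch)        n->lo = tst_insert(n->lo, key, value);
    else if (*key > n->ch)   n->hi = tst_insert(n->hi, key, value);
    else if (key[1] != '\0') n->eq = tst_insert(n->eq, key + 1, value);
    else                     n->value = value;
    return n;
}

/* Returns the value stored for key, or NULL if key is absent. */
void *tst_search(const tst_node *n, const char *key)
{
    if (*key == '\0') return NULL;
    while (n != NULL) {
        if (*key < n->ch)        n = n->lo;
        else if (*key > n->ch)   n = n->hi;
        else if (key[1] == '\0') return n->value;
        else { n = n->eq; key++; }
    }
    return NULL;
}

void tst_free(tst_node *n)
{
    if (n == NULL) return;
    tst_free(n->lo);
    tst_free(n->eq);
    tst_free(n->hi);
    free(n);
}
```

A lookup compares one character per step and stops at the first mismatch, so unknown commands are rejected quickly, and prefixes of stored keys ("ru" when "run" is stored) correctly return NULL.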
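A sketch of the hash-then-switch idea, assuming a simple multiplicative hash (h = h * 31 + c) and three hypothetical commands. Because case labels must be integer constants, the hash values are precomputed (here by hand; a build step could generate them). The strcmp in each case matters: "rvO" hashes to the same value as "run".

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical command IDs returned by the lookup. */
enum command { CMD_UNKNOWN, CMD_RUN, CMD_SET, CMD_QUIT };

/* Simple multiplicative string hash, used only to pick a case label. */
uint32_t cmd_hash(const char *s)
{
    uint32_t h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h;
}

/* Precomputed: cmd_hash("run") == 113291, "set" == 113762, "quit" == 3482191.
   strcmp confirms the match, since different strings can share a hash. */
enum command cmd_lookup(const char *name)
{
    switch (cmd_hash(name)) {
    case 113291u:  return strcmp(name, "run")  == 0 ? CMD_RUN  : CMD_UNKNOWN;
    case 113762u:  return strcmp(name, "set")  == 0 ? CMD_SET  : CMD_UNKNOWN;
    case 3482191u: return strcmp(name, "quit") == 0 ? CMD_QUIT : CMD_UNKNOWN;
    default:       return CMD_UNKNOWN;
    }
}
```

The catch is maintenance: every new command needs its hash recomputed and pasted in, which is why generating the case labels at build time is attractive.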
I am trying to do something in C with MD5 (and later with the SHA-1 algorithm). My main problem is that I have never really done anything complex in C, just simple stuff (nothing like pointers to pointers or structs).
I got the md5 algorithm here.
I included the files md5.c and md5.h in my C project (using Code::Blocks), but the problem is that I don't really understand how to use them. I have read and re-read the code, and I don't understand how to use those functions to turn 'example' into an MD5 hash.
I haven't done C programming in a while (mostly PHP), so I am a bit lost here.
Basically what I am asking is for some examples of usage. They are provided via the md5main.c file but I don't understand them.
Am I aiming too high here? Should I stop all this and start reading the C book again, or can anyone give me some pointers so I can figure this out?
Thanks.
While I agree with Bill, you should go back to the C book if you want to really understand what you're doing. But, in an effort to help, I've modified and commented some of the code from md5main.c...
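#include <stdio.h>
#include <string.h>
#include "md5.h"                    // the header that comes with md5.c

int main(void)
{
    const char* testData = "12345"; // this is the data you want to hash
    md5_state_t state;              // this is a state object used by the MD5 lib to do "stuff"
                                    // just treat it as a black box
    md5_byte_t digest[16];          // this is where the MD5 hash will go
    // initialize the state structure
    md5_init(&state);
    // add data to the hasher (can be called repeatedly for data arriving in pieces)
    md5_append(&state, (const md5_byte_t *)testData, (int)strlen(testData));
    // now compute the hash
    md5_finish(&state, digest);
    // digest now contains the MD5 hash of testData; print it as 32 hex digits
    for (int i = 0; i < 16; i++)
        printf("%02x", digest[i]);
    printf("\n");
    return 0;
}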
Hope this helps!
You should stop all this and start reading the C book again.
My experience is that when I am trying to learn a new programming language, it's not practical to try implementing a complex project at the same time. You should do simple exercises in C until you are comfortable with the language, and then tackle something like implementing MD5 or integrating an existing implementation.
By the way, reading code is a skill different from writing code. There are differences between these two skills, but both require that you understand the language well.
I think you picked about the worst thing to look at (by no fault of your own). Encryption and hash type algorithms are going to make the strangest use of the language possible to do the type of math they need to do quickly. They are almost guaranteed to be obfuscated and difficult to understand. Plus, you will need to get bogged down in math in order to really understand them.
If you just want a hashing algorithm, get a well-known implementation and use it as a black box. Don't try and implement it yourself, you will almost certainly introduce some cryptographic weakness into the implementation.
Edit: To be fully responsive: if you want great books (or resources) on encryption, look to Bruce Schneier. Applied Cryptography is a classic.