Working reference implementation of TwoFish? - c

The Wikipedia page on TwoFish points at this reference implementation in C (and code), which is fine, but it lacks a main, and my first few attempts at writing one didn't correctly process any of the "known vector" test cases I tried. I suspect I'm not using the API correctly, but I have no idea where to start looking for the error. Rather than beat my head on that one, I'd rather start with a codebase that:
Runs out of the box
Has tests
Is self contained
Is written for clarity
I also have a strong preference for C or C like C++ code.
Note: I'm more interested in code readability than anything else at this point. Small, simple code that can encrypt and decrypt a single block and a main function that hard codes a call or three would be ideal. Most anything beyond that (like any user interface) will just be noise for my use case.
Also, anything that has a licence more restrictive than Boost will be useful to me only as a source of known-good values and states to compare with.

I took an implementation by Niels Ferguson, one of the designers of Twofish, and wrapped it (very lightly, making very few changes) in C++, and it works well. I must strongly underline that I have done almost no work here, and I don't claim to understand how Twofish works (and that's after reading up on it; it's too hard for me to follow).
The constructor does comprehensive testing, and aborts if the tests fail, so once you have a fully constructed object you know it's going to work.
I've put the sources here: https://www.cartotype.com/assets/downloads/twofish/.
There are various configurable things in the files; one you might want to change is the abort function, Twofish_fatal, which in my version attempts to write to address 0 to force an exit, but that doesn't work on some platforms.
Like the code mentioned above, all this does is encrypt single 16-byte blocks (ECB = Electronic Codebook mode). But it's very easy to implement a better mode on top of it, like cipher block chaining, in which each block of plain text is XORed with the previous block of cipher text before encrypting (use a random 'initialisation vector' of 16 bytes for the first block, and transmit that along with the encrypted data).
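A minimal sketch of cipher block chaining layered on top of a single-block cipher follows. It assumes a hypothetical `block_fn` callback that encrypts or decrypts exactly one 16-byte block. For Twofish you would wrap the library's own one-block calls in it. The names `block_fn`, `cbc_encrypt` and `cbc_decrypt` are illustrative, not part of any of the implementations mentioned here. No padding is done: the data must be a whole number of blocks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical one-block primitive: for Twofish, a thin wrapper around the
 * library's encrypt-one-block (or decrypt-one-block) function. */
typedef void (*block_fn)(const void *ctx, const uint8_t in[BLOCK], uint8_t out[BLOCK]);

/* CBC encryption of nblocks whole blocks. iv: 16 random bytes, sent with the data. */
void cbc_encrypt(block_fn enc, const void *ctx, const uint8_t iv[BLOCK],
                 const uint8_t *plain, uint8_t *cipher, size_t nblocks)
{
    uint8_t chain[BLOCK], tmp[BLOCK];
    memcpy(chain, iv, BLOCK);
    for (size_t b = 0; b < nblocks; b++) {
        for (int i = 0; i < BLOCK; i++)
            tmp[i] = plain[b * BLOCK + i] ^ chain[i];  /* XOR with previous ciphertext (or IV) */
        enc(ctx, tmp, &cipher[b * BLOCK]);
        memcpy(chain, &cipher[b * BLOCK], BLOCK);      /* this ciphertext chains into the next block */
    }
}

/* CBC decryption; also works in place (plain == cipher). */
void cbc_decrypt(block_fn dec, const void *ctx, const uint8_t iv[BLOCK],
                 const uint8_t *cipher, uint8_t *plain, size_t nblocks)
{
    uint8_t chain[BLOCK], next[BLOCK], tmp[BLOCK];
    memcpy(chain, iv, BLOCK);
    for (size_t b = 0; b < nblocks; b++) {
        memcpy(next, &cipher[b * BLOCK], BLOCK);       /* save before a possible in-place overwrite */
        dec(ctx, next, tmp);
        for (int i = 0; i < BLOCK; i++)
            plain[b * BLOCK + i] = tmp[i] ^ chain[i];
        memcpy(chain, next, BLOCK);
    }
}
```

Because the previous ciphertext block is mixed in, identical plaintext blocks no longer produce identical ciphertext blocks, which plain ECB would.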
Another implementation can be found in the source code to Bruce Schneier's open-source password database program, PasswordSafe: the relevant sources are here: http://passwordsafe.git.sourceforge.net/git/gitweb.cgi?p=passwordsafe/pwsafe.git;a=tree;f=pwsafe/pwsafe/src/core;hb=HEAD. I haven't tried it so I can't comment on how easy it is to integrate.

The cryptcat package on Ubuntu and Debian provides nc(1)-like functionality with Twofish built in.
The twofish support is provided in twofish2.cc and twofish2.h in the source package. farm9crypt.cc provides a layer between C-style read() and write() functionality and the twofish algorithm -- it's in a style that I'd call C-like C++.

If you had taken just a minute to read the reference implementation provided by libObfuscate, you would have found a cut'n'paste example of using TwoFish:
// Encrypt : outBuf [16] = Twofish ECB ( inBuf [16] )
TWOFISH_STATIC_DATA twofish;
BYTE passw [32];                 // the 256-bit key: fill this in before Twofish_set_key
BYTE inBuf [16] , outBuf [16];   // one 16-byte plaintext block in, one ciphertext block out
memset( &twofish , 0 , sizeof( TWOFISH_STATIC_DATA ) );   // memset is declared in <string.h>
Twofish_set_key( &twofish.key , ( DWORD * ) passw , 256 );
Twofish_encrypt( &twofish.key , ( DWORD * ) inBuf , ( DWORD * ) outBuf );
No serious REFERENCE IMPLEMENTATION would be anything but a single-block ECB implementation. If you wish to encrypt more data, you need to choose a chaining mode (CBC, etc.) and apply it on top of ECB.

I eventually found this Python implementation, derived from the C implementation I listed above. The root cause of my issues turned out to be that the words of the key were in the wrong order.
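Key word order is an easy thing to get wrong, so it helps to pin down a portable byte-to-word packing to compare against. The Twofish specification treats key and block bytes as little-endian 32-bit words: word i is built from bytes 4i to 4i+3, least significant byte first. The sketch below packs that way independently of host endianness. The function name is hypothetical. Whether a particular implementation's set-key call expects bytes or words already packed like this depends on that implementation's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack key bytes into 32-bit words, little-endian as in the Twofish spec:
 * words[i] = key[4i] | key[4i+1] << 8 | key[4i+2] << 16 | key[4i+3] << 24.
 * nbytes is assumed to be 16, 24 or 32 (a multiple of 4). */
void twofish_key_words(const uint8_t *key, size_t nbytes, uint32_t *words)
{
    for (size_t i = 0; i < nbytes / 4; i++) {
        words[i] = (uint32_t)key[4 * i]
                 | (uint32_t)key[4 * i + 1] << 8
                 | (uint32_t)key[4 * i + 2] << 16
                 | (uint32_t)key[4 * i + 3] << 24;
    }
}
```

Printing these words next to the words your implementation actually uses quickly shows whether words (or bytes within words) are swapped.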

Related

C logging framework compile time optimization

For some time now, I have been looking to build a logging framework in C (not C++!) for small microcontrollers or other devices with a small footprint. For this, I had the idea of hashing the logged strings to a value and saving only the hash with the timestamp instead of the complete ASCII string. The hash can then be correlated with a 'database' file generated by an external process that parses the strings out of the C source files and saves each logged string along with its hash value.
After doing a little research, I found that this idea is not new, but I have not found an implementation of it in C. In other languages the idea has been worked out, but that is not the goal of my exercise. An example is this talk, where the same concept is worked out in C++: youtube.com/watch?v=Dt0vx-7e_B0
Some of the requirements that I've set myself for this library are the following:
as portable C code as possible
COMPILE TIME optimization/hashing for the string hash conversion, it should be equivalent to just printf("%d\n", hashed_value) for a single log statement. (Assuming no parameters/arguments for this particular logging statement).
arguments can be passed to the logging statement similar to the printf function.
user can define their own output function (being console, file descriptor, sending the data directly over an UART connection,...)
fast to run!! fast to compile is nice to have, but it should not be terribly slow.
very easy to use, no very complicated API to use the library.
But to achieve this in C, what is a good approach? I've tried several things now, but do not seem to have found a good method of achieving this.
An overview of things I've tried so far, along with the drawbacks are:
Full pre-processor string hashing: I did get it working, but the compile time is terribly slow. Also, this code does not feel very portable across C compilers.
Semi pre-processor string hashing: the idea was to generate a hash for each string and write an external header file with a define for each string and its hash value. The problem here is that I cannot figure out a way to map the string to the correct preprocessor define.
Letting go of the default logging macro with a string pointer: instead of the commonly used form LOG_DEBUG("Some logging statement"), an external parser converts it to /*LOG_DEBUG("Some logging statement") */ LOG_RAW(45). This solves the hashing problem, since the external parser substitutes the correct hash, but it is not the cleanest to read, since the original statement becomes a comment.
Also expanding this idea to take care of arguments proved to be tricky. How to take care of multiple types of variables as efficiently as possible?
I've tried some other methods but all without success. Especially when I want to add arguments to log the value of a variable, for example, it gets very complicated, and I do not get the required result...
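For the third approach, where the parser has already replaced the string with `LOG_RAW(45)`, one portable C99 way to carry arguments without any format string is sketched below. The macro turns its arguments into a compound-literal array. `sizeof` counts them, and it does not evaluate its operand, so each argument is evaluated only once. A user-supplied sink function receives the ID and the argument words. The names (`LOG_RAW`, `log_set_sink`, `log_sink_fn`) are hypothetical. This sketch only handles integer-like arguments, since everything is converted to `uint32_t`. Floats, 64-bit values, timestamps and record framing would need extra work.

```c
#include <stddef.h>
#include <stdint.h>

/* User-defined output: console, file descriptor, UART, ... */
typedef void (*log_sink_fn)(uint32_t id, const uint32_t *args, size_t nargs);

static log_sink_fn log_sink;

void log_set_sink(log_sink_fn fn) { log_sink = fn; }

/* words[0] is the string ID/hash, words[1..count-1] are the arguments. */
static void log_emit(const uint32_t *words, size_t count)
{
    if (log_sink)
        log_sink(words[0], words + 1, count - 1);
}

/* LOG_RAW(id) or LOG_RAW(id, a, b, ...): packs everything into an array and
 * counts the elements at compile time; no string reaches the device. */
#define LOG_RAW(...) \
    log_emit((const uint32_t[]){ __VA_ARGS__ }, \
             sizeof((const uint32_t[]){ __VA_ARGS__ }) / sizeof(uint32_t))
```

With no arguments, `LOG_RAW(45)` compiles down to passing a one-element array and a constant count, close to the "just print the hash" cost the question asks for.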

What are the benefits to using BIO_printf() instead of printf()?

I have been reviewing example code for using OpenSSL and in every example I locate, the creator has chosen to use BIO_printf() to write things to stdout instead of printf().
I have taken their code, removed the openssl/bio.h header declaration, and changed all calls to BIO_printf() to regular printf() statements. The programs ran with identical results.
What I'm struggling to grasp is why these coders use BIO_printf() when it takes a lot more to set up than just using printf(). You have to include another header (which will increase program size), and you need to set the file pointer to the stream you want to write to. Only then can you print your message to stdout. It seems a lot more complicated than using printf().
When I do a search on BIO_printf() it lists possible man pages for BIO_printf (3), but none of the pages actually contain any information!
I decided to do a benchmark test on both methods. I looped printf("Hey\n"); 1,000,000 times. Then I did it for BIO_printf(fp, "Hey\n");. I only timed the BIO_printf() statement and not the setting up of the file pointer (which would have increased the time). The difference came out to printf() being ~4.7x faster than using BIO_printf().
Why are they using it? What is the benefit? It's my understanding that in programming you either want code to be simple or efficient, and in the case of BIO_printf() it's neither.
In general, a BIO might not be writing to stdout.
You can have a BIO that writes to a file, or null, or a socket, or a network drive, or another BIO, etc.
By using the BIO_printf family, the code can easily be changed to have its output sent to a different location or another BIO which might do some further filtering and then pass the output onto wherever else.
As pointed out by others, BIOs can be stacked, unlike FILE. snprintf() and vsnprintf() were added in C99, and OpenSSL/SSLeay is older than that. Hence, the SSLeay developers had to write their own implementation. Unfortunately, a little-used implementation leads to the performance issues described by the OP, or to CVE-2016-0799.

simple AES function (not library) in C?

Novice to AES here. Reading http://en.wikipedia.org/wiki/AES_implementations, I am a bit surprised. I should need just one function:
char16 *aes128(char16 key, char16 *secrets, int len);
where char16 is an 8*16 = 128-bit character type, and, presumably, ignoring memory leaks,
assert( bcmp( anystring, aes128(anykey, aes128(anykey, anystring, len), len), len ) == 0 );
I am looking over the description of the algorithm on Wikipedia, and although I can see myself making enough coding mistakes to take me a few days to debug my own implementation, it does not seem too complex. Maybe 100 lines? I did see versions in C#, such as "Using AES encryption in C#", that seem almost as long as the algorithm itself. Earlier recommendations on Stack Overflow mostly point to individual functions inside larger libraries, but it would be nice to have a go-to function for this task that one could compile into one's own code.
So, is an AES implementation too complex for the faint of heart, or is it reasonably short and simple?
How many lines does a C implementation take? Is there a self-contained aes128() C function already freely available somewhere for the taking?
Another question: is each block encoded independently? Presumably, it would strengthen the encryption if the first block created a salt that the second block would then use. On the other hand, this would mean that disk corruption of one block would make every subsequent block undecryptable.
/iaw
You're not seeing a single function like you expect because there are so many options. For example, the block encoding mechanism you described (CBC) is just one option or mode in AES encryption. See here for more information: http://www.heliontech.com/aes_modes_basic.htm
The general rule of thumb in any language is: Don't reinvent something that's already been done and done well. This is especially true in anything related to cryptography.
Well, using just the AES function is basically insecure, because under key K a given block X is always encoded to the same block Y, which is too much information to give an attacker (according to cryptographers).
So you use some method to vary what goes into the block cipher at each block: you can use a nonce and counter, Cipher Block Chaining, or some other mode. There is a pretty good example of the problem on Wikipedia (the penguin picture): http://en.wikipedia.org/wiki/Electronic_code_book#Electronic_codebook_.28ECB.29
So, in short, you can implement AES in one function that is secure (as a block cipher), but applied block by block on its own it isn't secure for data longer than 16 bytes.
Also, AES is fairly complex because of all the round keys. I wouldn't really want to implement it, especially with so many good implementations around, but I guess it wouldn't be so bad if you had a good reason to do it.
So, in short, to encrypt a longer stream securely with a block cipher you need a mode that varies the block cipher's input along the stream (for example a counter, as in CTR mode, or the previous ciphertext block, as in CBC).
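As one concrete instance of "varying the input along the stream", here is a sketch of counter (CTR) mode over a hypothetical one-block encrypt callback. An AES-128 block function from any implementation could be wrapped in it. Each 16-byte keystream block is the encryption of an 8-byte nonce followed by an 8-byte block counter, and the data is XORed with it. Three properties follow. Only the block *encrypt* direction is needed, any length works, and running the same function twice with the same key and nonce gives back the input. That last property is the round trip the question's assert expects, which raw AES block encryption does not have. The names and the nonce/counter layout are illustrative assumptions. A nonce must never be reused with the same key.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Hypothetical one-block primitive, e.g. a wrapper around an AES-128 encrypt call. */
typedef void (*block_fn)(const void *ctx, const uint8_t in[BLOCK], uint8_t out[BLOCK]);

/* CTR mode: out = in XOR E(nonce || counter). Same call encrypts and decrypts. */
void ctr_crypt(block_fn enc, const void *ctx, const uint8_t nonce[8],
               const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t counter[BLOCK], stream[BLOCK];
    uint64_t n = 0;
    for (size_t off = 0; off < len; off += BLOCK, n++) {
        memcpy(counter, nonce, 8);
        for (int i = 0; i < 8; i++)                    /* big-endian block counter */
            counter[8 + i] = (uint8_t)(n >> (56 - 8 * i));
        enc(ctx, counter, stream);                     /* keystream block */
        size_t chunk = len - off < BLOCK ? len - off : BLOCK;
        for (size_t i = 0; i < chunk; i++)
            out[off + i] = in[off + i] ^ stream[i];
    }
}
```

Because each block uses a different counter value, two identical plaintext blocks encrypt to different ciphertext blocks. That removes the ECB "penguin" leak.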
OK, so I found a reasonable standalone implementation:
http://www.literatecode.com/aes256
About 400 lines. I will probably use this one.
Hope it helps others, too.

Microcontroller Serial Command Interpreter in C/C++; Ways to do it;

I'd like to interpret a command string received by a microcontroller (PIC16F877A, if that makes any difference) via serial.
The strings have a pretty simple and straightforward format:
$AABBCCDDEE (5 "blocks" of 2 characters, plus '$', for 11 characters in total) where:
$AA= the actual name of the command (could be letters, numbers, both; mandatory);
BB-EE= parameters (numbers; optional);
I'd like to write the code in C/C++.
I figure I could just grab the string via serial, hack it up into blocks, switch () {case} and memcmp the command block ($AA). Then I could have a binary decision tree to make use of the BB CC DD and EE blocks.
I'd like to know if that's the right way to do it (It kinda seems ugly to me, surely there must be a less tedious way to do this!).
Don't overdesign it! That does not mean you should go blindly coding, but once you have designed something that looks like it can do the job, you can start to implement it. Implementation will give you feedback about your architecture.
For example, when writing your switch case, you might find yourself writing code very similar to what you just wrote for the preceding case. Actually writing down an algorithm will help you see problems you did not think of, or simplifications you did not see.
Don't aim for the best code on the first try. Aim for
easy to read
easy to debug
Take little steps. You do not have to implement the whole thing in one go.
Grab the string from the serial port. Looks easy, right? Well, let's do that first, just printing out the commands.
Separate the command from the parameters.
Extract the parameters. Will the extraction be the same for each command? Can you design a data structure valid for every command?
Once you have done it right, you can start to think of a better solution.
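For the "separate the command from the parameters" and "one data structure for every command" steps, a first version might look like the sketch below. It assumes something the question does not state: each parameter block is two decimal digits. It also treats the parameters as optional, so frames of 3, 5, 7, 9 or 11 characters are accepted. The struct and function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One structure for every command: 2-character name plus 0..4 parameters. */
struct parsed_cmd {
    char name[3];      /* "AA" plus terminator */
    int  params[4];    /* BB, CC, DD, EE (assumed two decimal digits each) */
    int  nparams;
};

/* Parse "$AA[BB[CC[DD[EE]]]]". Returns 0 on success, -1 on a malformed string. */
int parse_command(const char *s, struct parsed_cmd *out)
{
    size_t len = strlen(s);
    if (len < 3 || len > 11 || s[0] != '$' || (len - 3) % 2 != 0)
        return -1;
    out->name[0] = s[1];
    out->name[1] = s[2];
    out->name[2] = '\0';
    out->nparams = (int)(len - 3) / 2;
    for (int i = 0; i < out->nparams; i++) {
        char hi = s[3 + 2 * i], lo = s[4 + 2 * i];
        if (!isdigit((unsigned char)hi) || !isdigit((unsigned char)lo))
            return -1;
        out->params[i] = (hi - '0') * 10 + (lo - '0');
    }
    return 0;
}
```

Once this works and prints sensible results for every command, the dispatch step only ever deals with `struct parsed_cmd` and never sees raw characters.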
ASCII interfaces are ugly by definition. Ideally you have some sort of frame structure, which maybe you have: the $ indicates the division between frames, and you say they are 11 characters long. If they are always 11, that is good; if only sometimes, that is harder. Hopefully there is a $ at the start and 0x0A and/or 0x0D 0x0A (CR/LF) at the end.

Normally I have one module of code that simply extracts bytes from the serial port and puts them into a (circular) buffer. The buffering dates to the days when serial ports had little or no buffer on board, but even today, especially with microcontrollers, that is still the case. Then another module of code monitors the buffer, searching for frames. Ideally this buffer is big enough to leave the frame there and still have room for the next frame, so no other buffer is needed to keep copies of received frames. Using the circular buffer, this second module can move the head pointer to the beginning-of-frame marker (discarding bytes as necessary as it goes) and wait for a full frame's worth of data. Once a full frame appears to be there, it calls another function that processes that frame. That function may be the one you are asking about.

And "just code it" may be the answer. You are on a microcontroller, so you can't use lazy high-level solutions meant for desktop applications on an operating system. You will need some sort of strcmp function, either written yourself or available through a library, depending on your solution. The brute-force if(strncmp(&frame[1],"bob",3)==0) then, else if(strncmp(&frame[1],"ted",3)==0) then, else if... certainly works, but you may chew up your ROM with that kind of thing, or not. And the buffering required for this kind of approach can chew up a lot of RAM. This approach is very readable, maintainable and portable, though.
It may not be fast (maintainability normally conflicts with reliability and/or performance), but that may not be a concern, so long as you can process this frame before the next one comes along and/or before unprocessed data falls out of the circular buffer. Depending on the task, the frame checker routine may simply check that the frame is good. I normally put in start and end markers, a length and some sort of arithmetic checksum, and a bad frame is discarded; this saves a lot of code checking for bad or corrupt data. When the frame processing routine returns to the search-for-frame routine, that routine moves the head pointer to purge the frame, since it is no longer needed, good frame or bad. The frame checker may only validate a frame and hand it off to yet another function that does the parsing. Each Lego block in this arrangement has a very simple task and operates on the assumption that the Lego block below it has performed its task properly. Modular, object-oriented, whatever term you want to use, this makes the design, coding, maintenance and debugging much easier (at the cost of performance and resources). This approach works well for any serial-type stream, be it a serial port on a microcontroller (with enough resources) or an application on a desktop looking at serial data from a serial port, or at TCP data, which is also serial and NOT frame oriented.
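The two lower Lego blocks described above (an interrupt-fed circular buffer, and a frame finder that discards junk up to a `$` and waits for a whole frame) can be sketched as follows. It assumes the fixed 11-character frames from the question and no checksum. It also assumes `rx_put` is called from the UART receive interrupt (a hypothetical hook). On a small 8-bit part, you would need to ensure the index reads and writes are atomic, or briefly disable interrupts around them. For simplicity this version copies the frame out rather than parsing it in place. All names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_SIZE   64            /* power of two, so indices wrap with a mask */
#define FRAME_LEN 11            /* "$AABBCCDDEE" */

static volatile uint8_t rx_buf[RX_SIZE];
static volatile size_t rx_head, rx_tail;   /* head: next write, tail: next read */

/* Receive-interrupt side: store one byte, dropping it if the buffer is full. */
void rx_put(uint8_t byte)
{
    size_t next = (rx_head + 1) & (RX_SIZE - 1);
    if (next != rx_tail) {
        rx_buf[rx_head] = byte;
        rx_head = next;
    }
}

static size_t rx_count(void) { return (rx_head - rx_tail) & (RX_SIZE - 1); }

/* Main-loop side: skip to the next '$', then wait for a whole frame.
 * Returns 1 and fills frame (NUL-terminated) when one is ready, else 0. */
int rx_get_frame(char frame[FRAME_LEN + 1])
{
    while (rx_count() > 0 && rx_buf[rx_tail] != '$')
        rx_tail = (rx_tail + 1) & (RX_SIZE - 1);      /* resync: discard junk */
    if (rx_count() < FRAME_LEN)
        return 0;                                      /* frame not complete yet */
    for (size_t i = 0; i < FRAME_LEN; i++)
        frame[i] = (char)rx_buf[(rx_tail + i) & (RX_SIZE - 1)];
    frame[FRAME_LEN] = '\0';
    rx_tail = (rx_tail + FRAME_LEN) & (RX_SIZE - 1);  /* purge the frame */
    return 1;
}
```

A frame checker or parser then only ever receives complete, `$`-aligned frames.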
If your micro doesn't have the resources for all that, then the state machine approach also works quite well. Each byte that arrives ticks the state machine one state. Start in idle, waiting for the first byte. Is the first byte a $? If not, discard it and go back to idle; if it is, go to the next state. If you were looking for, say, the commands "and", "add", "or" and "xor", then the second state would compare with "a", "o" and "x"; if none of these, go to idle. If an a, go to a state that compares for n and d; if an o, go to a state that looks for the r. If the look-for-the-r state of "or" does not see the r, go to idle; if it does, process the command and then go to idle. The code is readable in the sense that you can look at the state machine and see the words a,n,d, a,d,d, o,r, x,o,r and where they ultimately lead, but it is generally not considered readable code. This approach uses very little RAM and leans on the ROM a bit more, but overall it could also use the least ROM of the parsing approaches. Here again it is very portable, beyond microcontrollers, but outside a microcontroller folks might think you are insane to write this kind of code (well, not if this were Verilog or VHDL, of course). This approach is harder to maintain and harder to read, but it is very fast and reliable and uses the fewest resources.
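A sketch of that byte-at-a-time state machine for the example commands "and", "add", "or" and "xor" follows. Each state is named after the prefix seen so far. Any unexpected byte returns to idle. One addition of mine is not in the description above: a stray `$` mid-command restarts parsing instead of being discarded. The enum and function names are hypothetical.

```c
/* Commands recognised by the parser. */
enum cmd { CMD_NONE, CMD_AND, CMD_ADD, CMD_OR, CMD_XOR };

/* One state per prefix seen so far: "$", "$a", "$an", "$ad", "$o", "$x", "$xo". */
enum state { S_IDLE, S_DOLLAR, S_A, S_AN, S_AD, S_O, S_X, S_XO };

static enum state st = S_IDLE;

/* Feed one received byte. Returns the command when its last letter arrives,
 * CMD_NONE otherwise. Unexpected bytes send the machine back to idle. */
enum cmd parser_feed(char c)
{
    enum cmd done = CMD_NONE;
    enum state next = S_IDLE;

    switch (st) {
    case S_IDLE:   if (c == '$') next = S_DOLLAR; break;
    case S_DOLLAR: next = c == 'a' ? S_A : c == 'o' ? S_O : c == 'x' ? S_X : S_IDLE; break;
    case S_A:      next = c == 'n' ? S_AN : c == 'd' ? S_AD : S_IDLE; break;
    case S_AN:     if (c == 'd') done = CMD_AND; break;
    case S_AD:     if (c == 'd') done = CMD_ADD; break;
    case S_O:      if (c == 'r') done = CMD_OR; break;
    case S_X:      if (c == 'o') next = S_XO; break;
    case S_XO:     if (c == 'r') done = CMD_XOR; break;
    }
    /* A stray '$' always starts a fresh command. */
    if (next == S_IDLE && done == CMD_NONE && c == '$')
        next = S_DOLLAR;
    st = next;
    return done;
}
```

The RAM cost is a single state variable, which is why this suits very small parts. The price is that adding a command means editing the transition table by hand.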
No matter what approach you take, once the command is interpreted you have to ensure you can perform the command without losing any bytes on the serial port, either through deterministic performance of the code, interrupts, or whatever.
Bottom line: ASCII interfaces are always ugly, and so is the code for them. No matter how many layers of libraries you use to make the job easier, the instructions that actually get executed are ugly. And one size fits no one, by definition. Just start coding: try a state machine, try the if-then-else-strncmp chain, and the optimizations in between. You should quickly see which one works best with your coding style, your tools and processor, and the problem being solved.
It depends on how fancy you want to get, how many different commands there are, and whether new commands are likely to be frequently added.
You could create a data structure that associates each valid command string with a corresponding function pointer - a sorted list accessed with bsearch() is probably fine, although a hash table is an alternative which may have better performance (since the set of valid commands is known beforehand, you could construct a perfect hash with a tool like gperf).
The bsearch() approach might look something like this:
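#include <stdlib.h>   /* bsearch */
#include <string.h>   /* memcmp */

void func_aa(const char *args);
void func_cc(const char *args);
void func_xy(const char *args);

struct command {
    const char *name;                   /* two-character command name */
    void (*cmd_func)(const char *args); /* handler, receives the whole command string */
} command_tbl[] = {
    /* must stay sorted by name: bsearch() requires it */
    { "AA", func_aa },
    { "CC", func_cc },
    { "XY", func_xy }
};

#define N_CMDS (sizeof command_tbl / sizeof command_tbl[0])

static int comp_cmd(const void *c1, const void *c2)
{
    const struct command *cmd1 = c1, *cmd2 = c2;
    /* compare only the two name characters; the key may point into a longer string */
    return memcmp(cmd1->name, cmd2->name, 2);
}

static struct command *get_cmd(const char *name)
{
    struct command target = { name, NULL };
    return bsearch(&target, command_tbl, N_CMDS, sizeof command_tbl[0], comp_cmd);
}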
Then if you have command_str pointing to a string from the serial port, you'd do this to dispatch the right function:
struct command *cmd = get_cmd(command_str + 1);   /* skip the leading '$' */
if (cmd)
    cmd->cmd_func(command_str);
I don't know if you're still working on this, but I'm working on a similar project and found an embedded command line interpreter: http://sourceforge.net/projects/ecli/?source=recommended. That's right, they had embedded applications in mind.
The cli_engine function really helps in taking the inputs from your command line.
Warning: there is no documentation besides a readme file. I'm still working through some bugs integrating the framework but this definitely gave me a head start. You'll have to deal with comparing the strings (i.e. using strcmp) yourself.

Converting Win16 C code to Win32

In general, what needs to be done to convert a 16 bit Windows program to Win32? I'm sure I'm not the only person to inherit a codebase and be stunned to find 16-bit code lurking in the corners.
The code in question is C.
1 - The meanings of wParam and lParam have changed in many places. I strongly encourage you to be paranoid and convert as much as possible to use message crackers. They will save you no end of headaches. If there is only one piece of advice I could give you, this would be it.
2 - As long as you're using message crackers, also enable STRICT. It'll help you catch the Win16 code base using int where it should be using HWND, HANDLE, or something else. Converting these will greatly help with #9 on this list.
3 - hPrevInstance is useless. Make sure it's not used.
4 - Make sure you're using Unicode-friendly calls. That doesn't mean you need to convert everything to TCHARs, but it does mean you had better replace OpenFile, _lopen, and _lcreat with CreateFile, to name the obvious ones.
5 - LibMain is now DllMain, and the entire library format and export conventions are different.
6 - Win16 had no VMM. GlobalAlloc, LocalAlloc, GlobalFree, and LocalFree should be replaced with more modern equivalents. When done, clean up calls to LocalLock, LocalUnlock and friends; they're now useless. Not that I can imagine your app doing this, but make sure you don't depend on WM_COMPACTING while you're there.
7 - Win16 also had no memory protection. Make sure you're not using SendMessage or PostMessage to send pointers to out-of-process windows. You'll need to switch to a more modern IPC mechanism, such as pipes or memory-mapped files.
8 - Win16 also lacked preemptive multitasking. If you wanted a quick answer from another window, it was totally cool to call SendMessage and wait for the message to be processed. That may be a bad idea now. Consider whether PostMessage isn't a better option.
9 - Pointer and integer sizes change. Remember to check carefully anywhere you're reading or writing data to disk, especially if it consists of Win16 structures. You'll need to manually redo them to handle the shorter values. Again, the least painful way to deal with this will be to use message crackers where possible. Otherwise, you'll need to manually hunt down and convert int to DWORD and so on where applicable.
10 - Finally, when you've nailed the obvious, consider enabling 64-bit compilation checks. A lot of the issues faced with going from 16 to 32 bits are the same as going from 32 to 64, and Visual C++ is actually pretty smart these days. Not only will you catch some lingering issues; you'll get yourself ready for your eventual Win64 migration, too.
EDIT: As ChrisN points out, the official guide for porting Win16 apps to Win32 is available in archived form, and it both fleshes out and adds to my points above.
Apart from getting your build environment right, here are a few specifics you will need to address:
Structs containing ints will need to change to short, or widen from 16 to 32 bits. If you change the size of the structure and it is loaded from or saved to disk, you will need to write data file upgrade code.
Per-window data is often stored with the window handle using GWL_USERDATA. If you widen some of the data to 32 bits, your offsets will change.
POINT & SIZE structures are 64 bits in Win32. In Win16 they were 32 bits and could be returned as a DWORD (the caller would split the return value into two 16-bit values). This no longer works in Win32 (i.e. Win32 does not return 64-bit results), and the functions were changed to accept pointers to store the return values. You will need to edit all of these. APIs like GetTextExtent are affected by this. The same issue also applies to some Windows messages.
The use of INI files is discouraged in Win32 in favour of the registry. While the INI file functions still work you will need to be careful with Vista issues. 16 bit programs often stored their INI file in the Windows system directory.
This is just a few of the issues I can recall. It has been over a decade since I did any Win32 porting. Once you get into it it is quite quick. Each codebase will have its own "feel" when it comes to porting which you will get used to. You will probably even find a few bugs along the way.
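The struct-size point, and the "data file upgrade code" it leads to, can be illustrated in portable C. Suppose a Win16 program wrote a record of two 16-bit `int`s (4 bytes, little-endian) to disk. The same struct definition compiled for Win32 has 32-bit `int`s, so `fread()`ing into it would consume 8 bytes and misread the file. The sketch below decodes the old on-disk layout explicitly into a widened in-memory struct. The record layout and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory record after porting: fields widened to 32 bits. */
struct record {
    int32_t count;
    int32_t flags;
};

/* Old Win16 on-disk layout: two 16-bit little-endian signed ints. */
#define RECORD_DISK_SIZE 4

/* Sign-extend a 16-bit little-endian value without implementation-defined casts. */
static int32_t le16_signed(const uint8_t *p)
{
    int32_t v = p[0] | (p[1] << 8);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Decode one old-format record read from disk. */
void record_from_win16(const uint8_t disk[RECORD_DISK_SIZE], struct record *r)
{
    r->count = le16_signed(&disk[0]);
    r->flags = le16_signed(&disk[2]);
}
```

Reading old files through a decoder like this, rather than through the struct, keeps working regardless of how the compiler sizes or pads the structure.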
There was a definitive guide in the article Porting 16-Bit Code to 32-Bit Windows on MSDN.
The original win32 sdk had a tool that scanned source code and flagged lines that needed to be changed, but I can't remember the name of the tool.
When I've had to do this in the past, I've used a brute force technique - i.e.:
1 - update makefiles or build environment to use 32 bit compiler and linker. Optionally, just create a new project in your IDE (I use Visual Studio), and add the files manually.
2 - build
3 - fix errors
4 - repeat 2&3 until done
The pain of the process depends on the application you are migrating. I've converted 10,000 line programs in an hour, and 75,000 line programs in less than a week. I've also had some small utilities that I just gave up on and rewrote (mostly) from scratch.
I agree with Alan that trial and error is probably the best way.
Here are some good tips.
Agreed that the compiler will probably catch most of the errors. Also, if you are using "near" and "far" pointers you can remove those designations -- a pointer is just a pointer in Win32.
