Static / Dynamic source code analysis - c

I took a class named "Secure Code", and in our next assignment we are supposed to do static / dynamic analysis of some C files and of a JavaEE Web Project.
I checked out "Source Monitor" and ran it on the C files, but (unless I didn't get how to use it!) it doesn't seem to do what I'm looking for.
Considering the topic, I'd be interested in knowing if there are tools for detecting "insecure" code, i.e. code that is potentially attackable through buffer overflows, SQL-Injections, XSS ... So I'd like it to point out which functions should be "upgraded" (e.g. fgets instead of gets, or a PreparedStatement instead of a normal SQL statement)
Note: I'd prefer open source softwares, possibly for Windows (I have Ubuntu on a VM but I am not really good with it... I generally spend more time finding out how to configure the tools than running them).
Thank you for your tips!

Frama-C's value analysis is open-source, available pre-compiled for Windows, and was used to find such security bugs as this one in the QuickLZ C library or this one in Polar SSL.
This said, you may find that it is a lot to get used to for just a school assignment, and then again, are you actually expected to find security bugs in a school assignment?

For the JavaEE Web Project use Persistence API, and you can use non-SQL statements, where hacking is theoretically impossible! The best open source one is the Hibernate. It's easy to use and very flexible.

Related

Code refactoring tools for C, usable on GNU/Linux? FOSS preferable

Variations of this question have been asked, but not specific to GNU/Linux and C. I use Komodo Edit as my usual Editor, but I'd actually prefer something that can be used from CLI.
I don't need C++ support; it's fine if the tool can only handle plain C.
I really appreciate any direction, as I was unable to find anything.
I hope I'm not forced to 'roll' something myself.
NOTE: Please refrain from mention vim; I know it exists and what its capabilities are. I purposefully choose to avoid vim, which is why I use Komodo (or nano on the servers).
I don't think that a pure console refactoring tool would be nice to use.
I use Eclipse CDT on linux to write and refactor C-Code.
There exists also Xrefactory for Emacs http://www.xref.sk/xrefactory/main.html
if a non console refactoring tool is o.k for you as well.
C-xrefactory was an open source version of xrefactory, covering C and Java, made available on SourceForge by Marián Vittek under GPLv2.
For those interested, there's an actively maintained c-xrefactory fork on GitHub:
https://github.com/thoni56/c-xrefactory
The goal of the GitHub fork is to refactor c-xrefactory itself, add a test suite, and try to document the original source code (which is rather obscure). Maybe, in the future, also convert it into an LSP C language server and refactoring tool.
C-xrefactory works on Emacs; setup scripts and instructions can be found at the repository. Windows users can run it via WSL/WSL2.
You could consider coding a GCC plugin or a MELT extension (MELT is a domain specific language to extend GCC) for your needs.
However, such approach would take you some time, because you'll need to understand some of GCC internals.
For Windows only, and not FOSS but you said "any direction..."
Our DMS Software Reengineering Toolkit" with its C Front End can apply transformations to C source code. DMS can be configured to carry out custom, complex reliable transformations, although the configuration isn't as easy as typing just a command like "refactor frazzle by doobaz".
One of the principal stumbling blocks is still the preprocessor. DMS can transform code that has preprocessor directives in typical places (around statements, expressions, if/for/while loop heads, declarations, etc.) but other "unstructured conditionals" give it trouble. You can run DMS by expanding the preprocessor directives out of existence, or more imporantly, expanding out the ones that give it trouble, but mostly people don't like this because they prefer to keep thier preprocessor directives. So it isn't perfect.
[Another answer suggested Concinelle, which looks pretty good from my point of view. As far as I know, it doesn't handle preprocessor directives at all; I could be wrong and it might handle some cases as DMS does, but I'm sure it can't handle all the cases].
You don't want to consider rolling your own. Building a transformation/refactoring tool is much harder than you might guess having never tried it. You need full, accurate parsers for the (C) dialect of interest and just that is pretty hard to get right. You need a preprocessor, symbol tables, flow analysis, transformation, code regeneration machinery, ... this stuff takes years of effort to build and get right. Trust me, been there, done that.

Porting Autodesk Animator Pro to be cross platform

a previous relevant question from me is here Reverse Engineering old paint programs
I have set up my base of operations here: http://animatorpro.org
wiki coming soon.
Okay, so now I have a 300,000 line legacy MSDOS codebase. It's sort of a "be careful what you wish for" situation. I am not an experienced C programmer. I'm not entirely inexperienced either, but for all intents and purposes I'm a noob to the language and in particular the intricacies of its libraries. I am especially ignorant of the vagaries of the differences between C programs written specifically for MSDOS and programs that are cross platform. However I have been studying this code base for over a year now, and this is what I know about Animator Pro:
Compilers and tools used:
Watcom C compiler
tcmake (make program from Turbo C)
386asm, a specialised assembler for the Phar Lap dos extender
and of course, the Phar Lap dos extender itself.
a selection of obscure dos utilities
Much of the compilation seems to be driven by batch files. Though I have obtained copies of all these tools, I have not yet succeeded at compiling it. (though I have compiled its older brother, autodesk animator original.
It's got a plugin system that replicates DLL before DLL's were available, based on REX. The plugin system handles:
Video Drivers (with a plethora of included VESA drivers)
Input drivers (including wacom tablets, and keyboards)
Drawing Tools
Inks (Like photoshop's filters, or blending modes)
Scripting Addons (essentially compiled scripts)
File formats
It's got its own script interpreter named POCO, based on the C language- The scripting language has enough power to do virtually all the things the plugin system can do- Just slower.
Given this information, this is my development plan. Please criticise this. The source code is available in the link above, so you can easily, if you are so inclined, assess the situation yourself.
Compile with its original tools.
Switch to using DJGPP, and make the necessary changes to get it to compile with that, plus the original assembler.
Include the Allegro.cc "Game" library, and switch over as much functionality to that library as possible- Perhaps by simply writing new video and input drivers that use the Allegro API. I'm thinking allegro rather than SDL because: there is a DOS version of Allegro, and fascinatingly, one of its core functions is the ability to play Animator Pro's native format FLIC.
Hopefully after 3, I will have eliminated most or all of the Assembler in the project. I say hopefully, because it's in an obscure dialect that doesn't assemble in any modern free assembler without significant modification. I have tried them all. Whatever is left gets converted to assemble in NASM, or to C code if I can define the assembler's actual function.
Switch the dos extender from Phar Lap to HX Dos http://www.japheth.de/HX.html, Which promises to replicate as much of the WIN32 api as possible. Then make all the necessary code changes for that to work.
Switch to the win32 version of Allegro.cc, assuming that the win32 version can run on top of HXDos. Make any further necessary changes
Modify the plugin system to use some kind of standard cross platform plugin library. What this would be, I have no idea. Maybe you can offer some suggestions? I talked to the developer who originally wrote the plugin system, and he said some of the things it does aren't possible on modern OS's because of segmentation restrictions. I'm not sure what this means, but I'm guessing it means all the plugins will need to be rewritten almost from scratch.
Magically, I got all the above done, and we can try and make it run in windows, osx, and linux, whilst dealing with other cross platform niggles like long file names, and things I haven't thought of.
Anyone got a problem with any of this? Is allegro a good choice? if not, why? what would you do about this plugin system? What would you do different? Is this whole thing foolish, and should I just rewrite it from scratch, using the original as inpiration? (it would apparently take the original developer "About a month" to do that)
One thing I haven't covered above is the text/font system. Not sure what to do about that, but Animator Pro has its own custom font format, but also is able to use Postscript Type 1 fonts, and some other formats.
My biggest concern with your plan, in a nutshell: Your approach seems to be to attempt to keep the whole enormous thing working at all times, tweaking the environment ever-further away from DOS. During each tweak to the environment, that means you will have approximately a billion subtle assumptions that might have broken at once, none of which you necessarily understand yet. Untangling them all at once will be incredibly painful.
If I were doing the port, my approach would be to disable as much code as possible to get SOMETHING running in a modern environment, and bring the parts back online, one piece at a time. Write a simple test harness program that loads a display driver and draws some stuff, and compile it for DOS to make sure you understand the interface. Then write some C code that implements the same interface, but with Allegro (or SDL or SFML), and make that program work under Windows or Linux. When the output differs, you have a simple test case to work from.
Your entire job on this port is swapping out implementations of various interfaces and functions with completely new ones. This is a job that unit testing excels at. Don't write any new code without a test of some kind that runs on the old code under DOS! Make your potential problems as small and simple as you possibly can. Port assembly code instead of rewriting it only if you're reasonably confident that it will actually make your job easier (ie, algorithmic stuff that compiles fine with few tweaks under NASM). Don't bite off a bigger piece than you can comfortably fit in your brain at once.
I, for one, look forward to seeing your progress! I think what you're attempting to do is great. Thanks for doing it.
Hmmm - I might approach it by writing an OpenGL video "driver" for it. and todays machines are fast enough with tons of ram that you could do all the pixel specific algorithms on main CPU into a back buffer and it would work. As the "generic" VGA driver just mapped the video buffer to a pointer this would be a place to start. There was a zoom mode in the UI so you can look at the pixels on a high res display.
It is often very difficult to take an existing non-trivial code base that wasn't written with portability in mind - you mention a few - and then try to make it portable. There will be a lot of problems on the way. It is probably a better idea to start from scratch and rewrite the code using the existing code as reference only. If you start from scratch you can leverage existing portable UI solution in your new project like Qt.

How can I best deploy a web application written in C?

Say I have fancy new algorithm written in C,
int addone(int a) {
return a + 1;
}
And I want to deploy as a web application, for example at
http://example.com/addone?a=5
which responds with,
Content-Type: text/plain
6
What is the best way to host something like this? I have an existing setup using Python mod_wsgi on Apache2, and for testing I've just built a binary from C and call as a subprocess using Python's os.popen2.
I want this to be very fast and not waste overhead (ie I don't need this other Python stuff at all). I can dedicate the whole server to it, re-compile anything needed, etc.
I'm thinking about looking into Apache C modules. Is that useful? Or I may build SWIG wrappers to call directly from Python, but again that seems wasteful if I'm not using Python at all. Any tips?
The easiest way should be to write this program as a CGI app (http://en.wikipedia.org/wiki/.cgi). It would run with any webserver that supports the Common Gateway Interface.
The output format needs to follow the CGI rules.
If you want to take full advantage of the web server capabilities then you can write an Apache module in C. That needs a bit more preparation but gives you full control.
Maybe this tiny dynamic webserver in C to be used with C language can help you.. it should be easy to use and self-contained.
Probably the fastest solution you can adopt according to the benchmarks shown on their homepage!
This article from yesterday has a good discussion on why not to use C as a web framework. I think an easy solution for you would be to use ctypes, it's certainly faster than starting a subprocess. You should make sure that your method is thread safe and that you check your input argument.
from ctypes import *
libcompute = CDLL("libcompute.so")
libcompute.addone(int(a))
I'm not convinced that you're existing general approach might not be the best one. I'm not saying that Apache/Python is necessarily the correct one but there is something compelling about separating the concerns in your architecture being composed of highly focused elements that are specialists in their functions within the overall system.
Having your C-based algorithm server being decoupled from the HTTP server may give you access to things like HTTP scalability and caching facilities that might otherwise have to be in-engineered (or reinvented) within your algorithm component if things are too tightly coupled.
I don't think performance concerns in of themselves are always the best or only reasons when designing an architecture. For example the a YAWS deployment with a C-based driver could be a very performant option.
I have just setup a web service using libmicrohttpd and have had amazing results. On a quad core I've been handling 20400 requests a second and the CPU is running only at 58%. This is probably going to be deployed on a server with 8 cores, so I'm expecting much better results. A very simple C service will be even faster!
I have tried GWAN, it is very good, but it's closed, and doesn't play well with virtual environments. I will give #Gil kudos being good at supporting it here though. We just had a few issues and found LibMicroHttpd works better for our needs.
If you go here, you may need to update your openssl if you're using CentOs from axivo
rpm -ivh --nosignature http://rpm.axivo.com/redhat/axivo-release-6-1.noarch.rpm
yum --disablerepo=* --enablerepo=axivo update openssl-devel
You can try Duda I/O, it only requires a Linux host: http://duda.io

Best way to implement plugin framework - are DLLs the only way (C/C++ project)?

Introduction:
I am currently developing a document classifier software in C/C++ and I will be using Naive-Bayesian model for classification. But I wanted the users to use any algorithm that they want(or I want in the future), hence I went to separate the algorithm part in the architecture as a plugin that will be attached to the main app # app start-up. Hence any user can write his own algorithm as a plugin and use it with my app.
Problem Statement:
The way I am intending to develop this is to have each of the algorithms that user wants to use to be made into a DLL file and put into a specific directory. And at the start, my app will search for all the DLLs in that directory and load them.
My Questions:
(1) What if a malicious code is made as a DLL (and that will have same functions mandated by plugin framework) and put into my plugins directory? In that case, my app will think that its a plugin and picks it and calls its functions, so the malicious code can easily bring down my entire app down (In the worst case could make my app as a malicious code launcher!!!).
(2) Is using DLLs the only way available to implement plugin design pattern? (Not only for the fear of malicious plugin, but its a generic question out of curiosity :) )
(3) I think a lot of softwares are written with plugin model for extendability, if so, how do they defend against such attacks?
(4) In general what do you think about my decision to use plugin model for extendability (do you think I should look at any other alternatives?)
Thank you
-MicroKernel :)
Do not worry about malicious plugins. If somebody managed to sneak a malicious DLL into that folder, they probably also have the power to execute stuff directly.
As an alternative to DLLs, you could hook up a scripting language like Python or Lua, and allow scripted plugins. But maybe in this case you need the speed of compiled code?
For embedding Python, see here. The process is not very difficult. You can link statically to the interpreter, so users won't need to install Python on their system. However, any non-builtin modules will need to be shipped with your application.
However, if the language does not matter much to you, embedding Lua is probably easier because it was specifically designed for that task. See this section of its manual.
See 1. They don't.
Using a plugin model sounds like a fine solution, provided that a lack of extensibility really is a problem at this point. It might be easier to hard-code your current model, and add the plugin interface later, if it turns out that there is actually a demand for it. It is easy to add, but hard to remove once people started using it.
Malicious code is not the only problem with DLLs. Even a well-meaning DLL might contain a bug that could crash your whole application or gradually leak memory.
Loading a module in a high-level language somewhat reduces the risk. If you want to learn about embedding Python for example, the documentation is here.
Another approach would be to launch the plugin in a separate process. It does require a bit more effort on your part to implement, but it's much safer. The seperate process approach is used by Google's Chrome web browser, and they have a document describing the architecture.
The basic idea is to provide a library for plugin writers that includes all the logic for communicating with the main app. That way, the plugin author has an API that they use, just as if they were writing a DLL. Wikipedia has a good list of ways for inter-process communication (IPC).
1) If there is a malicious dll in your plugin folder, you are probably already compromised.
2) No, you can load assembly code dynamically from a file, but this would just be reinventing the wheel, just use a DLL.
3) Firefox extensions don't, not even with its javascript plugins. Everything else I know uses native code from dynamic libraries, and is therefore impossible to guarantee safety. Then again Chrome has NaCL which does extensive analysis on the binary code and rejects it if it can't be 100% sure it doesn't violate bounds and what not, although I'm sure they will have more and more vulnerabilities as time passes.
4) Plugins are fine, just restrict them to trusted people. Alternatively, you could use a safe language like LUA, Python, Java, etc, and load a file into that language but restrict it only to a subset of API that wont harm your program or environment.
(1) Can you use OS security facilities to prevent unauthorized access to the folder where the DLL's are searched or loaded from? That should be your first approach.
Otherwise: run a threat analysis - what's the risk, what are known attack vectors, etc.
(2) Not necessarily. It is the most straigtforward if you want compiled plugins - which is mostly a question of performance, access to OS funcitons, etc. As mentioned already, consider scripting languages.
(3) Usually by writing "to prevent malicous code execution, restrict access to the plugin folder".
(4) There's quite some additional cost - even when using a plugin framework you are not yet familiar with. it increases cost of:
the core application (plugin functionality)
the plugins (much higher isolation)
installation
debugging + diagnostics (bugs that occur only with a certain combinaiton of plugins)
administration (users must know of, and manage plugins)
That pays only if
installing/updating the main software is much more complex than updating the plugins
individual components need to be updated individually (e.g. a user may combine different versions of plugins)
other people develop plugins for your main application
(There are other benefits of moving code into DLL's, but they don't pertain to plugins as such)
What if a malicious code is made as a DLL
Generally, if you do not trust dll, you can't load it one way or another.
This would be correct for almost any other language even if it is interpreted.
Java and some languages do very hard job to limit what user can do and this works only because they run in virtual machine.
So no. Dll loaded plug-ins can come from trusted source only.
Is using DLLs the only way available to implement plugin design pattern?
You may also embed some interpreter in your code, for example GIMP allows writing plugins
in python.
But be aware of fact that this would be much slower because if nature of any interpreted language.
We have a product very similar in that it uses modules to extend functionality.
We do two things:
We use BPL files which are DLLs under the covers. This is a specific technology from Borland/Codegear/Embarcadero within C++ Builder. We take advantage of some RTTI type features to publish a simple API similar to the main (argv[]) so any number of paramters can be pushed onto the stack and popped off by the DLL.
We also embed PERL into our application for things that are more business logic in nature.
Our software is an accounting/ERP suite.
Have a look at existing plugin architectures and see if there is anything that you can reuse. http://git.dronelabs.com/ethos/about/ is one link I came across while googling glib + plugin. glib itself might may it easier to develop a plugin architecture. Gstreamer uses glib and has a very nice plugin architecture that may give you some ideas.

Why do you obfuscate your code?

Have you ever obfuscated your code before? Are there ever legitimate reasons to do so?
I have obfuscated my JavaScript. It made it smaller, thus reducing download times. In addition, since the code is handed to the client, my company didn't want them to be able to read it.
Yes, to make it harder to reverse engineer.
To ensure a job for life, of course (kidding).
This is pretty hilarious and educational: How to Write Unmaintanable Code.
It's called "Job Security". This is also the reason to use Perl -- no need to do obfuscation as separate task, hence higher productivity, without loss of job security.
Call it "security through obsfuscability" if you will.
I don't believe making reverse engineering harder is a valid reason.
A good reason to obfuscate your code is to reduce the compiled footprint. For instance, J2ME appliactions need to be as small as possible. If you run you app through an obfuscator (and optimiser) then you can reduce the jar from a couple of Mb to a few hundred Kb.
The other point, nestled above, is that most obfuscators are also optimisers which can improve your application's performance.
Isn't this also used as security through obscurity? When your source code is publically available (javascript etc) you might want to at least it somewhat harder to understand what is actually occuring on the client side.
Security is always full of compromises. but i think that security by obscurity is one of the least effective methods.
I believe all TV cable boxes will have the java code obfuscated. This does make things harder to hack, and since the cable boxes will be in your home, they are theoretically hackable.
I'm not sure how much it will matter since the cable card will still control signal encryption and gets its authorization straight from the video source rather than the java code guide or java apps, but they are pretty dedicated to the concept.
By the way, it is not easy to trace exceptions thrown from an obfuscated stack! I actually memorized at one point that aH meant "Null Pointer Exception" for a particular build.
I remember creating a Windows Service for Online Backup application that was built in .NET. I could easily use either Visual Studio or tools like .NET Reflector to see the classes and the source code inside it.
I created a new Visual Studio Test application and added the Windows Service reference to it. Double clicked on the reference and I can see all the classes, namespaces everything (not the source code though). Anybody can figure out the internal working of your modules by looking at the class names. In my case, one such class was FTPHandler that clearly tells where the backups are going.
.NET Reflector goes beyond that by showing the actual code. It even has an option to Export the whole project so you get a VS project with all the classes and source code similar to what the developer had.
I think it makes sense to obfuscate, to make it atleast harder if not impossible for someone to disassemble. Also I think it makes sense for products involving large customer base where you do not want your competitors to know much about your products.
Looking at some of the code I wrote for my disk driver project makes me question what it means to be obfuscated.
((int8_t (*)( int32_t, void * )) hdd->_ctrl)( DISK_CMD_REQUEST, (void *) dr );
Or is that just system programming in C? Or should that line be written differently? Questions...
Yes and no, I haven't delivered apps with a tool that was easy decompilable.
I did run something like obfuscators for old Basic and UCSD Pascal interpreters, but that was for a different reason, optimizing run time.
If I am delivering Java Swing apps to clients, I always obfuscate the class files before distribution.
You can never be too careful - I once pointed a decent Java decompiler (I used the JD Java Decompiler - http://www.djjavadecompiler.com/ ) at my class files and was rewarded with an almost perfect reproduction of the original code. That was rather unnerving, so I started obfuscating my production code ever since. I use Klassmaster myself (http://www.zelix.com/klassmaster/)
I obfuscated code of my Android applications mostly. I used ProGuard tool to obfuscate the code.
When I worked on the C# project, our team used the ArmDot. It's licensing and obfuscation system.
Modern obfuscators are used not only to make hacking process difficult. They are able to protect programs and games from cheating, check licenses/keys and even optimize code.
But I don't think it is necessary to use obfuscator in every project.
It's most commonly done when you need to provide something in source (usually due to the environment it's being built in, such as systems without shared libraries, especially if you as the seller don't have the exact system being build for), but you don't want the person you're giving it to to be able to modify or extend it significantly (or at all).
This used to be far more common than today. It also led to the (defunct?) Obfuscated C Contest.
A legal (though arguably not "legitimate") use might be to release "source" for an app you're linking with GPL code in obfuscated fashion. It's source, it can be modified, it's just very hard. That would be a more extreme version of releasing it without comments, or releasing with all whitespace trimmed, or (and this would be pushing the legal grounds probably) releasing assembler source generated from C (and perhaps hand-tweaked so you can say it's not just intermediate code).

Resources