In general, what needs to be done to convert a 16 bit Windows program to Win32? I'm sure I'm not the only person to inherit a codebase and be stunned to find 16-bit code lurking in the corners.
The code in question is C.
The meanings of wParam and lParam have changed in many places. I strongly encourage you to be paranoid and convert as much as possible to use message crackers. They will save you no end of headaches. If there is only one piece of advice I could give you, this would be it (see the sketch after this list).
As long as you're using message crackers, also enable STRICT. It'll help you catch the Win16 code base using int where it should be using HWND, HANDLE, or something else. Converting these will greatly help with the pointer and integer size item later in this list.
hPrevInstance is useless. Make sure it's not used.
Make sure you're using Unicode-friendly calls. That doesn't mean you need to convert everything to TCHARs, but it does mean you'd better replace OpenFile, _lopen, and _lcreat with CreateFile, to name the obvious ones.
LibMain is now DllMain, and the entire library format and export conventions are different.
Win16 had no VMM. GlobalAlloc, LocalAlloc, GlobalFree, and LocalFree should be replaced with more modern equivalents. When done, clean up calls to LocalLock, LocalUnlock and friends; they're now useless. Not that I can imagine your app doing this, but make sure you don't depend on WM_COMPACTING while you're there.
Win16 also had no memory protection. Make sure you're not using SendMessage or PostMessage to send pointers to out-of-process windows. You'll need to switch to a more modern IPC mechanism, such as pipes or memory-mapped files.
Win16 also lacked preemptive multitasking. If you wanted a quick answer from another window, it was totally cool to call SendMessage and wait for the message to be processed. That may be a bad idea now. Consider whether PostMessage isn't a better option.
Pointer and integer sizes change. Remember to check carefully anywhere you're reading or writing data to disk—especially if they're Win16 structures. You'll need to manually redo them to handle the shorter values. Again, the least painful way to deal with this will be to use message crackers where possible. Otherwise, you'll need to manually hunt down and convert int to DWORD and so on where applicable.
Finally, when you've nailed the obvious, consider enabling 64-bit compilation checks. A lot of the issues faced with going from 16 to 32 bits are the same as going from 32 to 64, and Visual C++ is actually pretty smart these days. Not only will you catch some lingering issues; you'll get yourself ready for your eventual Win64 migration, too.
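To make the message-cracker advice concrete, here is a minimal sketch using the HANDLE_MSG macros from windowsx.h, with STRICT enabled as suggested above (the window procedure and handler are hypothetical, not from any particular codebase):

    #define STRICT              /* catch int-vs-HWND/HANDLE type confusion */
    #include <windows.h>
    #include <windowsx.h>       /* message crackers: HANDLE_MSG and friends */

    /* The cracker unpacks wParam/lParam for you, so the Win16-to-Win32
       repacking of message parameters stops being your problem. */
    static void OnCommand(HWND hwnd, int id, HWND hwndCtl, UINT codeNotify)
    {
        /* id, hwndCtl, and codeNotify arrive correctly split out,
           whatever the underlying wParam/lParam packing is */
    }

    static LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg)
        {
            HANDLE_MSG(hwnd, WM_COMMAND, OnCommand);
        }
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }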
EDIT: As @ChrisN points out, the official guide for porting Win16 apps to Win32 is available in archived form, and it both fleshes out and adds to my points above.
Apart from getting your build environment right, here are a few specifics you will need to address:
Structs containing ints will need to change to short or be widened from 16 to 32 bits. If you change the size of a structure that is loaded from or saved to disk, you will need to write data-file upgrade code.
Per-window data is often stored with the window handle using GWL_USERDATA. If you widen some of that data to 32 bits, your offsets will change.
POINT and SIZE structures are 64 bits in Win32. In Win16 they were 32 bits and could be returned in a DWORD (the caller would split the return value into two 16-bit values). This no longer works in Win32 (i.e. Win32 does not return 64-bit results), so the affected functions were changed to accept a pointer through which the result is stored. You will need to edit all of these call sites. APIs like GetTextExtent are affected (see the sketch after this list). The same issue also applies to some Windows messages.
The use of INI files is discouraged in Win32 in favour of the registry. The INI file functions still work, but you will need to be careful with Vista issues: 16-bit programs often stored their INI file in the Windows system directory.
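As an illustration of the POINT/SIZE item above, the typical GetTextExtent conversion looks roughly like this (hdc, str, and len are assumed to already be in scope):

    /* Win16: the extent came back packed into a DWORD:
     *     DWORD dw = GetTextExtent(hdc, str, len);
     *     int w = LOWORD(dw), h = HIWORD(dw);
     * Win32: the result is returned through a pointer instead: */
    SIZE size;
    if (GetTextExtentPoint32(hdc, str, len, &size))
    {
        /* use size.cx (width) and size.cy (height) */
    }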
These are just a few of the issues I can recall; it has been over a decade since I did any Win32 porting. Once you get into it, it goes quite quickly. Each codebase has its own "feel" when it comes to porting, which you will get used to. You will probably even find a few bugs along the way.
There was a definitive guide in the article Porting 16-Bit Code to 32-Bit Windows on MSDN.
The original Win32 SDK had a tool that scanned source code and flagged lines that needed to be changed, but I can't remember the name of the tool.
When I've had to do this in the past, I've used a brute force technique - i.e.:
1 - update makefiles or build environment to use a 32-bit compiler and linker. Optionally, just create a new project in your IDE (I use Visual Studio) and add the files manually.
2 - build
3 - fix errors
4 - repeat 2&3 until done
The pain of the process depends on the application you are migrating. I've converted 10,000 line programs in an hour, and 75,000 line programs in less than a week. I've also had some small utilities that I just gave up on and rewrote (mostly) from scratch.
I agree with Alan that trial and error is probably the best way.
Here are some good tips.
Agreed that the compiler will probably catch most of the errors. Also, if you are using "near" and "far" pointers, you can remove those designations -- a pointer is just a pointer in Win32.
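For example (on Win32, windef.h defines NEAR and FAR as empty macros, so both declarations below are identical and the qualifiers can simply be deleted):

    #include <windows.h>

    char FAR *oldStyle;   /* Win16 leftover -- FAR now expands to nothing */
    char     *newStyle;   /* what it should look like after cleanup */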
I have often noticed that I would have been able to solve practical problems in C elegantly if there had been a way of creating a 'virtual FILE' and attaching the necessary callbacks for events such as buffer full, input requested, close, and flush. It should then be possible to use a large part of the stdio.h functions, e.g. fprintf, unchanged. Is there a framework enabling one to do this? If not, is it feasible with a moderate amount of effort, on at least some platforms?
Possible applications would be:
To write to or read from a dynamic or static region of memory.
To write to multiple files in parallel.
To read from a thread or co-routine generating data.
To apply a filter to another (virtual or real) FILE.
Support for file formats with indirection (like #include).
A C pre-processor(?).
I am less interested in solutions for specific cases than in a framework to let you roll your own FILE. I am also not looking for a virtual filesystem, but rather virtual FILE*s that I can pass to the CRT.
To my disappointment I have never seen anything of the sort; as far as I can see C11 considers FILE entirely up to the language implementer, which is perhaps reasonable if one wishes to keep the language (+library) specifications small but sad if you compare it with Java I/O streams.
I feel sure that virtual FILEs must be possible with any (fully) open-source implementation of the C run-time, but I imagine there might be a large number of details making it trickier than it seems, and if it has already been done it would be a shame to duplicate the effort. It would also be greatly preferable not to have to modify the CRT code. Without open source one might be able to reverse-engineer the functions supplied, but I fear the result would be far too vulnerable to changes in unsupported features, unless there were a commitment to a set of interfaces. I suppose, too, that any system for which one can write a device driver would allow one to create a virtual device, but I suspect that is unnecessarily low-level and would require one to write privileged code.
I have to admit that while I have code that would have benefited from virtual FILEs, I have no current requirement for it; nonetheless it is something I have often wondered about and that I imagine could be of interest to others.
This is somewhat similar to a-reader-interface-that-consumes-files-and-char-in-c, but there the questioner did not hope to return a virtual FILE; the answer, however, using fmemopen, did.
There is no standard C interface for creating virtual FILE*s, but both the GNU and the BSD standard libraries include one. On Linux (glibc), you can use fopencookie; on most *BSD systems, including Mac OS X, funopen. (See Note 1.)
The two interfaces are similar but slightly different in some details. However, it is usually very simple to adapt code written for one interface to the other.
These are not complete virtualizations. They associate the FILE* with four callbacks and a void* context (the "cookie" in fopencookie). The callbacks are read, write, seek, and close; there are no callbacks for flush or tell operations. Still, this is sufficient for many simple FILE* adaptors.
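For instance, here is a minimal glibc-only sketch of a write-only FILE* that captures fprintf output into a fixed in-memory buffer (the struct and callback names are made up; error handling is trimmed):

    #define _GNU_SOURCE          /* fopencookie is a GNU extension */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>

    struct membuf { char data[4096]; size_t len; };

    /* write callback: stdio hands us flushed bytes plus our cookie */
    static ssize_t membuf_write(void *cookie, const char *buf, size_t size)
    {
        struct membuf *m = cookie;
        if (size > sizeof m->data - m->len)
            size = sizeof m->data - m->len;
        memcpy(m->data + m->len, buf, size);
        m->len += size;
        return size;
    }

    int main(void)
    {
        struct membuf m = { .len = 0 };
        cookie_io_functions_t io = { .write = membuf_write }; /* others NULL */
        FILE *fp = fopencookie(&m, "w", io);
        fprintf(fp, "hello %d\n", 42);   /* ordinary stdio, unchanged */
        fclose(fp);                      /* flushes into the buffer */
        fwrite(m.data, 1, m.len, stdout);
        return 0;
    }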
For another simple example, see the two answers to Write simultaneously to two streams.
Notes:
funopen is derived from "functional open", not from "file unopen".
I am wondering why we connect to sockets using functions like hton* to take care of endianness, when we could just send the IP as a plain char array.
Say we want to connect to 184.54.12.169
There is surely an explanation for this, but I cannot figure out why we use integers instead of chars, involving ourselves in endianness hell.
I think char out_ip[] = "184.54.12.169" could have theoretically made it.
Please explain the subtleties I don't get here.
The basic networking APIs are low-level functions, very thin wrappers around kernel system calls. Removing these low-level functions and forcing everything to use strings would be rather bad for a low-level API like this, especially considering how tedious string handling is in C. As a concrete hurdle, IP strings are not even fixed-length, so handling them is a lot more complex than handling plain 32-bit integers. And moving string handling into the kernel runs against what a kernel is supposed to do: parsing arbitrary user strings is a user-space problem.
So, you might want to create higher-level functions which accept strings and do the conversion in the library. But adding such higher-level "convenience" functions all over the core libraries would bloat them, because passing IP numbers is certainly not the only place where such convenience would be nice. These functions would need to be maintained forever and included everywhere once they became part of standard (official like POSIX, or de facto) libraries.
So, removing the low-level functions is not really an option, and adding more functions for a higher-level API in the same library is not a good option either.
So the solution is to use another library providing a higher-level networking API, one which could, for example, handle address strings directly. I'm not sure what's out there for C, but such a thing is almost a given for other languages, which also have "real" strings built in, so using them is not a hassle.
Because that's how an IP address is transmitted in a packet. The "www.xxx.yyy.zzz" string form is really just a human-readable rendering of a 4-byte integer that lets us see the hierarchical nature a little more easily. Sending the whole string would take up a lot more space as well.
Take the number 127536: as a string it requires seven bytes (six digits plus a terminator), not four. In addition, you would need to parse it.
In other words, integers are more efficient, and you do not have to deal with invalid values.
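To see the conversion in action, here is a small sketch using inet_pton (the modern replacement for inet_addr), turning the question's example address into the integer that actually goes on the wire:

    #include <stdio.h>
    #include <arpa/inet.h>

    int main(void)
    {
        /* inet_pton parses the human-readable dotted quad into the
           4-byte network-byte-order integer that is sent in packets */
        struct in_addr addr;
        if (inet_pton(AF_INET, "184.54.12.169", &addr) == 1)
            printf("0x%08lx\n", (unsigned long)ntohl(addr.s_addr)); /* 0xb8360ca9 */
        return 0;
    }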
When writing safe code in straight C, I'm sick and tired of coming up with arbitrary numbers to represent limitations -- specifically, the maximum amount of memory to allocate for a single line of text. I know I can always say stuff like
#define MAX_LINE_LENGTH 1024
and then pass that macro to functions such as snprintf().
I work and code in NetBSD, which has a sysctl(3) variable called "user.line_max" designed for this very purpose. So I don't need to come up with an arbitrary number like MAX_LINE_LENGTH, above. I just read the "user.line_max" sysctl variable, which by the way is settable by the user.
My question is whether this is the Right Thing in terms of safety and portability. Perhaps different operating systems have a different name for this sysctl, but I'm more interested in whether I should be using this technique at all.
And for the record, "portability" excludes Microsoft Windows in this case.
Well, the Linux sysctl(2) man page has this to say in the Notes section:
Glibc does not provide a wrapper for this system call; call it using syscall(2).
Or rather... don't call it: use of this system call has long been discouraged, and it is so unloved that it is likely to disappear in a future kernel version. Remove it from your programs now; use the /proc/sys interface instead.
So that is one consideration.
Not a good idea. Even if it weren't for what Duck told you, relying on a system-wide setting that's runtime-variable is bad design and error-prone. If you're going to go to the trouble of making buffer size limits variable (which typically requires dynamic allocation and checking for failure), then you should go the last step and make them configurable on a more local scope.
With your example of buffer size limits, opinions differ as to what's the best practice. Some people think you should always use dynamically-growing buffers with no hard limit. Others prefer fixed limits sufficiently large that reasonable data would not exceed them. Or, as you've noted, configurable limits are an option. In choosing what's right for your application, I would consider the user-experience implications. Sure, users don't like arbitrary limits, but they also don't like it when accidentally (or by somebody else's malice) reading data with no newlines in it causes your application to consume unbounded amounts of memory, start swapping, and/or eventually crash or bog down the whole system.
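For the dynamically-growing option, POSIX getline() is the usual tool; a minimal sketch (with any hard cap left to the caller, as discussed above):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    int main(void)
    {
        /* getline() grows the buffer as needed -- the "no hard limit"
           strategy. If unbounded input is a concern, compare len
           against a cap of your choosing inside the loop. */
        char *line = NULL;
        size_t cap = 0;
        ssize_t len;
        while ((len = getline(&line, &cap, stdin)) != -1)
            printf("read %zd bytes\n", len);
        free(line);
        return 0;
    }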
The nearest portable construct for this is "getconf LINE_MAX" or the equivalent C call, sysconf(_SC_LINE_MAX).
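A minimal sketch of the C side (the 2048 fallback is _POSIX2_LINE_MAX, the minimum the standard guarantees):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* sysconf(_SC_LINE_MAX) is what "getconf LINE_MAX" reads;
           a non-positive result means no determinate limit, so fall
           back to the POSIX minimum */
        long line_max = sysconf(_SC_LINE_MAX);
        if (line_max <= 0)
            line_max = 2048;   /* _POSIX2_LINE_MAX */
        printf("LINE_MAX = %ld\n", line_max);
        return 0;
    }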
1) Check out the Single Unix Specification, keyword: "limits"
2) s/safety/security/
Since there is this: http://en.wikipedia.org/wiki/Code_page_437 for MS-DOS, is there something similar for Linux systems? Is it possible to access that font data from a userland program? I would actually just need access to the actual bit patterns which define the font, and I would do the rendering myself. I'm fairly sure that something like this exists, but I haven't been able to find out what exactly it is and how to access it. After all, e.g. the text-mode console font has to reside somewhere, and I really do hope it is "rawly" accessible somehow by a userland program.
Before I forget: I'm programming in C, and have access only to the "standard" linux/posix development headers. The only thing I could come up with myself is to use the fonts in /usr/share/fonts, but having to write my own implementation to extract the data from there doesn't really sound like an option; I want to achieve this with the least number of bytes possible, so I feel I'm left with finding a standard way of doing this.
It's not really feasible for me to store my own 8x8 ASCII-compatible font with the program either: it takes some 1024 bytes (128 chars * 8x8 bits) just to store the font, which is definitely unacceptable given the strict size limits (somewhere under 1024 bytes for code + data) I am working with. So being able to use the font data stored on the system itself would greatly simplify my task.
I had a look at the consolechars sources, and it looks like there is a whole library for this kind of stuff. On Ubuntu it's named libconsole, and the header files (like lct/font.h) are in the console-tools-dev package. There are functions to find and load fonts, which seems to be exactly what you need. And the consolechars source is a nice example of how to use them.
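If all you need is the raw glyph bitmaps of the font currently loaded on a Linux virtual console, you can also ask the kernel directly with the KDFONTOP ioctl (a different route from libconsole). A sketch, assuming a real VT and sufficient privileges; the buffer geometry below is a deliberately generous assumption:

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kd.h>

    int main(void)
    {
        /* Ask the console driver for the current font's bit patterns.
           This fails on pseudo-terminals (xterm etc.) and usually
           needs root. */
        static unsigned char glyphs[512 * 32 * 4];
        struct console_font_op op;

        memset(&op, 0, sizeof op);
        op.op = KD_FONT_OP_GET;
        op.width = 32;          /* maximum geometry we can accept */
        op.height = 32;
        op.charcount = 512;
        op.data = glyphs;

        int fd = open("/dev/tty0", O_RDONLY);
        if (fd < 0 || ioctl(fd, KDFONTOP, &op) < 0) {
            perror("KDFONTOP");
            return 1;
        }
        printf("%ux%u font, %u glyphs\n", op.width, op.height, op.charcount);
        close(fd);
        return 0;
    }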
You should use FreeType; it's commonly installed on just about every Linux system.
I'm attempting to build a control-flow graph from the assembly listing returned by a call to objdump -d. Currently the best method I've come up with is to put each line of the result into a linked list, and separate out the memory address, opcode, and operands for each line. I'm separating them out by relying on the regular nature of objdump's output (the memory address runs from character 2 to character 7 in the string that represents each line).
Once this is done I start the actual CFG construction. Each node in the CFG holds a starting and ending memory address, a pointer to the previous basic block, and pointers to any child basic blocks. I then go through the objdump results and compare each opcode against an array of all control-flow opcodes in x86_64. If the opcode is a control-flow one, I record the address as the end of the basic block and, depending on the opcode, add either two child pointers (conditional opcode) or one (call or return).
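For reference, the node described above could look something like this (the names are mine, not from any library):

    #include <stdint.h>

    /* One node per basic block: a straight-line run of instructions
       ended by a control-flow opcode. */
    typedef struct BasicBlock {
        uint64_t start_addr;             /* first instruction in the block */
        uint64_t end_addr;               /* the terminating control-flow op */
        struct BasicBlock *prev;         /* block this one was split from */
        struct BasicBlock *taken;        /* branch-taken / call target */
        struct BasicBlock *fallthrough;  /* next-instruction successor, or
                                            NULL for unconditional jmp/ret */
    } BasicBlock;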
I'm in the process of implementing this in C, and it seems like it will work but feels very tenuous. Does anyone have any suggestions, or anything that I'm not taking into account?
Thanks for taking the time to read this!
edit:
The idea is to use it to compare stack traces of system calls generated by DynamoRIO against the expected CFG for a target binary; I'm hoping that building it like this will facilitate that. I haven't re-used what's available because A) I hadn't really thought about it and B) I need to get the graph into a usable data structure so I can do path comparisons. I'm going to take a look at some of the utilities on the page you linked to, thanks for pointing me in the right direction. Thanks for your comments, I really appreciate it!
You should use an IL that was designed for program analysis. There are a few.
The DynInst project (dyninst.org) has a lifter that can translate from ELF binaries into CFGs for functions/programs (or it did the last time I looked). DynInst is written in C++.
BinNavi uses the output from IDA (the Interactive Disassembler) to build an IL out of the control-flow graphs that IDA identifies. I would also recommend a copy of IDA; it will let you spot-check CFGs visually. Once you have a program in BinNavi you can get its IL representation of a function/CFG.
Function pointers are just the start of your troubles for statically identifying the control-flow graph. Jump tables (the kind generated for switch/case statements in certain cases, written by hand in others) throw a wrench in as well. Every code analysis framework I know of deals with those in a very heuristics-heavy way. Then you have exceptions and exception handling, and also self-modifying code.
Good luck! You're getting a lot of information out of the DynamoRIO trace already, I suggest you utilize as much information as you can from that trace...
I found your question while looking for the same thing.
I found nothing, so I wrote a simple Python script for this and threw it on github:
https://github.com/zestrada/playground/blob/master/objdump_cfg/objdump_to_cfg.py
Note that I have some heuristics to deal with functions that never return, the gcc stack protector on 32-bit x86, etc. You may or may not want such things.
I treat indirect calls similarly to how you do (basically there is a node in the graph that acts as a source when returning from an indirect call).
Hopefully this is helpful for anyone looking to do similar analysis with similar restrictions.
I was also facing a similar issue in the past and wrote the asm2cfg tool for this purpose: https://github.com/Kazhuu/asm2cfg. The tool supports GDB disassembly and objdump inputs and spits out the CFG as a DOT file or PDF.
Hopefully someone finds this helpful!