Rebasing and debugging - rebase

Usually when I debug with IDA I don't run into any issues. However, with one particular process (9.9 MB in size before modules), IDA insists on rebasing every single time it starts the process, which freezes IDA and forces me to wait a good 20-30 minutes before debugging actually starts.
Why does it do this, and can I somehow disable this? I'm new-ish to advanced debugging such as this so rebasing only makes a little sense to me.

In case anyone else finds this page like I did: this can also happen when the DLL's preferred base address is already in use, in which case the DLL must be rebased before loading can continue.
To correct this you can use the ReBase.exe tool that comes with the Windows SDK (or Visual Studio):
ReBase.Exe -b 7600000 myBadBasedDll.dll
That resets the base of the DLL to 0x7600000. You then must do the rebase in IDA one last time to bring your IDB in sync (or make a new IDB after you rebase):
Edit->Segments->Rebase Program...
In the new menu check the boxes for Fix up Program and Rebase the whole image and it should be good to go.

This question was answered by Will Donohoe on 31-05-2013. The website at the time of access is https://will.io/blog/2013/05/31/disable-aslr/
As explained on the site, the problem arose (at least in my case) as a result of Address Space Layout Randomization (ASLR). ASLR is enabled for an image when the DllCharacteristics field of the PE Optional Header has the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag set, which has a value of 0x0040.
In my case the DllCharacteristics field was 0x8160, so the 0x0040 flag was clearly present.
The recurrent rebasing problem was corrected by clearing the 0x0040 flag. Setting the DllCharacteristics field to 0x8120 or 0x8100 did the trick for me (0x8120 clears only the 0x0040 bit; 0x8100 also clears 0x0020).
NB: In a hex editor, the DllCharacteristics field is located 0x5E bytes from the start of the PE signature ("PE\0\0"), whose file offset is stored at 0x3C in the DOS header.
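For anyone who would rather script the patch than use a hex editor, here is a minimal Python sketch of the same edit. It assumes a well-formed PE file and uses the 0x5E offset quoted above; the function name clear_dynamic_base is made up for illustration and is not part of any tool mentioned here.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040
E_LFANEW_OFFSET = 0x3C             # DOS header field holding the PE signature offset
DLL_CHARACTERISTICS_OFFSET = 0x5E  # relative to the start of the PE signature


def clear_dynamic_base(image: bytes) -> bytes:
    """Return a copy of a PE image with the DYNAMIC_BASE (ASLR) bit cleared."""
    data = bytearray(image)
    (pe_offset,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    field = pe_offset + DLL_CHARACTERISTICS_OFFSET
    (flags,) = struct.unpack_from("<H", data, field)
    # Clear only the 0x0040 bit and leave every other flag untouched.
    flags &= ~IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE
    struct.pack_into("<H", data, field, flags)
    return bytes(data)
```

Read the file in binary mode, pass the bytes through clear_dynamic_base, and write the result to a copy rather than overwriting the original.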

Related

How can I get a filesystem label from sysfs?

How can I get the label of a filesystem using /sys? I know I can get much of the info about a block device by going to /sys/class/block/<device>, e.g. /sys/class/block/sr1 for a CD that I know has the filesystem label config. I hunted through each item and found everything but the label.
I did dig through the lsblk source code, which, in turn, depends on calling udev_device_new_from_subsystem_sysname in libudev, so I went through that. It does appear to populate the property ID_FS_LABEL_ENC, but I cannot figure out where it takes it from in the tree, unless it is tracking it elsewhere?
I would just use libudev, but I need to access this from outside a C program.
I think the problem here is that you seem to think the label of a volume is a kernel thing, like the size or the free space.
But AFAIK it is not. The kernel doesn't care at all about volume labels; the label is just data that goes from the on-disk format to userland, and there is no kernel API to get that information. If you need it, you open the raw block device and read the data from there.
But then, there is the big issue that every filesystem is different, so you need special code to manage every single partition type there is. Fortunately, somebody has done the hard work, and you have blkid, part of util-linux available in most Linux distributions. If you need it, you can call the program directly, or link to the library libblkid that does the hard work.
Naturally, to use blkid/libblkid you need read access to the block device, which usually means root access. If you think that root access should not be needed to read a label, the udev people agree, and that is why there is a udev rule that records the label when the filesystem is first detected (by running blkid, of course). This is the ID_FS_LABEL_ENC you already know about.
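Since the goal is to read the label from outside a C program, one option is to ask udev for the properties it recorded. The following is a rough Python sketch, assuming udevadm is installed and that udev has already probed the device; the function names parse_properties and filesystem_label are invented for this example.

```python
import subprocess
from typing import Dict, Optional


def parse_properties(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines as printed by `udevadm info --query=property`."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip anything that is not a KEY=VALUE line
            props[key] = value
    return props


def filesystem_label(device: str) -> Optional[str]:
    """Return ID_FS_LABEL_ENC for a device node such as /dev/sr1, if udev has one."""
    result = subprocess.run(
        ["udevadm", "info", "--query=property", f"--name={device}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_properties(result.stdout).get("ID_FS_LABEL_ENC")
```

This reads from the udev database rather than the block device itself, so it should not need root, but it only returns what udev stored when it last probed the device.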

how to get added content of a file since last modification

I'm working on a project in golang that needs to index recently added file content (using a framework called bleve), and I'm looking for a way to get the content added to a file since its last modification. My current work-around is to record the last indexed position of each file, and during later indexing runs I only retrieve file content starting from the previously recorded position.
So I wonder if there's any library or built-in functionality for this? (doesn't need to be restricted to go, any language could work)
I'll really appreciate it if anyone has a better idea than my work-around as well!
Thanks
It depends on how the files change.
If the files are append-only, then you only need to record the last offset where you stopped indexing, and start from there.
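The append-only case can be sketched in a few lines. This is a Python illustration rather than Go, and the function name read_appended is hypothetical; restarting from offset 0 when the file has shrunk is an assumption about what you would want after truncation or rotation.

```python
import os
from typing import Tuple


def read_appended(path: str, last_offset: int) -> Tuple[bytes, int]:
    """Return the bytes added after last_offset, plus the offset to store for next time."""
    size = os.path.getsize(path)
    if size < last_offset:
        # Assumption: a shorter file means it was truncated or replaced,
        # so start again from the beginning.
        last_offset = 0
    with open(path, "rb") as f:
        f.seek(last_offset)
        data = f.read()
    return data, last_offset + len(data)
```

Store the returned offset per file and pass it back in on the next indexing run.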
If the changes can happen anywhere, and the changes mostly replace old bytes with new bytes (like changing pixels of an image), then perhaps you can consider computing a checksum for small chunks, and only indexing those chunks that have different checksums.
You can check out crypto package in Go standard library for computing hashes.
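As a rough illustration of the per-chunk checksum idea, here is a Python sketch that uses hashlib in place of Go's crypto package. The 4096-byte default chunk size and the function names are arbitrary choices for the example, and fixed-size chunks only work well when bytes are replaced in place rather than inserted or deleted.

```python
import hashlib
from typing import List


def chunk_digests(data: bytes, chunk_size: int = 4096) -> List[str]:
    """Hash each fixed-size chunk of data with SHA-256."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: List[str], new: List[str]) -> List[int]:
    """Return indexes of chunks in `new` that are added or differ from `old`."""
    return [
        i for i, digest in enumerate(new)
        if i >= len(old) or old[i] != digest
    ]
```

Keep the digest list from the previous run, recompute it for the current file, and re-index only the chunk indexes that changed_chunks reports.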
If the changes are line insertion/deletion to text files (like changes to source code), then maybe a diff algorithm can help you find the differences. Something like https://github.com/octavore/delta.
If you're running on a Unix-like system, you could just use tail. If you tell it to follow the file (tail -f), the process keeps waiting after reaching end of file. You can invoke it from your program with os/exec and pipe its Stdout to your program, which can then read from it periodically or with blocking reads.
The only way I can think of to do this natively in Go is like how you described. There's also a library that tries to emulate tail in Go here: https://github.com/hpcloud/tail

what's the difference between switch_root and run_init?

What's the difference between switch_root and run_init, besides switch_root being made by busybox while run_init is from klibc?
Thanks very much
They both perform exactly the same function, which is to switch to the "real" root and execv(3) the "real" init(8) program from an initramfs. They both assume that the filesystem that should become the root has been mounted on some directory, which they take as an argument.
(An initramfs is a (usually) temporary in-memory filesystem loaded by the bootloader. Its purpose is to do any setup that might be required before mounting the real root and switching to the real init program.)
Recent source code for run-init can be found here. run_init() is the entry point (called from run-init.c, which parses the arguments).
Recent source code for switch_root can be found here. switch_root_main() is the entry point.
The code is short for both implementations (though a bit tricky), which makes it easy to compare them by eye. The only difference seems to be that they perform slightly different sanity checks, and that recent versions of run-init have an extra option to drop selected capabilities(7) before execv()'ing the new init.

File structure of EDB file

I have an offline .EDB file (Exchange database) that I want to pull information from, such as the Computer name and the Flags etc. I have found the following offsets from http://www.edbsearch.com/edb.html, which indicate that the Computer name etc. comes from byte 0x24 0x10. However, looking at the following EDB file in 101 editor, the value appears to be nonexistent. It appears later on within the file, but not in a constant place.
Is there a constant offset from which I can reliably pull the Computer name in the .EDB file? I am working on backups from another computer, but all of the solutions that I have found are for live .EDB files, which are useless to me as I have offline databases.
Many thanks,
With database replication (CCR in 2007, DAGs in 2010+), the concept of a computer name isn't that helpful. After a failover/switchover, what should the computer name be?
I don't think that the Computer Name is populated anymore. If eseutil.exe -mh doesn't report it, then it's not there.
Also check out JetGetDatabaseFileInfo. http://msdn.microsoft.com/en-us/library/windows/desktop/gg269239(v=exchg.10).aspx Note that the documentation is for esent.dll (Windows), and that ese.dll (Exchange) is not documented. While esent.dll and ese.dll are very similar, and for simple things (such as this) you can treat them similarly and get away with it, they are NOT identical, and you will sometimes come upon incompatibilities. In other words: Do it at your own risk, your mileage may vary, etc. etc. :)
-martin

Building a Control-flow Graph using results from Objdump

I'm attempting to build a control-flow graph of the assembly results that are returned via a call to objdump -d. Currently the best method I've come up with is to put each line of the result into a linked list, and separate out the memory address, opcode, and operands for each line. I'm separating them out by relying on the regular nature of objdump results (the memory address is from character 2 to character 7 in the string that represents each line).
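Fixed character positions break as soon as addresses get wider, so splitting on the tab-separated layout of objdump -d output may be more robust. Below is a small Python sketch of that idea; the regular expression is my own guess at the layout based on typical GNU objdump x86 output, and the names Insn and parse_line are made up for the example.

```python
import re
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    addr: int
    raw: bytes
    mnemonic: str
    operands: str


# Assumed layout: "<spaces><hex addr>:<TAB><hex bytes><spaces><TAB><mnemonic> <operands>"
_LINE = re.compile(r"^\s*([0-9a-f]+):\t((?:[0-9a-f]{2} )+)\s*\t(\S+)\s*(.*)$")


def parse_line(line: str) -> Optional[Insn]:
    """Parse one disassembly line; return None for labels, blanks and byte-only lines."""
    m = _LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    addr, raw, mnemonic, operands = m.groups()
    return Insn(int(addr, 16), bytes.fromhex(raw), mnemonic, operands.strip())
```

Prefixed instructions such as "rep stos" would come out with the prefix as the mnemonic, so a real parser would need a little more care there.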
Once this is done I start the actual CFG construction. Each node in the CFG holds a starting and ending memory address, a pointer to the previous basic block, and pointers to any child basic blocks. I then go through the objdump results and compare each opcode against an array of all control-flow opcodes in x86_64. If the opcode is a control-flow one, I record the address as the end of the basic block and, depending on the opcode, add either two child pointers (conditional opcode) or one (call or return).
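For comparison, here is a compact Python sketch of the textbook "leaders" way of splitting instructions into basic blocks. It is not necessarily what is described above: it treats calls as ordinary fall-through instructions, gives returns no static successor, and leaves indirect jumps (target None) without a target edge. The input tuple shape and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: (address, mnemonic, direct branch target or None).
Insn = Tuple[int, str, Optional[int]]
# Block start -> (address of the block's last instruction, successor block starts).
Cfg = Dict[int, Tuple[int, List[int]]]


def is_jump(mnemonic: str) -> bool:
    # Assumption: in x86 disassembly every mnemonic starting with "j" is a jump.
    return mnemonic.startswith("j")


def is_unconditional(mnemonic: str) -> bool:
    return mnemonic in ("jmp", "jmpq")


def is_return(mnemonic: str) -> bool:
    return mnemonic.startswith("ret")


def build_cfg(insns: List[Insn]) -> Cfg:
    addrs = [addr for addr, _, _ in insns]
    known = set(addrs)

    # Pass 1: a leader is the first instruction, any known jump target,
    # and any instruction that follows a jump or return.
    leaders = set(addrs[:1])
    for i, (_, mnemonic, target) in enumerate(insns):
        if is_jump(mnemonic) or is_return(mnemonic):
            if target in known:
                leaders.add(target)
            if i + 1 < len(addrs):
                leaders.add(addrs[i + 1])

    # Pass 2: close a block whenever the next instruction is a leader.
    cfg: Cfg = {}
    start = None
    for i, (addr, mnemonic, target) in enumerate(insns):
        if addr in leaders:
            start = addr
        nxt = addrs[i + 1] if i + 1 < len(addrs) else None
        if nxt is not None and nxt not in leaders:
            continue  # still inside the current block
        successors = []
        if is_jump(mnemonic) and target in known:
            successors.append(target)
        falls_through = not (is_unconditional(mnemonic) or is_return(mnemonic))
        if falls_through and nxt is not None:
            successors.append(nxt)
        cfg[start] = (addr, successors)
    return cfg
```

A conditional jump ends up with two successors (target and fall-through) and an unconditional jump with one, which matches the two-versus-one child pointers described above for those cases.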
I'm in the process of implementing this in C, and it seems like it will work but feels very tenuous. Does anyone have any suggestions, or anything that I'm not taking into account?
Thanks for taking the time to read this!
edit:
The idea is to use it to compare stack traces of system calls generated by DynamoRIO against the expected CFG for a target binary, and I'm hoping that building it like this will facilitate that. I haven't re-used what's available because A) I hadn't really thought about it and B) I need to get the graph into a usable data structure so I can do path comparisons. I'm going to take a look at some of the utilities on the page you linked to; thanks for pointing me in the right direction. Thanks for your comments, I really appreciate it!
You should use an IL that was designed for program analysis. There are a few.
The DynInst project (dyninst.org) has a lifter that can translate from ELF binaries into CFGs for functions/programs (or it did the last time I looked). DynInst is written in C++.
BinNavi uses the output from IDA (the Interactive Disassembler) to build an IL out of control flow graphs that IDA identifies. I would also recommend a copy of IDA, as it will let you spot check CFGs visually. Once you have a program in BinNavi you can get its IL representation of a function/CFG.
Function pointers are just the start of your troubles for statically identifying the control flow graph. Jump tables (the kinds generated for switch case statements in certain cases, by hand in others) throw a wrench in as well. Every code analysis framework I know of deals with those in a very heuristics-heavy approach. Then you have exceptions and exception handling, and also self-modifying code.
Good luck! You're getting a lot of information out of the DynamoRIO trace already, I suggest you utilize as much information as you can from that trace...
I found your question since I was interested in looking for the same thing.
I found nothing and wrote a simple python script for this and threw it on github:
https://github.com/zestrada/playground/blob/master/objdump_cfg/objdump_to_cfg.py
Note that I have some heuristics to deal with functions that never return, the gcc stack protector on 32bit x86, etc... You may or may not want such things.
I treat indirect calls similar to how you do (basically have a node in the graph that is a source when returning from an indirect).
Hopefully this is helpful for anyone looking to do similar analysis with similar restrictions.
I was also facing a similar issue in the past and wrote the asm2cfg tool for this purpose: https://github.com/Kazhuu/asm2cfg. The tool supports GDB disassembly and objdump inputs and spits out the CFG as a dot or pdf file.
Hopefully someone finds this helpful!
