Seeing a static variable with hexdump - C

I am preparing for a lecture exam on security aspects of software development. I would like to know: is it always possible to read the value of a static char array from a binary with hexdump?
If not, on what factors does it depend whether I can read its value with a hex editor?
Thanks

If you can locate the variable in memory, you can read it with a hexdump - that's what hexdump programs are for. How easy it is to locate depends on how much information you have about the binary and what you know about its expected contents.

Assuming C, yes, in the simple case. However, there are methods to obfuscate such variables to limit reverse engineering.

Yes, but only if it is initialized at compile time; an uninitialized static array lives in .bss, which occupies no bytes in the binary file. For values assigned at runtime, you could get more from a core dump or a debugger.

Related

What does it mean to dump an array?

I am unfamiliar with the concept of dumping an array. Does it mean removing all the contents of an array? And what does "dumping" in general imply?
The term "dump" in computing is an old one, going back at least to the 1970s and probably a lot further than that.
Classically, it meant to output the values of something in an uninterpreted form (e.g. octal or hexadecimal) that could then be used when diagnosing a problem. (For example, I recall reading computer printouts with "register dumps" and "core dumps" to try to figure out why my CDC 6400 programs had crashed ... when I was an undergraduate in 1970-something.)
Without seeing the context, it sounds like "array dumping" is the same idea; i.e. output / display the array contents to see what is in it, for diagnostic purposes.
References:
Wikipedia - http://en.wikipedia.org/wiki/Core_dump
Whatis.com - http://whatis.techtarget.com/definition/core-dump
StackOverflow - https://stackoverflow.com/tags/coredump/info

Best way of creating large bit arrays in Lua

I want to read a large binary file (1 MB in size) into memory using Lua. The target device is mobile, so I very much want to minimise the memory footprint.
From a quick look online it seems that Lua tables will use 16 bytes for each sequential integer index (key) plus the space to store the value, which, as I am storing binary data, will hopefully only use 2 bits - but let's just say 1 byte.
For 1e6 records that will be 1e6 * 17 =~ 17 MB - which is huge!
From my brief reading it seems that I can use userdata to implement anything I want in C. I have not used C before, but it seems that it would use
1 bit * 1e6 = 125 kB
Shall I do this, or have I got something very wrong / is there an easier way?
Any advice or even name-calling for crappy calculations very much welcome :)
EDIT: Some interesting answers below about storing the data in a string (thanks!) and using bitwise ops. I just came across an example in the PIL book (3rd edition, p. 293) that compares storing arrays of booleans in C so that they use 3% of the memory. While this is cool and useful, it may be overkill for me, as the solutions below suggest I can fit in 1 MB, which is fine for me.
EDIT: Came across this C blob impl
EDIT: Solution - I read the file contents into a string as suggested, and as I'm using 5.1 I had to use a third-party bit-op lib - I went with a pure Lua implementation, LuaBit. Thanks everyone!!
You can store a big blob in a Lua string; it will work with any binary data. Now the question is what you want to do with the data. In any case, you can use string.byte to extract any individual byte, and Lua's bit32 library to get down to bits. (For Lua 5.1 and older, you'll either have to write your own C routines or use a third-party package.)
You can store the data in a string and manipulate it with the string library and Lua BitOp
Lua 5.2's built-in bit32 library is preferred if available.
If you want to read 1 MB into memory, you won't end up with 250 kB...
If you read the file into a Lua string, you end up with 1 MB, as Lua strings are just 8-bit clean bytes.
After that, you can process the data according to its structure, perhaps using the struct library.
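The PIL-style C alternative mentioned in the question's edit boils down to a packed bit array: one bit per boolean, so 1e6 entries need 1e6 / 8 = 125,000 bytes instead of ~17 MB of table overhead. A minimal sketch (the names are illustrative; exposing it to Lua would add the usual lua_newuserdata and metatable boilerplate):

```c
#include <stdlib.h>
#include <limits.h>

#define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned int))

/* Packed bit array: nbits booleans stored one bit each. */
typedef struct {
    size_t nbits;
    unsigned int words[];   /* flexible array member */
} BitArray;

static BitArray *bitarray_new(size_t nbits) {
    size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
    BitArray *a = calloc(1, sizeof *a + nwords * sizeof(unsigned int));
    if (a) a->nbits = nbits;
    return a;
}

static void bitarray_set(BitArray *a, size_t i, int v) {
    unsigned int mask = 1u << (i % BITS_PER_WORD);
    if (v) a->words[i / BITS_PER_WORD] |= mask;
    else   a->words[i / BITS_PER_WORD] &= ~mask;
}

static int bitarray_get(const BitArray *a, size_t i) {
    return (a->words[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1u;
}
```

For 1e6 bits this allocates roughly 125 kB plus a small header, matching the question's back-of-the-envelope figure.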

Readable text in disassembled code

Is there any widely used procedure for hiding readable strings? After debugging my code I found a lot of plain text. I could use some simple encryption (a Caesar cipher, etc.), but that solution will totally slow down my code. Any ideas? Thanks for the help.
No, there is no widely used method for hiding referenced strings.
At some point an accessed string would have to be decrypted, which would reveal the key/method, so your decryption becomes mere obfuscation. If somebody wants to read all your referenced strings, they could easily write a script to convert them all to readable form.
I can't think of any reason to obfuscate strings like that. They are only visible to someone who analyses your executable, and those people would at the same time be capable of reverse engineering your deobfuscation and applying it to all strings.
If secrecy of strings is vital to the security of your application, you have to rethink that.
Side note: there is no way that deciphering strings in C will slow down your application ... except if your application is full of strings and you do something very inefficient in the deciphering. Have you tested this?
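To illustrate the answer's point, here is the kind of scheme people typically reach for (a hypothetical single-byte XOR, not a recommendation): it keeps the plaintext out of `strings`, but the key and routine ship inside the binary, so it is obfuscation rather than secrecy - and the per-byte work is far too cheap to slow anything down measurably.

```c
#include <stddef.h>

/* The "key" is right here in the binary; anyone reversing the
 * executable can decode every string with it. */
enum { KEY = 0x5a };

/* XOR is its own inverse: the same routine obfuscates and
 * deobfuscates a buffer of n bytes in place. */
static void xor_buf(char *s, size_t n) {
    for (size_t i = 0; i < n; i++)
        s[i] ^= KEY;
}
```

Usage would be: store the pre-XORed bytes in the static array, call `xor_buf` just before use, and optionally XOR again afterwards so the plaintext never lingers in a long-lived buffer.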

How Debuggers Find Expressions From Code Lines

A debugger gets the line number of an expression and translates it into a program address - what does the implementation look like? I want to implement this in a program I'm writing, and the most promising library I've found to accomplish this is libbfd. All I would need is the address of the expression, and I can wait for it with ptrace(2). I can imagine that the debugger looks for the function name from the C file within the executable, but after that I'm lost.
Does anyone know? I don't need a code example, just enough info so that I can get an idea.
And I don't mind architecture-specific answers, the only ones I really care about are Arm and x86-64.
You should take a look at the DWARF2 format to understand how the mapping is done. Be warned that DWARF2 is vast and complex; it's not for everyone, but reading about it might satisfy your curiosity faster and more easily than reading the source for GCC/GDB.
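Conceptually, the DWARF line information (the .debug_line section) decodes into a table of rows mapping source locations to code addresses, which the debugger searches. A toy model, with made-up addresses and file names (the real section is a compressed state-machine program that consumers like GDB, libdw, or libdwarf expand into rows like these):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One decoded row of a DWARF-style line table: the address of the
 * first instruction generated for a given source line. */
typedef struct {
    uint64_t address;
    unsigned line;
    const char *file;
} LineRow;

/* Hypothetical decoded table for an imaginary main.c. */
static const LineRow table[] = {
    {0x401126, 5, "main.c"},
    {0x40112e, 6, "main.c"},
    {0x401139, 8, "main.c"},
};

/* What a debugger does for `break main.c:6`: find the row for that
 * source line, then plant the trap at its address (e.g. via ptrace). */
static uint64_t addr_for_line(const char *file, unsigned line) {
    for (size_t i = 0; i < sizeof table / sizeof *table; i++)
        if (table[i].line == line && strcmp(table[i].file, file) == 0)
            return table[i].address;
    return 0;   /* no code was generated for that line */
}
```

The hard part in practice is the decoding step, which is exactly what libraries such as libdwarf hide; this lookup side is the easy half.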

Building a Control-flow Graph using results from Objdump

I'm attempting to build a control-flow graph from the assembly returned by a call to objdump -d. Currently the best method I've come up with is to put each line of the result into a linked list and separate out the memory address, opcode, and operands for each line. I'm separating them by relying on the regular nature of objdump output (the memory address runs from character 2 to character 7 in the string that represents each line).
Once this is done I start the actual CFG construction. Each node in the CFG holds a starting and ending memory address, a pointer to the previous basic block, and pointers to any child basic blocks. I then go through the objdump results and compare each opcode against an array of all control-flow opcodes in x86_64. If the opcode is a control-flow one, I record the address as the end of the basic block and, depending on the opcode, add either two child pointers (conditional opcode) or one (call or return).
I'm in the process of implementing this in C, and it seems like it will work but feels very tenuous. Does anyone have any suggestions, or anything that I'm not taking into account?
Thanks for taking the time to read this!
edit:
The idea is to use it to compare stack traces of system calls generated by DynamoRIO against the expected CFG for a target binary; I'm hoping that building it like this will facilitate that. I haven't re-used what's available because A) I hadn't really thought about it and B) I need to get the graph into a usable data structure so I can do path comparisons. I'm going to take a look at some of the utilities on the page you linked to; thanks for pointing me in the right direction. Thanks for your comments, I really appreciate it!
You should use an IL that was designed for program analysis. There are a few.
The DynInst project (dyninst.org) has a lifter that can translate from ELF binaries into CFGs for functions/programs (or it did the last time I looked). DynInst is written in C++.
BinNavi uses the output from IDA (the Interactive Disassembler) to build an IL out of the control-flow graphs that IDA identifies. I would also recommend a copy of IDA; it will let you spot-check CFGs visually. Once you have a program in BinNavi you can get its IL representation of a function/CFG.
Function pointers are just the start of your troubles for statically identifying the control flow graph. Jump tables (the kinds generated for switch case statements in certain cases, by hand in others) throw a wrench in as well. Every code analysis framework I know of deals with those in a very heuristics-heavy approach. Then you have exceptions and exception handling, and also self-modifying code.
Good luck! You're getting a lot of information out of the DynamoRIO trace already, I suggest you utilize as much information as you can from that trace...
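The block-splitting rule the question describes can be sketched as follows: classify each mnemonic, and end a basic block at any control transfer. The mnemonic list here is illustrative, not a complete x86-64 set (many jcc variants, loop instructions, and indirect forms are omitted), which is one reason the answers above recommend an existing lifter:

```c
#include <string.h>
#include <stddef.h>

/* Partial, illustrative list of x86-64 control-transfer mnemonics. */
static const char *const branch_ops[] = {
    "jmp", "je", "jne", "jg", "jge", "jl", "jle", "ja", "jb",
    "call", "ret",
};

/* A basic block ends at any control-transfer instruction. */
static int ends_basic_block(const char *mnemonic) {
    for (size_t i = 0; i < sizeof branch_ops / sizeof *branch_ops; i++)
        if (strcmp(mnemonic, branch_ops[i]) == 0)
            return 1;
    return 0;
}

/* A conditional jump yields two successors (taken + fall-through);
 * jmp, call, and ret yield one edge in this simplified scheme,
 * matching the question's two-vs-one child-pointer rule. */
static int successor_count(const char *mnemonic) {
    if (!ends_basic_block(mnemonic)) return 0;
    if (strcmp(mnemonic, "jmp") == 0 || strcmp(mnemonic, "ret") == 0 ||
        strcmp(mnemonic, "call") == 0)
        return 1;
    return 2;
}
```

Indirect jumps and jump tables break this mnemonic-only classification, since the successor addresses aren't visible in the instruction text; that is where the heuristics mentioned above come in.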
I found your question since I was interested in looking for the same thing.
I found nothing and wrote a simple python script for this and threw it on github:
https://github.com/zestrada/playground/blob/master/objdump_cfg/objdump_to_cfg.py
Note that I have some heuristics to deal with functions that never return, the gcc stack protector on 32bit x86, etc... You may or may not want such things.
I treat indirect calls similar to how you do (basically have a node in the graph that is a source when returning from an indirect).
Hopefully this is helpful for anyone looking to do similar analysis with similar restrictions.
I was also facing a similar issue in the past and wrote the asm2cfg tool for this purpose: https://github.com/Kazhuu/asm2cfg. The tool supports GDB disassembly and objdump inputs and emits the CFG as a dot file or PDF.
Hopefully someone finds this helpful!
