Code
#define TEXT_LEN 256
#define NUM_NUMBERS (2*65536)
int numNumbers = NUM_NUMBERS;
Command (run over PuTTY) to inspect the global variable numNumbers
objdump -s -j .data assign1-0
Output of command
602070 00000000 00000200
Hello,
Can someone help me understand this output, or tell me whether I used the wrong command?
I'm trying to find the global variable numNumbers using objdump.
I'm pretty sure the output should be 00020000, because numNumbers is 131072 (2*65536), but it comes out as 00000200, which is 512 when converted from hexadecimal to decimal.
Am I reading it wrong and the output is correct, or is the command wrong for finding a global variable?
You are probably on a little-endian computer, and so the bytes that make up your int are not in the order in which you're used to reading digits as a human. Familiarize yourself with the concept of endianness.
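To see this concretely, here is a minimal sketch (not from the original thread; it reuses the variable from the question) that prints the int one byte at a time, in memory order:

#include <stdio.h>

int numNumbers = 2 * 65536;   /* 131072 = 0x00020000 */

int main(void)
{
    const unsigned char *p = (const unsigned char *)&numNumbers;

    /* Dump the int one byte at a time, in the order the bytes sit in memory. */
    for (size_t i = 0; i < sizeof numNumbers; i++)
        printf("%02x ", p[i]);
    printf("\n");
    /* On little-endian x86-64 this prints "00 00 02 00",
       which is exactly the byte sequence objdump -s shows. */
    return 0;
}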
Related
I recently started learning assembly language for the Intel x86-64 architecture using YASM. While solving one of the tasks suggested in a book (by Ray Seyfarth) I came to the following problem:
When I place some characters into a buffer in the .bss section, I still see an empty string while debugging it in gdb. Placing characters into a buffer in the .data section shows up as expected in gdb.
segment .bss
result resb 75
buf resw 100
usage resq 1
segment .data
str_test db 0, 0, 0, 0
segment .text
global main
main:
mov rbx, 'A'
mov [buf], rbx ; LINE 1 - still get an empty string after this instruction
mov [str_test], rbx ; LINE 2 - places the character nicely
ret
In gdb I get:
after LINE 1: x/s &buf, result - 0x7ffff7dd2740 <buf>: ""
after LINE 2: x/s &str_test, result - 0x601030: "A"
It looks like &buf isn't evaluating to the correct address, so it still sees all-zeros. 0x7ffff7dd2740 isn't in the BSS of the process being debugged, according to its /proc/PID/maps, so that makes no sense. Why does &buf evaluate to the wrong address, but &str_test evaluates to the right address? Neither are "global" symbols, but we did build with debug info.
Tested with GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10 on x86-64 Ubuntu 15.10.
I'm building with
yasm -felf64 -Worphan-labels -gdwarf2 buf-test.asm
gcc -g buf-test.o -o buf-test
nm on the executable shows the correct symbol addresses:
$ nm -n buf-test # numeric sort, heavily edited to omit symbols from glibc
...
0000000000601028 D __data_start
0000000000601038 d str_test
...
000000000060103c B __bss_start
0000000000601040 b result
000000000060108b b buf
0000000000601153 b usage
(editor's note: I rewrote a lot of the question because the weirdness is in gdb's behaviour, not the OP's asm!).
glibc includes a symbol named buf, as well.
(gdb) info variables ^buf$
All variables matching regular expression "^buf$":
File strerror.c:
static char *buf;
Non-debugging symbols:
0x000000000060108b buf <-- this is our buf
0x00007ffff7dd6400 buf <-- this is glibc's buf
gdb happens to choose the symbol from glibc over the symbol from the executable. This is why ptype buf shows char *.
Using a different name for the buffer avoids the problem, and so does declaring it with global buf so it becomes a global symbol. You also wouldn't have a problem if you wrote a stand-alone program that didn't link libc (i.e. define _start and make an exit system call instead of running a ret).
Note that 0x00007ffff7dd6400 (address of buf on my system; different from yours) is not actually a stack address. It visually looks like a stack address, but it's not: it has a different number of f digits after the 7. Sorry for that confusion in comments and an earlier edit of the question.
Shared libraries are also loaded near the top of the low 47 bits of virtual address space, near where the stack is mapped. They're position-independent, but a library's BSS space has to be in the right place relative to its code. Checking /proc/PID/maps again more carefully, gdb's &buf is in fact in the rwx block of anonymous memory (not mapped to any file) right next to the mapping for libc-2.21.so.
7ffff7a0f000-7ffff7bcf000 r-xp 00000000 09:7f 17031175 /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7bcf000-7ffff7dcf000 ---p 001c0000 09:7f 17031175 /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dcf000-7ffff7dd3000 r-xp 001c0000 09:7f 17031175 /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dd3000-7ffff7dd5000 rwxp 001c4000 09:7f 17031175 /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dd5000-7ffff7dd9000 rwxp 00000000 00:00 0 <--- &buf is in this mapping
...
7ffffffdd000-7ffffffff000 rwxp 00000000 00:00 0 [stack] <---- more FFs before the first non-FF than in &buf.
A normal call instruction with a rel32 encoding can't reach a library function, but it doesn't need to because GNU/Linux shared libraries have to support symbol interposition, so calls to library functions actually jump to the PLT, where an indirect jmp (with a pointer from the GOT) goes to the final destination.
I'm on an Ubuntu 18.04 laptop coding C with VSCode and compiling it with GNU's gcc.
I'm doing some basic reverse engineering on my own C code and I noticed a few interesting details, one of which is the pair []A\A]A^A_ and ;*3$" that seems to appear in every one of my compiled C binaries. Between them are usually (or always) the strings that I hard-code for printf() calls.
An example is this short piece of code here:
#include <stdio.h>
#include <stdbool.h>
int f(int i);
int main()
{
int x = 5;
int o = f(x);
printf("The factorial of %d is: %d\n", x, o);
return 0;
}
int f(int i)
{
if(i == 0)
{
return 1; /* base case: 0! is 1 */
}
else
{
return i*f(i-1);
}
}
... is then compiled using gcc test.c -o test.
When I run strings test, the following is output:
/lib64/ld-linux-x86-64.so.2
0HSn(
libc.so.6
printf
__cxa_finalize
__libc_start_main
GLIBC_2.2.5
_ITM_deregisterTMCloneTable
__gmon_start__
_ITM_registerTMCloneTable
AWAVI
AUATL
[]A\A]A^A_
The factorial of %d is: %d
;*3$"
GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
crtstuff.c
deregister_tm_clones
__do_global_dtors_aux
completed.7697
__do_global_dtors_aux_fini_array_entry
frame_dummy
__frame_dummy_init_array_entry
test.c
__FRAME_END__
__init_array_end
_DYNAMIC
__init_array_start
__GNU_EH_FRAME_HDR
_GLOBAL_OFFSET_TABLE_
__libc_csu_fini
_ITM_deregisterTMCloneTable
_edata
printf@@GLIBC_2.2.5
__libc_start_main@@GLIBC_2.2.5
__data_start
__gmon_start__
__dso_handle
_IO_stdin_used
__libc_csu_init
__bss_start
main
__TMC_END__
_ITM_registerTMCloneTable
__cxa_finalize@@GLIBC_2.2.5
.symtab
.strtab
.shstrtab
.interp
.note.ABI-tag
.note.gnu.build-id
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt.got
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.dynamic
.data
.bss
.comment
As with other programs I've written, the two pieces []A\A]A^A_ and ;*3$" always pop up, one before the strings used with printf and one right after.
I'm curious: what exactly do those strings mean? I'm guessing they mainly mark the beginning and ending of the use of hard-coded output strings.
Our digital computers work on bits, most commonly clustered in bytes containing 8 bits each. The meaning of such a combination depends on the context and the interpretation.
A non-exhaustive list of possible interpretations is:
ASCII characters with the eighth bit ignored or accepted only if 0;
signed or unsigned 8-bit integer;
operation code (or part of one) of some specific machine language; each processor (family) has its own set.
For example, the hex value 0x43 can be seen as:
ASCII character 'C';
Unsigned 8-bit integer 67 (signed is the same if 2's complement is used);
Operation code "LD B,E" for a Z80 CPU (see, I'm really old and learned that processor in depth);
Operation code "EORS ari" for an ARM CPU.
Now strings simply (not to say "primitively") scans through the given file and tries to interpret the bytes as sequences of printable ASCII characters. By default a sequence has to have at least 4 characters and the bytes are interpreted as 7-bit ASCII. BTW, the file does not have to be an executable. You can scan any file, but if you give it an object file, by default it scans only the sections that are loaded in memory.
So what you see are sequences of bytes which by chance are at least 4 printable characters in a row. And because some patterns are always in an executable, it just looks as if they have a special meaning. Actually they do, but they don't have to relate to your program's strings.
You can use strings to quickly peek into a file to find, well, strings which might help you with whatever you're trying to accomplish.
What you're seeing is an ASCII representation of a particular bit pattern that happens to be common in executable programs generated by that particular compiler. The pattern might correspond to a particular sequence of machine language instructions which the compiler is fond of emitting. Or it might correspond to a particular data structure which the compiler or linker uses to mark the various other pieces of data stored in the executable.
Given enough work, it would probably be possible to work out the actual details, for your C code and your particular version of your particular compiler, precisely what the bit patterns behind []A\A]A^A_ and ;*3$" correspond to. But I don't do much machine-language programming any more, so I'm not going to try, and the answers probably wouldn't be too interesting in the end, anyway.
But it reminds me of a little quirk which I have noticed and can explain. Suppose you wrote the very simple program
int i = 12345;
If you compiled that program and ran strings on it, and if you told it to look for strings as short as two characters, you'd probably see (among lots of other short, meaningless strings) the string
90
and that bit pattern would, in fact, correspond to your variable! What's up with that?
Well, 12345 in hexadecimal is 0x3039, and most machines these days are little-endian, so those two bytes in memory are stored in the other order as
39 30
and in ASCII, 0x39 is '9', while 0x30 is '0'.
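You can check this with a short sketch along these lines (added here as an illustration, assuming a little-endian machine); it prints the first two bytes of the variable as characters:

#include <stdio.h>

int i = 12345;   /* 0x3039, stored as the bytes 39 30 on a little-endian machine */

int main(void)
{
    const unsigned char *p = (const unsigned char *)&i;
    printf("%c%c\n", p[0], p[1]);   /* prints "90" */
    return 0;
}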
And if this is interesting to you, you can try compiling the program fragment
int i = 12345;
long int a = 1936287860;
long int b = 1629516649;
long int c = 1953719668;
long long int x = 48857072035144;
long long int y = 36715199885175;
and running strings -2 on it, and seeing what else you get.
I wrote the simple C program (test.c) below:-
#include<stdio.h>
int main()
{
return 0;
}
and executed the following to understand size changes in the .bss segment:
gcc test.c -o test
size test
The output came out as:-
text data bss dec hex filename
1115 552 8 1675 68b test
I didn't declare anything globally or with static scope, so please explain why the bss segment size is 8 bytes.
I made the following change:-
#include<stdio.h>
int x; //declared global variable
int main()
{
return 0;
}
But to my surprise, the output was the same as before:-
text data bss dec hex filename
1115 552 8 1675 68b test
Please explain.
I then initialized the global:-
#include<stdio.h>
int x=67; //initialized global variable
int main()
{
return 0;
}
The data segment size increased as expected, but I didn't expect the size of the bss segment to drop to 4 (compared with 8 when nothing was declared). Please explain.
text data bss dec hex filename
1115 556 4 1675 68b test
I also tried the commands objdump and nm, and they do show variable x occupying .bss (in the second case). However, no change in bss size is shown by the size command.
I followed the procedure according to:
http://codingfox.com/10-7-memory-segments-code-data-bss/
where the outputs come out exactly as expected.
When you compile a simple main program you are also linking in startup code.
This code is responsible, among other things, for initializing .bss.
That code is what "uses" the 8 bytes you are seeing in the .bss section.
You can strip that code out using the -nostartfiles gcc option:
-nostartfiles
Do not use the standard system startup files when linking. The standard system libraries are used normally, unless -nostdlib or -nodefaultlibs is used
To make a test use the following code
#include<stdio.h>
int _start()
{
return 0;
}
and compile it with
gcc -nostartfiles test.c
You'll see .bss set to 0:
text data bss dec hex filename
206 224 0 430 1ae test
Your first two snippets are identical since you aren't using the variable x.
Try this
#include<stdio.h>
volatile int x;
int main()
{
x = 1;
return 0;
}
and you should see a change in .bss size.
Please note that those 4/8 bytes are something inside the start-up code. What they are and why the size varies isn't possible to tell without digging into all the details of the mentioned start-up code.
I am playing with the Unix hexdump utility. My input file is UTF-8 encoded, containing a single character ñ, which is C3 B1 in hexadecimal UTF-8.
hexdump test.txt
0000000 b1c3
0000002
Huh? This shows B1 C3 - the reverse of what I expected! Can someone explain?
To get the expected output I do:
hexdump -C test.txt
00000000 c3 b1 |..|
00000002
I thought I understood encoding systems.
This is because hexdump defaults to using 16-bit words and you are running on a little-endian architecture. The byte sequence b1 c3 is thus interpreted as the hex word c3b1. The -C option forces hexdump to work with bytes instead of words.
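Here is a minimal sketch (not part of the original answer; it assumes a little-endian host) showing how those same two bytes come out when read back as a single 16-bit word:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    unsigned char bytes[2] = { 0xc3, 0xb1 };   /* UTF-8 encoding of 'ñ' */
    uint16_t word;

    memcpy(&word, bytes, sizeof word);   /* reinterpret the byte pair as one 16-bit word */
    printf("%04x\n", word);              /* prints "b1c3" on a little-endian machine,
                                            matching the default hexdump output */
    return 0;
}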
I found two ways to avoid that:
hexdump -C file
or
od -tx1 < file
I think it is stupid that hexdump decided that files are usually little-endian 16-bit words. Very confusing IMO.
I am trying to insert an md5 hash of part of my binary into the binary itself, to keep track of the MCU FW version.
I have approached it like this:
In the linker script I have split the flash into two sections:
MEMORY
{
FLASH0 (rx) : ORIGIN = 0x8000000, LENGTH = 64K - 16
FLASH1 (r) : ORIGIN = 0x800FFF0, LENGTH = 16
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 8K
}
Then I have specified an output section like so:
.fw_version :
{
KEEP(*(.fw_version))
} >FLASH1
Next I have my firmware_version.c file containing only:
#define FW_VERSION_SIZE 16
const unsigned char FW_VERSION[FW_VERSION_SIZE]
__attribute__((section(".fw_version"), used)) = {0};
Then, after the binary is compiled and objcopy has been used to create a .bin file, I have a 65536 B file. I split that file at 65520 bytes, compute an md5 checksum of the first part and insert it into the second part (16 B). Lastly I do cat parta partb > final.bin.
When I examine this binary with hexdump I can see that the md5 checksum is indeed at the end.
Using objdump -h I get:
...
8 .fw_version 00000010 0800fff0 0800fff0 00017ff0 2**2
...
and objdump -t gives:
...
0800fff0 g O .fw_version 00000010 FW_VERSION
...
I thought this meant that I could just use FW_VERSION[i] to get part i of the md5 checksum from within the MCU FW, but when I examine the memory in gdb it is all zeroed out, as if it was never changed.
What am I missing here?
[edit] The device is an STM32F030C8T6 (ARM Cortex-M0), programmed through gdb.
As I commented under the question, I found that (one) reason it wasn't working was that I was manipulating the .bin file, while I was loading the .elf file when programming with gdb.
It should (could) have worked if I had used a programmer or bootloader to download the .bin file to the target.
I found a better (I think) way of doing it, though:
Compile all the sources in the project to .o files.
cat *.o > /tmp/tmp.something_unique. I used $(shell mktemp) in the Makefile
openssl dgst -md5 -binary /tmp/tmp.something_unique > version_file
objcopy -I binary -O elf32-littlearm -B arm version_file v_file.o
linkscript has a section .fw_version : { KEEP(v_file.o(.data)) } >FLASH1
link application
in the application, get the address of the version number by doing extern unsigned char _binary_version_file_start; uint8_t *FW_VERSION = &_binary_version_file_start; const size_t FW_VERSION_SIZE = (size_t) &_binary_version_file_size;. Note that the uses of & are correct (see the sketch below).
This will result in the checksum being taken over all the objects that are compiled from source and then this checksum is linked into the binary that is flashed in the target.
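For reference, the last step might look roughly like this in C (a sketch only; it assumes the symbol names objcopy generates for a raw input file named version_file, as described above):

#include <stddef.h>
#include <stdint.h>

/* Symbols created by objcopy from the raw file "version_file".
   Only their addresses are meaningful, never their values. */
extern unsigned char _binary_version_file_start;
extern unsigned char _binary_version_file_size;

static const uint8_t *FW_VERSION = &_binary_version_file_start;
#define FW_VERSION_SIZE ((size_t)&_binary_version_file_size)

/* Example use: copy the embedded checksum into a caller-provided buffer. */
void get_fw_version(uint8_t out[16])
{
    for (size_t i = 0; i < FW_VERSION_SIZE && i < 16; i++)
        out[i] = FW_VERSION[i];
}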