I am wondering whether, in assembly (in my case Y86), it is possible to have an array inside of an array, and if it is, how I would access the elements of that inner array. I know you dereference an array to get its elements, but I only know how to do that with a single array. Is there a way to get an element of an array that is stored inside another array?
Here is an example, because that's challenging to explain in words:
Normal grab of an element:
array1:
.long 0x0
.long 0x0
.long 0x0
.long 0x0
Main:
pushl %ebp
rrmovl %esp,%ebp
irmovl array1,%edx #store array1 on the stack
pushl %edx
mrmovl (%edx), %eax #get the first element of array1
rrmovl %ebp, %esp
popl %ebp
ret
Now say I have this:
array1:
.long 0x0
.long 0x0
.long 0x0
.long 0x0
array2:
.long array1
Am I able to access array2 element one and then access array1's elements?
The pushl %edx does not store the array on the stack, only the memory address of its first element.
In your other example the first element of array2 is a 32-bit integer value equal to the memory address of array1, so in C terms array2 is an array of pointers.
When you fetch the first element of array2 into some register, you have a "pointer" (a memory address) in it, and by fetching the value from that address you fetch the first element of array1 (or you can add some offset to it to fetch further elements).
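In C terms, the layout from your question corresponds roughly to this (just a sketch; the variable first is my own name for illustration):
long array1[4] = {0, 0, 0, 0};
long *array2[1] = { array1 };   /* array2[0] holds the address of array1 */
long first = *array2[0];        /* fetch the pointer, then dereference it: array1[0] */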
This "array of pointers to arrays" pattern is often used when you have several arrays of same/similar type with different lengths and you want to store them continuously in memory, for example:
array0:
.long 1, 2, 3
array1:
.long 4
array2:
.long 5, 6, 7, 8
array3:
.long 9, 10
; notice the values in memory form a continuous block of 10 long values,
; so you can also access them through "array0" as single array of 10 values
mainArray:
.long array0, array1, array2, array3
Now if you want the value at "[2, 3]", i.e. the value "8", you can't simply multiply the row index 2 by a fixed "column size" like in the matrix16x16 example, because the rows don't have a fixed length. Instead you first calculate an offset into mainArray, like this (I will use x86 AT&T syntax, because I don't know Y86, but you should be able to get the idea, as they are basically the same instructions; Y86 just has a more limited instruction set and a more verbose syntax):
; edi = 2 (row), esi = 3 (column)
movl mainArray(, %edi, 4), %ebx ; ebx = mainArray[2] (*4 because pointers are 32 bit)
; (in x86 AT&T syntax the target memory address is "edi*4 + mainArray")
; here ebx = array2 (and "array2" is symbolic name for memory address value)
; it is NOT whole array in single register, just the memory address of first element
movl (%ebx, %esi, 4), %eax ; eax = 8 (array2[3]) (again *4 because longs are used)
; (the target memory address is "ebx + esi*4")
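In C terms, those two fetches correspond roughly to this (a sketch reusing the data from the example; row and value are my names):
long array0[] = {1, 2, 3}, array1[] = {4}, array2[] = {5, 6, 7, 8}, array3[] = {9, 10};
long *mainArray[] = { array0, array1, array2, array3 };
long *row  = mainArray[2];   /* first fetch: the pointer stored at mainArray + 2*4 */
long value = row[3];         /* second fetch: array2[3] == 8, read from row + 3*4 */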
Sorry for not using Y86, but as I said, I don't know it... If you have a hard time deciphering the x86 example, try describing your difficulties in a comment; I may eventually try to convert the syntax to Y86, or maybe somebody else will suggest fixes...
Am I able to access array2 element one and then access array1's elements?
Yes, of course. Those values are just ordinary 32-bit integers (memory addresses are too, on your Y86 platform), so you can fetch the address of the sub-array from the top array, and then fetch the value from that sub-array's address to reach the "value". Check in a debugger how the memory looks after your arrays are defined, and how those values represent your original source code.
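A rough Y86 sketch of that two-step fetch, based on the syntax in your own snippet (untested, since as said I don't know Y86):
irmovl array2, %edx    # edx = address of array2
mrmovl (%edx), %edx    # edx = array2[0], i.e. the address of array1
mrmovl (%edx), %eax    # eax = array1[0]
mrmovl 4(%edx), %ebx   # ebx = array1[1] (each .long is 4 bytes)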
Assembly is so simple and low-level that it is quite tedious to write complex abstractions in it, but as long as we are talking about a single instruction or memory access, expect the thing to be super simple. If you see some complexity there, you are probably misunderstanding what is happening under the hood: it's all just 0/1 bit values being moved around, usually in common quantities like 8, 16, 32 or 64 bits, which are natively supported as byte/short/long/...; for other group sizes you often need several instructions to get the desired result. The complexity comes from how to write the algorithm with only simple copy/plus/minus instructions.
All, I have an interesting question about memory alignment for arrays in C. My OS is 32-bit Ubuntu, and I compile with gcc -S -fno-stack-protector.
Code:
char array1[5] = "aaaaa";
char array2[8];
array2[0] = 'b';
The assembly code:
pushl %ebp
movl %esp, %ebp # esp and ebp are pointing to the same word
subl $16, %esp # move esp 16 bytes lower (reserve 16 bytes)
movl $1633771873, -5(%ebp) # input "aaaa"
movb $97, -1(%ebp) # input 'a'
movb $98, -13(%ebp) # input 'b'
movl $0, %eax
leave
I used GDB to inspect the memory:
%ebp is efe8,
%esp is efd8,
&buf1 is efe3,
&buf2 is efdb.
In the GDB, I run x/4bd 0xbfffefd8, it shows
0xbfffefd8: 9 -124 4 98
if I run x/bd 0xbfffefd8, it shows
0xbfffefd8: 9
if I run x/bd 0xbfffefdb, it shows
0xbfffefdb: 98
So the memory looks like this
## high address ##
? efe8 <-- ebp
97 97 97 97 efe4
0 -80 -5 97(a) efe0
0 0 0 0 efdc
9 -124 4 98(b) efd8 <-- esp
^ ^
| |
efd8 efdb
Now my questions are:
Why is the character 'b' (98) at efdb, while %esp is efd8? I think 'b' should also be at efd8, because that is the start of the 4-byte word. Furthermore, if I keep filling more 'b' characters into buf2, which starts at efdb, it can only hold 5 of them, not 8. How come? And what about the '\0'?
The same thing happens with buf1: it starts at efe3, not efe0. What kind of alignment is this? It does not make sense to me.
The assembly code does not show the 16-byte alignment I have seen elsewhere, like this:
andl $-16, %esp # this aligns esp to 16 boundary
When does the andl instruction appear and when not? It is very common, so I expected to see it in every program.
From the assembly code above, I could not see any memory alignment. Is that always true? My understanding is that the assembly code is just a translation of the (very readable) high-level code into not-very-readable code that still conveys exactly the same meaning, so char[5] is not translated in a way that takes memory alignment into account. The memory alignment should then happen at run time. Am I right? But the GDB session shows exactly the same thing as the assembly code: no alignment at all.
Thanks.
I see nothing wrong here. TLDR answer: char arrays are aligned to 1 byte, the compiler is right.
Digging a bit further. On my 64-bit machine, using GCC 7 with the -m32 option, I ran and debugged the same code and I got the same results:
(gdb) x/4bd $esp+12
0xffffcdd4: 97 97 97 97
(gdb) x/4bd $esp+8
0xffffcdd0: 0 -48 -7 97
(gdb) x/4bd $esp+4
0xffffcdcc: 0 0 0 0
(gdb) x/4bd $esp+0
0xffffcdc8: 41 85 85 98
The addresses differ, of course, and that's fine. Now, let me try to explain.
First, $esp is aligned to 4 bytes, as expected:
(gdb) p $esp
$9 = (void *) 0xffffcdc8
So far, so good. Now, because we know that char arrays have a default alignment of 1, let's try to figure out what happened at compile time. First, the compiler saw array1[5] and put it on the stack, but because it was 5 bytes wide it had to extend into a 2nd dword. So, the first dword is full of 'a' while just 1 byte of the 2nd dword is used. Now, array2[8] is placed immediately after (or before, depending on how you look at things) array1[5]. It extends over 3 dwords, ending on the dword pointed to by $esp.
So, we have:
[esp + 0] <3 bytes of garbage /* no var */>, 'b' /* array2 */,
[esp + 4] 0x0, 0x0, 0x0, 0x0, /* still array2 */
[esp + 8] <3 bytes of garbage /* still array2 */>, 'a' /* array1 */,
[esp + 12] 'a', 'a', 'a', 'a', /* still array1 */.
If you add a char array3[2] after array2, you'll see it using the same dword pointed to by $esp, and you'll still have 1 byte of garbage between $esp and your array3[2].
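If you want to check the layout yourself, a quick sketch (my own addition, not part of the original code) is to print the addresses and compare them:
#include <stdio.h>
int main(void) {
    char array1[5] = "aaaaa";
    char array2[8];
    array2[0] = 'b';
    /* with the layout discussed above, array1 should print the higher address,
       but the compiler is free to order the two arrays however it likes */
    printf("array1 at %p, array2 at %p\n", (void *)array1, (void *)array2);
    return 0;
}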
The compiler is absolutely allowed to lay the arrays out this way. If you want your char arrays to be aligned to 4 bytes (but you need a good reason for that!), you have to use special compiler attributes like:
__attribute__ ((aligned(4)))
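For example, a minimal sketch (assuming GCC, as in the question):
char array1[5] __attribute__ ((aligned(4))) = "aaaaa";
char array2[8] __attribute__ ((aligned(4)));   /* both arrays now start on 4-byte boundaries */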
Say we are given a function:
int exchange(int *xp, int y)
{
    int x = *xp;
    *xp = y;
    return x;
}
So, the book I am reading explains that xp and y are stored at offsets 8 and 12, respectively, relative to the address in register %ebp. What I am not understanding is why they are stored at those particular offsets; furthermore, what is an offset in this context? Finally, how do 8 and 12 fit in when data is moved in units of 1, 2, and 4 bytes?
The assembly code:
xp at %ebp+8, y at %ebp+12
movl 8(%ebp), %edx   # Get xp
movl (%edx), %eax    # Get x at xp (by copying it to %eax, x becomes the return value)
movl 12(%ebp), %ecx  # Get y
movl %ecx, (%edx)    # Store y at xp
What I think the answer is:
So, when examining registers, it was common to see something like register %rdi holding a value of 0x1004, which is an address, and the address 0x1004 holds a value of 0xAA.
Of course, this is a hypothetical example that doesn't line up with the registers listed in the book. Each register is 16-32 bits, and the top four can be used to store integers freely. Does offsetting it by 8 make it akin to 0x1000 + 8? Again, I'm not entirely sure what the offset in this scenario is for when we are storing new units into empty space.
Because of how the call stack is structured when using the C (cdecl) calling convention.
First the caller will push the 4-byte y, then the 4-byte xp (this order is important so C can support variadic functions), then the call to your function will implicitly push the return address, which is also 4 bytes (this is a 32-bit program).
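As a sketch, the caller side might look like this (the register choice and the literal 7 are mine, just for illustration):
pushl $7          # push y (the last argument goes first)
pushl %eax        # push xp, assuming the caller has its address in %eax
call exchange     # implicitly pushes the 4-byte return address
addl $8, %esp     # cdecl: the caller removes the two arguments afterwards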
The first thing your function does is push the current state of ebp, which it will need to recover later so that the caller can continue working properly, and then copy the current state of esp (the stack pointer) into ebp. In sum:
push %ebp
movl %esp, %ebp
This is also known as function prologue.
When all this is done you are finally ready to actually run the code you wrote; at this stage the stack looks something like this:
%ebp- ? = address of your local variables (which in this example you don't have)
%ebp+ 0 = address of the saved state of previous ebp
%ebp+ 4 = ret address
%ebp+ 8 = address where is stored the value of xp
%ebp+12 = address where is stored the value of y
%ebp+16 = out of bounds, this memory space belongs to the caller
When your function is done it will wrap it up by setting esp back to ebp, then pop the original ebp and ret.
movl %ebp, %esp
pop %ebp
ret
ret is basically a shortcut to pop the return address from the stack and jmp to it.
Edit: Fixed order of parameters for AT&T assembly
Look at the normal function entry in assembler:
push ebp
mov ebp, esp
sub esp, <size of local variables>
So [ebp] holds the previous value of ebp, and the return address sits just above it at ebp+4. Above that are the parameters of the function: since they were pushed in reverse order, the first parameter ends up at ebp+8 and the second at ebp+12.
From what I understand, the stack is used in a function to store all the local variables that are declared.
I also understand that the bottom of the stack corresponds to the largest address, and the top to the smallest ones.
So, let's say I have this C program:
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]){
    FILE *file1 = fopen("~/file.txt", "rt");
    char buffer[10];
    printf(argv[1]);
    fclose(file1);
    return 0;
}
Where would the pointer named "file1" be on the stack compared to the pointer named "buffer"? Would it be higher up in the stack (smaller address), or lower down (larger address)?
Also, I know that printf(), when given format arguments (like %d or %s), will read from the stack, but in this example where will it start reading?
Wiki article:
http://en.wikipedia.org/wiki/Stack_(abstract_data_type)
The wiki article makes an analogy to a stack of objects, where the top of the stack is the only object you can see (peek) or remove (pop), and where you would add (push) another object onto.
For a typical implementation of a stack, the stack starts at some address and the address decreases as elements are pushed onto the stack. A push typically decrements the stack pointer before storing an element onto the stack, and a pop typically loads an element from the stack and increments the stack pointer after.
However, a stack could also grow upwards, where a push stores an element then increments the stack pointer after, and a pop would decrement the stack pointer before, then load an element from the stack. This is a common way to implement a software stack using an array, where the stack pointer could be a pointer or an index.
Back to the original question, there's no rule on the ordering of local variables on a stack. Typically the total size of all local variables is subtracted from the stack pointer, and the local variables are accessed as offsets from the stack pointer (or a register copy of the stack pointer, such as bp, ebp, or rbp in the case of a X86 processor).
The C language definition does not specify how objects are to be laid out in memory, nor does it specify how arguments are to be passed to functions (the words "stack" and "heap" don't appear anywhere in the language definition itself). That is entirely a function of the compiler and the underlying platform. The answer for x86 may be different from the answer for M68K which may be different from the answer for MIPS which may be different from the answer for SPARC which may be different from the answer for an embedded controller, etc.
All the language definition specifies is lifetime of objects (when storage for an object is allocated and how long it lasts) and the linkage and visibility of identifiers (linkage controls whether multiple instances of the same identifier refer to the same object, visibility controls whether that identifier is usable at a given point).
Having said all that, almost any desktop or server system you're likely to use will have a runtime stack. Also, C was initially developed on a system with a runtime stack, and much of its behavior certainly implies a stack model. A C compiler would be a bugger to implement on a system that didn't use a runtime stack.
I also understand that the bottom of the stack corresponds to the largest address, and the top to the smallest ones.
That doesn't have to be true at all. The top of the stack is simply the place something was most recently pushed. Stack elements don't even have to be consecutive in memory (such as when using a linked-list implementation of a stack). On x86, the runtime stack grows "downwards" (towards decreasing addresses), but don't assume that's universal.
Where would the pointer named "file1" be on the stack compared to the pointer named "buffer"? Would it be higher up in the stack (smaller address), or lower down (larger address)?
First, the compiler is not required to lay out distinct objects in memory in the same order that they were declared; it may re-order those objects to minimize padding and alignment issues (struct members must be laid out in the order declared, but there may be unused "padding" bytes between members).
Secondly, only file1 is a pointer. buffer is an array, so space will only be allocated for the array elements themselves - no space is set aside for any pointer.
Also, I know that printf(), when given format arguments (like %d or %s), will read from the stack, but in this example where will it start reading?
It may not read arguments from the stack at all. For example, Linux on x86-64 uses the System V AMD64 ABI calling convention, which passes the first six arguments via registers.
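For reference, the register order that convention uses for integer and pointer arguments (you will see the first two, %rdi and %rsi, used in the listing below) is:
# System V AMD64 integer/pointer argument registers, in order:
#   %rdi, %rsi, %rdx, %rcx, %r8, %r9
# (the integer return value comes back in %rax)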
If you're really curious how things look on a particular platform, you need to a) read up on that platform's calling conventions, and b) look at the generated machine code. Most compilers have an option to output a machine code listing. For example, we can take your program and compile it as
gcc -S file.c
which creates a file named file.s containing the following (lightly edited) output:
.file "file.c"
.section .rodata
.LC0:
.string "rt"
.LC1:
.string "~/file.txt"
.text
.globl main
.type main, @function
main:
.LFB2:
pushq %rbp ;; save the current base (frame) pointer
.LCFI0:
movq %rsp, %rbp ;; make the stack pointer the new base pointer
.LCFI1:
subq $48, %rsp ;; allocate an additional 48 bytes on the stack
.LCFI2:
movl %edi, -36(%rbp) ;; since we use the contents of the %rdi(%edi) and %rsi(esi) registers
movq %rsi, -48(%rbp) ;; below, we need to preserve their contents on the stack frame before overwriting them
movl $.LC0, %esi ;; Write the *second* argument of fopen to esi
movl $.LC1, %edi ;; Write the *first* argument of fopen to edi
call fopen ;; arguments to fopen are passed via register, not the stack
movq %rax, -8(%rbp) ;; save the result of fopen to file1
movq $0, -32(%rbp) ;; zero out the elements of buffer (I added
movw $0, -24(%rbp) ;; an explicit initializer to your code)
movq -48(%rbp), %rax ;; copy the pointer value stored in argv to rax
addq $8, %rax ;; offset 8 bytes (giving us the address of argv[1])
movq (%rax), %rdi ;; copy the value rax points to to rdi
movl $0, %eax
call printf ;; like with fopen, arguments to printf are passed via register, not the stack
movq -8(%rbp), %rdi ;; copy file1 to rdi
call fclose ;; again, arguments are passed via register
movl $0, %eax
leave
ret
Now, this is for my specific platform, which is Linux (SLES-10) on x86-64. This does not apply to different hardware/OS combinations.
EDIT
Just realized that I left out some important stuff.
The notation N(reg) means offset N bytes from the address stored in register reg (basically, reg acts as a pointer). %rbp is the base (frame) pointer - it basically acts as the "handle" for the current stack frame. Local variables and function arguments (assuming they are present on the stack) are accessed by offsetting from the address stored in %rbp. On x86, local variables typically have a negative offset from %rbp, while function arguments have a positive offset.
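As a small sketch of that notation (the two instructions below appear in the listing above; the comments are mine):
movq %rax, -8(%rbp)   ;; store: write %rax to the 8 bytes at address (%rbp minus 8)
movq -8(%rbp), %rdi   ;; load: read those same 8 bytes back into %rdi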
The memory for file1 starts at -8(%rbp) (pointers on x86-64 are 64 bits wide, so we need 8 bytes to store it). That's fairly easy to determine based on the lines
call fopen
movq %rax, -8(%rbp)
On x86, function return values are written to %rax or %eax (%eax is the lower 32 bits of %rax). So the result of fopen is written to %rax, and we copy the contents of %rax to -8(%rbp).
The location for buffer is a little trickier to determine, since you don't do anything with it. I added an explicit initializer (char buffer[10] = {0};) just to generate some instructions that access it, and those are
movq $0, -32(%rbp)
movw $0, -24(%rbp)
From this, we can determine that buffer starts at -32(%rbp). There's 14 bytes of unused "padding" space between the end of buffer and the beginning of file1.
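Putting the offsets together, the frame looks roughly like this (my reconstruction from the listing above, not compiler output):
;; -48(%rbp) .. -41(%rbp)   saved copy of argv (8 bytes)
;; -36(%rbp) .. -33(%rbp)   saved copy of argc (4 bytes)
;; -32(%rbp) .. -23(%rbp)   buffer[10]
;; -22(%rbp) ..  -9(%rbp)   14 bytes of unused padding
;;  -8(%rbp) ..  -1(%rbp)   file1 (8-byte pointer)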
Again, this is how things play out on my specific system; you may see something different.
Very implementation dependent, but still nearby. In fact, this is crucial to setting up buffer-overflow-based attacks.
Consider your typical gcc compiler (C99 mode) and consider an array:
char array[2][4];
Clearly the compiler would compile the code (I assume this happens during the translation process?) so that the target machine (let's suppose it's an x86) "settles", or shall we say "assigns", the addresses of both array[0][0] and array[1][0] before accessing either one of them (I could be completely wrong). My problem is: how does the compiler "know" this, since it is just a dumb program? Is it some simple recursive algorithm done amazingly right, so that we never have to care how many dimensions there are (as in: "oh, there's a bracket pair following the name array? I'll just translate it into an address; wait, there are two? An address of an address, then"), or did the people who designed the compiler specifically study the situation and code the compiler to tackle it?
If you were confused by my question, consider a one-dimensional array arr[2].
I can get arr involved in all sorts of calculations knowing it's just an address, a "beginning address" so to speak. For a 1D array you only need one "beginning address", which is easily handled during compilation, since the compiler just translates that name (in this case, arr) into an address (again, I could be completely wrong). But for a 2D array the compiler seems to need more than one address, so how does that work?
And what would the assembly code look like?
A 2D array such as
int arr[3][2] = {{0, 1}, {2, 3}, {4, 5}};
is laid out in memory as:
0 1 2 3 4 5
because C is a row-major language.
This is the same layout as:
int arrflat[6] = { 0, 1, 2, 3, 4, 5 };
And you can access and manipulate both arrays using just their address (arr and arrflat, respectively).
However, when you access elements via arr[y][x] or arrflat[i] a translation occurs.
arrflat[i] becomes
*(arrflat + i)
whereas arr[y][x] becomes
*((int *)arr + (y*width + x))
and, in fact, you can do pointer mathematics on arr in this way.
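A tiny sketch of that equivalence (width, y, and x are my names for this illustration):
int arr[3][2] = {{0, 1}, {2, 3}, {4, 5}};
int width = 2, y = 2, x = 1;
int a = arr[y][x];                    /* 5 */
int b = *((int *)arr + y*width + x);  /* also 5: the same element via the flattened offset */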
A simple test program is:
#include <stdio.h>
int main(){
    int arr[3][2] = {{0, 1}, {2, 3}, {4, 5}};
    int arrflat[6] = { 0, 1, 2, 3, 4, 5 };
    for(int y=0;y<3;y++)
        for(int x=0;x<2;x++)
            printf("%d\n",arr[y][x]);
    for(int i=0;i<6;i++)
        printf("%d\n",arrflat[i]);
}
Compile this to generate assembly with
gcc -g -Wa,-adhls test.c
The (abbreviated) output is:
9:test.c **** printf("%d\n",arr[y][x]);
49 .loc 1 9 0 discriminator 3
50 007d 8B45B8 movl -72(%rbp), %eax
51 0080 4898 cltq
52 0082 8B55B4 movl -76(%rbp), %edx
53 0085 4863D2 movslq %edx, %rdx
54 0088 4801D2 addq %rdx, %rdx
55 008b 4801D0 addq %rdx, %rax
56 008e 8B4485C0 movl -64(%rbp,%rax,4), %eax
57 0092 89C6 movl %eax, %esi
58 0094 BF000000 movl $.LC0, %edi
58 00
59 0099 B8000000 movl $0, %eax
59 00
60 009e E8000000 call printf
12:test.c **** printf("%d\n",arrflat[i]);
80 .loc 1 12 0 discriminator 3
81 00c0 8B45BC movl -68(%rbp), %eax
82 00c3 4898 cltq
83 00c5 8B4485E0 movl -32(%rbp,%rax,4), %eax
84 00c9 89C6 movl %eax, %esi
85 00cb BF000000 movl $.LC0, %edi
85 00
86 00d0 B8000000 movl $0, %eax
86 00
87 00d5 E8000000 call printf
Eliminating common code between the two calls to printf and annotating gives:
9:test.c **** printf("%d\n",arr[y][x]);
49 .loc 1 9 0 discriminator 3
50 007d 8B45B8 movl -72(%rbp), %eax #Load address -72 bytes from the memory pointed to by %rbp
51 0080 4898 cltq #Turn this into a 64-bit integer address (where is `arr`?)
52 0082 8B55B4 movl -76(%rbp), %edx #Load address -76 bytes from the memory pointed to by %rbp
53 0085 4863D2 movslq %edx, %rdx #Turn %edx into a signed 64-bit offset
54 0088 4801D2 addq %rdx, %rdx #Add rdx to itself
55 008b 4801D0 addq %rdx, %rax #Add offset to the address
56 008e 8B4485C0 movl -64(%rbp,%rax,4), %eax #Load *(rbp - 64 + (rax * 4)) into eax (get arr[y][x])
12:test.c **** printf("%d\n",arrflat[i]);
80 .loc 1 12 0 discriminator 3
81 00c0 8B45BC movl -68(%rbp), %eax #Load address -68 bytes from the memory pointed to by %rbp
82 00c3 4898 cltq #Convert this into a 64-bit integer address (where is `arrflat`?)
83 00c5 8B4485E0 movl -32(%rbp,%rax,4), %eax #Load *(rbp - 32 + (rax * 4)) into eax (get arrflat[i])
What you need to understand is linearization (or serialization, if you prefer). Machine memory is usually a flat 1D space, and this is largely sufficient provided that what you need can be embedded into it. In your case, for example, a 2D array is just a way to interpret a 1D array sequence.
If you have a 2x3 array, it is reducible to a linear array of 6 elements: map [0][0] to [0], [0][1] to [1], [0][2] to [2], [1][0] to [3], [1][1] to [4] and [1][2] to [5]. This mapping is really obvious, as it maps [x][y] to [x*3+y] (remember that 3 is the size of the second dimension of your 2x3 2D array). In general, for any 2D array of size NxM the mapping is x*M+y. It works the same way for higher dimensions... So you just need the beginning address to store any kind of object and compute the right offset from it.
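As a small sketch of that mapping (N, M and the function name are mine):
#define N 2
#define M 3
int flat[N * M];                                      /* backing 1D storage */
int get2d(int x, int y) { return flat[x * M + y]; }   /* element [x][y] of the conceptual NxM array */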
C does not have multidimensional arrays. Multidimensional arrays are simulated by declaring arrays of arrays, which is what the declaration char a[2][2] is doing.
An array doesn't have multiple addresses; it has just one base address. The addresses of the elements are calculated by displacement ("pointer arithmetic"). C in fact defines the A[B] notation to be equivalent to *(A + B) in every way including the commutativity implied by the + operator.
Apparent multi-dimensional references are just cascaded application of the above: A[B][C] is just *(A[B] + C) which is *(*(A + B) + C). Here A[B] is expected to designate an array (or a pointer). If the subexpression A[B] refers to an array, then the value it produces is a pointer to the first element. The + C then simply performs another round of pointer arithmetic to displace relative to that element; i.e., calculating an offset within the second-level array, completing the illusion that A[B][C] is a multi-dimensional array referencing notation.
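A short sketch of that cascading (the array contents are mine, just for illustration):
char a[2][2] = {{'w', 'x'}, {'y', 'z'}};
char c1 = a[1][0];            /* 'y' */
char c2 = *(*(a + 1) + 0);    /* also 'y': a + 1 skips one whole 2-char row */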
The address of an array isn't stored (as a pointer). If an array is defined at file scope, then its storage is associated with an internal or external symbol. If an array is defined as a nonstatic local variable, the compiler will typically arrange for a region of storage on the stack corresponding to that array's size and the generated code will refer to its base address using a fixed offset from the stack pointer. An array defined inside a struct is just a reserved block of space inside the struct, with a fixed offset from the base address of the struct.
Lexical analysis is the process you're looking for. It turns a sequence of characters into tokens, and a 2D array declaration is one of the things built from those tokens.
From that declaration, the compiler knows how much memory to set aside for the array.
While writing some C code, I decided to compile it to assembly and read it. I just sort of do this from time to time, as an exercise to keep me thinking about what the machine is doing every time I write a statement in C.
Anyways, I wrote these two lines in C
asm(";move old_string[i] to new_string[x]");
new_string[x] = old_string[i];
asm(";shift old_string[i+1] into new_string[x]");
new_string[x] |= old_string[i + 1] << 8;
(old_string is an array of char, and new_string is an array of unsigned short, so given two chars, 42 and 43, this will put 4342 into new_string[x])
Which produced the following output:
#move old_string[i] to new_string[x]
movl -20(%ebp), %esi #put address of first char of old_string in esi
movsbw (%edi,%esi),%dx #put first char into dx
movw %dx, (%ecx,%ebx,2) #put first char into new_string
#shift old_string[i+1] into new_string[x]
movsbl 1(%esi,%edi),%eax #put old_string[i+1] into eax
sall $8, %eax #shift it left by 8 bits
orl %edx, %eax #or edx into it
movw %ax, (%ecx,%ebx,2) #?
(I'm commenting it myself, so I can follow what's going on).
I compiled it with -O3, so I could also sort of see how the compiler optimizes certain constructs. Anyways, I'm sure this is probably simple, but here's what I don't get:
The first section copies a char out of old_string[i] and then movw's it (from dx) to (%ecx,%ebx,2). Then the next section copies old_string[i+1], shifts it, ors it, and then puts it into the same place from ax. It puts two 16-bit values into the same place? Wouldn't that not work?
Also, it shifts old_string[i+1] into the high byte of ax, then ors edx (new_string[x]) into it... then puts ax into memory! Wouldn't ax just contain what was already in new_string[x]? So it saves the same thing to the same place in memory twice?
Is there something I'm missing? Also, I'm fairly certain that the rest of the compiled program isn't relevant to this snippet... I've read around before and after, to find where each array and different variables are stored, and what the registers' values would be upon reaching that code--I think that this is the only piece of the assembly that matters for these lines of C.
--
oh, turns out GNU assembly comments are started with a #.
Okay, so it was pretty simple after all.
I figured it out with pen and paper, writing down each step, what it did to each register, and then writing down the contents of each register given an initial starting value...
What got me was that it was using 32-bit and 16-bit registers for 16- and 8-bit data types...
This is what I thought was happening:
first value put into memory as, say, 0001 (I was thinking 01).
second value (02) loaded into 32 bit register (so it was like, 00000002, I was thinking, 0002)
second value shifted left 8 bits (00000200, I was thinking, 0200)
first value (00000001, I thought 0001) or'd into second value (00000201, I thought 0201)
16 bit register put into memory (0201, I was thinking, just 01 again).
I didn't get why it wrote it to memory twice, though, or why it was using 32-bit registers (well, actually, my guess is that a 32-bit processor is much faster at working with 32-bit values than with 8- and 16-bit values, but that's a totally uneducated guess), so I tried rewriting it:
movl -20(%ebp), %esi #gets pointer to old_string
movsbw (%edi,%esi),%dx #old_string[i] -> dx (0001)
movsbw 1(%edi,%esi),%ax #old_string[i + 1] -> ax (0002)
salw $8, %ax #shift ax left (0200)
orw %dx, %ax #or dx into ax (0201)
movw %ax,(%ecx,%ebx,2) #doesn't write to memory until end
This worked exactly the same.
I don't know if this is an optimization or not (aside from removing one memory write, which obviously is), but if it is, I know it's not really worth it and didn't gain me anything. In any case, I get what this code is doing now; thanks for the help, all.
I'm not sure what's not to understand, unless I'm missing something.
The first 3 instructions load a byte from old_string into dx and stores that to your new_string.
The next 3 instructions utilize what's already in dx, combine old_string[i+1] with it, and store the result as a 16-bit value (ax) to new_string.
Also, it shifts old_string[i+1] to the high-order dword of eax, then
ors edx (new_string[x]) into it... then puts ax into the memory! Wouldn't
ax just contain what was already in new_string[x]? so it saves the same
thing to the same place in memory twice?
Now you see why optimizers are a Good Thing. That kind of redundant code shows up pretty often in unoptimized, generated code, because the generated code comes more or less from templates that don't "know" what happened before or after.