Sorting ARM Assembly

Sorting ARM Assembly - c

I am newbie. I have difficulties with understanding memory ARM memory map.
I have found example of simple sorting algorithm
AREA ARM, CODE, READONLY
CODE32
PRESERVE8
EXPORT __sortc
; r0 = &arr[0]
; r1 = length
__sortc
stmfd sp!, {r2-r9, lr}
mov r4, r1 ; inner loop counter
mov r3, r4
sub r1, r1, #1
mov r9, r1 ; outer loop counter
outer_loop
mov r5, r0
mov r4, r3
inner_loop
ldr r6, [r5], #4
ldr r7, [r5]
cmp r7, r6
; swap without swp
strls r6, [r5]
strls r7, [r5, #-4]
subs r4, r4, #1
bne inner_loop
subs r9, r9, #1
bne outer_loop
ldmfd sp!, {r2-r9, pc}^
END
And this assembly should be called this way from C code
#define MAX_ELEMENTS 10
extern void __sortc(int *, int);
int main()
{
int arr[MAX_ELEMENTS] = {5, 4, 1, 3, 2, 12, 55, 64, 77, 10};
__sortc(arr, MAX_ELEMENTS);
return 0;
}
As far as I understand this code creates array of integers on the stack and calls _sortc function which implemented in assembly. This function takes this values from the stack and sorts them and put back on the stack. Am I right ?
I wonder how can I implement this example using only assembly.
For example defining array of integers
DCD 3, 7, 2, 8, 5, 7, 2, 6
BTW Where DCD declared variables are stored in the memory ??
How can I operate with values declared in this way ? Please explain how can I implement this using assembly only without any C code, even without stack, just with raw data.
I am writing for ARM7TDMI architecture

AREA ARM, CODE, READONLY - this marks start of section for code in the source.
With similar AREA myData, DATA, READWRITE you can start section where it's possible to define data like data1 DCD 1,2,3, this will compile as three words with values 1, 2, 3 in consecutive bytes, with label data1 pointing to the first byte of first word. (some AREA docs from google).
Where these will land in physical memory after loading executable depends on how the executable is linked (linker is using a script file which is helping him to decide which AREA to put where, and how to create symbol table for dynamic relocation done by the executable loader, by editing the linker script you can adjust where the code and data land, but normally you don't need to do that).
Also the linker script and assembler directives can affect size of available stack, and where it is mapped in physical memory.
So for your particular platform: google for memory mappings on web and check the linker script (for start just use linker option to produce .map file to see where the code and data are targeted to land).
So you can either declare that array in some data area, then to work with it, you load symbol data1 into register ("load address of data1"), and use that to fetch memory content from that address.
Or you can first put all the numbers into the stack (which is set probably to something reasonable by the OS loader of your executable), and operate in the code with the stack pointer to access the numbers in it.
You can even DCD some values into CODE area, so those words will end between the instructions in memory mapped as read-only by executable loader. You can read those data, but writing to them will likely cause crash. And of course you shouldn't execute them as instructions by accident (forgetting to put some ret/jump instruction ahead of DCD).
without stack
Well, this one is tricky, you have to be careful to not use any call/etc. and to have interrupts disabled, etc.. basically any thing what needs stack.
When people code a bootloader, usually they set up some temporary stack ASAP in first few instructions, so they can use basic stack functionality before setting up whole environment properly, or loading OS. A space for that temporary stack is often reserved somewhere in/after the code, or an unused memory space according to defined machine state after reset.
If you are down to the metal, without OS, usually all memory is writeable after reset, so you can then intermix code and data as you wish (just jumping around the data, not executing them by accident), without using AREA definitions.
But you should make your mind, whether you are creating application in user space of some OS (so you have things like stack and data areas well defined and you can use them for your convenience), or you are creating boot loader code which has to set it all up for itself (more difficult, so I would suggest at first going into user land of some OS, having C wrapper around with clib initialized is often handy too, so you can call things like printf from ASM for convenient output).
How can I operate with values declared in this way
It doesn't matter in machine code, which way the values were declared. All that matters is, if you have address of the memory, and if you know the structure, how the data are stored there. Then you can work with them in any way you want, using any instruction you want. So body of that asm example will not change, if you allocate the data in ASM, you will just pass the pointer as argument to it, like the C does.
edit: some example done blindly without testing, may need further syntax fixing to work for OP (or maybe there's even some bug and it will not work at all, let me know in comments if it did):
AREA myData, DATA, READWRITE
SortArray
DCD 5, 4, 1, 3, 2, 12, 55, 64, 77, 10
SortArrayEnd
AREA ARM, CODE, READONLY
CODE32
PRESERVE8
EXPORT __sortasmarray
__sortasmarray
; if "add r0, pc, #SortArray" fails (code too far in memory from array)
; then this looks like some heavy weight way of loading any address
; ldr r0, =SortArray
; ldr r1, =SortArrayEnd
add r0, pc, #SortArray ; address of array
; calculate array size from address of end
; (as I couldn't find now example of thing like "equ $-SortArray")
add r1, pc, #SortArrayEnd
sub r1, r1, r0
mov r1, r1, lsr #2
; do a direct jump instead of "bl", so __sortc returning
; to lr will actually return to called of this
b __sortc
; ... rest of your __sortc assembly without change
You can call it from C code as:
extern void __sortasmarray();
int main()
{
__sortasmarray();
return 0;
}
I used among others this Introducing ARM assembly language to refresh my ARM asm memory, but I'm still worried this may not work as is.
As you can see, I didn't change any thing in the __sortc. Because there's no difference in accessing stack memory, or "dcd" memory, it's the same computer memory. Once you have the address to particular word, you can ldr/str it's value with that address. The __sortc receives address of first word in array to sort in both cases, from there on it's just memory for it, without any context how that memory was defined in source, allocated, initialized, etc. As long as it's writeable, it's fine for __sortc.
So the only "dcd" related thing from me is loading array address, and the quick search for ARM examples shows it may be done in several ways, this add rX, pc, #label way is optimal, but does work only for +-4k range? There's also pseudo instruction ADR rX, #label doing this same thing, and maybe switching to other in case of range problem? For any range it looks like ldr rX, = label form is used, although I'm not sure if it's pseudo instruction or how it works, check some tutorials and disassembly the machine code to see how it was compiled.
It's up to you to learn all the ARM assembly peculiarities and how to load addresses of arrays, I don't need ARM ASM at the moment, so I didn't dig into those details.
And there should be some equ way to define length of array, instead of calculating it in code from end address, but I couldn't find any example, and I'm not going to read full Assembler docs to learn about all it's directives (in gas I think ArrayLength equ ((.-SortArray)/4) would work).

Related

Where do the values of uninitialized variables come from, in practice on real CPUs?

I want to know the way variables are initialized :
#include <stdio.h>
int main( void )
{
int ghosts[3];
for(int i =0 ; i < 3 ; i++)
printf("%d\n",ghosts[i]);
return 0;
}
this gets me random values like -12 2631 131 .. where did they come from?
For example with GCC on x86-64 Linux: https://godbolt.org/z/MooEE3ncc
I have a guess to answer my question, it could be wrong anyways:
The registers of the memory after they are 'emptied' get random voltages between 0 and 1, these values get 'rounded' to 0 or 1, and these random values depend on something?! Maybe the way registers are made? Maybe the capacity of the memory comes into play somehow? And maybe even the temperature?!!

Your computer doesn't reboot or power cycle every time you run a new program. Every bit of storage in memory or registers your program can use has a value left there by some previous instruction, either in this program or in the OS before it started this program.
If that was the case, e.g. for a microcontroller, yes, each bit of storage might settle into a 0 or 1 state during the voltage fluctuations of powering on, except in storage engineered to power up in a certain state. (DRAM is more likely to be 0 on power-up, because its capacitors will have discharged). But you'd also expect there to be internal CPU logic that does some zeroing or setting of things to guaranteed state before fetching and executing the first instruction of code from the reset vector (a memory address); system designers normally arrange for there to be ROM at that physical address, not RAM, so they can put non-random bytes of machine-code there. Code that executes at that address should probably assume random values for all registers.
But you're writing a simple user-space program that runs under an OS, not the firmware for a microcontroller, embedded system, or mainstream motherboard, so power-up randomness is long in the past by the time anything loads your program.
Modern OSes zero registers on process startup, and zero memory pages allocated to user-space (including your stack space), to avoid information leaks of kernel data and data from other processes. So the values must come from something that happened earlier inside your process, probably from dynamic linker code that ran before main and used some stack space.
Reading the value of a local variable that's never been initialized or assigned is not actually undefined behaviour (in this case because it couldn't have been declared register int ghosts[3], that's an error (Godbolt) because ghosts[i] effectively uses the address) See (Why) is using an uninitialized variable undefined behavior? In this case, all the C standard has to say is that the value is indeterminate. So it does come down to implementation details, as you expected.
When you compile without optimization, compilers don't even notice the UB because they don't track usage across C statements. (This means everything is treated somewhat like volatile, only loading values into registers as needed for a statement, then storing again.)
In the example Godbolt link I added to your question, notice that -Wall doesn't produce any warnings at -O0, and just reads from the stack memory it chose for the array without ever writing it. So your code is observing whatever stale value was in memory when the function started. (But as I said, that must have been written earlier inside this program, by C startup code or dynamic linking.)
With gcc -O2 -Wall, we get the warning we'd expect: warning: 'ghosts' is used uninitialized [-Wuninitialized], but it does still read from stack space without writing it.
Sometimes GCC will invent a 0 instead of reading uninitialized stack space, but it happens not in this case. There's zero guarantee about how it compiles the compiler sees the use-uninitialized "bug" and can invent any value it wants, e.g. reading some register it never wrote instead of that memory. e.g. since you're calling printf, GCC could have just left ESI uninitialized between printf calls, since that's where ghost[i] is passed as the 2nd arg in the x86-64 System V calling convention.
Most modern CPUs including x86 don't have any "trap representations" that would make an add instruction fault, and even if it did the C standard doesn't guarantee that the indeterminate value isn't a trap representation. But IA-64 did have a Not A Thing register result from bad speculative loads, which would trap if you tried to read it. See comments on the trap representation Q&A - Raymond Chen's article: Uninitialized garbage on ia64 can be deadly.
The ISO C rule about it being UB to read uninitialized variables that were candidates for register might be aimed at this, but with optimization enabled you could plausibly still run into this anyway if the taking of the address happens later, unless the compiler takes steps to avoid it. But ISO C defect report N1208 proposes saying that an indeterminate value can be "a value that behaves as if it were a trap representation" even for types that have no trap representations. So it seems that part of the standard doesn't fully cover ISAs like IA-64, the way real compilers can work.
Another case that's not exactly a "trap representation": note that only some object-representations (bit patterns) are valid for _Bool in mainstream ABIs, and violating that can crash your program: Does the C++ standard allow for an uninitialized bool to crash a program?
That's a C++ question, but I verified that GCC will return garbage without booleanizing it to 0/1 if you write _Bool b[2] ; return b[0]; https://godbolt.org/z/jMr98547o. I think ISO C only requires that an uninitialized object has some object-representation (bit-pattern), not that it's a valid one for this object (otherwise that would be a compiler bug). For most integer types, every bit-pattern is valid and represents an integer value. Besides reading uninitialized memory, you can cause the same problem using (unsigned char*) or memcpy to write a bad byte into a _Bool.
An uninitialized local doesn't have "a value"
As shown in the following Q&As, when compiling with optimization, multiple reads of the same uninitialized variable can produce different results:
Is uninitialized local variable the fastest random number generator?
What happens to a declared, uninitialized variable in C? Does it have a value?
The other parts of this answer are primarily about where a value comes from in un-optimized code, when the compiler doesn't really "notice" the UB.

The registers of the memory after they are 'emptied' get random voltages between 0 and 1,
Nothing so mysterious. You are just seeing what was written to those memory locations last time they were used.
When memory is released it is not cleared or emptied. The system just knows that its free and the next time somebody needs memory it just gets handed over, the old contents are still there. Its like buying an old car and looking in the glove compartment, the contents are not mysterious, its just a surprise to find a cigarette lighter and one sock.
Sometimes in a debugging environment freed memory is cleared to some identifiable value so that its easy to recognize that you are dealing with uninitialized memory. For examples 0xccccccccccc or maybe 0xdeadbeefDeadBeef
Maybe a better analogy. You are eating in a self serve restaurant that never cleans its plates, when a customer has finished they put the plates back on the 'free' pile. When you go to serve yourself you pick up the top plate from the free pile. You should clean the plate otherwise you get what was left there by previous customer

I am going to use a platform that is easy to see what is going on. The compilers and platforms work the same way independent of architecture, operating system, etc. There are exceptions of course...
In main am going to call this function:
test();
Which is:
extern void hexstring ( unsigned int );
void test ( void )
{
unsigned int x[3];
hexstring(x[0]);
hexstring(x[1]);
hexstring(x[2]);
}
hexstring is just a printf("%008X\n",x).
Build it (not using x86, using something that is overall easier to read for this demonstration)
test.c: In function ‘test’:
test.c:7:2: warning: ‘x[0]’ is used uninitialized in this function [-Wuninitialized]
7 | hexstring(x[0]);
| ^~~~~~~~~~~~~~~
test.c:8:2: warning: ‘x[1]’ is used uninitialized in this function [-Wuninitialized]
8 | hexstring(x[1]);
| ^~~~~~~~~~~~~~~
test.c:9:2: warning: ‘x[2]’ is used uninitialized in this function [-Wuninitialized]
9 | hexstring(x[2]);
| ^~~~~~~~~~~~~~~
The disassembly of the compiler output shows
00010134 <test>:
10134: e52de004 push {lr} ; (str lr, [sp, #-4]!)
10138: e24dd014 sub sp, sp, #20
1013c: e59d0004 ldr r0, [sp, #4]
10140: ebffffdc bl 100b8 <hexstring>
10144: e59d0008 ldr r0, [sp, #8]
10148: ebffffda bl 100b8 <hexstring>
1014c: e59d000c ldr r0, [sp, #12]
10150: e28dd014 add sp, sp, #20
10154: e49de004 pop {lr} ; (ldr lr, [sp], #4)
10158: eaffffd6 b 100b8 <hexstring>
We can see that the stack area is allocated:
10138: e24dd014 sub sp, sp, #20
But then we go right into reading and printing:
1013c: e59d0004 ldr r0, [sp, #4]
10140: ebffffdc bl 100b8 <hexstring>
So whatever was on the stack. Stack is just memory with a special hardware pointer.
And we can see the other two items in the array are also read (load) and printed.
So whatever was in that memory at this time is what gets printed. Now the environment I am in likely zeroed the memory (including stack) before we got there:
00000000
00000000
00000000
Now I am optimizing this code to make it easier to read, which adds a few challenges.
So what if we did this:
test2();
test();
In main and:
void test2 ( void )
{
unsigned int y[3];
y[0]=1;
y[1]=2;
y[2]=3;
}
test2.c: In function ‘test2’:
test2.c:5:15: warning: variable ‘y’ set but not used [-Wunused-but-set-variable]
5 | unsigned int y[3];
|
and we get:
00000000
00000000
00000000
but we can see why:
00010124 <test>:
10124: e52de004 push {lr} ; (str lr, [sp, #-4]!)
10128: e24dd014 sub sp, sp, #20
1012c: e59d0004 ldr r0, [sp, #4]
10130: ebffffe0 bl 100b8 <hexstring>
10134: e59d0008 ldr r0, [sp, #8]
10138: ebffffde bl 100b8 <hexstring>
1013c: e59d000c ldr r0, [sp, #12]
10140: e28dd014 add sp, sp, #20
10144: e49de004 pop {lr} ; (ldr lr, [sp], #4)
10148: eaffffda b 100b8 <hexstring>
0001014c <test2>:
1014c: e12fff1e bx lr
test didn't change but test2 is dead code as one would expect when optimized, so it did not actually touch the stack. But what if we:
test2.c
void test3 ( unsigned int * );
void test2 ( void )
{
unsigned int y[3];
y[0]=1;
y[1]=2;
y[2]=3;
test3(y);
}
test3.c
void test3 ( unsigned int *x )
{
}
Now
0001014c <test2>:
1014c: e3a01001 mov r1, #1
10150: e3a02002 mov r2, #2
10154: e3a03003 mov r3, #3
10158: e52de004 push {lr} ; (str lr, [sp, #-4]!)
1015c: e24dd014 sub sp, sp, #20
10160: e28d0004 add r0, sp, #4
10164: e98d000e stmib sp, {r1, r2, r3}
10168: eb000001 bl 10174 <test3>
1016c: e28dd014 add sp, sp, #20
10170: e49df004 pop {pc} ; (ldr pc, [sp], #4)
00010174 <test3>:
10174: e12fff1e bx lr
test2 is actually putting stuff on the stack. Now the calling conventions generally require that the stack pointer is back where it started when you were called, which means function a might move the pointer and read/write some data in that space, call function b move the pointer, read/write some data in that space, and so on. Then when each function returns it does not make sense usually to clean up, you just move the pointer back and return whatever data you wrote to that memory remains.
So if test 2 writes a few things to the stack memory space and then returns then another function is called at the same level as test2. Then the stack pointer is at the same address when test() is called as when test2() was called, in this example. So what happens?
00000001
00000002
00000003
We have managed to control what test() is printing out. Not magic.
Now rewind back to the 1960s and then work forward to the present, particularly 1980s and later.
Memory was not always cleaned up before your program ran. As some folks here are implying if you were doing banking on a spreadsheet then you closed that program and opened this program...back in the day...you would almost expect to see some data from that spreadsheet program, maybe the binary maybe the data, maybe something else, due to the nature of the operating systems use of memory it may be a fragment of the last program you ran, and a fragment of the one before that, and a fragment of a program still running that just did a free(), and so on.
Naturally, once we started to get connected to each other and hackers wanted to take over and send themselves your info or do other bad things, you can see how trivial it would be to write a program to look for passwords or bank accounts or whatever.
So not only do we have protections today to prevent one program sniffing around in another programs space, we generally assume that, today, before our program gets some memory that was used by some other program, it is wiped.
But if you disassemble even a simple hello world printf program you will see that there is a fair amount of bootstrap code that happens before main() is called. As far as the operating system is concerned, all of that code is part of our one program so even if (let's assume) memory were zeroed or cleaned before the OS loads and launches our program. Before main, within our program, we are using the stack memory to do stuff, leaving behind values, that a function like test() will see.
You may find that each time you run the same binary, one compile many runs, that the "random" data is the same. Now you may find that if you add some other shared library call or something to the overall program, then maybe, maybe, that shared library stuff causes extra code pre-main to happen to try to be able to call the shared code, or maybe as the program runs it takes different paths now because of a side effect of a change to the overall binary and now the random values are different but consistent.
There are explanations why the values could be different each time from the same binary as well.
There is no ghost in the machine though. Stack is just memory, not uncommon when a computer boots to wipe that memory once if for no other reason than to set the ecc bits. After that that memory gets reused and reused and reused and reused. And depending on the overall architecture of the operating system. How the compiler builds your application and shared libraries. And other factors. What happens to be in memory where the stack pointer is pointing when your program runs and you read before you write (as a rule never read before you write, and good that compilers are now throwing warnings) is not necessarily random and the specific list of events that happened to get to that point, were not just random but controlled, are not values that you as the programmer may have predicted. Particularly if you do this at the main() level as you have. But be it main or seventeen levels of nested function calls, it is still just some memory that may or may not contain some stuff from before you got there. Even if the bootloader zeros memory, that is still a written zero that was left behind from some other program that came before you.
There are no doubt compilers that have features that relate to the stack that may do more work like zero at the end of the call or zero up front or whatever for security or some other reason someone thought of.
I would assume today that when an operating system like Windows or Linux or macOS runs your program it is not giving you access to some stale memory values from some other program that came before (spreadsheet with my banking information, email, passwords, etc). But you can trivially write a program to try (just malloc() and print or do the same thing you did but bigger to look at the stack). I also assume that program A does not have a way to get into program B's memory that is running concurrently. At least not at the application level. Without hacking (malloc() and print is not hacking in my use of the term).

The array ghosts is uninitialized, and because it was declared inside of a function and is not static (formally, it has automatic storage duration), its values are indeterminate.
This means that you could read any value, and there's no guarantee of any particular value.

Why is SP (apparently) stored on exception entry on Cortex-M3?

I am using a TI LM3S811 (a older Cortex-M3) with the SysTick interrupt to trigger at 10Hz. This is the body of the ISR:
void SysTick_Handler(void)
{
__asm__ volatile("sub r4, r4, #32\r\n");
}
This produces the following assembly with -O0 and -fomit-frame-pointer with gcc-4.9.3. The STKALIGN bit is 0, so stacks are 4-byte aligned.
00000138 <SysTick_Handler>:
138: 4668 mov r0, sp
13a: f020 0107 bic.w r1, r0, #7
13e: 468d mov sp, r1
140: b401 push {r0}
142: f1ad 0420 sub.w r4, r4, #32
146: f85d 0b04 ldr.w r0, [sp], #4
14a: 4685 mov sp, r0
14c: 4770 bx lr
14e: bf00 nop
I don't understand what's going on with r0 in the listing above. Specifically:
1) It seems like we're clearing the lower 3 bits of SP and storing it on the stack. Is that to maintain 8-byte alignment? Or is it something else?
2) Is the exception exit procedure is equally confusing. From my limited understanding of the ARM assembly, it does something like this:
SP = SP + 4; R0 = SP;
Followed by storing it back to SP. Which seems to undo the manipulations until this stage.
3) Why is there a nop instruction after the unconditional branch (at 0x14E)?

The ARM Procedure Calling Standard and C ABI expect an 8 byte (64 bit) alignment of the stack. As an interrupt might occur after pushing/poping a single word, it is not guaranteed the stack is correctly aligned on interrupt entry.
The STKALIGN bit, if set (the default) enforces the hardware to align the stack automatically by conditionally pushing an extra (dummy) word onto the stack.
The interrupt attribute on a function tells gcc, OTOH the stack might be missaligned, so it adds this pre-/postamble which enforces the alignment.
So, both actually do the same; one in hardware, one in software. If you can live with a word-aligned stack only, you should remove the interrupt attribute from the function declarations and clear the STKALIGN bit.
Make sure such a "missaligned" stack is no problem (I would not expect any, as this is a pure 32 bit CPU). OTOH, you should leave it as-is, unless you really need to safe that extra conditional(!) clock and word (very unlikely).
Warning: According to the ARM Architecture Reference Manual, setting STKALIGN == 0 is deprecated. Briefly: do not set this bit to 0!

Since you're using -O0, you should expect lots of redundant and useless code. The general way in which a compiler works is to generate code with the full generality of everything that might be used anywhere in the program, and then rely on the optimizer to get rid of things that are unneeded.
Yes this is doing 8byte alignment. Its also allocating a stack frame to hold local variables even though you have none.
The exit is the reverse, deallocating the stack frame.
The nop at the end is to maintain 4-byte alignment in the code, as you might want to link with non-thumb code at some point.
If you enable optimization, it will eliminate the stack frame (as its unneeded) and the code will become much simpler.

ARMv7 assembly store values in an array

I'm trying to add a user inputted number to a every element in an array. I had everything working until I realized that the original array was not being updated. Simple, I thought, just store the value back into the array and move on with life. Sadly, this doesn't seem to be quite so simple.
As the title suggests, I'm using ARMv7 and writing assembly. I've been using this guide to understand the basics and to have some good code to look at. When I run the example code given here it works fine: str r2, [r3] puts whatever is in r2 into what r3 points at. The following is my attempt to do the same thing which gives me a Signal 11 occurred: SIGSEGV (Invalid memory segment access) and Execution stopped at: 0x0000580C STR r3,[r5,#0]:
# Loop and add value to all values in array regardless of array length
# Setup loop
# r4 comes from above and the scanf value, I've checked the registers and the value is correct
mov r0, #0
ldr r1, =array_b
ldr r2, addrArr
loop: # Start loop to add inputed number to every value in array
add r3, r2, r0
ldr r3, [r3]
add r3, r3, r4 # Add input to each index in array
add r5, r2, r0 # Pointer to location in array
str r3, [r5] # Put new value into array
cmp r0, r1 # Check for end of array
addne r0, r0, #4 # Not super necessary but it shows one of the cool things ARM can do, condition math
bne loop # Branch if not equal
beq doneLoop # Branch if equal
doneLoop: # End loop
Here are the vars
.align 2
array:
.word 0
.word 1
.word 2
.word 3
.word 4
.word 5
.word 6
.word 7
.equ array_b, .-array
addrArr: .word array
My understanding is that str takes the source first and the destination second (which is different from other instructions for some reason). So r5 is used to calculate where in the array to store the value and r3 has the value from the add instruction. I've checked and the value in r5 is valid, ie: it's the start of the array and the array_b is the proper length (32 in this case). I've also tried doing =array instead of addrArr but they give the same value and the same segfault message.

This is because there is historically in systems two major kind of memories :
ROM, Read Only Memory, cannot be written to and can store only the program and constant data
RAM, Random Access Memory, can be both read and written to. It is used to store variables.
Many systems do no use ROM directly, instead the data can be loaded from an other permanent support, for example a floppy disc, a tape or a hard disc into RAM. In order to avoid a program writing to RAM memory that wasn't supposed to be written, the RAM can be divided in multiple areas, using segmented memory.
Not all system features this, so it really depends on the architecture. If segmented memory is used, it basically makes the processor quit the application when you try to write to a segment of RAM that is designed to be read only. This is exactly what appears to be your problem here.
In order to solve this you should declare your array, which is a variable and should be stocked in RAM, by precedding it by .data.
On the other hand your executable instructions should be placed in the read only segment marked with the assembler directive .text

How to properly create an array in ARM assembly?

I'm currently learning ARM assembly for a class and have come across a problem where I'd need to use an "array." I'm aware that there is no such thing as an array in ARM so I have to allocate space and treat that as an array. I have two questions.
Am I correctly adding new values to the array or am I merely overwriting the previous value? If I am overwriting the values, how do I go about adding new values?
How do I go about looping through the different values of the array? I know I have to use loop: but don't know how to use it to access different "indexes."
So far, this is what I've gotten from reading ARM documentation as I've gathered from resources online.
.equ SWI_Exit, 0x11
.text
.global _start
_start: .global _start
.global main
b main
main:
ldr R0, =MyArray
mov R1, #42
str R1, [R0], #4
mov R1, #43
str R1, [R0], #4
swi SWI_Exit
MyArray: .skip 20 * 4
.end
As a side note, I am using ARMSim# as required by my professor, so some commands recognized by GNU tools won't be recognized by ARMSim#, or at least I believe that is the case. Please correct me if I'm wrong.

You are just overwriting elements. At this level there are "such things as arrays", but only fixed-sized, preallocated arrays. The .skip is allocating the fixed-size array.* A variable-sized, growable array would typically be implemented with more complex dyanamic memory allocation code using the stack or a heap.
If you had a label like loop: (the actual name is arbitrary) you could branch (back) to it by using b loop. (Probably, you would want to do the branch conditionally so that the program didn't loop forever.) You can access different elements in the loop by changing R0, which you are already doing
Also the b main isn't really serving any purpose since it is branching to he next instruction. The code will do the same thing if you remove it.
[*] Alternately, you could say that your array is the just elements between MyArray and R0 (not including the memory R0 points to), in which, by changing R0 you are extending the array. But the maximum size Is still fixed by the .skip directive.

Interpreting jump tables / branch tables

I've been slowly picking things up with assembly. I am working on a Canon Rebel T1i, here is a small snippet of a code flow chart that I am trying to understand. To my knowledge, I believe the camera has a 132MHz ARM v5 processor:
http://i.imgur.com/PtWC9.png
I have searched the bottom of google attempting to understand how jump tables work, and no matter how much I read I just can't connect things together to understand it. I understand a jump table is similar to a case statement, but I don't understand just how it moves through the table.
Ex: in this example there is only one CMP operation, so I don't understand how exactly this is working. Any help will be greatly appreciated!!

I dont think you have enough info on the screen shot to understand how it connects to your question. But a jump table in general...
In C think of an array of functions, and you have initialized each element in the array of functions, at some point later your code makes some decision and uses an index to choose one of those functions. As you mentioned a case statement, could be implemented that way but that would be the exception not the rule, all depends on the variable being used in the switch and the size/width/nature of the elements in the case statement.
You have been picking up assembly, so you understand registers, doing math with registers, storing things in registers, etc. The program counter can be used by many instructions as just another register, the difference is when you write something to it, you change what instruction is executed next.
Lets try a case statement example:
switch(bob&3)
{
case 0: ted(); break;
case 1: joe(); break;
case 2: jim(); bob=2; break;
case 3: tim(); bob=7; break;
}
What you COULD (probably would not) do is:
casetable:
.word a
.word b
.word c
.word d
caseentry:
ldr r1,=bob
ldr r0,[r1]
ldr r2,=casetable
and r0,#3
ldr pc,[r2,r0,lsl #2]
a:
bl ted
b caseend
b:
bl joe
b caseend
c:
bl jim
mov r0,#2
ldr r1,=bob
str r0,[r1]
b caseend
d:
bl tim
mov r0,#7
ldr r1,=bob
str r0,[r1]
b caseend
caseend:
So the four words after the label casetable: are the addresses where the code starts for each of the cases, case0 starts at a: case1 code starts at b: and so on. What we need to do is take the variable used by the switch statement and mathematically compute an address for the item in the table. Then we need to load the address from the table into the program counter. Writing to the program counter is the same as performing a jump.
So the C sample was crafted intentially to make this easy. First load the contents of the bob variable into r0. And it with 3. The items in the jump table are 32 bit addresses, or 4 bytes so we need to multiply r0 times 4 to get the offset in the table. A shift left of 2 is the same as a multiply by 4. And we need to add r0<<2 to the base address for the jump table. So essentially we are computing address_of(casetable)+((bob&3)<<2) The read memory at that computed address and load that value into the program counter.
With arm (you mentioned this was arm) you can do much of this in one instruction:
ldr pc,[r2,r0,lsl #2]
Load into the register pc, the contents of the memory location [r2+(r0<<2)]. r2 is the address of casetable, and r0 is bob&3.
Basically a jump table boils down to mathmatically computing an offset into a table of addresses. The table of addresses are addresses you want to jump/branch to depending on one of the parameters used in the math operation, in my example above bob is that variable. And the addresses a,b,c,d are the address choices I want to pick from based on the contents of bob. There are a zillion fun and interesting ways to do this sort of thing, but it all boils down to computing at runtime the address to branch to, and shoving that address into the program counter in a way that causes the particular processor to perform what is essentially a jump.
Note another, perhaps easier to read way to compute and jump in my example would be:
mov r3,r0,lsl #2
add r3,r2
bx r3
The cores that support thumb use the bx instruction with a register often, normally you see bx lr to return from a branch link (subroutine) call. bx lr means pc = lr. bx r3 means pc = r3.
I hope this is what you were asking about, if I have misunderstood the question, please elaborate.
EDIT:
Looking at the code on your screen shot.
cmp r0,#4
addls pc,pc,r0,lsl #2
The optional math (ADDLS add if lower or same) computes the new program counter value (a jump table is a computation stored in the program counter) based on the program counter itself plus an offset r0 times 4. For arm processors, at the time of execution, the program counter is two instructions ahead. so, mixing those two lines of code and a portion of my example:
cmp r0,#4
addls pc,pc,r0,lsl #2
ldr pc,=a
ldr pc,=b
ldr pc,=c
ldr pc,=d
...
At the time addls is executed the program counter contains the address for the ldr pc,=b instruction. So if r0 contains a 0 then 0<<2 = 0, pc plus 0 would branch to the ldr pc,=b instruction then that instruction causes a branch to the b: label. if r0 contained a 1 at the time of addls then you would execute the ldr pc,=c instruction next and so on. You can make a table as deep as you want this way. Also note that since the add is conditional, if the condition does not happen you will execute that first instruction after the addls, so maybe you want that to be an unconditional branch to branch over the table, or branch backward an loop or maybe it is a nop so that you fall into the first jump, or what I did above is have it branch to some other place. So to understand what is going on you need to example the instructions that follow the addls to figure out what the possible jump table destinations are.