According to this thread, initializing an array with a shorter string literal pads the array with zeros.
So, is there any reason why these two functions (test1 and test2) would produce different results when compiled for ARM Cortex M4?
extern void write(char * buff);
void test1(void)
{
char buff[8] = {'t', 'e', 's', 't', 0, 0, 0, 0 };
write(buff);
}
void test2(void)
{
char buff[8] = "test";
write(buff);
}
I am getting equivalent assemblies for x86-64, but on ARM gcc I get different output:
test1:
str lr, [sp, #-4]!
sub sp, sp, #12
mov r3, sp
ldr r2, .L4
ldm r2, {r0, r1}
stm r3, {r0, r1}
mov r0, r3
bl write
add sp, sp, #12
ldr pc, [sp], #4
.L4:
.word .LANCHOR0
test2:
mov r3, #0
str lr, [sp, #-4]!
ldr r2, .L8
sub sp, sp, #12
ldm r2, {r0, r1}
str r0, [sp]
mov r0, sp
strb r1, [sp, #4]
strb r3, [sp, #5]
strb r3, [sp, #6]
strb r3, [sp, #7]
bl write
add sp, sp, #12
ldr pc, [sp], #4
.L8:
.word .LANCHOR0+8
First of, the code is equivalent in the sense that the contents of memory constituting the objects named buf will be the same.
That said, the compiler clearly produces worse code for the second function. Consequently, since there is a way to produce more optimized code that has equivalent semantics, it would be reasonable to consider this an optimization failure in the compiler and file a bug for it.
If the compiler wanted to emit the same code, it would have to recognize that the string literal representation in memory can be zero-padded without changing the semantics of the program (though the string literal itself cannot be padded, since sizeof "test" cannot be equal to sizeof "test\0\0\0").
However, since this would only be advantageous if the string literal is used to initialize an explicit-length array (usually a bad idea) that is longer than the literal and where normal string semantics are not sufficient (bytes past the null-terminator are relevant), the value of better optimization for this case seems limited.
Addendum: If you set godbolt to not remove assembler directives, you can see how the literals are created:
.section .rodata
.align 2
.set .LANCHOR0,. + 0
.byte 116
.byte 101
.byte 115
.byte 116
.byte 0
.byte 0
.byte 0
.byte 0
.ascii "test\000"
.space 3
Interestingly, the compiler does not deduplicate, and it leaves the padding after the string literal to the assembler (.space 3), rather than explicitly zeroing it.
Related
when I create ARM assembly code from C code with gcc -S, I get a variant of the LDR instruction that I don't know. Specifically, I get the "ldr r3, .L5" instruction where ".L5" is a lable defined by the compiler. It is not clear to me why I don't get the pseudoinstruction "ldr r3, =.L5", which should be the only way to load an arbitrary number in a register.
More in details:
I start from this C code (file name: sum_squares_C.c):
int sum;
int main(){
sum = 0;
for(int i=1; i<=n; i++){
sum = sum + i*i;
}
}
Then on a Raspeberry PI, I compile with "gcc -O0 -S sum_squares_C.c", with compiler version gcc (Raspbian 8.3.0-6+rpi1) 8.3.0.
The output is this ARM code (the instruction "ldr r3, .L5" is in the 7th line after label "main"):
.arch armv6
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "sum_squares_C.c"
.text
.global n
.data
.align 2
.type n, %object
.size n, 4
n:
.word 1
.comm sum,4,4
.text
.align 2
.global main
.arch armv6
.syntax unified
.arm
.fpu vfp
.type main, %function
main:
# args = 0, pretend = 0, frame = 8
# frame_needed = 1, uses_anonymous_args = 0
# link register save eliminated.
str fp, [sp, #-4]!
add fp, sp, #0
sub sp, sp, #12
ldr r3, .L5
mov r2, #0
str r2, [r3]
mov r3, #1
str r3, [fp, #-8]
b .L2
.L3:
ldr r3, [fp, #-8]
ldr r2, [fp, #-8]
mul r2, r2, r3
ldr r3, .L5
ldr r3, [r3]
add r3, r2, r3
ldr r2, .L5
str r3, [r2]
ldr r3, [fp, #-8]
add r3, r3, #1
str r3, [fp, #-8]
.L2:
ldr r3, .L5+4
ldr r3, [r3]
ldr r2, [fp, #-8]
cmp r2, r3
ble .L3
mov r3, #0
mov r0, r3
add sp, fp, #0
# sp needed
ldr fp, [sp], #4
bx lr
.L6:
.align 2
.L5:
.word sum
.word n
.size main, .-main
.ident "GCC: (Raspbian 8.3.0-6+rpi1) 8.3.0"
.section .note.GNU-stack,"",%progbits
It seems to me that gcc uses the instruction "ldr r3, .L5" as equivalent to "ldr r3, =.L5". Is it correct? Where can I find the definition of this instruction syntax? Is it possible to force gcc to not use this instruction, but use "ldr r3, =.L5" (I need this for teaching reasons)?
Thanks!
Francesco
ldr r3, .L5 loads a word from the address .L5 into r3. At the label .L5 there is the address of the variable sum. So this loads the address of sum into r3.
ldr r3, =.L5 loads the address of .L5 into r3. Then the program would need to dereference it again in order to get the address of sum. There is no reason to do this.
When you use ldr r3, =.L5 the assembler stores the address of .L5 somewhere, and then loads from that address. So this:
ldr r3, =.L5
...
.L5:
.word sum
is the same as this:
ldr r3, .address_of_L5
...
.L5:
.word sum
...
.address_of_L5:
.word .L5
As you can see, the compiler has already done this for sum. Instead of writing this assembly:
ldr r3, =sum
the compiler has written:
ldr r3, .L5
...
.L5:
.word sum
which is exactly what the assembler would have done anyway. I don't know why the compiler wants to do this instead of the assembler.
It is not clear to me why I don't get the pseudoinstruction "ldr r3, =.L5", which should be the only way to load an arbitrary number in a register.
Notice this is not the only way to load an arbitrary number into a register. It's not even a real way to load an arbitrary number into a register. It's a pseudoinstruction (as you know): it's not something the CPU can actually do, it's something that the assembler can "compile" for your convenience.
To save typing and assume a risk a person might use:
ldr r3,=sum
ldr r3,[r3]
As pointed out in the other example the assembler will create in machine code the equivalent of what the human could have typed without the =address trick:
ldr r3,address_of_sum (without the =)
ldr r3,[r3]
...
address_of_sum: .word sum
And that first ldr (not pseudo as it translates directly into a known instruction, one to one) is a pc-relative load (assuming it can reach).
Both of these though are assembler specific as assembly language is defined by the assembler not the target.
The =address shortcut is not supported by all arm assemblers and should be used with care, for certain values it does not turn into a word in the pool with a pc relative load.
For questions like this first examine the disassembly, most of the time that will answer your question, even better examine the dissasembly first then in question the assembly. Compiler generated assembly is not as easy to read and follow as a disassembly, especially when linked. It is also easier to learn from optimized code than unoptimized as so much of the code is this stack (or in this case global) variable stuff.
ldr r3,=0x1000
ldr r3,=0x1234
b .
00000000 <.text>:
0: e3a03a01 mov r3, #4096 ; 0x1000
4: e51f3000 ldr r3, [pc, #-0] ; c <.text+0xc>
8: eafffffe b 8 <.text+0x8>
c: 00001234 andeq r1, r0, r4, lsr r2
In one case where it can it generates a mov, where it cant then it allocates from the pool and places the value there then does a pc relative load. Now yes when reading the output this way you need to see/understand/ignore the andeq disassembly that line we are looking at the value 0x00001234 and seeing the instruction generated.
You should not always assume the =address trick will work if you choose to try various tools, it works for gnu now if it can find a pool if it can't then you either need to just do the typing yourself or add a .pool or whatever the other pseudocode that does the same thing is to help the assembler find a place for this value as needed.
I would expect an assembler to always place this (=address) in the pool for an external reference, but it is technically possible for a toolchain to put a placeholder there and let the linker fill it in either with a mov or add a nearby item and place the value there like binutils does with a bl to an external reference.
gas:
ldr r3,=sum
b .
00000000 <.text>:
0: e51f3000 ldr r3, [pc, #-0] ; 8 <.text+0x8>
4: eafffffe b 4 <.text+0x4>
8: 00000000 andeq r0, r0, r0
The linker will fill in the address later as with your compiler output. Now the -0 disassembly is very interesting, almost amusing.
What is it that makes value in a variable of type _Bool 1 , even when we assign a value greater than 1 to it.
For ex:
_Bool tmp = 10;
printf("%x , %lu", tmp, sizeof(tmp));
This would print 1, 1. Trying to understand what is it that makes a variable of size Byte act as a single bit and when assigned a value > 1 which has LSB 0 still get converted to 1.
What is it that makes value in a variable of type _Bool 1 , even when we assign a value greater than 1 to it. The compiler does.
For example on ARM (arm-none-eabi-gcc):
#include "stdio.h"
#include "stdbool.h"
int main()
{
_Bool tmp = 10;
printf("%x , %lu", tmp, sizeof(tmp));
return 0;
}
compiles to:
.LC0:
.ascii "%x , %lu\000"
main:
stmfd sp!, {fp, lr}
add fp, sp, #4
sub sp, sp, #8
mov r3, #1
strb r3, [fp, #-5]
ldrb r3, [fp, #-5] # zero_extendqisi2
mov r2, #1
mov r1, r3
ldr r0, .L3
bl printf
mov r3, #0
mov r0, r3
sub sp, fp, #4
ldmfd sp!, {fp, lr}
bx lr
.L3:
.word .LC0
you can see in the instruction mov r3, #1 that the compiler directly converts the initialisation value 10 to 1 as specified by the standard .
int main(int argc, char *argv[])
{
char string[100];
string = *argv[1];
}
Why doesn't this work? Do I actually need to use loops to iterate through each element and do everything the long way?
Why doesn't this work?
Because that's simply how it works in C. Trying with string = argv[1] (without *) would be a better guess, but you cannot copy arrays with simple assignments.
Do I actually need to use loops to iterate through each element and do everything the long way?
Unless you are prepared to use functions like strcpy, strncpy or strdup or something similar, then yes. Using strncpy in your code would look like this:
char string[100];
strncpy(string, argv[1], sizeof(string));
string[sizeof(string) - 1] = 0;
The last line is to make sure that string is terminated. Clunky? Yes, it is. There are better functions in some compilers like strlcpy, which is available on POSIX systems, but it's not a part of the C standard. If you use strlcpy instead of strncpy you can skip the last line.
If you're planing to do a lot of string copying and don't have a compiler supporting strlcpy, it might be a good idea to write your own implementation (good practice) or just copy an existing one. Here is one I found:
size_t
strlcpy(char *dst, const char *src, size_t siz)
{
char *d = dst;
const char *s = src;
size_t n = siz;
/* Copy as many bytes as will fit */
if (n != 0) {
while (--n != 0) {
if ((*d++ = *s++) == '\0')
break;
}
}
/* Not enough room in dst, add NUL and traverse rest of src */
if (n == 0) {
if (siz != 0)
*d = '\0'; /* NUL-terminate dst */
while (*s++)
;
}
return(s - src - 1); /* count does not include NUL */
}
Source: https://android.googlesource.com/platform/system/core.git/+/brillo-m7-dev/libcutils/strlcpy.c
In the main function in C argv is a vector to strings which are arrays of characters themself. So argv is a pointer to a pointer (like **char).
Your code assigns a reference to one pointer (to first argument).
char* string = argv[1]; would do it. To copy the whole string (array of characters) use strcpy. To copy all arguments use memcpy.
But usually in a C program you do not copy arguments, just use references to them.
The short answer is: because it is. In the C language only structures and unions are copied by value with one exception:
Initialization of the array
void foo(void)
{
char x[] = "This string literal will be copied! Test it yourself";
char z[] = "This string literal will be copied as well But because it is much loger memcpy will be used! Test it yourself";
float y[] = {1.0,2.0, 3,0,4.0,5.0,1.0,2.0, 3,0,4.0,5.0,1.0,2.0, 3,0,4.0,5.0};
long long w[] = {1,2,3,4,5,6,7,8,9,0};
foo1(x,z); // this functions are only to prevent the variable removal
foo2(y,w);
}
and the compiled code:
foo:
push {r4, lr}
sub sp, sp, #320
mov ip, sp
ldr lr, .L4
ldr r4, .L4+4
ldmia lr!, {r0, r1, r2, r3}
stmia ip!, {r0, r1, r2, r3}
ldmia lr!, {r0, r1, r2, r3}
stmia ip!, {r0, r1, r2, r3}
ldmia lr!, {r0, r1, r2, r3}
stmia ip!, {r0, r1, r2, r3}
ldm lr, {r0, r1}
str r0, [ip], #4
strb r1, [ip]
add r0, sp, #208
mov r2, #110
ldr r1, .L4+8
bl memcpy
mov r1, r4
add r0, sp, #56
mov r2, #72
bl memcpy
mov r2, #80
add r1, r4, #72
add r0, sp, #128
bl memcpy
add r1, sp, #208
mov r0, sp
bl foo1
add r1, sp, #128
add r0, sp, #56
bl foo2
add sp, sp, #320
pop {r4, pc}
.L4:
.word .LC2
.word .LANCHOR0
.word .LC3
.LC2:
.ascii "This string literal will be copied! Test it yoursel"
.ascii "f\000"
.LC3:
.ascii "This string literal will be copied as well But beca"
.ascii "use it is much loger memcpy will be used! Test it y"
.ascii "ourself\000"
Structures and unions are copied by the value sothe assignment copies the whole structure to another.
typedef struct
{
char str[100];
}string;
string a = {.str = "This string literal will be copied before main starts"},b;
void foo3(string c)
{
string g = a;
b = a;
foo4(g);
}
and the code:
foo3:
sub sp, sp, #16
push {r4, r5, r6, lr}
mov r6, #100
sub sp, sp, #104
ldr r5, .L4
add ip, sp, #116
add r4, sp, #4
stmib ip, {r0, r1, r2, r3}
mov r2, r6
mov r1, r5
mov r0, r4
bl memcpy
mov r2, r6
mov r1, r5
ldr r0, .L4+4
bl memcpy
add r1, sp, #20
mov r2, #84
add r0, sp, #136
bl memcpy
ldm r4, {r0, r1, r2, r3}
add sp, sp, #104
pop {r4, r5, r6, lr}
add sp, sp, #16
b foo4
.L4:
.word .LANCHOR0
.word b
a:
.ascii "This string literal will be copied before main star"
.ascii "ts\000"
you can play with it yourself:
https://godbolt.org/z/lag4uL
In C, an array name is not an L-value expression. Hence you can not use it in an assignment statement. To make a copy of a character array, you can either use a for statement or strcpy function, which is declared in string.h header file.
I am trying to figure out how arrays work in ARM assembly, but I am just overwhelmed. I want to initialize an array of size 20 to 0, 1, 2 and so on.
A[0] = 0
A[1] = 1
I can't even figure out how to print what I have to see if I did it correctly. This is what I have so far:
.data
.balign 4 # Memory location divisible by 4
string: .asciz "a[%d] = %d\n"
a: .skip 80 # allocates 20
.text
.global main
.extern printf
main:
push {ip, lr} # return address + dummy register
ldr r1, =a # set r1 to index point of array
mov r2, #0 # index r2 = 0
loop:
cmp r2, #20 # 20 elements?
beq end # Leave loop if 20 elements
add r3, r1, r2, LSL #2 # r3 = r1 + (r2*4)
str r2, [r3] # r3 = r2
add r2, r2, #1 # r2 = r2 + 1
b loop # branch to next loop iteration
print:
push {lr} # store return address
ldr r0, =string # format
bl printf # c printf
pop {pc} # return address
ARM confuses me enough as it is, I don't know what i'm doing wrong. If anyone could help me better understand how this works that would be much appreciated.
This might help down the line for others who want to know about how to allocate memory for array in arm assembly language
here is a simple example to add corresponding array elements and store in the third array.
.global _start
_start:
MOV R0, #5
LDR R1,=first_array # loading the address of first_array[0]
LDR R2,=second_array # loading the address of second_array[0]
LDR R7,=final_array # loading the address of final_array[0]
MOV R3,#5 # len of array
MOV R4,#0 # to store sum
check:
cmp R3,#1 # like condition in for loop for i>1
BNE loop # if R3 is not equal to 1 jump to the loop label
B _exit # else exit
loop:
LDR R5,[R1],#4 # loading the values and storing in registers and base register gets updated automatically R1 = R1 + 4
LDR R6,[R2],#4 # similarly
add R4,R5,R6
STR R4,[R7],#4 # storing the values back to the final array
SUB R3,R3,#1 # decrment value just like i-- in for loop
B check
_exit:
LDR R7,=final_array # before exiting checking the values stored
LDR R1, [R7] # R1 = 60
LDR R2, [R7,#4] # R2 = 80
LDR R3, [R7,#8] # R3 = 100
LDR R4, [R7,#12] # R4 = 120
MOV R7, #1 # terminate syscall, 1
SWI 0 # execute syscall
.data
first_array: .word 10,20,30,40
second_array: .word 50,60,70,80
final_array: .word 0,0,0,0,0
as mentioned your printf has problems, you can use the toolchain itself to see what the calling convention is, and then conform to that.
#include <stdio.h>
unsigned int a,b;
void notmain ( void )
{
printf("a[%d] = %d\n",a,b);
}
giving
00001008 <notmain>:
1008: e59f2010 ldr r2, [pc, #16] ; 1020 <notmain+0x18>
100c: e59f3010 ldr r3, [pc, #16] ; 1024 <notmain+0x1c>
1010: e5921000 ldr r1, [r2]
1014: e59f000c ldr r0, [pc, #12] ; 1028 <notmain+0x20>
1018: e5932000 ldr r2, [r3]
101c: eafffff8 b 1004 <printf>
1020: 0000903c andeq r9, r0, ip, lsr r0
1024: 00009038 andeq r9, r0, r8, lsr r0
1028: 0000102c andeq r1, r0, ip, lsr #32
Disassembly of section .rodata:
0000102c <.rodata>:
102c: 64255b61 strtvs r5, [r5], #-2913 ; 0xb61
1030: 203d205d eorscs r2, sp, sp, asr r0
1034: 000a6425 andeq r6, sl, r5, lsr #8
Disassembly of section .bss:
00009038 <b>:
9038: 00000000 andeq r0, r0, r0
0000903c <a>:
903c:
the calling convention is generally first parameter in r0, second in r1, third in r2 up to r3 then use the stack. There are many exceptions to this, but we can see here that the compiler which normally works fine with a printf call, wants the address of the format string in r0. the value of a then the value of b in r1 and r2 respectively.
Your printf has the string in r0, but a printf call with that format string needs three parameters.
The code above used a tail optimization and branch to printf rather than called it and returned from. The arm convention these days prefers the stack to be aligned on 64 bit boundaries, so you can put some register, you dont necessarily care to preserve on the push/pop in order to keep that alignment
push {r3,lr}
...
pop {r3,pc}
It certainly wont hurt you to do this, it may or may not hurt to not do it depending on what downstream assumes.
Your setup and loop should function just fine assuming that r1 (label a) is a word aligned address. Which it may or may not be if you mess with your string, should put a first then the string or put another alignment statement before a to insure the array is aligned. There are instruction set features that can simply the code, but it appears functional as is.
I'm working in a C file, which is something like this:
#define v2 0x560000a0
int main(void)
{
long int v1;
v1 = ioread32(v2);
return 0;
}
and I've extracted this part to write it in assembly:
int main ()
{
extern v1,v2;
v1=ioread32(v2);
return 0;
}
I'm trying to write the value of v2 in v1 using assembly code for armv4. Using
arm-linux-gnueabi-gcc -S -march=armv4 assembly_file.c
I get this code:
.arch armv4
.eabi_attribute 27, 3
.fpu vfpv3-d16
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "txind-rsi.c"
.text
.align 2
.global main
.type main, %function
main:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 1, uses_anonymous_args = 0
stmfd sp!, {fp, lr}
add fp, sp, #4
ldr r3, .L2
ldr r3, [r3, #0]
mov r0, r3
bl ioread32
mov r2, r0
ldr r3, .L2+4
str r2, [r3, #0]
mov r3, #0
mov r0, r3
sub sp, fp, #4
ldmfd sp!, {fp, lr}
bx lr
.L3:
.align 2
.L2:
.word v1
.word v2
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",%progbits
I use that code to put it back inside the C file this way:
asm volatile(
"stmfd sp!, {fp, lr}\n"
"add fp, sp, #4\n"
"ldr r3, =v1\n"
"ldr r3, [r3, #0]\n"
"mov r0, r2\n"
"ldr r3, =v2\n"
"str r2, [r3, #0]\n"
"sub sp, fp, #4\n"
"ldmfd sp!, {fp, lr}\n"
"bx lr"
);
The code doesn't do anything.
In fact, it stops the target working.
Does anyones know why?
EDITED:
After reading your answers I have another question:
How would I put a constant value in a register?. The code in C would be this:
#define v2 0x560000a0
int main(void)
{
long int value = 0x0000ffff;
long int v1;
v1 = ioread32(v2);
iowrite32(v2,value);
return 0;
}
I've tried this:
asm volatile("mov r3, #value");
and I get an assembler message: "value symbol is in a different section";
I've also tried
asm volatile("mov r3, #0x0000ffff);
and the assembler message is:
"invalid constant(ffff) after fixup". And after reading this: Invalid constant after fixup?
I don't know how I can put that value into r3, as it seems I can't do it with mov.I'm using armv4, not armv7. And this solution proposed in the link, doesn't work for me:
asm volatile("ldr r3, =#0000ffff\n");
There are no problems with your code, bx lr is the proper way to terminate main, no issue there. Your code is most likely crashing due to the address you are accessing, it is probably not an address you are allowed to access...
#define v2 0x560000a0
int main(void)
{
long int v1;
v1 = ioread32(v2);
return 0;
}
if you optimize on the C compile step you can see a cleaner, simpler version
00000000 <main>:
0: e92d4008 push {r3, lr}
4: e59f0008 ldr r0, [pc, #8] ; 14 <main+0x14>
8: ebfffffe bl 0 <ioread32>
c: e3a00000 mov r0, #0
10: e8bd8008 pop {r3, pc}
14: 560000a0 strpl r0, [r0], -r0, lsr #1
which is not hard to implement in asm.
.globl main
main:
push {r3,lr}
ldr r0,=0x560000A0
bl ioread32
mov r0,#0
pop {r3,pc}
assembled and disassembled
00000000 <main>:
0: e92d4008 push {r3, lr}
4: e59f0008 ldr r0, [pc, #8] ; 14 <main+0x14>
8: ebfffffe bl 0 <ioread32>
c: e3a00000 mov r0, #0
10: e8bd8008 pop {r3, pc}
14: 560000a0 strpl r0, [r0], -r0, lsr #1
you had simplified the code to the point that v1 becomes dead code, but the function call cannot be optimized out, so the return is discarded.
if you dont use main but create a separate function that returns
#define v2 0x560000a0
long int fun(void)
{
long int v1;
v1 = ioread32(v2);
return v1;
}
...damn...tail optimization:
00000000 <fun>:
0: e59f0000 ldr r0, [pc] ; 8 <fun+0x8>
4: eafffffe b 0 <ioread32>
8: 560000a0 strpl r0, [r0], -r0, lsr #1
Oh well. You are on the right path, see what the C compiler generates then mimic or modify that. I suspect it is your ioread that is the problem and not the outer structure (why are you doing a 64 bit thing with a 32 bit read, maybe that is the problem long it is most likely going to be implemented as 64 bit).
The bx lr at the end would try to return from the current function. You probably don't want that. I guess that is also the reason for the crash, given that you don't let the compiler undo whatever setup it has done in the function prologue.
Also, in your asm block you don't use any locals so you don't need to adjust the stack pointer, and you don't need a new stack frame either. The mov r0, r2 is also broken because you have loaded the value into r3. You can delete the mov if you just load directly into whatever register you want. Furthermore, in gcc inline asm you should tell the compiler what registers you modify. Not doing so can also lead to a crash because you might overwrite values the compiler relies upon.
Not sure what's the point in doing this in assembly. If you insist on that, the following should work somewhat better:
asm volatile(
"ldr r3, =v1\n"
"ldr r2, [r3, #0]\n"
"ldr r3, =v2\n"
"str r2, [r3, #0]\n" ::: "r2", "r3"
);