1. #define timers ((dual_timers *)0x03FF6000)
This is a memory map definition used in an ARM Microcontroller
where the structure definition is
2. struct dual_timers
{
special_register TMOD;
special_register TDATA0;
special_register TDATA1;
special_register TCNT0;
special_register TCNT1;
};
What is the meaning of ((dual_timers *)0x03FF6000)? Is it type casting?
If it is typecasting, please explain its influence on the code.
How would the compiler see the definition 'timers' after this?
This has been asked and answered countless times here.
First off, the structure thing is a bad idea, not portable and not reliable, even though it is used as often as it isn't in vendors' code. Little time bombs waiting to go off and have you pay them for support perhaps.
Your define is just elementary C. It is a typecast; the address just happens to be hardcoded. In a C programming class we might have used the name of some other pointer, and likely not a define:
unsigned int *bob;
unsigned char *ted = (unsigned char *)bob;
(yet another programming trick you should never use). And you can spin that around as a define
#define ted (unsigned char *)bob
Or something to that effect. bob is just an address with a human readable name.
For this to work you need a volatile in there (which, as posted, there isn't?), and they must have yet another typedef somewhere that defines special_register so they don't have to keep typing volatile unsigned int or volatile uint32_t or volatile uint8_t or whatever size the registers are. The volatile is there because you know, but the compiler doesn't, that you are pointing at hardware and not RAM; you need the compiler to perform all of the loads and stores and not optimize any out.
In addition you need the compiler to perform the right sized loads and stores. If a register can only be accessed with 32 bit wide transactions, you need the compiler to implement the access with the right instructions. No matter what you do, that is not guaranteed; this programming style can fail, and if you are unlucky it will. It is a very widespread practice, but it is not foolproof. Even worse than making pointers to absolute addresses is using structures across a compile domain, and hardware is a separate compile domain from your code. No matter how many compiler specific directives you find, you cannot guarantee that the code will keep working as time goes on and compilers are upgraded, or if, god forbid, you try to compile on some other computer. It may work 99.9999% of the time, but the time it fails is a massive failure, the once in a zillion years earthquake that wipes out all of Tokyo.
As you see in kernel drivers, using an abstraction makes for portable code; in bare metal you can implement that abstraction in assembly language and guarantee the correct instruction is used. It can cost you some cycles, so you can create a define/typedef just like the one you are asking about for the abstraction, but your code is not forced into that, and a complete rewrite of your code is not required if you need to port it or work around a chip erratum, etc. The latter is my personal opinion and style, based on decades of experience in bare metal programming.
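A rough sketch of what such an abstraction could look like in C. The names put32, get32, timer_demo and the TIMER_*_OFFSET constants are made up for illustration, and the 32 bit register width is an assumption. On real hardware the two accessors could be swapped for small assembly routines so the exact load/store instruction is under your control. The base address is a parameter so the sketch can also be pointed at an ordinary array on a hosted system.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte offsets, assuming five consecutive 32 bit registers. */
#define TIMER_TMOD_OFFSET   0x00u
#define TIMER_TDATA0_OFFSET 0x04u
#define TIMER_TDATA1_OFFSET 0x08u
#define TIMER_TCNT0_OFFSET  0x0Cu
#define TIMER_TCNT1_OFFSET  0x10u

/* Every register write goes through this one function. */
void put32(uintptr_t address, uint32_t value)
{
    assert((address & 3u) == 0);   /* 32 bit accesses are assumed word aligned */
    *(volatile uint32_t *)address = value;
}

/* Every register read goes through this one function. */
uint32_t get32(uintptr_t address)
{
    assert((address & 3u) == 0);
    return *(volatile uint32_t *)address;
}

/* A few representative accesses: write, read-modify-write, copy, read. */
uint32_t timer_demo(uintptr_t base)
{
    put32(base + TIMER_TMOD_OFFSET, 5);
    put32(base + TIMER_TDATA0_OFFSET, get32(base + TIMER_TDATA0_OFFSET) | 1u);
    put32(base + TIMER_TCNT0_OFFSET, get32(base + TIMER_TCNT1_OFFSET));
    return get32(base + TIMER_TDATA1_OFFSET);
}
```

If a port or a chip erratum needs something different, only put32 and get32 change; the code that calls them stays as it is.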
The define is just an elementary C typecast, nothing special or fancy; read it like any other C syntax to understand what it is doing. The struct is a way of applying offsets to that address, so if we assume that all of these registers are 32 bit then the "desire" is to have accesses to TMOD be at address 0x03FF6000+0x00, accesses to TDATA0 be at address 0x03FF6000+0x04, TDATA1 at 0x03FF6000+0x08 and so on. But again, there is nothing here that ensures that is actually going to happen, nor does it ensure that 32 bit loads or stores are used. A simple disassembly of the code will show the addresses being generated for these accesses.
I assume you tried using code like this to see what it did:
typedef volatile unsigned int special_register;
typedef struct
{
special_register TMOD;
special_register TDATA0;
special_register TDATA1;
special_register TCNT0;
special_register TCNT1;
} dual_timers;
#define timers ((dual_timers *)0x03FF6000)
unsigned int fun ( void )
{
timers->TMOD=5;
timers->TDATA0|=1;
timers->TCNT0=timers->TCNT1;
return(timers->TDATA1);
}
For ARM, as you mentioned, this produces:
00000000 <fun>:
0: e3a02005 mov r2, #5
4: e59f301c ldr r3, [pc, #28] ; 28 <fun+0x28>
8: e5832000 str r2, [r3]
c: e5932004 ldr r2, [r3, #4]
10: e3822001 orr r2, r2, #1
14: e5832004 str r2, [r3, #4]
18: e5932010 ldr r2, [r3, #16]
1c: e583200c str r2, [r3, #12]
20: e5930008 ldr r0, [r3, #8]
24: e12fff1e bx lr
28: 03ff6000 mvnseq r6, #0
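The #4, #8, #12 and #16 offsets in the disassembly above are what this compiler happened to choose. One way to at least get a build failure if a compiler ever picks a different layout is a set of compile time checks. This is a minimal sketch assuming C11 and 32 bit registers; timer_register_address is a made-up helper that only does the address arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef volatile uint32_t special_register;   /* assumption: 32 bit registers */

typedef struct
{
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
} dual_timers;

/* The build stops here if the struct layout differs from the register map. */
static_assert(offsetof(dual_timers, TMOD)   == 0x00, "TMOD offset");
static_assert(offsetof(dual_timers, TDATA0) == 0x04, "TDATA0 offset");
static_assert(offsetof(dual_timers, TDATA1) == 0x08, "TDATA1 offset");
static_assert(offsetof(dual_timers, TCNT0)  == 0x0C, "TCNT0 offset");
static_assert(offsetof(dual_timers, TCNT1)  == 0x10, "TCNT1 offset");
static_assert(sizeof(dual_timers) == 0x14, "dual_timers size");

/* Address that an access to a member would use for a given base address. */
uintptr_t timer_register_address(uintptr_t base, size_t member_offset)
{
    return base + member_offset;
}
```

This only checks the offsets. It says nothing about which load/store instructions the compiler emits, so the access width still has to be checked in the disassembly.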
Yes it is type casting. It basically says that starting from address 0x03FF6000 you can consider that there is a dual_timers structure.
In this context, I guess that special_register is defined as something like volatile uint32_t.
This is a typical way of easily accessing the registers of a microcontroller. For accessing the register TDATA0 for example, in your code you will need to use timers->TDATA0
It means that there is a pointer to the structure dual_timers and the value of the pointer is 0x03FF6000, i.e. it is pointing to the structure located at 0x03FF6000.
The compiler (in fact the preprocessor) substitutes the expression ((dual_timers *)0x03FF6000) every time it sees the word timers. For you it looks like timers->TDATA0, but for the compiler it looks like ((dual_timers *)0x03FF6000)->TDATA0: take the TDATA0 field of the dual_timers structure located at 0x03FF6000.
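To try the substitution on a PC without the hardware, the macro can point at an ordinary object instead of 0x03FF6000. This is only a sketch: fake_timer_block and set_tdata0 are made-up names, and the 32 bit register type is a guess, as above.

```c
#include <assert.h>
#include <stdint.h>

typedef volatile uint32_t special_register;   /* assumed register width */

typedef struct
{
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
} dual_timers;

/* Hosted stand-in: an ordinary object instead of hardware at 0x03FF6000. */
dual_timers fake_timer_block;

/* Same shape as the original define, only the address differs. */
#define timers ((dual_timers *)&fake_timer_block)

void set_tdata0(uint32_t value)
{
    /* After preprocessing this line reads:
       ((dual_timers *)&fake_timer_block)->TDATA0 = value; */
    timers->TDATA0 = value;

    /* Both spellings name the same object. */
    assert(&timers->TDATA0 == &fake_timer_block.TDATA0);
}
```

Running only the preprocessor (for example gcc -E) on a file like this shows the expanded text directly.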
I want to know the way variables are initialized :
#include <stdio.h>
int main( void )
{
int ghosts[3];
for(int i =0 ; i < 3 ; i++)
printf("%d\n",ghosts[i]);
return 0;
}
this gets me random values like -12 2631 131 .. where did they come from?
For example with GCC on x86-64 Linux: https://godbolt.org/z/MooEE3ncc
I have a guess to answer my question, it could be wrong anyways:
The registers of the memory after they are 'emptied' get random voltages between 0 and 1, these values get 'rounded' to 0 or 1, and these random values depend on something?! Maybe the way registers are made? Maybe the capacity of the memory comes into play somehow? And maybe even the temperature?!!
Your computer doesn't reboot or power cycle every time you run a new program. Every bit of storage in memory or registers your program can use has a value left there by some previous instruction, either in this program or in the OS before it started this program.
If that was the case, e.g. for a microcontroller, yes, each bit of storage might settle into a 0 or 1 state during the voltage fluctuations of powering on, except in storage engineered to power up in a certain state. (DRAM is more likely to be 0 on power-up, because its capacitors will have discharged). But you'd also expect there to be internal CPU logic that does some zeroing or setting of things to guaranteed state before fetching and executing the first instruction of code from the reset vector (a memory address); system designers normally arrange for there to be ROM at that physical address, not RAM, so they can put non-random bytes of machine-code there. Code that executes at that address should probably assume random values for all registers.
But you're writing a simple user-space program that runs under an OS, not the firmware for a microcontroller, embedded system, or mainstream motherboard, so power-up randomness is long in the past by the time anything loads your program.
Modern OSes zero registers on process startup, and zero memory pages allocated to user-space (including your stack space), to avoid information leaks of kernel data and data from other processes. So the values must come from something that happened earlier inside your process, probably from dynamic linker code that ran before main and used some stack space.
Reading the value of a local variable that's never been initialized or assigned is not actually undefined behaviour (in this case because it couldn't have been declared register int ghosts[3], that's an error (Godbolt) because ghosts[i] effectively uses the address). See (Why) is using an uninitialized variable undefined behavior? In this case, all the C standard has to say is that the value is indeterminate. So it does come down to implementation details, as you expected.
When you compile without optimization, compilers don't even notice the UB because they don't track usage across C statements. (This means everything is treated somewhat like volatile, only loading values into registers as needed for a statement, then storing again.)
In the example Godbolt link I added to your question, notice that -Wall doesn't produce any warnings at -O0, and just reads from the stack memory it chose for the array without ever writing it. So your code is observing whatever stale value was in memory when the function started. (But as I said, that must have been written earlier inside this program, by C startup code or dynamic linking.)
With gcc -O2 -Wall, we get the warning we'd expect: warning: 'ghosts' is used uninitialized [-Wuninitialized], but it does still read from stack space without writing it.
Sometimes GCC will invent a 0 instead of reading uninitialized stack space, but that happens not to be the case here. There's zero guarantee about how it compiles: the compiler sees the use-uninitialized "bug" and can invent any value it wants, e.g. reading some register it never wrote instead of that memory. For example, since you're calling printf, GCC could have just left ESI uninitialized between printf calls, since that's where ghosts[i] is passed as the 2nd arg in the x86-64 System V calling convention.
Most modern CPUs including x86 don't have any "trap representations" that would make an add instruction fault, and even if it did the C standard doesn't guarantee that the indeterminate value isn't a trap representation. But IA-64 did have a Not A Thing register result from bad speculative loads, which would trap if you tried to read it. See comments on the trap representation Q&A - Raymond Chen's article: Uninitialized garbage on ia64 can be deadly.
The ISO C rule about it being UB to read uninitialized variables that were candidates for register might be aimed at this, but with optimization enabled you could plausibly still run into this anyway if the taking of the address happens later, unless the compiler takes steps to avoid it. But ISO C defect report N1208 proposes saying that an indeterminate value can be "a value that behaves as if it were a trap representation" even for types that have no trap representations. So it seems that part of the standard doesn't fully cover ISAs like IA-64, the way real compilers can work.
Another case that's not exactly a "trap representation": note that only some object-representations (bit patterns) are valid for _Bool in mainstream ABIs, and violating that can crash your program: Does the C++ standard allow for an uninitialized bool to crash a program?
That's a C++ question, but I verified that GCC will return garbage without booleanizing it to 0/1 if you write _Bool b[2] ; return b[0]; https://godbolt.org/z/jMr98547o. I think ISO C only requires that an uninitialized object has some object-representation (bit-pattern), not that it's a valid one for this object (otherwise that would be a compiler bug). For most integer types, every bit-pattern is valid and represents an integer value. Besides reading uninitialized memory, you can cause the same problem using (unsigned char*) or memcpy to write a bad byte into a _Bool.
An uninitialized local doesn't have "a value"
As shown in the following Q&As, when compiling with optimization, multiple reads of the same uninitialized variable can produce different results:
Is uninitialized local variable the fastest random number generator?
What happens to a declared, uninitialized variable in C? Does it have a value?
The other parts of this answer are primarily about where a value comes from in un-optimized code, when the compiler doesn't really "notice" the UB.
The registers of the memory after they are 'emptied' get random voltages between 0 and 1,
Nothing so mysterious. You are just seeing what was written to those memory locations last time they were used.
When memory is released it is not cleared or emptied. The system just knows that it's free, and the next time somebody needs memory it just gets handed over; the old contents are still there. It's like buying an old car and looking in the glove compartment: the contents are not mysterious, it's just a surprise to find a cigarette lighter and one sock.
Sometimes in a debugging environment freed memory is set to some identifiable value so that it's easy to recognize that you are dealing with uninitialized memory. For example 0xccccccccccc or maybe 0xdeadbeefDeadBeef.
Maybe a better analogy: you are eating in a self serve restaurant that never cleans its plates. When customers have finished, they put their plates back on the 'free' pile. When you go to serve yourself, you pick up the top plate from the free pile. You should clean the plate, otherwise you get what was left there by the previous customer.
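The plate pile can be modelled with well defined C, without actually reading uninitialized memory. This is a toy sketch, not how any real allocator is implemented: plate, plate_take and plate_release are made-up names, and the 0xCC fill stands in for the kind of debug pattern mentioned above.

```c
#include <assert.h>
#include <string.h>

#define PLATE_BYTES 16

/* One reusable block, standing in for memory the system hands out again. */
static unsigned char plate[PLATE_BYTES];
static int plate_in_use = 0;

/* Hand out the block without touching its contents. */
unsigned char *plate_take(void)
{
    assert(!plate_in_use);
    plate_in_use = 1;
    return plate;
}

/* Put the block back on the free pile.
   With poison != 0 it is filled with 0xCC first, like a debug environment. */
void plate_release(int poison)
{
    assert(plate_in_use);
    if (poison)
        memset(plate, 0xCC, PLATE_BYTES);
    plate_in_use = 0;
}
```

If one user writes "sock" into the block and releases it without poisoning, the next plate_take() returns a block that still starts with "sock". With poisoning, the next user sees 0xCC bytes instead.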
I am going to use a platform that is easy to see what is going on. The compilers and platforms work the same way independent of architecture, operating system, etc. There are exceptions of course...
In main I am going to call this function:
test();
Which is:
extern void hexstring ( unsigned int );
void test ( void )
{
unsigned int x[3];
hexstring(x[0]);
hexstring(x[1]);
hexstring(x[2]);
}
hexstring is just a printf("%008X\n",x).
Build it (not using x86, using something that is overall easier to read for this demonstration)
test.c: In function ‘test’:
test.c:7:2: warning: ‘x[0]’ is used uninitialized in this function [-Wuninitialized]
7 | hexstring(x[0]);
| ^~~~~~~~~~~~~~~
test.c:8:2: warning: ‘x[1]’ is used uninitialized in this function [-Wuninitialized]
8 | hexstring(x[1]);
| ^~~~~~~~~~~~~~~
test.c:9:2: warning: ‘x[2]’ is used uninitialized in this function [-Wuninitialized]
9 | hexstring(x[2]);
| ^~~~~~~~~~~~~~~
The disassembly of the compiler output shows
00010134 <test>:
10134: e52de004 push {lr} ; (str lr, [sp, #-4]!)
10138: e24dd014 sub sp, sp, #20
1013c: e59d0004 ldr r0, [sp, #4]
10140: ebffffdc bl 100b8 <hexstring>
10144: e59d0008 ldr r0, [sp, #8]
10148: ebffffda bl 100b8 <hexstring>
1014c: e59d000c ldr r0, [sp, #12]
10150: e28dd014 add sp, sp, #20
10154: e49de004 pop {lr} ; (ldr lr, [sp], #4)
10158: eaffffd6 b 100b8 <hexstring>
We can see that the stack area is allocated:
10138: e24dd014 sub sp, sp, #20
But then we go right into reading and printing:
1013c: e59d0004 ldr r0, [sp, #4]
10140: ebffffdc bl 100b8 <hexstring>
So whatever was on the stack. Stack is just memory with a special hardware pointer.
And we can see the other two items in the array are also read (load) and printed.
So whatever was in that memory at this time is what gets printed. Now the environment I am in likely zeroed the memory (including stack) before we got there:
00000000
00000000
00000000
Now I am optimizing this code to make it easier to read, which adds a few challenges.
So what if we did this:
test2();
test();
In main and:
void test2 ( void )
{
unsigned int y[3];
y[0]=1;
y[1]=2;
y[2]=3;
}
test2.c: In function ‘test2’:
test2.c:5:15: warning: variable ‘y’ set but not used [-Wunused-but-set-variable]
5 | unsigned int y[3];
|
and we get:
00000000
00000000
00000000
but we can see why:
00010124 <test>:
10124: e52de004 push {lr} ; (str lr, [sp, #-4]!)
10128: e24dd014 sub sp, sp, #20
1012c: e59d0004 ldr r0, [sp, #4]
10130: ebffffe0 bl 100b8 <hexstring>
10134: e59d0008 ldr r0, [sp, #8]
10138: ebffffde bl 100b8 <hexstring>
1013c: e59d000c ldr r0, [sp, #12]
10140: e28dd014 add sp, sp, #20
10144: e49de004 pop {lr} ; (ldr lr, [sp], #4)
10148: eaffffda b 100b8 <hexstring>
0001014c <test2>:
1014c: e12fff1e bx lr
test didn't change but test2 is dead code as one would expect when optimized, so it did not actually touch the stack. But what if we:
test2.c
void test3 ( unsigned int * );
void test2 ( void )
{
unsigned int y[3];
y[0]=1;
y[1]=2;
y[2]=3;
test3(y);
}
test3.c
void test3 ( unsigned int *x )
{
}
Now
0001014c <test2>:
1014c: e3a01001 mov r1, #1
10150: e3a02002 mov r2, #2
10154: e3a03003 mov r3, #3
10158: e52de004 push {lr} ; (str lr, [sp, #-4]!)
1015c: e24dd014 sub sp, sp, #20
10160: e28d0004 add r0, sp, #4
10164: e98d000e stmib sp, {r1, r2, r3}
10168: eb000001 bl 10174 <test3>
1016c: e28dd014 add sp, sp, #20
10170: e49df004 pop {pc} ; (ldr pc, [sp], #4)
00010174 <test3>:
10174: e12fff1e bx lr
test2 is actually putting stuff on the stack. Now the calling conventions generally require that the stack pointer is back where it started when you were called. That means function a might move the pointer and read/write some data in that space, then call function b, which moves the pointer and reads/writes some data in its space, and so on. When each function returns it usually does not make sense to clean up; you just move the pointer back and return, and whatever data you wrote to that memory remains.
So if test2 writes a few things to the stack memory space and then returns, and then another function is called at the same level as test2, the stack pointer is at the same address when test() is called as when test2() was called, in this example. So what happens?
00000001
00000002
00000003
We have managed to control what test() is printing out. Not magic.
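The same effect can be reproduced in fully defined C by modelling the stack as an array plus an index. This is just a sketch of the mechanism: stack_memory, frame_enter, frame_leave, model_test2 and model_test are made-up names, and a real stack frame also holds return addresses and padding that are left out here.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 32

static unsigned int stack_memory[STACK_WORDS];   /* static storage, so it starts zeroed */
static size_t stack_index = STACK_WORDS;         /* grows toward lower indices */

/* Like "sub sp, sp, #n": only the pointer moves, nothing is written. */
unsigned int *frame_enter(size_t words)
{
    assert(words <= stack_index);
    stack_index -= words;
    return &stack_memory[stack_index];
}

/* Like "add sp, sp, #n": the pointer moves back, nothing is cleaned up. */
void frame_leave(size_t words)
{
    assert(stack_index + words <= STACK_WORDS);
    stack_index += words;
}

/* Writes three values into its frame and returns. */
void model_test2(void)
{
    unsigned int *y = frame_enter(3);
    y[0] = 1;
    y[1] = 2;
    y[2] = 3;
    frame_leave(3);
}

/* Reads element i of its frame without ever writing it. */
unsigned int model_test(size_t i)
{
    assert(i < 3);
    unsigned int *x = frame_enter(3);
    unsigned int value = x[i];
    frame_leave(3);
    return value;
}
```

Before model_test2() runs, model_test() sees the zeros the array started with. Afterwards it sees 1, 2, 3, because both frames land on the same words.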
Now rewind back to the 1960s and then work forward to the present, particularly 1980s and later.
Memory was not always cleaned up before your program ran. As some folks here are implying if you were doing banking on a spreadsheet then you closed that program and opened this program...back in the day...you would almost expect to see some data from that spreadsheet program, maybe the binary maybe the data, maybe something else, due to the nature of the operating systems use of memory it may be a fragment of the last program you ran, and a fragment of the one before that, and a fragment of a program still running that just did a free(), and so on.
Naturally, once we started to get connected to each other and hackers wanted to take over and send themselves your info or do other bad things, you can see how trivial it would be to write a program to look for passwords or bank accounts or whatever.
So not only do we have protections today to prevent one program sniffing around in another programs space, we generally assume that, today, before our program gets some memory that was used by some other program, it is wiped.
But if you disassemble even a simple hello world printf program you will see that there is a fair amount of bootstrap code that happens before main() is called. As far as the operating system is concerned, all of that code is part of our one program so even if (let's assume) memory were zeroed or cleaned before the OS loads and launches our program. Before main, within our program, we are using the stack memory to do stuff, leaving behind values, that a function like test() will see.
You may find that each time you run the same binary, one compile many runs, that the "random" data is the same. Now you may find that if you add some other shared library call or something to the overall program, then maybe, maybe, that shared library stuff causes extra code pre-main to happen to try to be able to call the shared code, or maybe as the program runs it takes different paths now because of a side effect of a change to the overall binary and now the random values are different but consistent.
There are explanations why the values could be different each time from the same binary as well.
There is no ghost in the machine though. Stack is just memory, and it is not uncommon when a computer boots to wipe that memory once, if for no other reason than to set the ECC bits. After that the memory gets reused and reused and reused. What happens to be in memory where the stack pointer is pointing when your program runs and you read before you write depends on the overall architecture of the operating system, on how the compiler builds your application and shared libraries, and on other factors. (As a rule, never read before you write; it is good that compilers are now throwing warnings.) Those values are not necessarily random: the specific list of events that led to that point was not random but controlled, they are just not values that you as the programmer may have predicted. Particularly if you do this at the main() level as you have. But be it main or seventeen levels of nested function calls, it is still just some memory that may or may not contain some stuff from before you got there. Even if the bootloader zeros memory, that is still a written zero that was left behind by some other program that came before you.
There are no doubt compilers that have features that relate to the stack that may do more work like zero at the end of the call or zero up front or whatever for security or some other reason someone thought of.
I would assume today that when an operating system like Windows or Linux or macOS runs your program it is not giving you access to some stale memory values from some other program that came before (spreadsheet with my banking information, email, passwords, etc). But you can trivially write a program to try (just malloc() and print or do the same thing you did but bigger to look at the stack). I also assume that program A does not have a way to get into program B's memory that is running concurrently. At least not at the application level. Without hacking (malloc() and print is not hacking in my use of the term).
The array ghosts is uninitialized, and because it was declared inside of a function and is not static (formally, it has automatic storage duration), its values are indeterminate.
This means that you could read any value, and there's no guarantee of any particular value.
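If the values matter, give the array an initializer. A minimal sketch (print_ghosts is a made-up wrapper so the result can be checked): with = {0}, the first element is set to 0 explicitly and C zero-initializes the remaining ones.

```c
#include <assert.h>
#include <stdio.h>

/* Prints the three elements and returns their sum. */
int print_ghosts(void)
{
    int ghosts[3] = {0};   /* all three elements start as 0 */
    int sum = 0;

    for (int i = 0; i < 3; i++) {
        assert(ghosts[i] == 0);   /* guaranteed by the initializer */
        printf("%d\n", ghosts[i]);
        sum += ghosts[i];
    }
    return sum;
}
```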
For example: I created a simple C program that prints "Hello, World", compiled it, and got an executable with a size of 39.8Kb.
Following this question, I was able to create the equivalent program written in assembly; its size was 39.6Kb.
This surprised me greatly as I expected the assembly program to be smaller than the C program. As the question indicated it uses a C header and the gcc compiler. Would this make the assembly program bigger or is it normal for them to be both roughly the same size?
Using the strip command I reduced both files. This removed debug code and now both have very similar file sizes. Both 18.5Kb.
test.c:
If your hand written code is on par with a compiled function, then sure they are going to be similar in size, they are doing the same thing and if you can compete with a compiler you will be the same or similar.
Now your file sizes indicate you are looking at the wrong thing altogether. The file you are looking at, while called a binary, has a ton of other stuff in it. If you want to compare apples to apples in this context, then compare the size of the functions, the machine code, not the size of the container that holds the functions plus debug info plus strings plus a number of other things.
Your experiment is flawed, but the results very loosely indicate the expected result. That only holds if you are producing code in the same way, though. The odds of that are slim, so no, you shouldn't expect similar results unless you are producing code in the same way.
take this simple function
unsigned int fun ( unsigned int a, unsigned int b)
{
return(a+b+1);
}
the same compiler produced this:
00000000 <fun>:
0: e52db004 push {r11} ; (str r11, [sp, #-4]!)
4: e28db000 add r11, sp, #0
8: e24dd00c sub sp, sp, #12
c: e50b0008 str r0, [r11, #-8]
10: e50b100c str r1, [r11, #-12]
14: e51b2008 ldr r2, [r11, #-8]
18: e51b300c ldr r3, [r11, #-12]
1c: e0823003 add r3, r2, r3
20: e2833001 add r3, r3, #1
24: e1a00003 mov r0, r3
28: e28bd000 add sp, r11, #0
2c: e49db004 pop {r11} ; (ldr r11, [sp], #4)
30: e12fff1e bx lr
and this
00000000 <fun>:
0: e2811001 add r1, r1, #1
4: e0810000 add r0, r1, r0
8: e12fff1e bx lr
because of different settings. 13 instructions vs 3, over 4 times larger.
A human might generate this directly from the C, nothing fancy
add r0,r0,r1
add r0,r0,#1
bx lr
Not sure from order of operations if you technically have to add the one to b before adding that sum to a, or if it doesn't matter. I went left to right, the compiler went right to left.
so you could say that the compiler and my assembly produced the same number of bytes of binary, or you could say that the compiler produced something over 4 times larger.
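On the ordering question: unsigned arithmetic in C wraps around modulo 2^N, so both groupings give the same result for every input, which is why the compiler is free to pick either. A small sketch to check a few edge cases (the function names are made up):

```c
#include <assert.h>
#include <limits.h>

/* The order I wrote by hand: add r0,r0,r1 then add r0,r0,#1 */
unsigned int fun_left_to_right(unsigned int a, unsigned int b)
{
    return (a + b) + 1u;
}

/* The order the optimized compiler output used: add r1,r1,#1 then add r0,r1,r0 */
unsigned int fun_right_to_left(unsigned int a, unsigned int b)
{
    return a + (b + 1u);
}

/* Compares both orders, including inputs where an intermediate sum wraps. */
void check_orders_agree(void)
{
    assert(fun_left_to_right(1u, 2u) == fun_right_to_left(1u, 2u));
    assert(fun_left_to_right(UINT_MAX, 1u) == fun_right_to_left(UINT_MAX, 1u));
    assert(fun_left_to_right(UINT_MAX, UINT_MAX) == fun_right_to_left(UINT_MAX, UINT_MAX));
}
```

With signed int this reasoning would not carry over directly, since signed overflow is undefined in C.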
Take the above and expand that into a real program that does useful things.
Exercise to the reader (the OP, please dont spoil it) to figure out why the compiler can produce two different correct solutions that are so different in size.
EDIT
.exe, ELF and other "binary" formats, as mentioned, can contain debug information and ASCII strings holding the names of functions/labels that make for pretty debug screens. These are part of the "binary" in that they are part of the baggage, but they are neither machine code nor data used when executing the program, at least not the items I am mentioning. Without changing the machine code or the data the program needs, you can manipulate the size of your .exe or other file format using compiler settings, so the same compiler-assembler-linker or assembler-linker path can make the binary file, in some senses of that word, larger or smaller by including or omitting this additional baggage. That is part of understanding file sizes, and why, even if your hello world programs were different sizes, the overall files might be around the same size: if one is 10 bytes longer but the .exe is 40K, then those 10 bytes are in the noise. But if I understand your question, those 10 bytes are what you are interested in: how compiled C compares with hand written assembly.
Also note that compilers are made by humans, so the output they produce is on par with what at least those humans can produce, other humans can do better, many do worse depending on your definition of better and worse.
The 39+ KB size is absolutely not related to the compiler or language used (C/C++ or asm). Different optimizations, debug information, etc. can change the size of this tiny code by, say, 1000 bytes, but not more. As a test I built the following program:
#include <Windows.h>
#include <stdio.h>
void ep(void*)
{
ExitProcess(printf("Hello, World"));
}
linker options:
/INCREMENTAL:NO /NOLOGO /MANIFEST:NO /NODEFAULTLIB
/SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /ENTRY:"ep" /MACHINE:X64 kernel32.lib msvcrt.lib
and got a 2560 byte exe for both x86 and x64.
What is the difference? It is /NODEFAULTLIB and my version of msvcrt.lib, which is a pure import library.
The remaining 35+ KB comes from the statically linked C runtime. Even if you write the program in asm, you need some lib to link to printf, and your lib contains code that gets statically linked with your code. That code is the 35 KB.
The question is not C++ vs asm, there is no difference there. The question is whether you use the C runtime or not.
I agree with old_time but I also did a quick test for ground truth. With VS-2017 Pro, I get similar results (~37KB) on the size of the executable, but only if I look in the debug output folder. After building for release, it's closer to ~9KB. Much of that difference is in the size of the static libraries needed to call into the OS/C-runtime DLL's.
EDIT: Despite the fact that most modern C compilers can match or out-perform most hand written assembly code, the hand written variety can be smaller by virtue of the fact that it doesn't have to have all that C run-time over-head, but the difference is rarely enough to warrant the extra development and maintenance costs of assembler code, particularly for non-trivial applications. There's a reason that most of the modern OS kernels are written predominantly in C or other high-level languages with only pin-hole assembler optimizations in a handful of critical functions.
Trivial "hello world" class programs are not a good comparison for C vs assembler. There just aren't enough opportunities for the compiler or the human to do much in the way of optimization. Write a math or data processing library and application and compare those. I'd be willing to bet the compiler will kick your butt.
I am a newbie. I have difficulties understanding the ARM memory map.
I have found an example of a simple sorting algorithm
AREA ARM, CODE, READONLY
CODE32
PRESERVE8
EXPORT __sortc
; r0 = &arr[0]
; r1 = length
__sortc
stmfd sp!, {r2-r9, lr}
mov r4, r1 ; inner loop counter
mov r3, r4
sub r1, r1, #1
mov r9, r1 ; outer loop counter
outer_loop
mov r5, r0
mov r4, r3
inner_loop
ldr r6, [r5], #4
ldr r7, [r5]
cmp r7, r6
; swap without swp
strls r6, [r5]
strls r7, [r5, #-4]
subs r4, r4, #1
bne inner_loop
subs r9, r9, #1
bne outer_loop
ldmfd sp!, {r2-r9, pc}^
END
And this assembly should be called this way from C code
#define MAX_ELEMENTS 10
extern void __sortc(int *, int);
int main()
{
int arr[MAX_ELEMENTS] = {5, 4, 1, 3, 2, 12, 55, 64, 77, 10};
__sortc(arr, MAX_ELEMENTS);
return 0;
}
As far as I understand, this code creates an array of integers on the stack and calls the __sortc function, which is implemented in assembly. This function takes the values from the stack, sorts them and puts them back on the stack. Am I right?
I wonder how I can implement this example using only assembly.
For example defining array of integers
DCD 3, 7, 2, 8, 5, 7, 2, 6
BTW, where are variables declared with DCD stored in memory?
How can I operate on values declared in this way? Please explain how I can implement this using assembly only, without any C code, even without the stack, just with raw data.
I am writing for ARM7TDMI architecture
AREA ARM, CODE, READONLY - this marks start of section for code in the source.
With a similar AREA myData, DATA, READWRITE you can start a section where it's possible to define data like data1 DCD 1,2,3. This will assemble as three words with values 1, 2, 3 in consecutive words of memory, with label data1 pointing to the first byte of the first word. (some AREA docs from google).
Where these will land in physical memory after loading the executable depends on how the executable is linked. The linker uses a script file which helps it decide which AREA to put where, and how to create the symbol table for dynamic relocation done by the executable loader. By editing the linker script you can adjust where the code and data land, but normally you don't need to do that.
Also the linker script and assembler directives can affect size of available stack, and where it is mapped in physical memory.
So for your particular platform: google for memory mappings on web and check the linker script (for start just use linker option to produce .map file to see where the code and data are targeted to land).
So you can either declare that array in some data area, then to work with it, you load symbol data1 into register ("load address of data1"), and use that to fetch memory content from that address.
Or you can first put all the numbers into the stack (which is set probably to something reasonable by the OS loader of your executable), and operate in the code with the stack pointer to access the numbers in it.
You can even DCD some values into CODE area, so those words will end between the instructions in memory mapped as read-only by executable loader. You can read those data, but writing to them will likely cause crash. And of course you shouldn't execute them as instructions by accident (forgetting to put some ret/jump instruction ahead of DCD).
without stack
Well, this one is tricky. You have to be careful not to use any call etc., and to have interrupts disabled; basically avoid anything that needs the stack.
When people code a bootloader, usually they set up some temporary stack ASAP in first few instructions, so they can use basic stack functionality before setting up whole environment properly, or loading OS. A space for that temporary stack is often reserved somewhere in/after the code, or an unused memory space according to defined machine state after reset.
If you are down to the metal, without OS, usually all memory is writeable after reset, so you can then intermix code and data as you wish (just jumping around the data, not executing them by accident), without using AREA definitions.
But you should make your mind, whether you are creating application in user space of some OS (so you have things like stack and data areas well defined and you can use them for your convenience), or you are creating boot loader code which has to set it all up for itself (more difficult, so I would suggest at first going into user land of some OS, having C wrapper around with clib initialized is often handy too, so you can call things like printf from ASM for convenient output).
How can I operate with values declared in this way
It doesn't matter in machine code which way the values were declared. All that matters is that you have the address of the memory and that you know the structure in which the data are stored there. Then you can work with them any way you want, using any instruction you want. So the body of that asm example will not change if you allocate the data in ASM; you will just pass the pointer to it as an argument, like the C code does.
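The same point can be sketched in C rather than assembly. The names sort_words, static_data, sort_static and sort_on_stack below are made up for illustration (sort_words is only a stand-in for __sortc, not the OP's routine); the sketch assumes a plain insertion sort and shows that the routine sees nothing but an address and a count, whether the words live in a data area or on the stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __sortc: sorts `count` words in place.
   It receives only an address and a length, nothing about how the
   memory was declared. */
void sort_words(int32_t *base, size_t count)
{
    for (size_t i = 1; i < count; ++i) {   /* insertion sort */
        int32_t key = base[i];
        size_t j = i;
        while (j > 0 && base[j - 1] > key) {
            base[j] = base[j - 1];
            --j;
        }
        base[j] = key;
    }
}

/* Lives in a writable data area, like DCD words in a DATA area. */
static int32_t static_data[10] = {5, 4, 1, 3, 2, 12, 55, 64, 77, 10};

/* Sorts the statically allocated array and returns its smallest word. */
int32_t sort_static(void)
{
    sort_words(static_data, 10);
    return static_data[0];
}

/* Same call, but this time the array lives on the stack. */
int32_t sort_on_stack(void)
{
    int32_t local[10] = {5, 4, 1, 3, 2, 12, 55, 64, 77, 10};
    sort_words(local, 10);
    return local[0];
}
```

Both wrappers pass a pointer to the first word; sort_words itself is identical in the two cases.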
edit: an example written blindly without testing; it may need further syntax fixing to work for the OP (or maybe there's even some bug and it will not work at all; let me know in the comments whether it did):
AREA myData, DATA, READWRITE
SortArray
DCD 5, 4, 1, 3, 2, 12, 55, 64, 77, 10
SortArrayEnd
AREA ARM, CODE, READONLY
CODE32
PRESERVE8
EXPORT __sortasmarray
__sortasmarray
; "ldr rX, =label" loads the address through a literal pool,
; so it works however far the array is from this code
; (a PC-relative "add r0, pc, #offset" / "adr r0, label" is shorter,
; but only works when the label is close to the instruction)
ldr r0, =SortArray ; address of array
; calculate array size from address of end
; (as I couldn't find now example of thing like "equ $-SortArray")
ldr r1, =SortArrayEnd
sub r1, r1, r0 ; size in bytes
mov r1, r1, lsr #2 ; bytes / 4 = number of words
; do a direct jump instead of "bl", so __sortc returning
; to lr will actually return to the caller of this
b __sortc
; ... rest of your __sortc assembly without change
You can call it from C code as:
extern void __sortasmarray(void);
int main(void)
{
__sortasmarray();
return 0;
}
To refresh my ARM asm memory I used, among other sources, "Introducing ARM assembly language", but I'm still worried this may not work as is.
As you can see, I didn't change anything in __sortc, because there's no difference between accessing stack memory and "dcd" memory; it's the same computer memory. Once you have the address of a particular word, you can ldr/str its value with that address. __sortc receives the address of the first word of the array to sort in both cases; from there on it's just memory to it, without any context about how that memory was defined in the source, allocated, initialized, etc. As long as it's writeable, it's fine for __sortc.
So the only "dcd"-related thing from me is loading the array address, and a quick search for ARM examples shows it may be done in several ways. The PC-relative add rX, pc, #offset form is the cheapest, but it only works while the label is close enough for the offset to fit the instruction's immediate field. There's also the pseudo-instruction ADR rX, label, which does the same thing and lets the assembler work out the offset. For any range the ldr rX, =label form is used; it is a pseudo-instruction that has the assembler place the address in a literal pool and load it from there. Check some tutorials and disassemble the machine code to see how it was assembled.
It's up to you to learn all the ARM assembly peculiarities and how to load addresses of arrays; I don't need ARM ASM at the moment, so I didn't dig into those details.
And there should be some equ way to define the length of the array instead of calculating it in code from the end address, but I couldn't find any example, and I'm not going to read the full assembler docs to learn about all its directives (in gas I think .equ ArrayLength, (. - SortArray)/4 would work).
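For comparison, here is how the two ways of getting the length look in C. This is only an analogy under assumptions: sort_array, SORT_ARRAY_LEN, count_from_bounds and sort_array_count are hypothetical names, and the words are assumed to be 4 bytes wide, as with DCD. The enum plays the role of an "equ" constant fixed at build time, and count_from_bounds mirrors the "sub" plus "lsr #2" pair from the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C counterpart of the DCD block. */
static const int32_t sort_array[] = {5, 4, 1, 3, 2, 12, 55, 64, 77, 10};

/* Build-time element count, the C analogue of an "equ" length constant. */
enum { SORT_ARRAY_LEN = sizeof sort_array / sizeof sort_array[0] };

/* Run-time count from start and end addresses: byte distance first
   ("sub r1, r1, r0"), then divide by 4 ("lsr #2") for 4-byte words. */
size_t count_from_bounds(const int32_t *start, const int32_t *end)
{
    size_t bytes = (size_t)((const char *)end - (const char *)start);
    return bytes >> 2;
}

/* Applies the run-time calculation to the array above. */
size_t sort_array_count(void)
{
    return count_from_bounds(sort_array, sort_array + SORT_ARRAY_LEN);
}
```

Both approaches give the same count; the build-time constant just saves the two instructions.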
I found some code in FreeRTOS (FreeRTOSV7.4.0\FreeRTOS\Source\tasks.c):
void vTaskSuspendAll( void )
{
/* A critical section is not required as the variable is of type
portBASE_TYPE. */
++uxSchedulerSuspended;
}
The comment explicitly says that no protection is needed because the type is "portBASE_TYPE", which is a "long". My understanding is that it assumes the increment of this type is atomic. But after disassembling it I could not find any proof of that; it's a plain load->add->store. So is this a problem?
void vTaskSuspendAll( void )
{
/* A critical section is not required as the variable is of type
portBASE_TYPE. */
++uxSchedulerSuspended;
4dc: 4b03 ldr r3, [pc, #12] ; (4ec <vTaskSuspendAll+0x10>)
4de: f8d3 2118 ldr.w r2, [r3, #280] ; 0x118
4e2: 1c50 adds r0, r2, #1
4e4: f8c3 0118 str.w r0, [r3, #280] ; 0x118
4e8: 4770 bx lr
4ea: bf00 nop
4ec: 00000000 .word 0x00000000
000004f0 <xTaskGetTickCount>:
return xAlreadyYielded;
}
It's not atomic, as you've documented. But it could still be "thread safe" in a less strict sense: a long can't be in an inconsistent state. The extent of the danger here is that if n threads call vTaskSuspendAll then uxSchedulerSuspended will be incremented by anywhere between 1 and n.
But this could be perfectly fine if the variable is something that doesn't need to be perfect, like a tracker for how many times the user asked to suspend. There's "thread safe" meaning "this operation produces the same result, no matter how its calls are interleaved" and there's "thread safe" meaning "nothing explodes if you call this from multiple threads".
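The "anywhere between 1 and n" outcome can be shown with a deterministic sketch. No real threads are used here: the interleaving of two hypothetical threads A and B is written out by hand so the result is reproducible, and counter, lost_update_demo and serial_demo are made-up names.

```c
/* Simulated shared counter: a plain long, no atomics. */
static long counter;

/* Two "threads" each perform load, add, store, with the steps
   interleaved by hand so the outcome is deterministic. */
long lost_update_demo(void)
{
    counter = 0;
    long a = counter;   /* thread A: load  (reads 0) */
    long b = counter;   /* thread B: load  (reads 0) */
    a = a + 1;          /* thread A: add */
    b = b + 1;          /* thread B: add */
    counter = a;        /* thread A: store (writes 1) */
    counter = b;        /* thread B: store (writes 1; A's update is lost) */
    return counter;
}

/* The same two increments without any interleaving. */
long serial_demo(void)
{
    counter = 0;
    ++counter;
    ++counter;
    return counter;
}
```

The interleaved version ends at 1 and the serial one at 2, yet in neither case is the long ever in a half-written state.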
No, incrementing values in C is not guaranteed to be atomic. You need to provide synchronization, or use a system-specific library to perform atomic increments/decrements.
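As one possible illustration, assuming a toolchain that provides C11 <stdatomic.h> (this is not what the FreeRTOS code above does, and suspend_count and the two functions are hypothetical names), an atomic increment could look like this:

```c
#include <stdatomic.h>

/* Hypothetical counter; a static atomic object starts out as zero. */
static atomic_long suspend_count;

/* Atomic read-modify-write; returns the value after the increment. */
long suspend_count_increment(void)
{
    return atomic_fetch_add(&suspend_count, 1) + 1;
}

/* Atomic decrement; returns the value after the decrement. */
long suspend_count_decrement(void)
{
    return atomic_fetch_sub(&suspend_count, 1) - 1;
}
```

atomic_fetch_add returns the previous value, hence the + 1 to report the new one. On a target without C11 atomics, the equivalent would be a critical section or a compiler or vendor intrinsic.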
The operation is not atomic, but nowhere does it say it is. However, the code is thread safe, but you would have to be very familiar with what the code was doing, and how it fitted into the design of the scheduler to know that. It does not matter if other tasks modify the variable between the load and store because when the executing task next runs it will find the variable in the same state as when the original load was performed (so the modify and write portions are still consistent and valid).
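A minimal sketch of that argument, with the preemption written out by hand instead of using a real scheduler. It assumes (this is an assumption of the sketch, not something shown in the quoted source) that any task which runs in between leaves the counter exactly as it found it, i.e. its suspend calls are balanced by resume calls before the original task gets the CPU back. The names scheduler_suspended, task_b_balanced_pair and preempted_increment_demo are made up.

```c
/* Stand-in for uxSchedulerSuspended. */
static long scheduler_suspended;

/* Task B: runs a complete suspend/resume pair while A is preempted. */
static void task_b_balanced_pair(void)
{
    ++scheduler_suspended;   /* suspend */
    --scheduler_suspended;   /* resume: back to the value B found */
}

/* Task A's increment, preempted between its load and its store. */
long preempted_increment_demo(void)
{
    scheduler_suspended = 0;
    long loaded = scheduler_suspended;   /* A: load */
    task_b_balanced_pair();              /* context switch to B and back */
    scheduler_suspended = loaded + 1;    /* A: add and store */
    return scheduler_suspended;
}
```

Under that assumption, the value A loaded is still the current value when A stores, so the result is the same as an uninterrupted increment.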
As a previous poster notes, the long cannot be in an inconsistent state because it is the base type of the architecture on which it is running. Consider, however, what would happen if the code was running on an 8 bit (or 16 bit) machine and the variable was 32 bit. Then it would not be thread safe, because the full 32 bits would be modified a byte or a word at a time rather than all at once. In that scenario, one byte might be loaded into a register, modified, then written back to RAM (leaving the other three bytes unmodified) when a context switch occurs. If the next task that executed read the same variable, it would read one byte that had been modified and three bytes that had not, and you have a major problem.
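The torn read can be simulated on any machine by storing the 32 bit variable as four separate bytes. This is only a sketch with made-up names (var, read_var, write_var, torn_read_demo); the "context switch" is a read placed by hand right after the first byte has been written back.

```c
#include <stdint.h>

/* A 32-bit variable stored as four bytes, least significant first,
   the way an 8-bit CPU would have to handle it. */
static uint8_t var[4];

static uint32_t read_var(void)
{
    return (uint32_t)var[0] | ((uint32_t)var[1] << 8) |
           ((uint32_t)var[2] << 16) | ((uint32_t)var[3] << 24);
}

static void write_var(uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        var[i] = (uint8_t)(v >> (8 * i));
}

/* Increments 0x000000FF one byte at a time. Returns what another task
   would read if it ran right after the first byte store; the completed
   value is written to *final_value. */
uint32_t torn_read_demo(uint32_t *final_value)
{
    uint32_t seen_mid_update = 0;
    write_var(0x000000FFu);
    for (int i = 0; i < 4; ++i) {
        var[i] = (uint8_t)(var[i] + 1);     /* byte-wide read-modify-write */
        if (i == 0)
            seen_mid_update = read_var();   /* another task reads here */
        if (var[i] != 0)
            break;                          /* no carry into the next byte */
    }
    *final_value = read_var();
    return seen_mid_update;
}
```

The value seen mid-update is 0x00000000, which is neither the old value 0x000000FF nor the new value 0x00000100.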
With r1, r3 and r4 of type uint32x4_t loaded into NEON registers, I have the following code:
r3 = veorq_u32(r0,r3);
r4 = r1;
r1 = vandq_u32(r1,r3);
r4 = veorq_u32(r4,r2);
r1 = veorq_u32(r1,r0);
And I was just wondering whether GCC actually translates r4 = r1 into the vmov instruction. Looking at the disassembled code I wasn't surprised that it didn't. (moreover I can't figure out what the generated assembly code actually does)
Skimming through ARM's NEON intrinsics reference I couldn't find any simple vector->vector assignment intrinsic.
What's the easiest way to achieve this? I'm not sure what inline assembly would look like, since I don't know which registers r1 and r4 were assigned by vld1q_u32. I don't need an actual swap, just an assignment.
C has a concept of an abstract machine. Assignments and other operations are described in terms of this abstract machine. The assignment r4 = r1; says to assign r4 the value of r1 in the abstract machine.
When the compiler generates instructions for a program, it generally does not exactly mimic everything that occurs in the abstract machine. It translates the operations that occur in the abstract machine into processor instructions that get the same results. The compiler will skip things like move instructions if it can figure out that it can get the same results without them.
In particular, the compiler might not keep r1 in the same place every time. It might load it from memory into some register R7 the first time you need it. But then it might implement your statement r1 = vandq_u32(r1,r3); by putting the result in R8 while keeping the original value of r1 in R7. Then, when you later have r4 = veorq_u32(r4,r2);, the compiler can use the value in R7, because it still contains that value that r4 would have (from the r4 = r1; statement) in the abstract machine.
Even if you explicitly wrote a vmov intrinsic, the compiler might not issue an instruction for it, as long as it issues instructions that get the same result in the end.
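To make the "same results" idea concrete, here is a scalar stand-in for the code in the question, assuming plain uint32_t values in place of uint32x4_t and the ^ and & operators in place of veorq_u32 and vandq_u32. The names result_t, with_copy and without_copy are hypothetical; the second function is only one way a compiler might arrange the work, not what GCC necessarily emits.

```c
#include <stdint.h>

typedef struct { uint32_t r1, r3, r4; } result_t;

/* As written in the question, including the explicit copy r4 = r1. */
result_t with_copy(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3)
{
    uint32_t r4;
    r3 = r0 ^ r3;
    r4 = r1;
    r1 = r1 & r3;
    r4 = r4 ^ r2;
    r1 = r1 ^ r0;
    return (result_t){ r1, r3, r4 };
}

/* No copy at all: the original r1 is simply read again while it is
   still available, and each result goes to a fresh location. */
result_t without_copy(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3)
{
    uint32_t new_r3 = r0 ^ r3;
    uint32_t new_r4 = r1 ^ r2;
    uint32_t new_r1 = (r1 & new_r3) ^ r0;
    return (result_t){ new_r1, new_r3, new_r4 };
}
```

Both functions return identical values for every input, which is all the abstract machine requires; whether a move instruction appears in the output is up to the compiler.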