Below is a fragment of code I'm using for an embedded system. I pass the -ffunction-sections and -fdata-sections options to gcc:
#define FAST_DATA __attribute__((section(".fast.data")))
int a1 = 1;
int a2 = 1;
FAST_DATA int a3 = 1;
FAST_DATA int a4 = 1;
The linker will allocate these symbols as below (map file):
.data.a1 0x20000020 0x4 ./main.o
0x20000020 a1
.data.a2 0x20000024 0x4 ./main.o
0x20000024 a2
.fast.data 0x10000010 0x8 ./main.o
0x10000010 a4
0x10000014 a3
If for example I don't use the variable a2, the linker will discard it (I pass --gc-sections to ld).
But if I use a3 and don't use a4, then a4 will not be discarded. I guess that's because it is placed in the same section as a3.
If I define a3 and a4 in separate .c files, they end up in two different sections that share the name .fast.data, one per file, and garbage collection then works as expected.
Is there any way to tell gcc to append the symbol name even when using __attribute__((section("...")))?
For a4 in my case that would result in .fast.data.a4.
In the linker script I will catch all *(.fast.data*).
I have a large code base using custom sections a lot and manual modifications to each declaration would not be desirable.
If no one else has a better idea, here is a kludge for you:
#define DECLARE_FAST_DATA(type, name) \
__attribute__((section(".fast.data." #name))) type name
usage:
int a1 = 1;
int a2 = 1;
DECLARE_FAST_DATA(int, a3) = 1;
DECLARE_FAST_DATA(int, a4);
This uses the standard C features of "stringification" and "string literal concatenation" to synthesize the section attribute you want.
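For clarity, the macro builds the section name by stringizing the symbol and concatenating the adjacent string literals, so a declaration like DECLARE_FAST_DATA(int, a4) expands to:
__attribute__((section(".fast.data.a4"))) int a4;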
What about extending your macro?
#define FAST_DATA(_a,_b,_c) \
__attribute__((section(".fast.data." #_b))) _a _b = _c
I'm quite new to inline assembly, so I need your help to be sure that I use it correctly.
I need to add assembly code inside my C code that is compiled with the RISC-V toolchain. Please consider the following code:
int bar = 0xFF00;
int main(){
volatile int result;
int k;
k = funct();
int* ptr;
ptr = &bar;
asm volatile (".insn r 0x33, 0, 0, a4, a5, a3":
"=m"(*ptr), "=r"(result):
[a5] "m"(*ptr), [a3] "r"(k) :
);
}
...
What I want to do is bar = bar+k. Actually, I want to change the content of the memory location that bar resides in. But the code that I wrote gets the address of bar and adds it to k. Does anybody know what the problem is?
Unfortunately, you have misunderstood the syntax.
In the assembler string, you can refer to an argument using %0, %1, etc., where the number is the n:th argument passed to the asm directive. Alternatively, you can use a symbolic name, %[myname], which refers to the argument written in the form [myname]"r"(k).
Note that a symbolic name is just another way of writing the number; the name itself doesn't imply anything. In your example, one could get the impression that you are forcing the code to use a specific processor register. (There is another syntax for that, if you really need it.)
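As a minimal generic sketch of the two referencing styles (assuming a RISC-V target and extended-asm syntax; the function and operand names are made up):
int add_one(int x)
{
    int y;
    /* numbered operands: %0 is the first listed operand, %1 the second */
    asm ("addi %0, %1, 1" : "=r"(y) : "r"(x));
    /* symbolic operands: %[out] and %[in] name the same operands */
    asm ("addi %[out], %[in], 1" : [out] "=r"(y) : [in] "r"(x));
    return y;
}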
For example, if you write something like:
int bar = 0xFF00;
int main(){
volatile int result;
int k;
k = funct();
int* ptr;
ptr = &bar;
asm volatile (".insn r 0x33, 0, 0, %[res], %[res], %[ptr]":
[res]"+r"(result) : [ptr]"r"(ptr));
}
The IAR compiler will emit the following. As you can see, a0 has been assigned to the result variable (via the symbolic name res) and a1 to the variable ptr (here the symbolic name happens to match the variable name).
\ 000014 0001'2503 lw a0, 0x0(sp)
\ 000018 0000'05B7 lui a1, %hi(bar)
\ 00001C 0005'8593 addi a1, a1, %lo(bar)
\ 000020 00B5'0533 .insn r 0x33, 0, 0, a0, a0, a1
\ 000024 00A1'2023 sw a0, 0x0(sp)
You can read more about the IAR inline assembly syntax in the book "IAR C/C++ Development Guide Compiling and linking for RISC-V", in chapter "Assembler Language Interface". The book is provided as a PDF, which you can access from within IAR Embedded Workbench.
Based on the snippet provided in your question, I tried the following code with the IAR C/C++ Compiler for RISC-V:
int funct();
int funct() { return 0xA5; } // stub
int bar = 0xFF00;
int main() {
int k = funct();
int* ptr = &bar;
asm volatile (".insn r 0x33, 0, 0, %[res], %[ptr], %[k]"
: [res]"=r"(*ptr)
: [ptr]"r"(*ptr), [k]"r"(k));
}
In this case, the .insn directive will generate add r,r,r which is effectively *ptr = *ptr + k.
In an earlier version of this answer it was assumed that there would be a requirement to be explicit about which registers to use. For that, explicit register selectors were used, since the IAR compiler simply allows them (e.g., "a3", ="a3", "a4", "a5", etc.). At that point, as noted by @PeterCordes in the comments, GCC offers a different set of constraints and would require a different solution. However, if there is no need to be explicit about the registers, it is better to let the compiler decide which ones to use. That generally imposes less overhead.
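For reference, should specific registers ever be required with GCC on RISC-V, the usual mechanism is local register variables; this is only a hedged sketch of that approach (the function name is made up), not something the code above needs:
extern int bar;

void add_into_bar(int k)
{
    register int val asm("a4") = bar;   /* pin the current value of bar to a4 */
    register int inc asm("a3") = k;     /* pin k to a3 */
    /* .insn r 0x33, 0, 0 encodes ADD: val = val + inc */
    asm volatile (".insn r 0x33, 0, 0, %0, %0, %1"
                  : "+r"(val)
                  : "r"(inc));
    bar = val;                          /* store the result back to memory */
}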
Consider the following code, which comes mostly from the Bluedroid stack:
#include <stdint.h>
#include <assert.h>
#define STREAM_TO_UINT16(u16, p) {u16 = ((uint16_t)(*(p)) + (((uint16_t)(*((p) + 1))) << 8)); (p) += 9;}
void func(uint8_t *param) {
uint8_t *stream = param;
uint16_t handle, handle2;
*stream = 5;
STREAM_TO_UINT16(handle, stream);
STREAM_TO_UINT16(handle2, stream);
assert(handle);
assert(handle2);
*stream = 7;
}
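For reference, each STREAM_TO_UINT16(handle, stream) call above expands to roughly the following block; the pointer advance is a statement of its own, separate from the assignment to handle. The compiler's assembly output for func() follows below.
{
    handle = ((uint16_t)(*(stream)) + (((uint16_t)(*((stream) + 1))) << 8));
    (stream) += 9;   /* advances the pointer; independent of the assignment above */
}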
.file "opt.c"
.text
.align 4
.global func
.type func, #function
func:
entry sp, 32
movi.n a8, 5
s8i a8, a2, 0
movi.n a8, 7
s8i a8, a2, 18
retw.n
.size func, .-func
.ident "GCC: (crosstool-NG esp-2020r3) 8.4.0"
When it is compiled with NDEBUG, assert() resolves to nothing and "handle" is optimized out with -O2, -Os or -O3. As a result, the macro is not expanded and the pointer is not incremented.
I know that I can make "handle" volatile as one option to solve the issue, and I agree that modifying variables inside macros is dangerous, but this is not my code; it is Bluedroid's.
Well, first: is this borderline a gcc bug? And is there a way to tell gcc not to optimize out the unused variable?
Oops ... no, I just re-read the Xtensa ISA and I was wrong: the value of a8 is stored where a2 points, with an offset, so this is correct. I need to look somewhere else, because the core of the problem is that as soon as I set NDEBUG, my Bluedroid stack (this is on an ESP32) stops working, so I was searching for differences and looking where the compiler was complaining (unused variables). Thanks for taking the time to answer.
On Windows data can be loaded from DLLs, but it requires indirection through a pointer in the import address table. As a result, the compiler must know if an object that is being accessed is being imported from a DLL by using the __declspec(dllimport) type specifier.
This is unfortunate because it means that a header for a Windows library designed to be used as either a static library or a dynamic library needs to know which version of the library the program is linking to. This requirement is not applicable to functions, which are transparently emulated for DLLs with a stub function that calls the real function, whose address is stored in the import address table.
On Linux the dynamic linker (ld.so) copies the values of all linked data objects from a shared object into a private mapped region for each process. This doesn't require indirection because the address of the private mapped region is local to the module, so its address is decided when the program is linked (and in the case of position independent executables, relative addressing is used).
Why doesn't Windows do the same? Is there a situation where a DLL might be loaded more than once, and thus require multiple copies of linked data? Even if that was the case, it wouldn't be applicable to read only data.
It seems that the MSVCRT handles this issue by defining the _DLL macro when targeting the dynamic C runtime library (with the /MD or /MDd flag), then using that in all standard headers to conditionally declare all exported symbols with __declspec(dllimport). I suppose you could reuse this macro if you only supported statically linking when using the static C runtime and dynamically linking when using the dynamic C runtime.
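In practice, headers usually wrap this in a macro along the following lines (a hedged sketch; the MYLIB_* names are illustrative, not from any real library):
#if defined(_WIN32) && defined(MYLIB_USE_DLL)   /* consumer links against the DLL */
#define MYLIB_API __declspec(dllimport)
#else
#define MYLIB_API                               /* static library or non-Windows build */
#endif
/* (when building the DLL itself, MYLIB_API would instead be __declspec(dllexport)) */

extern MYLIB_API int mylib_counter;    /* data: the specifier is required for correctness */
MYLIB_API int mylib_frobnicate(void);  /* functions: only a minor performance benefit */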
References:
LNK4217 - Russ Keldorph's WebLog (emphasis mine)
__declspec(dllimport) can be used on both code and data, and its semantics are subtly different between the two. When applied to a routine call, it is purely a performance optimization. For data, it is required for correctness.
[...]
Importing data
If you export a data item from a DLL, you must declare it with __declspec(dllimport) in the code that accesses it. In this case, instead of generating a direct load from memory, the compiler generates a load through a pointer, resulting in one additional indirection. Unlike calls, where the linker will fix up the code correctly whether the routine was declared __declspec(dllimport) or not, accessing imported data requires __declspec(dllimport). If omitted, the code will wind up accessing the IAT entry instead of the data in the DLL, probably resulting in unexpected behavior.
Importing into an Application Using __declspec(dllimport)
Using __declspec(dllimport) is optional on function declarations, but the compiler produces more efficient code if you use this keyword. However, you must use __declspec(dllimport) for the importing executable to access the DLL's public data symbols and objects.
Importing Data Using __declspec(dllimport)
When you mark the data as __declspec(dllimport), the compiler automatically generates the indirection code for you.
Importing Using DEF Files (interesting historical notes about accessing the IAT directly)
How do I share data in my DLL with an application or with other DLLs?
By default, each process using a DLL has its own instance of all the DLL's global and static variables.
Linker Tools Warning LNK4217
What happens when you get dllimport wrong? (seems to be unaware of data semantics)
How do I export data from a DLL?
CRT Library Features (documents the _DLL macro)
Linux and Windows use different strategies for accessing data stored in dynamic libraries.
On Linux, an undefined reference to an object is resolved to a library at link time. The linker finds the size of the object and reserves space for it in the .bss or the .rdata segment of the executable. When executed, the dynamic linker (ld.so) resolves the symbol to a dynamic library (again), and copies the object from the dynamic library to the process's memory.
On Windows, an undefined reference to an object is resolved to an import library at link time, and no space is reserved for it. When the module is executed, the dynamic linker resolves the symbol to a dynamic library, and creates a copy on write memory map in the process, backed by a shared data segment in the dynamic library.
The advantage of a copy on write memory map is that if the linked data is unchanged, then it can be shared with other processes. In practice this is a trifling benefit which greatly increases complexity, both for the toolchain and programs using dynamic libraries. For objects which are actually written this is always less efficient.
I suspect, although I have no evidence, that this decision was made for a particular and now outdated use case. Perhaps it was common practice to use large (for the time) read only objects in dynamic libraries on 16-bit Windows (in official Microsoft programs or otherwise). Either way, I doubt anyone at Microsoft has the expertise and time to change it now.
In order to investigate the issue I created a program which writes to an object from a dynamic library. It writes one byte per page (4096 bytes) in the object, then writes the entire object, then retries the initial one byte per page write. If the object is reserved for the process before main is called, the first and third loops should take approximately the same time, and the second loop should take longer than both. If the object is a copy on write map to a dynamic library, the first loop should take at least as long as the second, and the third should take less time than both.
The results are consistent with my hypothesis, and analyzing the disassembly confirms that Linux accesses the dynamic library data at a link time address, relative to the program counter. Surprisingly, Windows not only indirectly accesses the data, the pointer to the data and its length are reloaded from the import address table every loop iteration, with optimizations enabled. This was tested with Visual Studio 2010 on Windows XP, so maybe things have changed, although I wouldn't think that it has.
Here are the results for Linux:
$ dd bs=1M count=16 if=/dev/urandom of=libdat.dat
$ xxd -i libdat.dat libdat.c
$ gcc -O3 -g -shared -fPIC libdat.c -o libdat.so
$ gcc -O3 -g -no-pie -L. -ldat dat.c -o dat
$ LD_LIBRARY_PATH=. ./dat
local = 0x1601060
libdat_dat = 0x601040
libdat_dat_len = 0x601020
dirty= 461us write= 12184us retry= 456us
$ nm dat
[...]
0000000000601040 B libdat_dat
0000000000601020 B libdat_dat_len
0000000001601060 B local
[...]
$ objdump -d -j.text dat
[...]
400693: 8b 35 87 09 20 00 mov 0x200987(%rip),%esi # 601020 <libdat_dat_len>
[...]
4006a3: 31 c0 xor %eax,%eax # zero loop counter
4006a5: 48 8d 15 94 09 20 00 lea 0x200994(%rip),%rdx # 601040 <libdat_dat>
4006ac: 0f 1f 40 00 nopl 0x0(%rax) # align loop for efficiency
4006b0: 89 c1 mov %eax,%ecx # store data offset in ecx
4006b2: 05 00 10 00 00 add $0x1000,%eax # add PAGESIZE to data offset
4006b7: c6 04 0a 00 movb $0x0,(%rdx,%rcx,1) # write a zero byte to data
4006bb: 39 f0 cmp %esi,%eax # test loop condition
4006bd: 72 f1 jb 4006b0 <main+0x30> # continue loop if data is left
[...]
Here are the results for Windows:
$ cl /Ox /Zi /LD libdat.c /link /EXPORT:libdat_dat /EXPORT:libdat_dat_len
[...]
$ cl /Ox /Zi dat.c libdat.lib
[...]
$ dat.exe # note low resolution timer means retry is too small to measure
local = 0041EEA0
libdat_dat = 1000E000
libdat_dat_len = 1100E000
dirty= 20312us write= 3125us retry= 0us
$ dumpbin /symbols dat.exe
[...]
9000 .data
1000 .idata
5000 .rdata
1000 .reloc
17000 .text
[...]
$ dumpbin /disasm dat.exe
[...]
004010BA: 33 C0 xor eax,eax # zero loop counter
[...]
004010C0: 8B 15 8C 63 42 00 mov edx,dword ptr [__imp__libdat_dat] # store data pointer in edx
004010C6: C6 04 02 00 mov byte ptr [edx+eax],0 # write a zero byte to data
004010CA: 8B 0D 88 63 42 00 mov ecx,dword ptr [__imp__libdat_dat_len] # store data length in ecx
004010D0: 05 00 10 00 00 add eax,1000h # add PAGESIZE to data offset
004010D5: 3B 01 cmp eax,dword ptr [ecx] # test loop condition
004010D7: 72 E7 jb 004010C0 # continue loop if data is left
[...]
Here is the source code used for both tests:
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
typedef FILETIME time_l;
time_l time_get(void) {
FILETIME ret; GetSystemTimeAsFileTime(&ret); return ret;
}
long long int time_diff(time_l const *c1, time_l const *c2) {
return 1LL*c2->dwLowDateTime/100-c1->dwLowDateTime/100+c2->dwHighDateTime*100000-c1->dwHighDateTime*100000;
}
#else
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
typedef struct timespec time_l;
time_l time_get(void) {
time_l ret; clock_gettime(CLOCK_MONOTONIC, &ret); return ret;
}
long long int time_diff(time_l const *c1, time_l const *c2) {
return 1LL*c2->tv_nsec/1000-c1->tv_nsec/1000+c2->tv_sec*1000000-c1->tv_sec*1000000;
}
#endif
#ifndef PAGESIZE
#define PAGESIZE 4096
#endif
#ifdef _WIN32
#define DLLIMPORT __declspec(dllimport)
#else
#define DLLIMPORT
#endif
extern DLLIMPORT unsigned char volatile libdat_dat[];
extern DLLIMPORT unsigned int libdat_dat_len;
unsigned int local[4096];
int main(void) {
unsigned int i;
time_l t1, t2, t3, t4;
long long int d1, d2, d3;
t1 = time_get();
for(i=0; i < libdat_dat_len; i+=PAGESIZE) {
libdat_dat[i] = 0;
}
t2 = time_get();
for(i=0; i < libdat_dat_len; i++) {
libdat_dat[i] = 0xFF;
}
t3 = time_get();
for(i=0; i < libdat_dat_len; i+=PAGESIZE) {
libdat_dat[i] = 0;
}
t4 = time_get();
d1 = time_diff(&t1, &t2);
d2 = time_diff(&t2, &t3);
d3 = time_diff(&t3, &t4);
printf("%-15s= %18p\n%-15s= %18p\n%-15s= %18p\n", "local", local, "libdat_dat", libdat_dat, "libdat_dat_len", &libdat_dat_len);
printf("dirty=%9lldus write=%9lldus retry=%9lldus\n", d1, d2, d3);
return 0;
}
I sincerely hope someone else benefits from my research. Thanks for reading!
I wrote a bash script to determine the size of gcc's datatypes (e.g. ./sizeof int double outputs the respective sizes of int and double) by wrapping each of its arguments in the following P() macro and then compiling and running the code.
#define P(x) printf("sizeof(" #x ") = %u\n", (unsigned int)sizeof(x))
The problem is that this is relatively slow (it takes a whole second!), especially the linking step (since compiling with -c or -S takes virtually no time, and so does running the outputted binary). One second is not really that slow by itself, but if I were to use this script in other scripts, it would add up.
Is there a faster, less roundabout way to find out what sizes gcc uses for datatypes?
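Presumably the script generates and compiles a small program along these lines for ./sizeof int double (a sketch reconstructed from the description above):
#include <stdio.h>
#define P(x) printf("sizeof(" #x ") = %u\n", (unsigned int)sizeof(x))
int main(void)
{
    P(int);
    P(double);
    return 0;
}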
You can achieve this functionality for standard types using GCC's preprocessor alone, since there are predefined macros for them:
__SIZEOF_INT__
__SIZEOF_LONG__
__SIZEOF_LONG_LONG__
__SIZEOF_SHORT__
__SIZEOF_POINTER__
__SIZEOF_FLOAT__
__SIZEOF_DOUBLE__
__SIZEOF_LONG_DOUBLE__
__SIZEOF_SIZE_T__
__SIZEOF_WCHAR_T__
__SIZEOF_WINT_T__
__SIZEOF_PTRDIFF_T__
So, by using code like the following:
#define TYPE_TO_CHECK __SIZEOF_INT__
#define VAL_TO_STRING(x) #x
#define V_TO_S(x) VAL_TO_STRING(x)
#pragma message V_TO_S(TYPE_TO_CHECK)
#error "terminate"
you will be able to get the value of __SIZEOF_INT__ from the preprocessor itself without even starting the compilation. In your script you can define the TYPE_TO_CHECK (with -D) to whatever you need and pass it to gcc. Of course you will get some junk output, but I believe you can deal with that.
You can use the 'negative array size' trick that autoconf (see: AC_COMPUTE_INT) uses. That way, you don't need to link or execute code. Therefore, it also works when cross compiling. e.g.,
int n[1 - 2 * !(sizeof(double) == 8)];
fails to compile if: sizeof(double) != 8
The downside is, you might have to pass -DCHECK_SIZE=8 or something similar in the command line, since it might take more than one pass to detect an unusual value. So, I'm not sure if this will be any faster in general - but you might be able to take advantage of it.
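For example, a check along these lines (a sketch; the typedef name is only illustrative) compiles cleanly exactly when the guess passed via -DCHECK_SIZE matches:
#ifndef CHECK_SIZE
#define CHECK_SIZE 8   /* default guess; override with -DCHECK_SIZE=<n> */
#endif
/* array size becomes negative, and compilation fails, when the guess is wrong */
typedef char double_size_check[1 - 2 * !(sizeof(double) == CHECK_SIZE)];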
Edit: If you are using gcc exclusively, I think #wintermute's comment is probably the best solution.
Here are three possible solutions.
The first one will work with any type whose size is less than 256. On my system, it takes about 0.04s (since it doesn't need headers or libraries other than the basic runtime). One downside is that it will only do one at a time, because of the small size of the output channel. Another problem is that it doesn't compensate for slow linking on some systems (notably MinGW):
howbig() {
gcc -x c - <<<'int main() { return sizeof ('$*'); }' && ./a.out
echo $?
}
$ time howbig "struct { char c; union { double d; int i[3];};}"
24
real 0m0.041s
user 0m0.031s
sys 0m0.014s
$ time howbig unsigned long long
8
real 0m0.044s
user 0m0.035s
sys 0m0.009s
If you wanted to be able to do larger types, you could get the size one byte at a time, at the cost of a couple more centiseconds:
howbig2 ()
{
gcc -x c - <<< 'int main(int c,char**v) {
return sizeof ('$*')>>(8*(**++v&3)); }' &&
echo $((0x$(printf %02x $(./a.out 3;echo $?) $(./a.out 2;echo $?) \
$(./a.out 1;echo $?) $(./a.out 0;echo $?)) ))
}
$ time howbig2 struct '{double d; long long u[12];}([973])'
101192
real 0m0.054s
user 0m0.036s
sys 0m0.019s
If you are compiling for x86, the following will probably work, although I'm not in a position to test it thoroughly on a wide variety of architectures and platforms. It avoids the link step (notoriously slow on MinGW, for example), by analyzing the compiled assembly output. (It would probably be slightly more robust to analyze the compiled object binary, but I fear that binutils on MinGW are also slow.) Even on Ubuntu, it is significantly faster:
howbig3 () {
gcc -S -o - -x c - <<< 'int hb(void) { return sizeof ('$*'); }' |
awk '$1~/movl/&&$3=="%eax"{print substr($2,2,length($2)-2)}'
}
$ time howbig3 struct '{double d; long long u[12];}([973])'
101192
real 0m0.020s
user 0m0.017s
sys 0m0.004s
Using nm with no code
Just make your thing a global variable. nm can report its size.
// getsize.c
struct foo {
char str[3];
short s; // expect padding galore...
int i;
} my_struct;
Compile but don't link, then use nm:
$ gcc -c getsize.c
$ nm getsize.o --format=posix
my_struct C 000000000000000c 000000000000000c
Note that the last column is the size (in hex); here is how we can get it:
$ nm getsize.o -P | cut -d ' ' -f 4
000000000000000c
# or in decimal
$ printf %d 0x`nm getsize.o -P | cut -d ' ' -f 4`
12
Using objdump with no code
If nm doesn't work for some reason, you can store the size itself in a global variable.
Start with this C file:
// getsize.c
struct foo { char str[3]; short s; int i; };
unsigned long my_sizeof = sizeof(struct foo);
Now we have to find the value of this variable from the object file.
$ gcc -c getsize.c
$ objdump -Sj .data getsize.o
getsize.o: file format elf64-x86-64
Disassembly of section .data:
0000000000000000 <my_sizeof>:
0: 0c 00 00 00 00 00 00 00 ........
Darn, little endian! You could write a script to parse this, but the following solution (assuming GCC extensions) will force it to always be big endian:
// getsize.c
struct foo { char str[3]; short s; int i; };
struct __attribute__ ((scalar_storage_order("big-endian"))) {
unsigned long v;
} my_sizeof = { sizeof(struct foo) };
This yields:
0000000000000000 <my_sizeof>:
0: 00 00 00 00 00 00 00 0c ........
Watch out! You can't just strip out all non-hex characters, because sometimes the "...." stuff on the right will be valid ASCII. But the first one should always be a '.'. The following command keeps what lies between the ':' and the first '.':
$ gcc -c getsize.c
$ objdump -Sj .data getsize.o |
sed '$!d # keep last line only
s/\s//g # remove tabs and spaces
s/.*:\([^.]*\)\..*/\1/' # only keep between : and .'
000000000000000c
If you happen to be in an IDE like VS2019, you can just type char foo[sizeof(MyType)] anywhere in the code, hover over foo and get the answer :)
In code reviews I ask for option (1) below to be used as it results in a symbol being created (for debugging) whereas (2) and (3) do not appear to do so at least for gcc and icc. However (1) is not a true const and cannot be used on all compilers as an array size. Is there a better option that includes debug symbols and is truly const for C?
Symbols:
gcc f.c -ggdb3 -g ; nm -a a.out | grep _sym
0000000100000f3c s _symA
0000000100000f3c - 04 0000 STSYM _symA
Code:
static const int symA = 1; // 1
#define symB 2 // 2
enum { symC = 3 }; // 3
GDB output:
(gdb) p symA
$1 = 1
(gdb) p symB
No symbol "symB" in current context.
(gdb) p symC
No symbol "symC" in current context.
And for completeness, the source:
#include <stdio.h>
static const int symA = 1;
#define symB 2
enum { symC = 3 };
int main (int argc, char *argv[])
{
printf("symA %d symB %d symC %d\n", symA, symB, symC);
return (0);
}
The -ggdb3 option should be giving you macro debugging information. But this is a different kind of debugging information (it has to be different - it tells the debugger how to expand the macro, possibly including arguments and the # and ## operators) so you can't see it with nm.
If your goal is to have something that shows up in nm, then I guess you can't use a macro. But that's a silly goal; you should want to have something that actually works in a debugger, right? Try print symC in gdb and see if it works.
Since macros can be redefined, gdb requires the program to be stopped at a location where the macro existed so it can find the correct definition. In this program:
#include <stdio.h>
int main(void)
{
#define X 1
printf("%d\n", X);
#undef X
printf("---\n");
#define X 2
printf("%d\n", X);
}
If you break on the first printf and print X you'll get the 1; next to the second printf and gdb will tell you that there is no X; next again and it will show the 2.
Also the gdb command info macro foo can be useful, if foo is a macro that takes arguments and you want to see its definition rather than expand it with a specific set of arguments. And if a macro expands to something that's not an expression, gdb can't print it so info macro is the only thing you can do with it.
For better inspection of the raw debugging information, try objdump -W instead of nm.
However (1) is not a true const and cannot be used on all compilers as an array size.
This can be used as an array size on all compilers that support C99 and later (gcc, clang). For others (like MSVC) you have only the last two options.
Using option 3 is preferable to option 2. enum constants are different from #define constants: they obey the usual scoping rules, the compiler knows their value and type, and they are true integer constant expressions, so you can use them for array sizes and for debugging, unlike #define constants.
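As a short sketch contrasting the three options (assumes a C99-or-later compiler for the VLA case):
static const int symA = 1;   /* (1) a real object with a symbol, but not an
                                    integer constant expression in C */
#define symB 2               /* (2) gone after preprocessing; no symbol */
enum { symC = 3 };           /* (3) a true integer constant expression */

void demo(void)
{
    char bufA[symA];         /* a VLA in C99 and later; rejected by compilers without VLAs (e.g. MSVC) */
    char bufC[symC];         /* an ordinary fixed-size array: accepted everywhere */
    (void)bufA;
    (void)bufC;
}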