How to cross compile a c program to aarch64 with cpuid.h? - c

I'm trying to cross-compile a simple C program to aarch64 (arm64) from a 64bit Ubuntu Linux. Can someone please help me why i'm getting this error.
It says 'cpuid.h' is not found. I've tried compiling it on the 64bit linux, it works fine. But when using aarch64-linux-gnu-gcc it is giving errors.
I'm getting the following error.
aarch64-linux-gnu-gcc -O1 -fno-stack-protector -march=armv8-a test.c -o test
test.c:4:10: fatal error: cpuid.h: No such file or directory
4 | #include <cpuid.h>
| ^~~~~~~~~
compilation terminated.
The contents of test.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <cpuid.h>
// Requires that the user input the CPUID,
// plus the bytes "3" and "Q";
void succeed(char* string) {
printf("Yes, %s is correct!\n", string);
exit(0);
}
void fail(char* string) {
printf("No, %s is not correct.\n", string);
exit(1);
}
void shift_int_to_char(int i, char* buff) {
buff[0] = (i) & 0xFF;
buff[1] = (i >> 8) & 0xFF;
buff[2] = (i >> 16) & 0xFF;
buff[3] = (i >> 24) & 0xFF;
}
int main(int argc, char** argv) {
if (argc != 2) {
printf("Need exactly one argument.\n");
return -1;
}
unsigned int eax, ebx, ecx, edx;
char* buff = malloc(sizeof(char) * 15);
__get_cpuid(0, &eax, &ebx, &ecx, &edx);
shift_int_to_char(ebx, buff);
shift_int_to_char(edx, buff + 4);
shift_int_to_char(ecx, buff + 8);
buff[12] = '3';
buff[13] = 'Q';
buff[14] = '\0';
int correct = (strcmp(buff, argv[1]) == 0);
free(buff);
if (correct) {
succeed(argv[1]);
} else {
fail(argv[1]);
}
}

As explained in the comments, you need to port this program to the Aarch64 architecture, you cannot just compile the code as is. The features implemented in your SoC are exposed through the various AArch64 feature system registers or instruction set attribute registers, for example ID_AA64PFR1_EL1.
By convention, registers with names terminating with _EL1 cannot be accessed from a user-mode program running at EL0. Some support in the Operating System (running at EL1) is therefore required for reading those registers from a user-mode program - I used a Linux 5.15.0 kernel for the purpose of this answer.
On a recent Linux kernel, this can be achieved by using the HWCAP_CPUID API available in hwcaps - more details in this article , ARM64 CPU Feature Registers.
We can compile/execute the example code provided in Appendix I:
/*
* Sample program to demonstrate the MRS emulation ABI.
*
* Copyright (C) 2015-2016, ARM Ltd
*
* Author: Suzuki K Poulose <suzuki.poulose#arm.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
#include <asm/hwcap.h>
#include <stdio.h>
#include <sys/auxv.h>
#define get_cpu_ftr(id) ({ \
unsigned long __val; \
asm("mrs %0, "#id : "=r" (__val)); \
printf("%-20s: 0x%016lx\n", #id, __val); \
})
int main(void)
{
if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
fputs("CPUID registers unavailable\n", stderr);
return 1;
}
get_cpu_ftr(ID_AA64ISAR0_EL1);
get_cpu_ftr(ID_AA64ISAR1_EL1);
get_cpu_ftr(ID_AA64MMFR0_EL1);
get_cpu_ftr(ID_AA64MMFR1_EL1);
get_cpu_ftr(ID_AA64PFR0_EL1);
get_cpu_ftr(ID_AA64PFR1_EL1);
get_cpu_ftr(ID_AA64DFR0_EL1);
get_cpu_ftr(ID_AA64DFR1_EL1);
get_cpu_ftr(MIDR_EL1);
get_cpu_ftr(MPIDR_EL1);
get_cpu_ftr(REVIDR_EL1);
#if 0
/* Unexposed register access causes SIGILL */
get_cpu_ftr(ID_MMFR0_EL1);
#endif
return 0;
}
Execution:
gcc -o cpu-feature-registers cpu-feature-registers.c
./cpu-feature-registers
ID_AA64ISAR0_EL1 : 0x0000000000010000
ID_AA64ISAR1_EL1 : 0x0000000000000000
ID_AA64MMFR0_EL1 : 0x00000111ff000000
ID_AA64MMFR1_EL1 : 0x0000000000000000
ID_AA64PFR0_EL1 : 0x0000000000000011
ID_AA64PFR1_EL1 : 0x0000000000000000
ID_AA64DFR0_EL1 : 0x0000000000000006
ID_AA64DFR1_EL1 : 0x0000000000000000
MIDR_EL1 : 0x00000000410fd034
MPIDR_EL1 : 0x0000000080000000
REVIDR_EL1 : 0x0000000000000000
You now just need to identify the feature/instructions set register(s) that will provide you with the same kind of information provided by your x86_64 code.

Related

How can I print call trace in a C program [duplicate]

Is there any way to dump the call stack in a running process in C or C++ every time a certain function is called? What I have in mind is something like this:
void foo()
{
print_stack_trace();
// foo's body
return
}
Where print_stack_trace works similarly to caller in Perl.
Or something like this:
int main (void)
{
// will print out debug info every time foo() is called
register_stack_trace_function(foo);
// etc...
}
where register_stack_trace_function puts some sort of internal breakpoint that will cause a stack trace to be printed whenever foo is called.
Does anything like this exist in some standard C library?
I am working on Linux, using GCC.
Background
I have a test run that behaves differently based on some commandline switches that shouldn't affect this behavior. My code has a pseudo-random number generator that I assume is being called differently based on these switches. I want to be able to run the test with each set of switches and see if the random number generator is called differently for each one.
Survey of C/C++ backtrace methods
In this answer I will try to run a single benchmark for a bunch of solutions to see which one runs faster, while also considering other points such as features and portability.
Tool
Time / call
Line number
Function name
C++ demangling
Recompile
Signal safe
As string
C
C++23 <stacktrace> GCC 12.1
7 us
y
y
y
y
n
y
n
Boost 1.74 stacktrace()
5 us
y
y
y
y
n
y
n
Boost 1.74 stacktrace::safe_dump_to
y
n
n
glibc backtrace_symbols_fd
25 us
n
-rdynamic
hacks
y
y
n
y
glibc backtrace_symbols
21 us
n
-rdynamic
hacks
y
n
y
y
GDB scripting
600 us
y
y
y
n
y
n
y
GDB code injection
n
n
y
libunwind
y
libdwfl
4 ms
n
y
libbacktrace
y
Empty cells mean "TODO", not "no".
us: microsecond
Line number: shows actual line number, not just function name + a memory address.
It is usually possible to recover the line number from an address manually after the fact with addr2line. But it is a pain.
Recompile: requires recompiling the program to get your traces. Not recompiling is better!
Signal safe: crucial for the important uses case of "getting a stack trace in case of segfault": How to automatically generate a stacktrace when my program crashes
As string: you get the stack trace as a string in the program itself, as opposed to e.g. just printing to stdout. Usually implies not signal safe, as we don't know the size of the stack trace string size in advance, and therefore requires malloc which is not async signal safe.
C: does it work on a plain-C project (yes, there are still poor souls out there), or is C++ required?
Test setup
All benchmarks will run the following
main.cpp
#include <cstdlib> // strtoul
#include <mystacktrace.h>
void my_func_2(void) {
print_stacktrace(); // line 6
}
void my_func_1(double f) {
(void)f;
my_func_2();
}
void my_func_1(int i) {
(void)i;
my_func_2(); // line 16
}
int main(int argc, char **argv) {
long long unsigned int n;
if (argc > 1) {
n = std::strtoul(argv[1], NULL, 0);
} else {
n = 1;
}
for (long long unsigned int i = 0; i < n; ++i) {
my_func_1(1); // line 27
}
}
This input is designed to test C++ name demangling since my_func_1(int) and my_func_1(float) are necessarily mangled as a way to implement C++ function overload.
We differentiate between the benchmarks by using different -I includes to point to different implementations of print_stacktrace().
Each benchmark is done with a command of form:
time ./stacktrace.out 100000 &>/dev/null
The number of iterations is adjusted for each implementation to produce a total runtime of the order of 1s for that benchmark.
-O0 is used on all tests below unless noted. Stack traces may be irreparably mutilated by certain optimizations. Tail call optimization is a notable example of that: What is tail call optimization? There's nothing we can do about it.
C++23 <stacktrace>
This method was previously mentioned at: https://stackoverflow.com/a/69384663/895245 please consider upvoting that answer.
This is the best solution... it's portable, fast, shows line numbers and demangles C++ symbols. This option will displace every other alternative as soon as it becomes more widely available, with the exception perhaps only of GDB for one-offs without the need or recompilation.
cpp20_stacktrace/mystacktrace.h
#include <iostream>
#include <stacktrace>
void print_stacktrace() {
std::cout << std::stacktrace::current();
}
GCC 12.1.0 from Ubuntu 22.04 does not have support compiled in, so for now I built it from source as per: How to edit and re-build the GCC libstdc++ C++ standard library source? and set --enable-libstdcxx-backtrace=yes, and it worked!
Compile with:
g++ -O0 -ggdb3 -Wall -Wextra -pedantic -std=c++23 -o cpp20_stacktrace.out main.cpp -lstdc++_libbacktrace
Sample output:
0# print_stacktrace() at cpp20_stacktrace/mystacktrace.h:5
1# my_func_2() at /home/ciro/main.cpp:6
2# my_func_1(int) at /home/ciro/main.cpp:16
3# at /home/ciro/main.cpp:27
4# at :0
5# at :0
6# at :0
7#
If we try to use GCC 12.1.0 from Ubuntu 22.04:
sudo apt install g++-12
g++-12 -ggdb3 -O2 -std=c++23 -Wall -Wextra -pedantic -o stacktrace.out stacktrace.cpp -lstdc++_libbacktrace
It fails with:
stacktrace.cpp: In function ‘void my_func_2()’:
stacktrace.cpp:6:23: error: ‘std::stacktrace’ has not been declared
6 | std::cout << std::stacktrace::current();
| ^~~~~~~~~~
Checking build options with:
g++-12 -v
does not show:
--enable-libstdcxx-backtrace=yes
so it wasn't compiled in. Bibliography:
How to use <stacktrace> in GCC trunk?
How can I generate a C++23 stacktrace with GCC 12.1?
It does not fail on the include because the header file:
/usr/include/c++/12
has a feature check:
#if __cplusplus > 202002L && _GLIBCXX_HAVE_STACKTRACE
Boost stacktrace
The library has changed quite a lot around Ubuntu 22.04, so make sure your version matches: Boost stack-trace not showing function names and line numbers
The library is pretty much superseded by the more portable C++23 implementation, but remains a very good option for those that are not at that standard version yet, but already have a "Boost clearance".
Documented at: https://www.boost.org/doc/libs/1_66_0/doc/html/stacktrace/getting_started.html#stacktrace.getting_started.how_to_print_current_call_stack
Tested on Ubuntu 22.04, boost 1.74.0, you should do:
boost_stacktrace/mystacktrace.h
#include <iostream>
#define BOOST_STACKTRACE_LINK
#include <boost/stacktrace.hpp>
void print_stacktrace(void) {
std::cout << boost::stacktrace::stacktrace();
}
On Ubuntu 19.10 boost 1.67.0 to get the line numbers we had to instead:
#include <iostream>
#define BOOST_STACKTRACE_USE_ADDR2LINE
#include <boost/stacktrace.hpp>
void print_stacktrace(void) {
std::cout << boost::stacktrace::stacktrace();
}
which would call out to the addr2line executable and be 1000x slower than the newer Boost version.
The package libboost-stacktrace-dev did not exist at all on Ubuntu 16.04.
The rest of this section considers only the Ubuntu 22.04, boost 1.74 behaviour.
Compile:
sudo apt-get install libboost-stacktrace-dev
g++ -O0 -ggdb3 -Wall -Wextra -pedantic -std=c++11 -o boost_stacktrace.out main.cpp -lboost_stacktrace_backtrace
Sample output:
0# print_stacktrace() at boost_stacktrace/mystacktrace.h:7
1# my_func_2() at /home/ciro/main.cpp:7
2# my_func_1(int) at /home/ciro/main.cpp:17
3# main at /home/ciro/main.cpp:26
4# __libc_start_call_main at ../sysdeps/nptl/libc_start_call_main.h:58
5# __libc_start_main at ../csu/libc-start.c:379
6# _start in ./boost_stacktrace.out
Note that the lines are off by one line. It was suggested in the comments that this is because the following instruction address is being considered.
Boost stacktrace header only
What the BOOST_STACKTRACE_LINK does is to require -lboost_stacktrace_backtrace at link time, so we imagine without that it will just work. This would be a good option for devs who don't have the "Boost clearance" to just add as one offs to debug.
TODO unfortunately it didn't so well for me:
#include <iostream>
#include <boost/stacktrace.hpp>
void print_stacktrace(void) {
std::cout << boost::stacktrace::stacktrace();
}
then:
g++ -O0 -ggdb3 -Wall -Wextra -pedantic -std=c++11 -o boost_stacktrace_header_only.out main.cpp
contains the overly short output:
0# 0x000055FF74AFB601 in ./boost_stacktrace_header_only.out
1# 0x000055FF74AFB66C in ./boost_stacktrace_header_only.out
2# 0x000055FF74AFB69C in ./boost_stacktrace_header_only.out
3# 0x000055FF74AFB6F7 in ./boost_stacktrace_header_only.out
4# 0x00007F0176E7BD90 in /lib/x86_64-linux-gnu/libc.so.6
5# __libc_start_main in /lib/x86_64-linux-gnu/libc.so.6
6# 0x000055FF74AFB4E5 in ./boost_stacktrace_header_only.out
which we can't even use with addr2line. Maybe we have to pass some other define from: https://www.boost.org/doc/libs/1_80_0/doc/html/stacktrace/configuration_and_build.html ?
Tested on Ubuntu 22.04. boost 1.74.
Boost boost::stacktrace::safe_dump_to
This is an interesting alternative to boost::stacktrace::stacktrace as it writes the stack trace in a async signal safe manner to a file, which makes it a good option for automatically dumping stack traces on segfaults which is a super common use case: How to automatically generate a stacktrace when my program crashes
Documented at: https://www.boost.org/doc/libs/1_70_0/doc/html/boost/stacktrace/safe_dump_1_3_38_7_6_2_1_6.html
TODO get it to work. All I see each time is a bunch of random bytes. My attempt:
boost_stacktrace_safe/mystacktrace.h
#include <unistd.h>
#define BOOST_STACKTRACE_LINK
#include <boost/stacktrace.hpp>
void print_stacktrace(void) {
boost::stacktrace::safe_dump_to(0, 1024, STDOUT_FILENO);
}
Sample output:
1[FU1[FU"2[FU}2[FUm1#n10[FU
Changes drastically each time, suggesting it is random memory addresses.
Tested on Ubuntu 22.04, boost 1.74.0.
glibc backtrace
This method is quite portable as it comes with glibc itself. Documented at: https://www.gnu.org/software/libc/manual/html_node/Backtraces.html
Tested on Ubuntu 22.04, glibc 2.35.
glibc_backtrace_symbols_fd/mystacktrace.h
#include <execinfo.h> /* backtrace, backtrace_symbols_fd */
#include <unistd.h> /* STDOUT_FILENO */
void print_stacktrace(void) {
size_t size;
enum Constexpr { MAX_SIZE = 1024 };
void *array[MAX_SIZE];
size = backtrace(array, MAX_SIZE);
backtrace_symbols_fd(array, size, STDOUT_FILENO);
}
Compile with:
g++ -O0 -ggdb3 -Wall -Wextra -pedantic -rdynamic -std=c++11 -o glibc_backtrace_symbols_fd.out main.cpp
Sample output with -rdynamic:
./glibc_backtrace_symbols.out(_Z16print_stacktracev+0x47) [0x556e6a131230]
./glibc_backtrace_symbols.out(_Z9my_func_2v+0xd) [0x556e6a1312d6]
./glibc_backtrace_symbols.out(_Z9my_func_1i+0x14) [0x556e6a131306]
./glibc_backtrace_symbols.out(main+0x58) [0x556e6a131361]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f175e7bdd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f175e7bde40]
./glibc_backtrace_symbols.out(_start+0x25) [0x556e6a131125]
Sample output without -rdynamic:
./glibc_backtrace_symbols_fd_no_rdynamic.out(+0x11f0)[0x556bd40461f0]
./glibc_backtrace_symbols_fd_no_rdynamic.out(+0x123c)[0x556bd404623c]
./glibc_backtrace_symbols_fd_no_rdynamic.out(+0x126c)[0x556bd404626c]
./glibc_backtrace_symbols_fd_no_rdynamic.out(+0x12c7)[0x556bd40462c7]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f0da2b70d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f0da2b70e40]
./glibc_backtrace_symbols_fd_no_rdynamic.out(+0x10e5)[0x556bd40460e5]
To get the line numbers without -rdynamic we can use addr2line:
addr2line -C -e glibc_backtrace_symbols_fd_no_rdynamic.out 0x11f0 0x123c 0x126c 0x12c7
addr2line cannot unfortunately handle the function name + offset in function format of when we are not using -rdynamic, e.g. _Z9my_func_2v+0xd.
GDB can however:
gdb -nh -batch -ex 'info line *(_Z9my_func_2v+0xd)' -ex 'info line *(_Z9my_func_1i+0x14)' glibc_backtrace_symbols.out
Line 7 of "main.cpp" starts at address 0x12d6 <_Z9my_func_2v+13> and ends at 0x12d9 <_Z9my_func_1d>.
Line 17 of "main.cpp" starts at address 0x1306 <_Z9my_func_1i+20> and ends at 0x1309 <main(int, char**)>.
A helper to make it more bearable:
addr2lines() (
perl -ne '$m = s/(.*).*\(([^)]*)\).*/gdb -nh -q -batch -ex "info line *\2" \1/;print $_ if $m' | bash
)
Usage:
xsel -b | addr2lines
glibc backtrace_symbols
A version of backtrace_symbols_fd that returns a string rather than printing to a file handle.
glibc_backtrace_symbols/mystacktrace.h
#include <execinfo.h> /* backtrace, backtrace_symbols */
#include <stdio.h> /* printf */
void print_stacktrace(void) {
char **strings;
size_t i, size;
enum Constexpr { MAX_SIZE = 1024 };
void *array[MAX_SIZE];
size = backtrace(array, MAX_SIZE);
strings = backtrace_symbols(array, size);
for (i = 0; i < size; i++)
printf("%s\n", strings[i]);
free(strings);
}
glibc backtrace with C++ demangling hack 1: -export-dynamic + dladdr
I couldn't find a simple way to automatically demangle C++ symbols with glibc backtrace.
https://panthema.net/2008/0901-stacktrace-demangled/
https://gist.github.com/fmela/591333/c64f4eb86037bb237862a8283df70cdfc25f01d3
Adapted from: https://gist.github.com/fmela/591333/c64f4eb86037bb237862a8283df70cdfc25f01d3
This is a "hack" because it requires changing the ELF with -export-dynamic.
glibc_ldl.cpp
#include <dlfcn.h> // for dladdr
#include <cxxabi.h> // for __cxa_demangle
#include <cstdio>
#include <string>
#include <sstream>
#include <iostream>
// This function produces a stack backtrace with demangled function & method names.
std::string backtrace(int skip = 1)
{
void *callstack[128];
const int nMaxFrames = sizeof(callstack) / sizeof(callstack[0]);
char buf[1024];
int nFrames = backtrace(callstack, nMaxFrames);
char **symbols = backtrace_symbols(callstack, nFrames);
std::ostringstream trace_buf;
for (int i = skip; i < nFrames; i++) {
Dl_info info;
if (dladdr(callstack[i], &info)) {
char *demangled = NULL;
int status;
demangled = abi::__cxa_demangle(info.dli_sname, NULL, 0, &status);
std::snprintf(
buf,
sizeof(buf),
"%-3d %*p %s + %zd\n",
i,
(int)(2 + sizeof(void*) * 2),
callstack[i],
status == 0 ? demangled : info.dli_sname,
(char *)callstack[i] - (char *)info.dli_saddr
);
free(demangled);
} else {
std::snprintf(buf, sizeof(buf), "%-3d %*p\n",
i, (int)(2 + sizeof(void*) * 2), callstack[i]);
}
trace_buf << buf;
std::snprintf(buf, sizeof(buf), "%s\n", symbols[i]);
trace_buf << buf;
}
free(symbols);
if (nFrames == nMaxFrames)
trace_buf << "[truncated]\n";
return trace_buf.str();
}
void my_func_2(void) {
std::cout << backtrace() << std::endl;
}
void my_func_1(double f) {
(void)f;
my_func_2();
}
void my_func_1(int i) {
(void)i;
my_func_2();
}
int main() {
my_func_1(1);
my_func_1(2.0);
}
Compile and run:
g++ -fno-pie -ggdb3 -O0 -no-pie -o glibc_ldl.out -std=c++11 -Wall -Wextra \
-pedantic-errors -fpic glibc_ldl.cpp -export-dynamic -ldl
./glibc_ldl.out
output:
1 0x40130a my_func_2() + 41
./glibc_ldl.out(_Z9my_func_2v+0x29) [0x40130a]
2 0x40139e my_func_1(int) + 16
./glibc_ldl.out(_Z9my_func_1i+0x10) [0x40139e]
3 0x4013b3 main + 18
./glibc_ldl.out(main+0x12) [0x4013b3]
4 0x7f7594552b97 __libc_start_main + 231
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f7594552b97]
5 0x400f3a _start + 42
./glibc_ldl.out(_start+0x2a) [0x400f3a]
1 0x40130a my_func_2() + 41
./glibc_ldl.out(_Z9my_func_2v+0x29) [0x40130a]
2 0x40138b my_func_1(double) + 18
./glibc_ldl.out(_Z9my_func_1d+0x12) [0x40138b]
3 0x4013c8 main + 39
./glibc_ldl.out(main+0x27) [0x4013c8]
4 0x7f7594552b97 __libc_start_main + 231
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f7594552b97]
5 0x400f3a _start + 42
./glibc_ldl.out(_start+0x2a) [0x400f3a]
Tested on Ubuntu 18.04.
glibc backtrace with C++ demangling hack 2: parse backtrace output
Shown at: https://panthema.net/2008/0901-stacktrace-demangled/
This is a hack because it requires parsing.
TODO get it to compile and show it here.
GDB scripting
We can also do this with GDB without recompiling by using: How to do an specific action when a certain breakpoint is hit in GDB?
We setup an empty backtrace function for our testing:
gdb/mystacktrace.h
void print_stacktrace(void) {}
and then with:
main.gdb
start
break print_stacktrace
commands
silent
backtrace
printf "\n"
continue
end
continue
we can run:
gdb -nh -batch -x main.gdb --args gdb.out
Sample output:
Temporary breakpoint 1 at 0x11a7: file main.cpp, line 21.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffc3e8) at main.cpp:21
warning: Source file is more recent than executable.
21 if (argc > 1) {
Breakpoint 2 at 0x555555555151: file gdb/mystacktrace.h, line 1.
#0 print_stacktrace () at gdb/mystacktrace.h:1
#1 0x0000555555555161 in my_func_2 () at main.cpp:6
#2 0x0000555555555191 in my_func_1 (i=1) at main.cpp:16
#3 0x00005555555551ec in main (argc=1, argv=0x7fffffffc3e8) at main.cpp:27
[Inferior 1 (process 165453) exited normally]
The above can be made more usable with the following Bash function:
gdbbt() (
tmpfile=$(mktemp /tmp/gdbbt.XXXXXX)
fn="$1"
shift
printf '%s' "
start
break $fn
commands
silent
backtrace
printf \"\n\"
continue
end
continue
" > "$tmpfile"
gdb -nh -batch -x "$tmpfile" -args "$#"
rm -f "$tmpfile"
)
Usage:
gdbbt print_stacktrace gdb.out 2
I don't know how to make commands with -ex without the temporary file: Problems adding a breakpoint with commands from command line with ex command
Tested in Ubuntu 22.04, GDB 12.0.90.
GDB code injection
TODO this is the dream! It might allow for both compiled-liked speeds, but without the need to recompile! Either:
with compile code + one of the other options, ideally C++23 <stacktrace>: How to call assembly in gdb? Might already be possible. But compile code is mega-quirky so I'm lazy to even try
a built-in dbt command analogous to dprintf dynamic printf: How to do an specific action when a certain breakpoint is hit in GDB?
libunwind
TODO does this have any advantage over glibc backtrace? Very similar output, also requires modifying the build command, but not part of glibc so requires an extra package installation.
Code adapted from: https://eli.thegreenplace.net/2015/programmatic-access-to-the-call-stack-in-c/
main.c
/* This must be on top. */
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
/* Paste this on the file you want to debug. */
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>
void print_trace() {
char sym[256];
unw_context_t context;
unw_cursor_t cursor;
unw_getcontext(&context);
unw_init_local(&cursor, &context);
while (unw_step(&cursor) > 0) {
unw_word_t offset, pc;
unw_get_reg(&cursor, UNW_REG_IP, &pc);
if (pc == 0) {
break;
}
printf("0x%lx:", pc);
if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0) {
printf(" (%s+0x%lx)\n", sym, offset);
} else {
printf(" -- error: unable to obtain symbol name for this frame\n");
}
}
puts("");
}
void my_func_3(void) {
print_trace();
}
void my_func_2(void) {
my_func_3();
}
void my_func_1(void) {
my_func_3();
}
int main(void) {
my_func_1(); /* line 46 */
my_func_2(); /* line 47 */
return 0;
}
Compile and run:
sudo apt-get install libunwind-dev
gcc -fno-pie -ggdb3 -O3 -no-pie -o main.out -std=c99 \
-Wall -Wextra -pedantic-errors main.c -lunwind
Either #define _XOPEN_SOURCE 700 must be on top, or we must use -std=gnu99:
Is the type `stack_t` no longer defined on linux?
Glibc - error in ucontext.h, but only with -std=c11
Run:
./main.out
Output:
0x4007db: (main+0xb)
0x7f4ff50aa830: (__libc_start_main+0xf0)
0x400819: (_start+0x29)
0x4007e2: (main+0x12)
0x7f4ff50aa830: (__libc_start_main+0xf0)
0x400819: (_start+0x29)
and:
addr2line -e main.out 0x4007db 0x4007e2
gives:
/home/ciro/main.c:34
/home/ciro/main.c:49
With -O0:
0x4009cf: (my_func_3+0xe)
0x4009e7: (my_func_1+0x9)
0x4009f3: (main+0x9)
0x7f7b84ad7830: (__libc_start_main+0xf0)
0x4007d9: (_start+0x29)
0x4009cf: (my_func_3+0xe)
0x4009db: (my_func_2+0x9)
0x4009f8: (main+0xe)
0x7f7b84ad7830: (__libc_start_main+0xf0)
0x4007d9: (_start+0x29)
and:
addr2line -e main.out 0x4009f3 0x4009f8
gives:
/home/ciro/main.c:47
/home/ciro/main.c:48
Tested on Ubuntu 16.04, GCC 6.4.0, libunwind 1.1.
libunwind with C++ name demangling
Code adapted from: https://eli.thegreenplace.net/2015/programmatic-access-to-the-call-stack-in-c/
unwind.cpp
#define UNW_LOCAL_ONLY
#include <cxxabi.h>
#include <libunwind.h>
#include <cstdio>
#include <cstdlib>
#include <iostream>
void backtrace() {
unw_cursor_t cursor;
unw_context_t context;
// Initialize cursor to current frame for local unwinding.
unw_getcontext(&context);
unw_init_local(&cursor, &context);
// Unwind frames one by one, going up the frame stack.
while (unw_step(&cursor) > 0) {
unw_word_t offset, pc;
unw_get_reg(&cursor, UNW_REG_IP, &pc);
if (pc == 0) {
break;
}
std::printf("0x%lx:", pc);
char sym[256];
if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0) {
char* nameptr = sym;
int status;
char* demangled = abi::__cxa_demangle(sym, nullptr, nullptr, &status);
if (status == 0) {
nameptr = demangled;
}
std::printf(" (%s+0x%lx)\n", nameptr, offset);
std::free(demangled);
} else {
std::printf(" -- error: unable to obtain symbol name for this frame\n");
}
}
}
void my_func_2(void) {
backtrace();
std::cout << std::endl; // line 43
}
void my_func_1(double f) {
(void)f;
my_func_2();
}
void my_func_1(int i) {
(void)i;
my_func_2();
} // line 54
int main() {
my_func_1(1);
my_func_1(2.0);
}
Compile and run:
sudo apt-get install libunwind-dev
g++ -fno-pie -ggdb3 -O0 -no-pie -o unwind.out -std=c++11 \
-Wall -Wextra -pedantic-errors unwind.cpp -lunwind -pthread
./unwind.out
Output:
0x400c80: (my_func_2()+0x9)
0x400cb7: (my_func_1(int)+0x10)
0x400ccc: (main+0x12)
0x7f4c68926b97: (__libc_start_main+0xe7)
0x400a3a: (_start+0x2a)
0x400c80: (my_func_2()+0x9)
0x400ca4: (my_func_1(double)+0x12)
0x400ce1: (main+0x27)
0x7f4c68926b97: (__libc_start_main+0xe7)
0x400a3a: (_start+0x2a)
and then we can find the lines of my_func_2 and my_func_1(int) with:
addr2line -e unwind.out 0x400c80 0x400cb7
which gives:
/home/ciro/test/unwind.cpp:43
/home/ciro/test/unwind.cpp:54
TODO: why are the lines off by one?
Tested on Ubuntu 18.04, GCC 7.4.0, libunwind 1.2.1.
Linux kernel
How to print the current thread stack trace inside the Linux kernel?
libdwfl
This was originally mentioned at: https://stackoverflow.com/a/60713161/895245 and it might be the best method, but I have to benchmark a bit more, but please go upvote that answer.
TODO: I tried to minimize the code in that answer, which was working, to a single function, but it is segfaulting, let me know if anyone can find why.
dwfl.cpp: answer reached 30k chars and this was the easiest cut: https://gist.github.com/cirosantilli/f1dd3ee5d324b9d24e40f855723544ac
Compile and run:
sudo apt install libdw-dev libunwind-dev
g++ -fno-pie -ggdb3 -O0 -no-pie -o dwfl.out -std=c++11 -Wall -Wextra -pedantic-errors dwfl.cpp -ldw -lunwind
./dwfl.out
We also need libunwind as that makes results more correct. If you do without it, it runs, but you will see that some of the lines are a bit wrong.
Output:
0: 0x402b72 stacktrace[abi:cxx11]() at /home/ciro/test/dwfl.cpp:65
1: 0x402cda my_func_2() at /home/ciro/test/dwfl.cpp:100
2: 0x402d76 my_func_1(int) at /home/ciro/test/dwfl.cpp:111
3: 0x402dd1 main at /home/ciro/test/dwfl.cpp:122
4: 0x7ff227ea0d8f __libc_start_call_main at ../sysdeps/nptl/libc_start_call_main.h:58
5: 0x7ff227ea0e3f __libc_start_main##GLIBC_2.34 at ../csu/libc-start.c:392
6: 0x402534 _start at ../csu/libc-start.c:-1
0: 0x402b72 stacktrace[abi:cxx11]() at /home/ciro/test/dwfl.cpp:65
1: 0x402cda my_func_2() at /home/ciro/test/dwfl.cpp:100
2: 0x402d5f my_func_1(double) at /home/ciro/test/dwfl.cpp:106
3: 0x402de2 main at /home/ciro/test/dwfl.cpp:123
4: 0x7ff227ea0d8f __libc_start_call_main at ../sysdeps/nptl/libc_start_call_main.h:58
5: 0x7ff227ea0e3f __libc_start_main##GLIBC_2.34 at ../csu/libc-start.c:392
6: 0x402534 _start at ../csu/libc-start.c:-1
Benchmark run:
g++ -fno-pie -ggdb3 -O3 -no-pie -o dwfl.out -std=c++11 -Wall -Wextra -pedantic-errors dwfl.cpp -ldw
time ./dwfl.out 1000 >/dev/null
Output:
real 0m3.751s
user 0m2.822s
sys 0m0.928s
So we see that this method is 10x faster than Boost's stacktrace, and might therefore be applicable to more use cases.
Tested in Ubuntu 22.04 amd64, libdw-dev 0.186, libunwind 1.3.2.
libbacktrace
https://github.com/ianlancetaylor/libbacktrace
Considering the harcore library author, it is worth trying this out, maybe it is The One. TODO check it out.
A C library that may be linked into a C/C++ program to produce symbolic backtraces
As of October 2020, libbacktrace supports ELF, PE/COFF, Mach-O, and XCOFF executables with DWARF debugging information. In other words, it supports GNU/Linux, *BSD, macOS, Windows, and AIX. The library is written to make it straightforward to add support for other object file and debugging formats.
The library relies on the C++ unwind API defined at https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html This API is provided by GCC and clang.
See also
How can one grab a stack trace in C?
How to make backtrace()/backtrace_symbols() print the function names?
Is there a portable/standard-compliant way to get filenames and linenumbers in a stack trace?
Best way to invoke gdb from inside program to print its stacktrace?
automatic stack trace on failure:
on C++ exception: C++ display stack trace on exception
generic: How to automatically generate a stacktrace when my program crashes
For a linux-only solution you can use backtrace(3) that simply returns an array of void * (in fact each of these point to the return address from the corresponding stack frame). To translate these to something of use, there's backtrace_symbols(3).
Pay attention to the notes section in backtrace(3):
The symbol names may be unavailable
without the use of special linker
options.
For systems using the GNU linker, it is necessary to use the
-rdynamic linker
option. Note that names of "static" functions are not exposed,
and won't be
available in the backtrace.
In C++23, there will be <stacktrace>, and then you can do:
#include <stacktrace>
/* ... */
std::cout << std::stacktrace::current();
Further details:
  • https://en.cppreference.com/w/cpp/header/stacktrace
  • https://en.cppreference.com/w/cpp/utility/basic_stacktrace/operator_ltlt
Is there any way to dump the call stack in a running process in C or C++ every time a certain function is called?
You can use a macro function instead of return statement in the specific function.
For example, instead of using return,
int foo(...)
{
if (error happened)
return -1;
... do something ...
return 0
}
You can use a macro function.
#include "c-callstack.h"
int foo(...)
{
if (error happened)
NL_RETURN(-1);
... do something ...
NL_RETURN(0);
}
Whenever an error happens in a function, you will see Java-style call stack as shown below.
Error(code:-1) at : so_topless_ranking_server (sample.c:23)
Error(code:-1) at : nanolat_database (sample.c:31)
Error(code:-1) at : nanolat_message_queue (sample.c:39)
Error(code:-1) at : main (sample.c:47)
Full source code is available here.
c-callstack at https://github.com/Nanolat
Linux specific, TLDR:
backtrace in glibc produces accurate stacktraces only when -lunwind is linked (undocumented platform-specific feature).
To output function name, source file and line number use #include <elfutils/libdwfl.h> (this library is documented only in its header file). backtrace_symbols and backtrace_symbolsd_fd are least informative.
On modern Linux your can get the stacktrace addresses using function backtrace. The undocumented way to make backtrace produce more accurate addresses on popular platforms is to link with -lunwind (libunwind-dev on Ubuntu 18.04) (see the example output below). backtrace uses function _Unwind_Backtrace and by default the latter comes from libgcc_s.so.1 and that implementation is most portable. When -lunwind is linked it provides a more accurate version of _Unwind_Backtrace but this library is less portable (see supported architectures in libunwind/src).
Unfortunately, the companion backtrace_symbolsd and backtrace_symbols_fd functions have not been able to resolve the stacktrace addresses to function names with source file name and line number for probably a decade now (see the example output below).
However, there is another method to resolve addresses to symbols and it produces the most useful traces with function name, source file and line number. The method is to #include <elfutils/libdwfl.h>and link with -ldw (libdw-dev on Ubuntu 18.04).
Working C++ example (test.cc):
#include <stdexcept>
#include <iostream>
#include <cassert>
#include <cstdlib>
#include <string>
#include <boost/core/demangle.hpp>
#include <execinfo.h>
#include <elfutils/libdwfl.h>
struct DebugInfoSession {
Dwfl_Callbacks callbacks = {};
char* debuginfo_path = nullptr;
Dwfl* dwfl = nullptr;
DebugInfoSession() {
callbacks.find_elf = dwfl_linux_proc_find_elf;
callbacks.find_debuginfo = dwfl_standard_find_debuginfo;
callbacks.debuginfo_path = &debuginfo_path;
dwfl = dwfl_begin(&callbacks);
assert(dwfl);
int r;
r = dwfl_linux_proc_report(dwfl, getpid());
assert(!r);
r = dwfl_report_end(dwfl, nullptr, nullptr);
assert(!r);
static_cast<void>(r);
}
~DebugInfoSession() {
dwfl_end(dwfl);
}
DebugInfoSession(DebugInfoSession const&) = delete;
DebugInfoSession& operator=(DebugInfoSession const&) = delete;
};
struct DebugInfo {
void* ip;
std::string function;
char const* file;
int line;
DebugInfo(DebugInfoSession const& dis, void* ip)
: ip(ip)
, file()
, line(-1)
{
// Get function name.
uintptr_t ip2 = reinterpret_cast<uintptr_t>(ip);
Dwfl_Module* module = dwfl_addrmodule(dis.dwfl, ip2);
char const* name = dwfl_module_addrname(module, ip2);
function = name ? boost::core::demangle(name) : "<unknown>";
// Get source filename and line number.
if(Dwfl_Line* dwfl_line = dwfl_module_getsrc(module, ip2)) {
Dwarf_Addr addr;
file = dwfl_lineinfo(dwfl_line, &addr, &line, nullptr, nullptr, nullptr);
}
}
};
std::ostream& operator<<(std::ostream& s, DebugInfo const& di) {
s << di.ip << ' ' << di.function;
if(di.file)
s << " at " << di.file << ':' << di.line;
return s;
}
void terminate_with_stacktrace() {
void* stack[512];
int stack_size = ::backtrace(stack, sizeof stack / sizeof *stack);
// Print the exception info, if any.
if(auto ex = std::current_exception()) {
try {
std::rethrow_exception(ex);
}
catch(std::exception& e) {
std::cerr << "Fatal exception " << boost::core::demangle(typeid(e).name()) << ": " << e.what() << ".\n";
}
catch(...) {
std::cerr << "Fatal unknown exception.\n";
}
}
DebugInfoSession dis;
std::cerr << "Stacktrace of " << stack_size << " frames:\n";
for(int i = 0; i < stack_size; ++i) {
std::cerr << i << ": " << DebugInfo(dis, stack[i]) << '\n';
}
std::cerr.flush();
std::_Exit(EXIT_FAILURE);
}
int main() {
std::set_terminate(terminate_with_stacktrace);
throw std::runtime_error("test exception");
}
Compiled on Ubuntu 18.04.4 LTS with gcc-8.3:
g++ -o test.o -c -m{arch,tune}=native -std=gnu++17 -W{all,extra,error} -g -Og -fstack-protector-all test.cc
g++ -o test -g test.o -ldw -lunwind
Outputs:
Fatal exception std::runtime_error: test exception.
Stacktrace of 7 frames:
0: 0x55f3837c1a8c terminate_with_stacktrace() at /home/max/src/test/test.cc:76
1: 0x7fbc1c845ae5 <unknown>
2: 0x7fbc1c845b20 std::terminate()
3: 0x7fbc1c845d53 __cxa_throw
4: 0x55f3837c1a43 main at /home/max/src/test/test.cc:103
5: 0x7fbc1c3e3b96 __libc_start_main at ../csu/libc-start.c:310
6: 0x55f3837c17e9 _start
When no -lunwind is linked, it produces a less accurate stacktrace:
0: 0x5591dd9d1a4d terminate_with_stacktrace() at /home/max/src/test/test.cc:76
1: 0x7f3c18ad6ae6 <unknown>
2: 0x7f3c18ad6b21 <unknown>
3: 0x7f3c18ad6d54 <unknown>
4: 0x5591dd9d1a04 main at /home/max/src/test/test.cc:103
5: 0x7f3c1845cb97 __libc_start_main at ../csu/libc-start.c:344
6: 0x5591dd9d17aa _start
For comparison, backtrace_symbols_fd output for the same stacktrace is least informative:
/home/max/src/test/debug/gcc/test(+0x192f)[0x5601c5a2092f]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ae5)[0x7f95184f5ae5]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt9terminatev+0x10)[0x7f95184f5b20]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_throw+0x43)[0x7f95184f5d53]
/home/max/src/test/debug/gcc/test(+0x1ae7)[0x5601c5a20ae7]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6)[0x7f9518093b96]
/home/max/src/test/debug/gcc/test(+0x1849)[0x5601c5a20849]
In a production version (as well as C language version) you may like to make this code extra robust by replacing boost::core::demangle, std::string and std::cout with their underlying calls.
You can also override __cxa_throw to capture the stacktrace when an exception is thrown and print it when the exception is caught. By the time it enters catch block the stack has been unwound, so it is too late to call backtrace, and this is why the stack must be captured on throw which is implemented by function __cxa_throw. Note that in a multi-threaded program __cxa_throw can be called simultaneously by multiple threads, so that if it captures the stacktrace into a global array that must be thread_local.
You can also make the stack trace printing function async-signal safe, so that you can invoke it directly from your SIGSEGV, SIGBUS signal handlers (which should use their own stacks for robustness). Obtaining function name, source file and line number using libdwfl from a signal handler may fail because it is not async-signal safe or if the address space of the process has been substantially corrupted, but in practice it succeeds 99% of the time (I haven't seen it fail).
To summarize, a complete production-ready library for automatic stacktrace output should:
Capture the stacktrace on throw into thread-specific storage.
Automatically print the stacktrace on unhandled exceptions.
Print the stacktrace in async-signal-safe manner.
Provide a robust signal handler function which uses its own stack that prints the stacktrace in a async-signal-safe manner. The user can install this function as a signal handler for SIGSEGV, SIGBUS, SIGFPE, etc..
The signal handler may as well print the values of all CPU registers at the point of the fault from ucontext_t signal function argument (may be excluding vector registers), a-la Linux kernel oops log messages.
Another answer to an old thread.
When I need to do this, I usually just use system() and pstack
So something like this:
#include <sys/types.h>
#include <unistd.h>
#include <string>
#include <sstream>
#include <cstdlib>
void f()
{
pid_t myPid = getpid();
std::string pstackCommand = "pstack ";
std::stringstream ss;
ss << myPid;
pstackCommand += ss.str();
system(pstackCommand.c_str());
}
void g()
{
f();
}
void h()
{
g();
}
int main()
{
h();
}
This outputs
#0 0x00002aaaab62d61e in waitpid () from /lib64/libc.so.6
#1 0x00002aaaab5bf609 in do_system () from /lib64/libc.so.6
#2 0x0000000000400c3c in f() ()
#3 0x0000000000400cc5 in g() ()
#4 0x0000000000400cd1 in h() ()
#5 0x0000000000400cdd in main ()
This should work on Linux, FreeBSD and Solaris. I don't think that macOS has pstack or a simple equivalent, but this thread seems to have an alternative.
If you are using C, then you will need to use C string functions.
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
void f()
{
pid_t myPid = getpid();
/*
length of command 7 for 'pstack ', 7 for the PID, 1 for nul
*/
char pstackCommand[7+7+1];
sprintf(pstackCommand, "pstack %d", (int)myPid);
system(pstackCommand);
}
I've used 7 for the max number of digits in the PID, based on this post.
There is no standardized way to do that. For windows the functionality is provided in the DbgHelp library
You can use the Boost libraries to print the current callstack.
#include <boost/stacktrace.hpp>
// ... somewhere inside the `bar(int)` function that is called recursively:
std::cout << boost::stacktrace::stacktrace();
Man here: https://www.boost.org/doc/libs/1_65_1/doc/html/stacktrace.html
I know this thread is old, but I think it can be useful for other people. If you are using gcc, you can use its instrument features (-finstrument-functions option) to log any function call (entry and exit). Have a look at this for more information: http://hacktalks.blogspot.fr/2013/08/gcc-instrument-functions.html
You can thus for instance push and pop every calls into a stack, and when you want to print it, you just look at what you have in your stack.
I've tested it, it works perfectly and is very handy
UPDATE: you can also find information about the -finstrument-functions compile option in the GCC doc concerning the Instrumentation options: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html
You can implement the functionality yourself:
Use a global (string)stack and at start of each function push the function name and such other values (eg parameters) onto this stack; at exit of function pop it again.
Write a function that will printout the stack content when it is called, and use this in the function where you want to see the callstack.
This may sound like a lot of work but is quite useful.
Of course the next question is: will this be enough ?
The main disadvantage of stack-traces is that why you have the precise function being called you do not have anything else, like the value of its arguments, which is very useful for debugging.
If you have access to gcc and gdb, I would suggest using assert to check for a specific condition, and produce a memory dump if it is not met. Of course this means the process will stop, but you'll have a full fledged report instead of a mere stack-trace.
If you wish for a less obtrusive way, you can always use logging. There are very efficient logging facilities out there, like Pantheios for example. Which once again could give you a much more accurate image of what is going on.
You can use Poppy for this. It is normally used to gather the stack trace during a crash but it can also output it for a running program as well.
Now here's the good part: it can output the actual parameter values for each function on the stack, and even local variables, loop counters, etc.
You can use the GNU profiler. It shows the call-graph as well! the command is gprof and you need to compile your code with some option.
Is there any way to dump the call stack in a running process in C or C++ every time a certain function is called?
No there is not, although platform-dependent solutions might exist.

gcc's __builtin_memcpy performance with certain number of bytes is terrible compared to clang's

I thought I`d first share this here to have your opinions before doing anything else. I found out while designing an algorithm that the gcc compiled code performance for some simple code was catastrophic compared to clang's.
How to reproduce
Create a test.c file containing this code :
#include <sys/stat.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
int main(int argc, char *argv[]) {
const uint64_t size = 1000000000;
const size_t alloc_mem = size * sizeof(uint8_t);
uint8_t *mem = (uint8_t*)malloc(alloc_mem);
for (uint_fast64_t i = 0; i < size; i++)
mem[i] = (uint8_t) (i >> 7);
uint8_t block = 0;
uint_fast64_t counter = 0;
uint64_t total = 0x123456789abcdefllu;
uint64_t receiver = 0;
for(block = 1; block <= 8; block ++) {
printf("%u ...\n", block);
counter = 0;
while (counter < size - 8) {
__builtin_memcpy(&receiver, &mem[counter], block);
receiver &= (0xffffffffffffffffllu >> (64 - ((block) << 3)));
total += ((receiver * 0x321654987cbafedllu) >> 48);
counter += block;
}
}
printf("=> %llu\n", total);
return EXIT_SUCCESS;
}
gcc
Compile and run :
gcc-7 -O3 test.c
time ./a.out
1 ...
2 ...
3 ...
4 ...
5 ...
6 ...
7 ...
8 ...
=> 82075168519762377
real 0m23.367s
user 0m22.634s
sys 0m0.495s
info :
gcc-7 -v
Using built-in specs.
COLLECT_GCC=gcc-7
COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc/7.3.0/libexec/gcc/x86_64-apple-darwin17.4.0/7.3.0/lto-wrapper
Target: x86_64-apple-darwin17.4.0
Configured with: ../configure --build=x86_64-apple-darwin17.4.0 --prefix=/usr/local/Cellar/gcc/7.3.0 --libdir=/usr/local/Cellar/gcc/7.3.0/lib/gcc/7 --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-7 --with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc --with-isl=/usr/local/opt/isl --with-system-zlib --enable-checking=release --with-pkgversion='Homebrew GCC 7.3.0' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues --disable-nls
Thread model: posix
gcc version 7.3.0 (Homebrew GCC 7.3.0)
So we get about 23s of user time. Now let's do the same with cc (clang on macOS) :
clang
cc -O3 test.c
time ./a.out
1 ...
2 ...
3 ...
4 ...
5 ...
6 ...
7 ...
8 ...
=> 82075168519762377
real 0m9.832s
user 0m9.310s
sys 0m0.442s
info :
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin17.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
That's more than 2.5x faster !! Any thoughts ?
I replaced the __builtin_memcpy function by memcpy to test things out and this time the compiled code runs in about 34s on both sides - consistent and slower as expected.
It would appear that the combination of __builtin_memcpy and bitmasking is interpreted very differently by both compilers.
I had a look at the assembly code, but couldn't see anything standing out that would explain such a drop in performance as I'm not an asm expert.
Edit 03-05-2018 :
Posted this bug : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84719.
I find it suspicious that you get different code for memcpy vs __builtin_memcpy. I don't think that's supposed to happen, and indeed I cannot reproduce it on my (linux) system.
If you add #pragma GCC unroll 16 (implemented in gcc-8+) before the for loop, gcc gets the same perf as clang (making block a constant is essential to optimize the code), so essentially llvm's unrolling is more aggressive than gcc's, which can be good or bad depending on cases. Still, feel free to report it to gcc, maybe they'll tweak the unrolling heuristics some day and an extra testcase could help.
Once unrolling is taken care of, gcc does ok for some values (block equals 4 or 8 in particular), but much worse for some others, in particular 3. But that's better analyzed with a smaller testcase without the loop on block. Gcc seems to have trouble with memcpy(,,3), it works much better if you always read 8 bytes (the next line already takes care of the extra bytes IIUC). Another thing that could be reported to gcc.

Using printk to print data on Linux terminal

I currently started learning the Linux Device driver programming in Linux. where I found this small piece of code printing hello world using printk() function.
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
MODULE_LICENSE("Dual BSD/GPL");
static int hello_init(void)
{
printk(KERN_ALERT "Hello World!!!\n");
return 0;
}
static void hello_exit(void)
{
printk(KERN_ALERT "Goodbye Hello World!!!\n");
}
module_init(hello_init);
module_exit(hello_exit);
After compiling code using make command and load driver using insmod command. I'm not getting the "Hello world" printed on screen instead its printing only on the log file /var/log/kern.log. But I want printk to print on my ubuntu terminal. I'm using ubuntu(14.04). Is it possible?
It isn't possible to redirect kernel logs and massages to gnome-terminal and there you have to use dmesg.
But in a virtual terminal(open one with ctrl+F1-F6) you can redirect them to standard output.
First determine tty number by entering tty command in virtual terminal.The output may be /dev/tty(1-6).
Compile and run this code with the argument you specified.
/*
* setconsole.c -- choose a console to receive kernel messages
*
* Copyright (C) 1998,2000,2001 Alessandro Rubini
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>
int main(int argc, char **argv)
{
char bytes[2] = {11,0}; /* 11 is the TIOCLINUX cmd number */
if (argc==2) bytes[1] = atoi(argv[1]); /* the chosen console */
else {
fprintf(stderr, "%s: need a single arg\n",argv[0]); exit(1);
}
if (ioctl(STDIN_FILENO, TIOCLINUX, bytes)<0) { /* use stdin */
fprintf(stderr,"%s: ioctl(stdin, TIOCLINUX): %s\n",
argv[0], strerror(errno));
exit(1);
}
exit(0);
}
For example if your output for tty command was /dev/tty1then type this two command:
gcc setconsole.c -o setconsole
sudo ./setconsole 1
This will set your tty to receive kernel messages.
Then compile and run this code.
/*
* setlevel.c -- choose a console_loglevel for the kernel
*
* Copyright (C) 1998,2000,2001 Alessandro Rubini
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/klog.h>
int main(int argc, char **argv)
{
int level;
if (argc==2) {
level = atoi(argv[1]); /* the chosen console */
} else {
fprintf(stderr, "%s: need a single arg\n",argv[0]); exit(1);
}
if (klogctl(8,NULL,level) < 0) {
fprintf(stderr,"%s: syslog(setlevel): %s\n",
argv[0],strerror(errno));
exit(1);
}
exit(0);
}
There are 8 level of kernel messages as you specify in your code KERN_ALERT is one of them.To make console receive all of them you should run above code with 8 as arguement.
gcc setlevel.c -o setlevel
sudo ./setlevel 8
Now you can insert your module to kernel and see kernel logs in console.
By the way these codes are from ldd3 examples.
printk prints to the kernel log. There is no "screen" as far as the kernel is concerned. If you want to see the output of printk in real time, you can open a terminal and type the following dmesg -w. Note that the -w flag is only supported by recent versions of dmesg (which is provided by the util-linux package).

Modifying Linker Script to make the .text section writable, errors

I am trying to make the .text section writable for a C program. I looked through the options provided in this SO question and zeroed on modifying the linker script to achieve this.
For this I created a writable memory region using
MEMORY { rwx (wx) : ORIGIN = 0x400000, LENGTH = 256K}
and at the section .text added:
.text :
{
*(.text.unlikely .text.*_unlikely)
*(.text.exit .text.exit.*)
*(.text.startup .text.startup.*)
*(.text.hot .text.hot.*)
*(.text .stub .text.* .gnu.linkonce.t.*)
/* .gnu.warning sections are handled specially by elf32.em. */
*(.gnu.warning)
} >rwx
On compiling the code with gcc flag -T and giving my linker file as an argument I am getting an error:
error: no memory region specified for loadable section '.interp'
I am only trying to change the memory permissions for the .text region. Working on Ubuntu x86_64 architecture.
Is there a better way to do this?
Any help is highly appreciated.
Thanks
The Linker Script
Linker Script on pastie.org
In Linux, you can use mprotect() to enable/disable text section write protection from the runtime code; see the Notes section in man 2 mprotect.
Here is a real-world example. First, however, a caveat:
I consider this just a proof of concept implementation, and not something I'd ever use in a real world application. It may look enticing for use in a high-performance library of some sort, but in my experience, changing the API (or the paradigm/approach) of the library usually yields much better results -- and fewer hard-to-debug bugs.
Consider the following six files:
foo1.c:
int foo1(const int a, const int b) { return a*a - 2*a*b + b*b; }
foo2.c:
int foo2(const int a, const int b) { return a*a + b*b; }
foo.h.header:
#ifndef FOO_H
#define FOO_H
extern int foo1(const int a, const int b);
extern int foo2(const int a, const int b);
foo.h.footer:
#endif /* FOO_H */
main.c:
#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include "foo.h"
int text_copy(const void *const target,
const void *const source,
const size_t length)
{
const long page = sysconf(_SC_PAGESIZE);
void *start = (char *)target - ((long)target % page);
size_t bytes = length + (size_t)((long)target % page);
/* Verify sane page size. */
if (page < 1L)
return errno = ENOTSUP;
/* Although length should not need to be a multiple of page size,
* adjust it up if need be. */
if (bytes % (size_t)page)
bytes = bytes + (size_t)page - (bytes % (size_t)page);
/* Disable write protect on target pages. */
if (mprotect(start, bytes, PROT_READ | PROT_WRITE | PROT_EXEC))
return errno;
/* Copy code.
* Note: if the target code is being executed, we're in trouble;
* this offers no atomicity guarantees, so other threads may
* end up executing some combination of old/new code.
*/
memcpy((void *)target, (const void *)source, length);
/* Re-enable write protect on target pages. */
if (mprotect(start, bytes, PROT_READ | PROT_EXEC))
return errno;
/* Success. */
return 0;
}
int main(void)
{
printf("foo1(): %d bytes at %p\n", foo1_SIZE, foo1_ADDR);
printf("foo2(): %d bytes at %p\n", foo2_SIZE, foo2_ADDR);
printf("foo1(3, 5): %d\n", foo1(3, 5));
printf("foo2(3, 5): %d\n", foo2(3, 5));
if (foo2_SIZE < foo1_SIZE) {
printf("Replacing foo1() with foo2(): ");
if (text_copy(foo1_ADDR, foo2_ADDR, foo2_SIZE)) {
printf("%s.\n", strerror(errno));
return 1;
}
printf("Done.\n");
} else {
printf("Replacing foo2() with foo1(): ");
if (text_copy(foo2_ADDR, foo1_ADDR, foo1_SIZE)) {
printf("%s.\n", strerror(errno));
return 1;
}
printf("Done.\n");
}
printf("foo1(3, 5): %d\n", foo1(3, 5));
printf("foo2(3, 5): %d\n", foo2(3, 5));
return 0;
}
function-info.bash:
#!/bin/bash
addr_prefix=""
addr_suffix="_ADDR"
size_prefix=""
size_suffix="_SIZE"
export LANG=C
export LC_ALL=C
nm -S "$#" | while read addr size kind name dummy ; do
[ -n "$addr" ] || continue
[ -n "$size" ] || continue
[ -z "$dummy" ] || continue
[ "$kind" = "T" ] || continue
[ "$name" != "${name#[A-Za-z]}" ] || continue
printf '#define %s ((void *)0x%sL)\n' "$addr_prefix$name$addr_suffix" "$addr"
printf '#define %s %d\n' "$size_prefix$name$size_suffix" "0x$size"
done || exit $?
Remember to make it executable using chmod u+x ./function-info.bash
First, compile the sources using valid sizes but invalid addresses:
gcc -W -Wall -O3 -c foo1.c
gcc -W -Wall -O3 -c foo2.c
( cat foo.h.header ; ./function-info.bash foo1.o foo2.o ; cat foo.h.footer) > foo.h
gcc -W -Wall -O3 -c main.c
The sizes are correct but the addresses are not, because the code is yet to be linked. Relative to the final binary, the object file contents are usually relocated at link time. So, link the sources to get example executable, example:
gcc -W -Wall -O3 main.o foo1.o foo2.o -o example
Extract the correct (sizes and) addresses:
( cat foo.h.header ; ./function-info.bash example ; cat foo.h.footer) > foo.h
Recompile and link,
gcc -W -Wall -O3 -c main.c
gcc -W -Wall -O3 foo1.o foo2.o main.o -o example
and verify that the constants now do match:
mv -f foo.h foo.h.used
( cat foo.h.header ; ./function-info.bash example ; cat foo.h.footer) > foo.h
cmp -s foo.h foo.h.used && echo "Done." || echo "Recompile and relink."
Due to high optimization (-O3) the code that utilizes the constants may change size, requiring a yet another recompile-relink. If the last line outputs "Recompile and relink", just repeat the last two steps, i.e. five lines.
(Note that since foo1.c and foo2.c do not use the constants in foo.h, they obviously do not need to be recompiled.)
On x86_64 (GCC-4.6.3-1ubuntu5), running ./example outputs
foo1(): 21 bytes at 0x400820
foo2(): 10 bytes at 0x400840
foo1(3, 5): 4
foo2(3, 5): 34
Replacing foo1() with foo2(): Done.
foo1(3, 5): 34
foo2(3, 5): 34
which shows that the foo1() function indeed was replaced. Note that the longer function is always replaced with the shorter one, because we must not overwrite any code outside the two functions.
You can modify the two functions to verify this; just remember to repeat the entire procedure (so that you use the correct _SIZE and _ADDR constants in main()).
Just for giggles, here is the generated foo.h for the above:
#ifndef FOO_H
#define FOO_H
extern int foo1(const int a, const int b);
extern int foo2(const int a, const int b);
#define foo1_ADDR ((void *)0x0000000000400820L)
#define foo1_SIZE 21
#define foo2_ADDR ((void *)0x0000000000400840L)
#define foo2_SIZE 10
#define main_ADDR ((void *)0x0000000000400610L)
#define main_SIZE 291
#define text_copy_ADDR ((void *)0x0000000000400850L)
#define text_copy_SIZE 226
#endif /* FOO_H */
You might wish to use a smarter scriptlet, say an awk one that uses nm -S to obtain all function names, addresses, and sizes, and in the header file replaces only the values of existing definitions, to generate your header file. I'd use a Makefile and some helper scripts.
Further notes:
The function code is copied as-is, no relocation etc. is done. (This means that if the machine code of the replacement function contains absolute jumps, the execution continues in the original code. These example functions were chosen, because they're unlikely to have absolute jumps in them. Run objdump -d foo1.o foo2.o to verify from the assembly.)
That is irrelevant if you use the example just to investigate how to modify executable code within the running process. However, if you build runtime-function-replacing schemes on top of this example, you may need to use position independent code for the replaced code (see the GCC manual for relevant options for your architecture) or do your own relocation.
If another thread or signal handler executes the code being modified, you're in serious trouble. You get undefined results. Unfortunately, some libraries start extra threads, which may not block all possible signals, so be extra careful when modifying code that might be run by a signal handler.
Do not assume the compiler compiles the code in a specific way or uses a specific organization. My example uses separate compilation units, to avoid the cases where the compiler might share code between similar functions.
Also, it examines the final executable binary directly, to obtain the sizes and addresses to be modified to modify an entire function implementation. All verifications should be done on the object files or final executable, and disassembly, instead of just looking at the C code.
Putting any code that relies on the address and size constants into a separate compilation unit makes it easier and faster to recompile and relink the binary. (You only need to recompile the code that uses the constants directly, and you can even use less optimization for that code, to eliminate extra recompile-relink cycles, without impacting the overall code quality.)
In my main.c, both the address and length supplied to mprotect() are page-aligned (based on the user parameters). The documents say only the address has to be. Since protections are page-granular, making sure the length is a multiple of the page size does not hurt.
You can read and parse /proc/self/maps (which is a kernel-generated pseudofile; see man 5 proc, /proc/[pid]/maps section, for further info) to obtain the existing mappings and their protections for the current process.
In any case, if you have any questions, I'd be happy to try and clarify the above.
Addendum:
It turns out that using the GNU extension dl_iterate_phdr() you can enable/disable write protection on all text sections trivially:
#define _GNU_SOURCE
#include <unistd.h>
#include <dlfcn.h>
#include <sys/mman.h>
#include <link.h>
static int do_write_protect_text(struct dl_phdr_info *info, size_t size, void *data)
{
const int protect = (data) ? PROT_READ | PROT_EXEC : PROT_READ | PROT_WRITE | PROT_EXEC;
size_t page;
size_t i;
page = sysconf(_SC_PAGESIZE);
if (size < sizeof (struct dl_phdr_info))
return ENOTSUP;
/* Ignore libraries. */
if (info->dlpi_name && info->dlpi_name[0] != '\0')
return 0;
/* Loop over each header. */
for (i = 0; i < (size_t)info->dlpi_phnum; i++)
if ((info->dlpi_phdr[i].p_flags & PF_X)) {
size_t ptr = (size_t)info->dlpi_phdr[i].p_vaddr;
size_t len = (size_t)info->dlpi_phdr[i].p_memsz;
/* Start at the beginning of the relevant page, */
if (ptr % page) {
len += ptr % page;
ptr -= ptr % page;
}
/* and use full pages. */
if (len % page)
len += page - (len % page);
/* Change protections. Ignore unmapped sections. */
if (mprotect((void *)ptr, len, protect))
if (errno != ENOMEM)
return errno;
}
return 0;
}
int write_protect_text(int protect)
{
int result;
result = dl_iterate_phdr(do_write_protect_text, (void *)(long)protect);
if (result)
errno = result;
return result;
}
Here is an example program you can use to test the above write_protect_text() function:
#define _POSIX_C_SOURCE 200809L
int dump_smaps(void)
{
FILE *in;
char *line = NULL;
size_t size = 0;
in = fopen("/proc/self/smaps", "r");
if (!in)
return errno;
while (getline(&line, &size, in) > (ssize_t)0)
if ((line[0] >= '0' && line[0] <= '9') ||
(line[0] >= 'a' && line[0] <= 'f'))
fputs(line, stdout);
free(line);
if (!feof(in) || ferror(in)) {
fclose(in);
return errno = EIO;
}
if (fclose(in))
return errno = EIO;
return 0;
}
int main(void)
{
printf("Initial mappings:\n");
dump_smaps();
if (write_protect_text(0)) {
fprintf(stderr, "Cannot disable write protection on text sections: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
printf("\nMappings with write protect disabled:\n");
dump_smaps();
if (write_protect_text(1)) {
fprintf(stderr, "Cannot enable write protection on text sections: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
printf("\nMappings with write protect enabled:\n");
dump_smaps();
return EXIT_SUCCESS;
}
The example program dumps /proc/self/smaps before and after changing the text section write protection, showing that it indeed does enable/disable write protectio on all text sections (program code). It does not try to alter write protect on dynamically-loaded libraries. This was tested to work on x86-64 using Ubuntu 3.8.0-35-generic kernel.
If you just want to have one executable with a writable .text, you can just link with -N
At least for me, binutils 2.22 , ld -N objectfile.o
will produce a binary that i can happily write around in.
Reading gcc pages, you can pass the linker option from gcc by : gcc -XN source

Where is the implementation of strlen() in GCC?

Can anyone point me to the definition of strlen() in GCC? I've been grepping release 4.4.2 for about a half hour now (while Googling like crazy) and I can't seem to find where strlen() is actually implemented.
You should be looking in glibc, not GCC -- it seems to be defined in strlen.c -- here's a link to strlen.c for glibc version 2.7... And here is a link to the glibc SVN repository online for strlen.c.
The reason you should be looking at glibc and not gcc is:
The GNU C library is used as the C library in the GNU system and most systems with the Linux kernel.
Here's the bsd implementation
size_t
strlen(const char *str)
{
const char *s;
for (s = str; *s; ++s)
;
return (s - str);
}
I realize this question is 4yrs old, but gcc will often include its own copy of strlen if you do not #include <string.h> and none of the answers (including the accepted answer) account for that. If you forget, you will get a warning:
file_name:line_number: warning: incompatible implicit declaration of built-in function 'strlen'
and gcc will inline its copy which on x86 is the repnz scasb asm variant unless you pass -Werror or -fno-builtin. The files related to this are in gcc/config/<platform>/<platform>.{c,md}
It is also controlled by gcc/builtins.c. In case you wondered if and how a strlen() was optimized to a constant, see the function defined as tree c_strlen(tree src, int only_value) in this file. It also controls how strlen (amongst others) is expanded and folded (based on the previously mentioned config/platform)
defined in glibc/string/strlen.c
#include <string.h>
#include <stdlib.h>
#undef strlen
#ifndef STRLEN
# define STRLEN strlen
#endif
/* Return the length of the null-terminated string STR. Scan for
the null terminator quickly by testing four bytes at a time. */
size_t
STRLEN (const char *str)
{
const char *char_ptr;
const unsigned long int *longword_ptr;
unsigned long int longword, himagic, lomagic;
/* Handle the first few characters by reading one character at a time.
Do this until CHAR_PTR is aligned on a longword boundary. */
for (char_ptr = str; ((unsigned long int) char_ptr
& (sizeof (longword) - 1)) != 0;
++char_ptr)
if (*char_ptr == '\0')
return char_ptr - str;
/* All these elucidatory comments refer to 4-byte longwords,
but the theory applies equally well to 8-byte longwords. */
longword_ptr = (unsigned long int *) char_ptr;
/* Bits 31, 24, 16, and 8 of this number are zero. Call these bits
the "holes." Note that there is a hole just to the left of
each byte, with an extra at the end:
bits: 01111110 11111110 11111110 11111111
bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
The 1-bits make sure that carries propagate to the next 0-bit.
The 0-bits provide holes for carries to fall into. */
himagic = 0x80808080L;
lomagic = 0x01010101L;
if (sizeof (longword) > 4)
{
/* 64-bit version of the magic. */
/* Do the shift in two steps to avoid a warning if long has 32 bits. */
himagic = ((himagic << 16) << 16) | himagic;
lomagic = ((lomagic << 16) << 16) | lomagic;
}
if (sizeof (longword) > 8)
abort ();
/* Instead of the traditional loop which tests each character,
we will test a longword at a time. The tricky part is testing
if *any of the four* bytes in the longword in question are zero. */
for (;;)
{
longword = *longword_ptr++;
if (((longword - lomagic) & ~longword & himagic) != 0)
{
/* Which of the bytes was the zero? If none of them were, it was
a misfire; continue the search. */
const char *cp = (const char *) (longword_ptr - 1);
if (cp[0] == 0)
return cp - str;
if (cp[1] == 0)
return cp - str + 1;
if (cp[2] == 0)
return cp - str + 2;
if (cp[3] == 0)
return cp - str + 3;
if (sizeof (longword) > 4)
{
if (cp[4] == 0)
return cp - str + 4;
if (cp[5] == 0)
return cp - str + 5;
if (cp[6] == 0)
return cp - str + 6;
if (cp[7] == 0)
return cp - str + 7;
}
}
}
}
libc_hidden_builtin_def (strlen)
glibc 2.26 has several hand optimized assembly implementations of strlen
As of glibc-2.26, a quick:
git ls-files | grep strlen.S
in the glibc tree shows a dozen of assembly hand-optimized implementations for all major archs and variations.
In particular, x86_64 alone has 3 variations:
sysdeps/x86_64/multiarch/strlen-avx2.S
sysdeps/x86_64/multiarch/strlen-sse2.S
sysdeps/x86_64/strlen.S
A quick and dirty way to determine which one is used, is to step debug a test program:
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
int main(void) {
size_t size = 0x80000000, i, result;
char *s = malloc(size);
for (i = 0; i < size; ++i)
s[i] = 'a';
s[size - 1] = '\0';
result = strlen(s);
assert(result == size - 1);
return EXIT_SUCCESS;
}
compiled with:
gcc -ggdb3 -std=c99 -O0 a.c
Off the bat:
disass main
contains:
callq 0x555555554590 <strlen#plt>
so the libc version is being called.
After a few si instruction level steps into that, GDB reaches:
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:52
52 ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
which tells me that strlen-avx2.S was used.
Then, I further confirm with:
disass __strlen_avx2
and compare the disassembly with the glibc source.
It is not surprising that the AVX2 version was used, since I have an i7-7820HQ CPU with launch date Q1 2017 and AVX2 support, and AVX2 is the most advanced of the assembly implementations, with launch date Q2 2013, while SSE2 is much more ancient from 2004.
This is where a great part of the hardcoreness of glibc comes from: it has a lot of arch optimized hand written assembly code.
Tested in Ubuntu 17.10, gcc 7.2.0, glibc 2.26.
-O3
TODO: with -O3, gcc does not use glibc's strlen, it just generates inline assembly, which is mentioned at: https://stackoverflow.com/a/19885891/895245
Is it because it can optimize even better? But its output does not contain AVX2 instructions, so I feel that this is not the case.
https://www.gnu.org/software/gcc/projects/optimize.html mentions:
Deficiencies of GCC's optimizer
glibc has inline assembler versions of various string functions; GCC has some, but not necessarily the same ones on the same architectures. Additional optab entries, like the ones for ffs and strlen, could be provided for several more functions including memset, strchr, strcpy and strrchr.
My simple tests show that the -O3 version is actually faster, so GCC made the right choice.
Asked at: https://www.quora.com/unanswered/How-does-GCC-know-that-its-builtin-implementation-of-strlen-is-faster-than-glibcs-when-using-optimization-level-O3
Although the original poster may not have known this or been looking for this, gcc internally inlines a number of so-called "builtin" c functions that it defines on its own, including some of the mem*() functions and (depending on the gcc version) strlen. In such cases, the library version is essentially never used, and pointing the person at the version in glibc is not strictly speaking correct. (It does this for performance reasons -- in addition to the improvement that inlining itself produces, gcc "knows" certain things about the functions when it provides them, such as, for example, that strlen is a pure function and that it can thus optimize away multiple calls, or in the case of the mem*() functions that no aliasing is taking place.)
For more information on this, see http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Is this what you are looking for? strlen() source. See the git repository for more information. The glibc resources page has links to the git repositories if you want to grab them rather than looking at the web view.
Google Code Search is a good starting point for questions like that. They usually point to various different sources and implementations of a function.
In your particular case: GoogleCodeSearch(strlen)
Google Code Search was completely shut down on March 2013
I realize that this is old question, you can find the linux kernel sources at github here, and the 32 bit implementation for strlen() could be found in strlen_32.c on github. The mentioned file has this implementation.
#include <linux/types.h>
#include <linux/string.h>
#include <linux/module.h>
size_t strlen(const char *s)
{
/* Get an aligned pointer. */
const uintptr_t s_int = (uintptr_t) s;
const uint32_t *p = (const uint32_t *)(s_int & -4);
/* Read the first word, but force bytes before the string to be nonzero.
* This expression works because we know shift counts are taken mod 32.
*/
uint32_t v = *p | ((1 << (s_int << 3)) - 1);
uint32_t bits;
while ((bits = __insn_seqb(v, 0)) == 0)
v = *++p;
return ((const char *)p) + (__insn_ctz(bits) >> 3) - s;
}
EXPORT_SYMBOL(strlen);
You can use this code, the simpler the better !
size_t Strlen ( const char * _str )
{
size_t i = 0;
while(_str[i++]);
return i;
}

Resources