I'm writing a little eBPF program to set a flag whenever the tracepoint consume_skb is triggered. When I try to load it however, the BPF verifier complains, saying R1 type=inv expected=map_ptr when trying to access the map. I tried to follow the samples given in the linux kernel as closely as possible, but I am failing at even reading anything from the map.
eBPF program
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, u32);
__type(value, u32);
__uint(max_entries, 1);
} my_map SEC(".maps");
SEC("tracepoint/skb/consume_skb")
int bpf_sys(void *ctx)
{
int index = 0;
int *value;
value = bpf_map_lookup_elem(&my_map, &index);
if (!value)
{
return 1;
}
return 0;
}
char _license[] SEC("license") = "GPL";
loader, until point of failure
int main()
{
int bfd;
unsigned char buf[1024] = {};
struct bpf_insn *insn;
union bpf_attr attr = {};
unsigned char log_buf[4096] = {};
int pfd;
int n;
bfd = open("./test_bpf", O_RDONLY);
if (bfd < 0)
{
printf("open eBPF program error: %s\n", strerror(errno));
exit(-1);
}
n = read(bfd, buf, 1024);
close(bfd);
insn = (struct bpf_insn*)buf;
attr.prog_type = BPF_PROG_TYPE_TRACEPOINT;
attr.insns = (unsigned long)insn;
attr.insn_cnt = n / sizeof(struct bpf_insn);
attr.license = (unsigned long)"GPL";
attr.log_size = sizeof(log_buf);
attr.log_buf = (unsigned long)log_buf;
attr.log_level = 1;
pfd = syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
if (pfd < 0)
{
printf("bpf syscall error: %s\n", strerror(errno));
printf("log_buf = %s\n", log_buf);
exit(-1);
}
...
llvm output
Disassembly of section tracepoint/skb/consume_skb:
0000000000000000 bpf_sys:
; {
0: b7 01 00 00 00 00 00 00 r1 = 0
; int index = 0;
1: 63 1a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r1
2: bf a2 00 00 00 00 00 00 r2 = r10
3: 07 02 00 00 fc ff ff ff r2 += -4
; value = bpf_map_lookup_elem(&my_map, &index);
4: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
6: 85 00 00 00 01 00 00 00 call 1
7: bf 01 00 00 00 00 00 00 r1 = r0
8: b7 00 00 00 01 00 00 00 r0 = 1
; if (!value)
9: 15 01 01 00 00 00 00 00 if r1 == 0 goto +1 <LBB0_2>
10: b7 00 00 00 00 00 00 00 r0 = 0
0000000000000058 LBB0_2:
; }
11: 95 00 00 00 00 00 00 00 exit
Error when loading
bpf syscall error: Permission denied
log_buf = 0: (b7) r1 = 0
1: (63) *(u32 *)(r10 -4) = r1
last_idx 1 first_idx 0
regs=2 stack=0 before 0: (b7) r1 = 0
2: (bf) r2 = r10
3: (07) r2 += -4
4: (18) r1 = 0x0
6: (85) call bpf_map_lookup_elem#1
R1 type=inv expected=map_ptr
processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Also, is there a better way to debug BPF error messages?
Issues
Parsing the bytecode
Your loader does not work, at least not directly on the object file compiled from your C code. It just reads from the file, does not parse for an ELF section where the bytecode would be contained. Apparently you're getting the verifier output for the correct eBPF bytecode, though; so I suspect you've been using it in a different way, or with some intermediate step.
Setting up the maps
With regards to your issue, your loader also misses the creation of a map and the ELF relocation of the FD for that map into the bytecode instructions, which would be why you get your error. Typically, when you load a program, several steps occur including:
The loader program (often via libbpf) creates all eBPF maps needed by the eBPF program
Alternatively, it reuses some existing maps. In both cases (existing or new maps), it retrieves a file descriptor for each map (e.g. via their pinned path for existing maps, or returned by the syscall when creating new maps).
Then it inserts this file descriptor into the eBPF bytecode, as an argument to the helper functions used for map access.
At last the loader program loads the updated bytecode into the kernel.
The bytecode goes through the verifier, where the file descriptors are replaced with pointers to the map areas in the kernel memory space.
Your loader is not creating maps (or retrieving FDs for existing, compatible maps), and does not update the bytecode accordingly. So the eBPF program sent to the kernel doesn't say what map it tries to access: we just see, from the verifier logs, 4: (18) r1 = 0x0, where at this stage it should be a memory pointer, such as 4: (18) r1 = 0xffff8ca2c29e3200.
Workarounds
You can obviously extend your loader application to handle maps and ELF relocation. But here are a few other suggestions.
Adding new programs directly amongst the kernel samples is not always ideal. The build system for these samples is rather involved, and relies on a number of header inclusions that are not really standard outside of the kernel repository. On older versions it would also not rely on libbpf but would reimplement most of the loading components, although I think this has improved.
So I would suggest building outside of the kernel repository, if this is not what you have been doing already. In particular, if you're working with C, you should have a look at how libbpf can help handling (loading, attaching, etc.) eBPF programs and other objects.
One of the best starting point would also be libbpf-boostrap, which provides a framework with templates for building eBPF programs and corresponding user space applications very quickly. I would definitely recommend looking at it.
Alternatively, you should be able to load your program with e.g. bpftool prog load <object_file.o> /sys/fs/bpf/<some_path>, although you can't attach it to tracepoints with bpftool only.
Related
Objective
I want to open a file multiple times, but each fd should be only allowed to write to a specific range.
Background
I have an EEPROM which contains multiple "partitions", where each partition holds different information.
I want to avoid that one partition overflows to a different one, or one partition can read other information and misinterpret them.
My Problem
I wanted to use fcntl along with F_OFD_SETLK so that I can lock a specific range on each opened fd.
Locking works as intended and trying to lock an already locked range will result in EAGAIN, which is expected.
What is not so obvious for me is, that I can write to a range that is locked by a different fd.
Question
Is it possible to lock a certain range in a file so that it is not writeable by a different opened fd?
If not, is there a different way to achieve my goal?
Code:
Link to onlinegdb: https://onlinegdb.com/ewE767rbu
#define _GNU_SOURCE 1
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
static void ex(const char *what)
{
perror(what);
exit(1);
}
static void lock_range(int fd, off_t start, off_t len)
{
struct flock ff = {
.l_type = F_WRLCK,
.l_whence = SEEK_SET,
.l_start = start,
.l_len = len,
};
if (fcntl(fd, F_OFD_SETLK, &ff) < 0)
perror("fcntl");
}
static void write_at(int fd, const char *str, off_t offset)
{
if (pwrite(fd, str, strlen(str), offset) < 0)
perror("pwrite");
}
int main()
{
int firstfd = open("/tmp/abc.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
if (firstfd < 0)
ex("open");
if (ftruncate(firstfd, 0x1000) < 0)
ex("ftruncate");
lock_range(firstfd, 0, 0x800);
lock_range(firstfd, 0, 0x800); // check if I can aquire the lock multiple times
int secondfd = open("/tmp/abc.txt", O_RDWR);
if (secondfd < 0)
ex("open");
lock_range(secondfd, 0, 0x800); // this one fails on purpose
lock_range(secondfd, 0x800, 0);
lock_range(firstfd, 0x801, 1); // and this one fails on purpose
write_at(firstfd, "hallo", 0);
write_at(firstfd, "hallo", 0x900); // this should fail, but doesn't
write_at(secondfd, "welt", 0); // this should fail, but doesn't
write_at(secondfd, "welt", 0x900);
close(firstfd);
close(secondfd);
system("hexdump -C /tmp/abc.txt"); // just for visualization
}
Output:
fcntl: Resource temporarily unavailable
fcntl: Resource temporarily unavailable
00000000 77 65 6c 74 6f 00 00 00 00 00 00 00 00 00 00 00 |welto...........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000900 77 65 6c 74 6f 00 00 00 00 00 00 00 00 00 00 00 |welto...........|
00000910 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000
Please note welto which is hallo overriden by welt. I expected hallo at 0x0 and welt at 0x900.
Locks come in two flavours: mandatory locks, and advisory locks. These are advisory locks. This means they prevent others from obtaining a lock. Period. They don't prevent writes or any other form of modification.
If not, is there a different way to achieve my goal?
Don't ignore failures to obtain a lock.
I'm trying to write a simple socket filter eBPF program that can access the socket buffer data.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#define SEC(NAME) __attribute__((section(NAME), used))
SEC("socket_filter")
int myprog(struct __sk_buff *skb) {
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
struct ethhdr *eth = data;
if ((void*)eth + sizeof(*eth) > data_end)
return 0;
return 1;
}
And I'm compiling using clang:
clang -I./ -I/usr/include/x86_64-linux-gnu/asm \
-I/usr/include/x86_64-linux-gnu/ -O2 -target bpf -c test.c -o test.elf
However when I try to load the program I get the following verifier error:
invalid bpf_context access off=80 size=4
My understanding of this error is that it should be thrown when you try to access context data that hasn't been checked to be within data_end, however my code does do that:
Here is the instructions for my program
0000000000000000 packet_counter:
0: 61 12 50 00 00 00 00 00 r2 = *(u32 *)(r1 + 80)
1: 61 11 4c 00 00 00 00 00 r1 = *(u32 *)(r1 + 76)
2: 07 01 00 00 0e 00 00 00 r1 += 14
3: b7 00 00 00 01 00 00 00 r0 = 1
4: 3d 12 01 00 00 00 00 00 if r2 >= r1 goto +1 <LBB0_2>
5: b7 00 00 00 00 00 00 00 r0 = 0
which would imply that the error is being caused by reading the pointer to data_end? However it only happens if I don't try to check the bounds later.
This is because your BPF program is a “socket filter”, and that such programs are not allowed to do direct packet access (see sk_filter_is_valid_access(), where we return false on trying to read skb->data or skb->data_end for example). I do not know the specific reason why it is not available, although I suspect this would be a security precaution as socket filter programs may be available to unprivileged users.
Your program loads just fine as a TC classifier, for example (bpftool prog load foo.o /sys/fs/bpf/foo type classifier -- By the way thanks for the standalone working reproducer, much appreciated!).
If you want to access data for a socket filter, you can still use the bpf_skb_load_bytes() (or bpf_skb_store_bytes()) helper, which automatically does the check on length. Something like this:
#include <linux/bpf.h>
#define SEC(NAME) __attribute__((section(NAME), used))
static void *(*bpf_skb_load_bytes)(const struct __sk_buff *, __u32,
void *, __u32) =
(void *) BPF_FUNC_skb_load_bytes;
SEC("socket_filter")
int myprog(struct __sk_buff *skb)
{
__u32 foo;
if (bpf_skb_load_bytes(skb, 0, &foo, sizeof(foo)))
return 0;
if (foo == 3)
return 0;
return 1;
}
Regarding your last comment:
However it only happens if I don't try to check the bounds later.
I suspect clang compiles out the assignments for data and data_end if you do not use them in your code, so they are no longer present and no longer a problem for the verifier.
Is there a function in a C lib to print data packets similar to Wireshark format (position then byte by byte)
I looked up their code and they use trees which was too complex for my task. I could also write my own version from scratch but I don't wanna be reinventing the wheel, so I was wondering if there is some code written that I can utilize. Any suggestions of a lib that I can use?
*The data I have is in a buffer of unsigned ints.
0000 01 02 ff 45 a3 00 90 00 00 00 00 00 00
0010 00 00 00 00 00 00 00 00 00 00 00 00 00
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 ... etc
Thanks!
I doubt such a specific function exists in the libC, but the system is rather simple:
for (unsigned k = 0; k < len; k++)
{
if (k % 0x10 == 0)
printf("\n%04x", k);
if (k % 0x4 == 0)
printf(" ");
printf(" %02x", buffer[k] & 0xff);
}
Replace the first modulo by the line length, and the second by the word length and you're good (of course, try to make one a multiple of the other)
EDIT:
As I just noticed you mentioned the data you have is in a buffer of unsigned ints, you will have to cast it to an unsigned char buffer for this part.
Of course, you can do it with an unsigned buffer with bitwise shifts and four prints per loop, but that really makes for cumbersome code where it isn't necessary
i'm in serious trouble with a heap/stack corruption. To be able to set a data breakpoint and find the root of the problem, i want to take two core dumps using gdb and then compare them.
First one when i think the heap and stack are still ok, and a second one shortly before my program crashes.
How can i compare those dumps?
Information about my project:
using gcc 5.x
Plugin for a legacy, 3rd-party-program with RT-support. No sources available for the project (for me).
Legacy Project is C, My Plugin is C++.
Other things i tried:
Using address sanitizers -> won't work because the legacy program wont start with them.
Using undefined behavior sanitizers -> same
Figuring out what memory gets corrupted for data breakpoint -> no success, because the corrupted memory does not belong to my code.
Ran Valgrind -> no errors around my code.
Thank you for your help
Independent from your underlying motivation, I'd like to get into your question. You ask how the difference between two core dumps can be identified. This is going to be lengthy, but will hopefully give you your answer.
A core dump is represented by an ELF file that contains metadata and a specific set of memory regions (on Linux, this can be controlled via /proc/[pid]/coredump_filter) that were mapped into the given process at the time of dump creation.
The obvious way to compare the dumps would be to compare a hex-representation:
$ diff -u <(hexdump -C dump1) <(hexdump -C dump2)
--- /dev/fd/63 2020-05-17 10:01:40.370524170 +0000
+++ /dev/fd/62 2020-05-17 10:01:40.370524170 +0000
## -90,8 +90,9 ##
000005e0 00 00 00 00 00 00 00 00 00 00 00 00 80 1f 00 00 |................|
000005f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The result is rarely useful because you're missing the context. More specifically, there's no straightforward way to get from the offset of a value change in the file to the offset corresponding to the process virtual memory address space.
So, more context if needed. The optimal output would be a list of VM addresses including before and after values.
Before we can get on that, we need a test scenario that loosely resembles yours. The following application includes a use-after-free memory issue that does not lead to a segmentation fault at first (a new allocation with the same size hides the issue). The idea here is to create a core dump using gdb (generate) during each phase based on break points triggered by the code:
dump1: Correct state
dump2: Incorrect state, no segmentation fault
dump3: Segmentation fault
The code:
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
int **g_state;
int main()
{
int value = 1;
g_state = malloc(sizeof(int*));
*g_state = &value;
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
printf("no corruption\n");
raise(SIGTRAP);
free(g_state);
char **unrelated = malloc(sizeof(int*));
*unrelated = "val";
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
printf("use-after-free hidden by new allocation (invalid value)\n");
raise(SIGTRAP);
printf("use-after-free (segfault)\n");
free(unrelated);
int *unrelated2 = malloc(sizeof(intptr_t));
*unrelated2 = 1;
if (g_state && *g_state) {
printf("state: %d\n", **g_state);
}
return 0;
}
Now, the dumps can be generated:
Starting program: test
state: 1
no corruption
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump1
Saved corefile dump1
(gdb) cont
Continuing.
state: 7102838
use-after-free hidden by new allocation (invalid value)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff7a488df in raise () from /lib64/libc.so.6
(gdb) generate dump2
Saved corefile dump2
(gdb) cont
Continuing.
use-after-free (segfault)
Program received signal SIGSEGV, Segmentation fault.
main () at test.c:31
31 printf("state: %d\n", **g_state);
(gdb) generate dump3
Saved corefile dump3
A quick manual inspection shows the relevant differences:
# dump1
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x7fffffffe2bc
# dump2
(gdb) print g_state
$1 = (int **) 0x602260
(gdb) print *g_state
$2 = (int *) 0x4008c1
# dump3
$2 = (int **) 0x602260
(gdb) print *g_state
$3 = (int *) 0x1
Based on that output, we can clearly see that *g_state changed but is still a valid pointer in dump2. In dump3, the pointer becomes invalid. Of course, we'd like to automate this comparison.
Knowing that a core dump is an ELF file, we can simply parse it and generate a diff ourselves. What we'll do:
Open a dump
Identify PROGBITS sections of the dump
Remember the data and address information
Repeat the process with the second dump
Compare the two data sets and print the diff
Based on elf.h, it's relatively easy to parse ELF files. I created a sample implementation that compares two dumps and prints a diff that is similar to comparing two hexdump outputs using diff. The sample makes some assumptions (x86_64, mappings either match in terms of address and size or they only exist in dump1 or dump2), omits most error handling and always chooses a simple implementation approach for the sake of brevity.
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#define MAX_MAPPINGS 1024
struct dump
{
char *base;
Elf64_Shdr *mappings[MAX_MAPPINGS];
};
unsigned readdump(const char *path, struct dump *dump)
{
unsigned count = 0;
int fd = open(path, O_RDONLY);
if (fd != -1) {
struct stat stat;
fstat(fd, &stat);
dump->base = mmap(NULL, stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
Elf64_Ehdr *header = (Elf64_Ehdr *)dump->base;
Elf64_Shdr *secs = (Elf64_Shdr*)(dump->base+header->e_shoff);
for (unsigned secinx = 0; secinx < header->e_shnum; secinx++) {
if (secs[secinx].sh_type == SHT_PROGBITS) {
if (count == MAX_MAPPINGS) {
count = 0;
break;
}
dump->mappings[count] = &secs[secinx];
count++;
}
}
dump->mappings[count] = NULL;
}
return count;
}
#define DIFFWINDOW 16
void printsection(struct dump *dump, Elf64_Shdr *sec, const char mode,
unsigned offset, unsigned sizelimit)
{
unsigned char *data = (unsigned char *)(dump->base+sec->sh_offset);
uintptr_t addr = sec->sh_addr+offset;
unsigned size = sec->sh_size;
data += offset;
if (sizelimit) {
size = sizelimit;
}
unsigned start = 0;
for (unsigned i = 0; i < size; i++) {
if (i%DIFFWINDOW == 0) {
printf("%c%016x ", mode, addr+i);
start = i;
}
printf(" %02x", data[i]);
if ((i+1)%DIFFWINDOW == 0 || i + 1 == size) {
printf(" [");
for (unsigned j = start; j <= i; j++) {
putchar((data[j] >= 32 && data[j] < 127)?data[j]:'.');
}
printf("]\n");
}
addr++;
}
}
void printdiff(struct dump *dump1, Elf64_Shdr *sec1,
struct dump *dump2, Elf64_Shdr *sec2)
{
unsigned char *data1 = (unsigned char *)(dump1->base+sec1->sh_offset);
unsigned char *data2 = (unsigned char *)(dump2->base+sec2->sh_offset);
unsigned difffound = 0;
unsigned start = 0;
for (unsigned i = 0; i < sec1->sh_size; i++) {
if (i%DIFFWINDOW == 0) {
start = i;
difffound = 0;
}
if (!difffound && data1[i] != data2[i]) {
difffound = 1;
}
if ((i+1)%DIFFWINDOW == 0 || i + 1 == sec1->sh_size) {
if (difffound) {
printsection(dump1, sec1, '-', start, DIFFWINDOW);
printsection(dump2, sec2, '+', start, DIFFWINDOW);
}
}
}
}
int main(int argc, char **argv)
{
if (argc != 3) {
fprintf(stderr, "Usage: compare DUMP1 DUMP2\n");
return 1;
}
struct dump dump1;
struct dump dump2;
if (readdump(argv[1], &dump1) == 0 ||
readdump(argv[2], &dump2) == 0) {
fprintf(stderr, "Failed to read dumps\n");
return 1;
}
unsigned sinx1 = 0;
unsigned sinx2 = 0;
while (dump1.mappings[sinx1] || dump2.mappings[sinx2]) {
Elf64_Shdr *sec1 = dump1.mappings[sinx1];
Elf64_Shdr *sec2 = dump2.mappings[sinx2];
if (sec1 && sec2) {
if (sec1->sh_addr == sec2->sh_addr) {
// in both
printdiff(&dump1, sec1, &dump2, sec2);
sinx1++;
sinx2++;
}
else if (sec1->sh_addr < sec2->sh_addr) {
// in 1, not 2
printsection(&dump1, sec1, '-', 0, 0);
sinx1++;
}
else {
// in 2, not 1
printsection(&dump2, sec2, '+', 0, 0);
sinx2++;
}
}
else if (sec1) {
// in 1, not 2
printsection(&dump1, sec1, '-', 0, 0);
sinx1++;
}
else {
// in 2, not 1
printsection(&dump2, sec2, '+', 0, 0);
sinx2++;
}
}
return 0;
}
With the sample implementation, we can re-evaluate our scenario above. A except from the first diff:
$ ./compare dump1 dump2
-0000000000601020 86 05 40 00 00 00 00 00 50 3e a8 f7 ff 7f 00 00 [..#.....P>......]
+0000000000601020 00 6f a9 f7 ff 7f 00 00 50 3e a8 f7 ff 7f 00 00 [.o......P>......]
-0000000000602260 bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..#.............]
-0000000000602280 6e 6f 20 63 6f 72 72 75 70 74 69 6f 6e 0a 00 00 [no corruption...]
+0000000000602280 75 73 65 2d 61 66 74 65 72 2d 66 72 65 65 20 68 [use-after-free h]
-0000000000602290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602290 69 64 64 65 6e 20 62 79 20 6e 65 77 20 61 6c 6c [idden by new all]
The diff shows that *gstate (address 0x602260) was changed from 0x7fffffffe2bc to 0x4008c1:
-0000000000602260 bc e2 ff ff ff 7f 00 00 00 00 00 00 00 00 00 00 [................]
+0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..#.............]
The second diff with only the relevant offset:
$ ./compare dump1 dump2
-0000000000602260 c1 08 40 00 00 00 00 00 00 00 00 00 00 00 00 00 [..#.............]
+0000000000602260 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [................]
The diff shows that *gstate (address 0x602260) was changed from 0x4008c1 to 0x1.
There you have it, a core dump diff. Now, whether or not that can prove to be useful in your scenario depends on various factors, one being the timeframe between the two dumps and the activity that takes place within that window. A large diff will possibly be difficult to analyze, so the aim must be to minimize its size by choosing the diff window carefully.
The more context you have, the easier the analysis will turn out to be. For example, the relevant scope of the diff could be reduced by limiting the diff to addresses of the .data and .bss sections of the library in question if changes in there are relevant to your situation.
Another approach to reduce the scope: excluding changes to memory that is not referenced by the library. The relationship between arbitrary heap allocations and specific libraries is not immediately apparent. Based on the the addresses of changes in your initial diff, you could search for pointers in the .data and .bss sections of the library right in the diff implementation. This does not take every possible reference into account (most notably indirect references from other allocations, register and stack references of library-owned threads), but it's a start.
I've found few topics about it here, but none of them explained the issue I have.
I'm simply trying to get access to internal status register of PCIe device, by mapping it to user memory space in linux.
Here is my system configuration:
# uname -a
Linux localhost.localdomain 4.18.13-200.fc28.x86_64 #1 SMP Wed Oct 10 17:29:59 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# lspci -tv
-[0000:00]-+-00.0 Intel Corporation Device 1980
+-04.0 Intel Corporation Device 19a1
<...>
+-1f.2 Intel Corporation Device 19de
# cat /proc/iomem
df570000-df573fff : 0000:00:1f.2
# lspci -s 00:1f.2 -x
00:1f.2 Memory controller: Intel Corporation Device 19de (rev 11)
00: 86 80 de 19 00 00 00 00 11 00 80 05 00 00 80 00
10: 00 00 57 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 69 09
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
So, the device is sitting on 00:1f.2 and visible for the system. I try to get access to internal "ERRCORSTS" register with offset 0x110 of memory controller, which shows the error status of individual correctable error sources on a PCI Express device page1673 (here is the manual for my SoC ). The output that I get from my program is:
data = ffffffff
PCI BAR0 0x0000 = 0xffff
It seems that I'm missing something in understanding of linux memory mapping or may be they just changed something in 4.18 kernel, so it's not so easy as it was before.
Could anybody help me out with it, please?
Here is my code:
#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define BASE_ADDR 0xdf570000
#define DATA_OFFSET 0x110
extern int errno;
int main()
{
int i;
int fd = open("/dev/mem",O_RDWR|O_SYNC);
if(fd < 0) {
printf("Can't open /dev/mem\n");
return 1;
}
u_int32_t* mapped_base = (u_int32_t *) mmap(0, 4096UL, PROT_READ|PROT_WRITE, MAP_SHARED, fd, BASE_ADDR);
// Trying to get access to device memory
if(mapped_base == NULL) {
printf("Can't mmap\n");
return 1;
} else {
unsigned int status_register0 = *(int *)(mapped_base + DATA_OFFSET );
printf("data = %lx \n",status_register0);
}
// Trying to get access to DevID
int fb = open("/sys/devices/pci0000:00/0000:00:1f.2/resource0", O_RDWR | O_SYNC);
u_int32_t* ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fb, 0);
printf("PCI BAR0 0x0000 = 0x%4x\n", *((unsigned short *) ptr) );
return 0;
}
This performs pointer arithmetic.
mapped_base + DATA_OFFSET
The offset is automatically multiplied by the element size, which is probably 4 based on
u_int32_t* mapped_base
However, your documentation appears to specify offsets in bytes.
Thus, you need to read 0xdf570110 but are actually reading 0xdf570440
It seems that I messed up the memory, when I experimented with it. So everything works after I reboot the machine.