FPGA Analog-to-Digital Conversion on the DE1-SoC

I have a DE1-SoC (Rev. C) running Linux, and I am having problems accessing the on-board ADC. The input to all 8 channels is a 3 V pk-pk sinusoidal signal. The on-board ADC is an AD7928, a 12-bit, 8-channel ADC. The datasheet says that the ADC can handle bipolar signals, and it gives the following circuit diagram:
[Figure: AD7928 bipolar input circuit, from the datasheet]
All eight channels need to be sampled continually, and the DE1-SoC datasheet specifies setting the channel 1 register to 1, which activates the automatic-update option on the ADC. Here's my first attempt at the code. It compiles and runs, but the values aren't correct (I know this because the same signal that's being fed into the ADC is also being measured by my oscilloscope).
#include <inttypes.h>
#include <time.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/mman.h>
/* FPGA HPS BRIDGE BASE */
#define LW_BRIDGE_BASE (0xFF200000)
#define HW_REGS_BASE (0xFF200000)
#define HW_REGS_SPAN (0x00200000)
#define HW_REGS_MASK ( HW_REGS_SPAN - 1 )
/* HPS-2-FPGA AXI Bridge */
#define ALT_AXI_FPGASLVS_OFST (0xC0000000) // axi_master
#define HW_FPGA_AXI_SPAN (0x40000000) // Bridge span 1GB
#define HW_FPGA_AXI_MASK ( HW_FPGA_AXI_SPAN - 1 )
/* ADC REGISTER SPAN */
#define ADC_BASE (0x00004000)
/* ADC CHANNEL & UPDATE REGISTERS */
#define ADC_CH0_UPDATE (LW_BRIDGE_BASE+ADC_BASE)
#define ADC_CH1_AUTO_UPDATE (LW_BRIDGE_BASE+ADC_BASE+4) // Write 1 for continual ADC request
#define ADC_CH2 (LW_BRIDGE_BASE+ADC_BASE+8)
#define ADC_CH3 (LW_BRIDGE_BASE+ADC_BASE+12)
#define ADC_CH4 (LW_BRIDGE_BASE+ADC_BASE+16)
#define ADC_CH5 (LW_BRIDGE_BASE+ADC_BASE+20)
#define ADC_CH6 (LW_BRIDGE_BASE+ADC_BASE+24)
#define ADC_CH7 (LW_BRIDGE_BASE+ADC_BASE+28)
/* ADC REGISTER END */
#define ADC_END (0x0000001F)
int main() {
    // Defining variables
    void *virtual_base;
    int fd;
    volatile int *h2p_lw_adc_addr;
    int i;

    // Defining pointer for register
    if ((fd = open("/dev/mem", (O_RDWR | O_SYNC))) == -1) {
        printf("ERROR: could not open \"/dev/mem\"...\n");
        return (1);
    }
    virtual_base = mmap(NULL, HW_REGS_SPAN, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, HW_REGS_BASE);
    if (virtual_base == MAP_FAILED) {
        printf("ERROR: mmap() failed...\n");
        close(fd);
        return (1);
    }
    h2p_lw_adc_addr = virtual_base + ((int)(LW_BRIDGE_BASE + ADC_BASE) & (int)(HW_REGS_MASK));

    float Vref = 5.0;
    float stepSize = Vref / 4096.0;

    /* Heading & Calculating Step Size/Resolution */
    printf("*____________________________________*\n");
    printf("* Setting up the AD7928 ADC *\n");
    printf("*____________________________________*\n");
    printf("Resolution for 5V Vref: %f[mV]\n", stepSize * 1000);

    // Setting up the ADC for bipolar signal
    // ...

    // Auto-update all channels continuously
    *(int *)(h2p_lw_adc_addr + 4) = 1;

    // Sample a single channel
    // ...

    /* Data Collection Attempt #1 */
    int num = 5; // Number of samples?
    unsigned int samples[num];
    int channel = 16; // channel 4
    for (i = 0; i < num; i++) {
        samples[i] = *(int *)(h2p_lw_adc_addr + channel);
    }

    if (munmap(virtual_base, HW_REGS_SPAN) != 0) {
        printf("ERROR: munmap() failed...\n");
        close(fd);
        return (1);
    }
    close(fd);
    return 0;
}
It gets cross-compiled using this Makefile:
C_SRC := adc.c
CFLAGS := -g -O0 -Wall
LDFLAGS := -lm
CROSS_COMPILE := arm-linux-gnueabihf-
CC := $(CROSS_COMPILE)gcc
NM := $(CROSS_COMPILE)nm
ifeq ($(or $(COMSPEC),$(ComSpec)),)
RM := rm -rf
else
RM := cs-rm -rf
endif
ELF ?= adc
OBJ := $(patsubst %.c,%.o,$(C_SRC))
.c.o:
	$(CC) $(CFLAGS) -c $< -o $@
.PHONY: all
all: $(ELF)
.PHONY: clean
clean:
	$(RM) $(ELF) $(OBJ) $(OBJS) *.map *.objdump
$(ELF): $(OBJ) $(OBJS)
	$(CC) $(CFLAGS) $(OBJ) $(OBJS) -o $@ $(LDFLAGS)
	$(NM) $@ > $@.map
I'm a newbie when it comes to ADCs and DSP, but ideally I would like to continually measure all eight channels, recording the pk-pk amplitude of the incoming sine wave on each one for eventual post-processing.
As of right now, the output for the five samples is always 0, except when I sample channel 1, in which case all five samples are 1, like so:
Samples [0]: 1
Samples [1]: 1
Samples [2]: 1
Samples [3]: 1
Samples [4]: 1
Even when I increase the number of samples, it's always 1 for Channel 1 and 0 for all the other channels.
I think my problem is probably a combination of my code and maybe the missing buffering circuitry? (I'm not handling the bipolar input only because I can set a DC offset on my signal generator, so the input is an all-positive 3 V pk-pk.)
Vref on the ADC is fed a steady 5 V DC. I'm pretty lost right now, so any help or pointers would be greatly appreciated.

I bet that your problem is in the following lines:
> volatile int *h2p_lw_adc_addr;
>
> *(int *)(h2p_lw_adc_addr + 4) = 1;
>
> samples[i] = *(int *)(h2p_lw_adc_addr + channel);
Because h2p_lw_adc_addr is a pointer to int, the latter two lines compute the wrong addresses.
When you add a number N to an int pointer, the resulting pointer is N * sizeof(int) bytes bigger, not N bytes; for example, h2p_lw_adc_addr + 4 points 16 bytes past the base, not 4 bytes.
Change the type of h2p_lw_adc_addr to a char pointer for a quick fix:
volatile char *h2p_lw_adc_addr;
Or alternatively, you can change the offsets:
*(int *)(h2p_lw_adc_addr + 1) = 1;
int channel = 4; // channel 4
But in that case I propose using int32_t or uint32_t instead of int, so that the 4-byte register stride is explicit.
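For illustration, here is a minimal sketch of that second approach, treating the register block as an array of 32-bit words and indexing by register number (this assumes the register map from the question: one 32-bit register per channel at 4-byte spacing, with the AD7928's 12-bit sample in the low bits):

volatile uint32_t *adc = (volatile uint32_t *)((char *)virtual_base +
        ((LW_BRIDGE_BASE + ADC_BASE) & HW_REGS_MASK));

adc[1] = 1;                      /* channel 1 register: enable auto-update */
for (i = 0; i < num; i++) {
    samples[i] = adc[4] & 0xFFF; /* channel 4 data, masked to 12 bits */
}

That way each register index matches the channel offsets in your #defines divided by 4.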

Related

Raspberry PI 3 UART0 Not Transmitting (Bare-Metal)

Introduction
I have been working on writing my own bare-metal code for a Raspberry Pi as I build up my bare-metal skills and learn about kernel-mode operations. Due to the complexity, the number of documentation errors, and the missing/scattered info, it has been extremely difficult to bring up a custom kernel on the Raspberry Pi. However, I finally got that working.
A very broad overview of what is happening in the bootstrap process
My kernel loads at 0x80000, sends all cores except core 0 into an infinite loop, sets up the stack pointer, and calls a C function. I can set up the GPIO pins and turn them on and off. Using some additional circuitry, I can drive LEDs and confirm that my code is executing.
The problem
However, when it comes to the UART, I have hit a wall. I am using UART0 (the PL011). As far as I can tell, the UART is not outputting, although I could be missing it on my scope since I only have an analog oscilloscope. The code gets stuck when outputting the string. I have determined, through hours of reflashing my SD card with different yes/no questions to my LEDs, that it is stuck in an infinite loop waiting for the UART transmit-FIFO-full flag to clear. The UART only accepts 1 byte before becoming full. I cannot figure out why it is not transmitting the data out. I am also not sure whether I have correctly set my baud rate, but I don't think that would cause the TX FIFO to stay filled.
Getting a foothold in the code
Here is my code. Execution begins at the very start of the binary, which is linked with the symbol "my_entry_pt" from the assembly source "entry.s" in the linker script; that is where you will find the entry code. However, you probably only need to look at the last file, the C code in "base.c"; the rest is just bootstrapping up to that. Please disregard some comments/names that don't make sense; this is a port (primarily of the build infrastructure) from an earlier bare-metal project of mine, which used a RISC-V development board with a memory-mapped SPI flash to store the program binary:
[Makefile]
TUPLE := aarch64-unknown-linux-gnu
CC := $(TUPLE)-gcc
OBJCPY := $(TUPLE)-objcopy
STRIP := $(TUPLE)-strip
CFLAGS := -Wall -Wextra -std=c99 -O2 -march=armv8-a -mtune=cortex-a53 -mlittle-endian -ffreestanding -nostdlib -nostartfiles -Wno-unused-parameter -fno-stack-check -fno-stack-protector
LDFLAGS := -static
GFILES :=
KFILES :=
UFILES :=
# Global Library
#GFILES := $(GFILES)
# Kernel
# - Core (Entry/System Setup/Globals)
KFILES := $(KFILES) ./src/kernel/base.o
KFILES := $(KFILES) ./src/kernel/entry.o
# Programs
# - Init
#UFILES := $(UFILES)
export TUPLE
export CC
export OBJCPY
export STRIP
export CFLAGS
export LDFLAGS
export GFILES
export KFILES
export UFILES
.PHONY: all rebuild clean
all: prog-metal.elf prog-metal.elf.strip prog-metal.elf.bin prog-metal.elf.hex prog-metal.elf.strip.bin prog-metal.elf.strip.hex
rebuild: clean
	$(MAKE) all
clean:
	rm -f *.elf *.strip *.bin *.hex $(GFILES) $(KFILES) $(UFILES)
%.o: %.c
	$(CC) $(CFLAGS) $^ -c -o $@
%.o: %.s
	$(CC) $(CFLAGS) $^ -c -o $@
prog-metal.elf: $(GFILES) $(KFILES) $(UFILES)
	$(CC) $(CFLAGS) $^ -T ./bare_metal.ld $(LDFLAGS) -o $@
prog-%.elf.strip: prog-%.elf
	$(STRIP) -s -x -R .comment -R .text.startup -R .riscv.attributes $^ -o $@
%.elf.bin: %.elf
	$(OBJCPY) -O binary $^ $@
%.elf.hex: %.elf
	$(OBJCPY) -O ihex $^ $@
%.strip.bin: %.strip
	$(OBJCPY) -O binary $^ $@
%.strip.hex: %.strip
	$(OBJCPY) -O ihex $^ $@
emu: prog-metal.elf.strip.bin
	qemu-system-aarch64 -kernel ./prog-metal.elf.strip.bin -m 1G -cpu cortex-a53 -M raspi3 -serial stdio -display none
emu-debug: prog-metal.elf.strip.bin
	qemu-system-aarch64 -kernel ./prog-metal.elf.strip.bin -m 1G -cpu cortex-a53 -M raspi3 -serial stdio -display none -gdb tcp::1234 -S
debug:
	$(TUPLE)-gdb -ex "target remote localhost:1234" -ex "layout asm" -ex "tui reg general" -ex "break *0x00080000" -ex "break *0x00000000" -ex "set scheduler-locking step"
[bare_metal.ld]
/*
  This is not actually needed (at least not on actual hardware), but
  it explicitly sets the entry point in the .elf file to be the same
  as the true entry point of the program. The global symbol my_entry_pt
  is located at the start of src/kernel/entry.s. More on this below.
*/
ENTRY(my_entry_pt)
MEMORY
{
    /*
      This is the memory address where this program will reside.
      It is the reset vector.
    */
    ram (rwx) : ORIGIN = 0x00080000, LENGTH = 0x0000FFFF
}
SECTIONS
{
    /*
      Force the linker to start at the start of memory section: ram
    */
    . = 0x00080000;
    .text : {
        /*
          Make sure the .text section from src/kernel/entry.o is
          linked first. The .text section of src/kernel/entry.s
          is the actual entry machine code for the kernel and is
          first in the file. This way, at reset, execution starts
          by jumping to this machine code.
        */
        src/kernel/entry.o (.text);
        /*
          Link the rest of the kernel's .text sections.
        */
        *.o (.text);
    } > ram
    /*
      Put the .rodata in the flash after the actual machine code.
    */
    .rodata : {
        *.o (.rodata);
        *.o (.rodata.*);
    } > ram
    /*
      END: Read Only Data
      START: Writable Data
    */
    .sbss : {
        *.o (.sbss);
    } > ram
    .bss : {
        *.o (.bss);
    } > ram
    section_KHEAP_START (NOLOAD) : ALIGN(0x10) {
        /*
          At the very end of the space reserved for global variables
          in the ram, link in this custom section. It is used to
          add a symbol called KHEAP_START to the program that will
          inform the C code where the heap can start. This allows the
          heap to start right after the global variables.
        */
        src/kernel/entry.o (section_KHEAP_START);
    } > ram
    /*
      Discard everything that hasn't been explicitly linked. I don't
      want the linker to guess where to put stuff. If it doesn't know,
      don't include it. If this causes a linking error, good: I want
      to know that I need to fix something, rather than get a silent
      failure that could cause hard-to-debug issues later. For instance,
      without explicitly setting the .sbss and .bss sections above,
      the linker attempted to put my global variables after the
      machine code in the flash. This would mean that every access to
      those variables would be a read or write to the external SPI flash
      IC on real hardware. I do not believe that initialized globals
      are possible since there is nothing to initialize them. So I don't
      want to, for instance, include the .data section.
    */
    /DISCARD/ : {
        * (.*);
    }
}
[src/kernel/entry.s]
.section .text
.globl my_entry_pt
// This is the Arm64 kernel header (64 bytes total)
my_entry_pt:
	b end_of_header            // Executable code (64 bits)
	.align 3, 0, 7
	.quad my_entry_pt          // text_offset (64 bits)
	.quad 0x0000000000000000   // image_size (64 bits)
	.quad 0x000000000000000A   // flags (1010: Anywhere, 4K Pages, LE) (64 bits)
	.quad 0x0000000000000000   // reserved 2 (64 bits)
	.quad 0x0000000000000000   // reserved 3 (64 bits)
	.quad 0x0000000000000000   // reserved 4 (64 bits)
	.int 0x644d5241            // magic (32 bits)
	.int 0x00000000            // reserved 5 (32 bits)
end_of_header:
	// Check what core this is
	mrs x0, VMPIDR_EL2
	and x0, x0, #0x3
	cmp x0, #0x0
	// If this is not core 0, go into an infinite loop
	bne loop
	// Set up the stack pointer
	mov x2, #0x00030000
	mov sp, x2
	// Get the address of the C main function
	ldr x1, =kmain
	// Call the C main function
	blr x1
loop:
	nop
	b loop

.section section_KHEAP_START
.globl KHEAP_START
KHEAP_START:
[src/kernel/base.c]
void pstr(char* str) {
    /* Note: the names are leftovers; these are PL011 UART0 registers (DR at +0x00, FR at +0x18). */
    volatile unsigned int* AUX_MU_IO_REG = (unsigned int*)(0x3f201000 + 0x00);
    volatile unsigned int* AUX_MU_LSR_REG = (unsigned int*)(0x3f201000 + 0x18);
    while (*str != 0) {
        while (*AUX_MU_LSR_REG & 0x00000020) {
            // TX FIFO Full
        }
        *AUX_MU_IO_REG = (unsigned int)((unsigned char)*str);
        str++;
    }
    return;
}

signed int kmain(unsigned int argc, char* argv[], char* envp[]) {
    char* text = "Test Output String\n";
    volatile unsigned int* AUXENB = 0;
    //AUXENB = (unsigned int*)(0x20200000 + 0x00);
    //*AUXENB |= 0x00024000;
    //AUXENB = (unsigned int*)(0x20200000 + 0x08);
    //*AUXENB |= 0x00000480;
    // Set Baud Rate to 115200
    AUXENB = (unsigned int*)(0x3f201000 + 0x24);
    *AUXENB = 26;
    AUXENB = (unsigned int*)(0x3f201000 + 0x28);
    *AUXENB = 0;
    AUXENB = (unsigned int*)(0x3f200000 + 0x04);
    *AUXENB = 0;
    // Set GPIO Pin 14 to Mode: ALT0 (UART0)
    *AUXENB |= (04u << ((14 - 10) * 3));
    // Set GPIO Pin 15 to Mode: ALT0 (UART0)
    *AUXENB |= (04u << ((15 - 10) * 3));
    AUXENB = (unsigned int*)(0x3f200000 + 0x08);
    *AUXENB = 0;
    // Set GPIO Pin 23 to Mode: Output
    *AUXENB |= (01u << ((23 - 20) * 3));
    // Set GPIO Pin 24 to Mode: Output
    *AUXENB |= (01u << ((24 - 20) * 3));
    // Turn ON Pin 23
    AUXENB = (unsigned int*)(0x3f200000 + 0x1C);
    *AUXENB = (1u << 23);
    // Turn OFF Pin 24
    AUXENB = (unsigned int*)(0x3f200000 + 0x28);
    *AUXENB = (1u << 24);
    // Enable TX on UART0
    AUXENB = (unsigned int*)(0x3f201000 + 0x30);
    *AUXENB = 0x00000101;
    pstr(text);
    // Turn ON Pin 24
    AUXENB = (unsigned int*)(0x3f200000 + 0x1C);
    *AUXENB = (1u << 24);
    return 0;
}
Debugging up to this point
So it turns out that all of us were right. My initial troubleshooting in response to @Xiaoyi Chen was wrong. I rebooted back into Raspberry Pi OS to check on a hunch. I was connected to the Pi using a 3.3 V UART adapter connected to pins 8 (GPIO 14, UART0 TX), 10 (GPIO 15, UART0 RX), and GND (for a common ground, of course). I could see the boot messages and a getty login prompt which I could log into. I figured that meant the PL011 was working, but when I actually checked the process list in htop, I found that getty was actually running on /dev/ttyS0, not /dev/ttyAMA0. /dev/ttyAMA0 was actually tied to the Bluetooth module by the hciattach command in another process listing.
According to the documentation here: https://www.raspberrypi.org/documentation/configuration/uart.md , /dev/ttyS0 is the mini UART while /dev/ttyAMA0 is the PL011, and it also says that UART0 is the PL011 and UART1 is the mini UART. Furthermore, the GPIO pinouts and the BCM2835 documentation say that GPIO pins 14 and 15 are the UART0 TX and RX. So something did not add up: I could see the login prompt on pins 14 and 15 while Linux was using the mini UART, yet I was supposedly physically connected to the PL011. If I SSHed in and tried to open /dev/ttyAMA0 with minicom, I could see nothing happening. However, if I did the same with /dev/ttyS0, it would conflict with the login terminal. This confirmed to me that /dev/ttyS0 was in fact in use for the boot console.
The Answer
If I set "dtoverlay=disable-bt" in config.txt, the above behavior changed to match expectations. Rebooting the Pi made it once again come up with a console on header pins 8 and 10, but checking the process listing showed that this time getty was using /dev/ttyAMA0. If I then set "dtoverlay=disable-bt" in config.txt with my custom kernel, the program executed as expected, printing out my string and turning on the second LED. Since the outputs of the PL011 were never actually set up, but instead redirected by some magic, it makes sense that it would not be working, as @PMF suggested. This whole ordeal has just reaffirmed my assertion that the documentation for this so-called "learning computer" is atrocious.
For those who are curious, here are the last few lines from my config.txt:
[all]
dtoverlay=disable-bt
enable_uart=1
core_freq=250
#uart_2ndstage=1
force_eeprom_read=0
disable_splash=1
init_uart_clock=48000000
init_uart_baud=115200
kernel_address=0x80000
kernel=prog-metal.elf.strip.bin
arm_64bit=1
Remaining Questions
A few things still bother me. I could have sworn that I already tried setting "dtoverlay=disable-bt".
Secondly, it does seem that this performs some kind of undocumented magic under the hood (I know of no documentation for it) that I do not understand. I can find nothing in the published schematics that redirects the output of GPIO 14 and 15 from the SoC. So either the schematics are incomplete, or there is some proprietary magic taking place inside the SoC which redirects the pins, contradicting the documentation.
I also have questions about precedence when it comes to the config.txt options and setting things up elsewhere.
Anyway, thank you for the help everyone.
My suggestion:
flash your SD card with a standard RPi distribution to make sure the hardware is still working
if the hardware is good, compare your code with the in-kernel serial driver and check for differences (see the quick check below)
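As a quick check for the first point, on Raspberry Pi OS the /dev/serial0 and /dev/serial1 symlinks show which UART is mapped to the header pins (assuming a stock image):
ls -l /dev/serial0 /dev/serial1
ps aux | grep getty
If /dev/serial0 points at ttyS0, the header pins are driven by the mini UART rather than the PL011.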

Qemu baremetal emulation - how to view UART output?

Question:
How do I get the UART output from a baremetal program run with Qemu?
Background
Here is the command line invocation I have been using:
qemu-system-arm -M xilinx-zynq-a9 -cpu cortex-a9 -nographic -kernel $BUILD_DIR/mm.elf -m 512M -s -S
using machine xilinx-zynq-a9
processor cortex-a9
as this is bare metal, the executable is a self-contained ELF file
-m 512M signifies the platform has 512 MiB of RAM
-s is a shortcut for -gdb tcp::1234
-S means freeze CPU at startup
The ELF file I am using (mm.elf) performs a simple matrix multiply operation, and then prints whether it succeeded or failed, and how long it took to run. The ELF was compiled using the Xilinx ARM toolchain. I am using this for software fault injection. Currently I use GDB to ask for the values of the variables which are supposed to be printed. However, as there are many things which could go wrong with printing in the context of fault injection, it would be nice to see what is actually sent over UART.
Related answers:
redirect QEMU window output to terminal running qemu
This has some suggestions I tried, but it isn't applicable because the question was about getting the Linux boot messages in the host terminal window.
How to run a program without an operating system?
This one isn't very related because it still assumes that the user has a bootloader of some kind. While there must technically be a bootloader for the application to run at all, Xilinx provides this system code in files like boot.S, which are then compiled into the ELF file as code that runs before main.
Things that I have tried:
I tried adding each of these onto the end of my current Qemu command. The results follow the parameters tried.
-serial mon:stdio
nothing
-serial null -serial mon:stdio (because Cortex-A9 has two UARTs)
nothing
the above two with -semihosting added
nothing
-serial stdio
cannot use stdio by multiple character devices
could not connect serial device to character backend 'stdio'
-console=/dev/tty
invalid option
-curses
black screen, no output
Investigation
I looked at the disassembly of the ELF file and verified that the address to which the UART messages are being written is the same one the Qemu setup expects (info mtree). The base address is 0xe0000000, the same in both places.
Goal
I want to be able to capture the output of messages sent to UART. If this is done by redirecting to stdout, that's fine. If it goes through a TCP socket, that's fine too. The fault injection setup uses Python, and Qemu is running as a subprocess, so it would be easy to get the output from either one of those sources.
Note: when run in the fault injection setup, the Qemu invocation is
qemu-system-arm -M xilinx-zynq-a9 -cpu cortex-a9 -nographic -kernel $BUILD_DIR/mm.elf -m 512M -gdb tcp::3345 -S -monitor telnet::3347,server,nowait
The main differences being 1) the GDB port number is different (so multiple instances can run simultaneously) and 2) Qemu is to be controlled using a telnet connection over a socket, so it can be controlled by the Python script.
You need to initialize the UART before attempting to output any characters.
The UART0 emulation works fine, for example with a slightly modified version of this program:
/opt/qemu-4.2.0/bin/qemu-system-arm -semihosting --semihosting-config enable=on,target=native -nographic -serial mon:stdio -machine xilinx-zynq-a9 -m 768M -cpu cortex-a9 -kernel hello05.elf
Hello number 1
The output of the git diff command after modifications were made was:
diff --git a/Hello01/Makefile b/Hello01/Makefile
index 4a1b512..8d6d12a 100644
--- a/Hello01/Makefile
+++ b/Hello01/Makefile
@@ -1,10 +1,10 @@
ARMGNU ?= arm-linux-gnueabihf
-COPS =
+COPS = -g -O0
ARCH = -mcpu=cortex-a9 -mfpu=vfpv3
gcc : hello01.bin
-all : gcc clang
+all : gcc
clean :
rm -f *.o
@@ -15,8 +15,6 @@ clean :
rm -f *.img
rm -f *.bc
-clang: hello02.bin
-
startup.o : startup.s
$(ARMGNU)-as $(ARCH) startup.s -o startup.o
diff --git a/Hello01/hello01.c b/Hello01/hello01.c
index 20cb4a4..14ed2a0 100644
--- a/Hello01/hello01.c
+++ b/Hello01/hello01.c
@@ -10,16 +10,16 @@
*/
-#define UART1_BASE 0xe0001000
-#define UART1_TxRxFIFO0 ((unsigned int *) (UART1_BASE + 0x30))
+#define UART0_BASE 0xe0000000
+#define UART0_TxRxFIFO0 ((unsigned int *) (UART0_BASE + 0x30))
-volatile unsigned int * const TxRxUART1 = UART1_TxRxFIFO0;
+volatile unsigned int * const TxRxUART0 = UART0_TxRxFIFO0;
void print_uart1(const char *s)
{
while(*s != '\0')
{ /* Loop until end of string */
- *TxRxUART1 = (unsigned int)(*s); /* Transmit char */
+ *TxRxUART0 = (unsigned int)(*s); /* Transmit char */
s++; /* Next char */
}
}
@@ -28,4 +28,4 @@ void c_entry()
{
print_uart1("\r\nHello world!");
while(1) ; /*dont exit the program*/
-}
\ No newline at end of file
+}
diff --git a/Hello05/Makefile b/Hello05/Makefile
index 9d3ca23..bc9bb61 100644
--- a/Hello05/Makefile
+++ b/Hello05/Makefile
@@ -1,5 +1,5 @@
ARMGNU ?= arm-linux-gnueabihf
-COPS =
+COPS = -g -O0
ARCH = -mcpu=cortex-a9 -mfpu=vfpv3
gcc : hello05.bin
diff --git a/Hello05/hello05.c b/Hello05/hello05.c
index 1b92dde..01ce7ee 100644
--- a/Hello05/hello05.c
+++ b/Hello05/hello05.c
@@ -26,7 +26,7 @@
void c_entry()
{
- init_uart1_RxTx_115200_8N1();
+ init_uart0_RxTx_115200_8N1();
printf("\nHello number %d\n",1);
while(1) ; /*dont exit the program*/
}
diff --git a/Hello05/xuartps.c b/Hello05/xuartps.c
index bdf7ad1..74f68bd 100644
--- a/Hello05/xuartps.c
+++ b/Hello05/xuartps.c
@@ -16,42 +16,42 @@
void putc(int *p ,char c);
/*
-* Initiate UART1 ( /dev/ttyACM0 on host computer )
+* Initiate UART0 ( /dev/ttyACM0 on host computer )
* 115,200 Baud 8-bit No-Parity 1-stop-bit
*/
-void init_uart1_RxTx_115200_8N1()
+void init_uart0_RxTx_115200_8N1()
{
/* Disable the transmitter and receiver before writing to the Baud Rate Generator */
- UART1->control_reg0=0;
+ UART0->control_reg0=0;
/* Set Baudrate to 115,200 Baud */
- UART1->baud_rate_divider =XUARTPS_BDIV_CD_115200;
- UART1->baud_rate_gen= XUARTPS_BRGR_CD_115200;
+ UART0->baud_rate_divider =XUARTPS_BDIV_CD_115200;
+ UART0->baud_rate_gen= XUARTPS_BRGR_CD_115200;
/*Set 8-bit NoParity 1-StopBit*/
- UART1->mode_reg0 = XUARTPS_MR_PAR_NONE;
+ UART0->mode_reg0 = XUARTPS_MR_PAR_NONE;
/*Enable Rx & Tx*/
- UART1->control_reg0= XUARTPS_CR_TXEN | XUARTPS_CR_RXEN | XUARTPS_CR_TXRES | XUARTPS_CR_RXRES ;
+ UART0->control_reg0= XUARTPS_CR_TXEN | XUARTPS_CR_RXEN | XUARTPS_CR_TXRES | XUARTPS_CR_RXRES ;
}
-void sendUART1char(char s)
+void sendUART0char(char s)
{
/*Make sure that the uart is ready for new char's before continuing*/
- while ((( UART1->channel_sts_reg0 ) & UART_STS_TXFULL) > 0) ;
+ while ((( UART0->channel_sts_reg0 ) & UART_STS_TXFULL) > 0) ;
/* Loop until end of string */
- UART1->tx_rx_fifo= (unsigned int) s; /* Transmit char */
+ UART0->tx_rx_fifo= (unsigned int) s; /* Transmit char */
}
/* "print.h" uses this function for is's printf implementation */
void putchar(char c)
{
if(c=='\n')
- sendUART1char('\r');
- sendUART1char(c);
+ sendUART0char('\r');
+ sendUART0char(c);
}
/* <stdio.h>'s printf uses puts to send chars
@@ -61,9 +61,9 @@ int puts(const char *s)
while(*s != '\0')
{
if(*s=='\n')
- sendUART1char('\r');
+ sendUART0char('\r');
- sendUART1char(*s); /*Send char to the UART1*/
+ sendUART0char(*s); /*Send char to the UART0*/
s++; /* Next char */
}
return 0;
diff --git a/Hello05/xuartps.h b/Hello05/xuartps.h
index fc5008f..64e3b88 100644
--- a/Hello05/xuartps.h
+++ b/Hello05/xuartps.h
@@ -13,7 +13,7 @@
#define u32 unsigned int
#endif
-#define UART1_BASE 0xe0001000
+#define UART0_BASE 0xe0000000
// Register Description as found in
// B.33 UART Controller (UART) p.1626
struct XUARTPS{
@@ -34,7 +34,7 @@ struct XUARTPS{
u32 Flow_delay_reg0; /* Flow Control Delay Register def=0*/
u32 Tx_FIFO_trigger_level;}; /* Transmitter FIFO Trigger Level Register */
-static struct XUARTPS *UART1=(struct XUARTPS*) UART1_BASE;
+static struct XUARTPS *UART0=(struct XUARTPS*) UART0_BASE;
/*
Page 496
@@ -87,11 +87,11 @@ static struct XUARTPS *UART1=(struct XUARTPS*) UART1_BASE;
#define XUARTPS_MR_CLKS_REF_CLK 0 /* 0: clock source is uart_ref_clk*/
/*
-* Initiate UART1 ( /dev/ttyACM0 on host computer )
+* Initiate UART0 ( /dev/ttyACM0 on host computer )
* 115,200 Baud 8-bit No-Parity 1-stop-bit
*/
-void init_uart1_RxTx_115200_8N1();
-void sendUART1char(char s);
+void init_uart0_RxTx_115200_8N1();
+void sendUART0char(char s);
int puts(const char *s);
//void putc((void*), char);
The command executed from the ZedBoard-BareMetal-Examples/Hello05 directory for building the modified Hello05 example was:
make ARMGNU=/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-arm-none-eabi/bin/arm-none-eabi clean all
This being said, the last comment from your previous post made me think that you may just want to be able to see the output of your program, but not necessarily by using UART0.
If this is the case, using the Angel/Semihosting interface would do the job - I understand you may have attempted to go this way.
Example:
// hello.c:
#include <stdio.h>  /* printf */
#include <stdlib.h> /* EXIT_SUCCESS */
int main(int argc, char** argv)
{
printf("Hello, World!\n");
return EXIT_SUCCESS;
}
gcc command:
/opt/arm/9/gcc-arm-9.2-2019.12-x86_64-arm-none-eabi/bin/arm-none-eabi-gcc -g -O0 --specs=rdimon.specs -o hello.elf hello.c
qemu command:
/opt/qemu-4.2.0/bin/qemu-system-arm -semihosting --semihosting-config enable=on,target=native -nographic -serial mon:stdio -machine xilinx-zynq-a9 -m 768M -cpu cortex-a9 -kernel hello.elf
Outcome:
Hello, World!
Using the semihosting interface would allow you to read/write files, read user input, and use some of the xUnit testing frameworks available for either C or C++. I have, for example, successfully used CppUnit with QEMU and the semihosting interface on several occasions.
I hope this helps.

How to translate neon intrinsics to llvm-IR using llvm-clang on x86

Using clang, we can generate IR by compiling a C program:
clang -S -emit-llvm hello.c -o hello.ll
I would like to translate NEON intrinsics to LLVM IR, with code like this:
/* neon_example.c - Neon intrinsics example program */
#include <stdint.h>
#include <stdio.h>
#include <assert.h>
#include <arm_neon.h>

/* fill array with increasing integers beginning with 0 */
void fill_array(int16_t *array, int size)
{
    int i;
    for (i = 0; i < size; i++)
    {
        array[i] = i;
    }
}

/* return the sum of all elements in an array. This works by calculating 4 totals
   (one for each lane) and adding those at the end to get the final total */
int sum_array(int16_t *array, int size)
{
    /* initialize the accumulator vector to zero */
    int16x4_t acc = vdup_n_s16(0);
    int32x2_t acc1;
    int64x1_t acc2;
    /* this implementation assumes the size of the array is a multiple of 4 */
    assert((size % 4) == 0);
    /* counting backwards gives better code */
    for (; size != 0; size -= 4)
    {
        int16x4_t vec;
        /* load 4 values in parallel from the array */
        vec = vld1_s16(array);
        /* increment the array pointer to the next element */
        array += 4;
        /* add the vector to the accumulator vector */
        acc = vadd_s16(acc, vec);
    }
    /* calculate the total */
    acc1 = vpaddl_s16(acc);
    acc2 = vpaddl_s32(acc1);
    /* return the total as an integer */
    return (int)vget_lane_s64(acc2, 0);
}

/* main function */
int main()
{
    int16_t my_array[100];
    fill_array(my_array, 100);
    printf("Sum was %d\n", sum_array(my_array, 100));
    return 0;
}
But it doesn't support the NEON intrinsics, and prints error messages like this:
/home/user/llvm-proj/build/bin/../lib/clang/4.0.0/include/arm_neon.h:65:24: error:
'neon_vector_type' attribute is not supported for this target
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
^
I think the reason is that my host is x86 but the target is ARM.
I have no idea how to cross-compile using Clang so that it translates to LLVM IR (the clang version is 4.0, on Ubuntu 14.04).
Are there any target options or other tools that would help?
And is there any difference between SSE and NEON LLVM IR?
Using ELLCC, a pre-packaged clang-based tool chain (http://ellcc.org), I was able to compile and run your program by adding -mfpu=neon:
rich@dev:~$ ~/ellcc/bin/ecc -target arm32v7-linux -mfpu=neon neon.c
rich@dev:~$ ./a.
a.exe a.out
rich@dev:~$ file a.out
a.out: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, BuildID[sha1]=613c22f6bbc277a8d577dab7bb27cd64443eb390, not stripped
rich@dev:~$ ./a.out
Sum was 4950
rich@dev:~$
It was compiled on an x86 and I ran it using QEMU.
Using normal clang, you'll also need the appropriate -target option for ARM; ELLCC uses slightly different -target options.
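With a stock clang, something along these lines should emit LLVM IR for ARM on an x86 host (the exact triple is an example; arm_neon.h ships with clang itself, so no ARM sysroot is needed just to get IR):
clang -target armv7a-linux-gnueabihf -mfpu=neon -S -emit-llvm neon_example.c -o neon_example.ll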

Xenomai rtdm_clock_read measure

I'm really a newbie when it comes to Xenomai, and I want to measure the time between two points.
First, I want to send a pulse of 10 µs.
After that, I wait until I get an interrupt.
I want to measure the time between the pulse and the interrupt.
I use the 'rtdm_clock_read()' function, so the type it returns is 'nanosecs_abs_t'. When I use it, I can't load the module anymore, and I get this when I do a 'make':
WARNING: "__aeabi_uldivmod" [...] undefined!
If I then try to load it, with 'insmod', it says this:
Unknown symbol in module
This is my Makefile:
EXTRA_CFLAGS := -I /usr/xenomai/include/
ifneq (${KERNELRELEASE},)
obj-m += oef1.o
else
ARCH ?= arm
CROSS_COMPILE ?= /usr/local/cross/rpi/bin/arm-linux-
KERNEL_DIR = /usr/src/linux
MODULE_DIR := $(shell pwd)
CFLAGS := -Wall -g
.PHONY: all
all:: modules
.PHONY: modules
modules:
	${MAKE} -C ${KERNEL_DIR} SUBDIRS=${MODULE_DIR} modules
XENOCONFIG=/usr/xenomai/bin/xeno-config
.PHONY: clean
clean::
	rm -f *.o .*.o .*.o.* *.ko .*.ko *.mod.* .*.mod.* .*.cmd *~
	rm -f Module.symvers Module.markers modules.order
	rm -rf .tmp_versions
endif
run: oef1.ko
	insmod oef1.ko
stop:
	rmmod oef1
This is my .c file:
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/fs.h>
#include <linux/gpio.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/version.h>
#include <linux/interrupt.h>
#include <asm/uaccess.h>
#include <rtdm/rtdm_driver.h>
//define pins for the ultrasonic sensor
#define TRIGGER 4 //GPIO_0, pin 3 on the pi
#define ECHO 15 //GPIO_1, pin 5 on the pi
//Define the outputs for the leds
///blablabla
//define the time the trigger needs to be high
#define TRIGGER_PERIOD 10000 // 10 us
#define SLEEP_TASK_TRIGGER 1000000000 // 1 s
//task for sending trigger pulse
static rtdm_task_t trigger_task;
//needed in multiple operations
static rtdm_mutex_t periode_mutex;
//timer to get the distance of the sensor
int end = 0;
nanosecs_abs_t time1=0;
nanosecs_abs_t time2=0;
int centimeter=0;
nanosecs_abs_t difTime=0;
//for GPIO interrupt
static rtdm_irq_t irq_rtdm;
static int numero_interruption;
static void triggertask(void * p)
{
    int value = 0;
    printk("first time in trigger");
    while (!end)
    {
        printk("trigger \n");
        gpio_set_value(TRIGGER, 1);
        rtdm_task_sleep(TRIGGER_PERIOD);
        gpio_set_value(TRIGGER, 0);
        rtdm_task_wait_period();
        rtdm_mutex_lock(&periode_mutex);
        time1 = rtdm_clock_read();
        rtdm_mutex_unlock(&periode_mutex);
        //printk("time1 %d\n", time1);
        rtdm_task_sleep(SLEEP_TASK_TRIGGER);
    }
}

static int handler_interruption(rtdm_irq_t * irq)
{
    printk("irq\n");
    // stop timer, get difference
    rtdm_mutex_lock(&periode_mutex);
    time2 = rtdm_clock_read();
    //printk("time2 %d\n", time2);
    difTime = time2 - time1;
    centimeter = (difTime / 1000) / 56;
    rtdm_mutex_unlock(&periode_mutex);
    printk("centimeter: ");
    printk("%d", centimeter);
    return RTDM_IRQ_HANDLED;
}

static int __init init_sensor(void)
{
    int err;
    rtdm_printk("initsensor");
    // error handling still to be added
    numero_interruption = gpio_to_irq(ECHO);
    if ((err = gpio_request(ECHO, THIS_MODULE->name)) != 0)
    {
        return err;
    }
    if ((err = gpio_direction_input(ECHO)) != 0)
    {
        gpio_free(ECHO);
        return err;
    }
    irq_set_irq_type(numero_interruption, IRQF_TRIGGER_RISING);
    if ((err = rtdm_irq_request(&irq_rtdm, numero_interruption, handler_interruption, RTDM_IRQTYPE_EDGE, THIS_MODULE->name, NULL)) != 0)
    {
        gpio_free(ECHO);
        gpio_free(TRIGGER);
        return err;
    }
    rtdm_irq_enable(&irq_rtdm);
    rtdm_mutex_init(&periode_mutex);
    if ((err = gpio_request(TRIGGER, THIS_MODULE->name)) != 0)
    {
        return err;
    }
    if ((err = gpio_direction_output(TRIGGER, 1)) != 0)
    {
        gpio_free(TRIGGER);
        return err;
    }
    err = rtdm_task_init(&trigger_task, "send_trigger_task", triggertask, NULL, 99, 0);
    if (err != 0)
    {
        gpio_free(TRIGGER);
    }
    return err;
}

static void __exit exit_sensor(void)
{
    // stop
    end = 1;
    rtdm_task_join_nrt(&trigger_task, 100);
    rtdm_irq_disable(&irq_rtdm);
    rtdm_irq_free(&irq_rtdm);
    gpio_free(TRIGGER);
    gpio_free(ECHO);
}

module_init(init_sensor);
module_exit(exit_sensor);
MODULE_LICENSE("GPL");
thank you guys!
The symbol __aeabi_uldivmod is required for modulo (and division) operations on 64-bit integers. A call to this function is generated automatically by gcc when it encounters such operations. While the user-space library (libgcc) implements this function, the kernel does not provide it on all architectures.
When the divisor is a predefined constant, the division/modulo operation may be replaced with a multiplication by a magic number followed by a shift. Here you may calculate the magic number for your case.
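In kernel code, the usual workaround for a 64-bit division is the do_div() helper from asm/div64.h, which divides a 64-bit value by a 32-bit value in place without emitting a call to __aeabi_uldivmod. A minimal sketch of how the handler's calculation could be rewritten (untested, keeping the original factor of 56):

#include <asm/div64.h> /* do_div */

u64 tmp = time2 - time1;  /* difference in nanoseconds */
do_div(tmp, 1000);        /* do_div(n, base) divides n in place and returns the remainder */
do_div(tmp, 56);
centimeter = (int)tmp;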

What is the time complexity of crypt function in Linux?

The crypt function, used on Unix for authentication, is described as below:
char *crypt(const char *key, const char *salt);
Assume that I have a key (of length n) and a salt (of length m). What is the time complexity (order of the algorithm) of calling this function?
From the man page of crypt:
salt is a two-character string chosen from the set [a-zA-Z0-9./]. This string is used to perturb the algorithm in one of 4096 different ways.
and
By taking the lowest 7 bits of each of the first eight characters of the key, a 56-bit key is obtained.
The key thus obtained is then used to encrypt a constant string (using a tweaked DES algorithm), which takes constant time. Therefore, the function has constant run time for any valid arguments. Note that this truncation leads to very weak passwords.
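The truncation is easy to demonstrate; in this sketch (link with -lcrypt), two keys that share their first eight characters produce identical DES hashes:

#define _GNU_SOURCE /* crypt */
#include <stdio.h>  /* puts */
#include <unistd.h> /* crypt */

int main(void)
{
    /* DES crypt only looks at the first 8 characters of the key. */
    puts(crypt("password1234", "ab"));
    puts(crypt("passwordXYZ!", "ab")); /* prints the same hash as above */
    return 0;
}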
As commented by melpomene, some implementations provide an extension to the crypt function that allows selecting a more secure mode. For the following, I will assume you are using the crypt function from the GNU C library. The manual says:
For the MD5-based algorithm, the salt should consist of the string $1$, followed by up to 8 characters, terminated by either another $ or the end of the string. The result of crypt will be the salt, followed by a $ if the salt didn't end with one, followed by 22 characters from the alphabet ./0-9A-Za-z, up to 34 characters total. Every character in the key is significant.
Since the length of the salt is fixed by a constant and the cryptographic hash function has time complexity linear in the length of its input, the overall time complexity of the crypt function will be linear in the length of key.
My version of glibc also supports the more secure SHA-256 (selected via $5$) and SHA-512 (selected via $6$) cryptographic hash functions in addition to MD5. These have linear time complexity in the length of their input, too.
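The hash method is selected purely by the salt prefix, so switching algorithms is just a matter of the salt string; for example (the salt values here are made up):

puts(crypt("key", "ab"));           /* traditional DES */
puts(crypt("key", "$1$saltsalt$")); /* MD5-based */
puts(crypt("key", "$5$saltsalt$")); /* SHA-256 */
puts(crypt("key", "$6$saltsalt$")); /* SHA-512 */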
Since I cannot make sense out of the task I'm actually supposed to do right now, I have timed the various crypt methods to support the above analysis. Here are the results.
[Figure: crypt(3) benchmark timings (timings.png)]
Plotted are the execution times spent in the crypt function against the length of the key string. Each data series is overlaid with a linear regression, except for DES, where the average value is plotted instead. I am surprised that SHA-512 is actually faster than SHA-256.
The code used for the benchmarks is here (benchmark.c).
#define _GNU_SOURCE /* crypt */
#include <errno.h> /* errno, strerror */
#include <stdio.h> /* FILE, fopen, fclose, fprintf */
#include <stdlib.h> /* EXIT_{SUCCESS,FAILURE}, malloc, free, [s]rand */
#include <string.h> /* size_t, strlen */
#include <assert.h> /* assert */
#include <time.h> /* CLOCKS_PER_SEC, clock_t, clock */
#include <unistd.h> /* crypt */
/* Barrier to stop the compiler from re-ordering instructions. */
#define COMPILER_BARRIER asm volatile("" ::: "memory")
/* First character in the printable ASCII range. */
static const char ascii_first = ' ';
/* Last character in the printable ASCII range. */
static const char ascii_last = '~';
/*
Benchmark the time it takes to crypt(3) a key of length *keylen* with salt
*salt*. The result is written to the stream *ostr* so its computation cannot
be optimized away.
*/
static clock_t
measure_crypt(const size_t keylen, const char *const salt, FILE *const ostr)
{
    char *       key;
    const char * passwd;
    clock_t      t1;
    clock_t      t2;
    size_t       i;
    key = malloc(keylen + 1);
    if (key == NULL)
        return ((clock_t) -1);
    /*
      Generate a random key. The randomness is extremely poor; never do this in
      cryptographic applications!
    */
    for (i = 0; i < keylen; ++i)
        key[i] = ascii_first + rand() % (ascii_last - ascii_first);
    key[keylen] = '\0';
    assert(strlen(key) == keylen);
    COMPILER_BARRIER;
    t1 = clock();
    COMPILER_BARRIER;
    passwd = crypt(key, salt);
    COMPILER_BARRIER;
    t2 = clock();
    COMPILER_BARRIER;
    fprintf(ostr, "%s\n", passwd);
    free(key);
    return t2 - t1;
}
/*
The program can be called with zero or one arguments. The argument, if
given, will be used as salt.
*/
int
main(const int argc, const char *const *const argv)
{
    const size_t keymax = 2000;
    const size_t keystep = 100;
    const char * salt = "..";    /* default salt */
    FILE *       devnull = NULL; /* redirect noise to black hole */
    int          status = EXIT_SUCCESS;
    size_t       keylen;
    if (argc > 1)
        salt = argv[1];
    devnull = fopen("/dev/null", "w");
    if (devnull == NULL)
        goto label_catch;
    srand((unsigned) clock());
    for (keylen = 0; keylen <= keymax; keylen += keystep)
    {
        clock_t ticks;
        double  millis;
        ticks = measure_crypt(keylen, salt, devnull);
        if (ticks < 0)
            goto label_catch;
        millis = 1.0E3 * ticks / CLOCKS_PER_SEC;
        fprintf(stdout, "%16zu %e\n", keylen, millis);
    }
    goto label_finally;
label_catch:
    status = EXIT_FAILURE;
    fprintf(stderr, "error: %s\n", strerror(errno));
label_finally:
    if (devnull != NULL)
        fclose(devnull);
    return status;
}
The Gnuplot script used for the regression and plotting is here (plot.gplt).
set terminal 'svg'
set output 'timings.svg'
set xrange [0 : *]
set yrange [0 : *]
set key top left
set title 'crypt(3) benchmarks'
set xlabel 'key length / bytes'
set ylabel 'computation time / milliseconds'
des(x) = a_des
md5(x) = a_md5 + b_md5 * x
sha256(x) = a_sha256 + b_sha256 * x
sha512(x) = a_sha512 + b_sha512 * x
fit des(x) 'timings.des' via a_des
fit md5(x) 'timings.md5' via a_md5, b_md5
fit sha256(x) 'timings.sha256' via a_sha256, b_sha256
fit sha512(x) 'timings.sha512' via a_sha512, b_sha512
plot des(x) w l notitle lc '#75507b' lt 1 lw 2.5, \
'timings.des' w p t 'DES' lc '#5c3566' pt 7 ps 0.8, \
md5(x) w l notitle lc '#cc0000' lt 1 lw 2.5, \
'timings.md5' w p t 'MD5' lc '#a40000' pt 7 ps 0.8, \
sha256(x) w l notitle lc '#73d216' lt 1 lw 2.5, \
'timings.sha256' w p t 'SHA-256' lc '#4e9a06' pt 7 ps 0.8, \
sha512(x) w l notitle lc '#3465a4' lt 1 lw 2.5, \
'timings.sha512' w p t 'SHA-512' lc '#204a87' pt 7 ps 0.8
Finally, the Makefile used to hook everything together (GNUmakefile).
CC := gcc
CPPFLAGS :=
CFLAGS := -Wall -O2
LDFLAGS :=
LIBS := -lcrypt
all: benchmark timings.svg timings.png
benchmark: benchmark.o
	${CC} -o $@ ${CFLAGS} $^ ${LDFLAGS} ${LIBS}
benchmark.o: benchmark.c
	${CC} -c ${CPPFLAGS} ${CFLAGS} $<
timings.svg: plot.gplt timings.des timings.md5 timings.sha256 timings.sha512
	gnuplot $<
timings.png: timings.svg
	convert $< $@
timings.des: benchmark
	./$< '$(shell pwgen -ncs 2)' > $@
timings.md5: benchmark
	./$< '$$1$$$(shell pwgen -ncs 8)' > $@
timings.sha256: benchmark
	./$< '$$5$$$(shell pwgen -ncs 16)' > $@
timings.sha512: benchmark
	./$< '$$6$$$(shell pwgen -ncs 16)' > $@
clean:
	rm -f benchmark benchmark.o fit.log $(wildcard *.o timings.*)
.PHONY: all clean
