Enable/disable cache on Intel 64-bit machine: is the CD bit always set? - c

I'm trying to disable all levels of cache on my machine (Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz) under Xen. I wrote a tool that calls the following assembly code to disable/enable the cache and show the value of the CR0 register.
case XENMEM_disable_cache:
__asm__ __volatile__(
"pushq %%rax\n\t"
"movq %%cr0,%%rax\n\t"
"orq $0x40000000,%%rax\n\t"
"movq %%rax,%%cr0\n\t"
"movq %%cr0, %0\n\t"
"wbinvd\n\t"
"popq %%rax"
: "=r"(cr0)
:
:);
// gdprintk(XENLOG_WARNING, "gdprintk:XENMEM_disable_cache disable cache!
// TODO IMPLEMENT\n");
printk("<1>printk: disable cache! cr0=%#018lx\n", cr0);
rc = 0;
break;
case XENMEM_enable_cache:
__asm__ __volatile__(
"pushq %%rax\n\t"
"movq %%cr0,%%rax\n\t"
"andq $0xffffffffbfffffff,%%rax\n\t" /* ~0x40000000 */
"movq %%rax,%%cr0\n\t"
"movq %%cr0, %0\n\t"
"popq %%rax"
: "=r"(cr0)
:
:);
printk("<1>printk: enable cache; cr0=%#018lx\n", cr0);
rc = 0;
break;
case XENMEM_show_cache:
__asm__ __volatile__(
"pushq %%rax\n\t"
"movq %%cr0, %%rax\n\t"
"movq %%rax, %0\n\t"
"popq %%rax"
: "=r"(cr0)
:
:);
// gdprintk(XENLOG_WARNING, "gdprintk:XENMEM_show_cache_status! CR0 value is
// %#018lx\n", cr0);
printk("<1>printk: XENMEM_show_cache_status! CR0 value is %#018lx\n", cr0);
return (long)cr0;
The code compiles and runs. After I run the disable-cache code, the system becomes extremely slow, which confirms that the cache is disabled. In addition, the printed value of CR0 shows the CD bit set when I run the disable-cache code.
However, when I run the show-cache code, the output shows the CD bit of CR0 as 0, regardless of whether I have disabled or enabled the cache.
My question is:
Is the CD bit (bit 30) of the CR0 register always 1 when the cache is disabled?
If not, there must be something wrong with my code. Could you please help me find the error?
ANSWER:
The code above only sets the CD bit of CR0 on the core where it happens to run. We need to use smp_call_function() to run it on all cores!
My new question is:
If I disable the cache and then enable it using the code above, the CD bit of CR0 is cleared. However, the system is still very slow, just as it was with the cache disabled, so it looks as if the enable-cache code does NOT work. But since the CD bit has been cleared, it should have worked! So the question is: how long do I have to wait after enabling the cache before performance returns to what it was before I disabled it?
BTW, when I run the enable-cache code, the printk output shows that the CD bit of CR0 is 0.

If you're on an SMP system, you should invoke the disable-cache code on every core with smp_call_function(), since your show-cache code may well be running on a different processor from the one that set the bit. To use that function in Linux kernel code, #include <linux/smp.h>.
EDIT: smp_call_function() invokes the function it is given only on the other cores, not on the current one. To cover all cores, also call the function yourself on the core that invokes smp_call_function().
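For reference, the constants in the question's code can be checked with plain bit arithmetic. The following is a minimal user-space sketch, not kernel code: it never touches the real CR0, and the helper names (cr0_set_cd, cr0_clear_cd, cr0_cache_disabled) are hypothetical. It only shows that 0x40000000 is bit 30 (CD) and that the AND mask 0xffffffffbfffffff is its complement.

```c
#include <assert.h>
#include <stdint.h>

/* CR0 cache-control bits on x86: NW is bit 29, CD is bit 30. */
#define CR0_NW (UINT64_C(1) << 29)
#define CR0_CD (UINT64_C(1) << 30)

/* The constants used by the question's orq/andq instructions. */
static_assert(CR0_CD == UINT64_C(0x40000000), "CD is bit 30");
static_assert(~CR0_CD == UINT64_C(0xffffffffbfffffff), "mask clears only CD");

/* Value that would be written back to CR0 to disable caching. */
static inline uint64_t cr0_set_cd(uint64_t cr0)
{
    return cr0 | CR0_CD;
}

/* Value that would be written back to CR0 to re-enable caching. */
static inline uint64_t cr0_clear_cd(uint64_t cr0)
{
    return cr0 & ~CR0_CD;
}

/* Non-zero if a CR0 value (as printed by printk) has CD set. */
static inline int cr0_cache_disabled(uint64_t cr0)
{
    return (cr0 & CR0_CD) != 0;
}
```

Each core has its own CR0, so a value decoded this way describes only the core that printed it.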

Related

Is it possible to use an ARM AArch64 compiler to build inline ARMv7l assembly code?

I have a cpp file with inline ARMv7l asm code like this:
"pld [%1, #96] \n"
"vand q8, %q10, %q10 \n"
"vand q9, %q11, %q11 \n"
"vand q10, %q12, %q12 \n"
"vand q11, %q13, %q13 \n"
"vld1.f32 {d0-d1}, [%1]! \n"
"vld1.f32 {d2-d3}, [%2]! \n"
"vld1.f32 {d4-d5}, [%3]! \n"
"vld1.f32 {d6-d7}, [%4]! \n"
"vmul.f32 q12, q0, q9 \n"
"vmla.f32 q12, q1, q8 \n"
"vmul.f32 q13, q2, q9 \n"
"vmla.f32 q13, q3, q8 \n"
"vld1.f32 {d0-d1}, [%1]! \n"
"vld1.f32 {d2-d3}, [%2]! \n"
"vld1.f32 {d4-d5}, [%3]! \n"
"vld1.f32 {d6-d7}, [%4]! \n"
"vmul.f32 q12, q12, q11 \n"
"vmla.f32 q12, q13, q10 \n"
"vst1.f32 {d24-d25}, [%0]! \n"
Now I want to compile this cpp file for ARMv8-A, but I get errors like this:
[ 18s] {standard input}:2158: Error: unknown mnemonic `pld' -- `pld [x9,#96]'
[ 18s] {standard input}:2159: Error: unknown mnemonic `vand' -- `vand q8,q16,q16'
[ 18s] {standard input}:2160: Error: unknown mnemonic `vand' -- `vand q9,q17,q17'
[ 18s] {standard input}:2161: Error: unknown mnemonic `vand' -- `vand q10,q18,q18'
[ 18s] {standard input}:2162: Error: unknown mnemonic `vand' -- `vand q11,q19,q19'
[ 18s] {standard input}:2163: Error: unknown mnemonic `vld1.f32' -- `vld1.f32 {d0-d1},[x9]!'
As far as I have checked, ARMv8-A is compatible with the ARMv7l 32-bit instructions.
My questions are:
Is it possible to use an ARMv8-A compiler to compile this code?
If it is not possible, do I need to rewrite this inline asm code using ARMv8-A instructions?
Unlike with x86 where one GCC binary supports gcc -m32 / -m64 to build a 32-bit executable with the same compiler that can make a 64-bit executable, you need separate cross-compilers for ARM vs. AArch64.
(If you have clang, clang -target arm -c can at least compile an ARM object file, even with x86 clang: most clang builds support multiple back-end targets besides the one configured as the default.)
But if you're asking about using ARM32 assembly when compiling for AArch64, definitely not; you'll need to translate it yourself. Just like if you had a separate .S file. (GNU C inline asm literally works by emitting text into the .s file that GCC feeds to GAS, just expanding the %operand parts into text determined by GCC's choice for the operand constraint.)
It's usually fairly easy to port SIMD code from ARM32 to AArch64, but as far as getting the tools to accept it directly, you might as well be trying to feed them x86 or MIPS assembly.
If you can get the compiler to make decent asm from C intrinsics, do that, because it lets the same code compile for both modes.
But that's unfortunately not always the case for ARM. (Unlike x86 and PowerPC, where compilers generally do a good job with intrinsics, they can do pretty badly with ARM, especially if you need any horizontal operations or mixing and matching of 64-bit halves of 128-bit vectors.)
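To illustrate the intrinsics route, here is a minimal sketch of a multiply followed by a multiply-accumulate, the same vmul.f32/vmla.f32 pattern as in the question, written with arm_neon.h intrinsics that exist for both ARM32 and AArch64. The function name blend_f32 and its parameters are made up for the example, it is not a translation of the question's full kernel, and the scalar fallback is only there so that the file also builds on non-NEON targets.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = a[i] * wa + b[i] * wb for n floats; n must be a multiple of 4. */
static inline void blend_f32(float *out, const float *a, const float *b,
                             float wa, float wb, size_t n)
{
    assert(n % 4 == 0);
#if defined(__ARM_NEON)
    float32x4_t vwa = vdupq_n_f32(wa);          /* broadcast the weights */
    float32x4_t vwb = vdupq_n_f32(wb);
    for (size_t i = 0; i < n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* like vld1.f32 {d0-d1} */
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t acc = vmulq_f32(va, vwa);   /* like vmul.f32 */
        acc = vmlaq_f32(acc, vb, vwb);          /* like vmla.f32 */
        vst1q_f32(out + i, acc);                /* like vst1.f32 */
    }
#else
    /* Scalar fallback for targets without NEON. */
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * wa + b[i] * wb;
#endif
}
```

The same source builds with an ARM32 cross-compiler (with NEON enabled) and with an AArch64 one, and the compiler picks the register allocation and addressing modes for each target.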

How do I run assembly (.s) files in VSCode on the Linux subsystem

I have just got into ARM programming. I have learned a few basics but am having trouble running code. I use VSCode on the Linux subsystem (WSL) as my IDE.
I have nothing installed on my computer and I would like to run ARM code. I have read something online about "qemu" and "kernel" and so on, but I am not sure what they mean. It would be great if someone could provide a detailed walkthrough. I do not have a Raspberry Pi.
For example, how do I run the following division.s file in VSCode?
.global _start
_start:
MOV R1, #X
MOV R2, #Y
MOV R3, #Z
CMP R1, R2 @ is x>y ?
BGT _tryx
CMP R2, R3 @ is y>z ?
BGT _isy
MOV R4, R3
B _exit
_isy:
MOV R4, R2
B _exit
_tryx:
CMP R1, R3 @ is x>z ?
BGT _isx
MOV R4, R3
B _exit
_isx:
MOV R4, R1
_exit:
MOV R0, R4
MOV R7, #1
SWI 0
.data
.equ X, 3
.equ Y, 5
.equ Z, 4
Are there any extensions I need to install? Is there anything I need to download? I have used gcc to compile C code. Can it be used here too?
Thx in advance! :D
Your question is rather broad. That said, a slightly modified version of your program can be executed in WSL using the following procedure:
sudo apt-get install qemu-user
sudo mkdir -p /opt/arm/10
wget 'https://developer.arm.com/-/media/Files/downloads/gnu-a/10.2-2020.11/binrel/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf.tar.xz?revision=d0b90559-3960-4e4b-9297-7ddbc3e52783&la=en&hash=985078B758BC782BC338DB947347107FBCF8EF6B' -O gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf.tar.xz
sudo tar Jxf gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf.tar.xz -C /opt/arm/10
/tmp/division.s:
# count how often we can take Y from X
.global main
main:
MOV R1, #X
MOV R2, #Y
MOV R3, #0 @ Q
_loop:
CMP R1, R2
BLT _exit
SUB R1, R2
ADD R3, #1
B _loop
_exit:
MOV R0, R3
MOV R7, #1
SWI 0
.data
.equ X, 23
.equ Y, 4
Compiling:
/opt/arm/10/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf/bin/arm-none-linux-gnueabihf-gcc -static -o /tmp/division /tmp/division.s
Executing - WSL:
qemu-arm /tmp/division
echo $?
5
Which is the expected result, since 23 div 4 is 5.
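The loop in division.s can be cross-checked on the host with a few lines of C. This is only a sketch that mirrors the assembly. The function name div_by_subtraction is invented for the example, and the comments map each statement to the instruction it stands for.

```c
#include <assert.h>

/* Count how often y can be taken from x (requires x >= 0 and y > 0). */
static inline int div_by_subtraction(int x, int y)
{
    assert(x >= 0 && y > 0);
    int q = 0;          /* MOV R3, #0 */
    while (x >= y) {    /* CMP R1, R2 / BLT _exit */
        x -= y;         /* SUB R1, R2 */
        q += 1;         /* ADD R3, #1 */
    }
    return q;           /* MOV R0, R3: becomes the exit status */
}
```

With X = 23 and Y = 4 the loop body runs five times (23, 19, 15, 11, 7, leaving 3), which matches the exit status reported by echo $?.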
Executing - Windows 10:
C:\>c:\Windows\System32\bash -c "qemu-arm /tmp/division; echo $?"
5
C:\>
Or:
C:\>c:\Windows\System32\bash -c "qemu-arm /tmp/division"
C:\>echo %ERRORLEVEL%
5
Note that division.s could have been compiled on Windows 10 as well, by downloading and installing gcc-arm-10.2-2020.11-mingw-w64-i686-arm-none-linux-gnueabihf.tar.xz instead of gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf.tar.xz - your choice.
I will leave it to you to work out the details of using the information above to run your program from VSCode, your question being a bit too broad IMHO.
Update: division.s was compiled statically on purpose, to avoid having to specify the locations of the dynamic libraries it would otherwise need.
Compiling it without the -static option and executing it results in the following error message:
qemu-arm division
/lib/ld-linux-armhf.so.3: No such file or directory
It can be avoided by telling qemu-arm where to look for required dynamic libraries, /lib/ld-linux-armhf.so.3 in this case:
qemu-arm -L /opt/arm/10/gcc-arm-10.2-2020.11-x86_64-arm-none-linux-gnueabihf/arm-none-linux-gnueabihf/libc division
echo $?
5
That looks like a Linux program. And I guess you have a 64-bit x86 computer.
You want QEMU, specifically the executable qemu-arm, which emulates an ARM processor running Linux on another Linux-compatible system. In Debian-derived Linux distributions it is in the package qemu-user.
For a compiler, on Debian-derived Linux distributions you can install the gcc-arm-linux-gnueabihf package to compile for ARMv7 and ARMv8 with hardware floating point (Debian port name armhf), or gcc-arm-linux-gnueabi for some older ARMs with no hardware floating point (Debian port name armel).
If you wish to install libraries, you need to add the architecture to dpkg's list, for example:
dpkg --add-architecture armhf

Just bought STM32F446 but the STM32IDE is not doing what I expect

while (1)
{
/* USER CODE END WHILE */
/* USER CODE BEGIN 3 */
}
I installed STM32CubeMX and generated a project for STM32CubeIDE.
In the IDE I successfully built the project using: Project -> Build All
I'm expecting the default LED to stop blinking, given that my while loop is completely empty, but it's still blinking like crazy.
Try this. It is quite minimal and will test your tools and your ability to copy the file to the board. I assume this is a NUCLEO board.
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
stacktop: .word 0x20000100
.word reset
.word reset
.word reset
.word reset
.thumb_func
reset:
b .
build
arm-none-eabi-as flash.s -o flash.o
arm-none-eabi-ld -Ttext=0x08000000 flash.o -o flash.elf
arm-none-eabi-objdump -D flash.elf > flash.list
arm-none-eabi-objcopy -O binary flash.elf flash.bin
check build
cat flash.list
Disassembly of section .text:
08000000 <_start>:
8000000: 20000100
8000004: 08000015
8000008: 08000015
800000c: 08000015
8000010: 08000015
08000014 <reset>:
8000014: e7fe b.n 8000014 <reset>
Looks good, exactly what we want.
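The same check can be made on the bytes of flash.bin itself. The sketch below is a host-side helper, assuming the layout shown in the listing: word 0 is the initial stack pointer, word 1 is the reset vector, and the image is linked at 0x08000000. The names vector_table_looks_sane and FLASH_BASE are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_BASE UINT32_C(0x08000000)  /* matches -Ttext=0x08000000 */

/* Read a little-endian 32-bit word from a byte buffer. */
static inline uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the first two vector table words look plausible. */
static inline int vector_table_looks_sane(const uint8_t *img, size_t len)
{
    assert(img != NULL);
    if (len < 8)
        return 0;
    uint32_t sp = read_le32(img);         /* initial stack pointer */
    uint32_t reset = read_le32(img + 4);  /* reset handler address */
    uint32_t target = reset & ~UINT32_C(1);

    /* Cortex-M SRAM region is 0x20000000-0x3FFFFFFF. */
    int sp_in_sram = (sp >> 29) == 1;
    /* Bit 0 must be set: the handler is Thumb code. */
    int thumb_bit = (reset & 1u) != 0;
    /* The handler should live inside the image we are about to flash. */
    int in_image = target >= FLASH_BASE &&
                   (size_t)(target - FLASH_BASE) < len;

    return sp_in_sram && thumb_bit && in_image;
}
```

For the listing above, the stack pointer is 0x20000100 and the reset vector is 0x08000015, that is, the address of reset (0x08000014) with the Thumb bit set, so the check passes.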
Now copy flash.bin, from the command line or by drag and drop, to the virtual drive that is mounted when you plug in the NUCLEO board. This loads it into the target MCU on the board, the STM32F446, and should reset it; you will end up in this loop, with no blinking user LED.
As you make more complicated projects you simply expand on this: a bootstrap and a program, linked (and checked), and copied to the virtual flash drive.
I normally do a firmware upgrade of the debug end (the stlink plus thumb-drive part) when I get a new NUCLEO board. ST provides a Java-based tool that itself is not updated that often, but it checks the board against, I guess, a database at their site and can upgrade it. Depending on your host OS, I have had NUCLEOs that you could only copy to once or a few times before you had to unplug and replug: it would say no room left on the device, and after an unplug/replug it would be fine. With newer firmware versions for those boards, and with more recent boards, that problem has pretty much gone away.
Since the debug end also provides stlink capabilities, you can use openocd or other tools to stop and examine the device. With openocd, for example, when you telnet in to the openocd server you can use mdw 0x08000000 20 to examine the start of the user flash and see whether it matches the binary you think you have loaded on the device. Depending on the part, you can also write/erase that flash via openocd, but do not assume that support is always there for all ST or other branded parts. Writing to RAM and running from there (a different startup, not a vector table) is generally possible, but flashing requires someone to add support for each part or family to openocd.
As pointed out in the comments, either you are not building what you think you are, or you are not actually loading the program into the flash. Examine the binary and examine the flash to see what actually happened, if anything. The above should avoid all the barriers to success: CMSIS, IDE tools, STM32CubeMX, etc. Once you are able to succeed, work your way to the middle from both ends (between this trivial example and your project) and find where the real problem is/was. You should be able, for example, to use the IDE and all that stuff to build the binary, use the GNU tools to examine that binary, use hexdump or similar to examine the .bin file, and then drag and drop outside the IDE to program.

C Inline Asm Int 0x10

I'm attempting to write a function in C that prints strings to the screen. It's for a boot loader, so there are no external libraries or anything linked in. Here's my function:
void printString(const char* pStr) {
while(*pStr) {
__asm__ __volatile__ (
"movb 0x0e, %%ah\n"
"movb %[c], %%al\n"
"int $0x10\n"
:
: [c] "r" (*pStr)
: "ax"
);
++pStr;
}
}
When I run this, I don't get any errors in my VM. It just sits there with the cursor in the upper left corner of the screen. Any thoughts? I can produce an objdump -d if anyone thinks it will be helpful.
Okay after some helpful comments, I may just go with full assembly. Something like
Print:
push %ax
movb $0x0E, %ah # Set interrupt code
movb $0x00, %bh # Set page #
.loop:
lodsb # Load next char
test %al, %al # Check for \0
je .done
int $0x10 # Call interrupt
jmp .loop
.done:
pop %ax
ret
That should be 16-bit real mode compatible and can be assembled with GAS, which, as I understand it, works better than GCC for compiling 16-bit programs.
I think you're missing the point. The problem isn't your assembly code; the problem is that "int 10" is a BIOS call.
If you've already booted to an OS (e.g. Windows or Linux), then your x86 CPU is running in "protected mode"; and you probably don't have access to int 10 from user space ... unless something like a Windows command prompt emulates it for you.
As far as Linux/assembly programming in general, I strongly recommend this (free, on-line, very good) book:
Programming from the Ground Up, Jonathan Bartlett
Thank you for clarifying that you're writing a "boot loader". Strong suggestion: boot your custom code from a USB stick, or create a virtual DOS floppy to boot a DOS VM (with either VMware or VirtualBox, for example).
Here are some tutorials:
http://www.codeproject.com/Articles/36907/How-to-develop-your-own-Boot-Loader
https://cs.au.dk/~sortie/dopsys/osdev/
http://wiki.osdev.org/Rolling_Your_Own_Bootloader

How to write inline assembly in FreeDOS

I'm trying to write the following program to dump the interrupt vector table using FreeDOS in a virtual machine. I know that DEBUG will allow me to write an assembly program, but how do I create the following IVTDUMP.COM file, save it and run it?
Note: I'd like to try and do it all straight from FreeDOS if possible. I tried using the EDIT command but it errors out and I'm pretty sure I'm missing something.
for
(
address=IDT_255_ADDR;
address>=IDT_001_ADDR;
address=address-IDT_VECTOR_SZ,vector--
)
{
printf("%03d %08p ",vector,address);
__asm
{
PUSH ES
MOV AX,0
MOV ES,AX
MOV BX,address
MOV AX,ES:[BX]
MOV ipAddr,AX
INC BX
INC BX
MOV AX,ES:[BX]
MOV csAddr,AX
POP ES
};
printf("[CS:IP]=[%04X,%04X]\n",csAddr,ipAddr);
}
Things like for, address and printf are not part of assembly. You will have to rewrite that in actual assembly code, or copy the macros and assembler you want to use into your FreeDOS environment.
If you want to use the DEBUG included in FreeDOS, you can use the a command to start writing assembly instructions, the n command to give the file a name, and w to write the code to the file.
C:\>debug
-a
06BC:0100 int 20
06BC:0102
-n ivtdump.com
-rcx 2
-w
Writing 0002 bytes.
-q
C:\>
This sample program only exits through int 20. (The 2 after rcx is the length, in bytes, of the program to write to disk.)
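Before writing the real dump in assembly, it helps to pin down the layout that the question's loop walks over. In real mode, each interrupt vector is 4 bytes at linear address vector * 4: first the 16-bit offset (IP), then the 16-bit segment (CS), both little-endian. The sketch below decodes one entry from a byte buffer standing in for the first 1 KiB of memory. The names ivt_entry, ivt_entry_address and ivt_decode are invented for the example, and it runs on a copy of the table, not on live memory.

```c
#include <assert.h>
#include <stdint.h>

/* One real-mode interrupt vector: offset first, then segment. */
struct ivt_entry {
    uint16_t ip;
    uint16_t cs;
};

/* Linear address of a vector's entry; the table starts at 0000:0000. */
static inline uint32_t ivt_entry_address(unsigned vector)
{
    assert(vector <= 255);
    return vector * 4u;
}

/* Decode the entry for `vector` from a copy of the first 1024 bytes of memory. */
static inline struct ivt_entry ivt_decode(const uint8_t *mem, unsigned vector)
{
    const uint8_t *p = mem + ivt_entry_address(vector);
    struct ivt_entry e;
    e.ip = (uint16_t)(p[0] | (p[1] << 8));  /* MOV AX,ES:[BX]   */
    e.cs = (uint16_t)(p[2] | (p[3] << 8));  /* MOV AX,ES:[BX+2] */
    return e;
}
```

An assembly version for DEBUG would do the same two word reads per vector with ES set to 0, as the __asm block in the question does.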

Resources