Get EDID info in C (UEFI): read the ES:DI register? - c

I am developing an OS and I want to get the EDID from the monitor. I found some asm code (https://wiki.osdev.org/EDID) that returns the EDID in the ES:DI registers:
mov ax, 0x4f15
mov bl, 0x01
xor cx, cx
xor dx, dx
int 0x10
;AL = 0x4F if function supported
;AH = status (0 is success, 1 is fail)
;ES:DI contains the EDID
How can I get the AL, AH, and ES:DI values in a C file?
Actually, I am developing a 64-bit UEFI OS.
LoadGDT:
lgdt [rdi]
mov ax, 0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
pop rdi
mov rax, 0x08
push rax
push rdi
retfq
GLOBAL LoadGDT
I am able to run the asm code above and call it from C by declaring it as a global function.

That page on osdev.org contains code intended to be run when the CPU is in 16-bit real mode.
You can tell not only from the registers involved but also from the fact that int 10h is used.
This is a well-known BIOS interrupt service that is written in 16-bit real-mode code.
If you target UEFI, then your bootloader is actually a UEFI application, which is a PE32(+) image.
If the CPU is 64-bit capable, the firmware will switch into long mode (64-bit mode) and load your bootloader.
Otherwise, it will switch into protected mode (32-bit mode).
In any case, real mode is never used in UEFI.
You can call 16-bit code from protected/long mode with the use of a 16-bit code segment in the GDT/LDT, but you cannot call real-mode code (i.e. code written to work with real-mode segmentation) because segmentation works completely differently between the modes.
Plus, in real mode interrupts are dispatched through the IVT and not the IDT, so you would need to get the original entry point for interrupt 10h.
UEFI protocol EFI_EDID_DISCOVERED_PROTOCOL
Luckily, UEFI has a replacement for most basic services offered by the legacy BIOS interface.
In this case, you can use the EFI_EDID_DISCOVERED_PROTOCOL and eventually apply any override from the platform firmware with the use of EFI_EDID_OVERRIDE_PROTOCOL.
The EFI_EDID_DISCOVERED_PROTOCOL is straightforward to use, it's just a (Size, Data) pair.
typedef struct _EFI_EDID_DISCOVERED_PROTOCOL {
UINT32 SizeOfEdid;
UINT8 *Edid;
} EFI_EDID_DISCOVERED_PROTOCOL;
(from gnu-efi)
The format of the buffer Edid can be found in the VESA specification or even on Wikipedia.
As an example, I wrote a simple UEFI application with gnu-efi and x86_64-w64-mingw32 (a version of GCC and tools that target PEs).
I avoided using efilib.h in order to use gnu-efi just for the definitions of the structures related to UEFI.
The code sucks: it assumes at most 10 handles support the EDID protocol, and I wrote only a partial structure for the EDID data (because I got bored).
But this should be enough to get the idea.
NOTE: my VM didn't return any EDID information, so the code is not completely tested!
#include <efi.h>
//You are better off using this lib
//#include <efilib.h>
EFI_GUID gEfiEdidDiscoveredProtocolGuid = EFI_EDID_DISCOVERED_PROTOCOL_GUID;
EFI_SYSTEM_TABLE* gST = NULL;
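typedef struct _EDID14 {
UINT8 Signature[8];
UINT16 ManufacturerID; // stored big-endian in the EDID block
UINT16 ManufacturerCode;
UINT32 Serial;
UINT8 Week;
UINT8 Year;
UINT8 Major;
UINT8 Minor;
UINT8 InputParams; // a single byte at offset 20
UINT8 HSize; // horizontal screen size in cm
UINT8 VSize; // vertical screen size in cm
UINT8 Gamma;
//...Omitted...
} EDID14_RAW;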
VOID Print(CHAR16* string)
{
gST->ConOut->OutputString(gST->ConOut, string);
}
VOID PrintHex(UINT64 number)
{
CHAR16* digits = L"0123456789abcdef";
CHAR16 buffer[2] = {0, 0};
for (INTN i = 64-4; i >= 0; i-=4)
{
buffer[0] = digits[(number >> i) & 0xf];
Print(buffer);
}
}
VOID PrintDec(UINT64 number)
{
CHAR16 buffer[21] = {0};
INTN i = 19; // signed, so the i >= 0 test below is meaningful
do
{
buffer[i--] = L'0' + (number % 10);
number = number / 10;
}
while (number && i >= 0);
Print(buffer + i + 1);
}
#define MANUFACTURER_DECODE_LETTER(x) ( L'A' + ( (x) & 0x1f ) - 1 )
EFI_STATUS efi_main(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE* SystemTable)
{
EFI_STATUS Status = EFI_SUCCESS;
EFI_HANDLE EDIDHandles[10];
UINTN Size = sizeof(EFI_HANDLE) * 10;
EFI_EDID_DISCOVERED_PROTOCOL* EDID;
gST = SystemTable;
if ( EFI_ERROR( (Status = SystemTable->BootServices->LocateHandle(ByProtocol, &gEfiEdidDiscoveredProtocolGuid, NULL, &Size, EDIDHandles)) ) )
{
Print(L"Failed to get EDID handles: "); PrintHex(Status); Print(L"\r\n");
return Status;
}
for (UINTN i = 0; i < Size/sizeof(EFI_HANDLE); i++)
{
// Keep the result in Status so the error path below reports the real failure code
if (EFI_ERROR( (Status = SystemTable->BootServices->OpenProtocol(
EDIDHandles[i], &gEfiEdidDiscoveredProtocolGuid, (VOID**)&EDID, ImageHandle, NULL, EFI_OPEN_PROTOCOL_GET_PROTOCOL)) ) )
{
Print(L"Failed to get EDID info for handle "); PrintDec(i); Print(L": "); PrintHex(Status); Print(L"\r\n");
return Status;
}
if (EDID->SizeOfEdid == 0 || EDID->Edid == NULL)
{
Print(L"No EDID data for handle "); PrintDec(i); Print(L"\r\n");
continue;
}
/*
THIS CODE IS NOT TESTED!
! ! ! D O N O T U S E ! ! !
*/
EDID14_RAW* EdidData = (EDID14_RAW*)EDID->Edid;
// The manufacturer ID is stored big-endian, so swap its bytes on x86
UINT16 ManufacturerID = (UINT16)((EdidData->ManufacturerID >> 8) | (EdidData->ManufacturerID << 8));
CHAR16 Manufacturer[4] = {0};
Manufacturer[0] = MANUFACTURER_DECODE_LETTER(ManufacturerID >> 10);
Manufacturer[1] = MANUFACTURER_DECODE_LETTER(ManufacturerID >> 5);
Manufacturer[2] = MANUFACTURER_DECODE_LETTER(ManufacturerID);
Print(L"Manufacturer ID: "); Print(Manufacturer); Print(L"\r\n");
// HSize and VSize hold the physical screen size in centimeters, not a resolution
Print(L"Screen size (cm): "); PrintDec(EdidData->HSize); Print(L"x"); PrintDec(EdidData->VSize); Print(L"\r\n");
}
return Status;
}
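The EDID decoding above can be tried on a host machine without any firmware. The following is a minimal sketch, assuming a 128-byte base block in an ordinary byte buffer; the function names edid_base_block_valid and edid_manufacturer are made up for illustration and are not part of gnu-efi. It checks the fixed 8-byte header and the checksum (all 128 bytes must sum to 0 modulo 256) and decodes the three-letter manufacturer ID from bytes 8 and 9, which are stored big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; only the byte layout comes from
   the EDID format itself. */

/* Returns 1 if the 128-byte base block starts with the fixed EDID header
   and its bytes sum to 0 modulo 256, otherwise 0. */
int edid_base_block_valid(const uint8_t *edid, size_t size)
{
    static const uint8_t header[8] = {
        0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00
    };
    uint8_t sum = 0;

    if (edid == NULL || size < 128)
        return 0;
    for (size_t i = 0; i < sizeof header; i++)
        if (edid[i] != header[i])
            return 0;
    for (size_t i = 0; i < 128; i++)
        sum = (uint8_t)(sum + edid[i]);
    return sum == 0;
}

/* Decodes the manufacturer ID: three 5-bit letters (1 = 'A') packed into
   the big-endian 16-bit value at bytes 8 and 9. */
void edid_manufacturer(const uint8_t *edid, char out[4])
{
    uint16_t id = (uint16_t)((edid[8] << 8) | edid[9]);

    out[0] = (char)('A' + ((id >> 10) & 0x1F) - 1);
    out[1] = (char)('A' + ((id >> 5) & 0x1F) - 1);
    out[2] = (char)('A' + (id & 0x1F) - 1);
    out[3] = '\0';
}
```

In a UEFI application the same checks could be run on EDID->Edid, with EDID->SizeOfEdid as the size, before trusting any of the fields.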
ACPI
If you don't want to use these UEFI protocols you can use ACPI. Each display output device has a _DDC method that is documented in the ACPI specification and can be used to return the EDID data (either as a buffer of 128 or 256 bytes).
This method is conceptually simple but in practice it requires writing a full-blown ACPI parser (including the AML VM) which is a lot of work.
However, ACPI is necessary for modern OSes and so you can use it, later on, to get the EDID data without having to worry about UEFI protocols.

Related

INT 13 Extension Read in C

I can use the BIOS int 13h extended read function from assembly without problems,
with the code below:
; *************************************************************************
; Setup DISK ADDRESS PACKET
; *************************************************************************
jmp strtRead
DAPACK :
db 010h ; Packet Size
db 0 ; Always 0
blkcnt:
dw 1 ; Sectors Count
db_add :
dw 07e00h ; Transfer Offset
dw 0 ; Transfer Segment
d_lba :
dd 1 ; Starting LBA(0 - n)
dd 0 ; Bios 48 bit LBA
; *************************************************************************
; Start Reading Sectors using INT13 Func 42
; *************************************************************************
strtRead:
mov si, OFFSET DAPACK; Load DPACK offset to SI
mov ah, 042h ; Function 42h
mov dl, 080h ; Drive ID
int 013h; Call INT13h
I want to convert this into a C-callable function, but I have no idea how to pass the parameters (drive ID, sector count, buffer segment:offset, etc.) from C to asm.
I am using MSVC and MASM and working with nothing except BIOS functions.
Can anyone help?
Update:
I have tried the function below, but nothing is ever loaded into the buffer:
void read_sector()
{
static unsigned char currentMBR[512] = { 0 };
struct disk_packet //needed for int13 42h
{
byte size_pack; //size of packet must be 16 or 16+
byte reserved1; //reserved
byte no_of_blocks; //nof blocks for transfer
byte reserved2; //reserved
word offset; //offset address
word segment; //segment address
dword lba1;
dword lba2;
} disk_pack;
disk_pack.size_pack = 16; //set size to 16
disk_pack.no_of_blocks = 1; //1 block ie read one sector
disk_pack.reserved1 = 0; //reserved word
disk_pack.reserved2 = 0; //reserved word
disk_pack.segment = 0; //segment of buffer
disk_pack.offset = (word)&currentMBR[0]; //offset of buffer
disk_pack.lba1 = 0; //lba first 32 bits
disk_pack.lba2 = 0; //last 32 bit address
_asm
{
mov dl, 080h;
mov[disk_pack.segment], ds;
mov si, disk_pack;
mov ah, 42h;
int 13h
; jc NoError; //No error, ignore error code
; mov bError, ah; // Error, get the error code
NoError:
}
}
Sorry to post this as "answer"; I want to post this as "comment" but it is too long...
Different compilers have a different syntax of inline assembly. This means that the correct syntax of the following lines:
mov[disk_pack.segment], ds;
mov si, disk_pack;
... depends on the compiler used. Unfortunately I do not use 16-bit C compilers so I cannot help you in this point.
The next thing I see in your program is the following one:
disk_pack.segment = 0; //segment of buffer
disk_pack.offset = (word)&currentMBR[0]; //offset of buffer
With a 99% chance this will lead to a problem. Instead I would do the following:
struct disk_packet //needed for int13 42h
{
byte size_pack;
byte reserved;
word no_of_blocks; // note that this is 16-bit!
void far *data; // <- This line is the main change!
dword lba1;
dword lba2;
} disk_pack;
...
disk_pack.size_pack = 16;
disk_pack.no_of_blocks = 1;
disk_pack.reserved = 0;
disk_pack.data = &currentMBR[0]; // also note the change here
disk_pack.lba1 = 0;
disk_pack.lba2 = 0;
...
Note that some compilers name the keyword "_far" or "__far" instead of "far".
A third problem is that some (buggy) BIOSes require ES to be equal to the segment value from the disk_pack. A fourth one is that many compilers require the inline assembly code not to modify any registers (AX, CX and DX are normally OK).
These two could be solved the following way:
push ds;
push es;
push si;
mov dl, 080h;
// TODO here: Set ds:si to disk_pack in a compiler-specific way
mov es,[si+6];
mov ah, 42h;
int 13h;
...
pop si;
pop es;
pop ds;
In my opinion the "#pragma pack" should not be necessary because all elements in the structure are properly aligned.
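The layout claims above can be checked at compile time on any host compiler. This is a sketch under one loud assumption: host compilers have no far pointers, so the far pointer is split back into separate offset and segment fields, in the order a real-mode far pointer is stored (offset first). The helper fill_disk_packet is hypothetical. The assertions confirm that the packet is 16 bytes without any packing pragma and that the segment sits at byte 6, which is what mov es,[si+6] relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* INT 13h AH=42h disk address packet. Assumption: the far pointer from the
   answer is written out as offset + segment, because host compilers do not
   support the far keyword. */
struct disk_packet {
    uint8_t  size_pack;     /* size of this packet, 16 */
    uint8_t  reserved;      /* always 0 */
    uint16_t no_of_blocks;  /* 16-bit sector count */
    uint16_t offset;        /* buffer offset */
    uint16_t segment;       /* buffer segment */
    uint32_t lba1;          /* low 32 bits of the starting LBA */
    uint32_t lba2;          /* high 32 bits of the starting LBA */
};

/* Every field is naturally aligned, so no padding is inserted. */
static_assert(sizeof(struct disk_packet) == 16, "packet must be 16 bytes");
static_assert(offsetof(struct disk_packet, segment) == 6, "segment at byte 6");
static_assert(offsetof(struct disk_packet, lba1) == 8, "LBA at byte 8");

/* Hypothetical helper: fills a packet for reading `count` sectors at `lba`. */
void fill_disk_packet(struct disk_packet *p, uint16_t segment, uint16_t offset,
                      uint16_t count, uint32_t lba)
{
    p->size_pack = (uint8_t)sizeof *p;
    p->reserved = 0;
    p->no_of_blocks = count;
    p->offset = offset;
    p->segment = segment;
    p->lba1 = lba;
    p->lba2 = 0;
}
```

With a 16-bit compiler the same structure would be filled once and its address loaded into DS:SI before the int 13h call; how to do that remains compiler-specific, as noted above.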

Undefined reference to in os kernel linking

I have a problem. I am making a simple OS kernel with this tutorial: http://wiki.osdev.org/Bare_Bones#Linking_the_Kernel
But when I try to link the files boot.o and kernel.o, GCC returns this error:
boot.o: In function `start':
boot.asm:(.text+0x6): undefined reference to `kernel_main'
collect2.exe: error: ld returned 1 exit status.
sources of files:
boot.asm
; Declare constants used for creating a multiboot header.
MBALIGN equ 1<<0 ; align loaded modules on page boundaries
MEMINFO equ 1<<1 ; provide memory map
FLAGS equ MBALIGN | MEMINFO ; this is the Multiboot 'flag' field
MAGIC equ 0x1BADB002 ; 'magic number' lets bootloader find the header
CHECKSUM equ -(MAGIC + FLAGS) ; checksum of above, to prove we are multiboot
; Declare a header as in the Multiboot Standard. We put this into a special
; section so we can force the header to be in the start of the final program.
; You don't need to understand all these details as it is just magic values that
; is documented in the multiboot standard. The bootloader will search for this
; magic sequence and recognize us as a multiboot kernel.
section .multiboot
align 4
dd MAGIC
dd FLAGS
dd CHECKSUM
; Currently the stack pointer register (esp) points at anything and using it may
; cause massive harm. Instead, we'll provide our own stack. We will allocate
; room for a small temporary stack by creating a symbol at the bottom of it,
; then allocating 16384 bytes for it, and finally creating a symbol at the top.
section .bootstrap_stack
align 4
stack_bottom:
times 16384 db 0
stack_top:
; The linker script specifies _start as the entry point to the kernel and the
; bootloader will jump to this position once the kernel has been loaded. It
; doesn't make sense to return from this function as the bootloader is gone.
section .text
global _start
_start:
; Welcome to kernel mode! We now have sufficient code for the bootloader to
; load and run our operating system. It doesn't do anything interesting yet.
; Perhaps we would like to call printf("Hello, World\n"). You should now
; realize one of the profound truths about kernel mode: There is nothing
; there unless you provide it yourself. There is no printf function. There
; is no <stdio.h> header. If you want a function, you will have to code it
; yourself. And that is one of the best things about kernel development:
; you get to make the entire system yourself. You have absolute and complete
; power over the machine, there are no security restrictions, no safe
; guards, no debugging mechanisms, there is nothing but what you build.
; By now, you are perhaps tired of assembly language. You realize some
; things simply cannot be done in C, such as making the multiboot header in
; the right section and setting up the stack. However, you would like to
; write the operating system in a higher level language, such as C or C++.
; To that end, the next task is preparing the processor for execution of
; such code. C doesn't expect much at this point and we only need to set up
; a stack. Note that the processor is not fully initialized yet and stuff
; such as floating point instructions are not available yet.
; To set up a stack, we simply set the esp register to point to the top of
; our stack (as it grows downwards).
mov esp, stack_top
; We are now ready to actually execute C code. We cannot embed that in an
; assembly file, so we'll create a kernel.c file in a moment. In that file,
; we'll create a C entry point called kernel_main and call it here.
extern kernel_main
call kernel_main
; In case the function returns, we'll want to put the computer into an
; infinite loop. To do that, we use the clear interrupt ('cli') instruction
; to disable interrupts, the halt instruction ('hlt') to stop the CPU until
; the next interrupt arrives, and jumping to the halt instruction if it ever
; continues execution, just to be safe.
cli
.hang:
hlt
jmp .hang
kernel.c
#if !defined(__cplusplus)
#include <stdbool.h> /* C doesn't have booleans by default. */
#endif
#include <stddef.h>
#include <stdint.h>
/* Check if the compiler thinks if we are targeting the wrong operating system. */
#if defined(__linux__)
#error "You are not using a cross-compiler, you will most certainly run into trouble"
#endif
/* This tutorial will only work for the 32-bit ix86 targets. */
#if !defined(__i386__)
#error "This tutorial needs to be compiled with a ix86-elf compiler"
#endif
/* Hardware text mode color constants. */
enum vga_color
{
COLOR_BLACK = 0,
COLOR_BLUE = 1,
COLOR_GREEN = 2,
COLOR_CYAN = 3,
COLOR_RED = 4,
COLOR_MAGENTA = 5,
COLOR_BROWN = 6,
COLOR_LIGHT_GREY = 7,
COLOR_DARK_GREY = 8,
COLOR_LIGHT_BLUE = 9,
COLOR_LIGHT_GREEN = 10,
COLOR_LIGHT_CYAN = 11,
COLOR_LIGHT_RED = 12,
COLOR_LIGHT_MAGENTA = 13,
COLOR_LIGHT_BROWN = 14,
COLOR_WHITE = 15,
};
uint8_t make_color(enum vga_color fg, enum vga_color bg)
{
return fg | bg << 4;
}
uint16_t make_vgaentry(char c, uint8_t color)
{
uint16_t c16 = c;
uint16_t color16 = color;
return c16 | color16 << 8;
}
size_t strlen(const char* str)
{
size_t ret = 0;
while ( str[ret] != 0 )
ret++;
return ret;
}
static const size_t VGA_WIDTH = 80;
static const size_t VGA_HEIGHT = 25;
size_t terminal_row;
size_t terminal_column;
uint8_t terminal_color;
uint16_t* terminal_buffer;
void terminal_initialize()
{
terminal_row = 0;
terminal_column = 0;
terminal_color = make_color(COLOR_LIGHT_GREY, COLOR_BLACK);
terminal_buffer = (uint16_t*) 0xB8000;
for ( size_t y = 0; y < VGA_HEIGHT; y++ )
{
for ( size_t x = 0; x < VGA_WIDTH; x++ )
{
const size_t index = y * VGA_WIDTH + x;
terminal_buffer[index] = make_vgaentry(' ', terminal_color);
}
}
}
void terminal_setcolor(uint8_t color)
{
terminal_color = color;
}
void terminal_putentryat(char c, uint8_t color, size_t x, size_t y)
{
const size_t index = y * VGA_WIDTH + x;
terminal_buffer[index] = make_vgaentry(c, color);
}
void terminal_putchar(char c)
{
terminal_putentryat(c, terminal_color, terminal_column, terminal_row);
if ( ++terminal_column == VGA_WIDTH )
{
terminal_column = 0;
if ( ++terminal_row == VGA_HEIGHT )
{
terminal_row = 0;
}
}
}
void terminal_writestring(const char* data)
{
size_t datalen = strlen(data);
for ( size_t i = 0; i < datalen; i++ )
terminal_putchar(data[i]);
}
void kernel_main()
{
terminal_initialize();
/* Since there is no support for newlines in terminal_putchar yet, \n will
produce some VGA specific character instead. This is normal. */
terminal_writestring("Hello\n");
}
It looks like you’re using GCC on Microsoft® Windows® (for example, with Cygwin), judging from the collect2.exe reference. This means your native executable format, which you appear to be using, prepends an underscore to C identifiers to keep them separate from assembly identifiers. Most object formats do this, but the ELF format widespread under modern Unix does not.
If you change your call to _kernel_main, the link error will likely go away.
But please note this line, quoted from your question:
#error "This tutorial needs to be compiled with a ix86-elf compiler"
You’re violating a basic tenet of the tutorial you’re using. I suggest you get a GNU/Linux or BSD VM for i386 (32-bit), and run the tutorial within that.
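Whether a given toolchain prepends an underscore can be checked from C. The sketch below assumes GCC or Clang: both predefine the macro __USER_LABEL_PREFIX__, and both accept an asm label on a declaration to pin the symbol name. The function user_label_prefix and the call counter are made up for illustration. With the asm label in place, the assembly file could keep calling kernel_main regardless of the prefix, although using a proper ix86-elf cross-compiler remains the cleaner fix.

```c
#include <assert.h>
#include <string.h>

#define STRINGIFY2(x) #x
#define STRINGIFY(x) STRINGIFY2(x)

/* Assumption: GCC/Clang. Returns "_" on targets that prefix C symbols
   with an underscore and "" on ELF targets. */
const char *user_label_prefix(void)
{
    return STRINGIFY(__USER_LABEL_PREFIX__);
}

/* Assumption: GCC-style asm labels. This pins the linker-visible name to
   exactly "kernel_main", with no prefix added by the compiler. */
void kernel_main(void) __asm__("kernel_main");

static int kernel_main_calls; /* illustration only */

void kernel_main(void)
{
    kernel_main_calls++;
}
```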

Probable instruction Cache Synchronization issue in self modifying code?

A lot of related questions <How is x86 instruction cache synchronized? > mention that x86 should properly handle i-cache synchronization in self-modifying code. I wrote the following piece of code, which toggles a function call on and off from different threads, interleaved with its execution. I am using a compare-and-swap operation as an additional guard so that the modification is atomic. But I am getting intermittent crashes (SIGSEGV, SIGILL), and analyzing the core dump makes me suspect that the processor is trying to execute partially updated instructions. The code and the analysis are given below. Maybe I am missing something here; let me know if that's the case.
toggle.c
#include <stdio.h>
#include <inttypes.h>
#include <time.h>
#include <pthread.h>
#include <sys/mman.h>
#include <errno.h>
#include <unistd.h>
int active = 1; // Whether the function is toggled on or off
uint8_t* funcAddr = 0; // Address where function call happens which we need to toggle on/off
uint64_t activeSequence = 0; // Byte sequence for toggling on the function CALL
uint64_t deactiveSequence = 0; // NOP byte sequence for toggling off the function CALL
inline int modify_page_permissions(uint8_t* addr) {
long page_size = sysconf(_SC_PAGESIZE);
int code = mprotect((void*)(addr - (((uint64_t)addr)%page_size)), page_size,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (code) {
fprintf(stderr, "mprotect was not successfull! code %d\n", code);
fprintf(stderr, "errno value is : %d\n", errno);
return 0;
}
// If the 8 bytes we need to modify straddles a page boundary make the next page writable too
if (page_size - ((uint64_t)addr)%page_size < 8) {
code = mprotect((void*)(addr-((uint64_t)addr)%page_size+ page_size) , page_size,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (code) {
fprintf(stderr, "mprotect was not successfull! code %d\n", code);
fprintf(stderr, "errno value is : %d\n", errno);
return 0;;
}
}
return 1;
}
void* add_call(void* param) {
struct timespec ts;
ts.tv_sec = 0;
ts.tv_nsec = 50000;
while (1) {
if (!active) {
if (activeSequence != 0) {
int status = modify_page_permissions(funcAddr);
if (!status) {
return 0;
}
uint8_t* start_addr = funcAddr - 8;
fprintf(stderr, "Activating foo..\n");
uint64_t res = __sync_val_compare_and_swap((uint64_t*) start_addr,
*((uint64_t*)start_addr), activeSequence);
active = 1;
} else {
fprintf(stderr, "Active sequence not initialized..\n");
}
}
nanosleep(&ts, NULL);
}
}
int remove_call(uint8_t* addr) {
if (active) {
// Remove gets called first before add so we initialize active and deactive state byte sequences during the first call the remove
if (deactiveSequence == 0) {
uint64_t sequence = *((uint64_t*)(addr-8));
uint64_t mask = 0x0000000000FFFFFF;
uint64_t deactive = (uint64_t) (sequence & mask);
mask = 0x9090909090000000; // We NOP 5 bytes of CALL instruction and leave rest of the 3 bytes as it is
activeSequence = sequence;
deactiveSequence = deactive | mask;
funcAddr = addr;
}
int status = modify_page_permissions(addr);
if (!status) {
return -1;
}
uint8_t* start_addr = addr - 8;
fprintf(stderr, "Deactivating foo..\n");
uint64_t res = __sync_val_compare_and_swap((uint64_t*)start_addr,
*((uint64_t*)start_addr), deactiveSequence);
active = 0;
// fprintf(stderr, "Result : %p\n", res);
}
}
int counter = 0;
void foo(int i) {
// Use the return address to determine where we need to patch foo CALL instruction (5 bytes)
uint64_t* addr = (uint64_t*)__builtin_extract_return_addr(__builtin_return_address(0));
fprintf(stderr, "Foo counter : %d\n", counter++);
remove_call((uint8_t*)addr);
}
// This thread periodically checks if the method is inactive and if so reactivates it
void spawn_add_call_thread() {
pthread_t tid;
pthread_create(&tid, NULL, add_call, (void*)NULL);
}
int main() {
spawn_add_call_thread();
int i=0;
for (i=0; i<1000000; i++) {
// fprintf(stderr, "i : %d..\n", i);
foo(i);
}
fprintf(stderr, "Final count : %d..\n\n\n", counter);
}
Core dump analysis
Program terminated with signal 4, Illegal instruction.
#0 0x0000000000400a28 in main () at toggle.c:123
(gdb) info frame
Stack level 0, frame at 0x7fff7c8ee360:
rip = 0x400a28 in main (toggle.c:123); saved rip 0x310521ed5d
source language c.
Arglist at 0x7fff7c8ee350, args:
Locals at 0x7fff7c8ee350, Previous frame's sp is 0x7fff7c8ee360
Saved registers:
rbp at 0x7fff7c8ee350, rip at 0x7fff7c8ee358
(gdb) disas /r 0x400a28,+30
Dump of assembler code from 0x400a28 to 0x400a46:
=> 0x0000000000400a28 <main+64>: ff (bad)
0x0000000000400a29 <main+65>: ff (bad)
0x0000000000400a2a <main+66>: ff eb ljmpq *<internal disassembler error>
0x0000000000400a2c <main+68>: e7 48 out %eax,$0x48
(gdb) disas /r main
Dump of assembler code for function main:
0x00000000004009e8 <+0>: 55 push %rbp
...
0x0000000000400a24 <+60>: 89 c7 mov %eax,%edi
0x0000000000400a26 <+62>: e8 11 ff ff ff callq 0x40093c <foo>
0x0000000000400a2b <+67>: eb e7 jmp 0x400a14 <main+44>
So, as can be seen, the instruction pointer seems to be positioned at an address inside the CALL instruction, and the processor is apparently trying to execute that misaligned instruction, causing an illegal instruction fault.
I think your problem is that you replaced a 5-byte CALL instruction with 5 1-byte NOPs. Consider what happens when your thread has executed 3 of the NOPs, and then your master thread decides to swap the CALL instruction back in. Your thread's PC will be three bytes in the middle of the CALL instruction and will therefore execute an unexpected and likely illegal instruction.
What you need to do is swap the 5-byte CALL instruction with a 5-byte NOP. You just need to find a multibyte instruction that does nothing (such as or'ing a register against itself) and if you need some extra bytes, prepend some prefix bytes such as a gs override prefix and an address-size override prefix (both of which will do nothing). By using a 5-byte NOP, your thread will be guaranteed to either be at the CALL instruction or past the CALL instruction, but never inside of it.
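As a sketch of that suggestion, the deactivated byte sequence can be built with a single 5-byte instruction instead of five 0x90 bytes. The encoding used here, 0F 1F 44 00 00, is the 5-byte NOP listed in the Intel and AMD manuals; the function name make_deactive_sequence is made up, and the 8-byte window (3 preceding bytes plus the 5-byte CALL) follows the layout used in the question. This only builds the bytes: it makes no claim about whether the 8-byte write is observed atomically by instruction fetch on another core.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 5-byte NOP: nopl 0x0(%rax,%rax,1) */
static const uint8_t NOP5[5] = {0x0F, 0x1F, 0x44, 0x00, 0x00};

/* Takes the 8 bytes that end at the return address (3 unrelated bytes
   followed by the 5-byte CALL) and returns the same window with the CALL
   replaced by one 5-byte NOP. The first 3 bytes are preserved. */
uint64_t make_deactive_sequence(uint64_t active_sequence)
{
    uint8_t bytes[8];
    uint64_t result;

    memcpy(bytes, &active_sequence, sizeof bytes);
    memcpy(bytes + 3, NOP5, sizeof NOP5);
    memcpy(&result, bytes, sizeof result);
    return result;
}
```

Because the replacement is one instruction, a thread is either before it or after it, never in the middle of it when the CALL is swapped back in.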
On 80x86 most calls use a relative displacement, not an absolute address. Essentially it's "call the code at here + < displacement >" and not "call the code at < address >".
For 64-bit code, the displacement of a direct call is 32 bits. It's never 64 bits (and, unlike jmp, call has no form with an 8-bit displacement).
For this 5-byte "call with 32-bit displacement" instruction, you'd be trashing 3 bytes before the call instruction, the call opcode itself, and the instruction's operand (the displacement).
However...
These aren't the only way to call. For example, you can call using a function pointer, where the address of the code being called is not in the instruction at all (but may be in a register or be a variable in memory). There's also an optimisation called "tail call optimisation" where a call followed by a ret is replaced with a jmp (likely with some additional stack diddling for passing parameters, cleaning up the caller's local variables, etc).
Essentially; your code is severely broken, you can't cover all the possible corner cases, you shouldn't be doing this to begin with, and you probably should be using a function pointer instead of self modifying code (which would be faster and easier and portable too).
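The function-pointer alternative can be sketched as follows, assuming a C11 compiler with <stdatomic.h>. The names foo_hook, set_foo_active and call_foo are hypothetical. Instead of patching the CALL instruction, the caller loads a pointer atomically and calls through it; the toggling thread only ever stores a complete, valid pointer, so there is no partially updated state to execute.

```c
#include <assert.h>
#include <stdatomic.h>

typedef void (*hook_fn)(int);

static int counter; /* touched only by the calling thread in this sketch */

static void foo(int i)
{
    (void)i;
    counter++;
}

/* Stand-in used while foo is toggled off. */
static void foo_disabled(int i)
{
    (void)i;
}

/* The pointer is the only shared state between the threads. */
static _Atomic(hook_fn) foo_hook = foo;

/* Called by the toggling thread. */
void set_foo_active(int active)
{
    atomic_store(&foo_hook, active ? foo : foo_disabled);
}

/* Called where the original code had the patched CALL. */
void call_foo(int i)
{
    hook_fn fn = atomic_load(&foo_hook);
    fn(i);
}
```

No mprotect call is needed with this approach, since the code pages are never written.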

Assembly, draw an image

I need to draw a QR code via assembly (Intel) + C (C99) in DOS, but it seems I have too little memory for it.
I tried to store image as bit array:
image
db 11111110b,
...
But I still got no result (Illegal read from 9f208c70, CS:IP 192:9f20734f). Now I don't know what to do. Here is the code I used:
module.asm:
[bits 16]
global setpixel
global setVM
global getch
global getPixelBlock
section .text
setVM:
push bp
mov bp, sp
mov ax, [bp+6]
mov ah, 0
int 10h
pop bp
ret
setpixel:
push bp
mov bp,sp
xor bx, bx
mov cx, [bp+6]
mov dx, [bp+10]
mov al, [bp+14]
mov ah, 0ch
int 10h
pop bp
ret
getch:
push bp
mov ah,0
int 16h
mov ah,0
pop bp
ret
getPixelBlock:
push bp
mov cx, [bp+6]
mov bx, image
add bx, cx
mov ax, [bx]
pop bp
ret
section .data
image
db 11111110b,
db 10011011b,
db 11111100b,
db 00010011b,
db 00010000b,
db 01101110b,
db 10110000b,
db 10111011b,
db 01110101b,
db 01100101b,
db 11011011b,
db 10100000b,
db 00101110b,
db 11000001b,
db 01110001b,
db 00000111b,
db 11111010b,
db 10101111b,
db 11100000b,
db 00011000b,
db 00000000b,
db 11010011b,
db 01011111b,
db 01101011b,
db 11100100b,
db 11101000b,
db 00110101b,
db 11001111b,
db 01001111b,
db 11100000b,
db 00011011b,
db 11010001b,
db 00100111b,
db 00000011b,
db 10000000b,
db 00000011b,
db 10001111b,
db 11111010b,
db 00100000b,
db 01010000b,
db 01000110b,
db 01011011b,
db 10111010b,
db 01001111b,
db 01010101b,
db 11010110b,
db 10001110b,
db 00101110b,
db 10010001b,
db 01111011b,
db 00000101b,
db 01100001b,
db 10001111b,
db 11101110b,
db 11000001b
main.c:
__asm(".code16gcc\n");
int run();
int _start()
{
return run();
} // Dirty hack to code as I used to
#include "ASM.inl"
#include "Painter.inl"
int run()
{
setVM(0x10);
_brushSize = 5;
drawLogo(30,30);
uint ret = (uint)getch();
return ret>>5;
}
ASM.inl
#ifndef __ASM_H__
#define __ASM_H__
typedef unsigned short int uint;
typedef unsigned char uchar;
void setpixel(uint x, uint y, uint color);
void setVM(uint vm);
uchar getch();
uchar getPixelBlock(uchar);
#endif /* __ASM_H__ */
Painter.inl:
/**
* You can create other colors by using bitwise or
*/
enum Color {
White = 0b0111,
Black = 0b0000,
Red = 0b0100,
Green = 0b0010,
Blue = 0b0001,
Bright = 0b1000,
};
int _brushSize = 5;
void rect(uint x, uint y, uint width, uint height, uint color)
{
uint i,j;
for (i=x; i<width+x; i++) {
for (j=y; j<height+y; j++) {
setpixel(i,j,color);
}
}
}
uint getColor(uchar element, uchar offset)
{
element = element & (1 << offset) >> offset;
return element ? Black : White;
}
void drawLogo(uint x, uint y)
{
uchar current;
uchar counter = 0;
for (uint i=0; i<21; i++) {
for (uint j=0; j<21; j++) {
counter = i*21+j;
current = getPixelBlock((uchar)counter/8);
rect(x+i*_brushSize, y+j*_brushSize, _brushSize, _brushSize, getColor(current, counter%8));
}
}
}
Compilation script:
#!/bin/bash
nasm -f elf32 module.asm -o module.o
gcc -std=c99 -m32 -ffreestanding -masm=intel -c main.c -o main.o
ld -m elf_i386 -Ttext=0x100 main.o module.o -o os.com
objcopy os.com -O binary
GCC version: 4.8.3 (Gentoo 4.8.3 p1.1, pie-0.5.9)
NASM version: 2.11.05
DOSBox version: 0.74
What am I doing wrong? Maybe I should write directly into graphics memory or something like that? Or maybe I should change the gcc optimizations?
The assembly code looks generally alright. You might want to check the interrupt calling sequences against the order of parameters on the stack by setting a breakpoint right on the int 10h and checking the register values. I haven't done that stuff for well over 20 years, and I'm rusty.
You have at least two probable operator precedence problems. I don't think these do the right thing.
element = element & (1 << offset) >> offset;
current = getPixelBlock((uchar)counter/8);
You have a hard-coded 'magic number': 21. I have no idea what that means.
After that, the question is: where did it crash? Time to get that debugger stoked up and paying for itself.
I meant to ask: why on earth write this stuff in assembly? You can easily call int 10h either directly from C, from embedded asm in C, or by a single wrapper function.
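The precedence problem flagged above is that shifts bind tighter than &, so element & (1 << offset) >> offset is parsed as element & ((1 << offset) >> offset), which tests bit 0 for any small offset. A minimal sketch of a fixed bit lookup follows. The name image_bit is made up, and the MSB-first bit order is an assumption about how the db rows were written, not something the question states. Note also that counter is an unsigned char in the question, which cannot hold indexes up to 21*21-1 = 440, so a wider index type is used here.

```c
#include <assert.h>
#include <stdint.h>

/* 21 modules per side; presumably a version 1 QR code (an assumption). */
enum { QR_SIZE = 21 };

/* Returns bit `index` (0 .. QR_SIZE*QR_SIZE-1) of a bit-packed image.
   Assumption: the leftmost binary digit of each byte is the first pixel. */
int image_bit(const uint8_t *image, unsigned index)
{
    uint8_t block = image[index / 8u];
    unsigned offset = 7u - (index % 8u);

    /* Shift first, then mask: the parentheses make the order explicit. */
    return (block >> offset) & 1;
}
```

The drawing loop would then call image_bit(image, i * QR_SIZE + j) and pick Black or White from the result.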
The way you define your data, with a trailing comma, introduces an extra byte with zero value. At least in my assembler!
I think you need to double the value of current in the drawLogo function in order to synchronize with the data.
The function getPixelBlock receives values from 0 to 55, which is 1 more than the data lines available!

Cache size estimation on your system?

I got this program from this link (https://gist.github.com/jiewmeng/3787223). I have been searching the web to gain a better understanding of processor caches (L1 and L2). I want to be able to write a program that lets me estimate the size of the L1 and L2 caches on my new laptop (just for learning purposes; I know I could check the spec).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define KB 1024
#define MB 1024 * 1024
int main() {
unsigned int steps = 256 * 1024 * 1024;
static int arr[4 * 1024 * 1024];
int lengthMod;
unsigned int i;
double timeTaken;
clock_t start;
int sizes[] = {
1 * KB, 4 * KB, 8 * KB, 16 * KB, 32 * KB, 64 * KB, 128 * KB, 256 * KB,
512 * KB, 1 * MB, 1.5 * MB, 2 * MB, 2.5 * MB, 3 * MB, 3.5 * MB, 4 * MB
};
int results[sizeof(sizes)/sizeof(int)];
int s;
/*for each size to test for ... */
for (s = 0; s < sizeof(sizes)/sizeof(int); s++)
{
lengthMod = sizes[s] - 1;
start = clock();
for (i = 0; i < steps; i++)
{
arr[(i * 16) & lengthMod] *= 10;
arr[(i * 16) & lengthMod] /= 10;
}
timeTaken = (double)(clock() - start)/CLOCKS_PER_SEC;
printf("%d, %.8f \n", sizes[s] / 1024, timeTaken);
}
return 0;
}
The output of the program on my machine is as follows. How do I interpret the numbers? What does this program tell me?
1, 1.07000000
4, 1.04000000
8, 1.06000000
16, 1.13000000
32, 1.14000000
64, 1.17000000
128, 1.20000000
256, 1.21000000
512, 1.19000000
1024, 1.23000000
1536, 1.23000000
2048, 1.46000000
2560, 1.21000000
3072, 1.45000000
3584, 1.47000000
4096, 1.94000000
You need direct access to memory.
I do not mean DMA transfer by this. Memory must be accessed by the CPU, of course (otherwise you are not measuring caches), but as directly as possible. Measurements will probably not be very accurate on Windows/Linux, because services and other processes can mess with the caches during runtime. Measure many times and average for better results (or use the fastest time, or filter the results together). For best accuracy use DOS and asm, for example:
rep + movsb,movsw,movsd
rep + stosb,stosw,stosd
so you measure the memory transfer and not something else, as your code does!
Measure the raw transfer times and plot a graph:
x axis is transfer block size
y axis is transfer speed
Zones with the same transfer rate correspond to one cache layer.
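The measurement idea above can be sketched in portable C, with memset standing in for rep stosd. This is an approximation under stated assumptions: the function name fill_rate_mb_per_s is made up, clock() measures CPU time with limited resolution, and nothing stops the OS or other processes from disturbing the caches. It fills one block repeatedly until a fixed total number of bytes has been written and returns the rate in MB/s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Fills a buffer of `block` bytes over and over until about `total` bytes
   have been written. Returns the rate in MB/s, or 0.0 if it cannot be
   measured (bad arguments, failed allocation, or elapsed time below the
   clock resolution). */
double fill_rate_mb_per_s(size_t block, size_t total)
{
    unsigned char *buf;
    volatile unsigned char sink = 0;
    size_t passes;
    clock_t start, end;
    double seconds;

    if (block == 0)
        return 0.0;
    buf = malloc(block);
    if (buf == NULL)
        return 0.0;
    passes = total / block;
    if (passes == 0)
        passes = 1;

    start = clock();
    for (size_t p = 0; p < passes; p++) {
        memset(buf, (int)(p & 0xFF), block);
        sink = buf[p % block]; /* discourages dropping the fill */
    }
    end = clock();
    (void)sink;
    free(buf);

    seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return ((double)passes * (double)block) / (1024.0 * 1024.0) / seconds;
}
```

Calling it for a range of block sizes (for example 1 KB up to 32 MB) and plotting rate against block size should show plateaus; where a plateau ends is an estimate of a cache size, not an exact value.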
[Edit1] I could not find my old source code for this, so I put something together just now in C++ for Windows:
Time measurement:
//---------------------------------------------------------------------------
double performance_Tms=-1.0, // counter period [ms]
performance_tms= 0.0; // measured time [ms]
//---------------------------------------------------------------------------
void tbeg()
{
LARGE_INTEGER i;
if (performance_Tms<=0.0) { QueryPerformanceFrequency(&i); performance_Tms=1000.0/double(i.QuadPart); }
QueryPerformanceCounter(&i); performance_tms=double(i.QuadPart);
}
//---------------------------------------------------------------------------
double tend()
{
LARGE_INTEGER i;
QueryPerformanceCounter(&i); performance_tms=double(i.QuadPart)-performance_tms; performance_tms*=performance_Tms;
return performance_tms;
}
//---------------------------------------------------------------------------
Benchmark (32bit app):
//---------------------------------------------------------------------------
DWORD sizes[]= // used transfer block sizes
{
1<<10, 2<<10, 3<<10, 4<<10, 5<<10, 6<<10, 7<<10, 8<<10, 9<<10,
10<<10, 11<<10, 12<<10, 13<<10, 14<<10, 15<<10, 16<<10, 17<<10, 18<<10,
19<<10, 20<<10, 21<<10, 22<<10, 23<<10, 24<<10, 25<<10, 26<<10, 27<<10,
28<<10, 29<<10, 30<<10, 31<<10, 32<<10, 48<<10, 64<<10, 80<<10, 96<<10,
112<<10,128<<10,192<<10,256<<10,320<<10,384<<10,448<<10,512<<10, 1<<20,
2<<20, 3<<20, 4<<20, 5<<20, 6<<20, 7<<20, 8<<20, 9<<20, 10<<20,
11<<20, 12<<20, 13<<20, 14<<20, 15<<20, 16<<20, 17<<20, 18<<20, 19<<20,
20<<20, 21<<20, 22<<20, 23<<20, 24<<20, 25<<20, 26<<20, 27<<20, 28<<20,
29<<20, 30<<20, 31<<20, 32<<20,
};
const int N=sizeof(sizes)/sizeof(sizes[0]); // number of used sizes
double pmovsd[N]; // measured transfer rate rep MOVSD [MB/sec]
double pstosd[N]; // measured transfer rate rep STOSD [MB/sec]
//---------------------------------------------------------------------------
void measure()
{
int i;
BYTE *dat; // pointer to used memory
DWORD adr,siz,num; // local variables for asm
double t,t0;
HANDLE hnd; // process handle
// enable priority change (huge difference)
#define measure_priority
// enable critical sections (no difference)
// #define measure_lock
for (i=0;i<N;i++) pmovsd[i]=0.0;
for (i=0;i<N;i++) pstosd[i]=0.0;
dat=new BYTE[sizes[N-1]+4]; // last DWORD +4 Bytes (should be 3 but i like 4 more)
if (dat==NULL) return;
#ifdef measure_priority
hnd=GetCurrentProcess(); if (hnd!=NULL) { SetPriorityClass(hnd,REALTIME_PRIORITY_CLASS); CloseHandle(hnd); }
Sleep(200); // wait to change take effect
#endif
#ifdef measure_lock
CRITICAL_SECTION lock; // lock handle
InitializeCriticalSectionAndSpinCount(&lock,0x00000400);
EnterCriticalSection(&lock);
#endif
adr=(DWORD)(dat);
for (i=0;i<N;i++)
{
siz=sizes[i]; // siz = actual block size
num=(8<<20)/siz; // compute n (times to repeat the measurement)
if (num<4) num=4;
siz>>=2; // size / 4 because of 32bit transfer
// measure overhead
tbeg(); // start time measurement
asm {
push esi
push edi
push ecx
push ebx
push eax
mov ebx,num
mov al,0
loop0: mov esi,adr
mov edi,adr
mov ecx,siz
// rep movsd // es,ds already set by C++
// rep stosd // es already set by C++
dec ebx
jnz loop0
pop eax
pop ebx
pop ecx
pop edi
pop esi
}
t0=tend(); // stop time measurement
// measurement 1
tbeg(); // start time measurement
asm {
push esi
push edi
push ecx
push ebx
push eax
mov ebx,num
mov al,0
loop1: mov esi,adr
mov edi,adr
mov ecx,siz
rep movsd // es,ds already set by C++
// rep stosd // es already set by C++
dec ebx
jnz loop1
pop eax
pop ebx
pop ecx
pop edi
pop esi
}
t=tend(); // stop time measurement
t-=t0; if (t<1e-6) t=1e-6; // remove overhead and avoid division by zero
t=double(siz<<2)*double(num)/t; // Byte/ms
pmovsd[i]=t/(1.024*1024.0); // MByte/s
// measurement 2
tbeg(); // start time measurement
asm {
push esi
push edi
push ecx
push ebx
push eax
mov ebx,num
mov al,0
loop2: mov esi,adr
mov edi,adr
mov ecx,siz
// rep movsd // es,ds already set by C++
rep stosd // es already set by C++
dec ebx
jnz loop2
pop eax
pop ebx
pop ecx
pop edi
pop esi
}
t=tend(); // stop time measurement
t-=t0; if (t<1e-6) t=1e-6; // remove overhead and avoid division by zero
t=double(siz<<2)*double(num)/t; // Byte/ms
pstosd[i]=t/(1.024*1024.0); // MByte/s
}
#ifdef measure_lock
LeaveCriticalSection(&lock);
DeleteCriticalSection(&lock);
#endif
#ifdef measure_priority
hnd=GetCurrentProcess(); if (hnd!=NULL) { SetPriorityClass(hnd,NORMAL_PRIORITY_CLASS); CloseHandle(hnd); }
#endif
delete[] dat; // array form, because dat was allocated with new[]
}
//---------------------------------------------------------------------------
The arrays pmovsd[] and pstosd[] hold the measured 32-bit transfer rates [MByte/sec]. You can configure the code by enabling or commenting out the two defines at the start of the measure function.
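The unit conversion inside measure() is easy to misread, so here is the same arithmetic pulled out into a small function. This is only a sketch: the name mbyte_per_sec and its parameters are hypothetical and do not exist in the code above.

```cpp
#include <cassert>

// Hypothetical helper mirroring the arithmetic used in measure().
// bytes : total bytes moved (block size * number of repetitions)
// t_ms  : measured time of the transfer loop [ms]
// t0_ms : measured time of the empty loop (overhead) [ms]
double mbyte_per_sec(double bytes, double t_ms, double t0_ms)
{
    assert(bytes >= 0.0);
    double t = t_ms - t0_ms;             // remove loop overhead
    if (t < 1e-6) t = 1e-6;              // avoid division by zero
    double byte_per_ms = bytes / t;      // Byte/ms
    // *1000 gives Byte/s, /(1024*1024) gives MByte/s,
    // and 1000/(1024*1024) == 1/(1.024*1024)
    return byte_per_ms / (1.024 * 1024.0);
}
```

For instance, 1 MByte (1048576 Byte) moved in 1000 ms net time comes out as 1 MByte/s.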
Graphical Output:
To maximize accuracy you can change the process priority class to maximum. You can also create the measurement thread with max priority (I tried it, but it actually messes things up) and add a critical section to it so that the test is not interrupted by the OS as often (I saw no visible difference with and without threads). If you want to use Byte transfers, then take into account that they use only 16-bit registers, so you need to add loop and address iterations.
PS.
If you try this on a notebook, you should warm up the CPU first to be sure that you measure at top CPU/memory speed. So no Sleeps. Some dummy loops before the measurement will do, but they should run for at least a few seconds. You can also synchronize this with a CPU frequency measurement and loop while the frequency is rising, stopping once it saturates.
The asm instruction RDTSC is best for this (but beware, its meaning has changed slightly on newer architectures).
If you are not under Windows, then change the functions tbeg,tend to your OS equivalents.
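One possible OS-independent replacement for tbeg,tend is sketched below. It assumes a C++11 compiler and uses std::chrono::steady_clock instead of QueryPerformanceCounter. The variable performance_t0 is introduced only for this sketch, and the clock resolution depends on the platform.

```cpp
#include <cassert>
#include <chrono>

// Sketch only: portable stand-ins for the Windows-specific tbeg()/tend().
static std::chrono::steady_clock::time_point performance_t0; // start time (assumed name)
static double performance_tms = 0.0;                         // measured time [ms]

void tbeg()
{
    performance_t0 = std::chrono::steady_clock::now();
}

double tend()
{
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    performance_tms =
        std::chrono::duration<double, std::milli>(t1 - performance_t0).count();
    assert(performance_tms >= 0.0);   // steady_clock never goes backwards
    return performance_tms;
}
```

The call pattern stays the same as in the benchmark: tbeg(); then the measured code; then t=tend();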
[edit2] further improvements of accuracy
After finally solving the problem of VCL affecting measurement accuracy, which I discovered thanks to this question (more about it here), you can improve accuracy by doing the following prior to the benchmark:
set process priority class to realtime
set process affinity to single CPU
so you measure just single CPU on multi-core
flush DATA and Instruction CACHEs
For example:
// before mem benchmark
DWORD_PTR process_affinity_mask=0; // GetProcessAffinityMask expects DWORD_PTR masks
DWORD_PTR system_affinity_mask =0;
HANDLE hnd=GetCurrentProcess();
if (hnd!=NULL)
{
// priority
SetPriorityClass(hnd,REALTIME_PRIORITY_CLASS);
// affinity
GetProcessAffinityMask(hnd,&process_affinity_mask,&system_affinity_mask);
process_affinity_mask=1;
SetProcessAffinityMask(hnd,process_affinity_mask);
GetProcessAffinityMask(hnd,&process_affinity_mask,&system_affinity_mask);
}
// flush CACHEs
for (DWORD i=0;i<sizes[N-1];i+=7)
{
dat[i]+=i;
dat[i]*=i;
dat[i]&=i;
}
// after mem benchmark
if (hnd!=NULL)
{
SetPriorityClass(hnd,NORMAL_PRIORITY_CLASS);
SetProcessAffinityMask(hnd,system_affinity_mask);
}
So the more accurate measurement looks like this:
Your lengthMod variable doesn't do what you think it does. You want it to limit the size of your data set, but you have 2 problems there:
Doing a bitwise AND with a power of 2 masks off all bits except the one that's on. For example, if lengthMod is 1k (0x400), then all indices lower than 0x400 (meaning i=1 to 63) simply map to index 0, so you'll always hit the cache. That's probably why the results are so fast. Instead use lengthMod - 1 to create a correct mask (0x400 --> 0x3ff, which masks just the upper bits and leaves the lower ones intact).
Some of the values for lengthMod are not a power of 2, so lengthMod-1 isn't going to work there, as some of the mask bits would still be zeros. Either remove them from the list, or use a modulo operation instead of lengthMod-1 altogether. See also my answer here for a similar case.
Another issue is that 16B jumps are probably not enough to skip a cache line, as most common CPUs work with 64-byte cache lines, so you get only one miss for every 4 iterations. Use (i*64) instead.
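A small sketch of the three points above. The helper names is_pow2, wrap_index and strided_offset are hypothetical and not taken from the question's code, and the 64-byte line size is an assumption that holds for most common CPUs but not all.

```cpp
#include <cassert>
#include <cstddef>

// true when n is a non-zero power of two
bool is_pow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Wrap offset i into a data set of `length` bytes.
// Buggy version from the question: i & length
//   e.g. 5 & 0x400 == 0, so every small offset collapses to 0.
std::size_t wrap_index(std::size_t i, std::size_t length)
{
    assert(length > 0);
    if (is_pow2(length)) return i & (length - 1); // 0x400 -> mask 0x3ff
    return i % length;                            // e.g. 1536: mask would have holes
}

// Offset touched in iteration i when stepping one (assumed 64-byte) cache line
// at a time, so that every access lands on a different line.
std::size_t strided_offset(std::size_t i, std::size_t length, std::size_t line = 64)
{
    return wrap_index(i * line, length);
}
```

With length 1536 the mask 1535 (0x5ff) has bit 9 cleared, so offset 512 would wrongly map to 0, while the modulo keeps it at 512.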

Resources