This may be a silly question, as I do not know much about this topic at all... It seems that user applications can talk directly to the GPU to render an image, for example using OpenGL, through Mesa and libdrm, where libdrm is a wrapper around various ioctl() calls, as illustrated in this graph. Does it mean that for every new frame of a 3D game, the game application needs to call ioctl() once (or maybe even twice if KMS needs to be reached)? That sounds like a lot of user/kernel-space boundary crossings (thinking of a 120 fps game).
libdrm is a user-space wrapper that provides fine-grained access to the underlying DRM/KMS driver features, such as modesetting or checking whether the plane being used is an overlay plane or a primary plane. libdrm implementations generally differ across CPU/GPU/OS combinations, as the hardware driver running in the kernel tends to support different sets of functionality beyond the standard ones. The standard way of working with libdrm is to open a DRM device node under /dev/ and perform libdrm function calls using the fd returned from open().
More often than not, the display compositor software for a particular OS, such as X11, Wayland, or a hardware composer, needs to be in control of the DRM device, which means unprivileged applications have no way of becoming the DRM master. Most libdrm modesetting functionality does not work if the application trying to use it is not the DRM master. The recommended practice, instead of using libdrm directly, is to use a standard graphics API such as OpenGL or Vulkan to prepare and render frames in your application.
The number of ioctls required to interact with the kernel DRM module is most likely not the biggest bottleneck you will face when trying to render high-FPS applications. The preferred way to run high-FPS applications while cooperating with the display compositor of the target system is to:
- Use a double- or triple-buffered setup for rendering, where the next buffer to be displayed is fully rendered before the current frame has finished being displayed.
- Take advantage of hardware acceleration wherever possible, e.g. for scaling, resizing, image format conversions and color space conversions.
- Pre-compute and reuse shader elements.
- Reuse texture elements as much as possible instead of computing many textures for every frame being rendered.
- Use vector/SIMD (SSE2/3/4, AVX, NEON) instructions wherever possible to take advantage of modern CPU pipelines.
Let’s say I have created a simple program in C, using GTK, that brings up a label. When this program is run using ./a.out from the command line, I am aware that a new process is forked, execve is called, etc.
But when exactly in the process of running a program is my GUI, with my label, drawn to the screen? At what point is X11 interfaced with? I am struggling to understand exactly when such GUIs are drawn in terms of the steps of the Linux process lifecycle.
To help illustrate my understanding, this is the link I am using to try and understand the general process lifecycle; however, it contains no information on when GUIs are drawn: http://glennastory.net/?p=870
The system for screen rendering has changed often on Linux. An extensive, though not entirely reliable, summary can be found at: https://en.wikipedia.org/wiki/Direct_Rendering_Infrastructure.
The information is all there but it assumes a certain level of knowledge from the reader. I can summarize what I understand from the link and give some further information.
In the classic X Window System architecture the X Server is the only process with exclusive access to the graphics hardware, and therefore the one which does the actual rendering on the framebuffer. All that X clients do is communicate with the X Server to dispatch rendering commands. Those commands are hardware independent, meaning that the X11 protocol provides an API that abstracts the graphics device so the X clients don't need to know or worry about the specifics of the underlying hardware. Any hardware specific code lives inside the Device Dependent X, the part of the X Server that manages each type of video card or graphics adapter and which is also often called the video or graphics driver.
What this says is that the X server is started as a privileged process (root). A non-root process communicates with the X server using Xlib. Xlib itself communicates with the X server using socket system calls. This allows secure communication with the X server (a form of secure IPC). The socket interface to the X server is device independent: your non-root process will call the same function regardless of the underlying graphics card.
The device independent portion of the X server makes calls into the device dependent portion. The device dependent portion is basically a user mode driver implementation. It is not the same as a kernel driver, because it is actually called by a user mode root process. The kernel driver is a separate entity that is loaded into the kernel at boot. The kernel driver is either written by some third party reading the graphics card documentation or, more likely, by the graphics card vendor itself. The kernel driver is nothing more than a fancy character device responding to ioctl system calls and doing PCI reads/writes on registers that make the graphics card perform DMA operations and interact with the screen.
The user mode portion of the driver (the device dependent portion of the X server) is also implemented by the graphics card vendor. It needs to be like this because the X server doesn't know (and it shouldn't) anything about how the driver works. The X server thus presents generic functions that are called by the device independent portion of it. It is these generic functions that are implemented by the graphics card vendor.
The rise of 3D rendering has shown the limits of this architecture. 3D graphics applications tend to produce large amounts of commands and data, all of which must be dispatched to the X Server for rendering. As the amount of inter-process communication (IPC) between the X client and X Server increased, the 3D rendering performance suffered to the point that X driver developers concluded that in order to take advantage of 3D hardware capabilities of the latest graphics cards a new IPC-less architecture was required. X clients should have direct access to graphics hardware rather than relying on a third party process to do so, saving all the IPC overhead. This approach is called "direct rendering" as opposed to the "indirect rendering" provided by the classical X architecture. The Direct Rendering Infrastructure was initially developed to allow any X client to perform 3D rendering using this "direct rendering" approach.
What this says is that the X server's indirect (through sockets) rendering was not good enough for 3D, so direct rendering was implemented. After all, if you install a driver on your system, you are completely exposed to whatever it is going to do, so you can count on a driver written by the hardware vendor to collaborate with the X server and render only where it should. This is exactly what happens. There are still kernel and user mode portions of the driver. The kernel portion stays the same, but the user mode portion becomes an implementation of a 3D rendering API such as OpenGL. This is exactly what is stated a bit further:
the DRI client —an X client performing "direct rendering"— needs a hardware specific "driver" able to manage the current video card or graphics adapter in order to render on it. These DRI drivers are typically provided as shared libraries to which the client is dynamically linked. Since DRI was conceived to take advantage of 3D graphics hardware, the libraries are normally presented to clients as hardware accelerated implementations of a 3D API such as OpenGL, provided by either the 3D hardware vendor itself or a third party such as the Mesa 3D free software project.
Afterwards, it says:
the X Server provides an X11 protocol extension —the DRI extension— that the DRI clients use to coordinate with both the windowing system and the DDX driver.[9] As part of the DDX (device dependent X) driver, it's quite common that the X Server process also dynamically links to the same DRI driver that the DRI clients, but to provide hardware accelerated 3D rendering to the X clients using the GLX extension for indirect rendering (for example remote X clients that can't use direct rendering). For 2D rendering, the DDX driver must also take into account the DRI clients using the same graphics device.
The X server still does indirect rendering with the GLX extension.
the access to the video card or graphics adapter is regulated by a kernel component called the Direct Rendering Manager (DRM).[10] Both the X Server's DDX driver and each X client's DRI driver must use DRM to access to the graphics hardware. DRM provides synchronization to the shared resources of the graphics hardware —resources such as the command queue, the card registers, the video memory, the DMA engines, ...— ensuring that the concurrent access of all those multiple competing user space processes don't interfere with each other. DRM also serves as a basic security enforcer that doesn't allow any X client to access the hardware beyond what it needs to perform the 3D rendering.
Like I said earlier, there is a kernel and a user mode portion of the graphics card driver. The DRM is partly generic (the same for every graphics card) and partly specific to one graphics card. This is made possible by how PCI devices work (all graphics cards today are PCI, if not all peripherals). PCI devices have some common registers that must be found on all devices; this represents the generic portion of the DRM. Among the common registers there are BAR registers that point to a device specific portion of the configuration space; this represents the specific portion of the DRM. The graphics card vendor thus provides a kernel module in the form of a character device. The kernel detects, using the PCI IDs of the graphics card, that it must use this kernel module to drive it. It then presents a virtual character device file to user mode which can be opened to do further ioctl operations on it.
The DRI2 extension provides other core operations for the DRI clients, such as finding out which DRM device and driver should they use (DRI2Connect) or getting authenticated by the X Server in order to be able to use the rendering and buffer facilities of the DRM device (DRI2Authenticate).
What this says is that the DRI clients still need to get authenticated with the X server before they can actually render (make DRM system calls).
At least, this is what I understand from the Wikipedia page. I cannot be completely sure, but it must be something along those lines.
I'm trying to understand OS theory as a whole. But here is a problem: I can't find any information on the net about how to switch to SVGA (or HDMI) to draw on a monitor. I already know we have 4x4096 KB allocated as video memory for VGA, but that is really limited if we want a 1080x720 resolution. So:
- How do I switch to SVGA (or HDMI)? Probably a syscall or I/O request?
- After that, how do I re-define the address of the video memory?
- Bonus question: how do I use hardware acceleration?
Thank you in advance for your answer, and sorry for any mistakes in my English.
The only common low-level standard for resolutions bigger than VGA is the VESA BIOS Extensions (VBE). I don't know how widely it's covered by UEFI backward compatibility, because it's almost never used nowadays.
BIOS extensions are like drivers built into the card itself. To switch the video mode, or to get a pointer to VRAM, you have to call the proper VBE service via its interrupt. The ROM "driver", designed to work with that particular hardware, performs the needed operations and returns the result.
Unfortunately, no hardware acceleration was covered by VBE, so it became more and more obsolete as GPUs became more and more important. No suitable replacement was developed, so if you want to work with bare hardware, you must know every video chip (or at least a chip family, if they're close enough) and write a driver for each one. If the PDFs are freely available, it's easy (I've worked with 3dfx; it's simple: write to port N, wait until bit R on port M becomes 1, etc.).
The problem is, you have to do it for every chip.
You can also read some Linux driver sources, if you want to see how all those ports and I/O are triggered.
I want to make an (extremely simple) operating system. I am currently learning about graphics cards.
This is what I know so far (please correct me if I am wrong):
A graphics card has two modes: a text mode, and a graphics mode.
You can write data to a graphics card using the BIOS (instead of accessing the graphics card directly).
What I want to do is to write directly to the graphics card's video memory without using BIOS (because I want to understand how things work). So I have the following questions:
How do I know the base address of the graphics card's video memory? Is this done by probing the PCI bus to get the base address, or is the base address fixed (just like the COM port base addresses are fixed, for example)?
Are all graphics cards accessed in the same way, or do I have to create device drivers for all available graphics cards?
Edit: I am using x86.
Introduction
Graphics cards are a very complex topic; I'm confident in saying that they are the most complex subsystem you'll find on a PC.
If you ever found yourself lost programming an xHCI (USB 3.0) controller or an old RTL8139A network interface card, then be prepared, because this is much more complex.
Graphics controllers are the product of a very competitive market - rarely does a vendor open its specifications, and when it does, the support is intentionally poor.
If you add that the hardware itself deals with codecs, audio (yes, audio streams too), programmable 3D pipelines, video signals and video outputs, surface formats, media formats, DMA and memory remapping, then you can see that programming a video card is not an easy task.
The better approach, in my opinion, is to "retrace the history" of the video cards.
Start from the MDA then move to CGA then EGA and finally to VGA.
The VGA legacy is still supported, the specifications can be found here or in the first part of this PDF from Intel.
You can program the VGA without the BIOS "easily" - meaning that it is an already well-known and documented hardware architecture (but not necessarily easy to configure).
I don't remember if the previous adapters were subsets of the VGA or not; if not, they probably aren't supported anymore.
You can try with a virtual machine or an emulator.
When you are satisfied with the VGA you can move to the SVGA.
Here comes the trouble: as Wikipedia confirms, VGA was the last truly standardised video card/adapter interface:
Unlike VGA—a purely IBM-defined standard—Super VGA was never formally defined.
The VESA organisation standardised a BIOS API called the Video BIOS Extensions to allow driverless OSes to use SVGA cards, but that's not what you were looking for.
You can try reverse engineering a VBE BIOS, but I think it will be a nightmare - a senseless stream of writes to I/O ports and MMIO registers.
Making sense of tens of configuration registers without any reference is almost impossible.
Note that we are still talking about 1998 technology up to this point.
After the VESA VBE effort, no more standard interfaces were published - the only reliable way to program a video card less than 20 years old is by signing an NDA with its vendor.
Luckily, more recently (well, not that recently anymore), Intel entered the market with its Intel GFX (a.k.a. Intel HD Graphics) cards.
Intel never aimed to manufacture top-notch video cards, not even close - so they can be open about their architecture, since that's not their core business.
The result is this marvellous set of Programming Reference Manuals that describe the functionality of their video cards.
Complete with the (traditionally) minimal information needed to program them.
In general, hobbyists stop before this point (at the SVGA checkpoint), because the hardware has become very complex and the effort required is huge.
For example, my Haswell integrated video card is documented with 17 PDFs of about 250 pages each (on average).
The display part is documented in a PDF of its own; the framebuffer has disappeared in favour of the display surface, and the display part of the hardware alone fills a large block diagram.
While this may not be very comprehensible, it should suffice to give an idea of the numerous technologies that a programmer must understand before programming a modern video card.
You can surely take a look at the Linux source code, but beware that the Linux kernel is not usually easy to understand even for simple controllers - it is not a toy OS; it is a real OS with its own APIs and interfaces that must fit the hardware interface (actually the other way around).
Furthermore, only the Intel and AMD video drivers are really open source, the others are either proprietary or just a bunch of undocumented code.
Brief outline of common VGA modes programming
If you just want to program the VGA (a very respectable task indeed!), you can start by setting video mode 03h (text mode) or 13h (graphics mode).
Video mode 03h
The frame buffer is at 0b8000h (physical address), usually accessed as 0b800h:0000h as it is handy to have a zero offset.
The screen is made up of 80x25 characters; each character occupies a word (16 bits) in the frame buffer.
The low byte is the character code - the character map used will associate a glyph to a code (e.g. 41h to A).
The high order byte is the attribute byte - the low nibble is the foreground colour, the high nibble is the background colour.
More information can be found in the EGA/CGA/VGA links above.
Video mode 13h
It is a graphics mode with 320x200 pixels; the frame buffer is at 0a0000h (physical address), usually accessed as 0a000h:0000h for the same reason as above.
Each pixel is a single byte, the value of the byte selects the colour of the pixel.
The default palette can be changed by programming the DAC registers (3c7h, 3c8h, 3c9h for the VGA adapter).
Answers
A graphics card has two modes: a text mode, and a graphics mode.
Not necessarily; today this distinction may not exist anymore.
The MDA had only a text mode.
EGA, CGA, VGA and SVGA had both.
The modern approach is to draw the text; however, during boot or in particular situations (e.g. a BSOD), a basic video driver in text mode is used.
This driver probably uses a BIOS service, since the regular video driver may not be available/reliable.
You can write data to a graphics cards using BIOS
Up to the SVGA era; after that, BIOS support was discontinued.
How do I know what is the base address of the video memory of the graphics card, is this done by probing the PCI bus to get the base address, or is the base address fixed (just like the COM ports base addresses is fixed for example)?
Video cards have, through history, been connected to the ISA, PCI, AGP and PCIe buses.
Only the ISA bus wasn't configurable (at least not initially); the others have configurable BARs (Base Address Registers) per function (the smallest addressable entity on the PCI bus).
In order to get the base address of the MMIO registers of a video card the PCI or PCIe bus must be enumerated and the standard registers in the configuration space must be read/set.
Dealing with PCIe is not as easy as dealing with PCI.
Note that not even the UARTs have a fixed address: they are configured by default to map to the legacy addresses (3f8h, 2f8h, 3e8h and 2e8h), but the hardware was (is?) in a SuperIO chip behind a PCI-to-LPC bridge that emulated a PCI-to-ISA bridge.
With the advent of the Intel platform hub architecture (i.e. the death of the north and south bridge) the SuperIO chip eventually made it into the PCH or moved behind the SPI controller.
Are all graphics cards accessed in the same way, or do I have to create device drivers for all available graphics cards?
Each graphics card is a beautiful, vicious creature on its own.
A device driver is needed for each model.
Some drivers can be reused for a whole family of models, but this is not true in general.
Having only used Direct3D and OpenGL on desktop computers, I have this concept in my head that whenever buffers need to be updated you need to send the data to the GPU with calls such as glBufferData()/glBufferSubData(), and that this type of thing should be minimised at all costs.
Since OpenGL ES targets embedded systems such as phones, and I don't think those devices have dedicated GPU RAM, I was wondering if such an API call does completely different things when compiled on one device (Android) vs on another (a Windows desktop computer). The calls seem the same or similar in OpenGL and OpenGL ES, and I was wondering if a program I write on Windows will work on a mobile device. My guess is that on a mobile device these function calls will load data into system RAM, whereas on a desktop it will be sent to the GPU RAM, if a GPU is available.
If this is the case, is it then an advantage for the mobile device that the data is only kept in one place and never sent anywhere else (i.e., to the GPU)?
The only significant difference is that in one case there is a DMA transfer over PCIe (desktop), while in the other case it's just a memcpy into a driver-owned buffer in system RAM (mobile).
In both cases it can be relatively expensive (e.g. memory allocation, data copy, and possibly a need for cache maintenance on some systems), so it should still be minimized whenever possible. It's far more efficient to upload resources to wherever they need to be at the start of a game level and then just reference them from then onwards.
I am currently working with the BeagleBone Black using Ubuntu, and I am trying to find some direction. I have created a C program that listens for SIGIO and runs read() to get the data on that line. From my research on the internet and in some books, it appears that this method is not very efficient: using a loop that listens for a signal is bad because of the large amount of context switching (it should be noted that this I/O line will be busy, so SIGIO will trigger at least 4 times a second, asynchronously). It was suggested to use hardware interrupts instead and have them trigger a response that takes the data from the line and places it into a register, preferably accessible from user space using Direct Memory Access. So the question remains: where can I look to get more info on how to do this? I find a lot of information on this topic, but most of it just talks about how the OS does interrupts or about using signals, which with a busy line is pretty taxing.
If you are that concerned about timings and latency, you should probably use a real-time system.
Fortunately, the BeagleBone Black has real-time processing cores on its SoC, called PRUs (Programmable Real-time Units).
If you are new to the concept of PRUs, you probably would like to start here; then, once you have understood the need for and purpose of the PRUs, the same website has some tutorials to get started.
With the latest software support, like remoteproc, rpmsg and the Beaglescope project, PRUs can be used quite easily once you have understood how they work.