How to efficiently draw to plain win32 windows using Direct2D and GDI - c

I've been working on a GUI toolkit for my future programming needs. It's basically reinventing the wheel, implementing many of the controls found in Windows' common controls, QT, and other frameworks. It's going to be used mainly by me.
Its main design guidelines are:
- implemented in plain C (not C++) and Win32 (GDI + Direct2D), with no other external dependencies
- easy to look at (even for a long time)
- customization similar to QT's CSS-based stylesheets
- easy to render (not much complex geometry)
- really good performance (no performance issues, even in large GUI projects)
It's been going quite well so far, and I have already implemented quite a few of the important (if fairly simple) controls. Right now, I am building my slider control, which can be a rotary slider (like QDial) or a horizontal or vertical bar slider.
While there are no obvious bugs that I have noticed during my testing, I am questioning the way I am rendering the control (using Direct2D and GDI).
Below you can find the commented draw code and the result it produces. I know it's not perfect by any means, but it works flawlessly for me. Please do not judge my coding style; this question is really not about that.
static int __Slider_Internal_DCDBufferDraw(Slider *sSlider) {
if (!sSlider->_Base.sDraw)
return ERROR_OK;
/* start timer */
LARGE_INTEGER t1, t2;
QueryPerformanceCounter(&t1);
/* appearance depends on enabled state of the control */
_Bool blIsEnabled = IsWindowEnabled(sSlider->_Base.hwWindow /* control's HWND instance */);
D2D1_ELLIPSE sInnerCircle = {
.point = { __SlR_C /* center of circle, essentially width / 2 */, __SlR_C + __ClH(sSlider) / 6.0f /* center + some offset */ },
.radiusX = __SlR_IR + 0.5f, /* IR = inner radius */
.radiusY = __SlR_IR + 0.5f
};
D2D1_ELLIPSE sOuterCircle = {
.point = { __SlR_C, __SlR_C },
.radiusX = __SlR_OR + 0.5f, /* OR = outer radius */
.radiusY = __SlR_OR + 0.5f
};
D2D1_BEGIN:
/*
Global struct "gl_sD2D1Renderer" contains a ID2D1Factory, a ID2D1DCRenderTarget (.sDCTarget), and a ID2D1SolidColorBrush (.sDCSCBrush).
Every control uses this DC to draw Direct2D content. Before anything is drawn, the DC is bound. Right now, I draw everything to a control instance-specific
HDC "sSlider->_Base.sDraw->hMemDC" in this function. In my actual WM_PAINT handler, I just BitBlt the memory bitmap. This
(1) removes flickering,
(2) improves draw speed for normal WM_PAINT commands, for example, when the client area of the window is uncovered/moved/etc.
In these cases, I just use the most recent representation without redrawing everything because the control
only changes its appearance in reaction to user input.
The brush is used to basically draw all the color information. It just gets its color changed every time it's needed.
The reason I am using a DC render target is because
(1) of its reusability (can be used for drawing all controls, without having to create separate render targets for each control instance)
(2) GDI compatibility (see "__Slider_Internal_DrawNumbersAndText"'s comment below to learn why I need it)
*/
ID2D1DCRenderTarget_BindDC(gl_sD2D1Renderer.sDCTarget, sSlider->_Base.sDraw->hMemDC, &sSlider->_Base.sClientRect);
ID2D1DCRenderTarget_BeginDraw(gl_sD2D1Renderer.sDCTarget);
ID2D1DCRenderTarget_Clear(gl_sD2D1Renderer.sDCTarget, &colBkgnd);
/* rotate the smaller circle by the current slider position (min ... max) */
D2D1_MATRIX_3X2_F sMatrix;
D2D1MakeRotateMatrix(sSlider->flPos, (D2D1_POINT_2F){ __SlR_C, __SlR_C}, &sMatrix);
ID2D1DCRenderTarget_SetTransform(gl_sD2D1Renderer.sDCTarget, &sMatrix);
/* draw the outer circle */
ID2D1SolidColorBrush_SetColor(gl_sD2D1Renderer.sDCSCBrush, blIsEnabled ? &colBtnSurf : &colBtnDisSurf);
ID2D1DCRenderTarget_FillEllipse(gl_sD2D1Renderer.sDCTarget, &sOuterCircle, (ID2D1Brush *)gl_sD2D1Renderer.sDCSCBrush);
ID2D1SolidColorBrush_SetColor(gl_sD2D1Renderer.sDCSCBrush, &colOutline);
ID2D1DCRenderTarget_DrawEllipse(gl_sD2D1Renderer.sDCTarget, &sOuterCircle, (ID2D1Brush *)gl_sD2D1Renderer.sDCSCBrush, 1.0f, NULL);
/* draw the inner circle */
ID2D1SolidColorBrush_SetColor(gl_sD2D1Renderer.sDCSCBrush, blIsEnabled ? (sSlider->_Base.wState & STATE_CAPTURE || sSlider->_Base.wState & STATE_MINSIDE ? &colBtnSelSurf : &colMark) : &colMarkDis);
ID2D1DCRenderTarget_FillEllipse(gl_sD2D1Renderer.sDCTarget, &sInnerCircle, (ID2D1Brush *)gl_sD2D1Renderer.sDCSCBrush);
ID2D1SolidColorBrush_SetColor(gl_sD2D1Renderer.sDCSCBrush, &colOutline);
ID2D1DCRenderTarget_DrawEllipse(gl_sD2D1Renderer.sDCTarget, &sInnerCircle, (ID2D1Brush *)gl_sD2D1Renderer.sDCSCBrush, 1.0f, NULL);
/* reset the transform */
ID2D1DCRenderTarget_SetTransform(gl_sD2D1Renderer.sDCTarget, &gl_sD2D1Renderer.sIdentityMatrix);
/* draw ticks using Direct2D */
__Slider_Internal_DrawTicks(sSlider, 0); /* draw small ticks */
__Slider_Internal_DrawTicks(sSlider, 1); /* draw big ticks */
/* Call EndDraw, check for render target errors, drop the render target if necessary, recreate it and "goto D2D1_BEGIN;" */
ID2D1DCRenderTarget_SafeEndDraw(gl_sD2D1Renderer.sDCTarget, NULL, NULL);
/*
Draw text using plain GDI (no DirectWrite because there is no functioning C-API.)
I have to do this here because I need to finish rendering the D2D content first. If I rendered GDI content in between Direct2D calls,
it would just be overdrawn, because drawing is actually done in "EndDraw" rather than in "DrawEllipse", "Clear", etc. Those calls just
build a batch, while GDI calls like "Ellipse" or "ExtTextOut" draw immediately.
*/
__Slider_Internal_DrawNumbersAndText(sSlider);
/* end timer */
QueryPerformanceCounter(&t2);
LARGE_INTEGER freq;
QueryPerformanceFrequency(&freq);
double elapsed = (double)(t2.QuadPart - t1.QuadPart) / (freq.QuadPart / 1000.0);
printf("Draw call of \"%s\" took: %g ms\n", sSlider->_Base.strID, elapsed);
return ERROR_OK; /* 0 */
}
static int __Slider_Internal_DrawTicks(Slider *sSlider, int dwType) {
/* BTC = big tick count */
/* STC = small tick count */
/* check if ticks can be drawn, return if, for instance, not all data is present or tick drawing is disabled */
if (!(dwType ? sSlider->wBTC : sSlider->wSTC) || !(sSlider->wType & (dwType ? SLO_BIGTICKS : SLO_SMALLTICKS)))
return ERROR_OK;
/* tick color */
ID2D1SolidColorBrush_SetColor(gl_sD2D1Renderer.sDCSCBrush, &colOutline); /* RGB(0, 0, 0) */
float flCurrPos = sSlider->sPosRange.flMin; /* start at the minimum possible angle for this slider */
/* calculate the step, i.e. angle to advance based on requested tick count and valid position (angle) range */
float flStep = (sSlider->sPosRange.flMax - sSlider->sPosRange.flMin) / (dwType ? sSlider->wBTC : sSlider->wSTC);
D2D1_MATRIX_3X2_F sMatrix; /* rotation matrix */
D2D1_POINT_2F sP1, sP2; /* start and end point of the line representing a tick */
D2D1_POINT_2F sCenter = {
__SlR_C + 0.5f,
__SlR_C + 0.5f
};
/* calculate tick dimensions given the type (= small or large) */
__Slider_getTickDimensions(sSlider, &sP1, &sP2, dwType);
int dwCount = 0;
do {
/* prevent drawing over big ticks */
if (!dwType && sSlider->wBTC && !(sSlider->wSTC % sSlider->wBTC)) /* guard against division by zero when there are no big ticks */
if (!(dwCount % (sSlider->wSTC / sSlider->wBTC)))
goto ADD_STEP; /* just advance, do not draw */
if (sSlider->wType & SLT_RADIAL) { /* only do this if our slider is a rotary knob */
/* use the rotation matrix to draw the ticks in the same manner the inner circle of the slider is drawn */
D2D1MakeRotateMatrix(flCurrPos, sCenter, &sMatrix);
ID2D1DCRenderTarget_SetTransform(gl_sD2D1Renderer.sDCTarget, &sMatrix);
}
ID2D1DCRenderTarget_DrawLine(gl_sD2D1Renderer.sDCTarget, sP1, sP2, (ID2D1Brush *)gl_sD2D1Renderer.sDCSCBrush, 1.0f, NULL);
ADD_STEP:
flCurrPos += flStep; /* advance current position by the previously computed step */
} while (dwCount++ < (dwType ? sSlider->wBTC : sSlider->wSTC));
return ERROR_OK;
}
static int __Slider_Internal_DrawNumbersAndText(Slider *sSlider) {
/* only draw numbers if the option is specified */
if (sSlider->wType & SLO_NUMBERS) {
float flPosX, flPosY;
CHAR strString[8] = { 0 }; /* number string buffer */
SIZE sExtends = { 0 };
/* the same as in "DrawTicks" */
float flAngle = sSlider->sPosRange.flMin;
int dwNumber = sSlider->sNRange.dwMin; /* first number in the number range */
float flAStep = (sSlider->sPosRange.flMax - sSlider->sPosRange.flMin) / sSlider->wBTC; /* angle step */
int dwNStep = (sSlider->sNRange.dwMax - sSlider->sNRange.dwMin) / sSlider->wBTC; /* number step */
do {
/* this should be clear what it does */
sprintf_s(strString, sizeof(strString), "%i", dwNumber);
GetTextExtentPoint32A(sSlider->_Base.sDraw->hMemDC, strString, (int)strlen(strString), &sExtends);
/* calculate text position around the outer circle */
/* gl_flBTL = big tick length, gl_flTDP = pitch between outer circle edge and tick start */
flPosX = cosf(__toRad(flAngle - 90.0f)) /* deg to rad */ * (__SlR_OR + gl_flTDP + gl_flBTL + 10.0f);
flPosY = sinf(__toRad(flAngle - 90.0f)) * (__SlR_OR + gl_flTDP + gl_flBTL + 10.0f);
TextOutA(
sSlider->_Base.sDraw->hMemDC,
(int)(__SlR_C - flPosX),
(int)(__SlR_C - flPosY - sExtends.cy / 2.0f),
strString,
(int)strlen(strString)
);
flAngle += flAStep;
dwNumber += dwNStep;
/* prevent overdrawing first number when 360 degrees range */
if (dwNumber == sSlider->sNRange.dwMax && sSlider->sPosRange.flMin == 0.0f && sSlider->sPosRange.flMax == 360.0f)
break;
} while (dwNumber <= sSlider->sNRange.dwMax);
}
/* draw the main slider text in the middle at the bottom edge of the control */
/* __Cl* = extends of the client area of the window (X = left, Y = top, W = right, H = bottom) */
TextOut(
sSlider->_Base.sDraw->hMemDC,
(__ClW(sSlider) - __ClX(sSlider)) / 2,
(__ClH(sSlider) - __ClY(sSlider)) / 2 + (int)__SlR_OR + 10,
sSlider->_Text.strText,
sSlider->_Text.dwLengthInChars
);
return ERROR_OK; /* 0 */
}
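The small-tick skip rule in __Slider_Internal_DrawTicks (skip a small tick wherever a big tick will land) is the subtlest piece of the code above, so here is an isolated sketch of that logic with a hypothetical helper name, including a guard for the wBTC == 0 case:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative re-statement of the skip rule from DrawTicks: when the
   small-tick count is an exact multiple of the big-tick count, every
   (wSTC / wBTC)-th small tick coincides with a big tick and is skipped. */
static bool small_tick_is_skipped(int dwCount, int wSTC, int wBTC)
{
    if (wBTC == 0 || wSTC % wBTC != 0)
        return false; /* counts don't line up; draw every small tick */
    return dwCount % (wSTC / wBTC) == 0;
}
```

With 20 small ticks and 4 big ticks, for example, every 5th small tick (indices 0, 5, 10, ...) is suppressed.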
With certain exemplary values given, it produces this result:
While I find the result visually pleasing and the rendering procedure relatively simple, I suspect it's slow. I have not noticed any performance issues yet; nevertheless, I have measured the time it takes to complete an entire draw call. Note that this is done every time the slider's appearance changes due to user input.
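For reference, the QueryPerformanceCounter arithmetic in the draw call reduces to a one-line conversion; this portable sketch (hypothetical helper name) shows the ticks-to-milliseconds formula:

```c
#include <assert.h>
#include <stdint.h>

/* milliseconds = delta_ticks / (ticks_per_second / 1000), as in the
   draw-call timing code above. */
static double ticks_to_ms(int64_t t1, int64_t t2, int64_t ticks_per_second)
{
    return (double)(t2 - t1) / ((double)ticks_per_second / 1000.0);
}
```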
I have also found that when I move the mouse slowly, the draw calls are way slower than when I move the mouse quickly.
Slow mouse movement:
Fast mouse movement:
The issue is that I create a separate memory DC for every control instance, which I later draw to using the code above. As far as I know, a process can use at most 10,000 GDI objects by default. I already use at least 2 per control (a DC and a bitmap). What if I have a really large GUI project with a lot going on? I never want to run into that limit.
That's why I was thinking of moving the paint code entirely into WM_PAINT and using the DC I get from "BeginPaint()" (so no extra memory DC and bitmap needed), basically forcing an entire repaint whenever it is called. That's where the speed issue comes into play, as WM_PAINT can be sent really frequently.
I know I can smartly repaint only what's needed, but the atomic primitive draw calls do not cost much; what takes most of the time is binding the DC and EndDraw.
I now have a dilemma: I want to be fast, but also not use more GDI objects than I absolutely have to. Not using a separate memory buffer per control is only an option if the draw described above is in fact not objectively slow.
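To put the handle budget in perspective, here is a back-of-the-envelope calculation (assuming the default 10,000-object limit and 2 GDI objects per buffered control; the helper name is made up):

```c
#include <assert.h>

/* Rough upper bound on buffered controls before the default per-process
   GDI handle limit is reached, ignoring all other GDI usage. */
static int max_buffered_controls(int gdi_limit, int handles_per_control)
{
    return gdi_limit / handles_per_control;
}
```

So roughly 5,000 controls fit in the default budget before anything else in the process consumes a handle.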
These are my questions:
Is my drawing code actually slow, or is it okay if redrawing the control takes around 1-5 ms on average?
What can I do to improve its performance if it is actually slow? (I have tried to buffer as much computational data as I can -- while it essentially doubles the control's memory footprint, it does not really do anything for performance.)
How is redrawing actually done in commercially available frameworks such as QT and wxWidgets?
I really hope it's clear what I want. If there are any questions, feel free to ask. This is not only about good code, but also about good design: I want to make sure I do not introduce major design flaws this early in the project.

Related

OpenGL - Is it efficient to call glBufferSubData (nearly) each frame?

I have a spritesheet that contains a simple sprite animation. I can successfully extract each animation frame and display it as a texture. However, I did that by calling glBufferSubData to change the texture coordinates (before the game loop, at initialization). I want to play a simple sprite animation using these extracted textures, and I plan to do it by changing the texture coordinates each frame (except when the animation is triggered by user input). This results in calling glBufferSubData almost every frame (to change the texture coordinates), and my question is: is this approach efficient? If not, how can I solve the issue? (On Reddit, I saw a comment saying that the goal in modern OpenGL should be to minimize traffic between CPU and GPU memory, and I guess my approach violates this goal.) Thanks in advance.
For anyone interested, here is my approach:
void set_sprite_texture_from_spritesheet(Sprite* sprite, const char* path, int x_offset, int y_offset, int sprite_width, int sprite_height)
{
float* uv_coords = get_uv_coords_from_spritesheet(path, x_offset, y_offset, sprite_width, sprite_height);
for (int i = 0; i < 8; i++)
{
/* 8 means that I am changing a total of 8 texture coordinates (2 for each of 4 vertices) */
edit_vertex_data_by_index(sprite, &uv_coords[i], (i / 2) * 5 + 3 + (i % 2 != 0));
/*
the last argument in this function gives the index of the desired texture coordinate
(5 is for stride, 3 for offset of texture coordinates in each row)
*/
}
free(uv_coords);
sprite->texture = load_texture(path); /* loads the texture -
since the UV coordinate is
adjusted based on the spritesheet
I am loading the entire spritesheet as
a texture.
*/
}
void edit_vertex_data_by_index(Sprite *sprite, float *data, unsigned int start_index)
{
glBindBuffer(GL_ARRAY_BUFFER, sprite->vbo);
glBufferSubData(GL_ARRAY_BUFFER, start_index * sizeof(float), sizeof(float), data); /* one float at a time; sizeof(data) would give the size of the pointer, not the element */
glBindBuffer(GL_ARRAY_BUFFER, 0);
/*
My concern is that if I call this almost every frame, it could be not efficient, but I am not sure.
*/
}
Editing buffers is fine. Literally every game has buffers that change every frame. Buffers are how you get the data to the GPU so it can render it! (And uniforms. Your driver is likely to secretly put uniforms in buffers though!)
Yes, you should minimize the amount of buffer updates. You should minimize everything, really. The less stuff the computer does, the faster it can do it! That doesn't mean you should avoid doing stuff entirely. It means you should only do as much stuff as you need to, instead of doing wasteful stuff that you don't need.
Every time you call an OpenGL function, the driver takes some time to check how to process your request, which buffer is bound, that it's big enough, that the GPU isn't using it at the same time, etc. You want to do as few calls as possible, because that way, the driver has to check all this stuff less often.
You are doing 8 separate glBufferSubData calls in this function. If you put the UV coordinates all next to each other in the buffer, you could update them all at once with 1 call. And if you have lots of animated sprites, you should try to put all of their UV coordinates in one big array, and update the whole array in one call - all the sprites at once.
And loading textures from paths is really slow. Maybe your program can load 100 textures per second but that still means you blew half your frame time budget on texture loading. The texture hasn't changed anyway so why would you load it again?
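As a sketch of the batching idea (all names hypothetical): pack the four vertices' UVs into one contiguous 8-float array so that a single glBufferSubData call can update them. The helper below only computes the array; the GL call itself is shown as a comment, since it needs a live context:

```c
#include <assert.h>

/* Computes the UV rectangle of frame `frame` in a spritesheet laid out
   as a cols x rows grid, writing 8 floats: (u,v) for TL, TR, BR, BL. */
static void fill_frame_uvs(int frame, int cols, int rows, float uv[8])
{
    float w = 1.0f / (float)cols, h = 1.0f / (float)rows;
    float u0 = (float)(frame % cols) * w;
    float v0 = (float)(frame / cols) * h;
    uv[0] = u0;     uv[1] = v0;      /* top-left */
    uv[2] = u0 + w; uv[3] = v0;      /* top-right */
    uv[4] = u0 + w; uv[5] = v0 + h;  /* bottom-right */
    uv[6] = u0;     uv[7] = v0 + h;  /* bottom-left */
    /* then, once per frame, a single update instead of eight:
       glBufferSubData(GL_ARRAY_BUFFER, uv_offset, 8 * sizeof(float), uv); */
}
```

This assumes the UVs are stored contiguously in the buffer rather than interleaved with positions, which is exactly the layout change the paragraph above suggests.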

Why is the third vertex of my first triangle always at the origin and the rest of my triangles not being drawn?

I've checked the results of everything and I've tidied up multiple bugs in my draw function already, but I still can't find the reason for the behavior described in the question title. I'm using OpenGL 1.4, 3D textures, vertex arrays, texture coordinate arrays, and glDrawArrays to draw models (from my model API) with textures (from my texture API) to the screen. Through looking at the results of everything (printfs), I've concluded the problem has to be in the block of code that actually draws everything, and not my code that fills these arrays with post animating vertex data (so I'm only posting the former to save on bloating this post).
The current color is used to achieve a per-window brightness effect. The variable msindex is already set to the number of model draw specifications before the loop begins. Vertex data and texture coordinate data for every model being drawn are all stuffed into one segment, and as you can see below, there are glVertexPointer and glTexCoordPointer calls on different parts of the start of it to register this data. The contents of this segment are tightly packed: three floats for the position of a vertex first, then three floats for its texture coordinates. There is multitexturing (up to two textures, specified much earlier in the model), but both textures share the same texture coordinates (which is why both calls to glTexCoordPointer specify the same location in memory). The while loop is meant to draw each individual model according to the information in the miaptr segment. Start is the starting index (in 6-float-wide units) into the overall vertex data segment for the first vertex of the model to be drawn, and count is the number of vertices. In my example case these are just 0 for start and 6 for count (an attempt to draw one model with two triangles). Type can be multiple things depending on the model, but in this case it is GL_TRIANGLES. I've tried other primitive types, but they all suffer from the same problem. Additionally, the texture being drawn is entirely opaque (and green), the brightness of the target window is always 1, and all the primitives are front-facing.
The following is my broken source code:
/* Enable/set global things. */
jgl.Viewport(
(GLint) x, (GLint) y, (GLsizei) width, (GLsizei) height
);
fvals[0] = (jWindowsptr + jGetCurrentWindow())->brightness;
jgl.Color4f(
(GLfloat) fvals[0],
(GLfloat) fvals[0],
(GLfloat) fvals[0],
1
);
jgl.Enable(GL_ALPHA_TEST);
jgl.Enable(GL_CULL_FACE);
jgl.CullFace(GL_BACK);
jgl.Enable(GL_DEPTH_TEST);
jgl.Enable(GL_POINT_SPRITE_ARB);
jgl.EnableClientState(GL_VERTEX_ARRAY);
const GLvoid *vaptrc = vaptr;
jgl.VertexPointer(3, GL_FLOAT, 12, vaptrc);
/* Color clearing is in here so I could see better while testing. */
jgl.Clear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
/* Enable/set per texture unit things. */
jgl.ActiveTexture(GL_TEXTURE0 + 1);
jgl.TexEnvi(
GL_POINT_SPRITE_ARB, GL_COORD_REPLACE_ARB, GL_TRUE
);
jgl.ClientActiveTexture(GL_TEXTURE0 + 1);
jgl.EnableClientState(GL_TEXTURE_COORD_ARRAY);
jgl.TexCoordPointer(3, GL_FLOAT, 12, (vaptrc + 3));
jgl.Enable(GL_TEXTURE_3D);
jgl.ActiveTexture(GL_TEXTURE0);
jgl.TexEnvi(
GL_POINT_SPRITE_ARB, GL_COORD_REPLACE_ARB, GL_TRUE
);
jgl.ClientActiveTexture(GL_TEXTURE0);
jgl.EnableClientState(GL_TEXTURE_COORD_ARRAY);
jgl.TexCoordPointer(3, GL_FLOAT, 12, (vaptrc + 3));
jgl.Enable(GL_TEXTURE_3D);
/* Pass #1. */
jgl.MatrixMode(GL_TEXTURE);
jgl.DepthFunc(GL_LESS);
jgl.AlphaFunc(GL_EQUAL, 1);
const GLfloat *tctm;
while (msindex > 0) {
msindex = msindex - 1;
jgl.ActiveTexture(GL_TEXTURE0);
jgl.BindTexture(
GL_TEXTURE_3D, (miaptr + msindex)->textureids[0]
);
if ((miaptr + msindex)->textureids[0] != 0) {
tctm
= (miaptr
+ msindex)->transformationmatrices[0];
jgl.LoadMatrixf(tctm);
}
jgl.ActiveTexture(GL_TEXTURE0 + 1);
jgl.BindTexture(
GL_TEXTURE_3D, (miaptr + msindex)->textureids[1]
);
if ((miaptr + msindex)->textureids[1] != 0) {
tctm
= (miaptr
+ msindex)->transformationmatrices[1];
jgl.LoadMatrixf(tctm);
}
jgl.DrawArrays(
(miaptr + msindex)->type,
(GLint) (miaptr + msindex)->start,
(GLsizei) (miaptr + msindex)->count
);
}
/* WIP */
/* Disable per texture unit things. */
jgl.ActiveTexture(GL_TEXTURE0 + 1);
jgl.ClientActiveTexture(GL_TEXTURE0 + 1);
jgl.DisableClientState(GL_TEXTURE_COORD_ARRAY);
jgl.Disable(GL_TEXTURE_3D);
jgl.ActiveTexture(GL_TEXTURE0);
jgl.ClientActiveTexture(GL_TEXTURE0);
jgl.DisableClientState(GL_TEXTURE_COORD_ARRAY);
jgl.Disable(GL_TEXTURE_3D);
/* WIP */
/* Disable global things. */
jgl.DisableClientState(GL_VERTEX_ARRAY);
jgl.Disable(GL_POINT_SPRITE_ARB);
jgl.Disable(GL_DEPTH_TEST);
jgl.Disable(GL_CULL_FACE);
jgl.Disable(GL_ALPHA_TEST);
Your description says that you have interleaved vertex attributes, with 3 floats for the position and 3 floats for the texture coordinates per vertex. This also matches the code you posted.
The values you pass as the stride to glVertexPointer() and glTexCoordPointer() do not match this, though. With 6 floats (3 for position + 3 for texture coordinate) per vertex, and a float being 4 bytes large, the stride should be 6 * 4 = 24. So all these calls need to use 24 for the stride:
jgl.VertexPointer(3, GL_FLOAT, 24, vaptrc);
jgl.TexCoordPointer(3, GL_FLOAT, 24, ...);
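The 24-byte figure can be checked with a struct mirroring the interleaved layout (illustrative only; the poster's data is a raw float segment, not a struct array):

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved vertex record: 3 position floats followed by 3 texture
   coordinate floats, tightly packed. */
struct vertex { float pos[3]; float tex[3]; };
```

sizeof(struct vertex) is 24, and the texture coordinates start 12 bytes in, matching the (vaptrc + 3) float offset in the posted code.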

How to realize the DRAWING processing in Processing?

We all know how to draw a line in Processing.
But when we draw a line, it is shown immediately.
What if I want to witness the drawing process, namely, to see the line moving forward, gradually completing a whole line?
Here's what I want to achieve: to DRAW several lines and curves which finally turn into some pattern.
So how can I make that happen? Using an array?
Many thanks.
In Processing, all of the drawing happens in a loop. An easy way to create animated sequences like you describe is to use frameCount to drive them, and the modulo operator % is a good way to create a loop. For example, to animate along the x axis:
void draw() {
float x = 50;
float y = 50;
float lineLength = 50;
int framesToAnimate = 60;
line(x,y,x+float(frameCount % framesToAnimate)/framesToAnimate*lineLength, y);
}
Note: strange things will happen if you don't cast/convert to a float.
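Expressed in C (the language used elsewhere in this thread), the frameCount-modulo trick boils down to a wrap-around fraction in [0, 1); the helper name is illustrative:

```c
#include <assert.h>

/* Fraction of the way through the current animation cycle. */
static float anim_fraction(int frameCount, int framesToAnimate)
{
    return (float)(frameCount % framesToAnimate) / (float)framesToAnimate;
}
```

The animated x offset is then this fraction times lineLength, exactly as in the line() call above.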
I use this pretty often to animate other features, such as the color:
fill(color(127 + sin(float(frameCount)/90)*127, 0, 0, 127));
If you want to get more advanced, try setting vectors and coordinates with PVector. There is a pretty good tutorial on Daniel Shiffman's site.
If you want your animation to be independent of the frame rate, you can use millis() instead. That returns the time since the sketch started, so you can make something happen at a given time in seconds.
For example:
long initialTime;
void setup(){
size(400,200);
initialTime = millis();
}
void draw() {
float x = 50;
float y = 50; //set the multiplier to adjust speed
line(x,y,x+(millis()-initialTime)*0.01, y); //10 px/sec
line(x,y+50,x+(millis()-initialTime)*0.05, y+50); //50 px/sec
line(x,y+100,x+(millis()-initialTime)*0.001, y+100); // 1 px/sec
}
There are also some animation libraries; I've seen impressive results with a few, but I've never used them. Here's a list.

How can this function be optimized? (Uses almost all of the processing power)

I'm in the process of writing a little game to teach myself OpenGL rendering, as it's one of the things I haven't tackled yet. I used SDL before, and this same function, while still performing badly, didn't go as over the top as it does now.
Basically, there is not much going on in my game yet, just some basic movement and background drawing. Since I switched to OpenGL, it appears to be way too fast. My frame rate exceeds 2000 FPS, and this function uses up most of the processing power.
What is interesting is that the SDL version of the program used 100% CPU but ran smoothly, while the OpenGL version uses only about 40%-60% CPU but seems to tax my graphics card in such a way that my whole desktop becomes unresponsive. Bad.
It's not a very complex function: it renders a 1024x1024 background tile according to the player's X and Y coordinates to give the impression of movement while the player graphic itself stays locked in the center. Because it's a small tile for a bigger screen, I have to render it multiple times to stitch the tiles together into a full background. The two for loops in the code below iterate 12 times combined, so I can see why this is inefficient when called 2000 times per second.
So to get to the point, this is the evil-doer:
void render_background(game_t *game)
{
int bgw;
int bgh;
int x, y;
glBindTexture(GL_TEXTURE_2D, game->art_background);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &bgw);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_HEIGHT, &bgh);
glBegin(GL_QUADS);
/*
* Start one background tile too early and end one too late
* so the player can not outrun the background
*/
for (x = -bgw; x < root->w + bgw; x += bgw)
{
for (y = -bgh; y < root->h + bgh; y += bgh)
{
/* Offsets */
int ox = x + (int)game->player->x % bgw;
int oy = y + (int)game->player->y % bgh;
/* Top Left */
glTexCoord2f(0, 0);
glVertex3f(ox, oy, 0);
/* Top Right */
glTexCoord2f(1, 0);
glVertex3f(ox + bgw, oy, 0);
/* Bottom Right */
glTexCoord2f(1, 1);
glVertex3f(ox + bgw, oy + bgh, 0);
/* Bottom Left */
glTexCoord2f(0, 1);
glVertex3f(ox, oy + bgh, 0);
}
}
glEnd();
}
If I artificially limit the speed by calling SDL_Delay(1) in the game loop, cutting the FPS down to ~660 ± 20, I get no "performance overkill". But I doubt that is the correct way to go about this.
For the sake of completion, these are my general rendering and game loop functions:
void game_main()
{
long current_ticks = 0;
long elapsed_ticks;
long last_ticks = SDL_GetTicks();
game_t game;
object_t player;
if (init_game(&game) != 0)
return;
init_player(&player);
game.player = &player;
/* game_init() */
while (!game.quit)
{
/* Update number of ticks since last loop */
current_ticks = SDL_GetTicks();
elapsed_ticks = current_ticks - last_ticks;
last_ticks = current_ticks;
game_handle_inputs(elapsed_ticks, &game);
game_update(elapsed_ticks, &game);
game_render(elapsed_ticks, &game);
/* Lagging stops if I enable this */
/* SDL_Delay(1); */
}
cleanup_game(&game);
return;
}
void game_render(long elapsed_ticks, game_t *game)
{
game->tick_counter += elapsed_ticks;
if (game->tick_counter >= 1000)
{
game->fps = game->frame_counter;
game->tick_counter = 0;
game->frame_counter = 0;
printf("FPS: %d\n", game->fps);
}
render_background(game);
render_objects(game);
SDL_GL_SwapBuffers();
game->frame_counter++;
return;
}
According to gprof profiling, even when I limit the execution with SDL_Delay(), it still spends about 50% of the time rendering my background.
Turn on VSYNC. That way you'll calculate graphics data exactly as fast as the display can present it to the user, and you won't waste CPU or GPU cycles calculating extra frames in between that will just be discarded because the monitor is still busy displaying a previous frame.
First of all, you don't need to render the tile x*y times - you can render it once for the entire area it should cover and use GL_REPEAT to have OpenGL cover the entire area with it. All you need to do is to compute the proper texture coordinates once, so that the tile doesn't get distorted (stretched). To make it appear to be moving, increase the texture coordinates by a small margin every frame.
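A sketch of the coordinate math behind the GL_REPEAT approach (names illustrative): with the wrap mode set to GL_REPEAT, one screen-sized quad suffices, texture coordinates simply exceed 1.0 to tile, and scrolling becomes an offset in tile units:

```c
#include <assert.h>

/* Horizontal texture coordinates for a single full-screen quad over a
   repeating tile: s runs from the scroll offset to offset plus the
   number of tile widths that fit across the screen. */
static void repeat_coords(int screen_w, int tile_w, float player_x,
                          float *s0, float *s1)
{
    float offset = player_x / (float)tile_w;        /* scroll, in tiles */
    *s0 = offset;
    *s1 = offset + (float)screen_w / (float)tile_w; /* > 1.0 repeats */
}
```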
Now down to limiting the speed. What you want to do is not to just plug a sleep() call in there, but measure the time it takes to render one complete frame:
void FrameCap (time_t desiredFrameTime, time_t actualFrameTime)
{
if (desiredFrameTime > actualFrameTime)
sleep (desiredFrameTime - actualFrameTime); // there is a small imprecision here
}
time_t startTime = (time_t) SDL_GetTicks ();
// render frame
FrameCap (1000 / 60, (time_t) SDL_GetTicks () - startTime); // e.g. targeting 60 FPS
There are ways to make this more precise (e.g. by using the performance counter functions on Windows 7, or using microsecond resolution on Linux), but I think you get the general idea. This approach also has the advantage of being driver independent and - unlike coupling to V-Sync - allowing an arbitrary frame rate.
At 2000 FPS it only takes 0.5 ms to render the entire frame. If you want 60 FPS, then each frame should take about 16 ms. To do this, first render your frame (about 0.5 ms), then use SDL_Delay() to use up the rest of the 16 ms.
Also, if you are interested in profiling your code (which isn't needed if you are getting 2000 FPS!) then you may want to use High Resolution Timers. That way you could tell exactly how long any block of code takes, not just how much time your program spends in it.
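The arithmetic from the paragraph before, as a sketch in integer milliseconds (helper name hypothetical):

```c
#include <assert.h>

/* Delay needed after rendering to hit a target frame rate: the frame
   budget is 1000 / target_fps milliseconds; whatever rendering didn't
   use is spent in SDL_Delay(). */
static long frame_delay_ms(long target_fps, long render_ms)
{
    long budget = 1000 / target_fps;  /* 16 ms at 60 FPS */
    return render_ms < budget ? budget - render_ms : 0;
}
```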

Easy way to display a continuously updating image in C/Linux

I'm a scientist who is quite comfortable with C for numerical computation, but I need some help with displaying the results. I want to be able to display a continuously updated bitmap in a window, which is calculated from realtime data. I'd like to be able to update the image quite quickly (e.g. faster than 1 frame/second, preferably 100 fps). For example:
char image_buffer[width*height*3];//rgb data
initializewindow();
for (t=0;t<t_end;t++)
{
getdata(data);//get some realtime data
docalcs(image_buffer, data);//process the data into an image
drawimage(image_buffer);//draw the image
}
What's the easiest way to do this on linux (Ubuntu)? What should I use for initializewindow() and drawimage()?
If all you want to do is display the data (i.e. no need for a GUI), you might want to take a look at SDL: it's straightforward to create a surface from your pixel data and then display it on screen.
Inspired by Artelius' answer, I also hacked up an example program:
#include <SDL/SDL.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#define WIDTH 256
#define HEIGHT 256
static _Bool init_app(const char * name, SDL_Surface * icon, uint32_t flags)
{
atexit(SDL_Quit);
if(SDL_Init(flags) < 0)
return 0;
SDL_WM_SetCaption(name, name);
SDL_WM_SetIcon(icon, NULL);
return 1;
}
static uint8_t * init_data(uint8_t * data)
{
for(size_t i = WIDTH * HEIGHT * 3; i--; )
data[i] = (i % 3 == 0) ? (i / 3) % WIDTH :
(i % 3 == 1) ? (i / 3) / WIDTH : 0;
return data;
}
static _Bool process(uint8_t * data)
{
for(SDL_Event event; SDL_PollEvent(&event);)
if(event.type == SDL_QUIT) return 0;
for(size_t i = 0; i < WIDTH * HEIGHT * 3; i += 1 + rand() % 3)
data[i] -= rand() % 8;
return 1;
}
static void render(SDL_Surface * sf)
{
SDL_Surface * screen = SDL_GetVideoSurface();
if(SDL_BlitSurface(sf, NULL, screen, NULL) == 0)
SDL_UpdateRect(screen, 0, 0, 0, 0);
}
static int filter(const SDL_Event * event)
{ return event->type == SDL_QUIT; }
#define mask32(BYTE) (*(uint32_t *)(uint8_t [4]){ [BYTE] = 0xff })
int main(int argc, char * argv[])
{
(void)argc, (void)argv;
static uint8_t buffer[WIDTH * HEIGHT * 3];
_Bool ok =
init_app("SDL example", NULL, SDL_INIT_VIDEO) &&
SDL_SetVideoMode(WIDTH, HEIGHT, 24, SDL_HWSURFACE);
assert(ok);
SDL_Surface * data_sf = SDL_CreateRGBSurfaceFrom(
init_data(buffer), WIDTH, HEIGHT, 24, WIDTH * 3,
mask32(0), mask32(1), mask32(2), 0);
SDL_SetEventFilter(filter);
for(; process(buffer); SDL_Delay(10))
render(data_sf);
return 0;
}
I'd recommend SDL too. However, there's a bit of understanding you need to gather if you want to write fast programs, and that's not the easiest thing to do.
I would suggest this O'Reilly article as a starting point.
But I shall boil down the most important points from a computations perspective.
Double buffering
What SDL calls "double buffering" is generally called page flipping.
This basically means that on the graphics card, there are two chunks of memory called pages, each one large enough to hold a screen's worth of data. One is made visible on the monitor, the other one is accessible by your program. When you call SDL_Flip(), the graphics card switches their roles (i.e. the visible one becomes program-accessible and vice versa).
The alternative is, rather than swapping the roles of the pages, instead copy the data from the program-accessible page to the monitor page (using SDL_UpdateRect()).
Page flipping is fast, but has a drawback: after page flipping, your program is presented with a buffer that contains the pixels from 2 frames ago. This is fine if you need to recalculate every pixel every frame.
However, if you only need to modify smallish regions on the screen every frame, and the rest of the screen does not need to change, then UpdateRect can be a better way (see also: SDL_UpdateRects()).
This of course depends on what it is you're computing and how you're visualising it. Analyse your image-generating code - maybe you can restructure it to get something more efficient out of it?
Note that if your graphics hardware doesn't support page flipping, SDL will gracefully use the other method for you.
Software/Hardware/OpenGL
This is another question you face. Basically, software surfaces live in RAM, hardware surfaces live in Video RAM, and OpenGL surfaces are managed by OpenGL magic.
Depending on your hardware, OS, and SDL version, programmatically modifying the pixels of a hardware surface can involve a LOT of memory copying (VRAM to RAM, and then back!). You don't want this to happen every frame. In such cases, software surfaces work better. But then, you can't take advantage of double buffering, nor of hardware-accelerated blits.
Blits are block-copies of pixels from one surface to another. This works well if you want to draw a whole lot of identical icons on a surface. Not so useful if you're generating a temperature map.
OpenGL lets you do much more with your graphics hardware (3D acceleration for a start). Modern graphics cards have a lot of processing power, but it's kind of hard to use unless you're making a 3D simulation. Writing code for a graphics processor is possible but quite different to ordinary C.
Demo
Here's a quick demo SDL program that I made. It's not supposed to be a perfect example, and may have some portability problems. (I will try to edit a better program into this post when I get time.)
#include "SDL.h"
#include <assert.h>
#include <stdlib.h> /* for EXIT_SUCCESS */
/* This macro simplifies accessing a given pixel component on a surface. */
#define pel(surf, x, y, rgb) (((unsigned char *)((surf)->pixels))[(y)*((surf)->pitch)+(x)*3+(rgb)])
int main(int argc, char *argv[])
{
int x, y, t;
/* Event information is placed in here */
SDL_Event event;
/* This will be used as our "handle" to the screen surface */
SDL_Surface *scr;
SDL_Init(SDL_INIT_VIDEO);
/* Get a 640x480, 24-bit software screen surface */
scr = SDL_SetVideoMode(640, 480, 24, SDL_SWSURFACE);
assert(scr);
/* Ensures we have exclusive access to the pixels */
SDL_LockSurface(scr);
for(y = 0; y < scr->h; y++)
for(x = 0; x < scr->w; x++)
{
/* This is what generates the pattern based on the xy co-ord */
t = ((x*x + y*y) & 511) - 256;
if (t < 0)
t = -(t + 1);
/* Now we write to the surface */
pel(scr, x, y, 0) = 255 - t; //red
pel(scr, x, y, 1) = t; //green
pel(scr, x, y, 2) = t; //blue
}
SDL_UnlockSurface(scr);
/* Copies the `scr' surface to the _actual_ screen */
SDL_UpdateRect(scr, 0, 0, 0, 0);
/* Now we wait for an event to arrive */
while(SDL_WaitEvent(&event))
{
/* Any of these event types will end the program */
if (event.type == SDL_QUIT
|| event.type == SDL_KEYDOWN
|| event.type == SDL_KEYUP)
break;
}
SDL_Quit();
return EXIT_SUCCESS;
}
GUI stuff is a regularly-reinvented wheel, and there's no reason not to use a framework.
I'd recommend using either QT4 or wxWidgets. If you're on Ubuntu, GTK+ is a natural fit since GNOME is built on it, and it may be more comfortable for you (QT and wxWidgets both require C++, while GTK+ has a C API).
Have a look at GTK+, QT, and wxWidgets.
Here's the tutorials for all 3:
Hello World, wxWidgets
GTK+ 2.0 Tutorial, GTK+
Tutorials, QT4
In addition to Jed Smith's answer, there are also lower-level frameworks, like OpenGL, which is often used for game programming. Given that you want to use a high frame rate, I'd consider something like that. GTK and the like aren't primarily intended for rapidly updating displays.
In my experience, Xlib via the MIT-SHM extension was significantly faster than SDL surfaces, though I'm not sure I used SDL in the most optimal way.