SDL2 [optimisation]: is SDL_Surface->texture->copy faster than drawing on renderer? - c

I'm wondering witch way is more efficient:
modify RGB pixels data from a surface sized as the window, crate a texture from this surface then copy it on the render.
Or (what I use)
SDL_SetRenderDrawColor + SDL_SetRenderDrawPoint directly in the double buffered Renderer, driven by a buffer array
I would prefer the first solution, but I would like to be sure before testing.
Thanks if you know SDL :)

When it comes to per-pixel things, manipulating an SDL_Surface's pixels is generally a little way faster than using SDL_RenderDrawPoint(). Of course, if that's just a pixel or two, it won't make a big difference, but filling an entire window could be slow. Turning this surface into a texture could take a little time, but not as much(On my computer, it adds about 2 ms per frames).
However, for as much as I know, your best option is to access the pixels of the screen's surface(SDL_GetWindowSurface()), and then to use SDL_UpdateWindowSurface().
I believe the SDL_RenderDrawPoint() slowdown is due to the extra time the CPU has to take to pass the pixels to the GPU(Maybe a software SDL_Renderer would act faster?).
Hope this helps.

Related

Frame by frame animation using OpenGL and SDL

I am working in a game project that features a large amout of assets. The character animations are very detailed and that require a lot of frames to happen.
At first, I created large spritesheets containing all the animations for a specific character. It was working well on my PC but when I tested it on an Android tablet, I noticed it ecceeded the maximum texture dimension of its GPU. My solution was to break down the big spritesheet into individual frames (the worst case is 180 frames) and upload them individually to the GPU. Things now seem to be working everywhere I need it to work.
Right now, the largest animation I have been working with is a character with 180 frames with 407x725 pixels of width and height. However, as I couldn't find any orientation on the web regarding how to properly render 2D animations using OpenGL, I would like to ask if there is a problem with this approach. Is there a maximum number of textures that can be uploaded to the GPU? Can I exceed the amout of RAM of the GPU?
The most efficient method for the GPU is to pass the entire sprite sheet to opengl as a single texture, and select which frame you want by adjusting the texture coordinates when you draw. You should also pack the sprites into, ideally, a square texture. Reducing the overall amount of memory used by the GPU is very good for performance esp. on phones and tablets.
You want to avoid if possible frequently changing which texture is bound. Ideally you want to bind a single texture and then render bits and pieces of it to the screen until you don't need it anymore, then bind a different texture and continue.
The reason for this is that the GPU will try hard to optimize the operation of the pipeline it creates to handle the geometry you feed it, and the shaders you select. But when you make big changes to the configuration like changing what texture is bound or what shader is bound, that's necessarily going to be somewhat opaque to optimization. Feeding it more vertices and texture coordinates at a time is better because they basically can all get done in a batch without unloading and reloading resources etc.
However depending what cards you are targetting, you should keep in mind that there may be a maximum of 8192 x 8192 size of textures or something like this. So depending on what assets you have you may be forced to split them up across several textures.

Displaying pixel data with GDI+

I am writing a simple 3D rendering engine.
The end result of my 3D processing is pixel data. Next I need to display it on the screen with GDI+.
I am using WinForms and Visual Basic. I am drawing directly on form's ClientRectangle.
I have some questions.
After I process a pixel, should I be writing pixel data to a buffer first, instead of sending each pixel to GDI+ individually?
- If so, how much of a screen should I buffer at one time? Full screen, half, quarter, eighth? I think there may be RAM usage / performance trade-offs here.
- What is the best data structure for the pixel buffer?
- Which GDI+ command do I use to render the pixel buffer (or the individual pixel)? Is it possible to avoid creating the bitmap as an intermediate step and send pixel data directly to screen?
Maximum screen size I anticipate is 1600x1200. RAM could be as low as 1GB.
Thanks.
Hope you can find some of those answers here
Write the data into a buffer of RGBA structs first. This will make it easy if, for example you want to render multiple "layers" and then composite those as well. It will also make it easy if you want to perform any deferred processing at some point. Once a full (tile?) render is complete, you can flush it to the output bitmap/file.
This depends on what resolutions you allow the user to render to. If you want to render gigapixel images, you will need to tile it at some reasonable size. I would recommend that the tile size be configurable and then you can set it at a reasonable default after testing.
I would recommend starting out with a simple RGBA buffer if you're not looking to perform any deferred shading.
If you are NOT performing tiled rendering/rendering images that can fit in memory, you can simply use Bitmap.LockBits and write the data that way. If you are using tiled rendering, you will need to either find a library that allows you to render a scanline at a time (and make that a "tile") or fix the file format you want to write TGA, PNG and seek/write directly to the file. Dumping the image as a RAW file and then using a command-line tool to convert it would also be another option.
Hope this helps!

Determining if a polygon is inside the viewing frustum

here are my questions. I heard that opengl ignores the vertices which are outside the viewing frustum and doesn't consider them in rendering pipeline. Recently I ran into a same post that said you should check this your self and if a point is not inside, it is you duty to find out not opengl's! Now,
Is this true about opengl? does it understand if a point is not inside, and not to render it?
I am developing a grass scene which has about 4000 grasses on rectangles. I have awful FPS, and the only solution I came up was to decide which grasses are inside the viewport and then only render them! My question here is that what solution is best for me to find out which rectangle is not inside or which one is?
Please consider that my question is not about points mainly but about rectangles. Also I need to sort the grasses based on their distance, so it is better if native on client side memory.
Please let me know if there are any effective and real-time ways to find out if any given mesh is inside or outside the frustum. Thanks.
Even if is true then OpenGL does not show polygons outside the frustum ( as any other 3d engines ) it has to consider them to check if there are inside or not and then fps slow down. Usually some smart optimization algorithm is needed to avoid flooding the scene with invisible objects. Check for example BSP trees+PVS or Portals as a starting point.
To check if there is some bottleneck in the application, you can try with gDebugger. If nothing is reasonable wrong optimizing in order to draw just the PVS ( possible visible set ) is the way to go.
OpenGL won't render pixels ("fragments") outside your screen, so it has to clip somehow...
More precisely :
You submit your geometry
You make a Draw Call (glDrawArrays or glDrawElements)
Each vertex goes through the vertex shader, which computes the final position of the vertex in camera space. If you didn't write a vertex shader (=old opengl), the driver will create one for you.
The perspective division transforms these coordinates in Normalized Device Coordinates. Roughly, its means that the frustum of your camera is deformed to fit in a [-1,1]x[-1,1]x[-1,1] box
Everything outside this box is clipped. This can mean completely discarding a triangle, or subdivide it if it is across a clipping plane
Each remaining triangle is rasterized into fragments
Each fragment goes through the fragment shader
So basically, OpenGL knows how to clip, but each vertex still has to go through the vertex shader. So submitting your entire world will work, of course, but if you can find a way not to submit everything, your GPU will be happier.
This is a tradeoff, of course. If you spend 10ms checking each and every patch of grass on the CPU so that the GPU has only the minimal amount of data to draw, it's not a good solution either.
If you want to optimize grass, I suggest culling big patches (5m x 5m or so). It's standard AABB-frustum testing.
If you want to optimize a more generic model, you can investigate quadtree for "flat" models, octrees and bsp-trees for more complex objects.
Yes, OpenGL does not rasterize triangles outsize the viewing frustrum. But, this doesn't mean that this is optimal for applications: OpenGL implementation shall transform the vertex coordinate (by using fixed pipeline or vertex shaders), then, having the normalized coordinates it finally knows whether the triangle lie inside the viewing frustrum.
This mean that no pixel is rasterized in that cases, but the vertex data is processed all the same; simply doesn't produce fragments derived from a non visible triangle!
The OpenGL extension ARB_occlusion_query may help you, but in the discussion section make it clear:
Do occlusion queries make other visibility algorithms obsolete?
No.
Occlusion queries are helpful, but they are not a cure-all. They
should be only one of many items in your bag of tricks to decide
whether objects are visible or invisible. They are not an excuse
to skip frustum culling, or precomputing visibility using portals
for static environments, or other standard visibility techniques.
For the question regarding the mesh sorting on depth, you shall use the depth buffer: essentially the mesh fragment is effectively rendered only if its distance from the viewport is less than the previous fragment in the same position. This make you aware of sorting meshes. This buffer is essentially free, and it allows you to improve performances since it discard more far fragments.
Yes. Like others have pointed out, OpenGL has to perform a lot of per-vertex operations to determine if it is in the frustum. It must do this for every vertex you send it. In addition to the processing overhead that must take place, keep in mind that there is also additional overhead in the transmission of those vertices from the CPU to the GPU. You want to avoid sending information to the GPU that it isn't going to use. Though the bandwidth between the CPU and GPU is quite good on modern hardware, there's still a limit.
What you want is a Scene Graph. Scene graphs are frequently implemented with some kind of spatial partitioning scheme, e.g., Quadtrees, Octrees, BSPTrees, etc etc. Spatial partitioning allows you to intelligently determine what geometries are visible. Instead of doing this on a per-vertex basis (like OpenGL is forced to do) it can eliminate huge spatial subsets of geometry at a time. When rendering a complex scene, the performance savings can be enormous.

Low level C - Display text, pixel by pixel

I am working on a small project where I have to write a low level app. I'd like to display text in that app, and I would even like it to be anti aliased (à la ClearType). No libraries allowed, I have to draw each char pixel by pixel.
What is the best way to do this? Can you recommend some known algorithms? How should I store/read the fonts?
Thanks!
You mean you just want to smooth the edges of an existing bitmapped font? This is easy if your original font is 16x32 and you want to render it at 8x16 or something like that, but if you don't have a higher-resolution bitmap to begin with, smoothing is a highly nontrivial operation involving a lot of guesswork. In that case, I would lookup the 2xsai algorithm (which gives visually-pleasing results for this kind of thing) and first perform it to upscale the font to double resolution, then scale it back down with a area-averaging algorithm (i.e. take each destination pixel from the average of a 4-pixel square).
I would also recommend saving your final "anti-aliased" bitmap font and simply using it in your program, rather than performing all this work at runtime.
Putting all together:
There are two main types of fonts:
1) Monospaced: all the characters have fixed size, and you define a bitmap for each. No need for Anti Aliasing (you can hardcode the grey levels in the bitmap). Look horrible when resized.
2) True Type: each letter is defined by a set of parameters for Bezier curves. Can be easily scaled to any size, but requires lots of program logic (and processing power!) for that. Anti Aliasing is useful here (and especially the sub-pixel rendering techniquies).
As I see you want to use bitmapped font and rescaling? You could just precompute several of them, thus avoiding complex runtime logic.
As R. suggested, keeping the bitmaps at higher resolution in greyscale instead of BW will help. I'd suggest using size that is divisible by most small numbers, so that the bitmap can be downscaled easily. Also, if this resolution is high enough, then you can keep it in BW and downscale to greyscale (using surface integral).
EDIT: feel free to edit it and please don't vote. Just put all those commentaries together.
It is hard to build a good font engine, especially if you need to do scaling and anti-aliasing. So I suggest you take the easy path:
Decide on the fonts and sizes you want to use.
Generate a bitmap font for every font/size combination you need to use. This can be done with a tool like Bitmap Font Generator.
Use the bitmap fonts in your program. Blitting bitmaps should be relatively easy.
If you want more features, I suggest you look into using an engine like FreeType before trying to make your own solution.
Well, reading a TTF (or any other) font and rendering some glyph into the bitmap isnt that hard, given you know some stuff about rasterization and bezier curves. The bad point is that if you want the text to look good, it's gonna take a huge amount of code. Aliased font is pretty hard to render, I'm not talking about hinting. There needs to be a routine for kerning, multi-character sequences, something that decides which glyphs map to your characters and also encoding stuff, ...
You might want to use a bitmap font, which comes pre-rendered - then the whole rendering operation is a simple image copy, eventually with some resampling or rotation; but well, you lose the vector font features.
My advice is to take FreeType and live with it, it's a nice library just for this, and can be statically linked and stripped of unnecessary bloat very easily.

How can I manage a cache texture in OpenGL?

I am writing a text renderer for an OpenGL application. Size, colour, font face, and anti-aliasing can be twiddled at run time (and so multiple font faces can appear on the screen at once). There are too many combinations to allocate one texture to each combination of string and attributes. However, only a small subset of the entire database of strings will be on the screen at any given time.
This leads me into the opportunity to create a cache for the strings that are being printed frame after frame. It has been mandated that I use only one texture for the entire operation, as creating a cache of many textures would incur a texture swapping penalty for every different string printed from the cache.
So I have before me a 2048x2048 texture, into which I can place whatever strings I can fit as they are being requested by the application for caching purposes. I have quickly realized that tracking the free space available in a two dimensional space is not trivial.
I have been looking at things like Best Fit and Next fit, but those seem to be suitable for 1d spaces.
How can I manage this cache texture in OpenGL?
Edit: I have since learned that this is an instance of a "2d packing problem".
What you have is the bin-packing problem.
Bad news first: It's NP-hard, so it's worth to find the optimal solution.
I've done such texture-caching for fonts as well. I didn't cached entire words but just the glyph images. That makes things a lot easier because all your images are roughly square-shaped. A simple grid based approach to keep track of the texture-memory worked pretty good.
In case I got glyphs that are larger than one of my grid-boxes I just allocated two or more boxes using brute force search (it didn't happend that often). In case I didn't found any suitable block I just randomly removed some glyphs from the cache to make free space.
That was much easier than keeping things in a last recently used cache and performed nearly as good.
Btw - you will always have some waste on texture memory for such a cache. Unless you're very tight on memory that shouldn't be a problem. You should use a small texture-format (8 bit alpha works well for fonts).
Also: If you make your grid-blocks a multiple of 8 pixels, and you can drop your antialiasing to 4 bits you can compress the glyphs into one of the compressed DXT or S3TC formats on the fly. The wasted texture-space becomes a non-issue that way.
If you are short on texture memory you could take a look at "Distance Field" or "Signed Distance Field" font rendering technique. You could use 512x512 texture per font family and you could render perfectly antialiased text of any size.
For that algorithm you need to generate a special texture, which contains distance from the texel to the edge of the texture. Take a look at original paper by Valve guys: http://www.valvesoftware.com/publications/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf . There are some frameworks which utilize this. For instance latest version of Qt uses signed distance field for text rendering.
I have opted to use a simple approach. Divide the texture into variable height rows. The first texture to be placed in a row decides the height of the row. If a texture can fit into an existing row by height, check to see if there is enough width remaining and place it there. Otherwise start a new row. If a new row cannot be started, do not cache the string.

Resources