I am working on a game project that features a large amount of assets. The character animations are very detailed, and that requires a lot of frames.
At first, I created large spritesheets containing all the animations for a specific character. It was working well on my PC, but when I tested it on an Android tablet, I noticed it exceeded the maximum texture dimension of its GPU. My solution was to break the big spritesheet down into individual frames (the worst case is 180 frames) and upload them to the GPU individually. Things now seem to be working everywhere I need them to work.
Right now, the largest animation I have been working with is a character with 180 frames of 407x725 pixels (width x height). However, as I couldn't find any guidance on the web regarding how to properly render 2D animations using OpenGL, I would like to ask whether there is a problem with this approach. Is there a maximum number of textures that can be uploaded to the GPU? Can I exceed the amount of RAM of the GPU?
The most efficient method for the GPU is to pass the entire sprite sheet to OpenGL as a single texture and select which frame you want by adjusting the texture coordinates when you draw. You should also pack the sprites into, ideally, a square texture. Reducing the overall amount of memory used by the GPU is very good for performance, especially on phones and tablets.
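As a minimal sketch of the frame-selection idea (the grid layout and the function name are mine, not from the answer; it also assumes the first frame sits at the texture's origin):

    /* Compute the UV sub-rectangle for frame `index` of a sprite sheet laid
     * out as a cols x rows grid. Feed (u0,v0)-(u1,v1) to the quad's texture
     * coordinates when drawing; flip v if your sheet is stored top-to-bottom. */
    void frame_uv_rect(int index, int cols, int rows,
                       float *u0, float *v0, float *u1, float *v1)
    {
        int col = index % cols;
        int row = index / cols;
        float w = 1.0f / (float)cols;   /* width of one frame in UV space  */
        float h = 1.0f / (float)rows;   /* height of one frame in UV space */
        *u0 = col * w;
        *v0 = row * h;
        *u1 = *u0 + w;
        *v1 = *v0 + h;
    }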
If possible, you want to avoid frequently changing which texture is bound. Ideally you bind a single texture and then render bits and pieces of it to the screen until you don't need it anymore, then bind a different texture and continue.
The reason for this is that the GPU will try hard to optimize the operation of the pipeline it creates to handle the geometry you feed it, and the shaders you select. But when you make big changes to the configuration like changing what texture is bound or what shader is bound, that's necessarily going to be somewhat opaque to optimization. Feeding it more vertices and texture coordinates at a time is better because they basically can all get done in a batch without unloading and reloading resources etc.
However, depending on what cards you are targeting, you should keep in mind that there may be a maximum texture size of something like 8192 x 8192. So, depending on what assets you have, you may be forced to split them up across several textures.
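If it helps, that limit can be queried at run time with a standard OpenGL call (a small sketch, assuming a current GL context):

    GLint max_size = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_size);   /* e.g. 8192 on many GPUs */
    /* any texture wider or taller than max_size must be split up */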
I am coding a modern OpenGL application to visualize 3D atomic models (molecules, periodic systems, ...) for chemistry and condensed matter physics.
I started to work on this a few years ago; the first version of my program was in old OpenGL, and now I am updating it to modern OpenGL.
I have a question regarding the quality of the rendering in the OpenGL window. In the following examples I draw 3D cylinders and 3D spheres using instanced drawing: to render the bonds I only draw one cylinder, then translate/scale/rotate it appropriately in the vertex shader
to render all bonds, and the same goes for the sphere used to render the atoms.
As you can see it works just fine, the efficiency of the method is amazing, and I can render models with hundreds of thousands of atoms smoothly.
However, I noticed something weird: somehow the quality of the rendering seems to depend on the number of vertices (objects, atoms and bonds) in the scene. Obviously the number of triangles is the most important parameter, but not the only one, since the quality decreases when a lot of vertices are rendered. Please see the attached snapshots:
To render the spheres in the scene I am using 50x50 vertices, and 2x50 for the cylinders (GL_TRIANGLE_STRIP in both cases)
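For reference, a hedged sketch of the kind of instanced draw call this describes (the names cylinder_vao, strip_vertex_count and bond_count are mine; the vertex shader is assumed to fetch the per-bond transform via gl_InstanceID or an instanced attribute):

    glBindVertexArray(cylinder_vao);     /* the single cylinder mesh */
    glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, strip_vertex_count, bond_count);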
1) In this test model I load 96 atoms and 512 half bonds: ~291200 vertices:
2) I zoom in to focus on one selected atom and its surroundings; at this scale the result is impeccable:
3) I reset the view and use the builder in my program to increase the number of boxes
(I am simply making replicas in the 3 directions of space); here I chose to do 20x20x20 replicas,
see the result below, with the original box highlighted.
In that scene there are 768000 atoms and 4096000 half-bonds, and thus 291200x20x20x20 = 2329600000 vertices.
That is quite a lot, yet it works, but something weird appears ...
4) I zoom in again on the particular area of the model I picked before, and there is a decrease in quality, in particular
in the areas where 3D objects (spheres/cylinders) superimpose/overlap ...
Can somebody explain to me what I see?
Note 1: In the same window I can decrease the number of replicas back to the original box, zoom again
and see that the result is back to impeccable.
Note 2: the older version of my program still works fine (old OpenGL, using display lists with GLUT spheres and cylinders);
I can do the same things, the rendering takes much, much longer, but at the end of the process, when I zoom in on the 20x20x20
boxes model, the results remain perfect, just like for the single-box model, and obviously I use the same graphics card, driver and so on.
You're seeing the limited precision of the depth buffer. There are only so many bits you can work with, and in a perspective projection a nonlinear scaling from Z distance to depth buffer value is applied.
The best course of action is to limit the near/far range of the perspective projection matrix to what's actually going to be visible on screen, to make better use of the depth buffer. It's also possible to linearize the depth buffer (but that comes with a performance hit). You could also try to cleanly intersect the geometry where sticks and spheres meet, i.e. constrain the sphere's vertices to the cylinder surface where a stick joins it, and similarly constrain the sticks' end vertices to the sphere where they meet. That way you avoid overlap and hence these artifacts.
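To illustrate the nonlinearity, here is a rough sketch using the standard OpenGL depth mapping (not code from the answer):

    /* Window-space depth for an eye-space distance d, with near/far planes n, f.
     * Most of the [0,1] range is spent very close to the near plane. */
    float depth_value(float d, float n, float f)
    {
        return f * (d - n) / (d * (f - n));
    }
    /* e.g. n = 0.1, f = 1000: depth_value(1.0f, 0.1f, 1000.0f) is already ~0.90,
     * so 90% of the depth precision is used up within one unit of the camera. */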
I'm wondering which way is more efficient:
modify the RGB pixel data of a surface sized as the window, create a texture from this surface, then copy it to the renderer.
Or (what I use)
SDL_SetRenderDrawColor + SDL_RenderDrawPoint directly in the double-buffered renderer, driven by a buffer array
I would prefer the first solution, but I would like to be sure before testing.
Thanks if you know SDL :)
When it comes to per-pixel things, manipulating an SDL_Surface's pixels is generally somewhat faster than using SDL_RenderDrawPoint(). Of course, if it's just a pixel or two, it won't make a big difference, but filling an entire window could be slow. Turning this surface into a texture takes a little time, but not as much (on my computer, it adds about 2 ms per frame).
However, as far as I know, your best option is to access the pixels of the window's surface (SDL_GetWindowSurface()) and then use SDL_UpdateWindowSurface().
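A rough sketch of that approach (SDL2; error checking omitted, the window surface is assumed to be a 32-bit format, and this is not the poster's actual code):

    #include <SDL.h>

    /* Fill the window surface from a tightly packed w*h*3 RGB byte buffer
     * and present it. */
    static void present_buffer(SDL_Window *window, const Uint8 *rgb)
    {
        SDL_Surface *surf = SDL_GetWindowSurface(window);
        if (SDL_MUSTLOCK(surf))
            SDL_LockSurface(surf);
        Uint32 *pixels = (Uint32 *)surf->pixels;
        for (int y = 0; y < surf->h; ++y) {
            for (int x = 0; x < surf->w; ++x) {
                const Uint8 *p = rgb + 3 * (y * surf->w + x);
                pixels[y * (surf->pitch / 4) + x] =
                    SDL_MapRGB(surf->format, p[0], p[1], p[2]);
            }
        }
        if (SDL_MUSTLOCK(surf))
            SDL_UnlockSurface(surf);
        SDL_UpdateWindowSurface(window);   /* copies the surface to the screen */
    }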
I believe the SDL_RenderDrawPoint() slowdown is due to the extra time the CPU has to take to pass the pixels to the GPU (maybe a software SDL_Renderer would be faster?).
Hope this helps.
Here are my questions. I heard that OpenGL ignores vertices which are outside the viewing frustum and doesn't consider them in the rendering pipeline. Recently I ran into a post that said you should check this yourself, and that if a point is not inside, it is your duty to find out, not OpenGL's! Now,
Is this true about OpenGL? Does it detect that a point is not inside and skip rendering it?
I am developing a grass scene which has about 4000 patches of grass drawn on rectangles. I have awful FPS, and the only solution I came up with was to determine which patches are inside the viewport and render only them! My question here is: what is the best way for me to find out which rectangles are inside the frustum and which are not?
Please consider that my question is not mainly about points but about rectangles. Also, I need to sort the grass patches based on their distance, so it would be better if the data stays in client-side memory.
Please let me know if there are any effective and real-time ways to find out if any given mesh is inside or outside the frustum. Thanks.
Even if it is true that OpenGL does not show polygons outside the frustum (as in any other 3D engine), it still has to consider them to check whether they are inside or not, and that slows the FPS down. Usually some smart optimization algorithm is needed to avoid flooding the scene with invisible objects. Check, for example, BSP trees + PVS or portals as a starting point.
To check whether there is some bottleneck in the application, you can try gDebugger. If nothing obvious is wrong, optimizing in order to draw just the PVS (potentially visible set) is the way to go.
OpenGL won't render pixels ("fragments") outside your screen, so it has to clip somehow...
More precisely :
You submit your geometry
You make a Draw Call (glDrawArrays or glDrawElements)
Each vertex goes through the vertex shader, which computes the final position of the vertex in clip space. If you didn't write a vertex shader (i.e. old OpenGL), the driver creates one for you.
The perspective division transforms these coordinates into Normalized Device Coordinates. Roughly, it means that the frustum of your camera is deformed to fit into a [-1,1]x[-1,1]x[-1,1] box.
Everything outside this box is clipped. This can mean completely discarding a triangle, or subdividing it if it crosses a clipping plane.
Each remaining triangle is rasterized into fragments
Each fragment goes through the fragment shader
So basically, OpenGL knows how to clip, but each vertex still has to go through the vertex shader. So submitting your entire world will work, of course, but if you can find a way not to submit everything, your GPU will be happier.
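As a hedged illustration of the per-vertex test this amounts to (a sketch, not from the answer; the point is assumed to have already been multiplied by your model-view-projection matrix, giving clip-space x, y, z, w):

    /* A clip-space point is inside the frustum iff -w <= x,y,z <= w,
     * i.e. it lands in the [-1,1]^3 NDC box after the divide by w. */
    int point_in_frustum(float x, float y, float z, float w)
    {
        return -w <= x && x <= w &&
               -w <= y && y <= w &&
               -w <= z && z <= w;
    }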
This is a tradeoff, of course. If you spend 10ms checking each and every patch of grass on the CPU so that the GPU has only the minimal amount of data to draw, it's not a good solution either.
If you want to optimize grass, I suggest culling big patches (5m x 5m or so). It's standard AABB-frustum testing.
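For reference, a sketch of such an AABB vs. frustum test (the plane and struct layout is mine, with plane normals assumed to point into the frustum; this is not code from the answer):

    typedef struct { float a, b, c, d; } Plane;       /* ax + by + cz + d >= 0 inside */
    typedef struct { float min[3], max[3]; } AABB;

    /* Returns 0 if the box is completely outside any plane, 1 otherwise. */
    int aabb_in_frustum(const Plane planes[6], const AABB *box)
    {
        for (int i = 0; i < 6; ++i) {
            /* pick the box corner farthest along the plane normal */
            float px = planes[i].a >= 0.0f ? box->max[0] : box->min[0];
            float py = planes[i].b >= 0.0f ? box->max[1] : box->min[1];
            float pz = planes[i].c >= 0.0f ? box->max[2] : box->min[2];
            if (planes[i].a * px + planes[i].b * py + planes[i].c * pz + planes[i].d < 0.0f)
                return 0;   /* whole box is on the outside of this plane */
        }
        return 1;           /* inside or intersecting the frustum */
    }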
If you want to optimize a more generic model, you can investigate quadtrees for "flat" models, and octrees and BSP trees for more complex objects.
Yes, OpenGL does not rasterize triangles outside the viewing frustum. But this doesn't mean it is optimal for applications: the OpenGL implementation must transform the vertex coordinates (using the fixed pipeline or vertex shaders); only then, having the normalized coordinates, does it finally know whether the triangle lies inside the viewing frustum.
This means that no pixels are rasterized in those cases, but the vertex data is processed all the same; it simply doesn't produce fragments derived from a non-visible triangle!
The OpenGL extension ARB_occlusion_query may help you, but its discussion section makes it clear:
Do occlusion queries make other visibility algorithms obsolete?
No.
Occlusion queries are helpful, but they are not a cure-all. They
should be only one of many items in your bag of tricks to decide
whether objects are visible or invisible. They are not an excuse
to skip frustum culling, or precomputing visibility using portals
for static environments, or other standard visibility techniques.
For the question regarding sorting the meshes by depth, you should use the depth buffer: essentially, a mesh fragment is rendered only if its distance from the viewpoint is less than that of the previous fragment at the same position. This spares you from sorting the meshes yourself. This buffer is essentially free, and it improves performance since it discards the farther fragments.
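In plain OpenGL terms, that is just the usual depth-test state (a small reminder sketch, not from the answer):

    glEnable(GL_DEPTH_TEST);                              /* discard occluded fragments */
    glDepthFunc(GL_LESS);                                 /* keep the nearest fragment  */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);   /* clear depth every frame    */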
Yes. Like others have pointed out, OpenGL has to perform a lot of per-vertex operations to determine if it is in the frustum. It must do this for every vertex you send it. In addition to the processing overhead that must take place, keep in mind that there is also additional overhead in the transmission of those vertices from the CPU to the GPU. You want to avoid sending information to the GPU that it isn't going to use. Though the bandwidth between the CPU and GPU is quite good on modern hardware, there's still a limit.
What you want is a Scene Graph. Scene graphs are frequently implemented with some kind of spatial partitioning scheme, e.g. quadtrees, octrees, BSP trees, and so on. Spatial partitioning allows you to intelligently determine which geometry is visible. Instead of doing this on a per-vertex basis (like OpenGL is forced to do), it can eliminate huge spatial subsets of the geometry at a time. When rendering a complex scene, the performance savings can be enormous.
If I put everything in a Viewbox container, will my WPF apps be resolution independent, or do I need to do anything else? Please help me with the concept.
Scale elements according to the available screen or medium size
If your desire is to always fill some area of the screen or output device, independently of its metrics, using the Viewbox is a good choice. If you have a big monitor, you will have a big element; if you have a small sheet of paper, you will get a small printout of the same element.
With the Stretch property of an Image you have a similar possibility, but only for pictures.
Make elements equally sized on every device
WPF is designed to be "resolution independent". The goal of this resolution independence is that if you design an element to be 15 inches, then it will be 15 inches on every output medium, independently of the resolution of your output device. Calculation and specification of dimensions is done in "device independent pixels" (DIP), which you can convert to centimeters or inches without any specific knowledge of the output device's resolution.
96 DIP == 1 inch == 2.54 cm;
1 inch == 96 DIP;
1 cm == 37.8 DIP;
If you want to use this resolution independence, you can set fixed values (in DIPs) on your elements. On a large monitor your element may then only use a small part of the screen (e.g. 15 inches), and on a small monitor it fills the whole screen (also 15 inches).
WPF is resolution independent without any extra tricks at all. If you host legacy controls (non-WPF controls), this may break for them, but WPF itself is resolution independent and vector based.
Viewbox has nothing to do with resolution independence.
Resolution independence means that the controls you specify can be drawn at different resolutions while keeping their scale. So you can use a display with 10x higher pixel density, but the controls will still look the same size to you.
And as was said, WPF itself was designed with this in mind; you don't have to do anything.
I am writing a text renderer for an OpenGL application. Size, colour, font face, and anti-aliasing can be twiddled at run time (and so multiple font faces can appear on the screen at once). There are too many combinations to allocate one texture to each combination of string and attributes. However, only a small subset of the entire database of strings will be on the screen at any given time.
This leads me into the opportunity to create a cache for the strings that are being printed frame after frame. It has been mandated that I use only one texture for the entire operation, as creating a cache of many textures would incur a texture swapping penalty for every different string printed from the cache.
So I have before me a 2048x2048 texture, into which I can place whatever strings I can fit as they are being requested by the application for caching purposes. I have quickly realized that tracking the free space available in a two dimensional space is not trivial.
I have been looking at things like Best Fit and Next fit, but those seem to be suitable for 1d spaces.
How can I manage this cache texture in OpenGL?
Edit: I have since learned that this is an instance of a "2d packing problem".
What you have is the bin-packing problem.
Bad news first: it's NP-hard, so it's not worth trying to find the optimal solution.
I've done such texture caching for fonts as well. I didn't cache entire words, just the glyph images. That makes things a lot easier because all your images are roughly square-shaped. A simple grid-based approach to keep track of the texture memory worked pretty well.
When I got glyphs larger than one of my grid boxes, I just allocated two or more boxes using a brute-force search (it didn't happen that often). When I didn't find any suitable block, I just randomly removed some glyphs from the cache to make free space.
That was much easier than keeping things in a least-recently-used cache, and it performed nearly as well.
Btw, you will always waste some texture memory with such a cache. Unless you're very tight on memory, that shouldn't be a problem. You should use a small texture format (8-bit alpha works well for fonts).
Also: if you make your grid blocks a multiple of 8 pixels, and you can drop your antialiasing to 4 bits, you can compress the glyphs into one of the compressed DXT or S3TC formats on the fly. The wasted texture space becomes a non-issue that way.
If you are short on texture memory, you could take a look at the "Distance Field" or "Signed Distance Field" font rendering technique. You could use a 512x512 texture per font family and render perfectly antialiased text at any size.
For that algorithm you need to generate a special texture, which stores the distance from each texel to the nearest edge of the glyph. Take a look at the original paper by the Valve guys: http://www.valvesoftware.com/publications/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf . There are some frameworks which utilize this; for instance, the latest version of Qt uses signed distance fields for text rendering.
I have opted for a simple approach. Divide the texture into variable-height rows. The first texture placed in a row decides the height of the row. If a texture can fit into an existing row by height, check whether there is enough width remaining and place it there. Otherwise start a new row. If a new row cannot be started, do not cache the string.
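For illustration, a minimal sketch of that row ("shelf") scheme (the names and fixed limits are mine, not the actual implementation):

    #define ATLAS_SIZE 2048     /* the 2048x2048 cache texture */
    #define MAX_ROWS   128

    typedef struct { int y, height, used_width; } Row;
    static Row rows[MAX_ROWS];
    static int row_count = 0, next_y = 0;

    /* Returns 1 and writes the top-left corner of the allocated region,
     * or 0 if the string image doesn't fit and shouldn't be cached. */
    int atlas_alloc(int w, int h, int *out_x, int *out_y)
    {
        for (int i = 0; i < row_count; ++i) {
            if (h <= rows[i].height && rows[i].used_width + w <= ATLAS_SIZE) {
                *out_x = rows[i].used_width;    /* fits in an existing row */
                *out_y = rows[i].y;
                rows[i].used_width += w;
                return 1;
            }
        }
        if (row_count < MAX_ROWS && next_y + h <= ATLAS_SIZE) {
            rows[row_count++] = (Row){ next_y, h, w };   /* start a new row */
            *out_x = 0;
            *out_y = next_y;
            next_y += h;
            return 1;
        }
        return 0;
    }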