Performance drop in SceneKit with custom Metal shader

I have a scene with 4000 objects (1000 of them visible), all using the same material (by assigning each custom-created SCNGeometry's firstMaterial property to the same SCNMaterial object), running at 60 FPS (1000 draw calls, 150k triangles, Metal flush ~12 ms).
Now I want to change how my material is rendered. With a shader modifier everything works fine and performance is unchanged, but I need to completely replace SceneKit's rendering, so I am using an SCNProgram with Metal vertex and fragment shaders (I tried both a very basic shader and a dump of SceneKit's default shaders). When I assign that program to my material everything renders as expected, except for a huge performance drop: the statistics show 20 FPS, 4000 draw calls, 600k triangles, and a Metal flush of ~30 ms.
So it seems that some work is being done several times. Does anyone have an idea where the root of the problem might be?
EDIT:
It seems I have narrowed the problem down. It affects both OpenGL and Metal shaders: I lose frustum culling when using a custom SCNProgram. In SceneKit's default Metal shader I see:
typedef struct {
...
#ifdef USE_BOUNDINGBOX
float2x3 boundingBox;
#endif
...
} commonprofile_node;
This struct is passed to the vertex and fragment shaders, but the bounding box is not used anywhere. Since the default shader contains no frustum-culling code, the culling is presumably done elsewhere.

Related

OpenGL Hooking -- Rendering to an arbitrarily-sized FBO

So I'm trying to make an OpenGL application render at a higher resolution than it normally would. I've already created a shared library that hooks most of the relevant GLX/OpenGL functions. Here's my current approach (at a high level):
When my hooked SwapBuffers() is called:
Unbind my FBO
Call the (original/unhooked) SwapBuffers()
Bind my FBO
Set the viewport to (0, 0, HIGH_RES_X, HIGH_RES_Y)
Set the scissor region to (0, 0, HIGH_RES_X, HIGH_RES_Y)
return
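For concreteness, a minimal sketch of such a hook (assuming an LD_PRELOAD-style library; g_fbo and the HIGH_RES constants stand in for state created elsewhere in the hook):
#define _GNU_SOURCE          /* for RTLD_NEXT */
#define GL_GLEXT_PROTOTYPES  /* for glBindFramebuffer */
#include <GL/glx.h>
#include <GL/glext.h>
#include <dlfcn.h>
static GLuint g_fbo;  /* the high-resolution FBO, created elsewhere */
enum { HIGH_RES_X = 3840, HIGH_RES_Y = 2160 };
void glXSwapBuffers(Display *dpy, GLXDrawable drawable)
{
    /* Look up the original function once. */
    static void (*real_swap)(Display *, GLXDrawable);
    if (!real_swap)
        real_swap = (void (*)(Display *, GLXDrawable))
                        dlsym(RTLD_NEXT, "glXSwapBuffers");
    glBindFramebuffer(GL_FRAMEBUFFER, 0);        /* 1. unbind our FBO       */
    real_swap(dpy, drawable);                    /* 2. original SwapBuffers */
    glBindFramebuffer(GL_FRAMEBUFFER, g_fbo);    /* 3. rebind our FBO       */
    glViewport(0, 0, HIGH_RES_X, HIGH_RES_Y);    /* 4. force the viewport   */
    glScissor(0, 0, HIGH_RES_X, HIGH_RES_Y);     /* 5. force the scissor    */
}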
This approach doesn't seem to work for (most) applications. I suspect that is because some applications perform texture lookups (for screen-space operations) by dividing gl_FragCoord.xy by a uniform that holds their screen resolution (to convert from screen space to texture coordinates).
If resizing the output isn't possible, I wonder if it is possible to obtain the contents drawn onto the default framebuffer (i.e. both the color and the depth buffer) without using glReadPixels. Ideally there would be a way to access this data in the form of a texture (so it's already on the GPU). I've heard things about Pixel Buffer Objects -- would using one of these prevent a pipeline stall?
The technique I proposed actually works for a handful of applications, but most of the time it just doesn't work.
If you just want to extract the color/depth buffer, you can either use a pool of PBOs with glReadPixels or use glBlitFramebuffer. The former is not an option if latency is a concern; the latter works fairly well (see this example from the ReShade project).
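A rough sketch of the PBO-pool idea (my own illustration, not from the answer): ping-pong between two PBOs so that the glReadPixels into one buffer overlaps with mapping the other, hiding the stall at the cost of one frame of latency.
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
static GLuint pbo[2];   /* created with glGenBuffers/glBufferData elsewhere */
static unsigned frame;
void read_color_async(int width, int height)
{
    unsigned cur = frame % 2, prev = (frame + 1) % 2;
    /* Start an asynchronous copy of the framebuffer into the current
       PBO; with a pack PBO bound, the last argument is an offset. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[cur]);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    /* Map the PBO filled one frame ago; that transfer has normally
       completed by now, so mapping does not stall the pipeline. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[prev]);
    void *pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (pixels) {
        /* ... consume the pixels ... */
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    ++frame;
}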

Frame by frame animation using OpenGL and SDL

I am working on a game project that features a large amount of assets. The character animations are very detailed, which requires a lot of frames.
At first, I created large spritesheets containing all the animations for a specific character. It was working well on my PC, but when I tested it on an Android tablet I noticed it exceeded the maximum texture dimension of its GPU. My solution was to break the big spritesheet down into individual frames (the worst case is 180 frames) and upload them individually to the GPU. Things now seem to be working everywhere I need them to work.
Right now, the largest animation I have been working with is a character with 180 frames of 407x725 pixels each. However, as I couldn't find any guidance on the web about how to properly render 2D animations using OpenGL, I would like to ask if there is a problem with this approach. Is there a maximum number of textures that can be uploaded to the GPU? Can I exceed the amount of RAM on the GPU?
The most efficient method for the GPU is to pass the entire sprite sheet to OpenGL as a single texture and select the frame you want by adjusting the texture coordinates when you draw. You should also pack the sprites into, ideally, a square texture. Reducing the overall amount of memory used by the GPU is very good for performance, especially on phones and tablets.
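As a sketch of the texture-coordinate trick (the uniform cols x rows grid layout and the names here are illustrative, not from the answer): for frame i, the quad's texture coordinates are just a sub-rectangle of the sheet.
/* Compute the four corner UVs of frame `i` in a cols x rows sprite sheet. */
void sprite_frame_uv(int i, int cols, int rows, float uv[4][2])
{
    float w = 1.0f / cols, h = 1.0f / rows;
    float u0 = (i % cols) * w;   /* left edge of the frame */
    float v0 = (i / cols) * h;   /* top edge of the frame  */
    uv[0][0] = u0;     uv[0][1] = v0;      /* top-left     */
    uv[1][0] = u0 + w; uv[1][1] = v0;      /* top-right    */
    uv[2][0] = u0 + w; uv[2][1] = v0 + h;  /* bottom-right */
    uv[3][0] = u0;     uv[3][1] = v0 + h;  /* bottom-left  */
}
Drawing a different frame then costs nothing more than different texture coordinates; the bound texture never changes.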
If possible, you want to avoid frequently changing which texture is bound. Ideally you want to bind a single texture and then render bits and pieces of it to the screen until you don't need it anymore, then bind a different texture and continue.
The reason for this is that the GPU will try hard to optimize the operation of the pipeline it creates to handle the geometry you feed it and the shaders you select. But when you make big changes to the configuration, like changing which texture or shader is bound, that is necessarily going to be somewhat opaque to optimization. Feeding it more vertices and texture coordinates at a time is better, because they can basically all be processed in one batch without unloading and reloading resources.
However, depending on what cards you are targeting, keep in mind that there may be a maximum texture size of 8192 x 8192 or something like this. So depending on what assets you have, you may be forced to split them up across several textures.
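The actual limit can be queried at runtime rather than guessed (a standard OpenGL query, shown here as a sketch):
GLint max_size = 0;
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &max_size);  /* e.g. 8192 on many GPUs */
/* textures wider or taller than max_size will fail to upload */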

OpenGL Lighting and Normals

I'm currently experimenting with OpenGL and GLUT.
As I have basically no idea what I'm doing, I've totally messed up the lighting.
The complete compilable file can be found here: main.c
I have a display loop which currently operates like following:
glutDisplayFunc (also the idle func):
glClear GL_COLOR_BUFFER_BIT and GL_DEPTH_BUFFER_BIT
switch to modelview matrix
load identity
first Rotate and then Translate according to my keyboard and mouse inputs for the camera
draw a ground with glNormal 0,1,0 and glMaterial on front and back,
which is encapsulated by push/popmatrix
pushmatrix
translate
glutSolidTeapot
popmatrix
do lighting-related things: glEnable GL_LIGHTING and GL_LIGHT0, passing
float pos[] = {0.1,0.1,-0.1,0.1};
glLightfv( GL_LIGHT0, GL_POSITION, pos );
swap the buffers
The function associated with glutReshapeFunc does the following (this is from the lighthouse3d.com GLUT tutorial):
calculate the ratio of the screen
switch to projection matrix
loadidentity
set the viewport
set the perspective
switch to modelview matrix
However, this all seems to work somehow, but as soon as I enable lighting, the normals seem to get totally messed up. My GL_LIGHT0 seems to stay where it should, as I can see the light spot on the ground staying still while I move around. But the teapot's texture seems to move when I move my camera, even though the teapot itself stands still.
Here is some visual material to explain it (I apologize for my bad English):
Link to YouTube video describing it visually
You have a series of mistakes in your code:
You don't properly set the properties of your OpenGL window:
glutCreateWindow (WINTITLE);
glutInitDisplayMode (GLUT_RGB | GLUT_DOUBLE | GLUT_DEPTH);
glutInitDisplayMode only affects windows you create after it is called. You should swap those two lines.
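That is, the corrected order is simply:
glutInitDisplayMode (GLUT_RGB | GLUT_DOUBLE | GLUT_DEPTH);
glutCreateWindow (WINTITLE);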
You never enable the depth test. You should add glEnable(GL_DEPTH_TEST) after you create the window. Not using the depth test explains the weird "see-through" effect you get with the teapot.
You have the following code
glEnable (GL_CULL_FACE | GL_CULL_FACE_MODE);
This is wrong in two ways: these GLenums are not single bits but plain values. You cannot OR them together and expect anything useful to happen. I don't know whether this particular call will enable something you don't expect or just generate an error.
The second issue is that GL_CULL_FACE_MODE isn't even a valid enum to enable.
In your case, you should either skip face culling completely, or write:
glEnable (GL_CULL_FACE);
glFrontFace(GL_CW);
The latter call changes the face orientation rule from OpenGL's default counterclockwise winding to clockwise, as the GLUT teapot is defined that way. Interestingly, your floor is also drawn following that rule, so it will fit your whole scene. At least for now.
You have not fully understood how GL's state machine works. You draw the scene and then set the lighting, but this will not have an effect on the already-drawn objects; it just affects the next time you draw something, which here means the next frame. Furthermore, the lighting of the fixed-function pipeline works in eye space. That means that if you want a light source located at a fixed position in the world, and not at a fixed position relative to the camera, you have to re-specify the light position, with the updated modelview matrix, every time the camera moves. In your case, the light source will lag one frame behind the camera. This is probably not noticeable, but still wrong in principle. You should reorder your display() function to:
glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode (GL_MODELVIEW);
glLoadIdentity();
control (WALKSPEED, MOUSESPEED, mousein); /* camera: builds the view part of the modelview matrix */
lightHandler();                           /* light position is transformed by the now up-to-date modelview */
drawhelpgrid(1, 20);
drawTeapod();
glutSwapBuffers();
With those changes, I can actually get the expected result of a lit teapot on my system. But as a side note, I feel obliged to warn you that almost all of your code relies on deprecated OpenGL features. This functionality has been removed from modern versions of OpenGL. If you are starting to learn OpenGL now, you should consider learning the modern programmable pipeline instead of decades-old obsolete functionality.

Determining if a polygon is inside the viewing frustum

Here are my questions. I heard that OpenGL ignores vertices which are outside the viewing frustum and doesn't consider them in the rendering pipeline. Recently I ran into a post that said you should check this yourself: if a point is not inside, it is your duty to find that out, not OpenGL's! Now:
Is this true of OpenGL? Does it detect that a point is not inside the frustum and avoid rendering it?
I am developing a grass scene which has about 4000 grasses on rectangles. I get an awful FPS, and the only solution I came up with was to determine which grasses are inside the viewport and render only those! My question is: what is the best way to find out which rectangles are inside the frustum and which are not?
Please note that my question is mainly about rectangles, not points. I also need to sort the grasses by their distance, so it would be better if the data stays in client-side memory.
Please let me know if there are any effective, real-time ways to find out whether a given mesh is inside or outside the frustum. Thanks.
Even though it is true that OpenGL does not show polygons outside the frustum (like any other 3D engine), it still has to process them to check whether they are inside or not, and that slows the FPS down. Usually some smart optimization algorithm is needed to avoid flooding the scene with invisible objects. Check out, for example, BSP trees + PVS, or portals, as a starting point.
To check whether there is a bottleneck in the application, you can try gDEBugger. If nothing is obviously wrong, optimizing so that you draw just the PVS (potentially visible set) is the way to go.
OpenGL won't render pixels ("fragments") outside your screen, so it has to clip somehow...
More precisely:
You submit your geometry
You make a Draw Call (glDrawArrays or glDrawElements)
Each vertex goes through the vertex shader, which computes the final position of the vertex in clip space. If you didn't write a vertex shader (i.e. old OpenGL), the driver will create one for you.
The perspective division transforms these coordinates into Normalized Device Coordinates. Roughly, it means that the frustum of your camera is deformed to fit into a [-1,1]x[-1,1]x[-1,1] cube.
Everything outside this cube is clipped. This can mean completely discarding a triangle, or subdividing it if it crosses a clipping plane.
Each remaining triangle is rasterized into fragments
Each fragment goes through the fragment shader
So basically, OpenGL knows how to clip, but each vertex still has to go through the vertex shader. Submitting your entire world will work, of course, but if you can find a way not to submit everything, your GPU will be happier.
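For illustration, the test the pipeline effectively applies can be written down directly (a sketch; in OpenGL, a clip-space point is inside the frustum iff each of x, y and z lies in [-w, w] before the division by w):
int point_in_frustum(float x, float y, float z, float w)
{
    /* Inside iff -w <= x,y,z <= w in clip space. */
    return -w <= x && x <= w &&
           -w <= y && y <= w &&
           -w <= z && z <= w;
}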
This is a tradeoff, of course. If you spend 10 ms checking each and every patch of grass on the CPU so that the GPU has only the minimal amount of data to draw, that's not a good solution either.
If you want to optimize grass, I suggest culling big patches (5m x 5m or so). It's standard AABB-frustum testing.
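A minimal sketch of such an AABB-vs-frustum test (my own illustration; it assumes the six planes have been extracted from the view-projection matrix elsewhere and are stored as (a,b,c,d) with normals pointing into the frustum):
typedef struct { float a, b, c, d; } Plane;
typedef struct { float min[3], max[3]; } AABB;
int aabb_in_frustum(const Plane planes[6], const AABB *box)
{
    for (int i = 0; i < 6; ++i) {
        const Plane *p = &planes[i];
        /* Pick the box corner farthest along the plane normal (the
           "positive vertex"); if even that corner is behind the plane,
           the whole box is outside the frustum. */
        float x = (p->a >= 0.0f) ? box->max[0] : box->min[0];
        float y = (p->b >= 0.0f) ? box->max[1] : box->min[1];
        float z = (p->c >= 0.0f) ? box->max[2] : box->min[2];
        if (p->a * x + p->b * y + p->c * z + p->d < 0.0f)
            return 0; /* completely outside this plane */
    }
    return 1; /* inside or intersecting the frustum */
}
Patches that fail the test are simply never submitted to OpenGL.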
If you want to optimize a more generic model, you can investigate quadtrees for "flat" models, and octrees or BSP trees for more complex objects.
Yes, OpenGL does not rasterize triangles outside the viewing frustum. But this doesn't mean that relying on it is optimal for applications: the OpenGL implementation still has to transform the vertex coordinates (using the fixed pipeline or vertex shaders); only once it has the normalized coordinates does it finally know whether the triangle lies inside the viewing frustum.
This means that no pixels are rasterized in those cases, but the vertex data is processed all the same; it simply produces no fragments from a non-visible triangle!
The OpenGL extension ARB_occlusion_query may help you, but its discussion section makes this clear:
Do occlusion queries make other visibility algorithms obsolete?
No.
Occlusion queries are helpful, but they are not a cure-all. They
should be only one of many items in your bag of tricks to decide
whether objects are visible or invisible. They are not an excuse
to skip frustum culling, or precomputing visibility using portals
for static environments, or other standard visibility techniques.
Regarding sorting meshes by depth: you should use the depth buffer. Essentially, a mesh fragment is rendered only if its distance from the viewpoint is less than that of the fragment previously written at the same position, which relieves you of having to sort meshes yourself. This buffer is essentially free, and it improves performance since it discards the more distant fragments.
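A sketch of the setup (GL_LESS is the default comparison anyway, shown here for clarity):
glEnable(GL_DEPTH_TEST);                             /* turn on depth testing */
glDepthFunc(GL_LESS);                                /* keep the nearest fragment */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);  /* clear color and depth each frame */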
Yes. As others have pointed out, OpenGL has to perform a lot of per-vertex operations to determine whether a vertex is in the frustum, and it must do this for every vertex you send it. In addition to that processing overhead, keep in mind that there is also overhead in transmitting those vertices from the CPU to the GPU. You want to avoid sending the GPU information it isn't going to use. Though the bandwidth between the CPU and GPU is quite good on modern hardware, there is still a limit.
What you want is a scene graph. Scene graphs are frequently implemented with some kind of spatial partitioning scheme, e.g., quadtrees, octrees, BSP trees, etc. Spatial partitioning allows you to intelligently determine which geometries are visible: instead of doing this on a per-vertex basis (as OpenGL is forced to do), it can eliminate huge spatial subsets of the geometry at a time. When rendering a complex scene, the performance savings can be enormous.

Recreating <BevelBitmapEffect> in a Pixel Shader/Other Method in WPF

Now that <BevelBitmapEffect> (amongst other effects) has been deprecated, I'm looking to see how I could re-create exactly the same thing in a shader effect (including its properties of BevelWidth, EdgeProfile, LightAngle, Relief and Smoothness).
I'm somewhat familiar with pixel shading, mostly just color manipulation of the whole image/element in Shazzam, but how to create a bevel eludes me. Is this a vertex shader, and if so, how would I get started? I have searched high and low but can't seem to find an inkling of information that would let me get started on reproducing <BevelBitmapEffect> in a custom Effect.
Or, based on a comment below, is this 3D in WPF? If so, are there code libraries out there for recreating a <BevelBitmapEffect> that mimics the one that came with previous versions of WPF?
To create the bevel you need to know, for each pixel, the distance from the edge (search in all directions until alpha = 0). From this you can calculate the normal and then shade it (see the Silverlight example). As you mentioned, there isn't much content about bevels, but there are some good resources if you search for bump mapping/normal mapping, to which the shading is similar. In particular this thread has a Silverlight example using a pre-calculated normal map.
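As a rough CPU-side sketch of that idea (my own illustration, assuming an 8-bit alpha mask; all names are placeholders): derive a height from the distance to the nearest transparent pixel, then a normal from the height field by central differences.
#include <math.h>
/* Height = distance to the nearest pixel with alpha == 0, clamped to
   the bevel width, so pixels deeper inside the shape stay flat. */
static float height_at(const unsigned char *alpha, int w, int h,
                       int x, int y, int bevel)
{
    float best = (float)bevel;
    for (int dy = -bevel; dy <= bevel; ++dy)
        for (int dx = -bevel; dx <= bevel; ++dx) {
            int sx = x + dx, sy = y + dy;
            int outside = sx < 0 || sy < 0 || sx >= w || sy >= h ||
                          alpha[sy * w + sx] == 0;
            if (outside) {
                float d = sqrtf((float)(dx * dx + dy * dy));
                if (d < best) best = d;
            }
        }
    return best / (float)bevel; /* 0 at the edge, 1 well inside */
}
/* Normal from the height field via central differences; `relief`
   scales the apparent steepness of the bevel. */
static void normal_at(const unsigned char *alpha, int w, int h,
                      int x, int y, int bevel, float relief, float n[3])
{
    float hx = height_at(alpha, w, h, x + 1, y, bevel) -
               height_at(alpha, w, h, x - 1, y, bevel);
    float hy = height_at(alpha, w, h, x, y + 1, bevel) -
               height_at(alpha, w, h, x, y - 1, bevel);
    float nx = -hx * relief, ny = -hy * relief, nz = 1.0f;
    float len = sqrtf(nx * nx + ny * ny + nz * nz);
    n[0] = nx / len; n[1] = ny / len; n[2] = nz / len;
}
The normal can then be fed into any standard diffuse/specular shading, which is where equivalents of BevelWidth (bevel), Relief (relief) and LightAngle would come in.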
To do everything in hardware, you would ideally use a multipass shader; WPF's built-in effects are multipass, but it doesn't allow you to write your own.
To work around this limitation, you can:
You could create multiple shaders and nest your element in multiple controls applying a different effect to each one.
Target WPF 4.0 and use Pixel Shader 3.0 for the increased instruction count, although this may be too high a hardware requirement, and there is no software fallback for PS 3.0.
Do some or all of the steps in software.
Without doing one of these, you'd be lucky to manage a 3- or 4-pixel bevel before reaching the instruction limit, as the loops needed to find the distance increase the instruction count quickly.
New Sample
Download. Here is an example that uses Pixel Shader 3.0. It uses one shader to find the distance (i.e. height) to the edge; another (based on the NVIDIA Phong shaders) is used to shade it. Bevel profiles are created by adjusting the input height in code, or a custom profile can be used by supplying a special texture. There are some other features to add, but it seems easily performant enough to animate the properties. It's lacking in comments, but I can explain parts if needed.
There's a great article by Rod Stephens on DevX that shows how to use System.Drawing to create the WPF effects that used to exist (such as Bevel) and more. You have to register to view the article, though; it's at http://www.devx.com/DevXNet/Article/45039. Downloadable source code too.
