glVertexAttribPointer, interleaved elements and performance / cache friendliness - c

So, in the course of writing a model loader for a 3D scene I'm working on, I've decided to pack the vertex, texture and normal data like so:
VVVVTTTNNN
for each vertex, where V = vertex coordinate, T = UV coordinate, and N = normal coordinate. When I pass this data on to the vertex shader for my scene, I make three glVertexAttribPointer calls, like so:
// Note: the stride argument is in bytes, so it has to account for sizeof(GLfloat)
glVertexAttribPointer(ATTRIB_VERTEX, 4, GL_FLOAT, 0, 10 * sizeof(GLfloat), group->vertices.data);
glEnableVertexAttribArray(ATTRIB_VERTEX);
glVertexAttribPointer(ATTRIB_NORMAL, 3, GL_FLOAT, 0, 10 * sizeof(GLfloat), group->normals.data);
glEnableVertexAttribArray(ATTRIB_NORMAL);
glVertexAttribPointer(ATTRIB_UV_COORDINATES, 3, GL_FLOAT, 0, 10 * sizeof(GLfloat), group->uvcoordinates.data);
glEnableVertexAttribArray(ATTRIB_UV_COORDINATES);
Each of the group pointers being passed refers to the position in the shared vertex data block where that attribute's data starts:
group->vertices.data == data
group->uvcoordinates.data == &data[4]
group->normals.data == &data[7]
Part of the reason for interleaving this data was to program for cache friendliness and to minimize the data being sent to the card. (Note: this is not about a realistic performance bottleneck; I'm investigating the optimization because I want to learn more about programming to address these sorts of concerns.)

However, for the life of me, I can't imagine how GL would be able to infer that the 3 different pointers refer to offset positions within the same larger data block, and thereby make the necessary optimization to avoid copying the data once it has already been copied. Furthermore, since I'm only ensuring data locality in system memory (and don't really have any guarantees on how that data is going to be organized on the GPU), I'm only really optimizing for the case where I access any of these vertices outside of GL.

Is that right? Are these optimizations mostly useless, or will providing data in this manner help minimize the transfer to the GPU / prevent cache misses when iterating over vertex data in the vertex shader?

OpenGL is just an API; the intelligence lies in the driver. In any case, the problem is actually rather simple to solve: for every vertex attribute you have a starting memory address, and when glDrawArrays or glDrawElements is called the driver looks for the largest index used. That defines the upper bound of the range.
Then you sort the vertex attributes' starting addresses and, for each address, check whether its range overlaps with any other vertex attribute's range. You find the contiguous regions and copy those.
In the case of Vertex Buffer Objects it's even simpler, since you've already copied the data to OpenGL, ready for processing.
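As a rough illustration of what a driver could do internally (a hypothetical sketch, not how any particular driver is written; AttribRange and merge_ranges are made-up names), merging the attribute ranges into contiguous upload regions is just an interval merge over the client-memory addresses:
#include <stdint.h>
#include <stdlib.h>

typedef struct { const unsigned char *start; size_t size; } AttribRange;

static int cmp_range(const void *a, const void *b) {
    uintptr_t pa = (uintptr_t)((const AttribRange *)a)->start;
    uintptr_t pb = (uintptr_t)((const AttribRange *)b)->start;
    return (pa > pb) - (pa < pb);
}

/* Sorts the ranges and merges any that overlap or touch; returns the merged count. */
size_t merge_ranges(AttribRange *r, size_t n) {
    if (n == 0) return 0;
    qsort(r, n, sizeof *r, cmp_range);
    size_t out = 0;
    for (size_t i = 1; i < n; ++i) {
        if (r[i].start <= r[out].start + r[out].size) {        /* overlaps or is contiguous */
            size_t end = (size_t)(r[i].start - r[out].start) + r[i].size;
            if (end > r[out].size) r[out].size = end;          /* grow the current region */
        } else {
            r[++out] = r[i];                                   /* start a new region */
        }
    }
    return out + 1;
}
Each merged region then corresponds to one contiguous copy; with the interleaved layout from the question, all three attribute ranges collapse into a single region.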

Related

Adding up buffers into one

I have a mesh consisting of several entries.
Every entry contains its own list of faces, vertices, normals, colors and texture coordinates.
Can I loop through all of my entries and use glVertexAttribPointer to accumulate the data of an attribute in a single buffer object, like this:
glBindBuffer(GL_ARRAY_BUFFER, vbo);
for(Entry* e : entries) {
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, e->vertices);
...
}
In other words, will repeated calls to glVertexAttribPointer for attribute 0 of buffer vbo overwrite the data pointed to before, or not?
If yes, is there any effective solution other than copying all vertices into one contiguous memory block and calling glVertexAttribPointer only once for the whole buffer?
glVertexAttribPointer only stores, for each attribute, the last information you supplied to it, so appending buffers this way is not possible.
You have two options in a situation like yours:
Issue a separate draw call for each buffer.
Copy the data of all buffers into a single buffer and issue one draw call for it. Note that in this case the indices may have to be adjusted to point to the correct positions in the combined buffer (see the sketch below).
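A minimal sketch of that second option in plain C, assuming each entry holds tightly packed float positions and unsigned-short indices; vertCount, idxCount and indices are assumed Entry fields, and vbo/ibo are pre-generated buffer names:
/* Sketch: pack every entry into one VBO/IBO up front, shifting the indices as we go. */
size_t totalFloats = 0, totalIndices = 0;
for (size_t i = 0; i < entryCount; ++i) {
    totalFloats  += entries[i]->vertCount * 3;   /* 3 floats per position */
    totalIndices += entries[i]->idxCount;
}
float          *allVerts = malloc(totalFloats  * sizeof *allVerts);
unsigned short *allIdx   = malloc(totalIndices * sizeof *allIdx);

size_t vOff = 0, iOff = 0, baseVertex = 0;
for (size_t i = 0; i < entryCount; ++i) {
    memcpy(allVerts + vOff, entries[i]->vertices, entries[i]->vertCount * 3 * sizeof(float));
    for (size_t k = 0; k < entries[i]->idxCount; ++k)
        allIdx[iOff + k] = (unsigned short)(entries[i]->indices[k] + baseVertex); /* shift indices */
    vOff       += entries[i]->vertCount * 3;
    iOff       += entries[i]->idxCount;
    baseVertex += entries[i]->vertCount;
}

glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, totalFloats * sizeof(float), allVerts, GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, totalIndices * sizeof(unsigned short), allIdx, GL_STATIC_DRAW);
free(allVerts);
free(allIdx);

glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void *)0);  /* one pointer for the whole buffer */
glEnableVertexAttribArray(0);
glDrawElements(GL_TRIANGLES, (GLsizei)totalIndices, GL_UNSIGNED_SHORT, (void *)0);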
glVertexAttribPointer() does not copy anything. It only sets state that specifies where the vertex data will be fetched from. If you call it repeatedly for the same attribute, each call will replace the previous state, and the last one wins.
Starting with OpenGL 3.1, there is a glCopyBufferSubData() call (man page) that allows you to copy data from one buffer to another. Using this, you could allocate a buffer with enough space for all vertices, and then copy the smaller buffers into the buffer holding all vertices.
That being said, it does not sound like a great idea to use it this way. If you want all vertices in the same buffer, it's much easier and more efficient to store them in that buffer right from the start.
You definitely should not copy around the vertex data on each draw call. While reducing the number of draw calls is desirable, copying around vertex data is much more expensive.
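For reference, a hedged sketch of the glCopyBufferSubData() route described above; entryVbos, entryBytes, entryCount and totalBytes are placeholder names:
/* Sketch: append several existing VBOs into one large VBO (requires GL 3.1+). */
GLuint bigVbo;
glGenBuffers(1, &bigVbo);
glBindBuffer(GL_COPY_WRITE_BUFFER, bigVbo);
glBufferData(GL_COPY_WRITE_BUFFER, totalBytes, NULL, GL_STATIC_DRAW);  /* allocate, no data yet */

GLintptr writeOffset = 0;
for (size_t i = 0; i < entryCount; ++i) {
    glBindBuffer(GL_COPY_READ_BUFFER, entryVbos[i]);
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
                        0, writeOffset, entryBytes[i]);                /* GPU-side copy */
    writeOffset += entryBytes[i];
}
As the answer says, though, if you know up front that you want everything in one buffer, it is simpler to upload it there in the first place, e.g. with one glBufferData allocation plus one glBufferSubData per entry.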

How to pass variable length float array to GPUImageFilter Shader?

I want to pass my touch points to GPUImage (iOS).
The points can be translated into a float array, but the length of that array is variable.
However, I have to declare the length of the array in the shader.
Disclaimer: not a GLSL expert.
AFAIK you can't have variable-length arrays like what you want. This is a GLSL limitation, not a GPUImage one, so it's not a quick fix; the work you'll be doing will be with textures or GLSL, not GPUImage.
Here's another Stack Overflow post about GLSL: GLSL indexing into uniform array with variable length
There are two solutions that could work:
1) Limit the number of points. It's reasonable to limit touches, but in practice it may be hard to narrow them down if there are too many. You could pass these points in a fixed-length array or as individual constants (one for each point). If you really care about scalability with the number of points, this isn't a great method, because in your shader you'll have to check each of these points and perform the relevant computation, which could be expensive when performed for the entire image (again, depending on your use case). If for each pixel you're checking a distance to a point, this could be too expensive.
2) Input your points in a texture. You can either have two 1D textures with the x and y coordinates and then treat them like an array (then go to option 1), or you can create a 2D texture, all 0, and set parts to 1 where there are touches. The 2D texture can have a lower resolution than the actual screen. This method could be a lot less work for the shader if you're doing something simple like turning finger touches black.
Your choice depends largely on what you're doing with the points in the shader.
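If you go with option 1 (a fixed maximum number of points), the GL-side upload, outside of whatever wrapper GPUImage provides, could look roughly like this; MAX_TOUCHES, the uniform names and gatherTouches() are made up for the sketch:
/* Sketch: send up to MAX_TOUCHES touch points as a fixed-size uniform array.
   The matching GLSL would declare: uniform vec2 touchPoints[8]; uniform int touchCount; */
#define MAX_TOUCHES 8
GLfloat points[MAX_TOUCHES * 2];                    /* x, y pairs, e.g. in 0..1 texture coords */
int count = gatherTouches(points, MAX_TOUCHES);     /* hypothetical helper filling the array */

glUseProgram(program);
glUniform2fv(glGetUniformLocation(program, "touchPoints"), MAX_TOUCHES, points);
glUniform1i(glGetUniformLocation(program, "touchCount"), count);   /* shader ignores entries >= count */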

What's the difference between TMU and openGL's GL_TEXTUREn?

I can't quite understand what's the difference.
I know a TMU is a texture mapping unit on the GPU, and in OpenGL we can have many texture units. I used to think they were the same thing, that if I had n TMUs then I could use n GL_TEXTUREs, but I found that this may not be true.
Recently, I was working on an Android game targeting a platform with a Mali-400 MP GPU. According to the documentation it has only one TMU, so I thought I could use only one texture at a time. But surprisingly, I can use at least 4 textures without trouble. Why is this?
Is the hardware or driver doing something like swapping different textures in and out automatically for me? If so, is that supposed to cause a lot of cache misses?
I'm not the ultimate hardware architecture expert, particularly not for Mali. But I'll give it a shot anyway, based on my understanding.
The TMU is a hardware unit for texture sampling. It does not get assigned to an OpenGL texture unit on a permanent basis. Any time a shader executes a texture sampling operation, I expect that specific operation to be assigned to one of the TMUs. The TMU then does the requested sampling, delivers the result back to the shader, and is available for the next sampling operation.
So there is no relationship between the number of TMUs and the number of supported OpenGL texture units. The number of OpenGL texture units that can be supported is determined by the state tracking part of the hardware.
The number of TMUs has an effect on performance. The more TMUs are available, the more texture sampling operations can be executed within a given time. So if you use a lot of texture sampling in your shaders, your code will profit from having more TMUs. It doesn't matter if you sample many times from the same texture, or from many different textures.
Texture Mapping Units (TMUs) are functional units on the hardware, once upon a time they were directly related to the number of pixel pipelines. As hardware is much more abstract/general purpose now, it is not a good measure of how many textures can be applied in a single pass anymore. It may give an indication of overall multi-texture performance, but by itself does not impose any limits.
OpenGL's GL_TEXTURE0+n actually represents Texture Image Units (TIUs), which are locations where you bind a texture. The number of textures you can apply simultaneously (in a single execution of a shader) varies per shader stage. In desktop GL, which has 5 stages as of GL 4.4, implementations must support 16 unique textures per stage. This is why the number of Texture Image Units is 80 (16 × 5). GL 3.3 has only 3 stages, and its minimum TIU count is thus only 48. This gives you enough binding locations to provide a set of 16 unique textures for every stage in your GLSL program.
GL ES, particularly 2.0, is a completely different story. It mandates support for at least 8 simultaneous textures in the fragment shader stage and 0 (optional) in the vertex shader.
const mediump int gl_MaxVertexTextureImageUnits = 0; // Vertex Shader Limit
const mediump int gl_MaxTextureImageUnits = 8; // Fragment Shader Limit
const mediump int gl_MaxCombinedTextureImageUnits = 8; // Total Limit for Entire Program
There is also a limit on the number of textures you can apply across all of the shaders in a single execution of your program (gl_MaxCombinedTextureImageUnits), and this limit is usually just the sum total of the limits for each individual stage.
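You can also query the limits of the implementation you're actually running on, rather than relying on the spec minimums; a small sketch:
/* Sketch: query the texture image unit limits at runtime (works in GL and GL ES 2.0+). */
GLint fragUnits = 0, vertUnits = 0, combinedUnits = 0;
glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, &fragUnits);              /* fragment shader limit */
glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, &vertUnits);       /* vertex shader limit */
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &combinedUnits); /* whole-program limit */
printf("texture image units: fragment %d, vertex %d, combined %d\n",
       fragUnits, vertUnits, combinedUnits);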

Polygons are being drawn when I modify an unrelated, undrawn array

I am implementing Catmull-Clark subdivision on a mesh using OpenGL. I can draw my mesh just fine, and I do so using a vertex array.
The array that I draw is called extraVert1[].
In order to implement this subdivision, I have to do operations on certain points besides just the vertices used to draw. I have implemented the standard half-edge data structure in order to iterate through the edges of the mesh and generate the edge-points needed to subdivide.
The issue is here
When I calculate edge-points, I store them into a vertex array, and make the corresponding face point to this edge-point vertex (of which each face points to 4).
The code snippet is as follows (edgeAry1[] is the array of half-edges)
edgePoint1[j].x = (edgeAry1[i].end->x + edgeAry1[i].next->next->next->end->x + edgeAry1[i].heFace->center.x + edgeAry1[i].opp->heFace->center.x) / 4.0;
edgePoint1[j].y = (edgeAry1[i].end->y + edgeAry1[i].next->next->next->end->y + edgeAry1[i].heFace->center.y + edgeAry1[i].opp->heFace->center.y) / 4.0;
edgePoint1[j].z = (edgeAry1[i].end->z + edgeAry1[i].next->next->next->end->z + edgeAry1[i].heFace->center.z + edgeAry1[i].opp->heFace->center.z) / 4.0;
faceAry1[i].e = &edgePoint1[j];
j++;
When this code executes (it loops once for each face in faceAry1[]), I get random edges and triangles around the center of my mesh, even though I never make any changes to extraVert1[], the array I draw from.
I thought this had something to do with my pointers, so I individually commented out each operand and none of them changed anything. I then set every line equal to just 4.0. This gave me a single extra triangle, with points [approximately] (0,0,0), (4,0,0), (4,4,4).
When debugging, I stepped through the extraVert1[] array both before and after this section of code. It remained unchanged. My draw code is (extraVert1 has size 408):
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, extraVert1);
glDrawArrays(GL_QUADS, 0, 408);
glDisableClientState(GL_VERTEX_ARRAY);
Again, I'm not modifying the drawing array extraVert1[] in any way, so I'm completely stumped as to why this is occurring. I'm sure I'll need to provide more information if anyone is interested in answering, so feel free to ask for it. I'm going to keep working at it for now until then.
UPDATE
It seems that using a different array large enough to store these values (in this case, extraVert2[]) avoids the problem. The problem seems to be one of overwriting memory, but I'm not sure exactly how. When my arrays are declared like so:
face faceAry1[34];
float extraVert1[408];
halfEdge edgeAry1[136];
vertex edgePoint1[136];
vertex extraVert2[1632];
I can store the information in extraVert2[] with no issues. If I flip the order of extraVert2[] and edgePoint1[], I get the same issue as before. Anyone know what causes this?
While I don't know how 3D rendering works, random edges and triangles usually come from uninitialized variables or dangling pointers. These two are usually the cause of unexpected random behaviour in my programs. I too think this has something to do with your pointers, but as I have no knowledge of 3D rendering it could be related to some 3D-specific context as well.

Drawing per-pixel into a backbuffer or texture to display to screen, using opengl - no glDrawPixels()

Basically, I have an array of data (fluid simulation data) which is generated per frame in real time from user input (it starts in system RAM). I want to write the density of the fluid to a texture as an alpha value: I interpolate the array values to produce an array the size of the screen (the grid is relatively small) and map it to a 0-255 range. What is the most efficient way (GL function) to write these values into a texture for use?
Things that have been suggested elsewhere, which I don't think I want to use (please, let me know if I've got it wrong):
glDrawPixels() - I'm under the impression that this will cause an interrupt each time I call it, which would make it slow, particularly at high resolutions.
Use a shader - I don't think that a shader can accept and process the volume of data in the array each frame (It was mentioned elsewhere that the cap on the amount of data they may accept is too low)
If I understand your problem correctly, both solutions are over-complicating the issue. Am I correct in thinking you've already generated an array of size x*y, where x and y are your screen resolution, filled with unsigned bytes?
If so, if you want an OpenGL texture that uses this data as its alpha channel, why not just create a texture, bind it to GL_TEXTURE_2D and call glTexImage2D with your data, using GL_ALPHA as the format and internal format, GL_UNSIGNED_BYTE as the type and (x, y) as the size?
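A minimal sketch of that suggestion; width, height and densityBytes stand in for the asker's screen-sized array, and the glTexSubImage2D line is my addition for the per-frame update:
/* Sketch: create an alpha-only texture directly from the CPU-side density array. */
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);            /* rows are tightly packed single bytes */
glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, width, height, 0,
             GL_ALPHA, GL_UNSIGNED_BYTE, densityBytes);

/* On later frames, re-upload into the existing storage instead of reallocating: */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_ALPHA, GL_UNSIGNED_BYTE, densityBytes);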
What makes you think a shader would perform badly? The whole idea of shaders is processing huge amounts of data very, very fast. Please use Google to search for "General Purpose GPU computing" or "GPGPU".
Shaders can only gather data from buffers, not scatter. But what they can do is change values in the buffers. This allows a (fragment) shader to write the locations of GL_POINTs, which are then in turn placed on the target pixels of a texture. Shader Model 3 and later GPUs can also access texture samplers from the geometry and vertex shader stages, so the fragment shader part gets really simple then.
If you just have a linear stream of positions and values, just send those to OpenGL through a vertex array, drawing GL_POINTs, with your target texture being a color attachment for a framebuffer object.
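A hedged sketch of that point-scatter idea, assuming a context where FBOs and client-side vertex arrays are both available; texW, texH, samplePositions, sampleValues and sampleCount are placeholders:
/* Sketch: scatter per-sample values into a texture by drawing GL_POINTS into an FBO. */
GLuint tex, fbo;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, texW, texH, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex, 0);

glViewport(0, 0, texW, texH);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, samplePositions);        /* one (x, y) per sample */
glColorPointer(4, GL_UNSIGNED_BYTE, 0, sampleValues);    /* density packed into the color */
glDrawArrays(GL_POINTS, 0, sampleCount);
glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glBindFramebuffer(GL_FRAMEBUFFER, 0);                    /* tex now holds the scattered values */
For a dense, regular grid like the fluid density in the question, the plain texture upload shown earlier is usually simpler; the point-scatter route mainly pays off for sparse samples.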
What is the most efficient way (ogl function) to write these values into a texture for use?
A good way would be to avoid any unnecessary extra copies. So you could use Pixel Buffer Objects, which you can map into your address space and generate your data directly into.
Since you want to update this data every frame, you also want to look into efficient buffer object streaming, so that you don't force implicit synchronizations between the CPU and GPU. An easy way to do that in your scenario is a ring buffer of 3 PBOs, which you advance every frame (see the sketch below).
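A sketch of that streaming pattern, assuming desktop GL where GL_PIXEL_UNPACK_BUFFER and glMapBuffer are available; width, height, tex and writeDensityInto() are placeholders:
/* Sketch: stream the density array through a ring of 3 pixel buffer objects. */
#define PBO_COUNT 3
static GLuint pbos[PBO_COUNT];
static int frame = 0;
size_t bytes = (size_t)width * height;               /* one GL_ALPHA byte per pixel */

if (pbos[0] == 0) {                                  /* first call: create and size the ring */
    glGenBuffers(PBO_COUNT, pbos);
    for (int i = 0; i < PBO_COUNT; ++i) {
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbos[i]);
        glBufferData(GL_PIXEL_UNPACK_BUFFER, bytes, NULL, GL_STREAM_DRAW);
    }
}

glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbos[frame % PBO_COUNT]);
glBufferData(GL_PIXEL_UNPACK_BUFFER, bytes, NULL, GL_STREAM_DRAW);   /* orphan the old storage */
void *dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if (dst) {
    writeDensityInto(dst);                           /* hypothetical: generate this frame's data in place */
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_ALPHA, GL_UNSIGNED_BYTE, (void *)0);          /* source is the bound PBO */
}
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
frame++;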
Things that have been suggested elsewhere, which I don't think I want to use (please, let me know if I've got it wrong):
glDrawPixels() - I'm under the impression that this will cause an interrupt each time I call it, which would make it slow, particularly at high resolutions.
Well, what the driver does is totally implementation-specific. I don't think "causes an interrupt each time" is a useful mental image here. You seem to completely underestimate the work the GL implementation does behind your back. A GL call does not correspond one-to-one to a command sent to the GPU.
But not using glDrawPixels is still a good choice. It is not very efficient, and it has been deprecated and removed from modern GL.
Use a shader - I don't think that a shader can accept and process the volume of data in the array each frame (It was mentioned elsewhere that the cap on the amount of data they may accept is too low)
You've got this totally wrong. There is no way not to use a shader. If you're not writing one yourself (e.g. by using the old "fixed-function pipeline" of GL), the GPU driver will provide the shader for you. The hardware for those earlier fixed-function stages has been completely superseded by programmable units, so if you can't do it with shaders, you can't do it on the GPU. And I would strongly recommend writing your own shader (it is the only option in modern GL anyway).
