OpenGL VBO only being uploaded to GPU when rendered

My VBOs are only sent to the GPU when they are used for the first time. This causes small freezes the first time an object or group of objects is drawn.
I tried loading the data this way:
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STATIC_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, 0, size, data);
and this way
glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);
But the result is the same.
If I then draw a triangle after glBufferData:
glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_BYTE, NULL);
then the problem is solved, but I find this solution rather hackish.
Is there a better solution?
(I have a bunch of small VBOs containing 256 vertices each)

Well, this is how buffer objects are supposed to work: they make the operation somewhat asynchronous. The idea is that you can upload a large batch of buffer objects and continue issuing OpenGL operations afterwards; the pipeline stalls only if data is accessed whose upload has not completed yet. glBufferData and glBufferSubData either make the pages behind the pointer passed to them copy-on-write (CoW) or make an interim copy. Either way, you can safely discard the data in your process after the call returns; the OpenGL client side will still have the data around for the ongoing upload.
Calling glFinish() will block until the operations pipeline has completely finished (hence the name).

Try calling glFlush() after your glBufferData call.

Related

Can a GLSL fragment shader run without a framebuffer and similar inconveniences?

Repeating the above: can a GLSL fragment shader run without a framebuffer and any rasterization stage?
This perfect answer gives some insight into where to start with SSBOs. The answer links to an OpenGL ARB extension that includes boilerplate code. The code works for me with some changes to make it work with OpenGL compute programs. But I really do not get how to do this with a fragment program, and without any buffers other than the SSBO.
The code clearly has fragment shader source without any pixel operations, only SSBO ones.
in vec4 color;
void main()
{
    uint fragmentNumber = atomicCounterIncrement(fragmentCounter);
    if (fragmentNumber < maxFragmentCount) {
        fragments[fragmentNumber].position = ivec2(gl_FragCoord.xy);
        fragments[fragmentNumber].color = color;
    }
}
And later in the C program file:
// Generate, bind, and specify the data store for the atomic counter.
glGenBuffers(1, &counterBuffer);
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, counterBuffer);
glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), NULL,
GL_DYNAMIC_DRAW);
// Reset the atomic counter to zero, then draw stuff. This will record
// values into the shader storage buffer as fragments are generated.
GLuint zero = 0;
glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &zero);
glUseProgram(program);
glDrawElements(GL_TRIANGLES, ...);
In my setup, I do not produce any output as OpenGL pixels, and I wish it to stay that way. Is that possible, or am I missing something?
P.S. The above setup gives me an invalid framebuffer operation error after glDrawElements immediately followed by glFinish.
Update 21.03.2021
There are framebuffers with no attachments. The only state you need to set on one is its width and height. That is roughly the direction to head in if you wish to minimize setup.
The downside is that it still requires some geometry to be fed to the rasterization stage, to start the shader stages. On the plus side, one gets geometry rasterization, whether one wants it or not.
If I have time, I will leave some code as a reminder for myself.
Can a GLSL fragment shader run without a framebuffer and similar inconveniences?
No. Fragment shaders need a step that invokes them. The stage that produces fragments is called rasterization.
From the Khronos wiki:
A Fragment Shader is the Shader stage that
will process a Fragment generated by the Rasterization
into a set of colors and a single depth value.
The fragment shader is the OpenGL pipeline stage after a primitive is rasterized.
And rasterization needs a render step to produce fragments. The rendering has to go somewhere.
In OpenGL, it goes to a framebuffer. So without a framebuffer you cannot render, hence OpenGL cannot produce fragments.
Framebuffer setup can be minimized by using framebuffers with no attachments.
But one still needs to supply geometry and render it to invoke fragment shaders.
Fragment shaders can read and write arbitrary SSBOs, but the usage is not similar to compute shaders.
Fragment shaders are invoked once for each produced fragment, whereas compute shaders can be dispatched with an arbitrary number of invocations.
Many thanks to all commenters who pointed me to the, by now obvious, reason why fragment shaders need a render operation.
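The fragment shader from the question reserves a slot with an atomic counter and writes a record only if that slot is within bounds. As a rough CPU-side analogy (this is neither GLSL nor the extension's code), the same append pattern can be sketched in C11. FragmentRecord, FragmentList and append_fragment are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical record, mirroring the position/color pair the shader stores. */
typedef struct {
    int x, y;
    float color[4];
} FragmentRecord;

/* Hypothetical append-only list: a counter plus fixed-capacity storage. */
typedef struct {
    atomic_uint counter;      /* plays the role of the atomic counter buffer */
    unsigned int max_count;   /* capacity of the records array */
    FragmentRecord *records;  /* plays the role of the SSBO array */
} FragmentList;

/* Reserves a slot atomically and stores the record if the slot is in range.
 * Returns 1 if the record was stored, 0 if the list was already full. */
int append_fragment(FragmentList *list, int x, int y, const float color[4])
{
    assert(list != NULL && list->records != NULL && color != NULL);

    /* atomic_fetch_add returns the value before the increment, so every
     * caller gets a distinct slot number. */
    unsigned int slot = atomic_fetch_add(&list->counter, 1u);
    if (slot >= list->max_count) {
        return 0;
    }

    list->records[slot].x = x;
    list->records[slot].y = y;
    for (size_t i = 0; i < 4; ++i) {
        list->records[slot].color[i] = color[i];
    }
    return 1;
}
```

As in the shader, the counter keeps increasing after the list is full, so reading it back afterwards tells you how many records were requested in total.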

OpenGL - appending to a texture

I want to create a texture system where I add to a texture instead of overwriting it. My texture has integer values (32 bit). For example, I have an integer pixel with bits 100 and want to add 10 to it so that it becomes 110.
My current implementation has two textures: one holding the previous contents and one to write to. The previous texture's values are read and then rewritten together with the new data. Is there a better method? Using two textures feels very inefficient.
Depending on what you mean by "appending", you could use additive blending:
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);
glBlendFunc(GL_ONE, GL_ONE);
Then the output of your fragment shader will be added to the current contents of the color buffer. If you use an FBO to render into the texture, you can add directly to this texture.
You should just be careful to not create any feedback loops, so your fragment shader's result should not depend on any sample of the very same texture you render to.
UPDATE
As noted in the comment, the texture in question has GL_RED_INTEGER format. Unfortunately, the blending is only applied on floating-point color buffers (including normalized integers), and never on unnormalized integers.
However, there is another potential approach. The rules for the "feedback loops" I mentioned before have been relaxed in recent OpenGL versions. The extension GL_ARB_texture_barrier explicitly allows a fragment shader to read pixels from the same texture it is writing to:
Specifically, the values of rendered fragments are undefined if any
shader stage fetches texels and the same texels are written via fragment
shader outputs, even if the reads and writes are not in the same Draw
call, unless any of the following exceptions apply:
The reads and writes are from/to disjoint sets of texels (after
accounting for texture filtering rules).
There is only a single read and write of each texel, and the read is in
the fragment shader invocation that writes the same texel (e.g. using
"texelFetch2D(sampler, ivec2(gl_FragCoord.xy), 0);").
[...]
This extension has been promoted to a core feature of OpenGL 4.5. This is quite new and not available on a lot of platforms, so it is unclear if you can use it...

Adding up buffers into one

I have a mesh consisting of several entries.
Every entry contains its own list of faces, vertices, normals, colors and texture coordinates.
Can I loop through all of my entries and use glVertexAttribPointer to accumulate the data of an attribute in a single buffer object, like this?
glBindBuffer(GL_ARRAY_BUFFER, vbo);
for (Entry* e : entries) {
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, e->vertices);
    ...
}
In other words, will repeated calls to glVertexAttribPointer for attribute 0 of buffer vbo overwrite the previously specified data or not?
If so, is there an efficient alternative to copying all vertices into one contiguous memory block before calling glVertexAttribPointer only once for the whole buffer?
glVertexAttribPointer only stores, for each attribute, the last information you supplied to it. So appending buffers is not possible with this method.
You have two options in a situation like yours:
1. Issue a separate draw call for each buffer.
2. Copy the data of all buffers into a single buffer and issue one draw call for it. Note that in this case the indices might have to be adjusted to point to the correct positions in the combined buffer.
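The index adjustment mentioned in the second option can be sketched as follows. This is a minimal CPU-side sketch with no OpenGL calls; the Entry struct and merge_entries are hypothetical names, and it assumes each entry stores xyz positions as floats with 32-bit indices that are local to that entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mesh entry: xyz positions plus indices local to the entry. */
typedef struct {
    const float *positions;       /* vertex_count * 3 floats */
    size_t vertex_count;
    const unsigned int *indices;  /* index_count entries */
    size_t index_count;
} Entry;

/* Appends all entries to the output arrays and rebases each entry's indices
 * by the number of vertices written before it. The caller must provide
 * output arrays large enough for all entries combined.
 * Returns the total number of vertices written. */
size_t merge_entries(const Entry *entries, size_t entry_count,
                     float *out_positions, unsigned int *out_indices)
{
    size_t vertex_base = 0;  /* vertices written so far */
    size_t index_base = 0;   /* indices written so far */

    assert(entries != NULL && out_positions != NULL && out_indices != NULL);

    for (size_t e = 0; e < entry_count; ++e) {
        const Entry *entry = &entries[e];

        /* Vertex data is copied unchanged, back to back. */
        memcpy(out_positions + vertex_base * 3, entry->positions,
               entry->vertex_count * 3 * sizeof(float));

        /* Indices must be shifted so they still refer to the same vertices. */
        for (size_t i = 0; i < entry->index_count; ++i) {
            out_indices[index_base + i] =
                entry->indices[i] + (unsigned int)vertex_base;
        }

        vertex_base += entry->vertex_count;
        index_base += entry->index_count;
    }
    return vertex_base;
}
```

The merged arrays could then be uploaded once, for example with glBufferData, and drawn with a single indexed draw call.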
glVertexAttribPointer() does not copy anything. It only sets state that specifies where the vertex data will be fetched from. If you call it repeatedly for the same attribute, each call will replace the previous state, and the last one wins.
Starting with OpenGL 3.1, there is a glCopyBufferSubData() call that allows you to copy data from one buffer to another. Using this, you could allocate a buffer with enough space for all vertices, and then copy the smaller buffers into the buffer holding all vertices.
That being said, it does not sound like a great idea to use it this way. If you want all vertices in the same buffer, it's much easier and more efficient to store them in that buffer right from the start.
You definitely should not copy around the vertex data on each draw call. While reducing the number of draw calls is desirable, copying around vertex data is much more expensive.

glVertexAttribPointer, interleaved elements and performance / cache friendliness

So, in the course of writing a model loader for a 3D scene I'm working on, I've decided to pack the vertex, texture and normal data like so:
VVVVTTTNNN
for each vertex, where V = vertex coordinate, T = UV coordinate, and N = normal coordinate. When I pass this data on to the vertex shader for my scene, I make three glVertexAttribPointer calls, like so:
glVertexAttribPointer(ATTRIB_VERTEX, 4, GL_FLOAT, 0, 10 * sizeof(GLfloat), group->vertices.data);
glEnableVertexAttribArray(ATTRIB_VERTEX);
glVertexAttribPointer(ATTRIB_NORMAL, 3, GL_FLOAT, 0, 10 * sizeof(GLfloat), group->normals.data);
glEnableVertexAttribArray(ATTRIB_NORMAL);
glVertexAttribPointer(ATTRIB_UV_COORDINATES, 3, GL_FLOAT, 0, 10 * sizeof(GLfloat), group->uvcoordinates.data);
glEnableVertexAttribArray(ATTRIB_UV_COORDINATES);
Each of the group pointers being passed refers to the position in the shared vertex data block where that attribute starts:
group->vertices.data == data
group->uvcoordinates.data == &data[4]
group->normals.data == &data[7]
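One way to keep the stride and the three starting offsets consistent is to derive them from a struct. The sketch below assumes the VVVVTTTNNN layout of 10 floats per vertex described above; InterleavedVertex and attribute_at are hypothetical names. Note that the stride argument of glVertexAttribPointer is measured in bytes, which is why the values here are byte counts and not float counts.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex in the VVVVTTTNNN layout: 10 floats in total. */
typedef struct {
    float position[4];
    float uv[3];
    float normal[3];
} InterleavedVertex;

/* Stride and attribute offsets in bytes, derived from the struct layout. */
enum {
    VERTEX_STRIDE   = sizeof(InterleavedVertex),
    POSITION_OFFSET = offsetof(InterleavedVertex, position),
    UV_OFFSET       = offsetof(InterleavedVertex, uv),
    NORMAL_OFFSET   = offsetof(InterleavedVertex, normal)
};

/* Returns a pointer to one attribute of the vertex at vertex_index, using
 * the same base + index * stride + offset arithmetic that vertex fetching
 * performs. */
const float *attribute_at(const void *base, size_t vertex_index,
                          size_t byte_offset)
{
    assert(base != NULL);
    return (const float *)((const unsigned char *)base
                           + vertex_index * VERTEX_STRIDE + byte_offset);
}
```

With client-side arrays, the pointer passed for each attribute would be the block's base address plus the matching offset, and VERTEX_STRIDE would be passed as the stride for all three attributes.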
Part of the reason for me interleaving this data was to program for cache friendliness and minimize data being sent to the card. ( NOTE: This is not for a realistic performance bottleneck. I'm investigating the optimization because I want to learn more about programming to address these sort of concerns. ) However, for the life of me, I can't imagine how GL would be able to infer that the 3 different pointers refer to offset positions within the same larger data block, and thereby make the necessary optimization to avoid copying the data once it has already been copied. Furthermore, since I'm only ensuring data locality in system memory ( and don't really have any guarantees on how that data is going to be organized on the GPU ), I'm only really optimizing for the case where I access any of these vertices outside of GL. Is that right? Are these optimizations mostly useless, or will providing data in this manner help minimize the data transfer to the GPU / prevent cache misses when iterating over vertex data in the vertex shader?
OpenGL is just an API; the intelligence lies in the driver. Anyway, the problem is actually rather simple to solve: for every vertex attribute you have a starting memory address, and when glDrawArrays or glDrawElements is called, the driver looks for the largest index used. That defines the upper bound of the range.
Then you sort the vertex attributes' starting addresses and, for each one, check whether its range overlaps with any other vertex attribute's range. You find the contiguous regions and copy those.
In the case of vertex buffer objects it's even simpler, since you have already copied the data to OpenGL, ready for processing.
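The range-merging idea described above can be sketched as follows. This is only an illustration of the approach, not what any particular driver does; ByteRange, attribute_range and merge_ranges are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Byte range [begin, end) touched by one vertex attribute. */
typedef struct {
    uintptr_t begin;
    uintptr_t end;
} ByteRange;

/* Range read by an attribute that starts at `start`, advances `stride` bytes
 * per vertex and reads `element_size` bytes per vertex, for vertex indices
 * 0 through max_index. */
ByteRange attribute_range(uintptr_t start, size_t stride,
                          size_t element_size, size_t max_index)
{
    ByteRange r;
    r.begin = start;
    r.end = start + max_index * stride + element_size;
    return r;
}

/* qsort comparator: orders ranges by their starting address. */
static int compare_begin(const void *a, const void *b)
{
    const ByteRange *ra = (const ByteRange *)a;
    const ByteRange *rb = (const ByteRange *)b;
    return (ra->begin > rb->begin) - (ra->begin < rb->begin);
}

/* Sorts the ranges by start address and merges overlapping or touching ones
 * in place. Returns the number of contiguous regions, which are left at the
 * front of the array. */
size_t merge_ranges(ByteRange *ranges, size_t count)
{
    size_t out = 0;

    assert(ranges != NULL || count == 0);
    if (count == 0) {
        return 0;
    }

    qsort(ranges, count, sizeof ranges[0], compare_begin);
    for (size_t i = 1; i < count; ++i) {
        if (ranges[i].begin <= ranges[out].end) {
            /* Overlaps the current region: extend it if needed. */
            if (ranges[i].end > ranges[out].end) {
                ranges[out].end = ranges[i].end;
            }
        } else {
            /* Gap found: start a new region. */
            ranges[++out] = ranges[i];
        }
    }
    return out + 1;
}
```

For three interleaved attributes sharing one block, the three ranges collapse into a single region, so the block only needs to be copied once.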

Drawing per-pixel into a backbuffer or texture to display to screen, using OpenGL - no glDrawPixels()

Basically, I have an array of data (fluid simulation data) which is generated per-frame in real-time from user input (starts in system ram). I want to write the density of the fluid to a texture as an alpha value - I interpolate the array values to result in an array the size of the screen (the grid is relatively small) and map it to a 0 - 255 range. What is the most efficient way (ogl function) to write these values into a texture for use?
Things that have been suggested elsewhere, which I don't think I want to use (please, let me know if I've got it wrong):
glDrawPixels() - I'm under the impression that this will cause an interrupt each time I call it, which would make it slow, particularly at high resolutions.
Use a shader - I don't think that a shader can accept and process the volume of data in the array each frame (It was mentioned elsewhere that the cap on the amount of data they may accept is too low)
If I understand your problem correctly, both solutions are over-complicating the issue. Am I correct in thinking you've already generated an array of size x*y, where x and y are your screen resolution, filled with unsigned bytes?
If so, if you want an OpenGL texture that uses this data as its alpha channel, why not just create a texture, bind it to GL_TEXTURE_2D and call glTexImage2D with your data, using GL_ALPHA as the format and internal format, GL_UNSIGNED_BYTE as the type and (x,y) as the size ?
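The byte array this answer assumes could be produced along the following lines. This is a minimal sketch of the "map it to a 0 - 255 range" step only, assuming the densities are floats and that a known maximum density exists; densities_to_alpha and max_density are hypothetical names, and the interpolation to screen size is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Maps densities in [0, max_density] to bytes in [0, 255].
 * Values outside the range are clamped; NaN is treated as 0. */
void densities_to_alpha(const float *density, size_t count,
                        float max_density, unsigned char *alpha_out)
{
    assert(density != NULL && alpha_out != NULL && max_density > 0.0f);

    for (size_t i = 0; i < count; ++i) {
        float t = density[i] / max_density;

        if (!(t > 0.0f)) {  /* also catches NaN */
            t = 0.0f;
        }
        if (t > 1.0f) {
            t = 1.0f;
        }
        /* Round to the nearest byte value. */
        alpha_out[i] = (unsigned char)(t * 255.0f + 0.5f);
    }
}
```

The resulting array is tightly packed, one byte per texel, which matches the GL_UNSIGNED_BYTE type mentioned above.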
What makes you think a shader would perform badly? The whole idea of shaders is processing huge amounts of data very, very fast. Please search for the phrase "General Purpose GPU computing" or "GPGPU".
Shaders can only gather data from buffers, not scatter. But what they can do is change values in the buffers. This allows a (fragment) shader to write the locations of GL_POINTS, which are then in turn placed on the target pixels of the texture. Shader Model 3 and later GPUs can also access texture samplers from the geometry and vertex shader stages, so the fragment shader part gets really simple then.
If you just have a linear stream of positions and values, just send those to OpenGL through a vertex array, drawing GL_POINTS, with your target texture being a color attachment of a framebuffer object.
What is the most efficient way (ogl function) to write these values into a texture for use?
A good way would be to try to avoid any unnecessary extra copies. So you could use Pixel Buffer Objects which you map to your address space, and use that to directly generate your data into.
Since you want to update this data per frame, you also want to look for efficient buffer object streaming, so that you don't force implicit synchronizations between the CPU and GPU. An easy way to do that in your scenario would be using a ring buffer of 3 PBOs, which you advance every frame.
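The bookkeeping for such a ring can be sketched as follows. This is a minimal sketch with no OpenGL calls; BufferRing and its functions are hypothetical names, and the handles stand in for buffer names that a real program would get from glGenBuffers. It assumes that a buffer last used three frames ago is no longer being read by the GPU, which is the usual motivation for the scheme but not something this code verifies.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 3

/* Hypothetical ring of buffer handles. */
typedef struct {
    unsigned int buffers[RING_SIZE];
    size_t write_index;  /* buffer the CPU fills during this frame */
} BufferRing;

/* Returns the handle of the buffer to fill with this frame's data. */
unsigned int ring_write_buffer(const BufferRing *ring)
{
    assert(ring != NULL);
    return ring->buffers[ring->write_index];
}

/* Moves on to the next buffer; meant to be called once per frame. */
void ring_advance(BufferRing *ring)
{
    assert(ring != NULL);
    ring->write_index = (ring->write_index + 1) % RING_SIZE;
}
```

Each frame, the program would map and fill the buffer returned by ring_write_buffer, issue the texture update from it, and then call ring_advance.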
Things that have been suggested elsewhere, which I don't think I want to use (please, let me know if I've got it wrong):
glDrawPixels() - I'm under the impression that this will cause an interrupt each time I call it, which would make it slow, particularly at high resolutions.
Well, what the driver does is totally implementation-specific. I don't think that the "cause an interrupt each time" is a useful mental image here. You seem to completely underestimate the work the GL implementation will be doing behind your back. A GL call will not correspond to some command which is sent to the GPU.
But not using glDrawPixels is still a good choice. It is not very efficient, and it has been deprecated and removed from modern GL.
Use a shader - I don't think that a shader can accept and process the volume of data in the array each frame (It was mentioned elsewhere that the cap on the amount of data they may accept is too low)
You got this totally wrong. There is no way to not use a shader. If you're not writing one yourself (e.g. by using the old "fixed-function pipeline" of the GL), the GPU driver will provide the shader for you. The hardware implementation of these earlier fixed-function stages has been completely superseded by programmable units, so if you can't do it with shaders, you can't do it with the GPU. And I would strongly recommend writing your own shader (it is the only option in modern GL anyway).

Resources