Drawing per-pixel into a backbuffer or texture to display to screen, using opengl - no glDrawPixels() - arrays

Basically, I have an array of data (fluid simulation data) which is generated per-frame in real-time from user input (starts in system ram). I want to write the density of the fluid to a texture as an alpha value - I interpolate the array values to result in an array the size of the screen (the grid is relatively small) and map it to a 0 - 255 range. What is the most efficient way (ogl function) to write these values into a texture for use?
Things that have been suggested elsewhere, which I don't think I want to use (please, let me know if I've got it wrong):
glDrawPixels() - I'm under the impression that this will cause an interrupt each time I call it, which would make it slow, particularly at high resolutions.
Use a shader - I don't think that a shader can accept and process the volume of data in the array each frame (It was mentioned elsewhere that the cap on the amount of data they may accept is too low)

If I understand your problem correctly, both solutions are over-complicating the issue. Am I correct in thinking you've already generated an array of size x*y, where x and y are your screen resolution, filled with unsigned bytes?
If so, if you want an OpenGL texture that uses this data as its alpha channel, why not just create a texture, bind it to GL_TEXTURE_2D and call glTexImage2D with your data, using GL_ALPHA as the format and internal format, GL_UNSIGNED_BYTE as the type and (x, y) as the size?
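For reference, a minimal sketch of that upload path (names are placeholders; density is assumed to be your x*y array of unsigned bytes):
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);   /* rows of single bytes are tightly packed */
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

/* initial allocation and upload: GL_ALPHA as format and internal format */
glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, x, y, 0,
             GL_ALPHA, GL_UNSIGNED_BYTE, density);

/* per-frame updates: reuse the existing storage instead of reallocating it */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, x, y,
                GL_ALPHA, GL_UNSIGNED_BYTE, density);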

What makes you think a shader would perform badly? The whole idea of shaders is processing huge amounts of data very, very fast. Try a web search for "General Purpose GPU computing" or "GPGPU".
Shaders can only gather data from buffers, not scatter. But what they can do is change values in the buffers. This allows a (fragment) shader to write the locations of GL_POINTs, which are then in turn placed on the target pixels of the texture. Shader Model 3 and later GPUs can also access texture samplers from the geometry and vertex shader stages, so the fragment shader part gets really simple then.
If you just have a linear stream of positions and values, just send those to OpenGL through a Vertex Array, drawing GL_POINTs, with your target texture being a color attachment for a framebuffer object.
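A rough sketch of that point-drawing path (untested; it assumes a bound VAO and shader program, an FBO with your target texture attached, and a VBO holding tightly packed x, y, value triples as floats):
glBindFramebuffer(GL_FRAMEBUFFER, fbo);          /* target texture is the color attachment */
glBindBuffer(GL_ARRAY_BUFFER, vbo);              /* stream of (x, y, value) triples        */
glBufferData(GL_ARRAY_BUFFER, count * 3 * sizeof(GLfloat), points, GL_STREAM_DRAW);

glEnableVertexAttribArray(0);                    /* location 0: the position (x, y)        */
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), (void*)0);
glEnableVertexAttribArray(1);                    /* location 1: the per-point value        */
glVertexAttribPointer(1, 1, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat),
                      (void*)(2 * sizeof(GLfloat)));

glDrawArrays(GL_POINTS, 0, count);               /* one fragment per target pixel          */
glBindFramebuffer(GL_FRAMEBUFFER, 0);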

What is the most efficient way (ogl function) to write these values into a texture for use?
A good way would be to try to avoid any unnecessary extra copies. So you could use Pixel Buffer Objects, which you can map into your address space and generate your data directly into.
Since you want to update this data per frame, you also want to look for efficient buffer object streaming, so that you don't force implicit synchronizations between the CPU and GPU. An easy way to do that in your scenario would be using a ring buffer of 3 PBOs, which you advance every frame.
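A sketch of such a ring (assuming three PBOs pre-allocated with glBufferData(GL_PIXEL_UNPACK_BUFFER, ...) and the alpha texture from above; generate_fluid_density() is a placeholder for your own code):
/* per frame: advance the ring, fill the oldest PBO, then source the upload from it */
frame = (frame + 1) % 3;
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[frame]);

void *dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if (dst) {
    generate_fluid_density(dst);     /* hypothetical: write your x*y density bytes here */
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
}

/* while a PBO is bound, the "data" argument is an offset into that buffer */
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, x, y,
                GL_ALPHA, GL_UNSIGNED_BYTE, (const void *)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);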
Things that have been suggested elsewhere, which I don't think I want to use (please, let me know if I've got it wrong):
glDrawPixels() - I'm under the impression that this will cause an interrupt each time I call it, which would make it slow, particularly at high resolutions.
Well, what the driver does is totally implementation-specific. I don't think that "causes an interrupt each time" is a useful mental image here. You seem to completely underestimate the work the GL implementation will be doing behind your back. A GL call will not correspond one-to-one to some command sent to the GPU.
But not using glDrawPixels is still a good choice. It is not very efficient, and it has been deprecated and removed from modern GL.
Use a shader - I don't think that a shader can accept and process the volume of data in the array each frame (It was mentioned elsewhere that the cap on the amount of data they may accept is too low)
You got this totally wrong. There is no way to not use a shader. If you're not writing one yourself (e.g. by using the old "fixed-function pipeline" of the GL), the GPU driver will provide the shader for you. The hardware implementation of these earlier fixed-function stages has been completely superseded by programmable units - so if you can't do it with shaders, you can't do it with the GPU. And I would strongly recommend writing your own shader (it is the only option in modern GL, anyway).

Related

Seeking some guidance on webcam picture display using GTK+ and Cairo in C

In this question I'm mostly seeking advice and guidance on the overall understanding of some concepts of drawing with GTK+ and Cairo in C (IMO the information on the topic is rather scarce, and my experience is really modest).
I'm coding some pet application which captures frames from webcam and displays them on a GTK window.
My app is working, but there are some points which I don't feel like grasped.
Overall process:
I've got a webcam frame as an array of bytes mmapped from the webcam device into my app's process memory. So when another frame is captured, what I have is a 640*480*3 bytes long array which is denoted as being in RGB24 format. After some searching it looks like, for the purpose of displaying it in a GTK window, I need to create an object called a drawing area using gtk_drawing_area_new(), add a "draw" callback and do the "drawing" there in that designated callback. So, according to Cairo, "drawing" is a process of applying a "source" to a "destination". I assume that I already have a source - my webcam's mmapped pixels - but it looks like I need to use some "source" that Cairo is able to understand. I found a candidate:
cairo_surface_t* surface = cairo_image_surface_create(CAIRO_FORMAT_RGB24, 640, 480);
As I see this call creates some Cairo acceptable object, which along the way allocates a buffer in my app's memory which I can get, using:
unsigned char* surface_data = cairo_image_surface_get_data(surface);
According to the docs this is a 640x480x4 bytes long buffer which, on little-endian archs, should be filled with BGRA formatted pixel data.
Then I should rearrange my original webcam pixels for EVERY frame captured, using this:
for (size_t idx_src=0, idx_dst=0; idx_src<640*480*3; idx_dst+=4, idx_src+=3) {
    surface_data[idx_dst]   = image[idx_src+2]; //B [3rd pos -> 1st pos]
    surface_data[idx_dst+1] = image[idx_src+1]; //G [no change]
    surface_data[idx_dst+2] = image[idx_src];   //R [1st pos -> 3rd pos]
}
After this I should do "drawing" with:
cairo_set_source_surface(cr, surface, 0, 0);
cairo_paint(cr);
So questions:
Is this what is supposed to be done for the task at hand, or am I missing something completely here?
What confuses me is that I have to rearrange my original webcam pixels for EVERY frame captured (this presumably consumes some CPU time and could be a limiting factor for capturing in HD resolution at high frame rates). Is there some other way?
Let's suppose I somehow acquire pixels from the webcam in a Cairo-conforming format, e.g. 640x480x4 BGRA formatted bytes. Is there a way to "wrap" this data in some Cairo-acceptable object to exclude the pixel-rearranging part?
Any other thoughts I should consider?
Thanks for your attention.
For most of your questions: Cairo only supports some image formats. Since your data comes in another format, you will have to convert it. All this copying around will likely be too slow. To make this work with an acceptable speed, you would need some other approach. No, I do not have any helpful suggestions here.
An unhelpful one would be: Is there some example for this webcam that you could look at?
Let's suppose I somehow acquire pixels from the webcam in a Cairo-conforming format, e.g. 640x480x4 BGRA formatted bytes. Is there a way to "wrap" this data in some Cairo-acceptable object to exclude the pixel-rearranging part?
Yup. cairo_image_surface_create_for_data.
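A minimal sketch of that (assuming your buffer already holds pixels in the CAIRO_FORMAT_RGB24 layout and stays alive for as long as the surface is used; frame_pixels is a placeholder name):
int stride = cairo_format_stride_for_width(CAIRO_FORMAT_RGB24, 640);

/* wraps the existing buffer; no copy, no per-pixel rearranging loop */
cairo_surface_t *surface = cairo_image_surface_create_for_data(
        frame_pixels,           /* unsigned char*, your existing pixel data */
        CAIRO_FORMAT_RGB24,     /* x8r8g8b8 in native endianness            */
        640, 480,
        stride);                /* usually 640 * 4, but query it to be safe */

cairo_set_source_surface(cr, surface, 0, 0);
cairo_paint(cr);
cairo_surface_destroy(surface); /* releases the wrapper, not your buffer    */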

Opengl - appending to a texture

I want to create a texture system where I add to a texture rather than overwrite it. My texture has integer values (32 bit). What I want, for example: I have an integer pixel whose bits are 100, and I want to add 10 to it so it becomes 110.
My current implementation has two textures: one holding the previous contents, and one to write to. The previous texture's values are read and then rewritten together with the new data. Is there a better method, because using two textures feels very inefficient?
Depending on what you mean by "appending", you could use additive blending:
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);
glBlendFunc(GL_ONE, GL_ONE);
Then the output of your fragment shader will be added to the current contents of the color buffer. If you use an FBO to render into the texture, you can add directly to this texture.
You should just be careful to not create any feedback loops, so your fragment shader's result should not depend on any sample of the very same texture you render to.
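For illustration, the state setup could look roughly like this (a sketch only; it assumes tex is a normalized or floating-point color texture and that the draw which follows covers the pixels you want to add to):
/* render into the texture via an FBO */
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, tex, 0);

/* additive blending: the fragment output is added to what is already there */
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);
glBlendFunc(GL_ONE, GL_ONE);

/* ... draw here; the shader must not sample tex in this pass ... */

glBindFramebuffer(GL_FRAMEBUFFER, 0);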
UPDATE
As noted in the comment, the texture in question has the GL_RED_INTEGER format. Unfortunately, blending is only applied to floating-point color buffers (including normalized integers), and never to unnormalized integers.
However, there is another potential approach. The rules for the "feedback loops" I mentioned before have been relaxed in recent OpenGL. The extension GL_ARB_texture_barrier explicitly allows a fragment shader to read pixels from the same texture it is writing to:
Specifically, the values of rendered fragments are undefined if any shader stage fetches texels and the same texels are written via fragment shader outputs, even if the reads and writes are not in the same Draw call, unless any of the following exceptions apply:
- The reads and writes are from/to disjoint sets of texels (after accounting for texture filtering rules).
- There is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel (e.g. using "texelFetch2D(sampler, ivec2(gl_FragCoord.xy), 0);").
[...]
This extension has been promoted to a core feature of OpenGL 4.5. This is quite new and not available on a lot of platforms, so it is unclear if you can use it...
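If it is available, usage is roughly the following (a sketch; drawReadModifyWritePass() is a hypothetical draw whose fragment shader does the texelFetch-plus-add on its own texel only):
/* tex is both the FBO color attachment and bound for sampling */
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glBindTexture(GL_TEXTURE_2D, tex);

drawReadModifyWritePass();  /* each fragment reads and writes only its own texel */

/* make those writes visible before any later pass samples the texture again */
glTextureBarrier();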

What's the difference between TMU and openGL's GL_TEXTUREn?

I can't quite understand what's the difference.
I know a TMU is a texture mapping unit on the GPU, and in OpenGL we can have many texture units. I used to think they're the same, that if I have n TMUs then I can use n GL_TEXTUREs, but I found that this may not be true.
Recently, I was working on an Android game targeting a platform with the Mali 400MP GPU. According to the documentation it has only one TMU, so I thought I could use only one texture at a time. But surprisingly, I can use at least 4 textures without trouble. Why is this?
Is the hardware or driver level doing something like swap different textures in/out automatically for me? If so, is it supposed to cause a lot of cache miss?
I'm not the ultimate hardware architecture expert, particularly not for Mali. But I'll give it a shot anyway, based on my understanding.
The TMU is a hardware unit for texture sampling. It does not get assigned to an OpenGL texture unit on a permanent basis. Any time a shader executes a texture sampling operation, I expect this specific operation to be assigned to one of the TMUs. The TMU then does the requested sampling, delivers the result back to the shader, and is available for the next sampling operation.
So there is no relationship between the number of TMUs and the number of supported OpenGL texture units. The number of OpenGL texture units that can be supported is determined by the state tracking part of the hardware.
The number of TMUs has an effect on performance. The more TMUs are available, the more texture sampling operations can be executed within a given time. So if you use a lot of texture sampling in your shaders, your code will profit from having more TMUs. It doesn't matter if you sample many times from the same texture, or from many different textures.
Texture Mapping Units (TMUs) are functional units on the hardware, once upon a time they were directly related to the number of pixel pipelines. As hardware is much more abstract/general purpose now, it is not a good measure of how many textures can be applied in a single pass anymore. It may give an indication of overall multi-texture performance, but by itself does not impose any limits.
OpenGL's GL_TEXTURE0+n actually represents Texture Image Units (TIUs), which are locations where you bind a texture. The number of textures you can apply simultaneously (in a single execution of a shader) varies per-shader stage. In Desktop GL, which has 5 stages as of GL 4.4, implementations must support 16 unique textures per-stage. This is why the number of Texture Image Units is 80 (16x5). GL 3.3 only has 3 stages, and its minimum TIU count is thus only 48. This gives you enough binding locations to provide a set of 16 unique textures for every stage in your GLSL program.
GL ES, particularly 2.0, is a completely different story. It mandates support for at least 8 simultaneous textures in the fragment shader stage and 0 (optional) in the vertex shader.
const mediump int gl_MaxVertexTextureImageUnits = 0; // Vertex Shader Limit
const mediump int gl_MaxTextureImageUnits = 8; // Fragment Shader Limit
const mediump int gl_MaxCombinedTextureImageUnits = 8; // Total Limit for Entire Program
There is also a limit on the number of textures you can apply across all of the shaders in a single execution of your program (gl_MaxCombinedTextureImageUnits), and this limit is usually just the sum total of the limits for each individual stage.
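You can always query the limits of the implementation you are actually running on, e.g.:
GLint vertUnits = 0, fragUnits = 0, combined = 0;
glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, &vertUnits);    /* vertex stage   */
glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, &fragUnits);           /* fragment stage */
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &combined);   /* whole program  */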

glsl multi light, best practice of passing data (array of structs?)

Working my way from step to step, I am now trying to figure out more about multiple lights in GLSL. I have read some tutorials so far, but none seems to have THE answer for this.
Lets say I have such a struct for my lighting:
struct LightInfo
{
    vec4 LightLocation;
    vec3 DiffuseLightColor;
    vec3 AmbientLightColor;
    vec3 SpecularLightColor;
    vec3 spotDirection;
    float AmbientLightIntensity;
    float SpecularLightIntensity;
    float constantAttenuation;
    float linearAttenuation;
    float quadraticAttenuation;
    float spotCutoff;
    float spotExponent;
};
uniform LightInfo gLight;
my first idea would be to make it something like
uniform LightInfo gLight[NumLights];
but then I read in several places that passing data to the shader that way wouldn't work, since you can't get the location of such a uniform. Now I have to admit that I didn't try it myself yet, but I found a couple of pages mentioning this, so it's probably not that wrong - or is this maybe just outdated information?
The other idea would be to make it just:
uniform[NumOfArgs]
and split it up in the shader again, but if I take my example struct above I very soon end up with an immensely huge array, and pulling the information out of it with a for loop will probably be quite expensive too - and that's only if I want to use a similar number of lights to the maximum of 8 available with gl_LightSource, which I wanted to avoid anyway because of the advantage of having my own struct with all the information needed in one place.
Of course not any light in question would require that many parameters, but any light COULD require them (and even if stripping it quite some it will grow very soon also).
Yet again, I could use some qsort first to determine the closest lights and then limit the maximum number of lights to something like 3 (which is also suggested in many places), but I have to say that I expect a bit more from today's GLSL and modern hardware - although there is no contradiction in doing this as well, regardless of the chosen solution.
So my question now, what's best practice here, what's really fast? Or should I stay with gl_LightSource and passing the additional information then via some uniform array? Although this doesn't seem to make more sense to me either.
The idea of a light struct is just fine. For forward rendering - passing all lights into the one shader which processes your actual geometry - an array is just fine.
You may have an array of structs as uniforms (uniform LightInfo gLight[NumLights], where NumLights is a compile-time constant), but arrays are not so different from just declaring uniform LightInfo gLight0, gLight1....
You get the uniform location via the full name, eg:
glGetUniformLocation(program, "gLight[3].spotExponent")
Note that glGetActiveUniform will return just the name of element zero, but the reported size will give the number of array elements.
Uniform buffers will be important with lots of lights and attributes. You can store all the data for the structs on the GPU, so it doesn't get sent every time with individual calls to glUniform*. You can use glMapBuffer to modify parts of the buffer if the rest doesn't need changing.
Be very aware of how the structs and arrays get packed (it's not always intuitive)! Related issues occur in non-uniform/uniform block cases too.
See: Sub-section 2.15.3.1.2 - Standard Uniform Block Layout
To get the byte offset from the beginning of the block, use the GL_UNIFORM_OFFSET enum
See: Uniform Buffer Object
Elements are aligned to whopping big 16 byte boundaries (vec4 size). To make that struct more efficient, you should pair the vec3s with the floats.
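A rough sketch of the uniform-buffer route (assuming the shader declares something like layout(std140) uniform LightBlock { LightInfo gLight[NumLights]; };, and a hypothetical C struct LightInfoStd140 that mirrors the std140 packing, padding included):
/* one-time setup: create the buffer and attach it to binding point 0 */
GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferData(GL_UNIFORM_BUFFER, sizeof(LightInfoStd140) * NumLights,
             NULL, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);

GLuint blockIndex = glGetUniformBlockIndex(program, "LightBlock");
glUniformBlockBinding(program, blockIndex, 0);

/* later: update only the light that changed, not the whole array */
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferSubData(GL_UNIFORM_BUFFER, changedLight * sizeof(LightInfoStd140),
                sizeof(LightInfoStd140), &lights[changedLight]);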
You're right, if you have more lights than there are in the shader you'll have to chop and choose. Lights that are close are important, but you might also want to prioritize lights in the direction you're facing (those whose area of influence touches the viewing volume formed by the projection matrix) and bigger/brighter lights (eg. sun/directional).
Ultimately if you have too many lights this method ceases to work. Your next step is to swap to deferred shading, which brings with it a few more issues (eg. blending/transparency).

How could we get a variable value from GLSL?

I'm doing a project with a lot of calculation, and I got the idea of throwing pieces of the work at the GPU, but I wonder whether we can retrieve results from GLSL. If it is possible, how?
GLSL does not provide outputs besides what is placed in the frame buffer.
To program a GPU and get results more conveniently, use CUDA (NVidia only) or OpenCL (cross-platform).
In general, what you want to do is use OpenCL for general-purpose GPU tasks. However, if you are insistent about pretending that OpenGL is not a rendering API...
Framebuffer Objects make it relatively easy to render to multiple outputs. This of course means that you have to structure your processing such that what gets rendered matches what you want. You can render to 32-bit floating-point "images", so you have access to plenty of precision. The biggest difficulty is what I stated: figuring out how to structure your task to match rendering.
It's a bit easier when using transform feedback. This is the ability to write the output of the vertex (or geometry) shader processing to a buffer object. This still requires structuring your tasks into something like rendering, but it's easier because vertex shaders have a strict one-vertex-to-one-vertex mapping. For every input vertex, there is exactly one output. And if you draw GL_POINTS, it's not too difficult to use attributes to pass the data that changes.
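For illustration, the transform feedback setup looks roughly like this (a sketch; it assumes your vertex shader declares an output named "result" and that the buffer tfb has already been allocated large enough):
/* before linking: tell GL which vertex shader output to capture */
const char *varyings[] = { "result" };
glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(program);

/* run the "computation": one captured value per input point */
glEnable(GL_RASTERIZER_DISCARD);                /* no pixels need to be drawn */
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfb);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, count);
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);

/* read the results back to the CPU */
glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0,
                   count * sizeof(GLfloat), results);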
Both easier and harder is the use of shader_image_load_store. This is effectively the ability to read/write from/to arbitrary images "whenever you want". I put that last part in quotes because there are lots of esoteric rules about data race conditions: reading from a value written by another shader invocation and so forth. These are not trivial to deal with. You can try to structure your code to avoid them, by not writing to the same image location in the same shader. But in many cases, if you could do that, you could just render to the framebuffer.
Ultimately, it's pretty much impossible to answer this question in the general case, without knowing what exactly you're trying to actually do. How you approach GPGPU through a rendering API depends greatly on exactly what you're trying to compute.
