DX9 GenerateMipSubLevels produces broken mipmap and destroys framerate - directx-9

For Variance Shadow Mapping I need a low-res, averaged version of my shadowMap. In DirectX9 I do this by rendering distances to a renderTexture and then generating mipmaps. The mipmap generation should yield results similar to a separable box-filter, but be faster since there's special hardware for it. However, I'm running into two problems:
According to PIX, each next mipmap level contains the top-left quadrant of the previous one (at full resolution), instead of the full image at half resolution.
Framerate is absolutely obliterated when I do this: it's 50x lower than without the mipmap generation.
This is how I initialise the RenderTexture:
D3DXCreateTexture(device, 2048, 2048, 0, D3DUSAGE_RENDERTARGET, D3DFMT_G32R32F, D3DPOOL_DEFAULT, &textureID); // 0 levels = create the full mipmap chain
D3DXCreateRenderToSurface(device, 2048, 2048, D3DFMT_G32R32F, TRUE, D3DFMT_D24S8, &renderToSurface);
textureID->GetSurfaceLevel(0, &topSurface);
This is how I generate the mipmaps (this is done every frame after rendering to the shadowMap has finished):
textureID->SetAutoGenFilterType(D3DTEXF_LINEAR);
textureID->GenerateMipSubLevels();
I think setting the filterType shouldn't be done every time, but removing that doesn't make a difference.
Since this is for Variance Shadow Mapping I really need the colour format D3DFMT_G32R32F, but I've also tried D3DFMT_A8R8G8B8 to see whether that made a difference, and it didn't: the mipmap is still broken in the same way and the framerate is still 1 fps.
I've also tried using D3DUSAGE_AUTOGENMIPMAP instead, but that didn't generate a mipmap at all according to PIX. That looks like this (and then I don't call GenerateMipSubLevels at all anymore):
D3DXCreateTexture(device, 2048, 2048, 0, D3DUSAGE_RENDERTARGET | D3DUSAGE_AUTOGENMIPMAP, D3DFMT_G32R32F, D3DPOOL_DEFAULT, &textureID);
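One thing I still need to check is whether the runtime even reports autogen-mipmap support for this format; as far as I understand, a query along these lines should tell me (sketch only; the display-mode format below is an assumption, not my actual setup):
IDirect3D9* d3d = NULL;
device->GetDirect3D(&d3d);
// D3D_OK means full support; D3DOK_NOAUTOGEN means the texture is created
// but no mip levels will actually be generated for it.
HRESULT hr = d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                    D3DFMT_X8R8G8B8, // display-mode format (assumed)
                                    D3DUSAGE_RENDERTARGET | D3DUSAGE_AUTOGENMIPMAP,
                                    D3DRTYPE_TEXTURE, D3DFMT_G32R32F);
d3d->Release(); // inspect hr before relying on autogen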
Note that this is on Windows 10 with a Geforce 550 GPU. I've already tried reinstalling my videocard drivers entirely, just in case, but that didn't make a difference.
How can I get a proper mipmap for a renderTexture in DirectX9, and how can I get proper framerate while doing so?

Related

Most performant image format for SCNParticles?

I've been using a 24-bit .png with alpha, exported from Photoshop, and just tried a .psd, which worked fine with OpenGL ES, but Metal didn't see the alpha channel.
What's the absolutely most performant texture format for particles within SceneKit?
Here's a sheet to test on, if need be.
It looks white... right-click and "save as" in the blank space. It's an alpha-heavy set of rings. You can probably barely make them out if you squint at the screen:
exaggerated example use case:
https://www.dropbox.com/s/vu4dvfl0aj3f50o/circless.mov?dl=0
// Additional points for anyone who can guess the difference between the left and right rings in the video.
Use a grayscale/alpha PNG, not an RGBA one. Since it uses 16 bits per pixel (8+8) instead of 32 (8+8+8+8), the initial texture load will be faster and it may (depending on the GPU) use less memory as well. At render time, though, you’re not going to see much of a speed difference, since whatever the texture format is it’s still being drawn to a full RGB(A) render buffer.
There’s also PVRTC, which can get you down as low as 2–4 bits per pixel, but I tried Imagination's tool on your image and even the highest quality settings caused a bunch of artifacts like the below:
Long story short: go with a grayscale+alpha PNG, which you can easily export from Photoshop. If your particle system is hurting your frame rate, reduce the number and/or size of the particles—in this case you might be able to get away with layering a couple of your particle images on top of each other in the source texture atlas, which may not be too noticeable if you pick ones that differ in size enough.

OpenGL glTexImage2D memory issue

I'm loading a cubemap to create a skybox, everything is fine and the skybox renders properly with a correct texture application.
However, I decided to check my program's safety with Valgrind, and it gives this error: http://pastebin.com/seqmXjyx
The line 53 in sky.c is:
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + i, 0, GL_RGB, texture.width, texture.height, 0, GL_BGR, GL_UNSIGNED_BYTE, texture.pixels);
Prototype:
void glTexImage2D(GLenum target,
                  GLint level,
                  GLint internalformat,
                  GLsizei width,
                  GLsizei height,
                  GLint border,
                  GLenum format,
                  GLenum type,
                  const GLvoid *pixels);
The texture width and height are unsigned ints (1024x1024), and the pixels come from a BMP file.
The file is definitely parsed correctly (as I said before, everything renders correctly and OpenGL returns no error); I only get this "invalid write of size 4" from Valgrind.
(The invalid write appears every time I load a texture.)
So I read the man page, and it made me even more confused; this is what I got from it:
GL_INVALID_VALUE is generated if width or height is less than 0 or greater than 2 + GL_MAX_TEXTURE_SIZE, or if either cannot be represented as 2^k +2 (border) for some integer value of k.
glGetError() gives me GL_NO_ERROR even though I'm passing 1024x1024 as the size, which is obviously not (2^k + 2).
I also read about the border parameter, which seems unused in the OpenGL version I'm using, but could it be linked to this invalid write?
Finally, as I said, everything works properly, but I would definitely like to know where these invalid writes are coming from.
The full project: https://github.com/toss-dev/minetoss
Which man page are you quoting? There are multiple man pages available, not all mapping to the same OpenGL version.
Anyway, the idea behind the + 2(border) term is 2 multiplied by the value of border, which in your case is 0; the constraint then reduces to the width being a power of two, and 1024 = 2^10 satisfies it. So your code is just fine. border is a feature that is not supported by the latest GL versions and is therefore absent from the more recent man pages.
Now, back to your problem. The Valgrind error is coming from within the GeForce GL driver, so unless you get access to its source, it's unlikely you'll get anywhere investigating it (you can try to contact the driver maintainers if you want... but you may have a hard time getting any answer).
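If you want to rule out your own loader anyway, a standalone upload with dummy data is a quick test; if Valgrind still reports the invalid write here, it really does originate in the driver (sketch only; the size and format are illustrative, and a GL context must be current):
static unsigned char dummy[1024 * 1024 * 3]; // zero-filled BGR data
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, 1024, 1024, 0, GL_BGR, GL_UNSIGNED_BYTE, dummy);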

What's the difference between TMU and OpenGL's GL_TEXTUREn?

I can't quite understand what the difference is.
I know a TMU is a texture mapping unit on the GPU, and in OpenGL we can have many texture units. I used to think they were the same, that if I have n TMUs then I can use n GL_TEXTUREs, but I've found that this may not be true.
Recently I was working on an Android game targeting a platform with a Mali-400 MP GPU. According to the documentation, it has only one TMU, so I thought I could use only one texture at a time. But surprisingly, I can use at least 4 textures without trouble. Why is this?
Is the hardware or driver doing something like swapping different textures in and out automatically for me? If so, isn't that supposed to cause a lot of cache misses?
I'm not the ultimate hardware architecture expert, particularly not for Mali. But I'll give it a shot anyway, based on my understanding.
The TMU is a hardware unit for texture sampling. It does not get assigned to an OpenGL texture unit on a permanent basis. Any time a shader executes a texture sampling operation, I expect this specific operation to be assigned to one of the TMUs. The TMU then does the requested sampling, delivers the result back to the shader, and is available for the next sampling operation.
So there is no relationship between the number of TMUs and the number of supported OpenGL texture units. The number of OpenGL texture units that can be supported is determined by the state tracking part of the hardware.
The number of TMUs has an effect on performance. The more TMUs are available, the more texture sampling operations can be executed within a given time. So if you use a lot of texture sampling in your shaders, your code will profit from having more TMUs. It doesn't matter if you sample many times from the same texture, or from many different textures.
Texture Mapping Units (TMUs) are functional units on the hardware, once upon a time they were directly related to the number of pixel pipelines. As hardware is much more abstract/general purpose now, it is not a good measure of how many textures can be applied in a single pass anymore. It may give an indication of overall multi-texture performance, but by itself does not impose any limits.
OpenGL's GL_TEXTURE0+n actually represents Texture Image Units (TIUs), which are locations where you bind a texture. The number of textures you can apply simultaneously (in a single execution of a shader) varies per-shader stage. In Desktop GL, which has 5 stages as of GL 4.4, implementations must support 16 unique textures per-stage. This is why the number of Texture Image Units is 80 (16x5). GL 3.3 only has 3 stages, and its minimum TIU count is thus only 48. This gives you enough binding locations to provide a set of 16 unique textures for every stage in your GLSL program.
GL ES, particularly 2.0, is a completely different story. It mandates support for at least 8 simultaneous textures in the fragment shader stage and 0 (optional) in the vertex shader.
const mediump int gl_MaxVertexTextureImageUnits = 0; // Vertex Shader Limit
const mediump int gl_MaxTextureImageUnits = 8; // Fragment Shader Limit
const mediump int gl_MaxCombinedTextureImageUnits = 8; // Total Limit for Entire Program
There is also a limit on the number of textures you can apply across all of the shaders in a single execution of your program (gl_MaxCombinedTextureImageUnits), and this limit is usually just the sum total of the limits for each individual stage.
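If you want to see what your particular implementation exposes, rather than the spec minimums, you can simply query the limits at runtime; a quick sketch (assumes a current GL context and stdio for the printout):
GLint vertUnits = 0, fragUnits = 0, combined = 0;
glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS, &vertUnits);
glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, &fragUnits);
glGetIntegerv(GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS, &combined);
printf("vertex: %d, fragment: %d, combined: %d\n", vertUnits, fragUnits, combined);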

How to identify optimal parameters for cvCanny for polygon approximation

This is my source image (ignore the points, they were added manually later):
My goal is to get a rough polygon approximation of the two hands. Something like this:
I have a general idea on how to do this; I want to use cvCanny to find edges, cvFindContours to find contours, and then cvApproxPoly.
The problem I'm facing is that I have no idea how to properly use cvCanny; in particular, what should I use for the last three parameters (threshold1, threshold2, apertureSize)? I tried:
cvCanny(source, cannyProcessedImage, 20, 40, 3);
but the result is not ideal. The left hand looks relatively fine but for the right hand it detected very little:
In general it's not as reliable as I'd like. Is there a way to guess the "best" parameters for Canny, or at least a detailed explanation (understandable by a beginner) of what they do so I can make educated guesses? Or perhaps there's a better way to do this altogether?
It seems you have to lower your thresholds.
The Canny algorithm works with a hysteresis threshold: it selects a contour if at least one pixel is as bright as the high threshold, and then takes all connected contour pixels that are above the lower threshold.
Papers recommend taking the two thresholds at a ratio of 2:1 or 3:1 (for example 10 and 30, or 20 and 60, etc.). For some applications, a manually determined, hardcoded threshold is enough; that may be your case, too. I suspect that if you lower your thresholds you will get good results, because the images are not that complicated.
A number of methods to automatically determine the best canny thresholds have been proposed. Most of them rely on gradient magnitudes to estimate the best thresholds.
Steps:
Extract the gradients (Sobel is a good option)
Convert the result to uchar. Gradients can theoretically have values greater than 255, but that's OK; OpenCV's Sobel can output uchars.
Make a histogram of the resulting image.
Take the high threshold at the 95th percentile of your histogram, and the low one as high/3.
You should probably adjust the percentile depending on your application, but the results will be much more robust than hardcoded high and low values.
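Putting these steps together, a rough sketch using OpenCV's C++ API (assumes a single-channel 8-bit input Mat named gray, #include <opencv2/opencv.hpp> and using namespace cv; the names are illustrative):
Mat gx, gy, mag, mag8u;
Sobel(gray, gx, CV_32F, 1, 0);   // horizontal gradient
Sobel(gray, gy, CV_32F, 0, 1);   // vertical gradient
magnitude(gx, gy, mag);          // gradient magnitude
mag.convertTo(mag8u, CV_8U);     // clamp/convert to uchar, as noted above

int channels[] = {0}, histSize[] = {256};
float range[] = {0.f, 256.f};
const float* ranges[] = {range};
Mat hist;
calcHist(&mag8u, 1, channels, Mat(), hist, 1, histSize, ranges);

// Walk the histogram until 95% of the pixels are accounted for.
double total = (double)mag8u.total(), acc = 0;
int high = 255;
for (int b = 0; b < 256; b++)
{
    acc += hist.at<float>(b);
    if (acc >= 0.95 * total) { high = b; break; }
}
double low = high / 3.0;

Mat edges;
Canny(gray, edges, low, high);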
Note: An excellent threshold detection algorithm is implemented in Matlab. It is based on the idea above, but a bit more sophisticated.
Note 2: This method will work if the contours and illumination do not vary a lot between image areas. If the contours are crisper in one part of the image, then you need locally adaptive thresholds, and that's another story. But looking at your pics, that should not be the case.
Maybe one of the easiest solutions is to apply Otsu thresholding to the grayscale image, find contours in the binary image and then approximate them. Here's the code:
#include <opencv2/opencv.hpp>   // OpenCV 2.x-style headers and constants
using namespace cv;
using namespace std;

Mat img = imread("test.png"), gray;
vector<Vec4i> hierarchy;
vector<vector<Point2i> > contours;
cvtColor(img, gray, CV_BGR2GRAY);
// Otsu chooses the threshold automatically; the explicit value (0) is ignored.
threshold(gray, gray, 0, 255, THRESH_OTSU);
// Only outer contours, with redundant points already compressed.
findContours(gray, contours, hierarchy, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
for(size_t i=0; i<contours.size(); i++)
{
    // Simplify each contour with a 5-pixel tolerance, then draw it in red.
    approxPolyDP(contours[i], contours[i], 5, false);
    drawContours(img, contours, (int)i, Scalar(0,0,255));
}
imshow("result", img);
waitKey();
And this is the result:

Drawing per-pixel into a backbuffer or texture to display to screen, using OpenGL - no glDrawPixels()

Basically, I have an array of data (fluid simulation data) which is generated per-frame in real-time from user input (starts in system ram). I want to write the density of the fluid to a texture as an alpha value - I interpolate the array values to result in an array the size of the screen (the grid is relatively small) and map it to a 0 - 255 range. What is the most efficient way (ogl function) to write these values into a texture for use?
Things that have been suggested elsewhere, which I don't think I want to use (please, let me know if I've got it wrong):
glDrawPixels() - I'm under the impression that this will cause an interrupt each time I call it, which would make it slow, particularly at high resolutions.
Use a shader - I don't think that a shader can accept and process the volume of data in the array each frame (It was mentioned elsewhere that the cap on the amount of data they may accept is too low)
If I understand your problem correctly, both solutions are over-complicating the issue. Am I correct in thinking you've already generated an array of size x*y, where x and y are your screen resolution, filled with unsigned bytes?
If so, if you want an OpenGL texture that uses this data as its alpha channel, why not just create a texture, bind it to GL_TEXTURE_2D and call glTexImage2D with your data, using GL_ALPHA as the format and internal format, GL_UNSIGNED_BYTE as the type and (x,y) as the size?
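A minimal sketch of that suggestion (the array name data is assumed for illustration; x and y are your screen size):
glPixelStorei(GL_UNPACK_ALIGNMENT, 1); // byte-tight rows; x may not be a multiple of 4
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, x, y, 0, GL_ALPHA, GL_UNSIGNED_BYTE, data);
// On later frames, glTexSubImage2D updates the existing storage instead of reallocating it:
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, x, y, GL_ALPHA, GL_UNSIGNED_BYTE, data);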
What makes you think a shader would perform badly? The whole idea of shaders is processing huge amounts of data very, very fast. Please use Google on the search phrase "General Purpose GPU computing" or "GPGPU".
Shaders can only gather data from buffers, not scatter. But what they can do is change values in the buffers. This allows a (fragment) shader to write the locations of GL_POINTs, which are then in turn placed on the target pixels of the texture. Shader Model 3 and later GPUs can also access texture samplers from the geometry and vertex shader stages, so the fragment shader part gets really simple then.
If you just have a linear stream of positions and values, just send those to OpenGL through a Vertex Array, drawing GL_POINTs, with your target texture being a color attachment of a framebuffer object.
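A rough sketch of that point-drawing path (the names are illustrative; assumes the FBO and target texture already exist, and a context where the client-state array calls are available):
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, targetTex, 0);
glViewport(0, 0, texWidth, texHeight);

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, positions);     // per-point target locations
glColorPointer(4, GL_UNSIGNED_BYTE, 0, values); // per-point values written to the texture
glDrawArrays(GL_POINTS, 0, pointCount);

glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glBindFramebuffer(GL_FRAMEBUFFER, 0);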
What is the most efficient way (ogl function) to write these values into a texture for use?
A good way would be to avoid any unnecessary extra copies. You could use Pixel Buffer Objects, which you map into your address space and generate your data directly into.
Since you want to update this data per frame, you also want to look for efficient buffer object streaming, so that you don't force implicit synchronizations between the CPU and GPU. An easy way to do that in your scenario would be using a ring buffer of 3 PBOs, which you advance every frame.
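A rough sketch of such a ring (the buffer names, the GL_ALPHA format and the fill helper are illustrative, not a drop-in implementation):
// One-time setup: three PBOs, each large enough for one frame of data.
GLuint pbo[3];
glGenBuffers(3, pbo);
for (int i = 0; i < 3; i++)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[i]);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, x * y, NULL, GL_STREAM_DRAW);
}

// Per frame: advance through the ring, orphan the buffer, write into the mapping,
// then let glTexSubImage2D source its data from the bound PBO (offset 0).
int cur = frame % 3;
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo[cur]);
glBufferData(GL_PIXEL_UNPACK_BUFFER, x * y, NULL, GL_STREAM_DRAW); // orphan the old storage
unsigned char* ptr = (unsigned char*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if (ptr)
{
    fillDensities(ptr); // placeholder for your own interpolation/fill code
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
}
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, x, y, GL_ALPHA, GL_UNSIGNED_BYTE, (void*)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);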
Things that have been suggested elsewhere, which I don't think I want to use (please, let me know if I've got it wrong):
glDrawPixels() - I'm under the impression that this will cause an interrupt each time I call it, which would make it slow, particularly at high resolutions.
Well, what the driver does is totally implementation-specific. I don't think that "causes an interrupt each time" is a useful mental image here. You seem to completely underestimate the work the GL implementation will be doing behind your back. A GL call will not necessarily correspond one-to-one to some command which is sent to the GPU.
But not using glDrawPixels is still a good choice. It is not very efficient, and it has been deprecated and removed from modern GL.
Use a shader - I don't think that a shader can accept and process the volume of data in the array each frame (It was mentioned elsewhere that the cap on the amount of data they may accept is too low)
You got this totally wrong. There is no way to not use a shader. If you're not writing one yourself (e.g. by using the old "fixed-function pipeline" of GL), the GPU driver will provide a shader for you. The hardware implementation of these earlier fixed-function stages has been completely superseded by programmable units - so if you can't do it with shaders, you can't do it with the GPU. And I would strongly recommend writing your own shader (it is the only option in modern GL, anyway).
