I've been writing GLSL shaders and using an integer texture (GL_RED) to store values for use in the shader.
When I attempt to divide a value taken from the usampler2D texture, it stays the same.
The following is a minimal reproducible shader:
#version 440
in vec2 uv;
out vec3 color;
layout (binding = 1) uniform usampler2D tm_data;
void main(){
    float index = texture(tm_data, uv).r;
    float divisor = 16.0f;
    color = vec3(index / divisor, 0, 0);
}
The rendered red value is always 1.0, regardless of how I try to divide or mutate the index value.
When the sampler is changed to a normalized one (sampler2D), the color manipulation works as expected:
#version 440
in vec2 uv;
out vec3 color;
layout (binding = 1) uniform sampler2D tm_data; //Loads as normalized from [0,255] to [0,1]
void main(){
    float index = texture(tm_data, uv).r * 255.0f; //Convert back to integer approximation
    float divisor = 4.0f;
    color = vec3(index / divisor, 0, 0); //Shade of red now appears considerably darker
}
Does anyone know why this unexpected behaviour happens?
The tm_data texture is loaded with GL_RED as both the internal format and the pixel transfer format.
The OpenGL version used is 4.4
There is no framework being used (no sneaky additions), everything is loaded using gl function calls.
To use a usampler2D, the internal format has to be an unsigned integral format (e.g. GL_R8UI). See Sampler types.
If the internal format is the basic format GL_RED, then the sampler type has to be sampler2D.
Note that sampler* is for floating-point and normalized integer formats, isampler* for signed integral formats, and usampler* for unsigned integral formats.
See OpenGL Shading Language 4.60 Specification - 4.1.7. Opaque Types and OpenGL Shading Language 4.60 Specification - 8.9. Texture Functions
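For example, uploading the data with an unsigned integer internal format might look like this (an illustrative sketch; the tm_data_tex, width, height and data names are placeholders):
glBindTexture(GL_TEXTURE_2D, tm_data_tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8UI, width, height, 0,
             GL_RED_INTEGER, GL_UNSIGNED_BYTE, data);
//Integer textures cannot be filtered, so use nearest sampling
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
In the shader the texel can then be read as uint index = texture(tm_data, uv).r; and converted to float before the division.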
I wrote the light diffusing code and it's just not working; I don't know why.
This is the code I wrote for diffusing:
t_vec pi = vec_add(ray.org,vec_times_double(ray.dir, t));
t_vec hp = vec_diff(light.pos, pi);
t_vec normal = get_sphers_normal(sp, pi);
double dot = vec_dot(normalize_vect(normal),normalize_vect(hp));
printf("hitpoint : %lf\n", dot);
put_pixel(mlx.img, x, y, rgb_to_int(sp.color)*double_abs(dot), resolution.width);
In this line:
put_pixel(mlx.img, x, y, rgb_to_int(sp.color)*double_abs(dot), resolution.width);
^-----------------------------------
You seem to be multiplying an integer by a double. I presume that put_pixel expects a 32-bit integer encoding the RGB color, in which case your double gets converted back to int, but in a meaningless way, giving those color bands. For example, if sp.color = { 255., 0., 0. } is a red surface, rgb_to_int converts it to 0xff0000; multiplying that by 0.00390625 (a dimly lit surface) and converting back to int gives 0x00ff00, which is bright green rather than dark red.
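A tiny C snippet (purely illustrative, with made-up values) shows the corruption:
int packed = 0xff0000;            /* bright red from rgb_to_int */
int dimmed = packed * 0.00390625; /* double result 65280.0 silently truncated to int */
/* dimmed is now 0x00ff00 - bright green, not a darker red */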
You should rather use your vector times scalar function on the argument to rgb_to_int:
rgb_to_int(vec_times_double(sp.color, double_abs(dot)))
I assume here that sp.color is of type t_vec. If it's not then adjust your code accordingly.
Video encoders like Intel® Media SDK do not accept an 8-bit grayscale image as an input format.
The 8-bit grayscale format uses one byte per pixel, in the range [0, 255].
8-bit YUV format in the context of the question means YCbCr (BT.601 or BT.709).
Although there is a full-range YUV standard, the commonly used format is "limited range" YUV, where the range of Y is [16, 235] and the range of U,V is [16, 240].
NV12 format is the common input format in this case.
NV12 format is YUV 4:2:0 ordered in memory with a Y plane first, followed by packed chroma samples in an interleaved UV plane:
YYYYYY
YYYYYY
UVUVUV
The grayscale image will be referred to as the "I plane":
IIIIII
IIIIII
Setting the UV plane is simple: set all U,V elements to the value 128.
But what about the Y plane?
In the case of full-range YUV, we can simply use the "I plane" as the Y plane (i.e. Y = I).
In the case of the "limited range" YUV format, a transformation is required:
Setting R=G=B=I in the BT.601 conversion formula, the RGB coefficients sum to 1 and the luma is scaled into [16, 235], so Y = round(I*(219/255) + 16) = round(I*0.859 + 16).
What is an efficient way to do the above conversion using IPP?
I am adding an answer to my own question.
I hope to see a better answer...
I found a solution using two IPP functions:
ippsMulC_8u_Sfs - Multiplies each element of a vector by a constant value.
ippsAddC_8u_ISfs - Adds a constant value to each element of a vector.
I selected functions that use fixed-point math, for better performance.
The fixed-point implementation of the 0.859 scaling is performed by expanding, scaling, and shifting. Example: b = (a*scale + (1<<7)) >> 8, where scale = round(0.859*2^8).
The val parameter to ippsMulC_8u_Sfs is set to round(0.859*2^8) = 220.
The scaleFactor parameter to ippsMulC_8u_Sfs is set to 8 (divide the scaled result by 2^8).
Code sample:
void GrayscaleToNV12(const unsigned char I[],
                     int image_width,
                     int image_height,
                     unsigned char J[])
{
    IppStatus ipp_status;

    const int image_size = image_width*image_height;
    unsigned char *UV = &J[image_size]; //In NV12 format, UV plane starts below Y.

    const Ipp8u expanded_scaling = (Ipp8u)(0.859 * 256.0 + 0.5);

    //J[x] = (expanded_scaling * I[x] + 128u) >> 8u;
    ipp_status = ippsMulC_8u_Sfs(I,                //const Ipp8u* pSrc,
                                 expanded_scaling, //Ipp8u val,
                                 J,                //Ipp8u* pDst,
                                 image_size,       //int len,
                                 8);               //int scaleFactor);

    //Check ipp_status, and handle errors...

    //J[x] += 16;
    //ippsAddC_8u_ISfs is deprecated, I used it to keep the code simple.
    ipp_status = ippsAddC_8u_ISfs(16,         //Ipp8u val,
                                  J,          //Ipp8u* pSrcDst,
                                  image_size, //int len,
                                  0);         //int scaleFactor);

    //Check ipp_status, and handle errors...

    //Fill the entire UV plane with the value 128 - "gray color".
    memset(UV, 128, image_width*image_height/2);
}
Off-topic note:
There is a way to mark a video stream as "full range" (where the Y range is [0, 255] instead of [16, 235], and the U,V range is also [0, 255]).
Using the "full range" standard allows placing I in place of Y (i.e. Y = I).
Marking the stream as "full range" using Intel Media SDK is possible (but not well documented).
Marking an H.264 stream as "full range" requires adding a pointer to the mfxExtBuffer **ExtParam list (in the mfxVideoParam structure):
A pointer to a structure of type mfxExtVideoSignalInfo should be added, with the following values:
typedef struct {
    mfxExtBuffer Header;             //MFX_EXTBUFF_VIDEO_SIGNAL_INFO and sizeof(mfxExtVideoSignalInfo)
    mfxU16 VideoFormat;              //Most likely 5 ("Unspecified video format")
    mfxU16 VideoFullRange;           //1 (video_full_range_flag is equal to 1)
    mfxU16 ColourDescriptionPresent; //0 (colour_description_present_flag equal to 0)
    mfxU16 ColourPrimaries;          //0 (no effect when ColourDescriptionPresent = 0)
    mfxU16 TransferCharacteristics;  //0 (no effect when ColourDescriptionPresent = 0)
    mfxU16 MatrixCoefficients;       //0 (no effect when ColourDescriptionPresent = 0)
} mfxExtVideoSignalInfo;
VideoFullRange = 1 is the only parameter relevant to setting "full range" video, but we must fill the entire structure.
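For illustration, filling that structure and attaching it to the encoder's parameters might look roughly like this (a sketch under the assumption that videoParam is the mfxVideoParam passed to the encoder):
mfxExtVideoSignalInfo signal_info;
memset(&signal_info, 0, sizeof(signal_info));
signal_info.Header.BufferId = MFX_EXTBUFF_VIDEO_SIGNAL_INFO;
signal_info.Header.BufferSz = sizeof(mfxExtVideoSignalInfo);
signal_info.VideoFormat = 5;              //"Unspecified video format"
signal_info.VideoFullRange = 1;           //video_full_range_flag = 1
signal_info.ColourDescriptionPresent = 0; //remaining colour fields are ignored

mfxExtBuffer *ext_buffers[] = { (mfxExtBuffer*)&signal_info };
videoParam.ExtParam = ext_buffers;
videoParam.NumExtParam = 1;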
I spent the day working on an OpenGL application that will tessellate a mesh and apply a lens distortion. The goal is to be able to render wide-angle shots for a variety of different lenses. So far I've got the shaders properly applying the distortion, but I've been having issues controlling the tessellation the way I want to. Right now my Tessellation Control Shader just breaks a single triangle into a set number of smaller triangles, then I apply the lens distortion in the Tessellation Evaluation Shader.
The problem I'm having with this approach is that when I have really large triangles in the scene, they tend to need more warping. This means they need to be tessellated more in order to ensure good looking results. Unfortunately, I can't compute the size of a triangle (in screen space) in the Vertex Shader or the Tessellation Control Shader, but I need to define the tessellation amount in the Tessellation Control shader.
My question is then, is there some way to get a hold of the entire primitive in OpenGL's programmable pipeline, compute some metrics about it, then use that information to control tessellation?
Here are some example images of the problem, for clarity...
Figure 1 (Above): Each Red or Green Square was originally 2 triangles; this example looks good because the triangles were small.
Figure 2 (Above): Each Red or Green Region was originally 2 triangles; this example looks bad because the triangles were large.
Figure 3 (Above): Another example with small triangles but with a much, much larger grid. Notice how much things curve on the edges. Still looks good with tessellation level of 4.
Figure 4 (Above): Another example with large triangles, only showing center 4 columns because the image is unintelligible if more columns are present. This shows how very large triangles don't get tessellated well. If I set the tessellation really really high then this comes out nice. But then I'm performing a crazy amount of tessellation on smaller triangles too.
In a Tessellation Control Shader (TCS) you have read access to every vertex in the input patch primitive. While that sounds nice on paper, if you are trying to compute the maximum edge length of a patch, it would actually mean iterating over every vertex in the patch on every TCS invocation and that's not particularly efficient.
Instead, it may be more practical to pre-compute the patch's center in object-space and determine the radius of a sphere that tightly bounds the patch. Store this bounding information as an extra vec4 attribute per-vertex, packed as shown below.
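A CPU-side pre-pass for this might look roughly like the sketch below (illustrative only; the vertex types and mesh layout are assumed, and the resulting vec4 is packed the same way as the bounding_sphere attribute used by the TCS pseudo-code that follows):
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

//Center = average of the patch's control points, radius = distance to the farthest one
Vec4 computePatchBoundingSphere(const Vec3 *verts, int count)
{
    Vec3 c = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < count; ++i) { c.x += verts[i].x; c.y += verts[i].y; c.z += verts[i].z; }
    c.x /= count; c.y /= count; c.z /= count;

    float r2 = 0.0f;
    for (int i = 0; i < count; ++i) {
        float dx = verts[i].x - c.x, dy = verts[i].y - c.y, dz = verts[i].z - c.z;
        float d2 = dx*dx + dy*dy + dz*dz;
        if (d2 > r2) r2 = d2;
    }
    return Vec4{ c.x, c.y, c.z, std::sqrt(r2) };
}
The returned value would then be replicated into the per-vertex bounding_sphere attribute of every vertex in the patch.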
Pseudo-code for a TCS that computes the longest length of the patch in NDC-space
#version 420
uniform mat4 model_view_proj;
in vec4 bounding_sphere []; // xyz = center (object-space), w = radius
void main (void)
{
    vec4  center = vec4 (bounding_sphere [0].xyz, 1.0f);
    float radius = bounding_sphere [0].w;

    // Transform object-space X extremes into clip-space
    vec4 min_0 = model_view_proj * (center - vec4 (radius, 0.0f, 0.0f, 0.0f));
    vec4 max_0 = model_view_proj * (center + vec4 (radius, 0.0f, 0.0f, 0.0f));

    // Transform object-space Y extremes into clip-space
    vec4 min_1 = model_view_proj * (center - vec4 (0.0f, radius, 0.0f, 0.0f));
    vec4 max_1 = model_view_proj * (center + vec4 (0.0f, radius, 0.0f, 0.0f));

    // Transform object-space Z extremes into clip-space
    vec4 min_2 = model_view_proj * (center - vec4 (0.0f, 0.0f, radius, 0.0f));
    vec4 max_2 = model_view_proj * (center + vec4 (0.0f, 0.0f, radius, 0.0f));

    // Transform from clip-space to NDC
    min_0 /= min_0.w; max_0 /= max_0.w;
    min_1 /= min_1.w; max_1 /= max_1.w;
    min_2 /= min_2.w; max_2 /= max_2.w;

    // Calculate the distance (ignore depth) covered by all three pairs of extremes
    float dist_0 = distance (min_0.xy, max_0.xy);
    float dist_1 = distance (min_1.xy, max_1.xy);
    float dist_2 = distance (min_2.xy, max_2.xy);

    // A max_dist >= 2.0 indicates the patch spans the entire screen in one direction
    float max_dist = max (dist_0, max (dist_1, dist_2));

    // ...
}
If you run your 4th diagram through this TCS, you should come up with a value for max_dist very nearly 2.0, which means you need as much subdivision as possible. Meanwhile, many of the patches on the periphery of the sphere in the 3rd diagram will be close to 0.0; they don't need much subdivision.
This does not properly deal with situations where part of the patch is offscreen. You would need to clamp the NDC extremes to [-1.0,1.0] to handle those situations properly, but that seemed like more trouble than it was worth.
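Where the // ... leaves off, one possible (illustrative, not from the original answer) way to turn max_dist into tessellation levels for a triangle patch would be:
    // Hypothetical mapping: max_dist near 0.0 -> level 1, max_dist near 2.0 -> MAX_TESS
    const float MAX_TESS = 64.0;
    float level = mix (1.0, MAX_TESS, clamp (max_dist * 0.5, 0.0, 1.0));
    gl_TessLevelInner [0] = level;
    gl_TessLevelOuter [0] = level;
    gl_TessLevelOuter [1] = level;
    gl_TessLevelOuter [2] = level;
In practice you would typically guard these writes with gl_InvocationID == 0 and tune the mapping to taste.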
So I'm trying to send an array of values to my fragment shader.
The shader reads values from a texture, and depending on the value currently being read from the texture, I want to retrieve a value from the array.
I am able to cast the value (u.r) to an int using int(u.r), but when I actually use that as the array index to look up the value, it says that the integer isn't a constant, so I can't use it...
ERROR: 0:75: '[]' : Index expression must be constant -
Is there a better way of sending arrays of values to the shader?
Here is some of the code; as you can see, the array "tab" is what I'm mostly looking at:
<script id="shader-fs" type="x-shader/x-fragment">
#ifdef GL_ES
precision highp float;
#endif
uniform sampler2D uTexSamp;
uniform sampler2D uTabSamp;
uniform float dt;
uniform float dte;
uniform float dth2;
uniform float a;
uniform float nb;
uniform float m;
uniform float eps;
uniform float weee;
uniform float tab[100];
//uniform float temp;
uniform int fframes;
uniform vec2 vStimCoord;
varying vec2 vTexCoord;
const float d = 0.001953125; // 1./512.
void main(void) {
    vec4 t = texture2D(uTexSamp, vTexCoord);
    float u = t.r, v = t.g, u2 = t.b, v2 = t.a;

    //const mediump int arrindex = floor(u*10 + u2);
    //float sigvaluetab = tab[arrindex];

    u += u2/255.; v += v2/255.;
    //u += u2 * 0.003921568627451;
    v += v2 * 0.003921568627451;

    //Scaling factors
    v = v*1.2;
    u = u*4.;

    float temp = (1.0 / (exp(2.0 * (u-3.0)) + 1.0)); // (1-tanh(u-3)) * 0.5

    //const mediump int utoint;
    //utoint = int(u);
    //for(int index = 0; index< 50; index++)

    int u2toint;
    u2toint = int(u2);
    // int arrindex = utoint*10 + u2toint;

    float sigmoid = tab[u2toint];//(tab[5] + 1.);
    //float sigmoid= temp;//tab[arrindex];

    float hfunc = sigmoid * u * u;
    float ffunc = -u +(a - pow(v*nb,m))*hfunc ;
    float gfunc = -v;
    if (u > 1.0) { //u-1.0 > 0.0
        gfunc += 1.4990;
    }

    ... MORE STUFF UNDER, BUT THIS IS THE IDEA
Fragment shaders are tricky: unlike vertex shaders, where you can index a uniform using any integer expression, in a fragment shader the expression must qualify as const-index. This can go as far as ruling out indexing uniforms in a loop in fragment shaders :-\
GLSL ES Specification (version 100) - Appendix A: Limitations for ES 2.0 - pp. 110
Many implementations exceed these requirements, but understand that fragment shaders are more restrictive than vertex shaders. If you could edit your question to include the full fragment shader, I might be able to offer you an alternate solution.
One solution might be to use a 1D texture lookup instead of an array. Technically, texture lookups that use non-const coordinates are dependent lookups, which can be significantly slower. However, texture lookups do overcome the limitations of array indexing in GLSL ES.
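For instance, since the question already declares uniform sampler2D uTabSamp, the tab values could be packed into the red channel of a 100x1 texture and fetched like this (an illustrative sketch; it assumes the values have been normalized into [0,1] or stored in a float texture):
    float tabCoord = (float(u2toint) + 0.5) / 100.0;  //center of texel number u2toint
    float sigmoid = texture2D(uTabSamp, vec2(tabCoord, 0.5)).r;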
I'm trying to tilt an image 90 degrees; currently I'm following the method presented here:
http://www.scribd.com/doc/66589491/Image-Rotation-Using-CUDA
My current code for the kernel is this:
__global__ void kernCuda(float *Source, float *Destination, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int i = abs(x*cosf(theta) - y*sinf(theta));
    int j = abs(x*sinf(theta) + y*cosf(theta));
    if (x < width && y < width) {
        Destination[j*width+i] = Source[y*width+x];
    }
}
The image tilts somewhat; however, it seems like it is not correct, and on top of that, some pixels that were colored are now black (0). Any help would be appreciated.
I assume that theta is a floating point number. So you are mixing floating point and integer variables. I would suggest you use the appropriate casts to make it work and look for rounding issues.
Secondly, in order to see whether your program works in general you can replace cosf(theta) by 0 and sinf(theta) by 1.
A third issue that I can see is that you only take one x,y value per thread instead of looping over them with a while loop. So in case your image dimension is larger than the grid you launch, you will not get all your pixels.
Edit: I just had a brief look at the report. It really is not very good. If you want to just learn about CUDA I suggest you get the book called "CUDA by Example".
The obvious problem is the integer truncation/interpolation issue.
One way around that would be to bind the source image to a texture and then read from the texture using the computed real valued coordinates. CUDA textures give you "free", hardware based interpolation/filtering. The main disadvantage of textures is the interpolation is limited to 8 bit internal accuracy, so it might not be accurate enough in some situations. But as a first attempt, try something like:
__global__ void kernCuda(float *Destination, const float sintheta, const float costheta,
                         const int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    float tx = float(x)*costheta - float(y)*sintheta;
    float ty = float(x)*sintheta + float(y)*costheta;
    if (x < width && y < width) {
        Destination[x*width+y] = tex2D(Source_texture, tx+0.5f, ty+0.5f);
    }
}
(note not tested, never been near a compiler, use at your own risk).
Here Source_texture is the texture you bind the source data to. You can set the edge behaviour of the rotation in the texture setup, depending on how you want to handle it. How to set up and bind a texture is described in Section 3.2.10.1 of the CUDA 4.1 programming guide.
Note also that the cosine and sine in the rotation matrix are constant for a given value of theta, so it would be much more efficient to pass them to the kernel as arguments rather than have every thread compute the same values. For the 90 degree rotation you have asked about, simply call the kernel with sintheta=1.f and costheta=0.f and you are done.
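A host-side launch for the 90-degree case could then look roughly like this (an illustrative sketch; dev_dst and the binding of the source image to Source_texture are assumed to be set up elsewhere):
dim3 block(16, 16);
dim3 grid((width + block.x - 1) / block.x, (width + block.y - 1) / block.y);
kernCuda<<<grid, block>>>(dev_dst, 1.0f, 0.0f, width);  //sin(90 deg) = 1, cos(90 deg) = 0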
If you simply want to tilt the image by 90 degrees, then you won't need any trigonometric functions, since you can simply swap the x and y axes while iterating through the pixels:
Destination[x+y*width]=tex2D(Source_texture, tx+0.5f,ty+0.5f);
should do it.
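Alternatively, a dedicated 90-degree kernel (an illustrative sketch; the direction of rotation depends on which index you mirror) can skip the trigonometry and the texture entirely:
__global__ void rotate90(const float *Source, float *Destination, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < width) {
        //Rotate 90 degrees clockwise: source row y, column x maps to destination row x, column width-1-y
        Destination[x*width + (width - 1 - y)] = Source[y*width + x];
    }
}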