I am trying to adapt the code from a couple of the HLSL shaders in the WPF Pixel Shader Effects Library on Codeplex to create a pixel shader that produces a diagonal transition from Texture1 to Texture2. Texture2 should slide across Texture1 from the upper left-hand corner as if overlaying it (i.e. Texture1 remains "stationary" while Texture2 gradually replaces it).
I am struggling to properly understand the "uv" notation and how to manipulate it to achieve my goal.
So far I have tried:
float Progress : register(C0);
sampler2D Texture1 : register(s0);
sampler2D Texture2 : register(s1);
struct VS_OUTPUT
{
    float4 Position : POSITION;
    float4 Color : COLOR;
    float2 UV : TEXCOORD;
};
float4 SampleWithBorder(float4 border, sampler2D tex, float2 uv)
{
    if (any(saturate(uv) - uv))
    {
        return border;
    }
    return tex2D(tex, uv);
}
float4 SlideDiagonal(float2 uv, float progress)
{
    uv += progress;
    float4 c1 = SampleWithBorder(float4(0,0,0,0), Texture2, uv);
    if (c1.a <= 0)
    {
        return tex2D(Texture1, uv);
    }
    return c1;
}
// Pixel Shader
float4 main(VS_OUTPUT input) : COLOR
{
    return SlideDiagonal(input.UV, Progress/100);
}
and also:
float Progress : register(C0);
sampler2D Texture1 : register(s1);
sampler2D Texture2 : register(s0);
float4 SampleWithBorder(float4 border, sampler2D tex, float2 uv)
{
    if (any(saturate(uv) - uv))
    {
        return border;
    }
    return tex2D(tex, uv);
}
float4 Shrink(float2 uv, float progress)
{
    float speed = 200;
    float2 center = float2(0.001, 0.001);
    float2 toUV = uv - center;
    float distanceFromCenter = length(toUV);
    float2 normToUV = toUV / distanceFromCenter;
    float2 newUV = center + normToUV * (distanceFromCenter * (progress * speed + 1));
    float4 c1 = SampleWithBorder(float4(0,0,0,0), Texture2, newUV);
    if (c1.a <= 0)
    {
        return tex2D(Texture1, uv);
    }
    return c1;
}
// Pixel Shader
float4 main(float2 uv : TEXCOORD) : COLOR
{
    return Shrink(uv, Progress/100);
}
but in both cases the "slide" operates in reverse, reducing the amount of visible Texture2 as progress increases to 1 and, in the case of the former, Texture1 is not displayed at all.
I have come back to the problem a couple of times over Christmas to no avail and now I think I am suffering a "wood for the trees" problem.
If anyone knows the solution to this particular problem it would be greatly appreciated.
And in the spirit of "teaching a man to fish" if there was any information out there to help me understand how to manipulate "uv" that would also be great.
Many thanks in advance for your help
The texture coordinates (uv) specify where to look into the texture. With the line
uv += progress;
you shift the texture coordinate depending on the progress. If it is zero, the original coordinates are used (and the entire Texture2 is shown). By increasing progress, you go more and more to the bottom right corner of Texture2 and draw it at the original position. This lets the texture slide towards the top left corner. So if you want it the other way around, try:
uv += 1 - progress;
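To make the effect of the shift concrete, here is a small CPU-side sketch in Python (not HLSL) of the per-pixel decision. It assumes progress has already been scaled to the range 0..1, and it replaces the alpha test from the question with a direct bounds check; the function names are made up for this illustration.

```python
def saturate(x):
    """Clamp to [0, 1], like the HLSL saturate() intrinsic."""
    return min(1.0, max(0.0, x))


def slide_diagonal(u, v, progress):
    """Return which texture a pixel shows and the coordinate sampled from it.

    progress runs from 0.0 (only Texture1 visible) to 1.0 (only Texture2).
    """
    shift = 1.0 - progress
    su, sv = u + shift, v + shift  # shifted lookup into Texture2
    outside = saturate(su) != su or saturate(sv) != sv
    if outside:
        # Texture1 is sampled with the unshifted uv, so it stays put.
        return ("Texture1", (u, v))
    return ("Texture2", (su, sv))


# Halfway through, the top-left quadrant shows the bottom-right part of Texture2.
print(slide_diagonal(0.25, 0.25, 0.5))
print(slide_diagonal(0.75, 0.75, 0.5))
```

Note that the fallback samples Texture1 at the original uv, not the shifted one; sampling it at the shifted coordinate would make Texture1 move as well.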
A big thank you to Nico for creating a clearing in the trees! His answer gave me the change of perspective I needed to solve this problem. @Nico - Upvote for you, thanks mate.
For completeness I have included the final result from my Shazzam tinkerings which allows for a diagonal slide from any of the 4 corners using a float input called (imaginatively) "Corner". In production the float would be replaced by an int driven from an enumeration of possible corners - TopLeft, TopRight, BottomRight, BottomLeft. The order in the code is as shown represented by 1, 2, 3 & 4.
Here's the shader:
/// <summary>Modifies the Progress value.</summary>
/// <minValue>0</minValue>
/// <maxValue>100</maxValue>
/// <defaultValue>0</defaultValue>
float Progress : register(C0);
/// <summary>Modifies the Corner value.</summary>
/// <minValue>1</minValue>
/// <maxValue>4</maxValue>
/// <defaultValue>1</defaultValue>
float Corner : register(C1);
sampler2D Texture1 : register(s0);
sampler2D Texture2 : register(s1);
float4 SampleWithBorder(float4 border, sampler2D tex, float2 uv, float corner)
{
    if (any(saturate(uv) - uv))
    {
        return border;
    }
    float2 rev = uv;
    // Swap y position again to counteract the inversion caused by
    // needing to get to corners BottomLeft and TopRight
    if ((corner >= 2 && corner < 3) || corner >= 4)
    {
        rev.y = 1 - rev.y;
    }
    return tex2D(tex, rev);
}
float4 SlideDiagonal(float2 uv, float progress, float corner)
{
    float2 rev = uv;
    // Swap y position to get to corners BottomLeft and TopRight
    if ((corner >= 2 && corner < 3) || corner >= 4)
    {
        rev.y = 1 - rev.y;
    }
    float2 newUV;
    if (corner >= 1 && corner < 2)
    {
        newUV = rev + (1 - progress);
    }
    if (corner >= 2 && corner < 3)
    {
        newUV = rev - (1 - progress);
    }
    if (corner >= 3 && corner < 4)
    {
        newUV = rev - (1 - progress);
    }
    if (corner >= 4)
    {
        newUV = rev + (1 - progress);
    }
    float4 c1 = SampleWithBorder(float4(0,0,0,0), Texture2, newUV, corner);
    if (c1.a <= 0)
    {
        return tex2D(Texture1, uv);
    }
    return c1;
}
// Pixel Shader
float4 main(float2 input : TEXCOORD) : COLOR
{
    return SlideDiagonal(input, Progress/100, Corner);
}
!!!UPDATE!!! Using the vertex shader to generate quads via DrawInstanced() calls definitely reduced CPU overhead and increased quads drawn per second. But there was much more performance to be found by using a combination of instanced drawing via a vertex shader that generates a point list, and a geometry shader that generates quads based on those points.
Thanks to @Soonts for not only recommending a faster way, but also for reminding me of conditional moves and unrolling loops.
Here is the geometry shader I created for sprites with 2D rotation:
cbuffer CB_PROJ {
    matrix camera;
};
/* Reduced packet size -- 256x256 max atlas segments
   FLOAT3 Sprite location // 12 bytes
   FLOAT  Rotation        // 16 bytes
   FLOAT2 Scale           // 24 bytes
   UINT                   // 28 bytes
       Fixed8p00 Texture X segment
       Fixed8p00 Texture X total segments
       Fixed8p00 Texture Y segment
       Fixed8p00 Texture Y total segments
   Following vertex data is only processed by the vertex shader.
   UINT                   // 32 bytes
       Fixed3p00 Squadron generation method
       Fixed7p00 Sprite stride
       Fixed8p14 X/Y distance between sprites
*/
struct VOut {
    float3 position : POSITION;
    float3 r_s : NORMAL; // .x == rotation, .yz == scale
    uint bits : BLENDINDICES;
};
struct GOut {
    float4 pos : SV_Position;
    float3 position : POSITION;
    float3 n : NORMAL;
    float2 tex : TEXCOORD;
    uint pID : SV_PrimitiveID;
};
// Assumption: one quad per input point, emitted as a 4-vertex triangle strip.
// A geometry shader needs maxvertexcount and an Append() per emitted vertex.
[maxvertexcount(4)]
void main(point VOut gin[1], uint pID : SV_PrimitiveID, inout TriangleStream<GOut> triStream) {
    GOut output;
    const uint bits = gin[0].bits;
    const uint ySegs = (bits & 0x0FF000000) >> 24u;
    const uint _yOS = (bits & 0x000FF0000) >> 16u;
    const float yOS = 1.0f - float(_yOS) / float(ySegs);
    const float yOSd = rcp(float(ySegs));
    const uint xSegs = (bits & 0x00000FF00) >> 8u;
    const uint _xOS = (bits & 0x0000000FF);
    const float xOS = float(_xOS) / float(xSegs);
    const float xOSd = rcp(float(xSegs));
    float2 v;
    output.pID = pID;
    output.n = float3( 0.0f, 0.0f, -1.0f );
    // Corner 0
    output.position = gin[0].position; // Translate
    v.x = -gin[0].r_s.y; v.y = -gin[0].r_s.z; // Scale
    output.tex = float2(xOS, yOS);
    output.position.x += v.x * cos(gin[0].r_s.x) - v.y * sin(gin[0].r_s.x); // Rotate
    output.position.y += v.x * sin(gin[0].r_s.x) + v.y * cos(gin[0].r_s.x);
    output.pos = mul(float4(output.position, 1.0f), camera); // Transform
    triStream.Append(output);
    // Corner 1
    output.position = gin[0].position;
    v.x = -gin[0].r_s.y; v.y = gin[0].r_s.z;
    output.tex = float2(xOS, yOS - yOSd);
    output.position.x += v.x * cos(gin[0].r_s.x) - v.y * sin(gin[0].r_s.x);
    output.position.y += v.x * sin(gin[0].r_s.x) + v.y * cos(gin[0].r_s.x);
    output.pos = mul(float4(output.position, 1.0f), camera);
    triStream.Append(output);
    // Corner 2
    output.position = gin[0].position;
    v.x = gin[0].r_s.y; v.y = -gin[0].r_s.z;
    output.tex = float2(xOS + xOSd, yOS);
    output.position.x += v.x * cos(gin[0].r_s.x) - v.y * sin(gin[0].r_s.x);
    output.position.y += v.x * sin(gin[0].r_s.x) + v.y * cos(gin[0].r_s.x);
    output.pos = mul(float4(output.position, 1.0f), camera);
    triStream.Append(output);
    // Corner 3
    output.position = gin[0].position;
    v.x = gin[0].r_s.y; v.y = gin[0].r_s.z;
    output.tex = float2(xOS + xOSd, yOS - yOSd);
    output.position.x += v.x * cos(gin[0].r_s.x) - v.y * sin(gin[0].r_s.x);
    output.position.y += v.x * sin(gin[0].r_s.x) + v.y * cos(gin[0].r_s.x);
    output.pos = mul(float4(output.position, 1.0f), camera);
    triStream.Append(output);
}
Last time I was coding, I had barely started learning Direct3D9c. Currently I'm hitting about 30K single-texture quads lit with 15 lights at about 450fps. I haven't learned instancing or geometry shading at all yet, and I'm trying to prioritise the order I learn things in for my needs, so I've only taken glances at them.
My first thought was to reduce the amount of vertex data being shunted to the GPU, so I changed the vertex structure to a FLOAT2 (for texture coords) and a UINT (for indexing), relying on 4x float3 constants in the vertex shader to define the corners of the quads.
I figured I could reduce the size of the vertex data further, and reduced each vertex to a single UINT containing a 2-bit index (to reference the real vertices of the quad) and 2x 15-bit fixed-point numbers (yes, I'm showing my age, but fixed-point still has its value) representing offsets into atlas textures.
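As an illustration of that layout, here is a minimal Python sketch of the packing and unpacking, assuming the corner index sits in bits 0-1 and the two fixed1p14 values in bits 2-16 and 17-31 (which is what the shifts in the vertex shader imply). The helper names are hypothetical and not part of the actual code base.

```python
FIXED_ONE = 1 << 14  # fixed1p14: the value 1.0 is stored as 16384


def pack_vertex(corner, u, v):
    """Pack a 2-bit corner index and two fixed1p14 offsets into one 32-bit value."""
    x_step = int(u * FIXED_ONE)
    y_step = int(v * FIXED_ONE)
    if not (0 <= corner < 4 and 0 <= x_step < (1 << 15) and 0 <= y_step < (1 << 15)):
        raise ValueError("value does not fit the 2 + 15 + 15 bit layout")
    return corner | (x_step << 2) | (y_step << 17)


def unpack_vertex(bits):
    """Mirror of the shader-side decoding: mask, shift, scale by 2**-14."""
    corner = bits & 0x3
    x_step = (bits >> 2) & 0x7FFF
    y_step = (bits >> 17) & 0x7FFF
    return corner, x_step / FIXED_ONE, y_step / FIXED_ONE


packed = pack_vertex(3, 0.5, 1.0)
print(hex(packed), unpack_vertex(packed))
```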
So far, so good, but I know bugger all about Direct3D11 and HLSL so I've been wondering if there's a faster way.
Here's the current state of my vertex shader:
cbuffer CB_PROJ
{
    matrix model;
    matrix modelViewProj;
};
struct VOut
{
    float3 position : POSITION;
    float3 n : NORMAL;
    float2 texcoord : TEXCOORD;
    float4 pos : SV_Position;
};
static const float3 position[4] = { -0.5f, 0.0f,-0.5f,-0.5f, 0.0f, 0.5f, 0.5f, 0.0f,-0.5f, 0.5f, 0.0f, 0.5f };
// 00-01 . uint2b == Vertex index (0-3)
// 02-16 . fixed1p14 == X offset into atlas texture(s)
// 17-31 . fixed1p14 == Y offset into atlas texture(s)
VOut main(uint bitField : BLENDINDICES) {
    VOut output;
    const uint i = bitField & 0x03u;
    const uint xStep = (bitField >> 2) & 0x7FFFu;
    const uint yStep = (bitField >> 17);
    // 0.00006103515625 == 1 / 16384 == 2^-14
    const float xDelta = float(xStep) * 0.00006103515625f;
    const float yDelta = float(yStep) * 0.00006103515625f;
    const float2 texCoord = float2(xDelta, yDelta);
    output.position = (float3) mul(float4(position[i], 1.0f), model);
    output.n = mul(float3(0.0f, 1.0f, 0.0f), (float3x3) model);
    output.texcoord = texCoord;
    output.pos = mul(float4(output.position, 1.0f), modelViewProj);
    return output;
}
My pixel shader for completeness:
Texture2D Texture : register(t0);
SamplerState Sampler : register(s0);
struct LIGHT {
    float4 lightPos; // .w == range
    float4 lightCol; // .a == flags
};
cbuffer cbLight : register(b0) {
    LIGHT l[16]; // 16 lights x 32 bytes == 512 bytes
};
static const float3 ambient = { 0.15f, 0.15f, 0.15f };
float4 main(float3 position : POSITION, float3 n : NORMAL, float2 TexCoord : TEXCOORD) : SV_Target
{
    const float4 Texel = Texture.Sample(Sampler, TexCoord);
    if (Texel.a < 0.707106f) discard; // My source images have their alpha values inverted.
    float3 result = { 0.0f, 0.0f, 0.0f };
    for (uint xx = 0 ; xx < 16 && l[xx].lightCol.a != 0xFFFFFFFF; xx++)
    {
        const float3 lCol = l[xx].lightCol.rgb;
        const float range = l[xx].lightPos.w;
        const float3 vToL = l[xx].lightPos.xyz - position;
        const float distToL = length(vToL);
        if (distToL < range * 2.0f)
        {
            const float att = min(1.0f, (distToL / range + distToL / (range * range)) * 0.5f);
            const float3 lum = Texel.rgb * saturate(dot(vToL / distToL, n)) * lCol;
            result += lum * (1.0f - att);
        }
    }
    return float4(ambient * Texel.rgb + result, Texel.a);
}
And the rather busy looking C function to generate the vertex data (all non-relevant functions removed):
al16 struct CLASS_PRIMITIVES {
    ID3D11Buffer* pVB = NULL, * pIB = NULL;
    const UINT strideV1 = sizeof(VERTEX1);
    void CreateQuadSet1(ui32 xSegs, ui32 ySegs) {
        al16 VERTEX1* vBuf;
        al16 D3D11_BUFFER_DESC bd = {};
        ui32 index = 0, totalVerts = xSegs * ySegs * 4;
        if (pVB) return;
        vBuf = (VERTEX1*)_aligned_malloc(strideV1 * totalVerts, 16);
        for (ui32 yy = ySegs; yy; yy--)
            for (ui32 xx = 0; xx < xSegs; xx++) {
                double dyStep2 = 16384.0 / double(ySegs); double dyStep1 = dyStep2 * double(yy); dyStep2 *= double(yy - 1);
                ui32 yStep1 = dyStep1;
                yStep1 <<= 17;
                ui32 yStep2 = dyStep2;
                yStep2 <<= 17;
                // Four corners of one quad; index advances after each write
                vBuf[index++].b = 0 + (ui32(double(16384.0 / double(xSegs) * double(xx))) << 2) + yStep1;
                vBuf[index++].b = 1 + (ui32(double(16384.0 / double(xSegs) * double(xx))) << 2) + yStep2;
                vBuf[index++].b = 2 + (ui32(double(16384.0 / double(xSegs) * double(xx + 1))) << 2) + yStep1;
                vBuf[index++].b = 3 + (ui32(double(16384.0 / double(xSegs) * double(xx + 1))) << 2) + yStep2;
            }
        bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
        bd.CPUAccessFlags = 0;
        bd.ByteWidth = strideV1 * totalVerts;
        bd.StructureByteStride = strideV1;
        srd.pSysMem = vBuf;
        hr = dev->CreateBuffer(&bd, &srd, &pVB);
        if (hr != S_OK) ThrowError();
    }
    void DrawQuadFromSet1(ui32 offset) {
        offset *= sizeof(VERTEX1) * 4;
        devcon->IASetVertexBuffers(0, 1, &pVB, &strideV1, &offset);
        devcon->Draw(4, 0);
    }
    void DestroyQuadSet() {
        if (pVB) pVB->Release();
    }
};
It's all functioning as it should, but it just seems like I'm resorting to hacks to achieve my goal. Surely there's a faster way? Using DrawIndexed() consistently dropped the frame-rate by 1% so I switched back to non-indexed Draw calls.
"reducing vertex data down to 32bits per vertex is as far as the GPU will allow"
You seem to think that vertex buffer sizes are what's holding you back. Make no mistake here, they are not. You have many gigs of VRAM to work with, use them if it will make your code faster. Specifically, anything you're unpacking in your shaders that could otherwise be stored explicitly in your vertex buffer should probably be stored in your vertex buffer.
"I am wondering if anyone has experience with using geometry shaders to auto-generate quads"
I'll stop you right there, geometry shaders are very inefficient in most driver implementations, even today. They just aren't used that much so nobody bothered to optimize them.
One quick thing that jumps at me is that you're allocating and freeing your system-side vertex array every frame. Building it is fine, but cache the array, C memory allocation is about as slow as anything is going to get. A quick profiling should have shown you that.
Your next biggest problem is that you have a lot of branching in your pixel shader. Use standard functions (like clamp or mix, which HLSL calls lerp) or blending to let the math cancel out instead of checking for ranges or fully transparent values. Branching will absolutely kill performance.
And lastly, make sure you have the correct hints and usage on your buffers. You don't show them, but they should be set to whatever the equivalent of GL_STREAM_DRAW is (in Direct3D 11 that is most likely D3D11_USAGE_DYNAMIC with D3D11_CPU_ACCESS_WRITE), and you need to ensure you don't corrupt the in-flight parts of your vertex buffer. Future frames will render at the same time as the current one as long as you don't invalidate their data by overwriting their vertex buffer, so instead use a round-robin scheme to allow as many vertices as possible to survive (again, use memory for performance). Personally I allocate a very large vertex buffer (5x the data a frame needs) and write it sequentially until I reach the end, at which point I orphan the whole thing, re-allocate it, and start from the beginning again.
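The round-robin scheme can be sketched as a small offset allocator. This is only a CPU-side model in Python with made-up names (StreamingBuffer, allocate); no real graphics API is called. In Direct3D 11 terms, the append case would presumably map with D3D11_MAP_WRITE_NO_OVERWRITE and the wrap-around case with D3D11_MAP_WRITE_DISCARD, but treat that mapping as an assumption to verify.

```python
class StreamingBuffer:
    """Models sequential writes into one large buffer that is orphaned when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cursor = 0       # next free byte in the current storage
        self.generation = 0   # incremented every time the storage is orphaned

    def allocate(self, size):
        """Reserve `size` bytes and return (generation, offset) of the region."""
        if size > self.capacity:
            raise ValueError("request is larger than the whole buffer")
        if self.cursor + size > self.capacity:
            # Orphan: in-flight frames keep the old storage, we start a fresh one.
            self.generation += 1
            self.cursor = 0
        offset = self.cursor
        self.cursor += size
        return self.generation, offset


buf = StreamingBuffer(capacity=100)
for frame in range(3):
    print(frame, buf.allocate(40))
```

Regions handed out within one generation never overlap, so data from earlier frames stays intact until the buffer wraps.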
I think your code is CPU bound. While your approach has very small vertices, you have non-trivial API overhead.
A better approach is rendering all quads with a single draw call. I would probably use instancing for that.
Assuming you want arbitrary per-quad size, position, and orientation in 3D space, here’s one possible approach. Untested.
Vertex buffer elements:
struct sInstanceData
{
    // Center of the quad in 3D space
    XMFLOAT3 center;
    // XY coordinates of the sprite in the atlas
    uint16_t spriteX, spriteY;
    // Local XY vectors of the quad in 3D space
    // length of the vectors = half width/height of the quad
    XMFLOAT3 plusX, plusY;
};
Input layout:
Vertex shader:
cbuffer Constants
{
    matrix viewProj;
    // Pass [ 1.0 / xSegs, 1.0 / ySegs ] in that field
    float2 texcoordMul;
};
struct VOut
{
    float3 position : POSITION;
    float3 n : NORMAL;
    float2 texcoord : TEXCOORD;
    float4 pos : SV_Position;
};
VOut main( uint index: SV_VertexID,
    float3 center : QuadCenter, uint2 texcoords : SpriteIndex,
    float3 plusX : QuadPlusX, float3 plusY : QuadPlusY )
{
    VOut result;
    float3 pos = center;
    int2 uv = ( int2 )texcoords;
    // No branches are generated in release builds;
    // only conditional moves are there
    if( index & 1 )
        pos += plusX;
    else
        pos -= plusX;
    if( index & 2 )
        pos += plusY;
    else
        pos -= plusY;
    result.position = pos;
    result.n = normalize( cross( plusX, plusY ) );
    result.texcoord = ( ( float2 )uv ) * texcoordMul;
    result.pos = mul( float4( pos, 1.0f ), viewProj );
    return result;
}
UINT stride = sizeof( sInstanceData );
UINT off = 0;
context->IASetVertexBuffers( 0, 1, &vb, &stride, &off );
context->DrawInstanced( 4, countQuads, 0, 0 );
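The way the two low bits of SV_VertexID select a corner can be checked on the CPU. The following Python sketch mirrors only the position part of the vertex shader above; quad_corner is a name invented for this example.

```python
def quad_corner(index, center, plus_x, plus_y):
    """Corner position for vertex `index` (0..3) of one instanced quad.

    Bit 0 of the index chooses +plus_x or -plus_x, bit 1 chooses +plus_y or -plus_y.
    """
    sign_x = 1.0 if index & 1 else -1.0
    sign_y = 1.0 if index & 2 else -1.0
    return tuple(
        c + sign_x * x + sign_y * y
        for c, x, y in zip(center, plus_x, plus_y)
    )


center = (0.0, 0.0, 0.0)
plus_x = (1.0, 0.0, 0.0)   # half width along local X
plus_y = (0.0, 2.0, 0.0)   # half height along local Y
for index in range(4):
    print(index, quad_corner(index, center, plus_x, plus_y))
```

Drawn as a 4-vertex triangle strip, the order 0, 1, 2, 3 covers the whole quad.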
I've been trying to add a post-processing effect (taken from a Sebastian Lague video, which I am trying to convert from Unity to three.js) so that when a ray hits the ocean on my mesh (the blue part), it is colored white, just like in his video, and everywhere else the original color is returned. But for the life of me I can't figure out the problem. I assume my ray origin or direction might be wrong, but nothing seems to work. Here is the ray-sphere intersection function, followed by the code that calls it.
vec2 raySphere(vec3 centre, float radius, vec3 rayOrigin, vec3 rayDir) {
    vec3 offset = rayOrigin - centre;
    float a = 1.0; // set to dot(rayDir, rayDir) instead if rayDir may not be normalized
    float b = 2.0 * dot(offset, rayDir);
    float c = dot(offset, offset) - radius * radius;
    float discriminant = b*b-4.0*a*c;
    // No intersection: discriminant < 0
    // 1 intersection: discriminant == 0
    // 2 intersections: discriminant > 0
    if(discriminant > 0.0) {
        float s = sqrt(discriminant);
        float dstToSphereNear = max(0.0, (-b - s) / (2.0 * a));
        float dstToSphereFar = (-b + s) / (2.0 * a);
        if (dstToSphereFar >= 0.0) {
            return vec2(dstToSphereNear, dstToSphereFar-dstToSphereNear);
        }
    }
    return vec2(99999999.0, 0.0);
}
vec4 ro = inverse(modelMatrix) * vec4(cameraPosition, 1.0);
vec3 rd = normalize(position - ro.xyz);
vec3 oceanCentre = vec3(0.0, 0.0, 0.0);
float oceanRadius = 32.0;
vec2 hitInfo = raySphere(oceanCentre, oceanRadius, ro.xyz, rd);
float dstToOcean = hitInfo.x;
float dstThroughOcean = hitInfo.y;
vec3 rayOceanIntersectPos = ro.xyz + rd * dstToOcean - oceanCentre;
// dst that view ray travels through ocean (before hitting terrain / exiting ocean)
float oceanViewDepth = min(dstThroughOcean, depth - dstToOcean);
vec4 oceanCol;
float alpha;
if(oceanViewDepth > 0.0) {
    gl_FragColor = vec4(vec3(1.0), .1);
} else {
    gl_FragColor = texture2D(tDiffuse, vUv);
}
Can someone help point out where I might be messing up?
Oh wow, we're stuck in the same place with these shaders. I checked your ray intersector, and it has a few small problems. Here are the cases:
What we want is case 3, as in your example, where the intersections are counted. The problem probably comes from a missing depth correction:
Make sure the maximum depth of your sphere intersection is the same as the camera's.
I suspect the last line is the problem; try this:
vec3 col; // Declare the color
vec2 o = sphere(ro, rd, vec3(0), 1.0); // Ocean Depth.
float oceanViewDepth = min(o.y - o.x, t - o.x);
if(depth > 0.0 && tmax > depth) {
    col = originalCol;
}
if(oceanViewDepth > 0.0) {
    col = vec3(1);
}
gl_FragColor = vec4(col, 1.0);
If that doesn't work for you, I have a finished example you can check out on Shadertoy.
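One way to narrow the problem down is to check the intersection math on the CPU, separately from the depth handling. Below is a Python port of the raySphere function from the question (a sketch for testing, not the shader itself); it returns the distance to the sphere and the distance travelled through it, with infinity standing in for the large "no hit" constant.

```python
import math


def ray_sphere(centre, radius, ray_origin, ray_dir):
    """Return (distance to sphere, distance through sphere) for a ray."""
    offset = [o - c for o, c in zip(ray_origin, centre)]
    a = sum(d * d for d in ray_dir)  # equals 1.0 when ray_dir is normalized
    b = 2.0 * sum(o * d for o, d in zip(offset, ray_dir))
    c = sum(o * o for o in offset) - radius * radius
    discriminant = b * b - 4.0 * a * c
    if discriminant > 0.0:
        s = math.sqrt(discriminant)
        near = max(0.0, (-b - s) / (2.0 * a))
        far = (-b + s) / (2.0 * a)
        if far >= 0.0:          # ignore spheres entirely behind the ray
            return near, far - near
    return math.inf, 0.0


origin = (0.0, 0.0, 0.0)
print(ray_sphere(origin, 1.0, (0.0, 0.0, -5.0), (0.0, 0.0, 1.0)))  # from outside
print(ray_sphere(origin, 1.0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))   # from inside
print(ray_sphere(origin, 1.0, (0.0, 5.0, -5.0), (0.0, 0.0, 1.0)))  # miss
```

If these values look right for your camera setup, the remaining suspect is the depth term: it has to be a linear distance along the same ray, in the same units and space as the ray origin, before it can be compared with the distance to the ocean.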
I tried to render a triangle with a tessellation shader. Now, without the tessellation shaders, the triangle renders fine. As soon as I add the tessellation shaders, I get a blank screen. I have also written glPatchParameteri(GL_PATCH_VERTICES,3) before glDrawArrays(GL_PATCHES,0,3).
Here are the shaders:
Vertex shader:
#version 440 core
layout (location = 0) in vec2 apos;
out vec2 pos;
void main()
{
    //gl_Position = vec4(apos,1.0f,1.0f); without tessellation shaders
    pos = apos;
}
Tessellation control shader:
#version 440 core
layout (vertices = 3) out;
in vec2 pos[];
out vec2 EsPos[];
void main()
{
    EsPos[gl_InvocationID] = pos[gl_InvocationID];
    gl_TessLevelOuter[0] = 3.0f;
    gl_TessLevelOuter[1] = 3.0f;
    gl_TessLevelOuter[2] = 3.0f;
    gl_TessLevelInner[0] = 3.0f;
}
Tessellation evaluation shader:
#version 440 core
layout (triangles, equal_spacing, ccw) in;
in vec2 EsPos[];
vec2 finalpos;
vec2 interpolate2D(vec2 v0, vec2 v1);
void main()
{
    finalpos = interpolate2D(EsPos[0],EsPos[1]);
    gl_Position = vec4(finalpos,0.0f,1.0f);
}
vec2 interpolate2D(vec2 v0, vec2 v1)
{
    return (vec2(gl_TessCoord.x)*v0 + vec2(gl_TessCoord.y)*v1);
}
Fragment shader:
#version 440 core
out vec4 Fragment;
void main()
{
    Fragment = vec4(0.0f,1.0f,1.0f,1.0f);
}
I made changes in the interpolate2D function, but still I am getting a blank screen.
The output patch size of the Tessellation Control Shader is 3:
layout (vertices = 3) out;
Thus the length of the input array to the Tessellation Evaluation Shader is 3, too. Furthermore, the abstract patch type is triangles,
layout (triangles, equal_spacing, ccw) in;
thus the tessellation coordinate (gl_TessCoord) is a barycentric coordinate with three components. Change the interpolation so that all three patch vertices contribute:
vec2 interpolate2D(vec2 v0, vec2 v1, vec2 v2)
{
    return v0*gl_TessCoord.x + v1*gl_TessCoord.y + v2*gl_TessCoord.z;
}
void main()
{
    finalpos = interpolate2D(EsPos[0], EsPos[1], EsPos[2]);
    // [...]
}
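To see why the third term matters, here is the same weighted sum evaluated on the CPU in Python. The triangle corners are example values chosen for this sketch, and the tessellation coordinate is passed in explicitly instead of being read from gl_TessCoord.

```python
def interpolate2d(v0, v1, v2, tess_coord):
    """Barycentric interpolation: weights x, y, z belong to v0, v1, v2."""
    x, y, z = tess_coord
    return (
        v0[0] * x + v1[0] * y + v2[0] * z,
        v0[1] * x + v1[1] * y + v2[1] * z,
    )


v0, v1, v2 = (-0.5, -0.5), (0.5, -0.5), (0.0, 0.5)
print(interpolate2d(v0, v1, v2, (1.0, 0.0, 0.0)))    # exactly v0
print(interpolate2d(v0, v1, v2, (0.0, 0.0, 1.0)))    # exactly v2
print(interpolate2d(v0, v1, v2, (0.25, 0.25, 0.5)))  # a point inside the triangle
```

With only the x and y weights, the coordinate (0, 0, 1) would map to the origin instead of v2, and every generated vertex would end up somewhere between the origin, v0 and v1 rather than on the intended triangle.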
I wrote a fragment shader that samples a different texture for each Voronoi cell. Right now I loop through all positions for each pixel, which is super inefficient.
Any tips on how to optimize this? I need to run 2 x 1080p with 1000 "cells", which is my CPU maximum for Box2D.
Maybe draw the cells in the vertex shader and then sample from them? I am quite new to this, so any hints are appreciated!
The weird sizing (*20 etc.) is due to my large Box2D world for testing.
Texture2DArray texArray <string uiname="Texture Array";>;
Texture2D tex <string uiname="Texture";>;
int id;
int scale= 20;
SamplerState linearSampler : IMMUTABLE
{
    AddressU = Clamp;
    AddressV = Clamp;
};
cbuffer cbPerDraw : register( b0 )
{
};
cbuffer cbPerObj : register( b1 )
{
    float4x4 tW : WORLD;
};
StructuredBuffer<float2> posBuffer;
StructuredBuffer<int> idBuffer;
struct vsInput
{
    float4 PosO : POSITION;
    float4 TexCd : TEXCOORD0;
};
struct psInput
{
    float4 PosWVP: SV_Position;
    float4 TexCd: TEXCOORD0;
};
psInput VS(vsInput In)
{
    return In;
}
float4 PS(psInput In): SV_Target
{
    uint count, stride;
    posBuffer.GetDimensions(count, stride);
    float minDist = 100;
    float2 uvRaw = In.TexCd.xy;
    float2 uv = ( uvRaw -.5) * 20;
    float4 col = 1;
    uint id;
    // Brute force: every pixel tests every cell position
    for (uint i=0; i<count; i++)
    {
        id = idBuffer[i];
        float2 p = posBuffer[i]*1;
        float d = length(uv-p) * .2;
        if (d < minDist)
        {
            minDist = d;
            col = texArray.SampleLevel(linearSampler, float3(uvRaw - p *0.05, i), 0);
        }
    }
    return col;
}
technique10 Constant
{
    pass P0
    {
        SetVertexShader( CompileShader( vs_5_0, VS() ) );
        SetPixelShader( CompileShader( ps_5_0, PS() ) );
    }
}
The most common optimization for generating Voronoi noise is to divide the texture into a grid with 1 point in each cell, find the grid cell of the current fragment, and then only compare the distance against this cell and its 8 neighbors. So basically, you should be able to store your points in a 2D array, and then find the cell index by dividing and flooring the UVs using the cell size. Sebastian Lague touched on this in his video on cloud rendering, you can check it out here, he also made the source code available on GitHub.
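A rough sketch of that grid lookup, written in Python rather than HLSL so it can be run and compared against the brute-force search. The names build_grid and nearest_point are invented here, and the 3x3 neighbourhood is only guaranteed to contain the true nearest point when every grid cell holds a point (the classic jittered-grid setup); with freely moving Box2D bodies the cell size would have to be chosen with that in mind, or the search radius widened.

```python
import math


def build_grid(points, cell_size):
    """Bucket point indices by the grid cell their position falls into."""
    grid = {}
    for index, (x, y) in enumerate(points):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid.setdefault(key, []).append(index)
    return grid


def nearest_point(grid, points, cell_size, x, y):
    """Index of the closest point, checking only the 3x3 cells around (x, y)."""
    cx, cy = math.floor(x / cell_size), math.floor(y / cell_size)
    best_index, best_dist = None, math.inf
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            for index in grid.get((cx + dx, cy + dy), ()):
                px, py = points[index]
                dist = math.hypot(px - x, py - y)
                if dist < best_dist:
                    best_index, best_dist = index, dist
    return best_index


points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
grid = build_grid(points, cell_size=1.0)
print(nearest_point(grid, points, 1.0, 0.9, 0.4))
print(nearest_point(grid, points, 1.0, 1.2, 1.4))
```

On the GPU the dictionary would become a buffer or texture indexed by cell, so each pixel does at most nine lookups instead of one per cell position.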
I am trying to pass a large amount of information to my fragment shader, but I always reach a limit (too many textures bound, texture too large, array too large, etc.). I use a ThreeJS custom shader.
I have a 256*256*256 RGBA volume that I want to pass to my shader.
In my shader, I want to map the fragment's world position to a voxel in this 256*256*256 volume.
Is there a good strategy to deal with this amount of information? What would be the best practice? Is there a good workaround?
My current approach is to generate 4 different 2048x2048 RGBA textures containing all the data I need.
To create each 2048x2048 texture, I just push every row of every slice sequentially into a big array and split this array into 2048x2048x4 chunks, which are my textures:
var _imageRGBA = new Uint8Array(_dims[2] * _dims[1] * _dims[0] * 4);
for (var _k = 0; _k < _dims[2]; _k++) {
  for (var _j = 0; _j < _dims[1]; _j++) {
    for (var _i = 0; _i < _dims[0]; _i++) {
      // Byte offset of this voxel's RGBA quadruple in the flat array
      var _offset = 4 * (_i + _dims[0] * _j + _dims[1] * _dims[0] * _k);
      // Grey value replicated into R, G and B; alpha is fully opaque
      _imageRGBA[_offset] = _imageRGBA[_offset + 1] = _imageRGBA[_offset + 2] = _imageN[_k][_j][_i];
      _imageRGBA[_offset + 3] = 255;
    }
  }
}
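For reference, the mapping from a voxel to its texture, row and column under this layout can be written with integer arithmetic. This is a Python sketch with an invented function name, assuming the row-major packing described above and 2048x2048 textures.

```python
TEXTURE_SIZE = 2048  # texels per side of each packed texture


def voxel_to_texel(i, j, k, dims):
    """Return (texture index, row, column) of voxel (i, j, k) in the packed layout."""
    index = i + j * dims[0] + k * dims[0] * dims[1]
    texture_index, in_texture = divmod(index, TEXTURE_SIZE * TEXTURE_SIZE)
    row, col = divmod(in_texture, TEXTURE_SIZE)
    return texture_index, row, col


dims = (256, 256, 256)
print(voxel_to_texel(0, 0, 0, dims))
print(voxel_to_texel(0, 8, 0, dims))
print(voxel_to_texel(0, 0, 64, dims))
print(voxel_to_texel(255, 255, 255, dims))
```

A 256x256x256 volume has 16,777,216 voxels, which is exactly four 2048x2048 textures, so the texture index stays in the range 0..3. When converting row and column to uv, sampling at the texel centre, for example (col + 0.5) / 2048, is probably safer than sampling exactly on the texel edge.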
Each texture looks something like that:
On the shader side, I try to map a fragment's worldposition to an actual color from the texture:
Vertex shader:
uniform mat4 rastoijk;
varying vec4 vPos;
varying vec2 vUv;
void main() {
    vPos = modelMatrix * vec4(position, 1.0 );
    vUv = uv;
    gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0 );
}
Fragment shader:
<script id="fragShader" type="shader">
vec4 getIJKValue( sampler2D tex0, sampler2D tex1, sampler2D tex2, sampler2D tex3, vec3 ijkCoordinates, vec3 ijkDimensions) {
    // IJK coord to texture
    float textureSize = 2048.0;
    float index = ijkCoordinates[0] + ijkCoordinates[1]*ijkDimensions[0] + ijkCoordinates[2]*ijkDimensions[0]*ijkDimensions[1];
    // map index to right 2048 x 2048 slice
    float sliceIndex = floor(index / (textureSize*textureSize));
    float inTextureIndex = mod(index, textureSize*textureSize);
    // get row in the texture
    float rowIndex = floor(inTextureIndex/textureSize);
    float colIndex = mod(inTextureIndex, textureSize);
    // map indices to u/v
    float u = colIndex/textureSize;
    float v = 1.0 - rowIndex/textureSize;
    vec2 uv = vec2(u,v);
    vec4 ijkValue = vec4(0, 0, 0, 0);
    if(sliceIndex == float(0)){
        ijkValue = texture2D(tex0, uv);
    }
    else if(sliceIndex == float(1)){
        ijkValue = texture2D(tex1, uv);
    }
    else if(sliceIndex == float(2)){
        ijkValue = texture2D(tex2, uv);
    }
    else if(sliceIndex == float(3)){
        ijkValue = texture2D(tex3, uv);
    }
    return ijkValue;
}
uniform mat4 rastoijk;
uniform sampler2D ijk00;
uniform sampler2D ijk01;
uniform sampler2D ijk02;
uniform sampler2D ijk03;
uniform vec3 ijkDimensions;
varying vec4 vPos;
varying vec2 vUv;
void main(void) {
    // get IJK coordinates of current element
    vec4 ijkPos = rastoijk * vPos;
    // show whole texture in the back...
    vec3 color = texture2D(ijk00, vUv).rgb;
    //convert IJK coordinates to texture coordinates
    if(int(floor(ijkPos[0])) > 0
      && int(floor(ijkPos[1])) > 0
      && int(floor(ijkPos[2])) > 0
      && int(floor(ijkPos[0])) < int(ijkDimensions[0])
      && int(floor(ijkPos[1])) < int(ijkDimensions[1])
      && int(floor(ijkPos[2])) < int(ijkDimensions[2])){
        // try to map IJK to value...
        vec3 ijkCoordinates = vec3(floor(ijkPos[0]), floor(ijkPos[1]), floor(ijkPos[2]));
        vec4 ijkValue = getIJKValue(ijk00, ijk01, ijk02, ijk03, ijkCoordinates, ijkDimensions);
        color = ijkValue.rgb;
    }
    gl_FragColor = vec4(color, 1.0);
    // or discard if not in IJK bounding box...
}
That doesn't work well. I now get an image with weird artifacts (Nyquist-Shannon effect?). As I zoom in, the image appears, though not perfectly (there are some black dots).
Any help or advice would be greatly appreciated. I also plan to do some raycasting for volume rendering using this approach (much needed in the medical field).
The approach to handle large arrays using multiple textures was fine.
The issue was how I was generating the texture with THREE.js.
The texture was generated using the default linear interpolation: http://threejs.org/docs/#Reference/Textures/DataTexture
What I needed was nearest-neighbor interpolation. This way, the texture stays pixelated and we can access the real IJK value (not an interpolated value).
Found it there: http://www.html5gamedevs.com/topic/8109-threejs-custom-shader-creates-weird-artifacts-space-between-faces/
texture = new THREE.DataTexture( textureData, tSize, tSize, THREE.RGBAFormat, THREE.UnsignedByteType, THREE.UVMapping,
THREE.ClampToEdgeWrapping, THREE.ClampToEdgeWrapping, THREE.NearestFilter, THREE.NearestFilter );
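The difference between the two filters is easy to reproduce on a one-dimensional row of texels. The Python sketch below is a simplified model of texture sampling (texel centres at (i + 0.5) / n, clamp-to-edge addressing), not the actual three.js or WebGL implementation.

```python
import math


def sample_nearest(texels, u):
    """Return the stored value of the texel that contains coordinate u."""
    n = len(texels)
    return texels[min(n - 1, int(u * n))]


def sample_linear(texels, u):
    """Blend the two texels whose centres surround coordinate u."""
    n = len(texels)
    x = u * n - 0.5                      # position measured in texel centres
    i0 = math.floor(x)
    t = x - i0
    left = texels[min(n - 1, max(0, i0))]
    right = texels[min(n - 1, max(0, i0 + 1))]
    return left * (1.0 - t) + right * t


row = [0, 255, 0, 255]                   # neighbouring voxels with unrelated values
print(sample_nearest(row, 0.25), sample_linear(row, 0.25))
print(sample_nearest(row, 0.375), sample_linear(row, 0.375))
```

Linear filtering returns values such as 127.5 that were never stored, which matters here because neighbouring texels in the packed texture are not necessarily neighbouring voxels in the volume.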