OpenGL rendering quality vs. number of vertices - c

I am coding a modern OpenGL application to visualize 3D atomic models (molecules, periodic systems ...) for chemistry and condensed matter physics.
I started to work on this a few years ago. The first version of my program used old OpenGL, and I am now updating it to modern OpenGL.
I have a question regarding the rendering quality of the OpenGL window. In the following examples I draw 3D cylinders and 3D spheres using instanced drawing: to render the bonds I only draw one cylinder, then translate/scale/rotate it properly in the vertex shader to render all bonds. The same goes for the sphere used to render the atoms.
As you can see it works just fine, and the efficiency of the method is amazing: I can render models with hundreds of thousands of atoms smoothly.
However, I noticed something weird: the quality of the rendering seems to depend on the number of vertices (objects, atoms and bonds) in the scene. Obviously the number of triangles is the most important parameter, but it is not the only one, since the quality decreases when a lot of vertices are rendered. Please see the attached snapshots:
To render the spheres in the scene I am using 50x50 vertices, and 2x50 for the cylinders (GL_TRIANGLE_STRIP in both cases)
1) In this test model I load 96 atoms and 512 half-bonds, i.e. about 291200 vertices:
2) I zoom in to focus on one selected atom and its surroundings; at this scale the result is impeccable:
3) I reset the view and use the builder in my program to increase the number of boxes
(I am simply making replicas in the 3 directions of space). Here I choose to do 20x20x20 replicas;
see the result below, where the original box is highlighted.
In that scene there are 768000 atoms, 4096000 half-bonds, and thus 291200x20x20x20 = 2329600000 vertices.
That is quite a lot, yet it works, but something weird appears ...
4) I zoom in again on that particular area of the model I picked before and there is a decrease in quality in particular
in the areas where 3D objects (spheres/cylinders) superimpose/overlap ...
Can somebody explain to me what I see ?
Note 1: In the same window I can decrease the number of replicas back to the original box, zoom again
and see that the result is back to impeccable.
Note 2: the older version of my program still works fine (old OpenGL, using display lists with glutSolidSphere-style GLUT spheres and cylinders).
I can do the same things: the rendering takes much, much longer, but at the end of the process, when I zoom in on the 20x20x20
boxes model, the result remains perfect, as for the single-box model. Obviously I use the same graphics card, driver and so on.

Can somebody explain to me what I see ?
You're seeing the limited precision of the depth buffer. There are only so many bits you can work with, and in a perspective projection a nonlinear mapping from Z distance to depth buffer value is applied.
The best course of action is to limit the near/far range of the perspective projection matrix to what's actually going to be visible on screen, to make better use of the depth buffer. It's also possible to linearize the depth buffer (but that comes with a performance hit). You could also try to cleanly intersect the geometry where sticks and spheres meet, i.e. constrain the sphere's vertices to the cylinder surface where the sticks attach, and similarly constrain the sticks' end vertices to the sphere where they meet. That way you avoid overlap and hence these artifacts.
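To make the effect of the near/far range concrete, here is a rough sketch in C of how eye-space distance maps to window depth for a standard OpenGL perspective projection, and how coarse a fixed-point depth buffer becomes at a given distance. The function names and the sample near/far values are illustrative assumptions, not taken from the program in the question.

```c
#include <assert.h>

/* Window-space depth in [0, 1] for a point at eye-space distance d,
 * assuming a standard OpenGL perspective projection with clip planes
 * near_z and far_z and the default glDepthRange(0, 1). */
double window_depth(double d, double near_z, double far_z)
{
    assert(near_z > 0.0 && far_z > near_z && d > 0.0);
    return far_z * (d - near_z) / (d * (far_z - near_z));
}

/* Approximate smallest eye-space separation that still lands on two
 * different values of a fixed-point depth buffer with `bits` bits,
 * evaluated at distance d. It is 1 / (steps * slope), where slope is
 * the derivative of window_depth with respect to d. */
double depth_resolution(double d, double near_z, double far_z, int bits)
{
    assert(bits > 0 && bits < 64);
    double steps = (double)((1ULL << bits) - 1ULL);
    double slope = far_z * near_z / ((far_z - near_z) * d * d);
    return 1.0 / (steps * slope);
}
```

With a 24-bit buffer and an object 50 units away, a range of near = 0.01, far = 1000 resolves depth differences roughly a hundred times more coarsely than near = 1, far = 100, which is why tightening the range around the visible model helps.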

Related

Is it possible to get a "SCNVector3" position of a World object using CoreML and ARKit?

I am working on an AR-based solution in which I am rendering some 3D models using SceneKit and ARKit. I have also integrated Core ML to identify objects and render the corresponding 3D objects in the scene.
But right now I am just rendering the model in the center of the screen as soon as I detect the object (only for the list of objects that I have). Is it possible to get the position of the real-world object so that I can show some overlay above it?
That is, if I have scanned a water bottle, I should be able to get the position of the water bottle. It could be anywhere on the water bottle but shouldn't go outside of it. Is this possible using SceneKit?
All parts of what you ask are theoretically possible, but a) for several parts, there’s no integrated API to do things for you, and b) you’re probably signing yourself up for a more difficult problem than you think.
What you presumably have with your Core ML integration is an image classifier, as that’s what most of the easy to find ML models do. Image classification answers one question: “what is this a picture of?”
What you’re looking for involves at least two additional questions:
1. “Given that this image has been classified as containing (some specific object), where in the 2D image is that object?”
2. “Given the position of a detected object in the 2D video image, where is it in the 3D space tracked by ARKit?”
Question 1 is pretty reasonable. There are models that do both classification and detection (location/bounds within an image) in the ML community. Probably the best known one is YOLO — here’s a blog post about using it with Core ML.
Question 2 is the “research team and five years” part. You’ll notice in the YOLO papers that it gives you only coarse bounding boxes for detected objects — that is, it’s working in 2D image space, not doing 3D scene reconstruction.
To really know the shape, or even the 3D bounding box of an object means integrating object detection with scene reconstruction. For example, if an object has some height in the 2D image, are you looking at a 3D object that’s tall with a small footprint, or one that’s long and low, receding into the distance? Such integration would require taking apart the inner workings of ARKit, which nobody outside Apple can do, or recreating an ARKit-alike from scratch.
There might be some assumptions you can make to get very rough estimates of 3D shape from a 2D bounding box, though. For example, if you do AR hit tests on the lower corners of a box and find that they’re on a horizontal plane, you can guess that the 2D height of the box is proportional to the 3D height of the object, and that its footprint on the plane is proportional to the box’s width. You’d have to do some research and testing to see if assumptions like that hold up, especially in whatever use cases your app covers.

How to measure horizontal plane surface(visible in camera) using ARKit-Scenekit before placing objects?

I want to measure the horizontal plane surface to find out whether it fits the object that I am going to place. For example, if I am going to place a cot 3D model (with a fixed size) in a room using iOS 11 ARKit:
First, I want to detect whether the room surface is sufficient to place my 3D model by measuring the surface area (width, height, etc.).
Second, if the user tries to place it without sufficient space, I should not allow them to place the cot and should show an error message.
I created a sample POC by following https://developer.apple.com/sample-code/wwdc/2017/PlacingObjects.zip, with which I am able to detect the horizontal plane and place the cot. But the issue is that whatever the surface may be, the user is able to place the cot, which shouldn't be allowed in a real application.
I saw a couple of demos which claim that we can measure the size of the room or of a horizontal plane (https://www.curbed.com/2017/6/29/15894556/ar-measure-app-augmented-reality-ruler-measuring-tape-ios).
I am using ARKit and SceneKit in order to achieve this, and I am new to AR and SceneKit. I need to know if this is doable, and if so how to achieve it.
You could estimate the size of a detected plane by inspecting its dimensions. But you shouldn't.
ARKit has plane estimation, not scene reconstruction. That is, it'll tell you there's a flat surface at (some point) and that said surface probably extends at least (some distance) from that point. It doesn't know exactly how big the surface is (it's even refining its estimate over time), and it doesn't tell you where there are interruptions in that continuous surface, much less the size and shape of such interruptions.
In fact, if you're looking at the floor and moving around, and you see one patch of floor, then another patch of floor on the other side of a solid wall from the first, ARKit will happily recognize that those two patches are coplanar and merge them into the same anchor. At the same time, neither detected patch may cover the entire extent of the floor around it.
If you attempt to restrict where the user can place virtual objects in AR based on plane estimates, you're likely to frustrate them with two kinds of error: you'll have areas where it looks to the user like they can place something but that don't allow it, and you'll have areas that look like they should be off-limits that do allow placing things.
Instead, design your experience to involve the user in deciding where the sensible places for content are. See this demo for example — ARKit detects the level of the floor (not its boundaries), then uses that to show UI indicating the size/shape of objects to be placed. It's up to the user to make sure there's enough room for the couch, etc.
As for the technical how-to on what you probably shouldn't do: The docs for ARPlaneAnchor.extent say that the x and z coordinates of that vector are the width and length of the estimated plane. And all units in ARKit are meters. (Which is width and which is length? It's a matter of perspective. And of the rotation encoded in the anchor's transform.)

Occlusion culling 3D transformed 2D rectangles?

So, to start off, I'm not very good at computer graphics. I'm trying to implement a GUI toolkit where one of the features is being able to apply 3D transformations to 2D "layers". (a layer only has one Z coordinate, as pre-transform, it's a two dimensional axis aligned rectangle)
Now, this is pretty straightforward, until you come to 3D transformations that would push the layer back, requiring splitting the layer into several polygons in order to render it correctly, as illustrated here. And because we can have transparency, layers may not get completely occluded, while still requiring getting split.
So here is an illustration depicting the issue and the desired outcome. In this scenario, the blue layer (call it B) is on top of the red layer (R), while having the same Z position (but B was added after R). In this scenario, if we rotate B, its top two points will get a Z index lower than 0 while the bottom points will get an index higher than 0 (with the anchor point being the only point/line left as 0).
Can somebody suggest a good way of doing this on the CPU? I've struggled to find a suitable algorithm implementation (in C++ or C) that would be appropriate to this scenario.
Edit: To clarify, at this stage in the pipeline there is no rendering yet. We just need to produce a set of polygons for each layer that represents the layer's transformed and occluded geometry. Rendering (either software or hardware) is then done only if required, which is not always the case (for example, when doing hit testing).
Edit 2: I looked at binary space partitioning as an option of achieving this but I have only been able to find one implementation (in GL2PS), which I'm not sure how to use. I do have a vague understanding of how BSPs work, but I'm not sure how they can be used for occlusion culling.
Edit 3: I'm not trying to do colour and transparency blending at this stage. Just pure geometry. Transparency can be handled by the renderer, and overdraw is okay. In this case, the blue polygon can just be drawn under the red one, but with more complicated cases, depth sorting or even splitting up the polygons may be required (example of a scary case like that below). Although the viewport is fixed, because all layers can be transformed in 3D, creating a shape shown below is possible.
So what I'm really looking for is an algorithm that would geometrically split layer B into two blue shapes, one of which would be drawn "above" R and one of which would be drawn "below" it. The part "below" would get overdrawn, yes, but that's not a major issue. So B just needs to be split into two polygons so that it appears to cut through R when those polygons are drawn in order. No need to worry about blending.
Edit 4: For the purpose of this, we cannot render anything at all. This all has to be done purely geometrically (producing 2D polygons). This is what I was originally getting at.
Edit 5: I should note that the overall number of quads per subscene is around 30 (average). Definitely won't go above 100. Unless the layers are 3D transformed (which is where this problem arises), they are just radix sorted by Z positions before being drawn. Layers with the same Z position are drawn in order in which they were added (first in, first out).
Sorry if I didn't make it clear in the original question.
If you "aren't good with computer graphics", doing it on the CPU (software rendering) will be extremely difficult for you if polygons can be transparent.
The easiest way to do it is to use GPU rendering (OpenGL/Direct3D) with the depth peeling technique.
CPU solutions:
Solution #1 (extremely difficult):
(I forgot the name of this algorithm.)
You need to split polygon B into two, for example using polygon A as a clip plane, then render the result using the painter's algorithm.
To do that you'll need to change your rendering routines so that they no longer use quads but textured polygons, plus you'll have to write and debug clipping routines that split the triangles present in the scene in such a way that they no longer break the painter's algorithm.
Big problem: if you have many polygons, this solution can potentially split the scene into an infinite number of triangles. Also, writing texture rendering code yourself isn't much fun, so it is advised to use OpenGL/Direct3D.
This can be extremely difficult to get right. I think this method was discussed in "Computer Graphics Using OpenGL", 2nd edition, by Francis S. Hill, somewhere in one of its exercises.
Also check the Wikipedia article on hidden surface removal.
Solution #2 (simpler):
You need to implement multi-layered z-buffer that stores up to N transparent pixels and their depth.
Solution #3 (computationally expensive):
Just use ray-tracing. You'll get perfect rendering result (no limitations of depth peeling and cpu solution #2), but it'll be computationally expensive, so you'll need to optimize rendering routines a lot.
Bottom line:
If you're performing software rendering, use Solution #2 or #3. If you're rendering on hardware, use technique similar to depth-peeling, or implement raytracing on hardware.
--edit-1--
Required knowledge for implementing #1 and #3 is "line-plane intersection". If you understand how to split a line (in 3D space) into two using a plane, you can implement raytracing or clipping easily.
Required knowledge for #2 is "textured 3D triangle rendering" (algorithm). It is a fairly complex topic.
In order to implement the GPU solution, you need to find a few OpenGL tutorials that deal with shaders.
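As an illustration of the line-plane intersection mentioned above and how it leads to clipping, here is a minimal sketch in C that splits one convex polygon (such as a transformed layer quad) into a front part and a back part along a plane. The type and function names are made up for this example, and it assumes the plane is given as `n . p + d = 0`; it is not a complete solution to the sorting problem.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } Vec3;
typedef struct { Vec3 n; double d; } Plane;   /* points p with n.p + d = 0 */

static double signed_distance(Plane pl, Vec3 p)
{
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

/* Point where segment a-b crosses the plane. da and db are the signed
 * distances of a and b and must have opposite signs. */
static Vec3 segment_plane_point(Vec3 a, Vec3 b, double da, double db)
{
    double t = da / (da - db);
    Vec3 r = { a.x + t * (b.x - a.x),
               a.y + t * (b.y - a.y),
               a.z + t * (b.z - a.z) };
    return r;
}

/* Split a convex polygon by a plane. `front` and `back` must each have
 * room for count + 2 vertices. Vertices lying exactly on the plane are
 * copied to both outputs. */
void split_convex_polygon(const Vec3 *poly, size_t count, Plane pl,
                          Vec3 *front, size_t *front_count,
                          Vec3 *back, size_t *back_count)
{
    assert(poly != NULL && count >= 3);
    size_t nf = 0, nb = 0;
    for (size_t i = 0; i < count; ++i) {
        Vec3 a = poly[i];
        Vec3 b = poly[(i + 1) % count];
        double da = signed_distance(pl, a);
        double db = signed_distance(pl, b);
        if (da >= 0.0) front[nf++] = a;
        if (da <= 0.0) back[nb++] = a;
        /* The edge strictly crosses the plane: add the crossing point
         * to both halves so each stays a closed polygon. */
        if ((da > 0.0 && db < 0.0) || (da < 0.0 && db > 0.0)) {
            Vec3 p = segment_plane_point(a, b, da, db);
            front[nf++] = p;
            back[nb++] = p;
        }
    }
    *front_count = nf;
    *back_count = nb;
}
```

For a layer tilted through the plane z = 0, this yields two quads that can then be drawn in back-to-front order. A robust version would compare against a small epsilon instead of exact zero.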
--edit-2--
Transparency is relevant because, in order to get transparency right, you need to draw polygons from back to front (from farthest to closest) using the painter's algorithm. Sorting polygons properly is impossible in certain situations, so they must be split, or you should use one of the listed techniques; otherwise there will be artifacts or incorrectly rendered images in some cases.
If there's no transparency, you can implement standard zbuffer or draw using hardware OpenGL, which is a very trivial task.
--edit-3--
I should note that the overall number of quads per subscene is around 30 (average). Definitely won't go above 100.
If you split polygons, it can easily go way above 100.
It might be possible to position polygons in such a way that each polygon splits all the other polygons.
Now, 2^29 is 536870912; however, it is not possible to split one surface with a plane in such a way that the number of polygons doubles with each split. If one polygon is split 29 times, you'll get 30 polygons in the best-case scenario, and probably several thousand in the worst case if the splitting planes aren't parallel.
Here's a rough algorithm outline that should work:
1. Prepare a list of all triangles in the scene.
2. Remove back-facing triangles.
3. Find all triangles that intersect each other in 3D space, and split them using the line of intersection.
4. Compute screen-space coordinates for all vertices of all triangles.
5. Sort by depth for the painter's algorithm.
6. Prepare an extra list for new primitives.
7. Find triangles that overlap in 2D (post-projection) screen space.
8. For all overlapping triangles, check their rendering order. Basically, a triangle that is going to be rendered "below" another triangle should have no part that is above that triangle.
8.1. To do that, use the camera origin point and the triangle edges to split the original triangles into several sub-regions, then check whether the regions conform to the established sort order (prepared for the painter's algorithm). Regions are created by splitting the existing pair of triangles using 6 clip planes created from the camera origin point and the triangle edges.
8.2. If all regions conform to the rendering order, leave the triangles be. If they don't, remove the triangles from the list, and add the split pieces to the "new primitives" list.
9. If there are any primitives in the new primitives list, merge it with the triangle list, and go to #5.
By looking at that algorithm, you can easily understand why everybody uses Z-buffer nowadays.
Come to think about it, that's a good training exercise for universities that specialize in CG. The kind of exercise that might make your students hate you.
I am going to come out and give the simpler solution, which may not fit your problem. Why not just change your artwork to prevent this problem from occurring?
In problem 1, just divide the polys in Maya or whatever beforehand. For the 3 lines problem, again, divide your polys at the intersections to prevent fighting. Pre-computed solutions will always run faster than on-the-fly ones, especially on limited hardware. From professional experience, I can say that it also scales; well, it scales OK. It just requires some tweaking from the art side and technical reviews to make sure nothing is created "illegally". You may end up with more polys doing it this way than splitting on the fly, but at least you won't have to do a ton of math on CPUs that may not be up to the task.
If you do not have control over the artwork pipeline, this won't work, as writing some sort of converter would take longer than getting a BSP subdivision routine up and running. Still, KISS is often the best solution.

Blob detection in C (not with OPENCV)

I am trying to write my own blob detection, which will receive real-time video and try to detect a white sheet of paper.
It should work even if something is written on the paper. I need to detect the paper and its corners, because what I really want is to draw an OpenGL polygon over the paper, with each corner of the polygon on a corner of the paper. I then need the coordinates of the paper to do other things.
So I need to:
- detect a square white blob.
- get the coordinates of the corners.
- draw a polygon over the white sheet.
Any ideas how I can do that?
Much depends on context. For example, suppose that you:
- know that the paper is always roughly centered (i.e. W/2, H/2 is always inside the blob), and rotated by no more than 45 degrees (30 would be better)
- have a suitable border around the sheet so that the corners never touch the edges of the FOV
- are able (through analysis of local variance or, if you're lucky, a check of background color or luminance) to say whether a point is inside or outside the blob
- can rely on the inside/outside function never failing (except possibly in the close vicinity of a border)
then you could walk a line from a point on the border (surely outside) to the center (surely inside), even through bisection, and find a point - an areal - on the edge.
Two edge points give a line (two areals give a beam), two lines give an intersection (two beams give a larger areal) - and there's your corner. You should carry along the detection uncertainty (areal radius) in order to validate corners (another, less elegant approach is to roughly calculate where the corner is, and pinpoint it with a spiral search or drunkard's walk).
This algorithm is amenable to parallelization and, as long as the hypotheses hold, should be really fast.
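The bisection walk described above could look roughly like the following C sketch. The `InsideFn` callback stands for whatever inside/outside test you have (local variance, luminance, ...); `inside_rect` is only a stand-in used to exercise the code, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } Point2;

/* User-supplied test: nonzero if p is inside the sheet. */
typedef int (*InsideFn)(Point2 p, void *ctx);

/* Bracket of the edge: `outside` tested outside, `inside` tested inside.
 * The gap between the two is the detection uncertainty to carry along. */
typedef struct { Point2 outside, inside; } EdgeEstimate;

EdgeEstimate find_edge_point(Point2 outside, Point2 inside,
                             InsideFn is_inside, void *ctx, double tolerance)
{
    assert(is_inside != NULL && tolerance > 0.0);
    /* 64 halvings are plenty for double precision; the cap also
     * guarantees termination if the tolerance is unreachably small. */
    for (int i = 0; i < 64; ++i) {
        double dx = inside.x - outside.x;
        double dy = inside.y - outside.y;
        if (dx * dx + dy * dy <= tolerance * tolerance)
            break;
        Point2 mid = { 0.5 * (outside.x + inside.x),
                       0.5 * (outside.y + inside.y) };
        if (is_inside(mid, ctx))
            inside = mid;
        else
            outside = mid;
    }
    EdgeEstimate e = { outside, inside };
    return e;
}

/* Stand-in predicate for testing: an axis-aligned rectangle passed via ctx
 * as four doubles {min_x, min_y, max_x, max_y}. */
int inside_rect(Point2 p, void *ctx)
{
    const double *r = (const double *)ctx;
    return p.x >= r[0] && p.y >= r[1] && p.x <= r[2] && p.y <= r[3];
}
```

Running this from several border points toward the center gives edge points on each side of the sheet; pairs of points on the same side define the lines whose intersections are the corners. Each walk is independent, which is what makes the approach easy to parallelize.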
All that said, it remains a hack -- I agree with unwind, why reinvent the wheel? If you have memory or CPU constraints (embedded systems, etc.), I believe there ought to be OpenCV and e-Vision "lite" ports also for ARM and embedded platforms.
(Sorry for my terminology - I'm monkey-translating from Italian. "Areal" is likely to correspond to your "blob", a beam is the family of lines joining all couples of points in two different blobs, line intensity being the product of distance from a point from its areal's center)
I am trying to do my own blob detection who will receive a real time video, and try to detect a white paper sheet.
Your first shot could be a simple flood-fill. That is, select a good threshold to binarize the image and apply the algorithm. The threshold can be fixed if you know the paper is always brighter than X and the background is always darker than this. Or this can be an adaptive threshold, for example Otsu's method. OpenCV offers this for free.
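If OpenCV is not an option, an adaptive threshold in the spirit of Otsu's method is short enough to write by hand. The following is a minimal sketch in plain C for an 8-bit grayscale buffer; the function name and the convention that pixels strictly greater than the returned value count as "paper" are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns a threshold t in [0, 255] chosen by maximizing the between-class
 * variance (Otsu's criterion). Pixels with value > t are foreground. */
int otsu_threshold(const uint8_t *pixels, size_t count)
{
    assert(pixels != NULL && count > 0);

    size_t hist[256] = { 0 };
    for (size_t i = 0; i < count; ++i)
        hist[pixels[i]]++;

    double total_sum = 0.0;
    for (int v = 0; v < 256; ++v)
        total_sum += (double)v * (double)hist[v];

    double sum_bg = 0.0;      /* sum of gray values at or below t */
    size_t n_bg = 0;          /* number of pixels at or below t   */
    double best_score = -1.0;
    int best_t = 0;

    for (int t = 0; t < 256; ++t) {
        n_bg += hist[t];
        sum_bg += (double)t * (double)hist[t];
        if (n_bg == 0)
            continue;         /* no background pixels yet */
        size_t n_fg = count - n_bg;
        if (n_fg == 0)
            break;            /* everything is background from here on */

        double mean_bg = sum_bg / (double)n_bg;
        double mean_fg = (total_sum - sum_bg) / (double)n_fg;
        double diff = mean_bg - mean_fg;
        /* Between-class variance, up to a constant factor 1/count^2. */
        double score = (double)n_bg * (double)n_fg * diff * diff;
        if (score > best_score) {
            best_score = score;
            best_t = t;
        }
    }
    return best_t;
}
```

The binarized image is then `pixels[i] > t`, and the flood fill (or union-find labeling) runs on that mask.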
If you'd need to speed it up you could use a union-find data structure.
Finally you'd need to come up with some heuristic how to identify the corners (e.g. the four extreme values in x/y direction).
Then I need [...] the coordinates of the corners [...]
Then you don't need blob detection, but corner detection or contour detection in the first place. OpenCV has some nice functionality for exactly this.
If you can't use it, I would suggest binarizing the image as above and using a Harris detector to find the corners of the object.
OpenCV's TBB support could also come in quite handy if you use it and have trouble meeting your real-time requirements.

Determining if a polygon is inside the viewing frustum

Here are my questions. I heard that OpenGL ignores the vertices which are outside the viewing frustum and doesn't consider them in the rendering pipeline. Recently I ran into a post that said you should check this yourself, and that if a point is not inside, it is your duty to find out, not OpenGL's! Now:
Is this true about OpenGL? Does it understand when a point is not inside, and not render it?
I am developing a grass scene which has about 4000 grass blades on rectangles. I have awful FPS, and the only solution I came up with was to decide which grasses are inside the viewport and then render only them! My question here is: what is the best way for me to find out which rectangles are inside and which are not?
Please consider that my question is not mainly about points but about rectangles. Also, I need to sort the grasses based on their distance, so it is better if the solution works natively on client-side memory.
Please let me know if there are any effective, real-time ways to find out whether any given mesh is inside or outside the frustum. Thanks.
Even if it is true that OpenGL does not show polygons outside the frustum (like any other 3D engine), it has to process them to check whether they are inside or not, and that slows the frame rate down. Usually some smart optimization algorithm is needed to avoid flooding the scene with invisible objects. Check for example BSP trees + PVS or portals as a starting point.
To check whether there is some bottleneck in the application, you can try gDEBugger. If nothing is obviously wrong, optimizing in order to draw just the PVS (potentially visible set) is the way to go.
OpenGL won't render pixels ("fragments") outside your screen, so it has to clip somehow...
More precisely:
1. You submit your geometry.
2. You make a draw call (glDrawArrays or glDrawElements).
3. Each vertex goes through the vertex shader, which computes the final position of the vertex in clip space. If you didn't write a vertex shader (= old OpenGL), the driver will create one for you.
4. The perspective division transforms these coordinates into Normalized Device Coordinates. Roughly, it means that the frustum of your camera is deformed to fit in a [-1,1]x[-1,1]x[-1,1] box.
5. Everything outside this box is clipped. This can mean completely discarding a triangle, or subdividing it if it crosses a clipping plane.
6. Each remaining triangle is rasterized into fragments.
7. Each fragment goes through the fragment shader.
So basically, OpenGL knows how to clip, but each vertex still has to go through the vertex shader. So submitting your entire world will work, of course, but if you can find a way not to submit everything, your GPU will be happier.
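To make the clipping step concrete, here is a small C sketch of the test that is conceptually applied to the vertex shader's output. A position (x, y, z, w) lies inside the view volume when -w <= x, y, z <= w, which is the same box as [-1,1]^3 after dividing by w. This only illustrates the rule; it is not how any particular driver implements it, and the names are made up for this example.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;   /* clip-space position */

enum {
    OUT_LEFT = 1, OUT_RIGHT = 2, OUT_BOTTOM = 4,
    OUT_TOP = 8, OUT_NEAR = 16, OUT_FAR = 32
};

/* One bit per clip plane the vertex is outside of; 0 means inside. */
unsigned clip_outcode(Vec4 c)
{
    unsigned code = 0;
    if (c.x < -c.w) code |= OUT_LEFT;
    if (c.x >  c.w) code |= OUT_RIGHT;
    if (c.y < -c.w) code |= OUT_BOTTOM;
    if (c.y >  c.w) code |= OUT_TOP;
    if (c.z < -c.w) code |= OUT_NEAR;
    if (c.z >  c.w) code |= OUT_FAR;
    return code;
}

/* A triangle can be discarded outright when all three vertices are
 * outside the same plane. Otherwise it is kept, and possibly cut
 * against the planes it crosses. */
int triangle_trivially_rejected(Vec4 a, Vec4 b, Vec4 c)
{
    return (clip_outcode(a) & clip_outcode(b) & clip_outcode(c)) != 0;
}
```

Note that this test can only run after the vertex shader has produced the clip-space positions, which is exactly why off-screen geometry still costs vertex processing.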
This is a tradeoff, of course. If you spend 10ms checking each and every patch of grass on the CPU so that the GPU has only the minimal amount of data to draw, it's not a good solution either.
If you want to optimize grass, I suggest culling big patches (5m x 5m or so). It's standard AABB-frustum testing.
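A minimal sketch of such an AABB-frustum test in C follows. It assumes the six frustum planes are already available with their normals pointing into the frustum (how you obtain them from your camera is left out), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Plane a*x + b*y + c*z + d = 0; points with a*x + b*y + c*z + d >= 0
 * are on the inner side of the frustum. */
typedef struct { float a, b, c, d; } FrustumPlane;

typedef struct { float min[3], max[3]; } Aabb;

/* Returns 1 if the box is certainly outside the frustum, 0 if it is
 * inside or intersecting. The test is conservative: a few boxes near
 * frustum corners may be reported as visible although they are not. */
int aabb_outside_frustum(const Aabb *box, const FrustumPlane planes[6])
{
    assert(box != NULL && planes != NULL);
    for (int i = 0; i < 6; ++i) {
        const FrustumPlane *p = &planes[i];
        /* Pick the box corner that lies farthest along the plane normal.
         * If even that corner is behind the plane, the whole box is. */
        float px = (p->a >= 0.0f) ? box->max[0] : box->min[0];
        float py = (p->b >= 0.0f) ? box->max[1] : box->min[1];
        float pz = (p->c >= 0.0f) ? box->max[2] : box->min[2];
        if (p->a * px + p->b * py + p->c * pz + p->d < 0.0f)
            return 1;
    }
    return 0;
}
```

With 4000 grass quads grouped into, say, a few dozen patches, this is a few dozen cheap tests per frame on the CPU instead of one test per quad, and every rejected patch saves a whole draw call's worth of vertex work.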
If you want to optimize a more generic model, you can investigate quadtree for "flat" models, octrees and bsp-trees for more complex objects.
Yes, OpenGL does not rasterize triangles outside the viewing frustum. But this doesn't mean that relying on it is optimal for applications: the OpenGL implementation has to transform the vertex coordinates (using the fixed pipeline or vertex shaders), and only then, having the normalized coordinates, does it finally know whether the triangle lies inside the viewing frustum.
This means that no pixel is rasterized in those cases, but the vertex data is processed all the same; it simply doesn't produce fragments from a non-visible triangle!
The OpenGL extension ARB_occlusion_query may help you, but its discussion section makes this clear:
Do occlusion queries make other visibility algorithms obsolete?
No.
Occlusion queries are helpful, but they are not a cure-all. They
should be only one of many items in your bag of tricks to decide
whether objects are visible or invisible. They are not an excuse
to skip frustum culling, or precomputing visibility using portals
for static environments, or other standard visibility techniques.
As for sorting the meshes by depth, you should use the depth buffer: essentially, a mesh fragment is rendered only if its distance from the viewpoint is less than that of the fragment previously drawn at the same position. This frees you from sorting the meshes yourself. The buffer is essentially free, and it can improve performance since it discards fragments that are farther away.
Yes. Like others have pointed out, OpenGL has to perform a lot of per-vertex operations to determine if it is in the frustum. It must do this for every vertex you send it. In addition to the processing overhead that must take place, keep in mind that there is also additional overhead in the transmission of those vertices from the CPU to the GPU. You want to avoid sending information to the GPU that it isn't going to use. Though the bandwidth between the CPU and GPU is quite good on modern hardware, there's still a limit.
What you want is a Scene Graph. Scene graphs are frequently implemented with some kind of spatial partitioning scheme, e.g., Quadtrees, Octrees, BSPTrees, etc etc. Spatial partitioning allows you to intelligently determine what geometries are visible. Instead of doing this on a per-vertex basis (like OpenGL is forced to do) it can eliminate huge spatial subsets of geometry at a time. When rendering a complex scene, the performance savings can be enormous.
