Is it possible to get a "SCNVector3" position of a World object using CoreML and ARKit? - ios11

I am working on a AR based solution in which I am rendering some 3D models using SceneKit and ARKit. I have also integrated CoreML to identify objects and render corresponding 3D objects in scene.
But right now I am just rendering it in the center of screen as soon I detect the object(Only for the list of objects that I have). Is it possible to get the position of the real world object so that I can show some overlay above the object?
That is if I have a water bottled scanned, I should able to get the position of the water bottle. It could be anywhere in the water bottle but shouldn't go outside of it. Is this possible using SceneKit?

All parts of what you ask are theoretically possible, but a) for several parts, there’s no integrated API to do things for you, and b) you’re probably signing yourself up for a more difficult problem than you think.
What you presumably have with your Core ML integration is an image classifier, as that’s what most of the easy to find ML models do. Image classification answers one question: “what is this a picture of?”
What you’re looking for involves at least two additional questions:
“Given that this image has been classified as containing (some specific object), where in the 2D image is that object?”
“Given the position of a detected object in the 2D video image, where is it in the 3D space tracked by ARKit?”
Question 1 is pretty reasonable. There are models that do both classification and detection (location/bounds within an image) in the ML community. Probably the best known one is YOLO — here’s a blog post about using it with Core ML.
Question 2 is the “research team and five years” part. You’ll notice in the YOLO papers that it gives you only coarse bounding boxes for detected objects — that is, it’s working in 2D image space, not doing 3D scene reconstruction.
To really know the shape, or even the 3D bounding box of an object means integrating object detection with scene reconstruction. For example, if an object has some height in the 2D image, are you looking at a 3D object that’s tall with a small footprint, or one that’s long and low, receding into the distance? Such integration would require taking apart the inner workings of ARKit, which nobody outside Apple can do, or recreating an ARKit-alike from scratch.
There might be some assumptions you can make to get very rough estimates of 3D shape from a 2D bounding box, though. For example, if you do AR hit tests on the lower corners of a box and find that they’re on a horizontal plane, you can guess that the 2D height of the box is proportional to the 3D height of the object, and that its footprint on the plane is proportional to the box’s width. You’d have to do some research and testing to see if assumptions like that hold up, especially in whatever use cases your app covers.

Related

How to measure horizontal plane surface(visible in camera) using ARKit-Scenekit before placing objects?

I want to measure the horizontal plane surface to find whether it fits the object that i am going to place. For ex. if i am going to place a cot 3D model(with fixed size) in a room using iOS 11 ARKit,
First i want to detect if that room surface is sufficient or not to place my 3D model by measuring the surface area(width and height etc.)
Second if the user tries to place it without sufficient place, i should not allow him to place the cot and show him error message.
I created a sample POC by following https://developer.apple.com/sample-code/wwdc/2017/PlacingObjects.zip using which i am able to detect the horizontal plane and place the cot. But the issue is whatever may be the surface, user is able to place the cot which shouldn't be allowed in real time.
I saw couple of demos in which they say we can measure the size of the room or a horizontal plane(https://www.curbed.com/2017/6/29/15894556/ar-measure-app-augmented-reality-ruler-measuring-tape-ios)
I am using ARKit Scenekit inorder to achieve this and i am new to AR and Scenekit. I need to know if this is doable, and if so how to achieve it.
You could estimate the size of a detected plane by inspecting its dimensions. But you shouldn't.
ARKit has plane estimation, not scene reconstruction. That is, it'll tell you there's a flat surface at (some point) and that said surface probably extends at least (some distance) from that point. It doesn't know exactly how big the surface is (it's even refining its estimate over time), and it doesn't tell you where there are interruptions in that continuous surface, much less the size and shape of such interruptions.
In fact, if you're looking at the floor and moving around, and you see one patch of floor, then another patch of floor on the other side of a solid wall from the first, ARKit will happily recognize that those two patches are coplanar and merge them into the same anchor. At the same time, neither detected patch may cover the entire extent of the floor around it.
If you attempt to restrict where the user can place virtual objects in AR based on plane estimates, you're likely to frustrate them with two kinds of error: you'll have areas where it looks to the user like they can place something but that don't allow it, and you'll have areas that look like they should be off-limits that do allow placing things.
Instead, design your experience to involve the user in deciding where the sensible places for content are. See this demo for example — ARKit detects the level of the floor (not its boundaries), then uses that to show UI indicating the size/shape of objects to be placed. It's up to the user to make sure there's enough room for the couch, etc.
As for the technical how-to on what you probably shouldn't do: The docs for ARPlaneAnchor.extent say that the x and z coordinates of that vector are the width and length of the estimated plane. And all units in ARKit are meters. (Which is width and which is length? It's a matter of perspective. And of the rotation encoded in the anchor's transform.)

Difference between Point Sprites and Billboards

Can someone tell me the difference between Point Sprites and Billboards in OpenGL? I read a lot about both of them and I'm getting confused more and more about when to use which of them and whether there is actually a difference?
Wikipedia only knows about Sprites (Billboard redirects there):
In computer graphics, a sprite is a two-dimensional bitmap that is integrated into a larger scene, most often in a 2D video game. Originally, the term sprite referred to fixed-sized objects composited together, by hardware, with a background.3 Use of the term has since become more general.
One source states:
Sprite
A sprite is the traditional term given to a 2D image displayed in a game
Billboard
... you need to re-orient each particle so that it's facing the viewer.
This technique of re-orienting the sprites is called billboarding.
Another source:
Billboarding is a popular technique used in 3D graphics programming.
Billboarding allows an object (usually a quad) to always face a given
camera. Here are some common uses of billboarding:
– particles – halo surrounding an object – trees rendering
For the particular case of particles, the billboarding is a GPU
built-in feature when point-sprites are used (a single point is
transformed to a billboarded quad).
Yet another states that both face the camera, but billboards only rotate about their vertical axis (think trees).
Some references specifically for OpenGL:
https://learnopengl.com/In-Practice/2D-Game/Rendering-Sprites
https://learnopengl.com/In-Practice/2D-Game/Particles (billboarding)
https://www.opengl-tutorial.org/intermediate-tutorials/billboards-particles/billboards/
Live Examples by Three.js/WebGL (though I can't tell the difference):
https://threejs.org/examples/?q=billboard#webgl_points_billboards
https://threejs.org/examples/?q=sprite#webgl_points_sprites

2D CAD application in WPF

I'm trying to write an CAD-like application in WPF(.NET 4.0) that needs to be able to display a lot of 2D points/lines. It will be used to display CAD-plans of entire cities with zoom, pan, rotate and point snapping on mouseover.
Right now I purely use WPF. I read the objects from the CAD file draw them into a StreamGeometry, use it as stroke of a new Path and add it to a Canvas, with several transforms.
My problem is that this solution doesn't scale well enough. It works fine with small CAD-files, but when I want to display like half a city(with houses and land boundaries) it is very very delayed.
I also tried to convert my CAD-file to an image, but
- a resolution a 32000x32000 is sometimes not enough
- when zooming out the lines are too thin.
In the end I need to be able to place this on a Canvas(2D/3D) as background.
What are my best options here?
Thanks,
Niklas
wpf is not good for a large 3d models. im afraid it is too slow. Your best bet is direct 3d or openGL
However, even with the speed of direct3d,openGL you will still need to work out how to cull as many polygons/vertices as possible before the rendering of the scene if you are trying to show an entire city.
there is a large amount of information on this (generally under game development)
there are a few techniques including frustrum culling, near and far plane culling.
also, since you probably have a static scene you may be able to use binary spacial partitioning.
As I understand the subject is 2D CAD system within WPF.
Great! I use it...
OpenGL and DirectX are in infinite loop OnDraw always. The CPU works all the time.
WPF/Silverlight 2D is smart model.
Yes, total amount of elements (for example, primitives inherited from Shape) must be not so much. But how many?
I tested own app (Silverlight). WPF will be a bit faster I hope...
Here my 2D CAD results. Performance is still great. Each beam consists of multiple primitives.
Use a VirtualCanvas like this one from Chris Lovett.

Determining if a polygon is inside the viewing frustum

here are my questions. I heard that opengl ignores the vertices which are outside the viewing frustum and doesn't consider them in rendering pipeline. Recently I ran into a same post that said you should check this your self and if a point is not inside, it is you duty to find out not opengl's! Now,
Is this true about opengl? does it understand if a point is not inside, and not to render it?
I am developing a grass scene which has about 4000 grasses on rectangles. I have awful FPS, and the only solution I came up was to decide which grasses are inside the viewport and then only render them! My question here is that what solution is best for me to find out which rectangle is not inside or which one is?
Please consider that my question is not about points mainly but about rectangles. Also I need to sort the grasses based on their distance, so it is better if native on client side memory.
Please let me know if there are any effective and real-time ways to find out if any given mesh is inside or outside the frustum. Thanks.
Even if is true then OpenGL does not show polygons outside the frustum ( as any other 3d engines ) it has to consider them to check if there are inside or not and then fps slow down. Usually some smart optimization algorithm is needed to avoid flooding the scene with invisible objects. Check for example BSP trees+PVS or Portals as a starting point.
To check if there is some bottleneck in the application, you can try with gDebugger. If nothing is reasonable wrong optimizing in order to draw just the PVS ( possible visible set ) is the way to go.
OpenGL won't render pixels ("fragments") outside your screen, so it has to clip somehow...
More precisely :
You submit your geometry
You make a Draw Call (glDrawArrays or glDrawElements)
Each vertex goes through the vertex shader, which computes the final position of the vertex in camera space. If you didn't write a vertex shader (=old opengl), the driver will create one for you.
The perspective division transforms these coordinates in Normalized Device Coordinates. Roughly, its means that the frustum of your camera is deformed to fit in a [-1,1]x[-1,1]x[-1,1] box
Everything outside this box is clipped. This can mean completely discarding a triangle, or subdivide it if it is across a clipping plane
Each remaining triangle is rasterized into fragments
Each fragment goes through the fragment shader
So basically, OpenGL knows how to clip, but each vertex still has to go through the vertex shader. So submitting your entire world will work, of course, but if you can find a way not to submit everything, your GPU will be happier.
This is a tradeoff, of course. If you spend 10ms checking each and every patch of grass on the CPU so that the GPU has only the minimal amount of data to draw, it's not a good solution either.
If you want to optimize grass, I suggest culling big patches (5m x 5m or so). It's standard AABB-frustum testing.
If you want to optimize a more generic model, you can investigate quadtree for "flat" models, octrees and bsp-trees for more complex objects.
Yes, OpenGL does not rasterize triangles outsize the viewing frustrum. But, this doesn't mean that this is optimal for applications: OpenGL implementation shall transform the vertex coordinate (by using fixed pipeline or vertex shaders), then, having the normalized coordinates it finally knows whether the triangle lie inside the viewing frustrum.
This mean that no pixel is rasterized in that cases, but the vertex data is processed all the same; simply doesn't produce fragments derived from a non visible triangle!
The OpenGL extension ARB_occlusion_query may help you, but in the discussion section make it clear:
Do occlusion queries make other visibility algorithms obsolete?
No.
Occlusion queries are helpful, but they are not a cure-all. They
should be only one of many items in your bag of tricks to decide
whether objects are visible or invisible. They are not an excuse
to skip frustum culling, or precomputing visibility using portals
for static environments, or other standard visibility techniques.
For the question regarding the mesh sorting on depth, you shall use the depth buffer: essentially the mesh fragment is effectively rendered only if its distance from the viewport is less than the previous fragment in the same position. This make you aware of sorting meshes. This buffer is essentially free, and it allows you to improve performances since it discard more far fragments.
Yes. Like others have pointed out, OpenGL has to perform a lot of per-vertex operations to determine if it is in the frustum. It must do this for every vertex you send it. In addition to the processing overhead that must take place, keep in mind that there is also additional overhead in the transmission of those vertices from the CPU to the GPU. You want to avoid sending information to the GPU that it isn't going to use. Though the bandwidth between the CPU and GPU is quite good on modern hardware, there's still a limit.
What you want is a Scene Graph. Scene graphs are frequently implemented with some kind of spatial partitioning scheme, e.g., Quadtrees, Octrees, BSPTrees, etc etc. Spatial partitioning allows you to intelligently determine what geometries are visible. Instead of doing this on a per-vertex basis (like OpenGL is forced to do) it can eliminate huge spatial subsets of geometry at a time. When rendering a complex scene, the performance savings can be enormous.

WPF 3D Billboards

In a 3D scene we often need to apply labels (little textelements or icons) next to 3D object that is moving around (rotation, translation) in the scene. These labels should always face the camera but still move with the object. This technique I believe is called billboard.
An additional cool feature would be if the label would stay always at the same size - no matter how far away the associated object is. So the label seems to live in 2D screenspace and not in the 3D scenegraph.
Does anyone figures out a clever way how to do this in WPF?
For billboarding you need to make sure that the face normal is pointing towards the camera. The algorithm is that the dot product between the face normal and the view direction should be -1 (minus one).
I have some old C code that does this, but it's probably not particularly useful.
For keeping the object the same size you'd need to work out the screen size and then apply a transform to keep it the constant size you desired.
However, if you want the object to appear as though it's in 2D space, why not draw it in a 2D overlay? This will solve both the billboarding and scaling problem at the same time. You work out the screen location of your label and then use the 2D drawing functions.

Resources