ARKit: project a feature point found in the ARPointCloud to image space and check to see if it's contained in a CGRect on screen? - scenekit

So, I am using ARKit to display feature points in the session. I am able to get the current frame, then its rawFeaturePoints and place geometries in the world space so the user can see them on screen. That is working great.
In the app I then have a quadrant on screen. My objective is to show, in screen coordinates, the feature points whose projections fall inside that 2D quadrant on screen. To do that, I tried this:
get feature points as an array of vector_float3
for each of those points I then build an SCNVector3, setting the Z component to 0 (near plane)
I then call on the ARSCNView:
public func projectPoint(_ point: SCNVector3) -> SCNVector3
This approach does give me 2D points back, but, depending on where the camera is, they seem to be way off.
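In code, the three steps above look roughly like this (just a sketch; sceneView and quadrant are assumed names for the ARSCNView and the on-screen rect):

import ARKit
import SceneKit

// Sketch of the approach described above; `sceneView` (ARSCNView) and
// `quadrant` (the on-screen CGRect) are placeholder names.
func projectedFeaturePoints(sceneView: ARSCNView, quadrant: CGRect) -> [CGPoint] {
    // 1. Feature points as an array of vector_float3 (world space).
    guard let points = sceneView.session.currentFrame?.rawFeaturePoints?.points else {
        return []
    }
    var result: [CGPoint] = []
    for p in points {
        // 2. Build an SCNVector3 with the Z component set to 0 (near plane), as described.
        let v = SCNVector3(p.x, p.y, 0)
        // 3. Project it via ARSCNView's projectPoint.
        let projected = sceneView.projectPoint(v)
        let screenPoint = CGPoint(x: CGFloat(projected.x), y: CGFloat(projected.y))
        if quadrant.contains(screenPoint) {
            result.append(screenPoint)
        }
    }
    return result
}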
So then, since in ARKit the camera keeps moving around, do I need to take that into account to achieve what I explained?
EDIT:
About flipping the Y of the CGPoint retrieved from the projectPoint call on the camera:
/**
 Project a 3D point in world coordinate system into 2D viewport space.
 @param point 3D point in world coordinate system.
 @param orientation Viewport orientation.
 @param viewportSize Viewport (or image) size.
 @return 2D point in viewport coordinate system with origin at top-left.
 */
open func projectPoint(_ point: vector_float3, orientation: UIInterfaceOrientation, viewportSize: CGSize) -> CGPoint
Remy San mentioned flipping the Y. I tried that and it does seem to work. One difference between what he's doing and what I am doing is that he is using an SKScene, while I am using an SCNScene. Looking at the docs it says:
...The projection of the specified point into a 2D pixel coordinate space
whose origin is in the upper left corner...
So, what throws me off is that if I don't flip the Y it doesn't seem to work properly (I'll try to post images to show what I mean). But if flipping the Y makes things look better, that goes against the docs, no?
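For reference, a minimal sketch of checking a single feature point against a CGRect using the ARCamera call quoted above (sceneView, worldPoint, and quadrant are assumed names):

import ARKit
import UIKit

// Sketch using the ARCamera.projectPoint variant quoted above; `sceneView`,
// `worldPoint`, and `quadrant` are placeholder names.
func isFeaturePoint(_ worldPoint: vector_float3,
                    inside quadrant: CGRect,
                    sceneView: ARSCNView,
                    orientation: UIInterfaceOrientation = .portrait) -> Bool {
    guard let camera = sceneView.session.currentFrame?.camera else { return false }
    // The returned 2D point already has its origin at the top-left of the
    // viewport, so in principle no manual Y flip should be needed here.
    let screenPoint = camera.projectPoint(worldPoint,
                                          orientation: orientation,
                                          viewportSize: sceneView.bounds.size)
    return quadrant.contains(screenPoint)
}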

I gather you are using the intrinsics matrix for your projection. ARKit may also give you some extra information: the cameraPoseARFrame, the projectionMatrix and the transformToWorldMap matrices. Are you taking them into consideration when transforming from world coordinates to pixel coordinates?
If anyone has a methodology for applying these matrices to the point cloud coordinates to convert them into screen coordinates, could you please contribute it to my answer? I think they may add precision and accuracy to the final result.
Thank you!
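For what it's worth, the generic world-to-pixel chain that such matrices feed into looks roughly like this (a simd sketch; the parameter names are assumptions, not specific ARKit properties):

import simd
import CoreGraphics

// Generic sketch of world -> clip -> NDC -> viewport; the matrix and parameter
// names are assumptions, not specific ARKit properties.
func worldToPixel(worldPoint: simd_float3,
                  viewMatrix: simd_float4x4,        // inverse of the camera pose
                  projectionMatrix: simd_float4x4,
                  viewportSize: CGSize) -> CGPoint? {
    let clip = projectionMatrix * viewMatrix * simd_float4(worldPoint, 1)
    guard clip.w != 0 else { return nil }
    let ndc = simd_float3(clip.x, clip.y, clip.z) / clip.w   // normalized device coordinates, [-1, 1]
    // Viewport transform; Y is flipped because NDC Y points up while screen
    // coordinates have their origin at the top-left.
    let x = CGFloat((ndc.x + 1) / 2) * viewportSize.width
    let y = CGFloat((1 - ndc.y) / 2) * viewportSize.height
    return CGPoint(x: x, y: y)
}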

Related

How to measure horizontal plane surface(visible in camera) using ARKit-Scenekit before placing objects?

I want to measure the horizontal plane surface to find out whether it fits the object I am going to place. For example, if I am going to place a cot 3D model (with a fixed size) in a room using iOS 11 ARKit:
First, I want to detect whether the room surface is sufficient to place my 3D model by measuring the surface area (width, height, etc.).
Second, if the user tries to place it without sufficient space, I should not allow them to place the cot and should show an error message.
I created a sample POC by following https://developer.apple.com/sample-code/wwdc/2017/PlacingObjects.zip, with which I am able to detect the horizontal plane and place the cot. But the issue is that whatever the surface may be, the user is able to place the cot, which shouldn't be allowed in a real scenario.
I saw a couple of demos in which they say we can measure the size of a room or a horizontal plane (https://www.curbed.com/2017/6/29/15894556/ar-measure-app-augmented-reality-ruler-measuring-tape-ios).
I am using ARKit with SceneKit to achieve this, and I am new to AR and SceneKit. I need to know if this is doable, and if so, how to achieve it.
You could estimate the size of a detected plane by inspecting its dimensions. But you shouldn't.
ARKit has plane estimation, not scene reconstruction. That is, it'll tell you there's a flat surface at (some point) and that said surface probably extends at least (some distance) from that point. It doesn't know exactly how big the surface is (it's even refining its estimate over time), and it doesn't tell you where there are interruptions in that continuous surface, much less the size and shape of such interruptions.
In fact, if you're looking at the floor and moving around, and you see one patch of floor, then another patch of floor on the other side of a solid wall from the first, ARKit will happily recognize that those two patches are coplanar and merge them into the same anchor. At the same time, neither detected patch may cover the entire extent of the floor around it.
If you attempt to restrict where the user can place virtual objects in AR based on plane estimates, you're likely to frustrate them with two kinds of error: you'll have areas where it looks to the user like they can place something but that don't allow it, and you'll have areas that look like they should be off-limits that do allow placing things.
Instead, design your experience to involve the user in deciding where the sensible places for content are. See this demo for example — ARKit detects the level of the floor (not its boundaries), then uses that to show UI indicating the size/shape of objects to be placed. It's up to the user to make sure there's enough room for the couch, etc.
As for the technical how-to on what you probably shouldn't do: The docs for ARPlaneAnchor.extent say that the x and z coordinates of that vector are the width and length of the estimated plane. And all units in ARKit are meters. (Which is width and which is length? It's a matter of perspective. And of the rotation encoded in the anchor's transform.)
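A minimal sketch of that extent check, assuming the model's footprint is known in meters (modelWidth and modelLength are assumed names):

import ARKit

// Sketch of comparing a model's footprint against ARPlaneAnchor.extent;
// `modelWidth` and `modelLength` are placeholder names, in meters.
func planeMightFit(_ planeAnchor: ARPlaneAnchor,
                   modelWidth: Float,
                   modelLength: Float) -> Bool {
    // extent.x and extent.z are the estimated width and length of the plane, in meters.
    let planeWidth = planeAnchor.extent.x
    let planeLength = planeAnchor.extent.z
    // Which is "width" and which is "length" depends on the rotation in the
    // anchor's transform, so accept either orientation.
    return (modelWidth <= planeWidth && modelLength <= planeLength)
        || (modelWidth <= planeLength && modelLength <= planeWidth)
}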

Simple Flat Plane Tessellation Shader

Part 1:
So I want to create a basic tessellation program that takes a plane of quads and transforms it into a more, well, detailed/tessellated plane of quads. Such as the picture below. How much it gets tessellated would depend on user controls, passed in by a uniform (initially). However I am so new to tessellation shaders that I can't even figure out how to do this.
How is this typically done? Surely you shouldn't actually draw the plane of quads prior to the shader program, since, from my understanding, quads won't get tessellated this way; instead they get tessellated in a way like the picture below:
I believe the answer could be to draw a plane of points, and these points are then tessellated into more points, and these points are transformed into quads of the appropriate size in the geometry shader, I think? Alternatively, instead of converting points into quads, could I just draw quads between each four closest points (that would be much better)? Examples very much appreciated!
NOTE: Using GLSL > 4.0 & C only (No C++/Python)
Part 2:
After I get part 1 working, how would I make it so that certain quads are more tessellated than others, such as this?:
I want the parts closer to the camera to be more tessellated.
Part 3:
If I were able to get that far, the next part would be to alter the z-axis of points to make the plane into an interesting environment. This would be done by reading in a sampler2D; I know how to do that and all. However, if I am correct in Part 1 about using a plane of points, then I need to do more than just alter the points that are converted into quads, because quads need to share vertices in order for there to be no gaps between them. How would that be done? Alternatively, if we draw quads between points, with each point being at the appropriate height, then this wouldn't be an issue.
Part 1
Yes, you're correct: generate a 'patch' as a simple grid of points, specify the tessellation levels as uniforms to the TCS (tessellation control shader), and generate the vertex data in the TES (tessellation evaluation shader).
Sounds complicated? Here's a nice tutorial I based my work on: http://antongerdelan.net/opengl/tessellation.html
Part 2
What you are talking about here is LOD (or level of detail). You would need to tessellate and render the higher-polygon-count bottom-left corner of your mesh as a separate object.
Your suggested approach is correct: break the overall scene into 'chunks' and determine the LOD (i.e. the tessellation parameters) for each chunk separately, usually by some distance-to-camera algorithm.
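A toy sketch of such a distance-to-camera mapping (the resulting level would be fed to the TCS as a uniform; the constants and names here are arbitrary assumptions):

import simd

// Toy distance-based LOD: closer chunks get higher tessellation levels.
// `falloffDistance` and `maxLevel` are arbitrary tuning values.
func tessellationLevel(chunkCenter: simd_float3,
                       cameraPosition: simd_float3,
                       maxLevel: Float = 64,
                       falloffDistance: Float = 50) -> Float {
    let distance = simd_distance(chunkCenter, cameraPosition)
    let t = max(0, 1 - distance / falloffDistance)   // 1 near the camera, 0 at or beyond the falloff
    return max(1, t * maxLevel)                      // clamp to at least level 1
}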
Part 3
Another excellent tutorial which, I believe, does exactly what you are after: http://codeflow.org/entries/2010/nov/07/opengl-4-tessellation/
I used this approach to get a very highly detailed but memory- and framerate-efficient terrain.
Hope this helps.

does glRotate in OpenGL rotate the camera or rotate the world axis or rotate the model object?

I want to know whether glRotate rotates the camera, the world axis, or the object. Explain how they are different with examples.
the camera
There is no camera in OpenGL.
the world axis
There is no world in OpenGL.
or the object.
There are no objects in OpenGL.
Confused?
OpenGL is a drawing system, that operates with points, lines and triangles. There is no concept of a scene or a world in OpenGL. All there is are vertices of which each has a set of attributes and there is the state of OpenGL which determines how vertices are turned into pixels.
The very first stage of this process is getting the vertex positions within the viewport. In the fixed-function pipeline (i.e. without shaders), to get those, each vertex position is first multiplied with the so-called "modelview" matrix, the intermediate result is used for lighting calculations, and it is then multiplied with the "projection" matrix. After that, clipping and then normalization into viewport coordinates are applied.
Those two matrices I mentioned serve two purposes. The first one, "modelview", is used to apply a transformation to the incoming vertices so that they end up in the desired spot relative to the origin. There is no difference between first moving geometry to some place in the world and then moving the viewpoint within the world, or keeping the viewpoint at the origin and moving the whole world in the opposite direction. All of this can be described by the modelview matrix.
The second one "projection" works together with the normalization process to behave like a kind of "lens", so to speak. With this you set the field of view (and a few other parameters, like shift, which you need for certain applications – don't worry about it).
The interesting thing about matrices is that they're non-commutative, i.e. for two given matrices M and N:
M * N ≠ N * M (for most M, N)
This ultimately means, that you can compose a series of transformations A, B, C, D... into one single compound transformation matrix T by multiplying the primitive transformations onto each other in the right order.
The OpenGL matrix manipulation functions (they're obsolete, BTW) do just that. You have a matrix selected for manipulation (the matrix mode), for example the modelview matrix M. Then glRotate effectively does this:
M *= R(angle,axis)
i.e. the active matrix gets multiplied on a rotation matrix constructed from the given parameters. Similar for scale and translate.
Whether this happens to behave like a camera or like placing an object depends entirely on how, and in which order, those manipulations are combined.
But for OpenGL there are just numbers/vectors (vertex attributes), which somehow translate into 2-dimensional viewport coordinates that get drawn as points, or filled in between as lines or triangles.
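To make the ordering point concrete, here is a small illustration with simd matrices (names and values are arbitrary):

import simd

// Composing a rotation and a translation in the two possible orders gives
// different results, just like glRotate/glTranslate calls issued in different
// orders produce different compound modelview matrices.
let rotation = simd_float4x4(simd_quatf(angle: .pi / 2, axis: simd_float3(0, 1, 0)))
let translation = simd_float4x4(columns: (
    simd_float4(1, 0, 0, 0),
    simd_float4(0, 1, 0, 0),
    simd_float4(0, 0, 1, 0),
    simd_float4(5, 0, 0, 1)     // translate 5 units along +X (column-major)
))
let point = simd_float4(1, 0, 0, 1)

let a = rotation * translation * point   // translate first, then rotate: ≈ (0, 0, -6, 1)
let b = translation * rotation * point   // rotate first, then translate: ≈ (5, 0, -1, 1)
print(a, b)                              // the two results differ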
glRotate works on the current matrix. So it depends on whether that matrix is the camera one or a world transformation one. To learn more about the current matrix, have a look at glMatrixMode().
Finding examples is just a matter of googling: I found this one, which in my opinion should help you figure out what's happening.

Rectangle matrix calculations in OpenCV

I had a general question about whether or not it is possible to do matrix calculations on a rectangle. I have a CvRect with coordinates stored in it, and I have a cvMat that holds transformation data. What I would like to know is whether there is a way to have the Rect use the matrix data to generate a rotated, skewed, and repositioned rectangle out of it. I've searched online, but I was only able to find information on image transforms.
Thanks in advance for the help.
No, this is not possible. cv::Rect is also not capable of that, as it only describes rectangles in a Manhattan world. There is cv::RotatedRect, but this also does not handle skewing.
You can, however, feed the corner points of your rectangle to cv::transform:
http://opencv.itseez.com/modules/core/doc/operations_on_arrays.html?highlight=transform#cv2.transform
You will then obtain four points that are transformed accordingly. Note that there are also more specialized versions of this function, e.g. warpPerspective() and warpAffine().
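The underlying math is just applying the transform to each corner point; here is a sketch using a 3x3 homogeneous matrix with simd (the matrix form and the Swift/simd setting are assumptions for illustration, not OpenCV code):

import simd
import CoreGraphics

// Transform the rectangle's four corners individually with a homogeneous 3x3
// matrix; the result is four points that generally no longer form an
// axis-aligned rectangle.
func transformedCorners(of rect: CGRect, by m: simd_float3x3) -> [simd_float2] {
    let corners = [
        simd_float3(Float(rect.minX), Float(rect.minY), 1),
        simd_float3(Float(rect.maxX), Float(rect.minY), 1),
        simd_float3(Float(rect.maxX), Float(rect.maxY), 1),
        simd_float3(Float(rect.minX), Float(rect.maxY), 1)
    ]
    return corners.map { corner in
        let p = m * corner                      // homogeneous transform of one corner
        return simd_float2(p.x, p.y) / p.z      // back to 2D (perspective divide)
    }
}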

Calculating distance using a single camera

I would like to calculate the distance to certain objects in the scene. I know that I can only calculate relative distance when using a single camera, but I know the coordinates of some objects in the scene, so in theory it should be possible to calculate the actual distance. According to the OpenCV mailing list archives,
http://tech.groups.yahoo.com/group/OpenCV/message/73541
cvFindExtrinsicCameraParams2 is the function to use, but I can't find information on how to use it.
PS. Assuming camera is properly calibrated.
My guess would be: you know the size of an object, such as a ball that is 6 inches across and 6 inches tall, and you can also see that it is 20 pixels tall and 25 pixels wide. You also know the ball is 10 feet away. This would be your start.
Extrinsic parameters wouldn't help you, I don't think, because that is the camera's location and rotation in space relative to another camera or an origin. For a one camera system, the camera is the origin.
Intrinsic parameters might help. I'm not sure, I've only done it using two cameras.
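A sketch of that similar-triangles (pinhole) calculation; all names and numbers here are illustrative assumptions:

// From one known sighting (real size, pixel size, distance) you can recover an
// effective focal length in pixels, then reuse it to estimate new distances.
// f_px = observedWidthPixels * knownDistance / knownRealWidth
func focalLengthPixels(knownRealWidth: Double,
                       knownDistance: Double,
                       observedWidthPixels: Double) -> Double {
    return observedWidthPixels * knownDistance / knownRealWidth
}

// distance = f_px * realWidth / observedWidthPixels
func estimatedDistance(realWidth: Double,
                       observedWidthPixels: Double,
                       focalLengthPixels: Double) -> Double {
    return focalLengthPixels * realWidth / observedWidthPixels
}

// Example with the numbers above: a 0.5 ft wide ball seen 25 px wide at 10 ft
// gives f_px = 25 * 10 / 0.5 = 500; seeing it later at 50 px wide would put it
// at 500 * 0.5 / 50 = 5 ft.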
