Do depth values in AVDepthData (from TrueDepth camera) indicate distance from camera or camera plane? - scenekit

Do depth values in AVDepthData (from TrueDepth camera) indicate distance in meters from the camera, or perpendicular distance from the plane of the camera (i.e. z-value in camera space)?
My goal is to get an accurate 3D point from the depth data, and this distinction is important for accuracy. I've found lots online regarding OpenGL or Kinect, but not for TrueDepth camera.
FWIW, this is the algorithm I use. I'm find the value of depth buffer at a pixel found using some OpenCV feature detection. Below is the code I use to find real world 3D point at a given pixel at let cgPt: CGPoint. This algorithm seems to work quite well, but I'm not sure whether small error is introduced by the assumption of depth being distance to camera plane.
let depth = 1/disparity
let vScreen = sceneView.projectPoint(SCNVector3Make(0, 0, -depth))
// cgPt is the 2D coordinates at which I sample the depth
let worldPoint = sceneView.unprojectPoint(SCNVector3Make(cgPt.x, cgPt.y, vScreen.z))

I'm not sure of authoritative info either way, but it's worth noticing that capture in a disparity (not depth) format uses distances based on a pinhole camera model, as explained in the WWDC17 session on depth photography. That session is primarily about disparity-based depth capture with back-facing dual cameras, but a lot of the lessons in it are also valid for the TrueDepth camera.
That is, disparity is 1/depth, where depth is distance from subject to imaging plane along the focal axis (perpendicular to imaging plane). Not, say, distance from subject to the focal point, or straight-line distance to the subject's image on the imaging plane.
IIRC the default formats for TrueDepth camera capture are depth, not disparity (that is, depth map "pixel" values are meters, not 1/meters), but lacking a statement from Apple it's probably safe to assume the model is otherwise the same.

It looks like it measures distance from the camera's plane rather than a straight line from the pinhole. You can test this out by downloading the Streaming Depth Data from the TrueDepth Camera sample code.
Place the phone vertically 10 feet away from the wall, and you should expect to see one of the following:
If it measures from the focal point to the wall as a straight line, you should expect to see a radial pattern (e.g. the point closest to the camera is straight in front of it; the points furthest to the camera are those closer to the floor and ceiling).
If it measures distance from the camera's plane, then you should expect the wall color to be nearly uniform (as long as you're holding the phone parallel to the wall).
After downloading the sample code and trying it out, you will notice that it behaves like #2, meaning it's distance from the camera's plane, not from the camera itself.

Related

Uniform random sampling of CIELUV for RGB colors

Selecting a random color on a computer is a touch harder than I thought it would be.
The naive way of uniform random sampling of 0..255 for R,G,B will tend to draw lots of similar greens. It would make sense to sample from a perceptually uniform space like CIELUV.
A simple way to do this is to sample L,u,v on a regular mesh and ensure the color solid is contained in the bounds (I've seen different bounds for this). If the sample falls outside embedded RGB solid (tested by mapping it XYZ then RGB), reject it and sample again. You can settle for a kludgy-but-guaranteed-to-terminate "bailout" selection (like the naive procedure) if you reject more then some arbitrary threshold number of times.
Testing if the sample lies within RGB needs to be sure to test for the special case of black (some implementations end up being silent on the divide by zero), I believe. If L=0 and either u!=0 or v!=0, then the sample needs to be rejected or else you would end up oversampling the L=0 plane in Luv space.
Does this procedure have an obvious flaw? It seems to work but I did notice that I was rolling black more often than I thought made sense until I thought about what was happening in that case. Can anyone point me to the right bounds on the CIELUV grid to ensure that I am enclosing the RGB solid?
A useful reference for those who don't know it:
https://www.easyrgb.com/en/math.php
The key problem with this is that you need bounds to reject samples that fall outside of RGB. I was able to find it worked out here (nice demo on page, API provides convenient functions):
https://www.hsluv.org/
A few things I noticed with uniform sampling of CIELUV in RGB:
most colors are green and purple (this is true independent of RGB bounds)
you have a hard time sampling what we think of as yellow (very small volume of high lightness, high chroma space)
I implemented various strategies that focus on sampling hues (which is really what we want when we think of "sampling colors") by weighting according to the maximum chromas at that lightness. This makes colors like chromatic light yellows easier to catch and avoids oversampling greens and purples. You can see these methods in actions here (select "randomize colors"):
https://www.mysticsymbolic.art/
Source for color randomizers here:
https://github.com/mittimithai/mystic-symbolic/blob/chromacorners/lib/random-colors.ts
Okay, while you don't show the code you are using to generate the random numbers and then apply them to the CIELUV color space, I'm going to guess that you are creating a random number 0.0-100.0 from a random number generator, and then just assigning it to L*.
That will most likely give you a lot of black or very dark results.
Let Me Explain
L* of L * u * v* is not linear as to light. Y of CIEXYZ is linear as to light. L* is perceptual lightness, so an exponential curve is applied to Y to make it linear to perception but then non-linear as to light.
TRY THIS
To get L* with a random value 0—100:
Generate a random number between 0.0 and 1.0
Then apply an exponent of 0.42
Then multiply by 100 to get L*
Lstar = Math.pow(Math.random(), 0.42) * 100;
This takes your random number that represents light, and applies a powercurve that emulates human lightness perception.
UV Color
As for the u and v values, you can probably just leave them as linear random numbers. Constrain u to about -84 and +176, and v to about -132.5 and +107.5
Urnd = (Math.random() - 0.5521) * 240;
Vrnd = (Math.random() - 0.3231) * 260;
Polar Color
It might be interesting converting uv to LChLUV or LshLUV
For hue, it's probably as simple as H = Math.random() * 360
For chroma contrained 0—178: C = Math.random() * 178
The next question is, should you find chroma? Or saturation? CIELUV can provide either Hue or Sat — but for directly generating random colors, it seems that chroma is a bit better.
And of course these simple examples are not preventing over-runs, so they color values to be tested to see if they are legal sRGB or not. There's a few things that can be done to constrain the generated values to legal colors, but the object here was to get you to a better distribution without excess black/dark results.
Please let me know of any questions.

Obtaining orientation with 3-axis accelerometer and gyro

This is my first post on SO. I haven't already developed much code for embedded systems, but I have few problems and need help from more advanced programmers. I use following devices:
- LandTiger board with LPC1768 (Cortex M3) MCU,
- Digilent pmodACL with ADXL345 accelerometer (3 axis),
- Digilent pmodGYRO with L3G4200D gyroscope (3 axis).
I would like to get some information about device orientation, i.e. rotation angles over X, Y and Z axes. I've read that in order to achieve this I need to combine data from both accelerometer and gyroscope using Kallman filter or its simpler form i.e. complementary filter. I would like to know if it's possible to count roll, pitch and yaw from full range (0-360 degrees) using measurment data only from gyroscope and accelerometer (without magnetometer). I've also found some mathematical formulas (http://www.ewerksinc.com/refdocs/Tilt%20Sensing%20with%20LA.pdf and http://www.freescale.com/files/sensors/doc/app_note/AN3461.pdf) but they contain root squares in numerators/denominators so the information about proper quadrant of coordinate system is lost.
The question you are asking is a fairly frequent one, and is fairly complex, with many different solutions.
Although the title mentions only an accelerometer, the body of your post mentions a gyroscope, so I will assume you have both. In addition there are many steps to getting low-cost accelerometers and gyros to work, one of those is to do the voltage-to-measurement conversion. I will not cover that part.
First and foremost I will answer your question.
You asked if by 'counting' the gyro measurements you can estimate the attitude (orientation) of the device in Euler Angles.
Short answer: Yes, you could sum the scaled gyro measurements to get a very noisy estimate of the device rotation (actual radians turned, not attitude), however it would fail as soon as you rotate more than one axis. This will not suffice for most applications.
I will provide you with some general advise, specific knowledge and some example code that I have used before.
Firstly, you should not try to solve this problem by writing a program and testing with your IMU. You should start by writing a simulation using validated libraries, then validate your algorithm/program, and only then try to implement it with the IMU.
Secondly, you say you want to "count roll, pitch and yaw from full range (0-360 degrees)".
From this I assume you mean you want to be able to determine the Euler Angles that represent the attitude of the device with respect to an external stationary North-East-Down (NED) frame.
Your statement makes me think you are not familiar with representations of attitude, because as far as I know there are no Euler Angle representations with all 3 angles in the 0-360 range.
The application for which you want to use the attitude of the device will be important. If you are using Euler Angles you will not be able to accurately track the attitude of the device when large (greater than around 50 degrees) rotations are made on the roll or pitch axes, due to what is known as Gimbal Lock.
If you require the tracking of such motions then you will need to use a quaternion or Direction Cosine Matrix (DCM) representation of attitude.
Thirdly, as you have said you can use a Complimentary Filter or Kalman Filter variant (Extended Kalman Filter, Error-State Kalman Filter, Indirect Kalman Filter) to accurately track the attitude of the device by fusing the data from the accelerometer, gyro and a magnetometer. I suggest the Complimentary Filter described by Madgwick which is implemented in C, C# and MATLAB here. A Kalman Filter variant would be necessary if you wanted to track the position of the device, and had an additional sensor such as GPS.
For some example code of mine using accelerometer only to get Euler Angle pitch and roll see my answer to this other question.

Source engine styled rope rendering

I am creating a 3D graphics engine and one of the requirements is ropes that behave like in Valve's source engine.
So in the source engine, a section of rope is a quad that rotates along it's direction axis to face the camera, so if the section of rope is in the +Z direction, it will rotate along the Z axis so it's face is facing the camera's centre position.
At the moment, I have the sections of ropes defined, so I can have a nice curved rope, but now I'm trying to construct the matrix that will rotate it along it's direction vector.
I already have a matrix for rendering billboard sprites based on this billboarding technique:
Constructing a Billboard Matrix
And at the moment I've been trying to retool it so that Right, Up, Forward vector match the rope segment's direction vector.
My rope is made up of multiple sections, each section is a rectangle made up of two triangles, as I said above, I can get the position and sections perfect, it's the rotating to face the camera that's causing me a lot of problems.
This is in OpenGL ES2 and written in C.
I have studied Doom 3's beam rendering code in Model_beam.cpp, the method used there is to calculate the offset based on normals rather than using matrices, so I have created a similar technique in my C code and it sort of works, at least it, works as much as I need it to right now.
So for those who are also trying to figure this one out, use the cross-product of the mid-point of the rope against the camera position, normalise that and then multiply it to how wide you want the rope to be, then when constructing the vertices, offset each vertex in either + or - direction of the resulting vector.
Further help would be great though as this is not perfect!
Thank you
Check out this related stackoverflow post on billboards in OpenGL It cites a lighthouse3d tutorial that is a pretty good read. Here are the salient points of the technique:
void billboardCylindricalBegin(
float camX, float camY, float camZ,
float objPosX, float objPosY, float objPosZ) {
float lookAt[3],objToCamProj[3],upAux[3];
float modelview[16],angleCosine;
glPushMatrix();
// objToCamProj is the vector in world coordinates from the
// local origin to the camera projected in the XZ plane
objToCamProj[0] = camX - objPosX ;
objToCamProj[1] = 0;
objToCamProj[2] = camZ - objPosZ ;
// This is the original lookAt vector for the object
// in world coordinates
lookAt[0] = 0;
lookAt[1] = 0;
lookAt[2] = 1;
// normalize both vectors to get the cosine directly afterwards
mathsNormalize(objToCamProj);
// easy fix to determine wether the angle is negative or positive
// for positive angles upAux will be a vector pointing in the
// positive y direction, otherwise upAux will point downwards
// effectively reversing the rotation.
mathsCrossProduct(upAux,lookAt,objToCamProj);
// compute the angle
angleCosine = mathsInnerProduct(lookAt,objToCamProj);
// perform the rotation. The if statement is used for stability reasons
// if the lookAt and objToCamProj vectors are too close together then
// |angleCosine| could be bigger than 1 due to lack of precision
if ((angleCosine < 0.99990) && (angleCosine > -0.9999))
glRotatef(acos(angleCosine)*180/3.14,upAux[0], upAux[1], upAux[2]);
}

similarity between an image and its rotated version using SIFT

I have implemented SIFT in opencv for comparing images... i have not yet written the program for comparing.Thinking of using FLANN for the same.But,my problem is that,looking into the 128 elements of the descriptor,cannot really understand the similarity of an image and its rotated version.
By reading Lowe's paper,i do understand that the descriptor co-ordinates are all rotated in terms of the keypoint orientation...but,how exactly is the similarity obtained.Can we undertstand the similarity by just viewing the 128 values.
pls,help me...this is for my project presentation.
You can first use Lowe's metric to compute some putative matches between the two images. The metric is that for any given descriptor de in image 1, find the distance to all descriptors de' in image 2. If the ratio of the closest distance to the second closest distance is below a threshold, then accept it.
After this, you can do RANSAC or other form of robust estimation or Hough Transform to check geometric consistency in terms of position, orientation, and scale of the keypoints that you accepted as putative matches.
If I recall correctly, SIFT will give you a set of 128-value descriptors that describe each of the interest points. You also have the location of each point in each of the images, as well as its "direction" (I forget what the "direction" is called in the paper) and scale in each image.
Once you've found two points that have matching descriptors, you can calculate the transformation from the interest point in one image to the same point in the other image by comparing coordinates and directions.
If you have enough matches, you see if all (or a majority of) the interest points have the same transformation. If they do, the images are similar, if they don't, the images are different.
Hope this helps...
What you are looking for is basically ASIFT
You can find the code here and some overview

How to recognizing money bills in Images?

I'm having some images, of euro money bills. The bills are completely within the image
and are mostly flat (e.g. little deformation) and perspective skew is small (e.g. image quite taken from above the bill).
Now I'm no expert in image recognition. I'd like to achieve the following:
Find the boundingbox for the money bill (so I can "cut out" the bill from the noise in the rest of the image
Figure out the orientation.
I think of these two steps as pre-processing, but maybe one can do the following steps without the above two. So with that I want to read:
The bills serial-number.
The bills face value.
I assume this should be quite possible to do with OpenCV. I'm just not sure how to approach it right. Would I pick a FaceDetector like approach or houghs or a contour detector on an edge detector?
I'd be thankful for any further hints for reading material as well.
Hough is great but it can be a little expensive
This may work:
-Use Threshold or Canny to find the edges of the image.
-Then cvFindContours to identify the contours, then try to detect rectangles.
Check the squares.c example in opencv distribution. It basically checks that the polygon approximation of a contour has 4 points and the average angle betweeen those points is close to 90 degrees.
Here is a code snippet from the squares.py example
(is the same but in python :P ).
..some pre-processing
cvThreshold( tgray, gray, (l+1)*255/N, 255, CV_THRESH_BINARY );
# find contours and store them all as a list
count, contours = cvFindContours(gray, storage)
if not contours:
continue
# test each contour
for contour in contours.hrange():
# approximate contour with accuracy proportional
# to the contour perimeter
result = cvApproxPoly( contour, sizeof(CvContour), storage,
CV_POLY_APPROX_DP, cvContourPerimeter(contour)*0.02, 0 );
res_arr = result.asarray(CvPoint)
# square contours should have 4 vertices after approximation
# relatively large area (to filter out noisy contours)
# and be convex.
# Note: absolute value of an area is used because
# area may be positive or negative - in accordance with the
# contour orientation
if( result.total == 4 and
abs(cvContourArea(result)) > 1000 and
cvCheckContourConvexity(result) ):
s = 0;
for i in range(4):
# find minimum angle between joint
# edges (maximum of cosine)
t = abs(angle( res_arr[i], res_arr[i-2], res_arr[i-1]))
if s<t:
s=t
# if cosines of all angles are small
# (all angles are ~90 degree) then write quandrange
# vertices to resultant sequence
if( s < 0.3 ):
for i in range(4):
squares.append( res_arr[i] )
-Using MinAreaRect2 (Finds circumscribed rectangle of minimal area for given 2D point set), get the bounding box of the rectangles. Using the bounding box points you can easily calculate the angle.
you can also find the C version squares.c under samples/c/ in your opencv dir.
There is a good book on openCV
Using a Hough transform to find the rectangular bill shape (and angle) and then find rectangles/circles within it should be quick and easy
For more complex searching, something like a Haar classifier - if you needed to find odd corners of bills in an image?
You can also take a look at the Template Matching methods in OpenCV; another option would be to use SURF features. They let you search for symbols & numbers in size, angle etc. invariantly.

Resources