What is a dataset bounding box?

I am a bit new to imaging and want to understand the following:
1. What is the bounding box of a dataset, and why is it needed? Does it represent a measurement in the real world, or only the computer screen where the data is displayed? How is it related to the image size specified in pixels?
2. What do a WMTS layer's zoom levels and matrix sets mean? I understand that WMTS works by fetching tiles of the dataset. I also see that the GetCapabilities response for a specific WMTS dataset returns matrix sets in the XML, which I don't understand.
3. What do the matrix sets and zoom levels signify, and how can I understand them as a layman?
I have tried googling a bit, but the articles seem to assume technical knowledge that I am still trying to gather.

1. A bounding box is the (imaginary) rectangle that you can draw around a dataset (or feature) so that it touches its maximums and minimums in both the X and Y directions. It is measured in the same units as the geometry. It is related to the image size in pixels through the resolution or scale: bbox.width/image.width, or the inverse, in units of metres/pixel or pixels/metre (or degrees or feet instead of metres).
2. A WMTS layer is a set of pre-rendered tiles that have been produced at a fixed set of scales and over a fixed area. These are described in the matrix sets of a WMTS layer. The zoom level is how far down that set of scales you have traversed, with 0 being the top and an arbitrary number (usually between 15 and 20 for global datasets) being the lowest (or most detailed).
3. See 2. You should not really need to understand them in detail, as your client library will handle all of that for you.
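As a rough illustration of the two relationships above, here is a minimal Python sketch. The function names are made up for this example. The zoom-level helpers assume a pyramid in which every level doubles the number of tiles per axis, which is common for global tile sets but is not required by WMTS; the actual scales of a layer are whatever its matrix sets list in the GetCapabilities response.

```python
def resolution(bbox, image_width, image_height):
    """Return (x, y) ground units per pixel for bbox = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    return (maxx - minx) / image_width, (maxy - miny) / image_height


def tiles_per_axis(zoom):
    """Tiles along one axis at a zoom level, ASSUMING each level doubles the previous one."""
    return 2 ** zoom


def level_resolution(top_resolution, zoom):
    """Units per pixel at a zoom level, ASSUMING the resolution halves at every level."""
    return top_resolution / tiles_per_axis(zoom)


# A 10 km x 5 km dataset (in metres) drawn into a 1000 x 500 pixel image.
print(resolution((0.0, 0.0, 10_000.0, 5_000.0), 1000, 500))  # (10.0, 10.0) metres per pixel
```

Under that doubling assumption, zoom level 3 has 8 tiles per axis, and each pixel covers one eighth of the ground distance it covers at level 0.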


How to measure a horizontal plane surface (visible in camera) using ARKit-SceneKit before placing objects?

I want to measure the horizontal plane surface to find out whether it fits the object that I am going to place. For example, if I am going to place a cot 3D model (with a fixed size) in a room using iOS 11 ARKit:
First, I want to detect whether the room surface is sufficient to place my 3D model by measuring the surface area (width, height, etc.).
Second, if the user tries to place it without sufficient space, I should not allow them to place the cot and should show an error message.
I created a sample POC by following https://developer.apple.com/sample-code/wwdc/2017/PlacingObjects.zip, with which I am able to detect the horizontal plane and place the cot. But the issue is that, whatever the surface may be, the user is able to place the cot, which shouldn't be allowed in practice.
I saw a couple of demos which say that we can measure the size of a room or a horizontal plane (https://www.curbed.com/2017/6/29/15894556/ar-measure-app-augmented-reality-ruler-measuring-tape-ios).
I am using ARKit with SceneKit in order to achieve this, and I am new to AR and SceneKit. I need to know whether this is doable and, if so, how to achieve it.
You could estimate the size of a detected plane by inspecting its dimensions. But you shouldn't.
ARKit has plane estimation, not scene reconstruction. That is, it'll tell you there's a flat surface at (some point) and that said surface probably extends at least (some distance) from that point. It doesn't know exactly how big the surface is (it's even refining its estimate over time), and it doesn't tell you where there are interruptions in that continuous surface, much less the size and shape of such interruptions.
In fact, if you're looking at the floor and moving around, and you see one patch of floor, then another patch of floor on the other side of a solid wall from the first, ARKit will happily recognize that those two patches are coplanar and merge them into the same anchor. At the same time, neither detected patch may cover the entire extent of the floor around it.
If you attempt to restrict where the user can place virtual objects in AR based on plane estimates, you're likely to frustrate them with two kinds of error: you'll have areas where it looks to the user like they can place something but that don't allow it, and you'll have areas that look like they should be off-limits that do allow placing things.
Instead, design your experience to involve the user in deciding where the sensible places for content are. See this demo for example — ARKit detects the level of the floor (not its boundaries), then uses that to show UI indicating the size/shape of objects to be placed. It's up to the user to make sure there's enough room for the couch, etc.
As for the technical how-to on what you probably shouldn't do: The docs for ARPlaneAnchor.extent say that the x and z coordinates of that vector are the width and length of the estimated plane. And all units in ARKit are meters. (Which is width and which is length? It's a matter of perspective. And of the rotation encoded in the anchor's transform.)
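For completeness, the size comparison itself is only a rectangle check. The sketch below is plain Python rather than ARKit code: `extent_x` and `extent_z` are stand-ins for the x and z components of ARPlaneAnchor.extent (in meters), and `footprint_fits` is a hypothetical helper, not part of any Apple API. All the caveats above still apply, since the extent is only an estimate.

```python
def footprint_fits(extent_x, extent_z, object_width, object_depth):
    """True if the object's footprint fits inside the estimated plane rectangle.

    Both orientations are tried, because which axis is "width" depends on the
    rotation in the anchor's transform. All values are in meters.
    """
    fits_as_is = object_width <= extent_x and object_depth <= extent_z
    fits_rotated = object_width <= extent_z and object_depth <= extent_x
    return fits_as_is or fits_rotated


# A 2.0 m x 0.9 m cot on a plane currently estimated at 1.5 m x 2.5 m.
print(footprint_fits(1.5, 2.5, 2.0, 0.9))  # True (fits when rotated by 90 degrees)
```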

OpenGL rendering quality vs. number of vertices

I am coding a modern OpenGL application to visualize 3D atomic models (molecules, periodic systems ...) for chemistry and condensed matter physics.
I started to work on this a few years ago. The first version of my program used old OpenGL, and I am now updating it to modern OpenGL.
I have a question regarding the quality of the rendering in the OpenGL window. In the following examples I draw 3D cylinders and 3D spheres using instanced drawing: to render the bonds I only define one cylinder, then I translate/scale/rotate it appropriately in the vertex shader to render all bonds. The same goes for the sphere used to render the atoms.
As you can see it works just fine, the efficiency of the method is amazing, and I can render models with hundreds of thousands of atoms smoothly.
However, I noticed something weird: the quality of the rendering seems to depend on the number of vertices (objects, atoms and bonds) in the scene. Obviously the number of triangles is the most important parameter, but it is not the only one, since the quality decreases when a lot of vertices are rendered. Please see the attached snapshots:
To render the spheres in the scene I am using 50x50 vertices, and 2x50 for the cylinders (GL_TRIANGLE_STRIP in both cases).
1) In this test model I load 96 atoms and 512 half-bonds, i.e. ~291200 vertices:
2) I zoom in to focus on one selected atom and its surroundings. At this scale the result is impeccable:
3) I reset the view and use the builder in my program to increase the number of boxes (I am simply making replicas in the 3 directions of space). Here I choose to make 20x20x20 replicas; see the result below, where the original box is highlighted. In that scene there are 768000 atoms and 4096000 half-bonds, and thus 291200x20x20x20 = 2329600000 vertices. That is quite a lot, yet it works, but something weird appears ...
4) I zoom in again on the particular area of the model I picked before, and there is a decrease in quality, in particular in the areas where 3D objects (spheres/cylinders) overlap ...
Can somebody explain to me what I see?
Note 1: In the same window I can decrease the number of replicas back to the original box, zoom in again, and see that the result is back to impeccable.
Note 2: The older version of my program still works fine (old OpenGL, using display lists with glutsphere and glutcylinders). I can do the same things; the rendering takes much, much longer, but at the end of the process, when I zoom in on the 20x20x20 box model, the result remains perfect, like for the single-box model. Obviously I use the same graphics card, driver and so on.
Can somebody explain to me what I see?
You're seeing the limited precision of the depth buffer. There are only so many bits you can work with, and in a perspective projection a nonlinear mapping from Z distance to depth buffer value is applied.
The best course of action is to limit the near/far range of the perspective projection matrix to what is actually going to be visible on screen, to make better use of the depth buffer. It's also possible to linearize the depth buffer (but that comes with a performance hit). You could also try to cleanly intersect the geometry where sticks and spheres meet, i.e. constrain the sphere's vertices to the cylinder surface where the sticks attach, and similarly constrain the sticks' end vertices to the sphere where they meet. That way you avoid overlap and hence these artifacts.
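To get a feeling for how much the near plane matters, here is a small Python sketch of the nonlinear mapping. It assumes a standard OpenGL perspective projection, the default [0, 1] depth range and a 24-bit fixed-point depth buffer; the function names and the sample distances are made up for the illustration.

```python
def window_depth(z, near, far):
    """Depth-buffer value in [0, 1] for a point at eye distance z (near <= z <= far)."""
    return far * (z - near) / (z * (far - near))


def depth_step(z, near, far, bits=24):
    """Approximate smallest eye-space distance the buffer can separate at distance z."""
    slope = far * near / ((far - near) * z * z)  # derivative of window_depth w.r.t. z
    return 1.0 / (2 ** bits * slope)


# Same far plane and same viewing distance, three different near planes.
for near in (0.001, 0.1, 1.0):
    print(near, depth_step(100.0, near, 1000.0))
```

With these numbers, moving the near plane from 1.0 to 0.001 makes the resolvable step at distance 100 about a thousand times coarser (roughly 0.6 scene units instead of well under a thousandth of a unit), which is enough for overlapping spheres and cylinders to fight over the same depth values.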

Determine chessboard dimensions in pixels

Similar to calibrating a single camera 2D image with a chessboard, I wish to determine the width/height of the chessboard (or of a single square) in pixels.
I have a camera aimed vertically at the ground, ensured to be perfectly level with the surface below. I am using the camera to determine the translation between consecutive frames (successfully achieved using Fourier phase correlation). At the moment my result returns the translation in pixels. However, I would like to use techniques similar to calibration, where I move the camera over the chessboard, which is flat on the ground, to automatically determine the size of the chessboard in pixels, relative to my image height and width.
Knowing the size of the chessboard in millimetres, I can then convert a pixel unit to a real-world unit in millimetres, i.e. 1 pixel will represent a distance proportional to the height of the camera above the ground. This will allow me to convert a translation in pixels to a translation in millimetres, recalibrating every time I change the height of the camera.
What would be the recommended way of achieving this? Surely it must be simpler than single camera 2D calibration.
OpenCV can give you the position of the chessboard's corners with cv::findChessboardCorners().
I'm not sure if the perspective distortion will affect your calculations, but if the chessboard is perfectly aligned beneath the camera, it should work.
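Once the corners are known, the scale is just the known square size divided by the average corner spacing in pixels. The following Python sketch shows that step only. It assumes the corners have already been flattened into a list of (x, y) pairs ordered row by row (as far as I know, that is the order cv::findChessboardCorners returns them in), and `mm_per_pixel` is a name invented for this example.

```python
import math


def mm_per_pixel(corners, cols, rows, square_mm):
    """Estimate millimetres per pixel from inner chessboard corners.

    corners:   list of (x, y) pixel positions, ordered row by row (assumption)
    cols,rows: number of inner corners per row and per column
    square_mm: side length of one chessboard square in millimetres
    """
    spacings = []
    for r in range(rows):
        for c in range(cols):
            x, y = corners[r * cols + c]
            if c + 1 < cols:  # distance to the neighbour on the right
                nx, ny = corners[r * cols + c + 1]
                spacings.append(math.hypot(nx - x, ny - y))
            if r + 1 < rows:  # distance to the neighbour below
                nx, ny = corners[(r + 1) * cols + c]
                spacings.append(math.hypot(nx - x, ny - y))
    square_px = sum(spacings) / len(spacings)
    return square_mm / square_px


# Synthetic 3 x 2 grid of corners spaced 40 px apart, for 20 mm squares.
grid = [(100 + 40 * c, 80 + 40 * r) for r in range(2) for c in range(3)]
scale = mm_per_pixel(grid, cols=3, rows=2, square_mm=20.0)
print(scale)  # 0.5
```

A translation of 12 pixels would then correspond to 12 * scale = 6 mm at that camera height.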
This is just an idea, so don't hit me, but maybe you could use the natural contrast of the chessboard?
"Along a line across the board, the pixels will at some point switch from bright to dark, and that should happen (can't remember the number of columns on a chessboard) times." should be a doable algorithm.

Blob detection in C (not with OPENCV)

I am trying to do my own blob detection, which will receive a real-time video and try to detect a white sheet of paper.
The detection should work even if something is written on the paper. I need to detect the paper and its corners, because what I really want is to draw an OpenGL polygon over the paper, with each corner of the paper being a corner of the polygon. Then I need the coordinates of the paper to do other things.
So I need to:
- detect a square white blob.
- get the coordinates of the corners.
- draw a polygon over the white sheet.
Any ideas on how I can do that?
Much depends on context. For example, suppose that you:
- know that the paper is always roughly centered (i.e. the image center W/2, H/2 is always inside the blob), and rotated by no more than 45 degrees (30 would be better)
- have a suitable border around the sheet so that the corners never touch the edges of the FOV
- are able (through analysis of local variance or, if you're lucky, a check of background color or luminance) to say whether a point is inside or outside the blob
- have an inside/outside function that never fails (except possibly in the close vicinity of a border)
then you could walk a line from a point on the border (surely outside) to the center (surely inside), even through bisection, and find a point - an areal - on the edge.
Two edge points give a line (two areals give a beam), two lines give an intersection (two beams give a larger areal) - and there's your corner. You should carry along the detection uncertainty (areal radius) in order to validate corners (another, less elegant approach is to roughly calculate where the corner is, and pinpoint it with a spiral search or drunkard's walk).
This algorithm is amenable to parallelization and, as long as the hypotheses hold, should be really fast.
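A minimal sketch of the bisection step, in Python for brevity (the loop translates to C almost line by line). The `inside` predicate is whatever inside/outside test you end up with; here it is a made-up rectangle test standing in for the paper, and `find_edge` is a name invented for this example.

```python
def find_edge(inside, outer, inner, tolerance=0.5):
    """Bisect the segment outer -> inner until the inside/outside change is
    bracketed within `tolerance` pixels.

    outer must be outside the blob and inner inside it.
    Returns (point, radius): the estimated edge point and its uncertainty.
    """
    (ox, oy), (ix, iy) = outer, inner
    while ((ox - ix) ** 2 + (oy - iy) ** 2) ** 0.5 > tolerance:
        mx, my = (ox + ix) / 2.0, (oy + iy) / 2.0
        if inside(mx, my):
            ix, iy = mx, my  # midpoint is inside: move the inner end outwards
        else:
            ox, oy = mx, my  # midpoint is outside: move the outer end inwards
    point = ((ox + ix) / 2.0, (oy + iy) / 2.0)
    radius = ((ox - ix) ** 2 + (oy - iy) ** 2) ** 0.5 / 2.0
    return point, radius


# Stand-in for the sheet: an axis-aligned rectangle in a 400 x 300 image.
def on_paper(x, y):
    return 100 <= x <= 300 and 50 <= y <= 250


edge, radius = find_edge(on_paper, outer=(0.0, 150.0), inner=(200.0, 150.0))
print(edge, radius)
```

Each call costs only a handful of predicate evaluations, and calls from different border points are independent of each other, which is what makes the approach easy to parallelize.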
All that said, it remains a hack -- I agree with unwind, why reinvent the wheel? If you have memory or CPU constraints (embedded systems, etc.), I believe there ought to be OpenCV and e-Vision "lite" ports also for ARM and embedded platforms.
(Sorry for my terminology - I'm monkey-translating from Italian. "Areal" is likely to correspond to your "blob", a beam is the family of lines joining all pairs of points in two different blobs, line intensity being the product of the distance of each point from its areal's center)
I am trying to do my own blob detection, which will receive a real-time video and try to detect a white sheet of paper.
Your first shot could be a simple flood-fill. That is, select a good threshold to binarize the image and apply the algorithm. The threshold can be fixed if you know the paper is always brighter than X and the background is always darker than this. Or this can be an adaptive threshold, for example Otsu's method. OpenCV offers this for free.
If you'd need to speed it up you could use a union-find data structure.
Finally, you'd need to come up with some heuristic for identifying the corners (e.g. the four extreme values in the x/y directions).
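A minimal sketch of that pipeline, written in Python for brevity; it is not tuned for real-time use. It assumes a fixed threshold and a seed pixel known to lie on the paper. For the corners it uses the extremes of x + y and x - y, a variant of the extreme-value idea that behaves well when the sheet is roughly axis-aligned; for a strongly rotated sheet the plain extremes in x and y are the better choice. All names are made up for this example.

```python
from collections import deque


def flood_fill_blob(image, seed, threshold):
    """Collect the 4-connected pixels brighter than `threshold` reachable from `seed`.

    image: list of rows of grey values; seed: (x, y) pixel on the paper.
    """
    height, width = len(image), len(image[0])
    sx, sy = seed
    if image[sy][sx] <= threshold:
        return set()
    blob = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in blob and image[ny][nx] > threshold):
                blob.add((nx, ny))
                queue.append((nx, ny))
    return blob


def corners_by_extremes(blob):
    """Corner heuristic: extremes of x + y and x - y (image y axis points down)."""
    top_left = min(blob, key=lambda p: p[0] + p[1])
    bottom_right = max(blob, key=lambda p: p[0] + p[1])
    top_right = max(blob, key=lambda p: p[0] - p[1])
    bottom_left = min(blob, key=lambda p: p[0] - p[1])
    return top_left, top_right, bottom_right, bottom_left


# Tiny 8 x 6 test image: dark background with a bright 4 x 4 "sheet".
image = [[200 if 2 <= x <= 5 and 1 <= y <= 4 else 0 for x in range(8)]
         for y in range(6)]
blob = flood_fill_blob(image, seed=(3, 2), threshold=128)
print(corners_by_extremes(blob))  # ((2, 1), (5, 1), (5, 4), (2, 4))
```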
Then I need [...] the coordinates of the corners [...]
Then you don't need blob detection, but corner detection or contour detection in the first place. OpenCV has some nice functionality for exactly this.
If you can't use it, I would suggest binarizing the image as above and using a Harris detector to find the corners of the object.
OpenCV's TBB support could also come in quite handy if you use it and have problems meeting your real-time requirements.

WPF & resolution independence

If I put everything in a Viewbox container, will my WPF apps be resolution independent, or do I need to do anything else? Please help with the concept.
Scale elements according to the available screen or medium size
If your goal is to always fill a certain portion of the screen or output device, independently of its metrics, using the Viewbox is a good choice. If you have a big monitor, you will have a big element; if you have a small sheet of paper, you will have a small printout of the same element.
The Stretch property of an Image gives you a similar option, but only for pictures.
Make elements equally sized on every device
WPF is designed to be "resolution independent". The goal of this resolution independence is that if you design an element to be 15 inches, then it will be 15 inches on every output medium, independently of the resolution of your output device. Calculation and specification of dimensions is done in "device independent pixels" (DIP), which you can convert to centimeters or inches without specific knowledge of the output device's resolution.
96 DIP == 1 inch == 2.54 cm;
1 inch == 96 DIP;
1 cm == 37.8 DIP;
If you want to use this resolution independence, you can set fixed values (in DIPs) on your elements. On a large monitor your element may then use only a small part of the screen (e.g. 15 inches), while on a small monitor it fills the whole screen (also 15 inches).
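The conversions above are simple enough to write down as a sketch. This is plain Python for illustration, not WPF code; the function names are made up, and `device_dpi` is whatever DPI the output device reports.

```python
DIPS_PER_INCH = 96.0
CM_PER_INCH = 2.54


def dip_to_inches(dip):
    """Physical size in inches of a length given in device independent pixels."""
    return dip / DIPS_PER_INCH


def dip_to_cm(dip):
    """Physical size in centimeters of a length given in device independent pixels."""
    return dip / DIPS_PER_INCH * CM_PER_INCH


def dip_to_device_pixels(dip, device_dpi):
    """Number of physical pixels that `dip` covers on a device with the given DPI."""
    return dip * device_dpi / DIPS_PER_INCH


# A 15 inch element is 15 * 96 = 1440 DIP on every device ...
print(dip_to_inches(1440))              # 15.0
# ... but it covers a different number of physical pixels per device.
print(dip_to_device_pixels(1440, 96))   # 1440.0
print(dip_to_device_pixels(1440, 192))  # 2880.0
```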
WPF is resolution independent without any extra tricks at all. If you host legacy controls (non-WPF controls), then this may break for them, but WPF itself is resolution independent and vector based.
Viewbox has nothing to do with resolution independence.
Resolution independence means that the controls you specify can be drawn at different resolutions while keeping their scale. So you can use a display that has a 10x higher density of points, and the controls will still look the same to you.
And as was said, WPF itself was designed with this in mind; you don't have to do anything.
