I'm trying to write my own software for security camera motion detection, but in the area of interest outside my house there is a lot of vegetation motion, which will obviously trigger recording if I use one of the simpler algorithms that rely only on the difference between images. Does anyone have any recommendations? I'm struggling to find motion detection information online. I'm guessing that I'll have to employ some edge detection, or maybe a filtering process.
Cheers,
Zan
Without having seen any of your recordings, I would suspect that motion from the vegetation looks quite noisy and random, with only a few local edges. In contrast, I would expect much stronger, connected edges for people moving through the scene. Also, edges from objects moving along the ground will mostly stay oriented in specific directions for a longer period of time.
My first attempt would be
median filter on input image to reduce noise
difference image against the previous (or maybe the second-previous) image
some edge detector
build some edge lists based on the stronger edges
filter out weak/short edges
match edges from objects in the last frame against the newly found ones
apply some tracking of positions and other features
classify object behaviour based on these features, to trigger your recording:
consistent movement in one direction
consistently strong edges on the same object
object size
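A minimal sketch of the first steps in the list above (median filter, difference image, then discarding small changes), in pure Python on frames stored as 2D lists of grayscale values. Note the substitution: instead of edge lists, it filters by the size of connected changed regions, which is a simpler stand-in and not the edge-based method described. The function names, the `threshold` of 25 and the `min_blob` of 12 pixels are all assumptions to tune on real footage.

```python
from statistics import median


def median_filter(img):
    """3x3 median filter on a 2D list of grayscale values (borders clamped)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)
    return out


def changed_mask(prev, curr, threshold=25):
    """True where the two filtered frames differ by more than `threshold`."""
    a, b = median_filter(prev), median_filter(curr)
    return [[abs(p - q) > threshold for p, q in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def blob_sizes(mask):
    """Sizes of the 4-connected regions of True pixels (iterative flood fill)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    sizes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, size = [(y, x)], 0
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    size += 1
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                sizes.append(size)
    return sizes


def motion_detected(prev, curr, threshold=25, min_blob=12):
    """Report motion only if some connected changed region is large enough."""
    return any(s >= min_blob
               for s in blob_sizes(changed_mask(prev, curr, threshold)))
```

Isolated flickering pixels are removed by the median filter, and small scattered regions fall below `min_blob`, so only a large coherent change triggers. Real vegetation will probably need the tracking and direction checks from the later steps as well.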
Alternatively, you can jump on the recent hype around deep neural networks.
Look up online information and tools (and maybe embedded hardware) to train and run a convolutional deep neural network (CNN).
Split your current videos into
videos where you do not want to be warned
videos where you do want to be warned
and let the magic happen.
I am trying to build the trajectory of a moving camera that looks downwards. It works perfectly when the camera is only translating, but fails when the camera rotates. How can I take the camera heading into account?
I am using feature matching, which tells me where a specific patch is in my image and gives its coordinates. I track that patch, and it gives me the trajectory of the camera (as long as the camera is not rotating). But when the camera rotates in place, the patch is identified at the same position, and when the camera starts moving again the rotation is not taken into account.
For example, if my camera is moving forward towards the north, then rotates to face south and starts moving forward again, my algorithm does not recognize this: it builds the trajectory as just a straight line, instead of a right angle.
How can I take the camera rotation into account?
Direct approach (probably not possible)
Something must be responsible for the camera rotating. This something may know how much the camera rotates and may be able to tell you. I guess that in your case this information might not be readily available though.
Feature based image registration
A single feature is not sufficient to detect all affine transformations (translation, rotation, scaling, ...). You would need to take at least two features into account (for translation and rotation), or better three features (for full affine transformations).
In the case of two features with translation and rotation only, the displacement of the midpoint between the two features gives the translation, and the change in orientation of the line connecting the two features gives the rotation.
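A minimal sketch of the two-feature case, assuming the same two features have been matched in both frames as 2D points. `rigid_from_two_matches` and `integrate` are hypothetical names. The frame convention in `integrate` (each step is expressed in the camera's own frame and the heading is accumulated first) is an assumption, and converting image-space motion into camera motion (sign and scale) depends on your setup and is not shown.

```python
import math


def rigid_from_two_matches(p1, p2, q1, q2):
    """Rotation angle (radians) and translation t such that q = R(theta) p + t.

    p1, p2 are two features in the first frame; q1, q2 are the same
    features in the second frame.
    """
    # Rotation: change in orientation of the line connecting the features.
    theta = (math.atan2(q2[1] - q1[1], q2[0] - q1[0])
             - math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
    theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap the angle
    c, s = math.cos(theta), math.sin(theta)
    # Translation: where the rotated midpoint has to be shifted to.
    mp = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
    mq = ((q1[0] + q2[0]) / 2, (q1[1] + q2[1]) / 2)
    t = (mq[0] - (c * mp[0] - s * mp[1]),
         mq[1] - (s * mp[0] + c * mp[1]))
    return theta, t


def integrate(steps, start=(0.0, 0.0), heading=0.0):
    """Accumulate (dtheta, (dx, dy)) steps into a world-frame path.

    Assumption: (dx, dy) is the camera's motion in its own frame, so it is
    rotated by the accumulated heading before being added to the position.
    """
    x, y = start
    path = [(x, y)]
    for dtheta, (dx, dy) in steps:
        heading += dtheta
        c, s = math.cos(heading), math.sin(heading)
        x += c * dx - s * dy
        y += s * dx + c * dy
        path.append((x, y))
    return path
```

With this convention, one step forward followed by a half-turn and another step forward brings the path back to its start rather than extending the straight line, which is the behaviour the question is missing.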
Frequency domain, intensity based image registration
Cross-correlation (via FFT) is fast at detecting translations. However, you can also use this technique to detect rotation and scaling (see "An FFT-based technique for translation, rotation, and scale-invariant image registration" or "Robust image registration using Log-Polar Transform").
Improved accuracy
Instead of comparing only consecutive camera frames with feature-based or intensity-based techniques, compare all possible frame combinations within a certain time window (for example, the time it takes to move half a frame away), then find the single trajectory that best fits the transformations for all combinations. This is computationally more expensive but more accurate.
Some words of caution
If the direct approach fails, you may be fooled by the image structures. In certain cases (uniform images, rotationally symmetric images, ...) it just won't work without an independent confirmation.
I am working on an AR-based solution in which I render some 3D models using SceneKit and ARKit. I have also integrated Core ML to identify objects and render the corresponding 3D objects in the scene.
But right now I just render the model in the center of the screen as soon as I detect the object (only for the list of objects that I have). Is it possible to get the position of the real-world object so that I can show an overlay above it?
That is, if I have scanned a water bottle, I should be able to get the position of the water bottle. It could be anywhere on the water bottle, but it shouldn't fall outside of it. Is this possible using SceneKit?
All parts of what you ask are theoretically possible, but a) for several parts, there’s no integrated API to do things for you, and b) you’re probably signing yourself up for a more difficult problem than you think.
What you presumably have with your Core ML integration is an image classifier, as that’s what most of the easy to find ML models do. Image classification answers one question: “what is this a picture of?”
What you’re looking for involves at least two additional questions:
“Given that this image has been classified as containing (some specific object), where in the 2D image is that object?”
“Given the position of a detected object in the 2D video image, where is it in the 3D space tracked by ARKit?”
Question 1 is pretty reasonable. There are models that do both classification and detection (location/bounds within an image) in the ML community. Probably the best known one is YOLO — here’s a blog post about using it with Core ML.
Question 2 is the “research team and five years” part. You’ll notice in the YOLO papers that it gives you only coarse bounding boxes for detected objects — that is, it’s working in 2D image space, not doing 3D scene reconstruction.
To really know the shape, or even the 3D bounding box of an object means integrating object detection with scene reconstruction. For example, if an object has some height in the 2D image, are you looking at a 3D object that’s tall with a small footprint, or one that’s long and low, receding into the distance? Such integration would require taking apart the inner workings of ARKit, which nobody outside Apple can do, or recreating an ARKit-alike from scratch.
There might be some assumptions you can make to get very rough estimates of 3D shape from a 2D bounding box, though. For example, if you do AR hit tests on the lower corners of a box and find that they’re on a horizontal plane, you can guess that the 2D height of the box is proportional to the 3D height of the object, and that its footprint on the plane is proportional to the box’s width. You’d have to do some research and testing to see if assumptions like that hold up, especially in whatever use cases your app covers.
I want to analyze a traffic scene. My source data is a point cloud like this one (see images at the bottom of that post). I want to be able to detect objects that are on the road (cars, cyclists, etc.). So first of all I need to know where the road surface is, so that I can remove or ignore these points, or simply run detection above the surface level.
What are the ways to detect such a road surface? The easiest scenario is a straight and flat road: I guess I could try to fit a simple plane to the approximate position of the surface (I am fairly sure it begins just in front of the car), and because the road surface is not a perfect plane I would have to allow some tolerance around the plane.
More difficult scenario would be a curvy and wavy (undulated?) road surface that would form some kind of a 3D curve... I will appreciate any inputs.
A relatively simple starting point:
If you can assume that the road surface starts directly in front of the camera, then you can use a region growing algorithm to find a region within which the curvature does not change much (thereby using sharp edges to delineate the region). This would involve calculating the curvature first. This can give a first approximation; there will be issues with occluding objects and other artefacts, I am sure.
http://pointclouds.org/documentation/tutorials/region_growing_segmentation.php#region-growing-segmentation
http://pointclouds.org/documentation/tutorials/normal_estimation.php
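For the simpler flat-road case raised in the question (a plane with some tolerance around it), a pure-Python sketch might look like the following. It uses RANSAC, which is my substitution and not something proposed above. It also does not use the prior that the road starts in front of the car. The function names and the `tolerance`, `iterations` and `seed` parameters are assumptions.

```python
import math
import random


def plane_from_points(a, b, c):
    """Plane through three points as (unit_normal, d) with n.p + d = 0.

    Returns None if the points are (nearly) collinear.
    """
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if norm < 1e-12:
        return None
    n = (n[0] / norm, n[1] / norm, n[2] / norm)
    d = -(n[0] * a[0] + n[1] * a[1] + n[2] * a[2])
    return n, d


def distance(plane, p):
    """Unsigned distance from point p to the plane."""
    n, d = plane
    return abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d)


def ransac_plane(points, tolerance=0.1, iterations=200, seed=0):
    """Find the plane with the most points within `tolerance` of it."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        inliers = [p for p in points if distance(plane, p) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

Points that are not in the returned inlier list are the candidates for cars, cyclists and so on. A curved or undulating road would need the plane fitted piecewise, or the region growing approach above.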
Here are my questions. I heard that OpenGL ignores vertices that are outside the viewing frustum and doesn't consider them in the rendering pipeline. Recently I ran into a post that said you should check this yourself, and that if a point is not inside, it is your duty to find out, not OpenGL's! Now,
Is this true about OpenGL? Does it know when a point is not inside, and skip rendering it?
I am developing a grass scene which has about 4000 grasses drawn on rectangles. I have awful FPS, and the only solution I came up with was to decide which grasses are inside the viewport and then render only those! My question here is: what is the best way for me to find out which rectangles are inside and which are not?
Please note that my question is mainly about rectangles, not points. Also, I need to sort the grasses by their distance, so it would be better if the solution worked on data in client-side memory.
Please let me know if there are any effective and real-time ways to find out if any given mesh is inside or outside the frustum. Thanks.
Even though it is true that OpenGL does not show polygons outside the frustum (like any other 3D engine), it still has to process them to check whether they are inside or not, and that slows down the FPS. Usually some smart optimization algorithm is needed to avoid flooding the scene with invisible objects. Check, for example, BSP trees + PVS or portals as a starting point.
To check whether there is some bottleneck in the application, you can try gDebugger. If nothing is obviously wrong, optimizing so that you draw just the PVS (potentially visible set) is the way to go.
OpenGL won't render pixels ("fragments") outside your screen, so it has to clip somehow...
More precisely:
You submit your geometry
You make a Draw Call (glDrawArrays or glDrawElements)
Each vertex goes through the vertex shader, which computes the final position of the vertex in clip space. If you didn't write a vertex shader (i.e. old fixed-function OpenGL), the driver will create one for you.
The perspective division transforms these coordinates into Normalized Device Coordinates. Roughly, this means that the frustum of your camera is deformed to fit in a [-1,1]x[-1,1]x[-1,1] box
Everything outside this box is clipped (strictly speaking, the clipping is done in clip space just before the division, but the result is the same). This can mean discarding a triangle completely, or subdividing it if it crosses a clipping plane
Each remaining triangle is rasterized into fragments
Each fragment goes through the fragment shader
So basically, OpenGL knows how to clip, but each vertex still has to go through the vertex shader. So submitting your entire world will work, of course, but if you can find a way not to submit everything, your GPU will be happier.
This is a tradeoff, of course. If you spend 10ms checking each and every patch of grass on the CPU so that the GPU has only the minimal amount of data to draw, it's not a good solution either.
If you want to optimize grass, I suggest culling big patches (5m x 5m or so). It's standard AABB-frustum testing.
If you want to optimize a more generic model, you can investigate quadtrees for "flat" models, and octrees and BSP trees for more complex objects.
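A minimal sketch of the AABB-frustum test mentioned above, assuming the six frustum planes are already available as `(nx, ny, nz, d)` tuples whose normals point into the frustum, so a point is inside a plane when `nx*x + ny*y + nz*z + d >= 0`. Extracting those planes from your camera matrices is not shown, and the function names are hypothetical.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the whole box lies on the negative side of the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner that is farthest along the plane normal.
    # If even that corner is behind the plane, the whole box is.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0


def aabb_in_frustum(box_min, box_max, planes):
    """Conservative test: False only if the box is fully outside some plane."""
    return not any(aabb_outside_plane(box_min, box_max, plane)
                   for plane in planes)


# Usage with a stand-in "frustum": the cube [-1, 1]^3 described by six planes.
planes = [(1, 0, 0, 1), (-1, 0, 0, 1),
          (0, 1, 0, 1), (0, -1, 0, 1),
          (0, 0, 1, 1), (0, 0, -1, 1)]
patches = {
    "inside": ((-0.5, -0.5, -0.5), (0.5, 0.5, 0.5)),
    "straddling": ((0.5, 0.5, 0.5), (1.5, 1.5, 1.5)),
    "outside": ((2.0, 0.0, 0.0), (3.0, 1.0, 1.0)),
}
visible = [name for name, (lo, hi) in patches.items()
           if aabb_in_frustum(lo, hi, planes)]
```

The test is conservative: a box near a frustum corner can pass even though it is not actually visible, which only costs a little extra drawing. With one box per large grass patch, this is a few dozen cheap tests per frame instead of thousands.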
Yes, OpenGL does not rasterize triangles outside the viewing frustum. But this doesn't mean that it is optimal for applications: the OpenGL implementation still has to transform the vertex coordinates (using the fixed pipeline or vertex shaders), and only then, with the normalized coordinates, does it finally know whether the triangle lies inside the viewing frustum.
This means that no pixel is rasterized in those cases, but the vertex data is processed all the same; it simply doesn't produce fragments for a non-visible triangle!
The OpenGL extension ARB_occlusion_query may help you, but its discussion section makes this clear:
Do occlusion queries make other visibility algorithms obsolete?
No.
Occlusion queries are helpful, but they are not a cure-all. They
should be only one of many items in your bag of tricks to decide
whether objects are visible or invisible. They are not an excuse
to skip frustum culling, or precomputing visibility using portals
for static environments, or other standard visibility techniques.
For the question about sorting meshes by depth, you should use the depth buffer: essentially, a mesh fragment is actually rendered only if its distance from the viewpoint is less than that of the previous fragment at the same position. This frees you from sorting the meshes yourself. The buffer is essentially free, and it improves performance because it discards the farther fragments.
Yes. Like others have pointed out, OpenGL has to perform a lot of per-vertex operations to determine if it is in the frustum. It must do this for every vertex you send it. In addition to the processing overhead that must take place, keep in mind that there is also additional overhead in the transmission of those vertices from the CPU to the GPU. You want to avoid sending information to the GPU that it isn't going to use. Though the bandwidth between the CPU and GPU is quite good on modern hardware, there's still a limit.
What you want is a scene graph. Scene graphs are frequently implemented with some kind of spatial partitioning scheme, e.g. quadtrees, octrees, BSP trees, etc. Spatial partitioning allows you to intelligently determine which geometries are visible. Instead of doing this on a per-vertex basis (as OpenGL is forced to do), it can eliminate huge spatial subsets of geometry at a time. When rendering a complex scene, the performance savings can be enormous.
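As a rough illustration of spatial partitioning, here is a small point quadtree, assuming each grass rectangle can be represented by one 2D ground position and that the visible area is approximated by an axis-aligned rectangle on the ground (both are simplifying assumptions). The class and its `capacity` and `min_size` parameters are hypothetical, not part of any OpenGL API.

```python
class Quadtree:
    """Point quadtree over the square [x, x + size) x [y, y + size)."""

    def __init__(self, x, y, size, capacity=4, min_size=1e-6):
        self.x, self.y, self.size = x, y, size
        self.capacity, self.min_size = capacity, min_size
        self.items = []       # (px, py, payload) tuples stored at this node
        self.children = None  # four sub-quadrants once the node is split

    def insert(self, px, py, payload):
        """Store a point; returns False if it lies outside this node."""
        if not (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size):
            return False
        if self.children is None:
            if len(self.items) < self.capacity or self.size <= self.min_size:
                self.items.append((px, py, payload))
                return True
            self._split()
        for child in self.children:
            if child.insert(px, py, payload):
                return True
        # Rounding edge case: no child accepted the point, keep it here.
        self.items.append((px, py, payload))
        return True

    def _split(self):
        half = self.size / 2
        self.children = [
            Quadtree(self.x + dx, self.y + dy, half,
                     self.capacity, self.min_size)
            for dx in (0, half) for dy in (0, half)
        ]
        old, self.items = self.items, []
        for item in old:
            self.insert(*item)

    def query(self, x0, y0, x1, y1, out=None):
        """Collect payloads of points inside the rectangle [x0, x1] x [y0, y1]."""
        if out is None:
            out = []
        if (x1 < self.x or x0 >= self.x + self.size
                or y1 < self.y or y0 >= self.y + self.size):
            return out  # node entirely outside: skip its whole subtree
        for px, py, payload in self.items:
            if x0 <= px <= x1 and y0 <= py <= y1:
                out.append(payload)
        if self.children:
            for child in self.children:
                child.query(x0, y0, x1, y1, out)
        return out
```

The early return in `query` is where the savings come from: a quadrant that does not overlap the visible rectangle is rejected with one comparison, however many grass patches it contains.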
In computer vision, what would be a good approach to tracking a human in the same black-and-white scene at different times of the day (i.e. at different levels of illumination)? The scene will never be dark, so I don't need to worry about using infra-red or any kind of heat sensing. I need to identify the people and then also track them, so there are two parts.
Any advice would be great.
Thanks.
SIFT feature matching works well for this purpose. It is implemented in OpenCV.