Non-Redundant Image Extraction From Video - artificial-intelligence

I am collecting data for a project. The data collection is done by recording videos of the subjects and the environment. However, while training the network, I would not want to train it with all the images collected in the video sequence.
The main objective is to avoid training the network on redundant images. A video sequence captured at 30 frames/sec can contain redundant images (images that are very similar) within short intervals; the T-th frame and the (T+1)-th frame can be nearly identical.
Can someone suggest ways to extract only the images that would be useful for training?

Update #2: Further resources,
https://github.com/JohannesBuchner/imagehash
https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/
https://www.pyimagesearch.com/2020/04/20/detect-and-remove-duplicate-images-from-a-dataset-for-deep-learning/
Update #1: You can use this repo to calculate similarity between given images: https://github.com/quickgrid/image-similarity
If frames with certain objects (e.g., vehicle, device) are important, then use pretrained object detectors, if available, to extract the important frames.
Next, use a similarity method to remove similar images in nearby frames: keep removing the nearby frames until the difference from the last kept frame exceeds a chosen threshold (a minimal sketch of this step appears after the links below).
This link should be helpful in finding the right method for your case,
https://datascience.stackexchange.com/questions/48642/how-to-measure-the-similarity-between-two-images
The repository below should help implement the idea in a few lines of code. It uses a CNN to extract features and then calculates their cosine distance, as described there.
https://github.com/ryanfwy/image-similarity
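As a minimal sketch of the frame-deduplication step, the snippet below uses OpenCV together with the imagehash library linked above: it walks through the video, computes a perceptual hash per frame, and keeps a frame only when its hash differs from the last kept frame by more than a chosen Hamming-distance threshold. The file name and the threshold value are placeholders you would tune for your own data.

```python
import cv2
import imagehash
from PIL import Image

def extract_distinct_frames(video_path, threshold=8):
    """Keep a frame only if its perceptual hash differs enough from the last kept frame."""
    cap = cv2.VideoCapture(video_path)
    kept, last_hash = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # imagehash works on PIL images; convert from OpenCV's BGR layout first.
        pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        h = imagehash.phash(pil)
        # Hamming distance between hashes; small distance means near-duplicate.
        if last_hash is None or (h - last_hash) > threshold:
            kept.append(frame)
            last_hash = h
    cap.release()
    return kept

frames = extract_distinct_frames("recording.mp4", threshold=8)
print(f"kept {len(frames)} non-redundant frames")
```

A larger threshold keeps fewer, more dissimilar frames; you can also swap phash for the CNN-feature cosine distance from the repository above if perceptual hashing proves too coarse.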

Related

Facial Detection with LBPH - Extracting Features

I've created the framework of the system, which takes a picture, converts it to an LBPH image, and then gets the histograms from each tile of the grid (8x8). I'm following this paper on it, but am confused about what to do next to identify features after step 4. Do I just compare each square of the grid with a set of known feature squares and find the closest match? This is my first facial detection program, so I'm very new to it.
So basically, image processing works like this. Pixel intensity values are far too variable and uninformative by themselves to be useful for algorithms trying to make sense of an image. Much more useful are the local relationships between pixel intensity values. So image processing for recognition and detection is basically a two-step process.
Feature Extraction - Transform the low-level, high-variance, uninformative features such as pixel intensities into a high-level, lower-variance, more informative feature set (e.g. edges, visual patterns, etc.); this is referred to as feature extraction. Over the years, a number of feature extraction mechanisms have been suggested, such as edge detection with Sobel filters, histograms of oriented gradients (HOG), Haar-like features, the scale-invariant feature transform (SIFT), and LBPH as you are trying to use. (Note that in most modern applications that are not computationally limited, convolutional neural networks (CNNs) are used for the feature extraction step because they empirically work much, much better.)
Use Transformed Features - once more useful information (a more informative set of features) has been extracted, you need to use these features to perform the reasoning operation you're hoping to accomplish. In this step, you fit a model (function approximator) such that, given your high-level features as input, the model outputs the information you want (in this case, I think, a classification of whether an image contains a face). Thus, you need to select and fit a model that can make use of the high-level features for classification. Some classic approaches to this include decision trees, support vector machines, and neural networks. Essentially, model fitting is a standard machine learning problem and will require a labelled set of training data to "teach" the model what the high-level feature set looks like for an image that contains a face versus one that does not.
It sounds like your code in its current state is missing the second piece. As a good starting place, look into scikit-learn's decision tree package; a minimal sketch of that second step follows.
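Here is a rough, hypothetical sketch of that second step: concatenated per-tile LBPH histograms are fed into scikit-learn's decision tree classifier. The X/y arrays below are random placeholders; in practice they would come from your own labelled face/non-face images.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: each row is the concatenation of the 8x8 grid's LBPH
# histograms (64 tiles x 256 bins = 16384 features); labels are 1 = face, 0 = not face.
rng = np.random.default_rng(0)
X_train = rng.random((100, 64 * 256))
y_train = rng.integers(0, 2, size=100)

clf = DecisionTreeClassifier(max_depth=10, random_state=0)
clf.fit(X_train, y_train)

# At inference time, build the same histogram vector for a new image and predict.
X_query = rng.random((1, 64 * 256))
print("contains a face?", bool(clf.predict(X_query)[0]))
```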

how to find overlapping region between images in opencv?

I am trying to implement alpha blending with two images for image stitching.
(First image, second image, and result image from the original post are not included here.)
As you can see, the result is not right. I think I first have to find the overlapping region between them and then apply alpha blending on the overlapping part.
First of all, have you seen the new "stitching" module introduced in OpenCV 2.3?
It provides a set of building blocks for a stitching pipeline, including the blending and "finding an overlap" (i.e. registration) steps. Here is the documentation: http://docs.opencv.org/modules/stitching/doc/stitching.html and an example of a stitching application: stitching_detailed.cpp
I recommend studying the code of this sample for a better understanding of the details.
Regarding finding the overlap, there are several common approaches in computer vision:
optical flow
template matching
feature matching
For your case I recommend the last one - it works very well on photos. And this approach is already implemented in OpenCV - explore the OpenCV source and see how cv::detail::BestOf2NearestMatcher works.
I think the most common approach is SIFT: find a few matching keypoints in both images, then warp them to get your result. See this
Here are explanations about SIFT and panorama stitching.
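As a rough sketch of the feature-matching route (using ORB instead of SIFT, since ORB ships with every OpenCV build), the snippet below matches keypoints between two overlapping photos, estimates a homography with RANSAC, and projects the first image's corners into the second image to outline the overlapping region. The file names are placeholders.

```python
import cv2
import numpy as np

# Hypothetical file names; replace with your own two overlapping photos.
img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matcher with cross-check for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Homography mapping img1 coordinates into img2; RANSAC rejects bad matches.
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Project img1's corners into img2 to get the overlapping region's outline,
# which is where the alpha blending should be applied.
h, w = img1.shape
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
projected = cv2.perspectiveTransform(corners, H)
print("img1 corners in img2 coordinates:\n", projected.reshape(-1, 2))
```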

How to compare two .mp4 files?

I would like to compare two mp4 files, does somebody have an idea?
Maybe by interposing the video spectrum?
Thanks.
I had an idea for this a while back. I never implemented it, but it went something like this:
Get a good video library to do the heavy lifting for you; I like AForge.NET
Use the library to walk through the video and extract bitmap frames, get a few hundred
Fix the resolution to a single aspect ratio
Reduce the images to something low-res like 16x16 or 64x64, using a nearest-neighbor approach. This will blur the images such that two similar images reduce to the same result
Gather a chunk of these images by relative video timestamp and hash them to further reduce the data
Compare said hashes
Again, I never implemented this, so I don't know if it works, but the thing it has going for it is that video is very complex. While comparing any given frame to another won't work, because of different formats, resolutions, etc., the odds of a series of reduced hashes being the same for two different videos seem very low. Thus, few false positives. It also seems like it could tell you if one span of video was contained in another.
If I get around to making something like this I'll circle back here and post about it.
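The suggestion above is phrased in terms of AForge.NET; here is a rough Python/OpenCV sketch of the same idea, under the assumption that coarsely quantizing the low-res frames before hashing makes the chunk hashes tolerant of small decoding differences. The file names, sampling step, and chunk size are placeholders.

```python
import cv2
import hashlib

def video_signature(path, step=30, size=16, chunk=10):
    """Sample one frame per `step`, reduce it to a quantized low-res thumbnail,
    and hash groups of `chunk` thumbnails into a compact ordered signature."""
    cap = cv2.VideoCapture(path)
    hashes, buffer, index = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Nearest-neighbor downscale blurs away format/resolution differences.
            small = cv2.resize(gray, (size, size), interpolation=cv2.INTER_NEAREST)
            buffer.append((small // 32).tobytes())  # coarse quantization to 8 levels
            if len(buffer) == chunk:
                hashes.append(hashlib.sha1(b"".join(buffer)).hexdigest())
                buffer = []
        index += 1
    cap.release()
    return hashes

# Two files likely contain the same footage if they share many chunk hashes.
sig_a = video_signature("a.mp4")
sig_b = video_signature("b.mp4")
common = set(sig_a) & set(sig_b)
print(f"{len(common)} matching chunks out of {min(len(sig_a), len(sig_b))}")
```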

Duplicate image detection algorithms?

I am thinking about creating a database system for images where they are stored with compact signatures and then matched against a "query image" that could be a resized, cropped, brightened, rotated or a flipped version of the stored one. Note that I am not talking about image similarity algorithms but rather strictly about duplicate detection. This would make things a lot simpler. The system wouldn't care if two images have an elephant on them, it would only be important to detect if the two images are in fact the same image.
Histogram comparisons simply won't work for cropped query images. The only viable way to go I see is shape/edge detection. Images would first be somehow discretized, every pixel being converted to an 8-level grayscale for example. The discretized image will contain vast regions in the same colour which would help indicate shapes. These shapes then could be described with coefficients and their relative position could be remembered. Compact signatures would be produced out of that. This process will be carried out over each image being stored and over each query image when a comparison has to be performed. Does that sound like an efficient and realisable algorithm? To illustrate this idea:
removed dead ImageShack link
I know this is an immature research area, I have read Wikipedia on the subject and I would ask you to propose your ideas about such an algorithm.
SURF should do its job.
http://en.wikipedia.org/wiki/SURF
It is fast and robust, invariant to rotation and scaling, and also somewhat invariant to blur and contrast/lighting changes (but not as strongly).
There is an example of automatic panorama stitching.
Check the article on SIFT first
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
If you want to do a feature detection driven model, you could perhaps take the singular value decomposition of the images (you'd probably have to do an SVD for each color channel) and use the first few columns of the U and V matrices along with the corresponding singular values to judge how similar the images are.
Very similar to the SVD method is one called principal component analysis, which I think will be easier to use for comparing images. The PCA method is pretty close to taking the SVD and getting rid of the singular values by factoring them into the U and V matrices. If you follow the PCA path, you might also want to look into correspondence analysis. By the way, the PCA method was a common approach in the Netflix Prize for extracting features.
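As a very rough sketch of the SVD idea, simplified to compare only the leading singular values rather than the full U and V columns (the image paths below are placeholders):

```python
import cv2
import numpy as np

def svd_signature(path, k=10):
    """Top-k singular values of a grayscale image, normalized, as a crude signature."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    img = cv2.resize(img, (128, 128))  # fixed size so signatures are comparable
    s = np.linalg.svd(img, compute_uv=False)[:k]
    return s / np.linalg.norm(s)

a = svd_signature("stored.png")
b = svd_signature("query.png")
print("cosine similarity:", float(np.dot(a, b)))
```

Note that singular values alone discard the spatial structure carried by the U and V columns, so this is only a first-pass filter, not a full implementation of the answer's suggestion.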
How about converting this Python code back to C?
Check out tineye.com. They have a good system that's always improving. I'm sure you can find research papers from them on the subject.
The article you might be referring to is the Wikipedia one on feature detection.
If you are running on an Intel/AMD processor, you could use the Intel Integrated Performance Primitives to get access to a library of image processing functions. Or, beyond that, there is the OpenCV project, another library of image processing functions. The advantage of using a library is that you can try various algorithms, already implemented, to see what works for your situation.

How can I store a video with proper indexing

How can I store a video (either in a database or the file system) so that instead of streaming from the beginning, I can start streaming from any fixed index?
The main aim is this: I have a large video of the roads of New York from one end to the other, and a corresponding map of New York saved on a central server. A user opens the website and selects two points on the map of New York, and the video of the road between those two points starts streaming, not from the beginning but from the first point to the second point given by the user.
So the main requirement is to store a video with indexes such that I can start streaming from any of those indexes.
Edited Part:
Actually, I am planning how to store video of the complete city so I can show it to the user whenever he selects a route on the map.
So now the main question in my mind is: can I merge the videos for all roads into one video, like a set of linked lists (roads)? For example, if there are two turns at a particular point, then instead of storing two videos from that point for the different paths, can I store them in a single video, such that which part plays depends on the start and end points selected by the user and the shortest path between them? In short, can I store the video of all roads as a single video?
How can I do this, and will it depend on the streaming mechanism or on the storage?
Thanks,
GG
I guess this all depends on the capabilities of your playing/streaming mechanism. I would find out about these before determining how to store the file and/or "index" points. Ask some specific questions about your streaming technology, such as:
can you fast forward to a certain point?
can you stop at a certain point?
can you play one stream after another ends?
any other playback capabilities that may help solve this?
If you can trigger the playing of your video to fast-forward to a certain point, you can store the amount of time or the number of frames to fast-forward from the beginning and associate these with your map start point. You would also need to "abort" the stream at a certain point that matches your map end point.
However, if you cannot fast-forward your stream, you may need to break your video file into smaller segments and start at the proper one based on the map point selected. You would then need to play multiple files until you reach the end point.
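As a rough illustration of the second approach (pre-cutting segments), here is a hypothetical Python sketch that uses ffmpeg (assumed to be installed and on the PATH) and a hand-maintained index mapping map points to time offsets in the master video. The point names, offsets, and file names are all placeholders.

```python
import subprocess

# Hypothetical index: map point id -> offset (in seconds) into the master video.
index = {"point_a": 125.0, "point_b": 310.5}

def cut_segment(source, start_point, end_point, out_path):
    """Cut the stretch of road between two map points into its own playable file.
    Using -c copy avoids re-encoding, so the cut is fast."""
    start, end = index[start_point], index[end_point]
    subprocess.run([
        "ffmpeg", "-y",
        "-ss", str(start),          # seek to the first map point
        "-t", str(end - start),     # duration up to the second map point
        "-i", source,
        "-c", "copy",
        out_path,
    ], check=True)

cut_segment("new_york_drive.mp4", "point_a", "point_b", "segment.mp4")
```

The same index could instead be served to the player to drive fast-forward/stop behavior if your streaming technology supports seeking, as described in the first approach above.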

Resources