Assume a workflow for 2D image feature extraction that uses SIFT, SURF, or MSER methods, followed by bag-of-words/bag-of-features encoding; the encoded features are then used to train classifiers.
I was wondering if there is an analogous approach for 3D datasets, for example, a 3D volume of MRI data. When dealing with 2D images, each image represents an entity with features to be detected and indexed. However, with a 3D dataset, is it possible to extract features directly from the three-dimensional entity? Does this have to be done slice-by-slice, by decomposing the 3D image into multiple 2D images (slices)? Or is there a way of reducing the 3D data to 2D while retaining the 3D information?
Any pointers would be greatly appreciated.
You can perform feature extraction by passing your 3D volumes through a pre-trained 3D convolutional neural network. Because pre-trained 3D CNNs are hard to find, you could consider training your own on a similar, but distinct, dataset.
Here is a link to code for a 3D CNN in Lasagne. The authors use 3D CNN versions of VGG and ResNet.
Alternatively, you can perform 2D feature extraction on each slice of the volume and then combine the per-slice features, using PCA to reduce the dimensionality to something reasonable. For this, I recommend using an ImageNet-pretrained ResNet-50 or VGG.
In Keras, these can be found here.
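To make the slice-wise route concrete, here is a minimal sketch, assuming the volume is a NumPy array of shape (depth, height, width) with intensities roughly in the 0-255 range; `volumes` and the PCA target of 128 components are placeholders:

    # Sketch: slice-wise 2D feature extraction with ImageNet-pretrained
    # ResNet-50, then PCA to compress the concatenated per-slice features.
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from sklearn.decomposition import PCA

    # pooling='avg' yields one 2048-d vector per input image.
    model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

    def volume_features(volume):
        """volume: (depth, H, W) grayscale stack -> (depth * 2048,) vector."""
        # ImageNet models expect 3 channels, so replicate the grayscale slice.
        slices = np.repeat(volume[..., np.newaxis], 3, axis=-1).astype('float32')
        feats = model.predict(preprocess_input(slices))  # (depth, 2048)
        return feats.ravel()

    X = np.stack([volume_features(v) for v in volumes])  # volumes: your dataset
    X_reduced = PCA(n_components=128).fit_transform(X)   # 128 is arbitrary

The reduced vectors can then feed any classifier, playing the role of the bag-of-features encoding from the 2D workflow.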
Assume a grey-scale 2D image, which can mathematically be described as a matrix. Generalizing the concept of a matrix leads to the theory of tensors (informally, you can think of a tensor as a multidimensional array). For example, an RGB 2D image is represented as a tensor of size [width, height, 3], and an RGB 3D image as a tensor of size [width, height, depth, 3]. Moreover, as with matrices, you can also perform tensor-tensor multiplications.
For instance, consider the typical neural network with 2D images as input. Such a network essentially performs nothing but matrix-matrix multiplications (apart from the elementwise non-linear operations at the nodes). In the same way, a neural network can operate on tensors by performing tensor-tensor multiplications.
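In NumPy terms, just to make the shapes and a tensor contraction concrete (the sizes are arbitrary):

    import numpy as np

    gray_2d = np.zeros((256, 256))          # matrix: [width, height]
    rgb_2d  = np.zeros((256, 256, 3))       # tensor: [width, height, 3]
    rgb_3d  = np.zeros((256, 256, 128, 3))  # tensor: [width, height, depth, 3]

    # A tensor-tensor contraction generalizes the matrix product, e.g.
    # contracting the depth axis of the volume against a weight matrix:
    W = np.zeros((128, 10))
    out = np.tensordot(rgb_3d, W, axes=([2], [0]))  # shape (256, 256, 3, 10)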
Now back to your question of feature extraction: indeed, the problem with tensors is their high dimensionality. Hence, a current research topic is the efficient decomposition of tensors while retaining the most meaningful information. To extract features from tensors, a tensor decomposition approach might be a good start, since it reduces the rank of the tensor. A few papers on tensors in machine learning are:
Tensor Decompositions for Learning Latent Variable Models
Supervised Learning With Quantum-Inspired Tensor Networks
Optimal Feature Extraction and Classification of Tensors via Matrix Product State Decomposition
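To make the idea concrete, here is a rough truncated-HOSVD sketch in NumPy; it is only one of many decomposition schemes, and the target ranks (8, 8, 4) are arbitrary:

    import numpy as np

    def unfold(T, mode):
        """Mode-n unfolding: move axis `mode` to the front, flatten the rest."""
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def hosvd_truncate(T, ranks):
        """Truncated higher-order SVD: project each mode onto its leading
        left singular vectors, returning the small core and the factors."""
        factors = []
        for mode, r in enumerate(ranks):
            U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
            factors.append(U[:, :r])
        core = T
        for mode, U in enumerate(factors):
            # Contract mode `mode` of the core with U^T; moveaxis restores
            # the axis order after tensordot appends the new axis at the end.
            core = np.moveaxis(np.tensordot(core, U.T, axes=([mode], [1])),
                               -1, mode)
        return core, factors

    T = np.random.rand(64, 64, 32)              # e.g. a small volume
    core, factors = hosvd_truncate(T, (8, 8, 4))
    features = core.ravel()                     # 8*8*4 = 256-dim feature vector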
Hope this helps, even though the math behind it is not easy.
I'm posting here because I'm at a bit of a loss.
I'm trying to implement a solution to Maxwell's equations (p. 47, eq. 2-2), which is given in spherical coordinates, in C++ so it may be used in a larger modelling project. I'm using Eigen3 as a base for linear algebra, which, as far as I can find, doesn't explicitly support spherical coordinates (I'm open to alternatives).
To implement the solution I need (or at least I think I need) to define the spherical unit vectors in terms of the spherical coordinates; however, since they're not constants like the Cartesian unit vectors, I don't understand how to do this.
I'm hesitant to convert the solution to Cartesian coordinates, as I don't think I understand the implications of doing so (is it even valid?). Any and all input and advice is appreciated.
The solution, which seems obvious now that I have found it, is to implement the spherical unit vector identities as 3 functions (one for each unit vector) that take r, Theta, and Phi as arguments and return a vector.
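To illustrate (in Python/NumPy for brevity; each function ports directly to a C++ function returning an Eigen::Vector3d), using the physics convention where theta is the polar angle and phi the azimuth:

    import numpy as np

    # Spherical unit vectors expressed in the Cartesian basis. Note they
    # depend only on theta and phi; r is accepted only to keep the
    # (r, theta, phi) signature described above.
    def e_r(r, theta, phi):
        return np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

    def e_theta(r, theta, phi):
        return np.array([np.cos(theta) * np.cos(phi),
                         np.cos(theta) * np.sin(phi),
                         -np.sin(theta)])

    def e_phi(r, theta, phi):
        return np.array([-np.sin(phi), np.cos(phi), 0.0])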
I found that a new feature of Jaguar Database 3.0 is its geometric functions. I am wondering how they work. What can I do with these objects?
It is possible to store geometric objects (2D and 3D) and use some basic utility functions on them afterwards.
You can calculate Area, Angle, Dimension etc.
You can do Union, Intersection, etc.
You can do Rotation or Scaling.
More details are available in the UserManual.
I'm working on a recognition problem (faces) and trying to reduce the problem size. I originally began with training data in a feature-wise coordinate system of 120 dimensions, but through PCA I found a better PC-wise coordinate system needing only 20 dimensions while still capturing 95% of the variance.
I began thinking that recognition is, by definition, a classification problem. Points in n-space belonging to the same object/face/whatever should cluster. For example, if 5 instances of the same individual are in the training data, they would cluster, and the centroid of that cluster could be found numerically using k-means.
I have 100,000 observations, and each person is represented by 5-10 headshots. This means that instead of comparing a novel input to 100,000 points in my 20-space, I could compare it to 10,000-20,000 centroids. Can k-means be used like this, or have I misinterpreted it? k is obviously not known in advance, but I've been reading up on ways to find an optimal k.
My specific recognition problem doesn't use neural nets, but rather simple Euclidean distances between points.
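To make the pipeline I have in mind concrete, here is a rough sketch (scikit-learn for the PCA; `X`, `y`, and the sizes are placeholders):

    import numpy as np
    from sklearn.decomposition import PCA

    # X: (100000, 120) feature vectors; y: person ID for each row.
    pca = PCA(n_components=20)
    Z = pca.fit_transform(X)                 # 120-d -> 20-d

    # One centroid per person: when each person's headshots form their own
    # cluster, the k-means solution reduces to the per-person mean.
    centroids = {pid: Z[y == pid].mean(axis=0) for pid in np.unique(y)}

    def identify(x_new):
        """Nearest-centroid match by Euclidean distance in PC space."""
        z = pca.transform(x_new.reshape(1, -1))[0]
        return min(centroids,
                   key=lambda pid: np.linalg.norm(z - centroids[pid]))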
Given two recorded voices in digital format, is there an algorithm to compare the two and return a coefficient of similarity?
I recommend taking a look at the HTK toolkit for speech recognition (http://htk.eng.cam.ac.uk/), especially the part on feature extraction.
Features that I would assume to be good indicators:
Mel-Cepstrum coefficients (general timbre)
LPC (for the harmonics)
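For a quick experiment, both features can be pulled in Python with librosa as a lightweight stand-in for HTK's feature-extraction stage (the file name and parameter values below are placeholders):

    import librosa

    # Load the recording and extract Mel-cepstrum coefficients (MFCCs).
    y, sr = librosa.load('voice1.wav', sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

    # LPC coefficients, which capture the harmonic/formant structure.
    lpc = librosa.lpc(y, order=16)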
Given your clarification I think what you are looking for falls under speech recognition algorithms.
Even though you are only looking for a measure of similarity and not trying to turn speech into text, the concepts are the same, and I would not be surprised if a large part of those algorithms turned out to be quite useful.
However, you will have to define this coefficient of similarity more formally and precisely to get anywhere.
EDIT:
I believe speech recognition algorithms would be useful because they abstract the sound and compare it to some known forms. Conceptually, this might not be that different from taking two recordings, abstracting them, and comparing them.
From the Wikipedia article on HMMs:
"In speech recognition, the hidden
Markov model would output a sequence
of n-dimensional real-valued vectors
(with n being a small integer, such as
10), outputting one of these every 10
milliseconds. The vectors would
consist of cepstral coefficients,
which are obtained by taking a Fourier
transform of a short time window of
speech and decorrelating the spectrum
using a cosine transform, then taking
the first (most significant)
coefficients."
So if you run such an algorithm on both recordings, you end up with coefficient sequences that represent the recordings, and it might be far easier to measure and establish similarities between those.
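For instance, one crude similarity measure (my own sketch, not a standard coefficient) is to align the two coefficient sequences with dynamic time warping and use the normalized alignment cost:

    import numpy as np

    def dtw_cost(A, B):
        """Dynamic-time-warping alignment cost between two coefficient
        sequences A (n, d) and B (m, d); lower means more similar."""
        n, m = len(A), len(B)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(A[i - 1] - B[j - 1])
                D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m] / (n + m)   # length-normalized

    # A similarity coefficient in (0, 1] could then be 1 / (1 + dtw_cost(A, B)).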
But again, now you come to the question of defining the 'similarity coefficient', and introducing dogs and horses did not really help.
(Well, it does a bit, but in terms of evaluating algorithms and choosing one over another, you will have to do better.)
There are many different algorithms; the general name for this task is speaker identification. Start with this Wikipedia page and work from there: http://en.wikipedia.org/wiki/Speaker_recognition
I'm not sure this will work for sound files, but I hope it gives you an idea of how to proceed. It is a basic way to find a pattern (an image) within another image.
You first have to calculate the FFT of both sound files and then do a correlation. As pseudocode (MATLAB style) it would look like:

    fftSoundFile1 = fft(soundFile1);
    fftConjSoundFile2 = conj(fft(soundFile2));
    result_corr = real(ifft(fftSoundFile1 .* fftConjSoundFile2));
where fft = fast Fourier transform, ifft = inverse FFT, and conj = complex conjugate.
The FFT is performed on the sample values of the sound files.
The peaks in the result_corr vector will then give you the positions of high correlation.
Note that both sound files must in this case be of the same length; otherwise you have to zero-pad the shorter one to the length of the longer.
Regards
Edit: .* means (in MATLAB style) a component-wise multiplication; you must not do a vector multiplication!
Next edit: Note that you have to operate with complex numbers, but there are several Complex classes out there, so I don't think you'll have to worry about this.
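In Python/NumPy the same recipe looks like this; zero-padding to the combined length handles both the size mismatch and the circular wrap-around (`sound_file_1` and `sound_file_2` are placeholder sample arrays):

    import numpy as np

    def fft_xcorr(a, b):
        """Cross-correlation of two 1-D signals via the FFT."""
        n = len(a) + len(b) - 1               # pad to avoid circular wrap-around
        A = np.fft.rfft(a, n)
        B = np.fft.rfft(b, n)
        return np.fft.irfft(A * np.conj(B), n)  # real-valued correlation

    corr = fft_xcorr(sound_file_1, sound_file_2)  # peaks mark high correlation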
I am trying to implement a vision algorithm which includes a prefiltering stage with a 9x9 Laplacian-of-Gaussian filter. Can you point me to a document that briefly explains fast filter implementations? I think I should make use of the FFT for the most efficient filtering.
Are you sure you want to use FFT? That will be a whole-array transform, which will be expensive. If you've already decided on a 9x9 convolution filter, you don't need any FFT.
Generally, the cheapest way to do convolution in C is to set up a loop that moves a pointer over the array, summing the convolved values at each point and writing the data to a new array. This loop can then be parallelised using your favourite method (compiler vectorisation, MPI libraries, OpenMP, etc).
Regarding the boundaries:
If you assume the values to be 0 outside the boundaries, then add a 4-element border of 0s to your 2D array of points. This avoids the need for `if` statements to handle the boundaries, which are expensive.
If your data wraps at the boundaries (i.e. it is periodic), then use a modulo or add a 4-element border that copies the opposite side of the grid (abcdefg -> fgabcdefgab for 2 points). **Note: this is what you are implicitly assuming with any kind of Fourier transform, including the FFT.** If that is not the case, you need to account for it before any FFT is done.
The border is 4 points because the maximum boundary overlap of a 9x9 kernel is 4 points outside the main grid; in general, n points of border are needed for a (2n+1) x (2n+1) kernel.
If you need this convolution to be really fast, and/or your grid is large, consider partitioning it into smaller pieces that can be held in the processor's cache, and thus calculated far more quickly. This also goes for any GPU-offloading you might want to do (they are ideal for this type of floating-point calculation).
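Here is a direct sketch of the zero-border approach described above (in NumPy rather than C, and deliberately unvectorized so the indexing stays explicit; note it applies the kernel unflipped, i.e. correlation, which is identical to convolution for a symmetric LoG kernel):

    import numpy as np

    def convolve_9x9(image, kernel):
        """Direct 2-D filtering with a (2n+1)x(2n+1) kernel and zero border."""
        n = kernel.shape[0] // 2              # n = 4 for a 9x9 kernel
        padded = np.pad(image, n)             # 4-element border of zeros
        out = np.empty_like(image, dtype=float)
        h, w = image.shape
        for i in range(h):
            for j in range(w):
                # The border removes the need for boundary `if` statements.
                out[i, j] = np.sum(padded[i:i + 2*n + 1,
                                          j:j + 2*n + 1] * kernel)
        return out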
Here is a theory link: http://hebb.mit.edu/courses/9.29/2002/readings/c13-1.pdf
And here is a link to FFTW, a pretty good FFT library that I've used in the past (check the license to make sure it is suitable): http://www.fftw.org/
All you do is FFT your image and kernel (the 9x9 matrix), multiply them together, then transform back.
However, with a 9x9 kernel you may still be better off doing it in real space (just a double loop over the image pixels and the kernel entries). Try both ways!
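A sketch of that recipe in NumPy; padding both arrays to the full output size first makes the pointwise product a linear rather than circular convolution:

    import numpy as np

    def fft_convolve2d(image, kernel):
        """2-D linear convolution via FFT: transform, multiply, back-transform."""
        h, w = image.shape
        kh, kw = kernel.shape
        shape = (h + kh - 1, w + kw - 1)          # full linear-convolution size
        F = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape)
        full = np.fft.irfft2(F, shape)
        top, left = (kh - 1) // 2, (kw - 1) // 2  # crop back to 'same' size
        return full[top:top + h, left:left + w]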
Actually, you don't need to use an FFT size large enough to hold the entire image. You can do a lot of smaller overlapping 2D FFTs; search for "fast convolution", "overlap-save", and "overlap-add".
However, for a 9x9 kernel you may not see much of a speed advantage.