How to read a matrix from a file in Chapel - sparse-matrix

This time I have a matrix -- in a file -- called "matrix.csv" and I want to read it in. The file can come in two flavors, dense and sparse.
Dense
matrix.csv
3.0, 0.8, 1.1, 0.0, 2.0
0.8, 3.0, 1.3, 1.0, 0.0
1.1, 1.3, 4.0, 0.5, 1.7
0.0, 1.0, 0.5, 3.0, 1.5
2.0, 0.0, 1.7, 1.5, 3.0
Sparse
matrix.csv
1,1,3.0
1,2,0.8
1,3,1.1
// 1,4 is missing
1,5,2.0
...
5,5,3.0
Assume the file is pretty large. In both cases, I want to read these into a Matrix with the appropriate dimensions. In the dense case I probably don't need to provide meta-data. In the second, I was thinking I should provide the "frame" of the matrix, like
matrix.csv
nrows:5
ncols:5
But I don't know the standard patterns.
== UPDATE ==
It's a bit difficult to find, but the mmreadsp function can change your day from "crashing the server" to "done in 11 seconds". Thanks to Brad Cray (not his real name) for pointing it out!
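For reference, a minimal sketch of what that call might look like, assuming a Matrix Market file; the argument list of mmreadsp is a guess here, so confirm it against the module source and tests linked in the answer below:
use MatrixMarket;
// Assumed call: read a sparse matrix stored in Matrix Market format.
// The element-type argument and its position are assumptions -- verify
// against the MatrixMarket module source before relying on this sketch.
var A = mmreadsp(real, "matrix.mtx");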

Preface
Since Chapel matrices are represented as arrays, this question is equivalent to:
"How to read an array from a file in Chapel".
Ideally, a csv module or a specialized IO-formatter (similar to JSON formatter) would handle csv I/O more elegantly, but this answer reflects the array I/O options available as of Chapel 1.16 pre-release.
Dense Array I/O
Dense arrays are the easy case, since DefaultRectangular arrays (the default type of a Chapel array) come with a .readWriteThis(f) method. This method allows one to read and write an array with built-in write() and read() methods, as shown below:
var A: [1..5, 1..5] real;
// Give this array some values
[(i,j) in A.domain] A[i,j] = i + 10*j;
var writer = open('dense.txt', iomode.cw).writer();
writer.write(A);
writer.close();
var B: [1..5, 1..5] real;
var reader = open('dense.txt', iomode.r).reader();
reader.read(B);
reader.close();
assert(A == B);
The dense.txt looks like this:
11.0 21.0 31.0 41.0 51.0
12.0 22.0 32.0 42.0 52.0
13.0 23.0 33.0 43.0 53.0
14.0 24.0 34.0 44.0 54.0
15.0 25.0 35.0 45.0 55.0
However, this assumes you know the array shape in advance. We can remove this constraint by writing the array shape at the top of the file, as shown below:
var A: [1..5, 1..5] real;
[(i,j) in A.domain] A[i,j] = i + 10*j;
var writer = open('dense.txt', iomode.cw).writer();
writer.writeln(A.shape);
writer.write(A);
writer.close();
var reader = open('dense.txt', iomode.r).reader();
var shape: 2*int;
reader.read(shape);
var B: [1..shape[1], 1..shape[2]] real;
reader.read(B);
reader.close();
assert(A == B);
Now, dense.txt looks like this:
(5, 5)
11.0 21.0 31.0 41.0 51.0
12.0 22.0 32.0 42.0 52.0
13.0 23.0 33.0 43.0 53.0
14.0 24.0 34.0 44.0 54.0
15.0 25.0 35.0 45.0 55.0
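Since the question's dense file is comma-separated rather than whitespace-separated, here is a minimal sketch for reading that csv layout directly, assuming the 5 x 5 shape is known in advance; the literal commas and newlines in the format strings are matched by readf:
var M: [1..5, 1..5] real;
var csvReader = open('matrix.csv', iomode.r).reader();
for i in 1..5 {
  for j in 1..5 {
    // match "value," for all but the last column, "value\n" for the last one
    if j < 5 then csvReader.readf('%n,', M[i,j]);
    else csvReader.readf('%n\n', M[i,j]);
  }
}
csvReader.close();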
Sparse Array I/O
Sparse arrays require a little more work, because DefaultSparse arrays (the default type of a sparse Chapel array) only provide a .writeThis(f) method and not a .readThis(f) method as of the Chapel 1.16 pre-release. This means there is built-in support for writing sparse arrays, but not for reading them.
Since you specifically requested csv format, we'll do sparse arrays in csv:
// Create parent domain, sparse subdomain, and sparse array
const D = {1..10, 1..10};
var spD: sparse subdomain(D);
var A: [spD] real;
// Add some non-zeros:
spD += [(1,1), (1,5), (2,7), (5, 4), (6, 6), (9,3), (10,10)];
// Set non-zeros to 1.0 (to make things interesting?)
A = 1.0;
var writer = open('sparse.csv', iomode.cw).writer();
// Write shape
writer.writef('%n,%n\n', A.shape[1], A.shape[2]);
// Iterate over non-zero indices, writing: i,j,value
for (i,j) in spD {
  writer.writef('%n,%n,%n\n', i, j, A[i,j]);
}
writer.close();
var reader = open('sparse.csv', iomode.r).reader();
// Read shape
var shape: 2*int;
reader.readf('%n,%n', shape[1], shape[2]);
// Create parent domain, sparse subdomain, and sparse array
const Bdom = {1..shape[1], 1..shape[2]};
var spBdom: sparse subdomain(Bdom);
var B: [spBdom] real;
// This is an optimization that bulk-adds the indices. We could instead add
// the indices directly to spBdom and the value to B[i,j] each iteration
var indices: [1..0] 2*int,
    values: [1..0] real;
// Variables to be read into
var i, j: int,
    val: real;
while reader.readf('%n,%n,%n', i, j, val) {
  indices.push_back((i,j));
  values.push_back(val);
}
// bulk add the indices to spBdom and add values to B element-wise
spBdom += indices;
for (ij, v) in zip(indices, values) {
  B[ij] = v;
}
reader.close();
// Sparse arrays can't be zippered with anything other than their domains and
// sibling arrays, so we need to do an element-wise assertion:
assert(A.domain == B.domain);
for (i,j) in A.domain {
  assert(A[i,j] == B[i,j]);
}
And sparse.csv looks like this:
10,10
1,1,1
1,5,1
2,7,1
5,4,1
6,6,1
9,3,1
10,10,1
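As noted in the comments in the reader code above, bulk-adding the indices is an optimization. The per-element alternative (simpler, but slower for large files, since every += grows the sparse domain one index at a time) would look roughly like this, reusing the same reader, spBdom and B:
while reader.readf('%n,%n,%n', i, j, val) {
  spBdom += (i,j);
  B[i,j] = val;
}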
MatrixMarket Module
Lastly, I'll mention that there is a MatrixMarket package module that supports dense & sparse array I/O using the Matrix Market format. It is currently not shown in the public documentation, because it is intended to be moved out as a standalone package once the package manager is reliable enough, but you can already use it in your Chapel programs with use MatrixMarket;.
Here is the source code, which includes documentation for the interface as comments.
Here are the tests, if you prefer to learn from example, rather than documentation & source code.
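A minimal sketch of what using the module might look like for the dense case; mmread and mmwrite are the names that appear in that source, but their exact signatures are an assumption here, so confirm them against the source and tests before relying on this:
use MatrixMarket;
// Assumed interface: write a dense array to a Matrix Market file and read
// it back. The argument order and the element-type parameter are
// assumptions -- check the module source linked above.
mmwrite("dense.mtx", A);
var C = mmread(real, "dense.mtx");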

A tribute to prof. Rudolf Zitny & prof. Petr Vopenka
( if one happens to remember the PC Tools utility: the Matrix Tools, pioneered and authored by prof. Zitny, were similarly indispensable for smart abstract representations of large-scale F77 FEM matrices, using COMMON-block and similar tricks for efficient storage of & operations on large, sparse matrices in numerical-processing projects ... )
Observation:
I cannot disagree more with the last remark about needing the "frame" in order to build a sparse matrix.
A matrix is always just an interpretation of some formalism.
While sparse matrices share the same view of a matrix as an interpretation, the implementation of each such module is always strictly based on some concrete representation.
Different kinds of sparsity are handled with different cell-layout strategies ( the trick is to use the minimum-needed [SPACE] for cell elements, while still keeping an acceptable processing [TIME] overhead when performing classical matrix/vector operations on such a matrix, typically without the user knowing or "manually" bothering with the underlying sparse-matrix representation that was used for storing the cell values, or with how it is optimally decoded / translated into a target sparse-matrix representation ).
Put visually, the Matrix Tools would show you each of the representations as compactly as possible in their best-possible memory layouts ( much like PC Tools compressed your hard disk, laying out sector data so that no unnecessary non-contiguous HDD capacity got wasted ), and the ( type-by-type specific ) representation-aware handler then provides any external observer the complete illusion needed for the assumed matrix interpretation ( during the phase of computing ).
So let's first realise that without knowing all the details of the platform-specific rules used for a sparse-matrix representation, both on the source side ( python-?, JSON-meta-payload-?, etc. ) and on the Chapel target side ( the LinearAlgebra module in ver-1.16 being confirmed as not yet public ( W.I.P. ) ), there is not much to start implementing.
The actual materialisation of a ( yet unknown ) sparse-matrix representation ( be it a file://, a DMA access, a CSP channel or any other means of non-in-RAM storage or an in-RAM memory map ) does not change the design of the cross-representation translator a single bit.
As a mathematician, you may enjoy the concept of a representation being driven less by Cantor-set style objects ( running into (almost) infinite, dense enumerations ) and more by Vopenka's Alternative Set Theory ( so lovingly introduced, with in-depth historical and mathematical context, in Vopenka's "Meditations About The Bases of Science" ), which brought much closer views of these very situations with a changing Horizon-of-Definition ( caused not only by the actual sharpness of the observer's view, but by a much broader and more general principle ), leaving pi-class and sigma-class semi-sets ready for the continuous handling of newly emerging details as they come into the recognised part of the view ( once appearing "in front of" the Horizon-of-Definition ) of the observed ( and mathematicised ) phenomenon.
Sparse-matrices ( as a representation ) help us build the interpretation we need, so as to use the so far acquired data-cells in further processing "as a matrix".
This said, the workflow always needs to know a-priori:
a) the constraints and rules used in the sparse-matrix source-system's representation
b) the additional constraints a mediation-channel imposes ( expressivity, format, self-healing/error-prone ) irrespective of it being a file, a CSP-channel or a ZeroMQ / nanomsg smart-socket signalling- / messaging-plane distributed agent infrastructure
c) the constraints and rules imposed in the target-system's representation, setting rules for defining / loading / storing / further handling & computing that a sparse-matrix type of one's choice has to meet / follow in the target computing eco-system
Not knowing a) introduces unnecessarily large overheads when preparing the strategy for a successful and efficient cross-representation pipeline, i.e. for translating the common interpretation from the source-side representation before entering b). Ignoring c) always incurs a penalty: additional overheads in the target eco-system during b)'s mediated reconstruction of the communicated interpretation onto the target representation.

Related

Array assembly and StaticArrays under Julia: Why is my performance so bad?

I need to prepare "flattened" versions of 2D fftfrequencies with the shape Nx^2 * 2. Those are basically constructed like a ravel(meshgrid(fftfreqs1d,fftfreqs1d)) in Matlab or Python.
This appears to be no big deal in Python, but can hang for reasonable array sizes in Julia, especially when I want to build a StaticArray out of the intermediate results. To make it more confusing, @btime pretends that my arrays are created in no time, while they clearly are not.
My question is why this happens and how it is done right.
I am aware that using Julia it might be a waste to keep the full 2D fftfreqs in memory instead of using the 1D versions and a loop, but let us assume for a moment that I need it this way.
Julia
using FFTW, BenchmarkTools, StaticArrays

function my_freqs1(Nnu::Int, T)
    dx = 2.0 / Nnu
    freq1d = fftfreq(Nnu) .* dx
    nu = hcat( vec([ i for i in freq1d, j in freq1d ]),
               vec([ j for i in freq1d, j in freq1d ]) )
    return nu
end;
@btime my_freqs1(100, Float64)
28.528 μs (10 allocations: 312.80 KiB)
Julia, converting to a static array (in the hope for better performance of other code later on)
function my_freqs2(Nnu::Int, T)
    ### the same as above ###
    return SMatrix{Nnu^2,2,T}(nu)
end;
@btime my_freqs2(100, Float64)
94.540 μs (36 allocations: 470.38 KiB)
Python
import numpy as np

def my_fftfreqs(xy):
    freqs = np.fft.fftfreq(np.shape(xy)[0], d=xy[1]-xy[0])
    fx, fy = np.meshgrid(freqs, freqs, indexing="ij")
    freq_list = np.transpose(np.asarray([np.ravel(fx), np.ravel(fy)]))
    return freq_list

%time f=my_fftfreqs(np.linspace(0,1,100));
CPU times: user 1.08 ms, sys: 0 ns, total: 1.08 ms
Wall time: 600 µs
My observation is that while Python's %time reports a much longer time, it actually runs in a very reasonable time, whereas the Julia version has a noticeable delay and the version with the static array hangs for a long time and completely crashes for larger sizes.
Please help me understand how I would do this correctly in Julia, and whether (and why) creating a static array seems to be such a bad idea.
Rather than making an SMatrix{Nnu^2,2}, I think you probably want to make a Vector{SVector{2}}. The former will require recompiling for each new value of Nnu, which is fairly inefficient.
You may also consider:
using FFTW
my_freqs3(ν) = fftfreq(ν)*2/ν |>
    (w -> [repeat(w, inner=length(w)) repeat(w, outer=length(w))])
# or
my_freqs3alt(ν) = ( w = fftfreq(ν)*2/ν ;
    [repeat(w, inner=length(w)) repeat(w, outer=length(w))] )
which is more Julian and "if-I-understand-correctly" is equivalent.
Usually shorter/simpler functions are also more efficient.
Julia features used:
Unicode nu variable.
Piping |> operator.
Definition with no function keyword.
repeat standard library vector filling function.
Matlab-like hcat [v1 v2] notation.
Multi-statement block enclosed in ( ) separated by ;.

Median of multiple arrays in Julia

It's related to this question
I want to know how to calculate the median along a specific dimension of a huge array, for example with size (20, 1920, 1080, 3). I am not sure whether there is any practical purpose, but I just wanted to check how well median works in Julia.
It takes ~0.5 seconds to calculate medians on (3, 1920, 1080, 3) with numpy. It works very fast on a zeros array (less than 2 seconds on (120, 1920, 1080, 3)) and works not so fast but fine on real images (20 seconds on (120, 1920, 1080, 3)).
Python code:
import cv2
import sys
import numpy as np
import time
ZEROES=True
N_IMGS=20
print("n_imgs:", N_IMGS)
print("use dummy data:", ZEROES)
imgs_paths = sys.argv[1:]
imgs_paths.sort()
imgs_paths_sparse = imgs_paths[::30]
imgs_paths = imgs_paths_sparse[:N_IMGS]
if ZEROES:
    imgs_arr = np.zeros((N_IMGS,1080,1920,3), dtype=np.float32)
else:
    imgs = map(cv2.imread, imgs_paths)
    imgs_arr = np.array(list(imgs), dtype=np.float32)
start = time.time()
imgs_median = np.median(imgs_arr, 0)
end = time.time()
print("time:", end - start)
cv2.imwrite('/tmp/median.png', imgs_median)
In Julia I can only calculate the median of (3, 1920, 1080, 3). After that, my earlyoom process kills the julia process because of the huge amount of memory used.
I tried an approach similar to what I tried first for max:
function median1(imgs_arr)
    a = imgs_arr
    b = reshape(cat(a..., dims=1), tuple(length(a), size(a[1])...))
    imgs_max = Statistics.median(b, dims=1)
    return imgs_max
end
Or an even simpler case:
import Statistics
a = zeros(3,1080,1920,3)
@time Statistics.median(a, dims=1)
10.609627 seconds (102.64 M allocations: 2.511 GiB, 3.37% gc time)
...
So, it takes 10 seconds vs 0.5 seconds on numpy.
I have only 4 CPU cores, so it's not simply a matter of parallelization.
Is there a more or less simple way to optimize this somehow?
Or at least a way to take slices and compute them one by one without excessive memory use?
It's hard to know if the fact that the images are loaded separately is a key part of the problem here or not since the setup for the problem in Julia is missing and it's a bit hard for Julia programmers to follow the Python setup or know how much we need to match it. You either need to:
Load or move the image data so that they are, in fact, part of the same array and then take the median of that;
Make a set of spatially unrelated values in different arrays abstractly behave as though they are part of a single array and then take the median of that collection via a method that's generic enough to handle this abstraction.
Fredrik's answer implicitly assumes that you have already loaded the image data so that they're all part of the same contiguous array. If that's the case, however, then you don't even need JuliennedArrays, you can just use the median function from the Statistics stdlib:
julia> a = rand(3, 1080, 1920, 3);
julia> using Statistics
julia> median(a, dims=1)
1×1080×1920×3 Array{Float64,4}:
[:, :, 1, 1] =
0.63432 0.205958 0.216221 0.571541 … 0.238637 0.285947 0.901014
[:, :, 2, 1] =
0.821851 0.486859 0.622313 … 0.917329 0.417657 0.724073
If you can load the data like this, it's the best approach -- this is by far the most efficient representation of a bunch of same-sized images and makes vectorized operations across images easy and efficient. The first dimension is the most efficient one to do operations across, because Julia is column-major, so elements along the first dimension are stored contiguously.
The best way to get the images into contiguous memory is to pre-allocate an uninitialized array of the right type and dimensions and then read the data into the array using some in-place API. For some reason your Julia code appears to have loaded the images as a vector of individual arrays while your Python code seems to have loaded all of the images into a single array?
The approach of reshaping and concatenating is an extreme case of the second approach where you move all of the data all at once before then applying a vectorized median operation. Obviously, that involves moving a lot of data around, which is pretty inefficient.
Due to memory locality, it may be more efficient to copy a single slice of the data into a temporary array and compute the median of that. That can be done pretty easily with an array comprehension:
julia> v_of_a = [rand(1080, 1920, 3) for _ = 1:3]
3-element Array{Array{Float64,3},1}:
[0.7206652600431633 0.7675119703509619 … 0.7117084561740263 0.8736518021960584; 0.8038479801395197 0.3159392943734012 … 0.976319025405266 0.3278606124069767; … ; 0.7424260315304789 0.4748658164109498 … 0.9942311708400311 0.37048961459068086; 0.7832577306186075 0.13184454935145773 … 0.5895094390350453 0.5470111170897787]
[0.26401298651503025 0.9113932653115289 … 0.5828647778524962 0.752444909740893; 0.5673144007678044 0.8154276504227804 … 0.2667436824684424 0.4895443896447764; … ; 0.2641913584303701 0.16639100493266934 … 0.1860616855126005 0.04922131616483538; 0.4968214514330498 0.994935452055218 … 0.28097239922248685 0.4980189891952156]
julia> [median(a[i,j,k] for a in v_of_a) for i=1:1080, j=1:1920, k=1:3]
1080×1920×3 Array{Float64,3}:
[:, :, 1] =
0.446895 0.643648 0.694714 … 0.221553 0.711708 0.225268
0.659251 0.457686 0.672072 0.731218 0.449915 0.129987
0.573196 0.328747 0.668702 0.355231 0.656686 0.303168
0.243656 0.702642 0.45708 0.23415 0.400252 0.482792
Try JuliennedArrays.jl
julia> a = zeros(3,1080,1920,3);
julia> using JuliennedArrays
julia> @time map(median, Slices(a,1));
0.822429 seconds (6.22 M allocations: 711.915 MiB, 20.15% gc time)
As Stefan commented below, the built-in median does the same thing, but much slower:
julia> @time median(a, dims=1);
7.450394 seconds (99.80 M allocations: 2.368 GiB, 4.47% gc time)
at least as of VERSION = v"1.5.0-DEV.876"

Why does importing the numpy zeros function fail for parallelization using numba?

According to the Numba docs, numpy array creation functions zeros and ones should be supported. However, testing this with simple functions leads to a nopython error when I import the zeros function from numpy. However, if I do import numpy as np and use np.zeros, there is no problem. Is there some difference in the functions I'm getting from numpy? I'd prefer only to import the functions I need, rather than the entire numpy library.
This code snippet fails:
from numpy import array
from numpy import zeros
from numpy.random import rand
from numba import njit, prange
# @njit()
@njit(parallel=True)
def prange_test(A):
    s = 0
    z = zeros((3, 3))
    for i in prange(A.shape[0]):
        s += A[i]
    return s
A = rand(10)
test = prange_test(A)
This code snippet works:
from numpy import array
from numpy.random import rand
from numba import njit, prange
import numpy as np
@njit(parallel=True)
def prange_test(A):
    s = 0
    z = np.zeros((3, 3))
    for i in prange(A.shape[0]):
        s += A[i]
    return s
A = rand(10)
test = prange_test(A)
I'm using Numba version 0.35.0, Numpy version 1.13.2
Let's go step by step
a) the @numba.njit( parallel = True ) decorator's parallel option is (cit.) "experimental" in its efforts to auto-detect opportunities in the code to introduce some form of parallelism.
b) the code is almost exactly the code snippet from the numba documentation, using almost exactly the same prange()-constructor code block, but inside an @autojit-decorated example:
from numba import autojit, prange

@autojit
def parallel_sum(A):
    sum = 0.0
    for i in prange(A.shape[0]):
        sum += A[i]
    return sum
c) the error message reports problems inside this auto-detect transformation, related to line 12, which, only weakly referenced, might be s += A[i]. It points at some kind of problem inside the "automated understanding" of the intent expressed in the Intermediate Representation of the code block where the prange-index ought to be used - Var($parfor_index_tuple_var.14) - but some type-related or tuple-decoupling-related problem could not be resolved by the numba.jit-LLVM translator. Yet the traceback also mentions that call_parallel_gufunc had problems detecting the upper bound of the prange-constructor, stop = load_range( stop ), whereas the numba documentation so far mentions that only CPU-directed parallel code is supported ( not any { GPU | guvectorize | et al }-non-CPU-kernel(s) ). Here a better-documented MCVE, together with the matching error traceback, would be appreciated instead of a weakly referring PNG picture.
d) last but not least, the numba documentation mandates that parallel=True be used only (cit.) "in conjunction with nopython=True"
How to proceed?
1) test the above-copied numba-published code as-is, to see whether the newer release of numba still keeps all the promises that were already working in previous releases. I.e. use the @numba.autojit decorator and re-run the exact code copy to { POSACK | NACK } this test.
2) test the code, POSACK-ed from step 1, this time under the @numba.njit( parallel = True, nopython = True ) decorator ( no other change except the decorator ) to { POSACK | NACK } the influence of the decorator policy.
3) test the code, POSACK-ed from step 2, this time with other modifications
Conceptual remarks:
With all due respect to the numba-team, there could hardly be a worse example of parallel and prange() anti-pattern than this one.
Besides the awfully immense overhead costs of the [PAR]-process-section setup, and with absolutely nothing to efficiently compute in parallel ( just notice the actual value dependency graph .. ), the criticism of Amdahl's Law's initial, add-on-overheads-agnostic formulation shows how much one can pay for what is, in principle, just worse-than-original performance. Parallel process scheduling typically has exactly the opposite motivation.
If indeed interested in smarter code execution, use numba.jit with a much better performance/cost ratio:
shave off any residual type-analysis-related parts of the IR code by explicitly announcing the calling-interface signatures
avoid memory allocations inside the performance-tuned code; rather pre-allocate and pass as another parameter
extend the calling interface, so that things well known on the caller side are not deferred to numba's automated code analysis
@numba.jit( 'float64( float64[:], int64, float64[:,:] )', nogil = True, nopython = True )
def prange_test( vectorA,        #
                 vectorAshape0,  # avoids numba-code to speculate on type
                 arrayZ          # avoids "local" new memory allocation
                 ):
    sum = 0
    ...
    return sum
Performance?
import numpy as np
from zmq import Stopwatch; aClk = Stopwatch()

def a_just_vectorised_sum( vectorA ):
    return vectorA.sum()

A = np.random.rand( 1000000 )
aClk.start(); s = a_just_vectorised_sum( A ); aClk.stop()
1145L
1190L
1188L
Benchmark. Always. Always on a real-world-sized dataset. Never rely on schoolbook-sized artifacts; go to real-world scales.
Results show that the 1.000.000-cell vector took about 1,200 [us] ~ 0.0012 [s] to sum(), i.e. less than about 1.2 [ns] per sum()-ed cell. This sets a yardstick to compare any other implementation against.

F# negative indices in array

In my application there is a need to precompute and keep trigonometric function values for some particular angle parameters; the range varies from -90 to 180 degrees.
I can create arrays (one each for sine, cos, etc.) which will store the value for the -90 angle at index 0, and while retrieving I can subtract 90 from the index.
But is there any other way in F# to specify the range of the index, so that I can use [-90 .. 180] and have a more meaningful implementation?
Considering an alternate solution, would using a dictionary be as fast as using simple 2D arrays?
If I understand your problem correctly, you need to retrieve precomputed values by a key/index which is a given angle going from -90 to 180. Something like this?
let value = precomputed.[-90]
You could use a Map for that. F# maps are implemented as immutable AVL trees, an efficient data structure which forms a self-balancing binary tree. This can be very efficient if you have precomputed data and need to look up by key frequently. Its immutability in this case ensures that the static data cannot be modified by mistake and has little impact on performance, as you never need to mutate it once initialized. However, if you need to modify it frequently, I would advise you to use a regular .NET Dictionary, because it is based on a hashtable, which has better performance than AVL trees.
You could turn the list into the map where the key would be the angle and the value would be the precomputed one :
let precomputedValues f =
    [for i in -90..180 ->
        i, f(i)]
    |> Map.ofList
Where f is the function doing the precomputation. So you obtain your precomputed map for every angle, something like this:
let sinValues = precomputedValues (fun e -> sin (float e))
And you can access the precomputed sin value like this:
> sinValues.[-90];;
val it : float = -0.8939966636
A little index arithmetic will be of use:
let inline idx i = (i + 270) % 270
Since it's inline, the overhead is going to be very, very small, and you can just use myArray.[idx -90]. (You might have to adjust the offset and modulo values, but you get the picture.)
The easiest way is to simply make some function which given some i returns a pre-computed value of sin i:
let array_table =
    let a = Array.init 271 (fun i -> sin <| float (i-90))
    fun i -> a.[i+90]
To look up the sine of, say, 42, you simply do array_table 42.
anushri and Tomasz both mention using Maps instead of Arrays, but in my experience, these are not good candidates for storing precomputed values, as they are much slower than Arrays. Let's try:
let map_table =
    let m = Seq.init 271 (fun i -> i-90, sin <| float (i-90)) |> Map.ofSeq
    fun i -> Map.find i m
let no_table =
    fun i -> sin (float i)
// Benchmarking code omitted (100000 lookups of each value in -90..270)
When I run this, array_table is roughly 8 times faster than no_table and 22 times faster than map_table:
> fsharpi --optimize+ memo.fsx
map_table
Real: 00:00:02.922, CPU: 00:00:02.924, GC gen0: 4, gen1: 0
no_table
Real: 00:00:01.091, CPU: 00:00:01.091, GC gen0: 3, gen1: 0
array_table
Real: 00:00:00.130, CPU: 00:00:00.130, GC gen0: 3, gen1: 0

How to train a Support Vector Machine(svm) classifier with openCV with facial features?

I want to use the SVM classifier for facial expression detection. I know OpenCV has an SVM API, but I have no clue what the input to train the classifier should be. I have read many papers; all of them say to train the classifier after facial feature detection.
so far what I did,
Face detection,
Calculation of 16 facial points in every frame (an image of the facial feature detection output was attached to the original question),
A vector which holds the feature points' pixel addresses.
Note: I know how I can train the SVM with only positive and negative images, I saw this code here, but I don't know how to combine the facial feature information with it.
Can anybody please help me to start the classification with SVM?
a. What should be the sample input to train the classifier?
b. How do I train the classifier with these facial feature points?
Regards,
The machine learning algos in OpenCV all come with a similar interface. To train one, you pass an NxM Mat of features (N rows, each feature one row of length M) and an Nx1 Mat with the class labels, like this:
//traindata //trainlabels
f e a t u r e 1
f e a t u r e -1
f e a t u r e 1
f e a t u r e 1
f e a t u r e -1
For the prediction, you fill a Mat with 1 row in the same way, and it will return the predicted label.
So, let's say your 16 facial points are stored in a vector; you would do something like:
Mat trainData; // start empty
Mat labels;
for all facial_point_vecs:
{
    for( size_t i=0; i<16; i++ )
    {
        trainData.push_back(point[i]);
    }
    labels.push_back(label); // 1 or -1
}
// now here comes the magic:
// reshape it, so it has N rows, each being a flat float, x,y,x,y,x,y,x,y... 32 element array
trainData = trainData.reshape(1, 16*2); // numpoints*2 for x,y
// we have to convert to float:
trainData.convertTo(trainData, CV_32F);
SVM svm; // params omitted for simplicity (but that's where the *real* work starts..)
svm.train( trainData, labels );
//later predict:
vector<Point> points;
Mat testData = Mat(points).reshape(1,32); // flattened to 1 row
testData.convertTo(testData, CV_32F);
float p = svm.predict( testData );
Face gesture recognition is a widely researched problem, and the appropriate features you need to use can be found by a very thorough study of the existing literature. Once you have the feature descriptor you believe to be good, you go on to train the SVM with those. Once you have trained the SVM with optimal parameters (found through cross-validation), you start testing the SVM model on unseen data, and you report the accuracy. That, in general, is the pipeline.
Now the part about SVMs:
SVM is a binary classifier - it can differentiate between two classes (though it can be extended to multiple classes as well). OpenCV has an inbuilt module for SVM in the ML library. The SVM class has two functions to begin with: train(..) and predict(..). To train the classifier, you give as input a very large number of sample feature descriptors, along with their class labels (usually -1 and +1). Remember the format OpenCV expects: every training sample has to be a row vector, and each row has one corresponding class label in the labels vector. So if you have a descriptor of length n, and you have m such sample descriptors, your training matrix would be m x n (m rows, each of length n), and the labels vector would be of length m. There is also an SVMParams object that contains properties like the SVM type and values for parameters like C that you'll have to specify.
Once trained, you extract features from an image, convert them into single-row format, and give that to predict(); it'll tell you which class it belongs to (+1 or -1).
There's also a train_auto() that takes similar arguments in a similar format and gives you the optimum values of the SVM parameters.
Also check this detailed SO answer to see an example.
EDIT:
Assuming you have a Feature Descriptor that returns a vector of features, the algorithm would be something like:
Mat trainingMat, labelsMat;
for each image in training database:
    feature = extractFeatures( image[i] );
    Mat feature_row = alignAsRow( feature );
    trainingMat.push_back( feature_row );
    labelsMat.push_back( -1 or 1 ); // depending upon class
mySvmObject.train( trainingMat, labelsMat, Mat(), Mat(), mySvmParams );
I don't presume that extractFeatures() and alignAsRow() are existing functions, you might need to write them yourself.
