effect of normalizing on mel-spectrogram - speech

what is the effect of normalizing (using mean and std of each channel) on the mel-spectrogram?
this is a image of mel-spectrogram with 20 mel filters and 206 frames and this is a normalized mel-spectrogram with same shape. we see that some properties appeared after normalizing the mel-spectrogram.
does it remove some properties from the original mel-spectrogram?

Related

How to train a custom Object detector from scratch in tensorflow.js?

I followed multiple example, to train a custom object detector in TensorflowJS . The main problem I am facing every where it is using pretrained model.
Pretrained models are fine for general use cases, but custom scenario it fails. For example, take this this is example form official Tensorflowjs examples, here it is using mobilenet, and mobilenet and mobilenet has image size restriction 224x224 which defeats all the purpose, because my images are big and also not of same ratio so resizing is not an option.
I have tried multiple example, all follows same path oneway or another.
What I want ?
Any example by which I can train a custom objector from scratch in Tensorflow.js.
Although the answer sounds simple but trust me I searching for this for multiple days. Any help will be greatly appreciated. Thanks
Currently it is not yet possible to use tensorflow object detection api in nodejs. But the image size should not be a restriction. Instead of resizing, you can crop your image and keep only the part that contain your object to be detected.
One approach will be like partition the image in 224x224 and run for all partitions but what if the object is between two partitions
The image does not need to be partitioned for it. When labelling the image, you will need to know the x, y coordinates (from the top left) and the w, h of the detected box. You only need to crop a part of the image that will contain the box. Cropping at the coordinates x - (224-w)/2, y- (224-h)/2 can be a good start. There are two issues with these coordinates:
the detected boxes will always be in the center, so the training will be biaised. To prevent it, a randomn factor can be used. x - (224-w)/r , y- (224-h)/r where r can be randomly taken from [1-10] for instance
if the detected boxes are bigger than 224 * 224 maybe you might first choose to resize the video keeping it ratio before cropping. In this case the boxe size (w, h) will need to be readjusted according to the scale used for the resizing

.obj file format - alternates between different data types

I'm writing a method to parse the data in wavefront obj files and I understand the format for the most part, however some things are still a bit confusing to me. For instance, I would have expected most files to list all the vertices first, followed by the texture and normal map coordinates and then the face indices. However, some files that I have opened alternate between these different sections. For instance, one .obj file I have of the Venus de Milo (obtained here: http://graphics.im.ntu.edu.tw/~robin/courses/cg03/model/ ) starts off with the vertices (v), then does normal coordinates (vn), then faces (f), then defines more vertices, normals and faces again. Why is the file broken up into two sections like this? Why not list all the vertices up front? Is this meant to signify that there are multiple segments to the mesh? If so, how do I deal with this?
Because this is how the file format was designed. There is no requirement for a specific ordering of the data inside the OBJ, so each modelling package writes it in its own way. Here is one brief summary of the file format, if you haven't read this one yet.
That said, the OBJ format is quite outdated and doesn't support animation by default. It is useful for exchanging of static meshes between modelling tools but not much else. If you need a more robust and modern file format, I'd suggest taking a look at the Collada format or the FBX.
not an direct answer but it will be unreadable in comment
I do not use this file-format but mesh segmentation is usually done for these reasons:
more easy management of the model for editing
separation of parts of model with different material or texture properties
mainly to speed up the rendering by cut down unnecessary material or texture switching
if the mesh has dynamically moving parts then they must be separated
Most 3D mesh file formats contains also transform matrix for each mesh part and some even an skeleton hierarchy
Now how to handle segmented meshes:
if your engine supports only unsegmented models then merge all parts together
This will loose all the advantages of segmented mesh. Do not forget to apply transform matrices of sub segments before merging
or you can implement mesh segmentation into your model class
By adding model hierarchy , transform matrices , ...
Now how to handle mixed model fileformat:
scan file for all necessary chunks of data
remember if they are present
also store their size,and start address in file
and do not forget that there may be more that one chunk of the same data type
preallocate space for all data you need
load/merge all data you need
load chunks of data to you model classes or merge it to single model
of course check if all data needed id present like number of points match number of normals or texture coords ...

Sorting dicom dataset in Matlab

Hi have about 5000 2d images from multiple CT-scans and need to get them sorted. The dataset is directly imported from a GE workstation.
Right now the images are in bunches of about 10 sorted images at a time in some random order.
How could we get these images sorted? If you would suggest dicominfo please tell us exactly which parameter to go after.
Thank you!
How the DICOM CT images should be sorted is ultimately dependent on the usage context, but as a rule of thumb I would recommend that you first group the images based on (patient), study and series using these tags:
(0010,0020) Patient ID
(0020,000D) Study Instance UID
(0020,000E) Series Instance UID
To sort the images within one series, you could use the Instance Number (0020,0013), although there is no guarantee that this value is set since it is a type 2 attribute.
Another alternative is to use the Image Position (Patient) (0020,0032), which is mandatory in CT images. You would need to check the Image Orientation (Patient) (0020,0037) to decide how to sort on position. Often CT image orientation is (1,0,0), (0,1,0), and then the Z (third) component of the image position can be used as the basis for sorting.
If the series also contains a localizer image, this image would have to be excluded from positional sorting.

Detect signs on roads

I have a video which has got turn left,turn right etc marks on the roads.
I have to detect those signs.I am going ahead with template matching in which I am matching the edge detected outputs,But I am not getting satisfactory results,Is there any other way to detect it? Please help.
If you want a solution that is not too complicated but more robust than template matching, I suggest you'd go for Hough voting on SIFT descriptors. This method is provides some degree of robustness to various problems, including partial occlusion of the sign, illumination variations and deformations of the sign. In particular, the method is completely invariant to rotation and uniform scaling of the template object.
The basic idea of the algorithm is as follows:
a) extract SIFT features from the template and query images.
b) set an arbitrary reference point in the template image and calculate, for each keypoint in the template image, the vector from the keypoint to the reference point.
c) match keypoints from the template image to the query image.
d) cast a vote for each matched keypoint for all object locations in the query image that this keypoint agrees with. You do that using the vectors calculated in step (b) and the location, scale and orientation of the matched keypoints in the query image.
e) If the object is indeed located in the image, the votes map should have a strong local maximum at it's location.
f) Optionally, you can verify the detection by using template matching.
You can read more about that method on Wikipedia here or in the original paper (by D. Lowe) here.
Using SIFT or SURF. You can get the invariable descriptor with training you can determine if the vector that represent the road marks (turn left, right or stop) match with the new in the video.
You might try extracting features and training a classifier (linear discriminant, neural network, naive Bayes, etc.). There are many candidate features you might try, but I'd think that you wouldn't need anything too complicated, even if the edge detection is poor, assuming that isolation of the sign is good. Some features to consider are: horizontal and vertical projections (row and column totals) and simple statistics of edge pixels (mean, standard deviation, skewness, etc. For more feature ideas, see any of these books:
"Shape Classification and Analysis: Theory and Practice", by Costa and Cesar
"Algorithms for Image Processing and Computer Vision", by J. R. Parker
"Digital Image Processing", by Gonzalez and Woods

Information Modeling

The sensor module in my project consists of a rotating camera, that collects noisy information about moving objects in the surrounding environment.
The information consists of distance, angle and relative change of the moving objects..
The limiting view range of the camera makes it essential to rotate the camera periodically to update environment information...
I was looking for algorithms / ways to model these information, in order to be able to guess / predict / learn motion properties of these object..
My current proposed idea is to store last n snapshots of each object in a queue. I take weighted average of positions and velocities of moving object, but I think it is a poor method...
Can you state some titles that suit this case?
Thanks
Kalman {Extended, unscented, ... } filters and particle filters only after reading about Kalman filters.
Kalman filters learn and predict the correct data from noisy data with a Gaussian assumption, so it may be of use to you. If you need non-Gaussian methods, look at the particle filter.

Resources