How to convert an arbitrary dataset to the siamese network dataset format in caffe?

I have created a dataset of gray-scale images that I want to use with the Siamese network example in Caffe, whose documentation uses the MNIST dataset. I want to replace the MNIST dataset with my own dataset.
As I understand it, my dataset needs to be in the format required by the Siamese network. This can be created using the '' script, which loads the MNIST dataset in the idx3-ubyte format and creates an LMDB database with two images and a matching/non-matching label in each location of the database.
So I figured that to use the '' script, my dataset also needs to be in the idx-ubyte format. I tried to convert my dataset to the idx-ubyte format using 'mnisten'. However, I get the error 'error:total images are less than num_tests'. I guess the script is not identifying my images. The folder structure of the dataset is like this:
- subfolder
- subfolder
-txt file
parent directory name - 'generated dataset'
subfolders - 1, 2, 3 ... (the subfolders are titled 1 - 30, as I want to label the data in each subfolder by the name of the subfolder)
The txt file contains an image title and its class label on each row.
How do I work with my dataset on the Siamese network in Caffe? Is there a direct way to convert my dataset to the LMDB format for the Siamese network, or do I have to use mnisten? If I do, how do I fix my error? Any help will be much appreciated. Thanks.

You don't need to use the exact same format - this is just a tutorial. All you need to do is provide one or multiple data layers, with a total of three top Blobs: data, data_p, and sim. You can do that in any way you'd like, e.g. LMDB (like in the MNIST example), HDF5, or whatever.
General explanation
In the tutorial, they further show an easy way to load the image pairs: you concatenate two images in the channel dimension. For gray-scale, you take two input images, where each has for example the dimension [1, 1, 28, 28] (i.e. 1 image, 1 channel, 28x28 resolution). Then you concatenate them into one image of size [1, 2, 28, 28] and save it, e.g. to an LMDB.
In the network, the first step after loading the data is a "Slice" layer, which takes this image, and slices it (i.e. it splits it up) along that axis, thus creating two Top blobs, data and data_p.
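As a rough illustration of the channel trick described above, the following Python sketch stacks two gray-scale images into one two-channel array and splits it again, which mimics what the Slice layer does inside the network. It only uses NumPy; the function names make_pair and slice_pair are made up for this example, and writing the result to LMDB or HDF5 is left out.

```python
import numpy as np


def make_pair(img_a, img_b):
    """Stack two gray-scale images of shape (H, W) into one (2, H, W) array."""
    if img_a.shape != img_b.shape:
        raise ValueError("both images must have the same height and width")
    return np.stack([img_a, img_b], axis=0)


def slice_pair(pair):
    """Split a (2, H, W) pair into two (1, H, W) images along the channel axis."""
    data, data_p = np.split(pair, 2, axis=0)
    return data, data_p


# Two dummy 28x28 images standing in for real data.
a = np.zeros((28, 28), dtype=np.uint8)
b = np.full((28, 28), 255, dtype=np.uint8)

pair = make_pair(a, b)
data, data_p = slice_pair(pair)
print(pair.shape, data.shape, data_p.shape)
```

Each stored record then needs only one extra value, the matching/non-matching label, next to the two-channel image.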
How to create the data files?
There is no single right way to do that. The code from the tutorial is only for the MNIST set, so unless you have the exact same format, you can't use it without changes. You have a couple of possibilities:
- Convert your images to the MNIST format. Then, the code from the Caffe tutorial works out-of-the-box. It appears that you are trying this - if you need help on that, please be specific: what is "mnisten", include your code, and so on.
- Write your own script to convert the images. This is actually very simple: all you need to do is read the images in your favorite programming language, select the pairs, calculate the labels, and re-save as LMDB. This is definitely the more flexible way.
- Create HDF5 files with multiple Top blobs. This is very simple to do, but will probably be a bit slower than using LMDB.
What you use is up to you - I'd probably go with HDF5, as this is an easy and very flexible way to start.
How to generate the pairs?
Now, this is the difficult question here. The code from the tutorial just selects random pairs, which is not really optimal and will make learning rather slow. You don't just need random pairs, you need meaningful, difficult, but still solvable pairs. How to do that depends entirely on your dataset.
A very sophisticated example is presented in (Radenović, 2016): they use a Siamese network for learning a representation for image retrieval on buildings. They use a Structure-from-Motion (SfM) algorithm to create a 3-D reconstruction of a building, and then sample image pairs from those reconstructions.
How exactly you create the pairs depends on your data - maybe you are fine with random pairs - maybe you need a sophisticated method.
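For the simple end of that spectrum, here is a minimal sketch of random pair sampling that at least keeps matching and non-matching pairs balanced. It is not the tutorial's code: the function name sample_pairs, the alternating scheme and the file names are assumptions made for illustration, and it does no hard-example mining.

```python
import random
from collections import defaultdict


def sample_pairs(samples, num_pairs, seed=0):
    """Draw random pairs, alternating between same-class and different-class.

    samples: list of (item, label) tuples.
    Returns a list of (item_a, item_b, sim) with sim = 1 for a matching pair.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    labels = list(by_label)
    # Only classes with at least two items can produce a matching pair.
    pos_labels = [l for l in labels if len(by_label[l]) >= 2]
    if len(labels) < 2 or not pos_labels:
        raise ValueError("need >= 2 classes and one class with >= 2 items")

    pairs = []
    for i in range(num_pairs):
        if i % 2 == 0:  # matching pair: two different items of one class
            label = rng.choice(pos_labels)
            a, b = rng.sample(by_label[label], 2)
            pairs.append((a, b, 1))
        else:  # non-matching pair: one item from each of two classes
            la, lb = rng.sample(labels, 2)
            pairs.append((rng.choice(by_label[la]), rng.choice(by_label[lb]), 0))
    return pairs


# Hypothetical file names following the asker's layout: folder name = class label.
samples = [(f"{label}/img_{k}.png", label) for label in range(1, 4) for k in range(5)]
for a, b, sim in sample_pairs(samples, 6):
    print(a, b, sim)
```

If random pairs turn out to be too easy, the sampling inside the loop is the place to swap in a dataset-specific strategy.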
F. Radenović, G. Tolias, and O. Chum. "CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples". In: European Conference on Computer Vision (ECCV), 2016. arXiv: 1604.02426.

Generating pairs is the most important step in a Siamese network. However, there is a simple way to do that using caffe.
Load the data as separate lmdbs
Create two LMDBs, data_1 and data_2, using a conversion script such as convert_imageset.cpp. Use the same data for both sets, except that data_2 contains one image fewer than data_1.
This ensures that at each epoch a different pair of images is compared, thus allowing us to cover all nC2 combinations (n^2 actually).
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "/home/subho/SSD2/data_1/"
    batch_size: 8
    backend: LMDB
  }
}
layer {
  name: "data_p"
  type: "Data"
  top: "data_p"
  top: "label_p"
  data_param {
    source: "/home/subho/SSD2/data_2/"
    batch_size: 8
    backend: LMDB
  }
}
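The effect of making data_2 one image shorter can be checked with a small simulation. This sketch assumes that each data layer reads its database in order and wraps around at the end; the function name pairs_seen and the tiny n are only for illustration. Images are represented by their index, and one epoch means one pass over data_1.

```python
from itertools import cycle, islice


def pairs_seen(n, epochs):
    """Pair up two looping streams of length n and n - 1 for several epochs.

    One epoch here means n consecutive pairs (one pass over data_1).
    """
    data_1 = cycle(range(n))
    data_2 = cycle(range(n - 1))
    stream = zip(data_1, data_2)
    return [list(islice(stream, n)) for _ in range(epochs)]


n = 5
epochs = pairs_seen(n, n - 1)
for i, epoch in enumerate(epochs):
    print("epoch", i, epoch)

distinct = {pair for epoch in epochs for pair in epoch}
print(len(distinct), "distinct index pairs out of", n * (n - 1))
```

Under these assumptions the two streams drift apart by one position per epoch, so after n - 1 epochs every combination of an index from data_1 with an index from data_2 has come up once.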
Introduce a Similarity layer in the prototxt
layer {
  name: "sim_check"
  type: "Similarity"  # custom layer, see the files below
  bottom: "label"
  bottom: "label_p"
  top: "sim_check"
  propagate_down: false  # one entry for each bottom blob
  propagate_down: false
}
layer {
  name: "loss"
  type: "ContrastiveLoss"
  contrastive_loss_param {
    margin: 1.0
  }
  bottom: "feat"
  bottom: "feat_p"
  bottom: "sim_check"
  top: "loss"
}
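To make the roles of these two layers concrete, here is a plain Python sketch of what they are assumed to compute. The similarity function is a guess at the custom layer's behaviour (1 when the two labels match, 0 otherwise), and contrastive_loss follows the usual contrastive loss formula with a margin; neither is taken from the Caffe sources, so treat it as an illustration only.

```python
import math


def similarity(labels, labels_p):
    """1 where the two labels match, 0 otherwise (assumed layer behaviour)."""
    return [1 if a == b else 0 for a, b in zip(labels, labels_p)]


def contrastive_loss(feat, feat_p, sim, margin=1.0):
    """Contrastive loss averaged over the batch.

    Matching pairs are penalised by their squared distance, non-matching
    pairs by the squared amount their distance falls short of the margin.
    """
    total = 0.0
    for x, y, s in zip(feat, feat_p, sim):
        dist = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
        if s:
            total += dist ** 2
        else:
            total += max(margin - dist, 0.0) ** 2
    return total / (2 * len(sim))


# Two pairs: the first has equal labels, the second does not.
sim = similarity([3, 7], [3, 2])
loss = contrastive_loss([[0.0, 0.0], [0.0, 0.0]],
                        [[0.6, 0.8], [0.3, 0.4]], sim)
print(sim, loss)
```

The similarity values carry no gradient, which is why propagate_down is switched off for the label inputs.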
Create files for Similarity layer
Download files for similarity layer
Place the similarity_layer.cpp in caffe/src/caffe/layers/ and similarity_layers.hpp in caffe/include/caffe/layers/ and rebuild caffe.
cd build
cmake ..
make -j12
If your network does not converge using the above technique, then you should have a look at the following:
- Selection of image pairs using hard negatives
- Ensuring a balance of positive pairs and negative (dissimilar) pairs


How to create Datasets Like MNIST in Pytorch?

I have looked at the PyTorch source code of the MNIST dataset, but it seems to read a numpy array directly from binaries.
How can I just create train_data and train_labels like it does? I have already prepared images and a txt file with labels.
I have learned how to read an image and its label and how to write __getitem__ and __len__; what really confuses me is how to make train_data and train_labels, which are torch.Tensor. I tried to arrange them into Python lists and convert them to torch.Tensor, but failed:
for index in range(0, len(self.files)):
    fn, label = self.files[index]
    img = self.loader(fn)
    if self.transform is not None:
        img = self.transform(img)
self.train_data = torch.tensor(train_data)
ValueError: only one element tensors can be converted to Python scalars
There are two ways to go. First, the manual one. The torchvision.datasets documentation states the following:
All datasets are subclasses of torch.utils.data.Dataset, i.e., they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader, which can load multiple samples in parallel using torch.multiprocessing workers.
So you can just implement your own class which scans for all the images and labels, keeps a list of their paths (so that you don't have to keep the images in RAM), and has a __getitem__ method which, given index i, reads the i-th file and its label and returns them. This minimal interface is enough to work with the parallel DataLoader.
Secondly, if your data directory can be rearranged into the structure they expect, you can use the pre-built DatasetFolder and ImageFolder loaders. This will save you some coding and automatically provide support for the data augmentation routines from torchvision.transforms.
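A minimal sketch of such a class might look like the following. The class name LabelFileDataset, the "file name, then label" line format, and the loader argument are assumptions for illustration; the fallback to object only exists so the snippet can be imported without PyTorch installed.

```python
import os

try:
    from torch.utils.data import Dataset  # real base class when PyTorch is installed
except ImportError:  # keeps the sketch importable without PyTorch
    Dataset = object


class LabelFileDataset(Dataset):
    """Reads "<file name> <label>" lines and loads images lazily."""

    def __init__(self, root, label_file, loader, transform=None):
        self.loader = loader
        self.transform = transform
        self.files = []
        with open(label_file) as f:
            for line in f:
                if not line.strip():
                    continue
                # Split on the last whitespace so file names may contain spaces.
                name, label = line.rsplit(maxsplit=1)
                self.files.append((os.path.join(root, name), int(label)))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        # Only the requested image is read, so nothing is kept in RAM up front.
        fn, label = self.files[index]
        img = self.loader(fn)
        if self.transform is not None:
            img = self.transform(img)
        return img, label
```

An instance can then be handed to torch.utils.data.DataLoader, with something like PIL's Image.open as the loader and a transform that ends in a tensor conversion. If a single train_data tensor is really wanted, torch.stack on a list of equally sized image tensors is the usual way to build it, rather than torch.tensor.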

Iterating through sequence of Images in tensorflow

I have a database with images numbered from 1 till 7500.
I need to feed these images into my model in tensorflow in the following manner:
grab the first 100 images, that is, from 1 till 100, then grab another 100 images such that the next batch is from 2 till 101. Likewise, the following batch is from 3 till 102, and so on...
The purpose for using the following behavior is that I am using a recurrent neural network where the images to be fed are faces detected from a video. Therefore, I need to feed sequences of images such that these images are directly following one another.
Any help is much appreciated!!
I don't have a perfect solution for your question, but this one might help you.
I'm assuming that you are using TFRecords to build your inputs, because if you feed numpy arrays to the model directly, this problem doesn't arise.
Supposing your image files are in a list like ["image_0", ..., "image_N"], you can build the i-th tf.Example with ["image_i", ..., "image_i+100"] as a feature.
After dequeuing, you get a tensor containing the names of these images. Unstack it, read each image's content from its name with tf.read_file, decode it with tf.image.decode_image, then concatenate the results back into one tensor and send it to your model as input.
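The list of file names per example can be produced with a simple sliding window. The sketch below is plain Python and independent of TensorFlow; it assumes windows of exactly 100 images that advance by one image at a time, and the file name pattern is hypothetical.

```python
def sliding_windows(items, window=100, step=1):
    """Yield consecutive overlapping windows of `window` items."""
    for start in range(0, len(items) - window + 1, step):
        yield items[start:start + window]


# Hypothetical file names for images numbered 1 to 7500.
names = [f"image_{i}.jpg" for i in range(1, 7501)]
windows = sliding_windows(names)
first = next(windows)
second = next(windows)
print(first[0], first[-1])
print(second[0], second[-1])
```

Each yielded list is what would go into one example as the sequence of image names.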

.obj file format - alternates between different data types

I'm writing a method to parse the data in Wavefront OBJ files and I understand the format for the most part, however some things are still a bit confusing to me. For instance, I would have expected most files to list all the vertices first, followed by the texture and normal map coordinates and then the face indices. However, some files that I have opened alternate between these different sections. For instance, one .obj file I have of the Venus de Milo starts off with the vertices (v), then the normal coordinates (vn), then the faces (f), and then defines more vertices, normals and faces again. Why is the file broken up into two sections like this? Why not list all the vertices up front? Is this meant to signify that there are multiple segments to the mesh? If so, how do I deal with this?
Because this is how the file format was designed. There is no requirement for a specific ordering of the data inside the OBJ, so each modelling package writes it in its own way. Here is one brief summary of the file format, if you haven't read this one yet.
That said, the OBJ format is quite outdated and doesn't support animation by default. It is useful for exchanging static meshes between modelling tools, but not much else. If you need a more robust and modern file format, I'd suggest taking a look at the Collada format or FBX.
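Because face indices in an OBJ file refer to the running, file-wide lists of vertices, normals and texture coordinates, a parser does not need to care about the order of the sections. A minimal sketch, assuming only v, vn, vt and f records matter (the function name parse_obj is made up, and groups, materials and other record types are ignored):

```python
def parse_obj(lines):
    """Parse v / vn / vt / f records in whatever order they appear."""
    vertices, normals, texcoords, faces = [], [], [], []

    def resolve(token, count):
        # OBJ indices are 1-based; negative ones count back from the end
        # of the list as it stands when the face is read.
        idx = int(token)
        return idx - 1 if idx > 0 else count + idx

    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        kind, args = parts[0], parts[1:]
        if kind == "v":
            vertices.append(tuple(float(x) for x in args[:3]))
        elif kind == "vn":
            normals.append(tuple(float(x) for x in args[:3]))
        elif kind == "vt":
            texcoords.append(tuple(float(x) for x in args[:2]))
        elif kind == "f":
            face = []
            for corner in args:
                # A corner is "v", "v/vt", "v//vn" or "v/vt/vn".
                fields = (corner.split("/") + ["", ""])[:3]
                v = resolve(fields[0], len(vertices))
                vt = resolve(fields[1], len(texcoords)) if fields[1] else None
                vn = resolve(fields[2], len(normals)) if fields[2] else None
                face.append((v, vt, vn))
            faces.append(face)
    return vertices, normals, texcoords, faces


# Two interleaved sections, as in the file described in the question.
sample = """
v 0 0 0
v 1 0 0
v 0 1 0
vn 0 0 1
f 1//1 2//1 3//1
v 0 0 1
v 1 0 1
v 0 1 1
vn 0 0 -1
f 4//2 5//2 6//2
f -3//-1 -2//-1 -1//-1
""".splitlines()

vertices, normals, texcoords, faces = parse_obj(sample)
print(len(vertices), len(normals), len(faces))
```

The returned indices are zero-based, so they can be used directly to index the Python lists.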
Not a direct answer, but it would be unreadable in a comment.
I do not use this file format, but mesh segmentation is usually done for these reasons:
- easier management of the model for editing
- separation of parts of the model with different material or texture properties
  mainly to speed up rendering by cutting down unnecessary material or texture switching
- if the mesh has dynamically moving parts, then they must be separated
  Most 3D mesh file formats also contain a transform matrix for each mesh part, and some even a skeleton hierarchy
Now how to handle segmented meshes:
- if your engine supports only unsegmented models, then merge all parts together
  This will lose all the advantages of a segmented mesh. Do not forget to apply the transform matrices of sub-segments before merging
- or you can implement mesh segmentation in your model class
  by adding a model hierarchy, transform matrices, ...
Now how to handle a mixed model file format:
- scan the file for all necessary chunks of data
  remember if they are present
  also store their size and start address in the file
  and do not forget that there may be more than one chunk of the same data type
- preallocate space for all the data you need
- load/merge all the data you need
  load chunks of data into your model classes or merge them into a single model
  of course, check that all the data needed is present, e.g. that the number of points matches the number of normals or texture coords ...

Simple Multi-Blob Detection of a Binary Image?

Suppose I have a 2D array of an image that has been thresholded and now holds binary information.
Is there any particular way to process this image so that I get the coordinates of multiple blobs in the image?
I can't use openCV because this process needs to run simultaneously on 10+ simulated robots on a custom simulator in C.
I need the blobs xy coordinates, but first I need to find those multiple blobs first.
Simplest criteria of pixel group size should be enough. But I don't have any clue how to start the coding.
PS: Single blob should be no problem. Problem is multiple blobs.
Could I just get a head start?
Have a look at QuickBlob which is a small, standalone C library that sounds perfectly suited for your needs.
QuickBlob comes with a small command-line tool (csv-blobs) that outputs the position and size of each blob found within the input image:
./csv-blobs white image.png
Here's an example (output image is produced thanks to the tiny Python utility that comes with QuickBlob):
You can go through the binary image labeling the connected parts with an algorithm like the following:
Create a 2D array of ints, labelArray, that will hold the labels of the connected regions and initiate it to all zeros.
Iterate over each binary pixel, p, row by row
A. If p is true and the corresponding value for this position in the labelArray is 0 (unlabeled), assign it to a new label and do a breadth-first search that will add all surrounding binary pixels that are also true to that same label.
The only issue now is if you have multiple blobs that are touching each other. Because you know the size of the blobs, you should be able to figure out how many blobs are in a given connected region. This is the tricky part. You can try doing a k-means clustering at this point. You can also try other methods like using binary dilation.
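The labeling steps above can be sketched as follows. Python is used here for brevity, although the asker works in C; the same loop structure carries over with a fixed-size array as the queue. The function name label_blobs, the 4-connectivity and the min_size filter are choices made for this example, and touching blobs are not separated.

```python
from collections import deque


def label_blobs(image, min_size=1):
    """Label 4-connected regions of truthy pixels in a 2D list.

    Returns (labels, blobs), where blobs holds one dict per region with
    its label, pixel count and centroid (x, y).
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    blobs = []
    next_label = 1

    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or labels[y0][x0]:
                continue
            # New region: flood it with a breadth-first search.
            labels[y0][x0] = next_label
            queue = deque([(x0, y0)])
            size = sum_x = sum_y = 0
            while queue:
                x, y = queue.popleft()
                size += 1
                sum_x += x
                sum_y += y
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and image[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
            if size >= min_size:
                blobs.append({"label": next_label, "size": size,
                              "centroid": (sum_x / size, sum_y / size)})
            next_label += 1
    return labels, blobs


image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
labels, blobs = label_blobs(image)
for blob in blobs:
    print(blob)
```

Raising min_size drops small regions, which matches the "pixel group size" criterion mentioned in the question.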
I know that I am very late to the party, but I am just adding this for the benefit of people who are researching this problem.
Here is a nice description that might fit your needs.

How to represent voxel volume in VTK file format?

I have 3D binary array which represents a volume, where a[x,y,z] = 0 indicates no object and a[x,y,z] = 1 indicates the object region.
I want to save this as a VTK file and view it in ParaView. What is the simplest way to achieve this? Suggestions for other approaches are welcome.
I looked through the VTK file format, but I have not found direct way to achieve what I need, just via other structures.
It seems Paraview accepts raw data
So why not just write out your data in a triple for-loop to raw binary data?
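A minimal sketch of that triple loop in Python, assuming the volume is indexed as volume[x][y][z] and that the file should have x varying fastest, then y, then z (the layout VTK uses for image data). The function name write_raw and the output file name are made up for the example.

```python
import os
import tempfile


def write_raw(volume, path):
    """Write volume[x][y][z] (values 0 or 1) as unsigned 8-bit raw data.

    x varies fastest in the file, then y, then z.
    Returns the (nx, ny, nz) dimensions, which the reader must be told.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    data = bytearray()
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                data.append(1 if volume[x][y][z] else 0)
    with open(path, "wb") as f:
        f.write(data)
    return nx, ny, nz


# Tiny 2x3x4 example volume with a single occupied voxel.
volume = [[[0] * 4 for _ in range(3)] for _ in range(2)]
volume[1][2][3] = 1
path = os.path.join(tempfile.gettempdir(), "volume.raw")
dims = write_raw(volume, path)
print(dims)
```

The raw file carries no header, so the returned dimensions and the unsigned char data type have to be entered by hand when the file is opened.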
How to open raw data file in Paraview (edit):
Example: Fuel from Uni Tuebingen
open .raw file
properties: Data Scalar Type: unsigned char
properties: Data Extent: 1<tab>64<tab>1<tab>64<tab>1<tab>64
properties: Apply
click on Contour (next to the calculator symbol)
properties: Apply
Now you should see something. From here you can play around a bit.
In VTK itself (i.e. calling from C++) I remember there were some nice volume render algorithms available (ray casting, 2D textures, etc) but I could not find them in paraview right now. Edit: But Robert could (see comment).
