I have a database with images numbered from 1 to 7500.
I need to feed these images into my model in TensorFlow in the following manner:
grab the first 100 images, that is, 1 to 100, then grab the next batch so that it runs from 2 to 101, the one after that from 3 to 102, and so on...
The reason for this behavior is that I am using a recurrent neural network where the images to be fed are faces detected from a video. Therefore, I need to feed sequences of images that directly follow one another.
Any help is much appreciated!!
I don't have a perfect solution for your question, but this one might help you.
I'm assuming that you are using TFRecords to build inputs, because if not, feeding numpy arrays to the model doesn't have this problem.
Supposing your image files are in a list like ["image_0", ..., "image_N"], you can build the i-th tf.Example with ["image_i", ..., "image_i+99"] as a feature.
After dequeuing, you get a tensor containing the names of these images. Unstack them, read the image contents from these names with tf.read_file, decode them to images with tf.image.decode_image, and concatenate them back into one tensor that you send to your model as input.
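A rough sketch of the decoding side, using the tf.data API instead of queues (the idea is the same). The feature key "filenames", the record file name, and the 128x128 crop size are assumptions, not part of your setup:

```python
import tensorflow as tf

SEQ_LEN = 100  # frames per window, per the question

def parse_sequence(serialized):
    # Assumes each tf.Example stores the SEQ_LEN file names of one window
    # under the (hypothetical) feature key "filenames".
    features = tf.parse_single_example(
        serialized,
        features={"filenames": tf.FixedLenFeature([SEQ_LEN], tf.string)})
    images = []
    for name in tf.unstack(features["filenames"]):
        img = tf.image.decode_image(tf.read_file(name), channels=3)
        img.set_shape([None, None, 3])
        images.append(tf.image.resize_image_with_crop_or_pad(img, 128, 128))
    return tf.stack(images)  # shape [SEQ_LEN, 128, 128, 3]

dataset = (tf.data.TFRecordDataset("face_windows.tfrecords")  # hypothetical file
           .map(parse_sequence)
           .batch(1))
sequence = dataset.make_one_shot_iterator().get_next()  # feed this to the RNN
```

Since consecutive windows overlap by 99 frames, storing one tf.Example per window duplicates a lot of data; if that becomes an issue you could instead store single frames and build the windows at read time.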
I have been searching all over the internet for a way to extract a meaningful page structure from an uploaded document (headlines/titles and paragraphs). The document could be of any format but I'm currently testing with PDF.
Example of what I'm trying to do:
Upload PDF file client-side
Save it to S3
Request AWS textract to detect or analyze text in that S3 object
Classify the output into: Headlines and Paragraphs
My application works fine up to step 3. AWS Textract outputs the result as blocks; block types can be page, line, or word, and each block has a Geometry object that includes bounding-box details as well as a Polygon object (more info here: AnalayzeCommandOutput(JS_SDK) and AnalayzeCommandOutput(General)).
However, I still need to process the output and classify it into headlines and paragraphs (e.g. one block of type line could be a headline and the following 3 blocks of type line a single paragraph), so the output of step 4 would be:
{
  "Headlines": ["Headline1", "Headline2", "Headline3"],
  "Paragraphs": [
    {"Paragraph": "Paragraph1", "Headline": "Headline1"},
    {"Paragraph": "Paragraph2", "Headline": "Headline1"}
  ]
}
The unsuccessful methods I tried:
Calculate the size of a line's bounding box relative to the page size and compare it to the average bounding-box size: if it's greater, it's a headline; if it's smaller than or equal, it's a paragraph (not practical; see the sketch after this list)
Use other PDF parsers but most of them just output unformatted text
Use the "Query" option of analyze document input but it would require to define each line in the PDF as key value pairs to output something meaningful. As per here So the PDF content would be something like:
Headline1: Headline
Paragraph1: Paragraph
Paragraph2: Paragraph
Headline2: Headline
Paragraph1: Paragraph
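For concreteness, the relative-height heuristic from the first method would look roughly like this. It is pure Python over the Blocks array of the Textract response; the threshold factor and the grouping of consecutive lines into one paragraph are made up for illustration:

```python
def classify_lines(blocks, headline_factor=1.3):
    """Split Textract LINE blocks into headlines and paragraphs by box height."""
    lines = [b for b in blocks if b["BlockType"] == "LINE"]
    heights = [b["Geometry"]["BoundingBox"]["Height"] for b in lines]
    avg = sum(heights) / max(len(heights), 1)

    result = {"Headlines": [], "Paragraphs": []}
    current_headline = None
    paragraph_buffer = []
    for line in lines:
        h = line["Geometry"]["BoundingBox"]["Height"]
        if h > headline_factor * avg:  # noticeably taller than average -> headline
            if paragraph_buffer and current_headline:
                result["Paragraphs"].append(
                    {"Paragraph": " ".join(paragraph_buffer),
                     "Headline": current_headline})
                paragraph_buffer = []
            current_headline = line["Text"]
            result["Headlines"].append(current_headline)
        else:                          # otherwise treat it as paragraph text
            paragraph_buffer.append(line["Text"])
    if paragraph_buffer and current_headline:
        result["Paragraphs"].append(
            {"Paragraph": " ".join(paragraph_buffer),
             "Headline": current_headline})
    return result
```

This is exactly the approach I found impractical, since headline and body heights vary a lot between documents.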
I'm not asking for a coding solution. Maybe I'm overcomplicating things and there is a simpler way to do it. Maybe someone has tried something similar and can point me in the right direction or approach.
I am trying to write a script which loads the camera parameters from Meshroom and imports them into a CAD program. My first understanding was that these parameters (position, rotation matrix, focal length etc.) are contained in the JSON file cameras.sfm in the StructureFromMotion subdirectory.
After importing these parameters into Rhino3D and comparing the resulting views onto the 3D mesh with the undistorted photographs in the PrepareDenseScene directory, I find surprisingly large discrepancies. The mesh produced by the run was good, so I think the deviation is because the parameters in cameras.sfm are not the final ones. This assumption is also supported by the fact that the file only contains the focal length as read from the input images' EXIF information and no refined values. So my question is:
How can I access the final camera parameters from the output of Meshroom?
Knowing this would help me a lot for re-building a photogrammetry/CAD pipeline I had previously implemented for VisualSFM + CMPMVS.
Many thanks!
EDIT: As this is my first post, I am not able to create a new tag for Meshroom. Perhaps this could be added by someone else? Thanks!
I have a dataset that I have created of gray-scale images which I want to use with the Siamese network example in Caffe, where the documentation uses the MNIST dataset. I want to replace the MNIST dataset with my own dataset.
I see that to do this I need my dataset to be in the format required by the Siamese network. This can be created using the create_mnist_siamese.sh script, which loads the MNIST dataset in the idx3-ubyte format and creates an LMDB database with two images and a matching/non-matching label in each location of the database.
So I figured that for me to use the create_mnist_siamese.sh script, my dataset also needs to be in the idx-ubyte format. I tried to convert my dataset to the idx-ubyte format using mnisten. However, I get the error 'error: total images are less than num_tests'. I guess the script is not identifying my images. The folder structure of the dataset is like this:
parent-directory
- subfolder
- subfolder
.
.
.
-txt file
parent directory name - 'generated dataset'
subfolders - 1, 2, 3 ... (the subfolders are titled 1 - 30 as I want to label the data in each subfolder by the name of the subfolder)
The txt file contains an image title on each row with the class label.
How do I work with my dataset on the Siamese network in Caffe? Is there a direct way to convert my dataset to the LMDB format for the Siamese network? Or do I have to use mnisten? If I do, then how do I fix my error? Any help will be much appreciated. Thanks.
You don't need to use the exact same format - this is just a tutorial. All you need to do is provide one or multiple data layers with a total of three top blobs: data, data_p, and sim. You can do that in any way you'd like, e.g. LMDB (as in the MNIST example), HDF5, or whatever.
General explanation
In the tutorial, they further show an easy way to load the image pairs: you concatenate two images in the channel dimension. For gray-scale, you take two input images, each with for example the dimensions [1, 1, 28, 28] (i.e. 1 image, 1 channel, 28x28 resolution). Then you concatenate them into one image of size [1, 2, 28, 28] and save it e.g. to an LMDB.
In the network, the first step after loading the data is a "Slice" layer, which takes this image and slices it (i.e. splits it up) along that axis, thus creating two top blobs, data and data_p.
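As a minimal sketch of that concatenation, assuming 28x28 uint8 gray-scale images and a pre-selected list of pairs (the function name and inputs are made up, not part of the tutorial):

```python
import lmdb
import numpy as np
import caffe  # only needed for caffe.proto.caffe_pb2.Datum

def write_pair_lmdb(pairs, labels, db_path):
    """pairs: list of (img_a, img_b) uint8 arrays of shape (28, 28).
       labels: 1 = similar pair, 0 = dissimilar pair."""
    env = lmdb.open(db_path, map_size=1 << 30)
    with env.begin(write=True) as txn:
        for i, ((a, b), sim) in enumerate(zip(pairs, labels)):
            pair = np.stack([a, b])                 # shape (2, 28, 28): channels
            datum = caffe.proto.caffe_pb2.Datum()
            datum.channels, datum.height, datum.width = pair.shape
            datum.data = pair.tobytes()             # raw uint8 pixel data
            datum.label = int(sim)                  # similarity label
            txn.put("{:08d}".format(i).encode(), datum.SerializeToString())
```

The Slice layer in the network then splits the 2-channel blob back into data and data_p.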
How to create the data files?
There is no single right way to do that. The code from the tutorial is only for the MNIST set, so unless you have the exact same format, you can't use it without changes. You have a couple of possibilities:
Convert your images to the MNIST-format. Then, the code from the Caffe tutorial works out-of-the-box. It appears that you are trying this - if you need help on that, please be specific: what is "mnisten", include your code, and so on.
Write your own script to convert the images.
This is actually very simple: all you need to do is read the images in your favorite programming language, select the pairs, calculate the labels, and re-save as LMDB.
This is definitely the more flexible way.
Create HDF5 files with multiple Top blobs. This is very simple to do, but will probably be a bit slower than using LMDB.
What you use is up to you - I'd probably go with HDF5, as this is an easy and very flexible way to start.
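If you go the HDF5 route, a minimal sketch could look like this; the dataset names must match your top blobs, and the shapes used here are just assumptions:

```python
import h5py
import numpy as np

def write_pair_hdf5(imgs_a, imgs_b, sims, path):
    """imgs_a, imgs_b: float32 arrays of shape (N, 1, H, W); sims: (N,) floats."""
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=imgs_a.astype(np.float32))
        f.create_dataset("data_p", data=imgs_b.astype(np.float32))
        f.create_dataset("sim", data=sims.astype(np.float32))
```

An HDF5Data layer then takes as its source a text file listing the path(s) to such .h5 files, with tops data, data_p, and sim.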
How to generate the pairs?
Now, this is the difficult question here. The code from the tutorial just selects random pairs, which is not really optimal and will make learning rather slow. You don't just need random pairs, you need meaningful, difficult, but still solvable pairs. How to do that depends entirely on your dataset.
A very sophisticated example is presented in (Radenović, 2016): they use a Siamese network to learn a representation for image retrieval on buildings. They use a Structure-from-Motion (SfM) algorithm to create a 3-D reconstruction of a building, and then sample image pairs from those reconstructions.
How exactly you create the pairs depends on your data - maybe you are fine with random pairs - maybe you need a sophisticated method.
Literature:
F. Radenović, G. Tolias, and O. Chum. "CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples". In: European Conference on Computer Vision (ECCV), 2016. arXiv: 1604.02426.
Generating pairs is the most important step in a Siamese network. However, there is a simple way to do that using caffe.
Load the data as two separate LMDBs
Create two LMDBs, data_1 and data_2, using the create_imagenet.sh or convert_imageset.cpp script. Use the same data for both sets, except data_2 contains one image less than data_1.
This ensures that at each epoch a different pair of images is compared, thus allowing us to cover all nC2 combinations (n^2, actually). The sketch below shows one way to write the two listing files.
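A hypothetical sketch of generating the two listing files that are then fed to convert_imageset; the tuple format of `images` and the file names are assumptions:

```python
import os

def write_listings(images, dir_out="."):
    """images: list of (relative_path, class_label) tuples."""
    with open(os.path.join(dir_out, "data_1.txt"), "w") as f1:
        for path, label in images:
            f1.write("{} {}\n".format(path, label))
    with open(os.path.join(dir_out, "data_2.txt"), "w") as f2:
        for path, label in images[:-1]:      # one image fewer, so the pairing
            f2.write("{} {}\n".format(path, label))  # drifts by one each pass
```

Build data_1 from data_1.txt and data_2 from data_2.txt; because data_2 is shorter, its cursor wraps around earlier and the images get matched against different partners on every pass.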
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "/home/subho/SSD2/data_1/"
    batch_size: 8
    backend: LMDB
  }
}
layer {
  name: "data_p"
  type: "Data"
  top: "data_p"
  top: "label_p"
  data_param {
    source: "/home/subho/SSD2/data_2/"
    batch_size: 8
    backend: LMDB
  }
}
Introduce a Similarity layer in the prototxt
layer {
  name: "sim_check"
  type: "Similarity"
  bottom: "label"
  bottom: "label_p"
  top: "sim_check"
  propagate_down: false  # for each bottom blob
  propagate_down: false
}
layer {
  name: "loss"
  type: "ContrastiveLoss"
  contrastive_loss_param {
    margin: 1.0
  }
  bottom: "feat"
  bottom: "feat_p"
  bottom: "sim_check"
  top: "loss"
}
Create files for Similarity layer
Download files for similarity layer
Place the similarity_layer.cpp in caffe/src/caffe/layers/ and similarity_layers.hpp in caffe/include/caffe/layers/ and rebuild caffe.
cd build
cmake ..
make -j12
Note
If your network does not converge using the above technique, then you should have a look at the following:
Selection of image pairs using hard negatives
Ensuring a balance of positive and negative (dissimilar) pairs
I'm trying to write a neural network that (after being properly trained) identifies certain road signs and returns a different output for each type of sign.
Before I started to train my network, I noticed on the pybrain website that their datasets are always an array of values, each entry containing an input and a target. The images I have for my NN have been converted to grayscale pixel data (a simple array of numbers). To train each set of data, do I need to somehow add a target value for each pixel? And if so, how would I go about doing that?
QUICK ANSWER
No, you don't need a target for every single pixel; you treat the pixels from a single image as your input data and you add a target to that data.
LONG ANSWER
What you're trying to do is solve a classification problem. You have an image represented by an array of numbers and you need to classify it as some class from a limited set of classes.
So let's say that you have 2 classes: prohibition signs (I'm not a native speaker, I don't know what you call signs that forbid something) and information signs. Let's say that prohibition signs are our class 1 and information signs are class 2.
Your data set should look like this:
([representation of sign in numbers], class) - single sample
After that, since it's a classification problem, I recommend using the _convertToOneOfMany() method of the ClassificationDataSet class to convert your targets into multiple outputs, as sketched below.
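A minimal sketch of building such a dataset; the 30x30 image size, the class labels, and the `samples` iterable are assumptions for illustration:

```python
from pybrain.datasets import ClassificationDataSet

# 30x30 gray-scale images flattened to 900 inputs, 2 classes (assumed sizes).
ds = ClassificationDataSet(900, nb_classes=2,
                           class_labels=["prohibition", "information"])
for pixels, sign_class in samples:      # `samples`: (flat pixel array, class id)
    ds.addSample(pixels, [sign_class])  # one target per image, not per pixel
ds._convertToOneOfMany()                # targets become one-hot output vectors
```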
I've answered a similar question here, go check it out.
Suppose there is a given 2D array of an image that has been thresholded, so it now contains binary information.
Is there any particular way to process this image so that I get the coordinates of multiple blobs in the image?
I can't use OpenCV because this process needs to run simultaneously on 10+ simulated robots in a custom simulator written in C.
I need the blobs' xy coordinates, but first I need to find those multiple blobs.
The simplest criterion of pixel-group size should be enough. But I don't have any clue how to start the coding.
PS: A single blob would be no problem. The problem is multiple blobs.
Just a head start?
Have a look at QuickBlob which is a small, standalone C library that sounds perfectly suited for your needs.
QuickBlob comes with a small command-line tool (csv-blobs) that outputs the position and size of each blob found within the input image:
./csv-blobs white image.png
X,Y,size,color
28.37,10.90,41,white
51.64,10.36,42,white
...
Here's an example (the output image is produced thanks to the show-blobs.py tiny Python utility that comes with QuickBlob).
You can go through the binary image labeling the connected parts with an algorithm like the following:
Create a 2D array of ints, labelArray, that will hold the labels of the connected regions and initiate it to all zeros.
Iterate over each binary pixel, p, row by row
A. If p is true and the corresponding value for this position in labelArray is 0 (unlabeled), assign it a new label and do a breadth-first search that adds all surrounding binary pixels that are also true to that same label.
The only issue now is if you have multiple blobs that are touching each other. Because you know the size of the blobs, you should be able to figure out how many blobs are in a given connected region. This is the tricky part. You can try doing a k-means clustering at this point. You can also try other methods like using binary dilation.
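Here is a compact sketch of steps 1-2 above (in Python for brevity; it ports to C directly with an explicit queue). The 4-connectivity choice is an assumption:

```python
from collections import deque

def label_blobs(binary):
    """binary: 2D array of 0/1. Returns the label array and a list of
    pixel lists, one per connected blob."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    blobs = []
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and labels[y][x] == 0:   # unlabeled foreground pixel
                next_label += 1
                labels[y][x] = next_label
                queue, pixels = deque([(y, x)]), [(y, x)]
                while queue:                         # breadth-first flood fill
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                            pixels.append((ny, nx))
                blobs.append(pixels)
    return labels, blobs
```

Each blob's xy coordinate is then just the mean of its pixel coordinates, and blobs smaller than your pixel-group-size threshold can be discarded.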
I know that I am very late to the party, but I am just adding this for the benefit of people who are researching this problem.
Here is a nice description that might fit your needs.
http://www.mcs.csueastbay.edu/~grewe/CS6825/Mat/BinaryImageProcessing/BlobDetection.htm