Can I train object segmentation using RLE for individual objects(COCO dataset format and YOLACT) - dataset

I want to train Yolact, I have generated mask images for my data. My question is can I use RLE to define the segmentation for all my individual objects since in every image every object is there exactly once.
Any suggestion will be appreciated.

Related

Meshroom: how to access the final camera parameters?

I am trying to write a script which loads the camera parameters from Meshroom and imports them into a CAD program. My first understanding was that these parameters (position, rotation matrix, focal length etc.) are contained in the JSON-file cameras.sfm in the StructureFromMotion-subdirectory.
After importing these parameters into Rhino3D and comparing the resulting views onto the 3D-mesh with the undistorted photographs in the PrepareDenseScene-directory, I find surprising large discrepancies. The mesh which was the result of the run was good, so I think that the deviation is because of the parameters in cameras.sfm being not the final ones. This assumption is also supported by the fact that the file only contains the focal length as read from the input images' EXIF information and no refined values. So my question is:
How can I access the final camera parameters from the output of Meshroom?
Knowing this would help me a lot for re-building a photogrammetry/CAD pipeline I had previously implemented for VisualSFM + CMPMVS.
Many thanks!
EDIT: As this is my first post, I am not able to create a new tag for Meshroom. Perhaps this could be added by someone else? Thanks!

Images and Filters in OpenCL

Lets say I have an image called Test.jpg.
I just figured out how to bring an image into the project by the following line:
FILE *infile = fopen("Stonehenge.jpg", "rb");
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
I have never worked with images before, let alone OpenCl so there is a lot that is going over my head.
I need further clarification on this part for my own understanding
Does this bmp image also need to be stored in an array in order to have a filter applied to it? I have seen a sliding window technique be used a couple of times in other examples. Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
I know this may seem like a basic question to most but I do not have a mentor on this subject in my workplace.
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
Not exactly. bmp is a very specific image serialization format and actually a quite complicated one (implementing a BMP file parser that deals with all the corner cases correctly is actually rather difficult).
However what you have there so far is not even file content data. What you have there is a C stdio FILE handle and that's it. So far you did not even check if the file could be opened. That's not really useful.
JPEG is a lossy compressed image format. What you need to be able to "work" with it is a pixel value array. Either an array of component tuples, or a number of arrays, one for each component (depending on your application either format may perform better).
Now implementing image format decoders becomes tedious. It's not exactly difficult but also not something you can write down on a single evening. Of course the devil is in the details and writing an implementation that is high quality, covers all corner cases and is fast is a major effort. That's why for every image (and video and audio) format out there you usually can find only a small number of encoder and decoder implementations. The de-facto standard codec library for JPEG are libjpeg and libjpeg-turbo. If your aim is to read just JPEG files, then these libraries would be the go-to implementation. However you also may want to support PNG files, and then maybe EXR and so on and then things become tedious again. So there are meta-libraries which wrap all those format specific libraries and offer them through a universal API.
In the OpenGL wiki there's a dedicated page on the current state of image loader libraries: https://www.opengl.org/wiki/Image_Libraries
Does this bmp image also need to be stored in an array in order to have a filter applied to it?
That actually depends on the kind of filter you want to apply. A simple threshold filter for example does not take a pixel's surroundings into account. If you were to perform scanline signal processing (e.g. when processing old analogue television signals) you may require only a single row of pixels at a time.
The universal solution of course to keep the whole image in memory, but then some pictures are so HUGE that no average computer's RAM can hold them. There are image processing libraries like VIPS that implement processing graphs that can operate on small subregions of an image at a time and can be executed independently.
Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
In case you mean "pixel array" instead of BMP (remember, BMP is a specific data structure), then no. Pixel component values may be of any scalar type and value range. And there are in fact colour spaces in which there are value regions which are mathematically necessary but do not denote actually sensible colours.
When it comes down to pixel data, an image is just a n-dimensional array of scalar component tuples where each component's value lies in a given range of values. It doesn't get more specific for that. Only when you introduce colour spaces (RGB, CMYK, YUV, CIE-Lab, CIE-XYZ, etc.) you give those values specific colour-meaning. And the choice of data type is more or less arbitrary. You can either use 8 bits per component RGB (0..255), 10 bits (0..1024) or floating point (0.0 .. 1.0); the choice is yours.

.obj file format - alternates between different data types

I'm writing a method to parse the data in wavefront obj files and I understand the format for the most part, however some things are still a bit confusing to me. For instance, I would have expected most files to list all the vertices first, followed by the texture and normal map coordinates and then the face indices. However, some files that I have opened alternate between these different sections. For instance, one .obj file I have of the Venus de Milo (obtained here: http://graphics.im.ntu.edu.tw/~robin/courses/cg03/model/ ) starts off with the vertices (v), then does normal coordinates (vn), then faces (f), then defines more vertices, normals and faces again. Why is the file broken up into two sections like this? Why not list all the vertices up front? Is this meant to signify that there are multiple segments to the mesh? If so, how do I deal with this?
Because this is how the file format was designed. There is no requirement for a specific ordering of the data inside the OBJ, so each modelling package writes it in its own way. Here is one brief summary of the file format, if you haven't read this one yet.
That said, the OBJ format is quite outdated and doesn't support animation by default. It is useful for exchanging of static meshes between modelling tools but not much else. If you need a more robust and modern file format, I'd suggest taking a look at the Collada format or the FBX.
not an direct answer but it will be unreadable in comment
I do not use this file-format but mesh segmentation is usually done for these reasons:
more easy management of the model for editing
separation of parts of model with different material or texture properties
mainly to speed up the rendering by cut down unnecessary material or texture switching
if the mesh has dynamically moving parts then they must be separated
Most 3D mesh file formats contains also transform matrix for each mesh part and some even an skeleton hierarchy
Now how to handle segmented meshes:
if your engine supports only unsegmented models then merge all parts together
This will loose all the advantages of segmented mesh. Do not forget to apply transform matrices of sub segments before merging
or you can implement mesh segmentation into your model class
By adding model hierarchy , transform matrices , ...
Now how to handle mixed model fileformat:
scan file for all necessary chunks of data
remember if they are present
also store their size,and start address in file
and do not forget that there may be more that one chunk of the same data type
preallocate space for all data you need
load/merge all data you need
load chunks of data to you model classes or merge it to single model
of course check if all data needed id present like number of points match number of normals or texture coords ...

pybrain image input to dataset for Neural Network

I'm trying to write a neural network that (after being properly trained) identifies certain road signs and returns a different output for each type of sign.
Before I started to train my network, I noticed on the pybrain website that their datasets are always an array of values, each entry containing an input and a target. The images I have for my NN have been converted to grayscale pixel data (a simple array of numbers). To train each set of data, do I need to somehow add a target value for each pixel? And if so, how would I go about doing that?
QUICK ANSWER
No, you don't need target for every single pixel, you treat pixels from single image as your input data and you add target to that data.
LONG ANSWER
What you trying to do is to solve classification problem. You have image represented by array of numbers and you need to classify it as some class from limited set of classes.
So lets say that you have 2 classes: prohibitions signs (I'm not native speaker, I don't know how you call signs that forbid something), and information signs. Lets say that prohibition signs is our class 1 and information signs is class 2.
Your data set should look like this:
([representation of sign in numbers], class) - single sample
After that, since it's classification problem, I recommend using _convertToOneOfMany() method of DataSet class, to convert your targets into multiple outputs.
I've answered similar question here, go check it out.

Simple Multi-Blob Detection of a Binary Image?

If there is a given 2d array of an image, where threshold has been done and now is in binary information.
Is there any particular way to process this image to that I get multiple blob's coordinates on the image?
I can't use openCV because this process needs to run simultaneously on 10+ simulated robots on a custom simulator in C.
I need the blobs xy coordinates, but first I need to find those multiple blobs first.
Simplest criteria of pixel group size should be enough. But I don't have any clue how to start the coding.
PS: Single blob should be no problem. Problem is multiple blobs.
Just a head start ?
Have a look at QuickBlob which is a small, standalone C library that sounds perfectly suited for your needs.
QuickBlob comes with a small command-line tool (csv-blobs) that outputs the position and size of each blob found within the input image:
./csv-blobs white image.png
X,Y,size,color
28.37,10.90,41,white
51.64,10.36,42,white
...
Here's an example (output image is produced thanks to the show-blobs.py tiny Python utility that comes with QuickBlob):
You can go through the binary image labeling the connected parts with an algorithm like the following:
Create a 2D array of ints, labelArray, that will hold the labels of the connected regions and initiate it to all zeros.
Iterate over each binary pixel, p, row by row
A. If p is true and the corresponding value for this position in the labelArray is 0 (unlabeled), assign it to a new label and do a breadth-first search that will add all surrounding binary pixels that are also true to that same label.
The only issue now is if you have multiple blobs that are touching each other. Because you know the size of the blobs, you should be able to figure out how many blobs are in a given connected region. This is the tricky part. You can try doing a k-means clustering at this point. You can also try other methods like using binary dilation.
I know that I am very late to the party, but I am just adding this for the benefipeople who are researching this problem.
Here is a nice description that might fit your needs.
http://www.mcs.csueastbay.edu/~grewe/CS6825/Mat/BinaryImageProcessing/BlobDetection.htm

Resources