How to create datasets like MNIST in PyTorch?

I have looked at the PyTorch source code for the MNIST dataset, but it seems to read NumPy arrays directly from the binaries.
How can I create train_data and train_labels the same way? I have already prepared the images and a txt file with the labels.
I have learned how to read images and labels and how to write __getitem__ and __len__; what really confuses me is how to make train_data and train_labels, which are torch.Tensor. I tried to arrange them into Python lists and convert them to torch.Tensor, but it failed:
for index in range(0, len(self.files)):
    fn, label = self.files[index]
    img = self.loader(fn)
    if self.transform is not None:
        img = self.transform(img)
    train_data.append(img)
self.train_data = torch.tensor(train_data)
ValueError: only one element tensors can be converted to Python scalars
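(Aside: this particular ValueError is raised when torch.tensor() is given a list of multi-element tensors; if all the images have the same shape, stacking them is the usual fix. A one-line sketch, assuming equal-sized images:

self.train_data = torch.stack(train_data)   # shape (N, C, H, W), assuming all images share a shape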

There are two ways to go. First, the manual way. The torchvision.datasets documentation states the following:
datasets are subclasses of torch.utils.data.Dataset i.e, they have __getitem__ and __len__ methods implemented. Hence, they can all be passed to a torch.utils.data.DataLoader which can load multiple samples parallelly using torch.multiprocessing workers.
So you can just implement your own class which scans for all the images and labels, keeps a list of their paths (so that you don't have to keep the images in RAM), and has a __getitem__ method which, given an index i, reads the i-th file and its label and returns them. This minimal interface is enough to work with the parallel DataLoader in torch.utils.data.
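As a minimal sketch of such a class, assuming a list of (path, label) pairs parsed from your txt file (the PIL loader and label format here are assumptions, not the asker's exact setup):

import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image

class MyImageDataset(Dataset):
    def __init__(self, files, transform=None):
        self.files = files            # list of (path, label); only paths stay in RAM
        self.transform = transform

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        fn, label = self.files[index]
        img = Image.open(fn).convert('RGB')   # read the i-th file lazily, per sample
        if self.transform is not None:
            img = self.transform(img)
        return img, label

# loader = DataLoader(MyImageDataset(files), batch_size=32, shuffle=True, num_workers=4)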
Secondly, if your data directory can be rearranged into the layout these loaders expect (one subdirectory per class), you can use the pre-built DatasetFolder and ImageFolder loaders. This will save you some coding and automatically provide support for the data augmentation routines from torchvision.transforms.
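A sketch of the ImageFolder route (the directory layout and transform are illustrative assumptions):

from torchvision import datasets, transforms

# assumes a layout like train/cat/001.png, train/dog/002.png, ...
transform = transforms.ToTensor()
train_set = datasets.ImageFolder(root='path/to/train', transform=transform)
# train_set[i] returns (image_tensor, class_index), ready for a DataLoader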

Related

Issue with writeVideo/VideoWriter MATLAB

I'm a beginner so sorry in advance for the mistakes.
I have a set of data from a camera recording saved in a 4D array with dimensions 250x300x10603x12.
The first two are the frame dimensions in pixels. The 10603 is the number of frames (frame rate x recording time). The 12 are the subjects I recorded.
I extract one subject at a time for analysis in this way:
subj1 = data(:,:,:,1);
This gives me a 250x300x10603 array containing the frames of subject 1, which I can display with implay.
Now I would like to write this new array out as a video and save it in .avi format. I use this code:
v = VideoWriter('subj1.avi');
open(v)
writeVideo(v,subj1)
close(v)
but it keeps giving me this error
Error using VideoWriter/writeVideo (line 410)
IMG must be an array of either grayscale or RGB images.
In fact, looking at the shape of the array, there is nothing that identifies a grayscale or RGB channel dimension. How can I get an .avi file in this case? Do I have to transform the array?
Why does implay still display the video?
Clarification: the reason I have to turn the array into an .avi file is that I will analyse it by exporting it to Python with OpenCV.
In fact, if I export the .mat file directly to Python, I can't get the list of frames.
MATLAB's documentation for writeVideo says that for a sequence of grayscale images like yours, it expects a height-by-width-by-1-by-frames array. You are only passing it height-by-width-by-frames.
So you need to reshape subj1. Maybe try doing it like this:
newsubj = zeros(250, 300, 1, 10603);   % add the singleton channel dimension
newsubj(:,:,1,:) = subj1;
% equivalently: newsubj = reshape(subj1, 250, 300, 1, 10603);
and then save newsubj instead of subj1:
writeVideo(v,newsubj)
Finally, note that you may get some lossy compression when you save as an .avi, so it may not be the best way to export the data from MATLAB and import it into Python.
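Since the end goal is analysis with OpenCV, here is a sketch of reading the frames back in Python (the file name and the grayscale conversion are assumptions):

import cv2

cap = cv2.VideoCapture('subj1.avi')
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # the codec returns BGR frames; convert back to grayscale
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
cap.release()
print(len(frames))   # should be 10603 if every frame was written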

How to save a tensor to a file in TensorFlow using the low level (C) interface?

Using the low-level operations in TensorFlow, I am trying to save a Tensor (its actual value) to disk, but cannot find out how.
If the data is e.g. a UInt8 matrix representing an image, then I can easily use EncodeJpeg to create the content and then use WriteFile to write the generated content to a file with a given name. Similarly, EncodeWav works the same way.
On the other hand, if I just want to save a matrix of numbers, there is no "EncodeData", "TensorToContent" or similar operation to convert the Tensor into content that can be saved with WriteFile.
I can get the Tensor as an output from my graph and then save it outside the graph, but my purpose is to do it inside the graph.
It took me some time, but I finally found a solution, though it might not be the best. If you can improve on it, you're welcome to.
So, I use the Save/Restore raw operation pair. It is relatively straightforward, once found.
"Save" has two tensor Inputs, one tensor InputList and one Attribute. The first tensor (scalar string) has to give the filename. The second tensor (1D string) contains the names you want to give to the tensors in the file (for Restore purposes later). The InputList is actually a collection of the tensors to be saved. The Attribute is a list of data types of the tensors to be saved.
"Restore" is also simple. It has two tensor Inputs, one tensor Output, one type Attribute and one optional integer Attribute. The first Input (scalar string) gives the file name or file pattern if more files can be looked at. The second Input (scalar string) gives the name (see above in "Save") of the tensor you want to find in the files. The Output is the loaded tensor itself. The type Attribute specifies what type is the tensor that will be loaded. The optional Attribute can be used if tensors with the same name exist in multiple files matching the pattern, telling which one is preferred.
The only thing to be careful with is the input list of "Save". When the "Save" raw operation is added to the graph at design time, the number of inputs in the input list has to be given, together with the same number of entries in the type-list attribute. This means that at runtime, when you supply the actual tensor names and the tensor list itself, you can no longer change the number or the data types of the tensors.
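The question is about the C interface, but the same registered ops are exposed through TensorFlow's Python raw_ops bindings, so a quick sketch may help illustrate the wiring (the file name and tensor name are arbitrary choices here):

import tensorflow as tf

t = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# Save: filename (scalar string), tensor_names (1-D string), data (the input list)
tf.raw_ops.Save(filename='data.ckpt', tensor_names=['my_tensor'], data=[t])

# Restore: file_pattern, tensor_name, and the dt type attribute
restored = tf.raw_ops.Restore(file_pattern='data.ckpt',
                              tensor_name='my_tensor',
                              dt=tf.float32)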

Meshroom: how to access the final camera parameters?

I am trying to write a script which loads the camera parameters from Meshroom and imports them into a CAD program. My first understanding was that these parameters (position, rotation matrix, focal length, etc.) are contained in the JSON file cameras.sfm in the StructureFromMotion subdirectory.
After importing these parameters into Rhino3D and comparing the resulting views of the 3D mesh with the undistorted photographs in the PrepareDenseScene directory, I find surprisingly large discrepancies. The mesh produced by the run was good, so I think the deviation is because the parameters in cameras.sfm are not the final ones. This assumption is also supported by the fact that the file only contains the focal length as read from the input images' EXIF information and no refined values. So my question is:
How can I access the final camera parameters from the output of Meshroom?
Knowing this would help me a lot in re-building a photogrammetry/CAD pipeline I had previously implemented for VisualSFM + CMPMVS.
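For reference, here is roughly how my script reads the file (the key names follow what I see in my cameras.sfm, so treat them as assumptions about the AliceVision layout):

import json

with open('cameras.sfm') as f:
    sfm = json.load(f)

# assumed layout: each pose carries a 9-value rotation matrix and a 3-value camera center
for pose in sfm.get('poses', []):
    transform = pose['pose']['transform']
    rotation = [float(x) for x in transform['rotation']]   # values are stored as strings
    center = [float(x) for x in transform['center']]
    print(pose['poseId'], center)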
Many thanks!
EDIT: As this is my first post, I am not able to create a new tag for Meshroom. Perhaps this could be added by someone else? Thanks!

.obj file format - alternates between different data types

I'm writing a method to parse the data in Wavefront obj files and I understand the format for the most part; however, some things are still a bit confusing to me. For instance, I would have expected most files to list all the vertices first, followed by the texture and normal map coordinates, and then the face indices. However, some files that I have opened alternate between these different sections. For example, one .obj file I have of the Venus de Milo (obtained here: http://graphics.im.ntu.edu.tw/~robin/courses/cg03/model/) starts off with the vertices (v), then does normal coordinates (vn), then faces (f), then defines more vertices, normals and faces again. Why is the file broken up into two sections like this? Why not list all the vertices up front? Is this meant to signify that there are multiple segments to the mesh? If so, how do I deal with this?
Because this is how the file format was designed. There is no requirement for a specific ordering of the data inside the OBJ file, so each modelling package writes it in its own way. Here is one brief summary of the file format, if you haven't read it yet.
That said, the OBJ format is quite outdated and doesn't support animation by default. It is useful for exchanging static meshes between modelling tools, but not much else. If you need a more robust and modern file format, I'd suggest taking a look at the Collada format or FBX.
Not a direct answer, but it would be unreadable as a comment.
I do not use this file format, but mesh segmentation is usually done for these reasons:
- easier management of the model for editing
- separation of parts of the model with different material or texture properties
- mainly to speed up rendering by cutting down unnecessary material or texture switching
- if the mesh has dynamically moving parts, then they must be separated
Most 3D mesh file formats also contain a transform matrix for each mesh part, and some even a skeleton hierarchy.
Now, how to handle segmented meshes:
- if your engine supports only unsegmented models, then merge all parts together. This loses all the advantages of a segmented mesh; do not forget to apply the transform matrices of the sub-segments before merging
- or you can implement mesh segmentation in your model class, by adding a model hierarchy, transform matrices, ...
Now, how to handle a mixed model file format (see the parser sketch below):
- scan the file for all necessary chunks of data, remember whether they are present, and store their size and start address in the file; do not forget that there may be more than one chunk of the same data type
- preallocate space for all the data you need
- load the chunks of data into your model classes, or merge them into a single model
- of course, check that all the data needed is present, e.g. that the number of points matches the number of normals or texture coordinates ...
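To illustrate why the interleaving is harmless, here is a sketch of a reader that just accumulates each line type as it appears; OBJ indices are 1-based and global across the whole file, so interleaved sections still resolve correctly. This is only a sketch (it ignores materials, groups, texture coordinates, etc.):

def parse_obj(path):
    vertices, normals, faces = [], [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts or parts[0].startswith('#'):
                continue                          # skip blanks and comments
            if parts[0] == 'v':
                vertices.append(tuple(map(float, parts[1:4])))
            elif parts[0] == 'vn':
                normals.append(tuple(map(float, parts[1:4])))
            elif parts[0] == 'f':
                # each face vertex is "v", "v/vt", "v//vn" or "v/vt/vn"; 0 marks an absent index
                face = [tuple(int(i) if i else 0 for i in v.split('/'))
                        for v in parts[1:]]
                faces.append(face)
    return vertices, normals, faces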

Ansi C dynamic include

I was assigned to edit part of an ANSI C application, but my knowledge of pure C is just the basics. Anyway, the current situation is that I have map1_data1.h, map1_data2.h, map2_data1.h, map2_data2.h, and the variables in those files are always tied to the map name, e.g. map1_structure in map1_data1.h and so on.
In the app there is an #include for each file, and in the code then something like
if (game->map == 1) {
    mapStructure = map1_structure;
} else {
    mapStructure = map2_structure;
}
I have to extend this to be able to load a map dynamically, so something like:
void loadMap(int mapId) {
    mapStructure = map*mapId*_structure; /* just shorthand for what I want to achieve */
}
My first idea was to remove the map-name part from the variable names in map1_data1.h and keep just a plain structure variable there. That requires only one header file to be loaded at a time, and that's where I'm stuck: I haven't found any clues on Google about how to do that.
I would like to have it as variable as possible, so something like #include "map*mapId*_data1.h", but it would be OK to have one switch in one place in the whole app that decides which map to load.
One more thing: the app keeps running for more than one game, i.e. it will load various maps in one run.
Judging from the comments, you have a single type, call it Map, which is a structure type containing a collection of different data types, including 3D arrays and points and so on. You need to have some maps built into the program; later on, you will need to load new maps at runtime.
You have two main options for the runtime loading the maps:
Map in shared object (shared library, dynamically loaded library, aka DLL).
Map in data file.
Of these two, you will choose the data file over the shared object because it is, ultimately, simpler and more flexible.
Shared Object
With option 1, only someone who can compile a shared library can create the new maps. You'd have a 'library' consisting of one or more data objects, which can be looked up by name. On most Unix-like systems, you'd end up using dlopen() to load the library, and then dlsym() to find the symbol name in that library (specifying the name via a string). If it is present in the library, dlsym() will return you a pointer.
In outline:
#include <dlfcn.h>

typedef void *SO_Handle;

const char *path_to_library = "/usr/local/lib/your_game/libmap32.so";
const char *symbol_name = "map32_structure";

SO_Handle lib = dlopen(path_to_library, RTLD_NOW);
if (lib == 0)
    ...bail out...
map_structure = dlsym(lib, symbol_name);  /* returns void *; cast to your Map type */
if (map_structure == 0)
    ...bail out...
You have to have some way of generating the library name based on where the software is installed and where extensions are downloaded. You also have to have some way of knowing the name of the symbol to look for. The simplest system is to use a single fixed name (map_structure), but you are not constrained to do that.
After this, you have your general map_structure ready for use. You can invent endless variations on the theme.
Data file
This is the more likely way you'll do it. You arrange to serialize the map structure into a disk file that can be read by your program. This will contain a convenient representation of the data. You should consider the TLV (type-length-value) encoding scheme, so that you can tell from the type what sort of data follows, the length tells you how many of them there are, and the value is the data itself. You can do this with binary data or with text data. It is easier to debug text data because you can look at it and see what's going on. The chances are that the performance difference between binary and text is small enough (swamped by the I/O time) that text is the correct way to go.
With a text description of the map, you'd have information to identify the file as being a map file for your game (perhaps with a map format version number). Then you'd have sections describing each of the main elements in the Map structure. You'd allocate the Map (malloc() et al), and then load the data from the file into the structure.
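To make the TLV idea concrete, here is a small sketch (in Python for brevity, since the scheme itself is language-agnostic; the tag values and field layout are hypothetical, and the real implementation would be C code reading into the Map struct):

import struct

T_INT32 = 1          # hypothetical tag for a single int32 field
T_FLOAT_ARRAY = 2    # hypothetical tag for an array of floats

def write_tlv(f, tag, payload):
    # type: 1 byte, length: 4 bytes little-endian, then the raw value bytes
    f.write(struct.pack('<BI', tag, len(payload)))
    f.write(payload)

def read_tlv(f):
    header = f.read(5)
    if len(header) < 5:
        return None                    # end of file
    tag, length = struct.unpack('<BI', header)
    return tag, f.read(length)

with open('map.dat', 'wb') as f:
    write_tlv(f, T_INT32, struct.pack('<i', 42))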
