Problems implementing LSH with SIFT features

I have a problem with ANN search when using LSH with SIFT features. With a feature detector tool (the SIFT demo) or an available dataset, I get 128-dimensional descriptors for an image. But I don't know how to store them in a .mat file (as a database) or how to use a query vector to retrieve the k closest images to a query image.
Please help me. Thanks a lot.

Have a look at the MATLAB demo provided by David Lowe.
You can save the descriptors in a .mat file; match.m shows how matching is done.
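If it helps, here is a minimal sketch in Python rather than MATLAB (the file name sift_db.mat, the array sizes, and the use of SciPy/scikit-learn are my own assumptions, not part of Lowe's demo). It stores descriptors in a .mat file and answers a k-closest query with an exact search, which an LSH index would replace for speed:

import numpy as np
from scipy.io import savemat, loadmat
from sklearn.neighbors import NearestNeighbors

# Placeholder database: one 128-D descriptor per image (e.g. after aggregation).
descriptors = np.random.rand(1000, 128).astype(np.float32)
savemat("sift_db.mat", {"descriptors": descriptors})    # store the database

# Later: load the database and retrieve the k closest images for a query descriptor.
db = loadmat("sift_db.mat")["descriptors"]
query = np.random.rand(1, 128).astype(np.float32)       # placeholder query

knn = NearestNeighbors(n_neighbors=5).fit(db)           # exact baseline search
distances, indices = knn.kneighbors(query)              # LSH would approximate this step
print(indices)                                          # indices of the 5 closest images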

Related

Batch size for Feed-Forward Neural Network

I have a dataset of 100k samples for ML. How should I set the batch size for training?
FYI - I am using train_test_split to split the data into train and test sets.
Thank you!
# Fit network
history = model.fit(X_train, Y_train, epochs=1500, validation_split=0.2, verbose=1, shuffle=False, batch_size=60)
Wannees, you need to give more details so that others can help you, such as the type of data (numeric or image), the application, the number of features/targets, the network architecture, etc.
You may learn something from existing questions on this site, e.g. "How to calculate optimal batch size": https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network
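For a more concrete starting point, here is a minimal sketch (the two-layer model, the candidate batch sizes, and the synthetic placeholder data are my own assumptions, not taken from the question) showing how a few batch sizes can be compared by validation loss instead of fixing one value up front:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Placeholder data standing in for the real 100k-sample training set.
X_train = np.random.rand(1000, 10)
Y_train = np.random.rand(1000, 1)

results = {}
for batch_size in (32, 64, 128, 256):                    # candidate batch sizes
    model = Sequential([Dense(32, activation="relu", input_shape=(10,)),
                        Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    history = model.fit(X_train, Y_train, epochs=5,       # few epochs, just for the comparison
                        validation_split=0.2, batch_size=batch_size, verbose=0)
    results[batch_size] = history.history["val_loss"][-1]

print(results)    # pick the batch size with the best validation-loss / training-time trade-off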

What is the best way to load a large folder of files into Julia to compare the columns of each file?

New to Julia and programming in general, so this is a two-part question. Suppose I have a folder with 3,000 CSV files. Each file is roughly 7,000 x 7. (The number of rows may vary from file to file, but the number of columns is constant.) I am trying to read these files into a 3000 x N x M tensor or some other data structure in Julia so I can compare the outputs by column. (This would mostly involve comparing the sum of the lags in each column vector of each file.)
Question 1: What is the most efficient data structure to parse through this data? I would essentially be calculating the max of the sum of the lags of each column for all files. I've been told by a more experienced user that I should be using NamedArrays for this, and I was wondering if anyone could provide some insight as to why. Would DataFrames be able to perform similar calculations?
Question 2: Is there an efficient way to read all these files into NamedArrays? I can read the files into DataFrames with Glob using the following code.
using CSV, DataFrames, Glob    # packages assumed to be installed

Folder = "/Users/Desktop/Data"
Files = glob("*.csv", Folder)              # all CSV paths in the folder
df = DataFrame.(CSV.File.(Files))          # broadcasts to a Vector of DataFrames, one per file
But I don't know how to read them into NamedArrays directly. Any insights would be greatly appreciated, thanks!

Geometry extraction from a STEP file

Can anyone help me with the procedure for extracting geometric data from a STEP file in Python? I have already tried but got no result.
Thank you.
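Since no code was posted, here is a minimal sketch of one possible route, assuming the pythonocc-core bindings for OpenCASCADE are installed ("part.step" is a placeholder path). It reads the STEP file into a single shape and walks its faces:

from OCC.Core.STEPControl import STEPControl_Reader
from OCC.Core.TopExp import TopExp_Explorer
from OCC.Core.TopAbs import TopAbs_FACE

reader = STEPControl_Reader()
status = reader.ReadFile("part.step")     # returns an IFSelect status code
reader.TransferRoots()                    # translate the STEP entities
shape = reader.OneShape()                 # one compound shape for the whole file

# Walk the topology; other TopAbs_* constants give edges, solids, etc.
explorer = TopExp_Explorer(shape, TopAbs_FACE)
n_faces = 0
while explorer.More():
    n_faces += 1
    explorer.Next()
print("faces:", n_faces)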

Multiple small files as input to MapReduce

I have lots of small files, say more than 20,000.
I want to save the time spent on mapper initialization, so is it possible to use just 500 mappers, each processing 40 small files as its input?
I need guidance on how to implement this kind of InputFormat, if possible. Thanks!
BTW, I know I should merge these small files; that step is also needed.
CombineFileInputFormat can be used; it is available in both the old and the new MapReduce APIs. Here is a nice blog entry on how to use it.

Dataset for Apriori algorithm

I am going to develop an app for Market Basket Analysis (using the Apriori algorithm), and I found a dataset that has more than 90,000 transaction records.
The problem is that this dataset doesn't have the names of the items in it and only contains the barcodes of the items.
I have just started the project and am doing research on the Apriori algorithm. Can anyone help me with this case: what is the best way to implement this algorithm using the following dataset?
These kinds of datasets are considered sensitive information and chain stores will not give them to you, but you can generate a sample dataset yourself using SQL Server.
The algorithm is defined independently of the identifiers used for the objects. Also, you didn't post the 'following data set' :P If your problem is that the algorithm expects your items to be numbered 0, 1, 2, ..., then just scan your data set and map each individual barcode to a number.
If you're interested, there have been some papers on how to represent frequent item sets very efficiently: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.163.4827&rep=rep1&type=pdf
The algorithm does not need the names of the items.
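To illustrate the remapping suggested above, here is a minimal sketch (the barcode strings are made up, and no particular Apriori library is assumed) that converts baskets of raw barcodes into baskets of consecutive integer item IDs and keeps a reverse map for reporting the mined item sets:

# Hypothetical transactions, each a basket of barcode strings.
transactions = [
    ["4006381333931", "8712100849084"],
    ["4006381333931", "5449000000996"],
]

barcode_to_id = {}
encoded = []
for basket in transactions:
    ids = []
    for barcode in basket:
        # Assign the next free integer the first time a barcode is seen.
        item_id = barcode_to_id.setdefault(barcode, len(barcode_to_id))
        ids.append(item_id)
    encoded.append(ids)

id_to_barcode = {v: k for k, v in barcode_to_id.items()}  # for reporting results later
print(encoded)             # e.g. [[0, 1], [0, 2]]
print(id_to_barcode[0])    # recovers the original barcode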
