Which is the best SIFT feature to use in a particular scenario?

I know about SIFT features of an image. But the ImageCLEF challenge mentions various SIFT feature descriptors - SIFT, C-SIFT, RGB-SIFT and OPPONENT-SIFT.
I am unable to find any resource that explains how these feature descriptors differ. How does each of them differ from the others, and how do I determine which one is good for a particular application?

Related

Database of scientific paper abstracts

I am trying to find a database of scientific papers which will allow me to:
1. Get metadata of papers by DOI (including abstracts);
2. Do this regularly (e.g. daily updates);
3. Download the whole existing database.
I know about the Crossref API; however, only 3% of all publications listed there have an abstract (and none of the biggest publishers, like Springer or Elsevier, provide them). On the other hand, I see projects like Dimensions or Researcher which have already implemented this functionality. So the question is: does anybody know of such services (possibly not free) and have experience working with them?
Have you looked at Semantic Scholar (https://www.semanticscholar.org/)? They have an API that supports the first of your requirements (http://api.semanticscholar.org/) and also provide the "Open Research Corpus" (http://labs.semanticscholar.org/corpus/) which should satisfy your third requirement. It is a smaller database than what is provided by Scopus or Web of Science, but both of those require subscriptions to fully use their APIs and don't (as far as I know) have a real way for you to purchase a full download of the database.
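For illustration, here is a minimal Python sketch of the first requirement (looking up metadata, including the abstract, by DOI). The endpoint path and the field names are assumptions taken from the Semantic Scholar API documentation, so check them against the current docs before building on this.

    import requests

    def fetch_metadata_by_doi(doi):
        # Endpoint and field names assumed from the Semantic Scholar Graph API docs.
        url = f"https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}"
        resp = requests.get(url, params={"fields": "title,abstract,year"})
        resp.raise_for_status()
        return resp.json()

    paper = fetch_metadata_by_doi("10.1038/nature14539")  # any real DOI works here
    print(paper.get("title"))
    print(paper.get("abstract"))

For daily updates (the second requirement) you would run something like this from a scheduled job, respecting the API's rate limits.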

Explanation of the IMAP protocol?

I'm looking for information about how the IMAP protocol works. Google yields only high-level information, but not enough to understand the details. I'd like to know enough to be able to create my own implementation. I found a C library which does it, but it is poorly documented.
Some basic questions are: what are IMAP UIDs and what are their guarantees? For example, will a UID ever change? Will it be reused if a message is deleted?
This looks like a good starting point:
http://www.imapwiki.org/ImapRFCList
In general, the keyword you want when searching for details on an internet protocol is "RFC". Add that to your search along with the name of the protocol and you should get off to a good start.
Google yields only high-level information, but not enough to understand the details.
Google is a general search engine, and its results will only be as good as the search terms you supplied. If you want to get detailed and definitive technical information about a protocol or standard or programming language, you should start by searching for the specification; i.e. use "specification" as one of your search terms.
I'd like to know enough to be able to create my own implementation. I found a C library which does it, but it is poorly documented.
If you've already found an implementation, why would you want to create another? Or even know enough to (hypothetically) create another?
I'm sure there are other open source implementations of IMAP around in various languages.
It is a bit much to expect an implementation of IMAP to be sufficiently well documented as to serve as a specification.
Some basic questions are: what are IMAP UIDs and what are their guarantees? For example, will a UID ever change? Will it be reused if a message is deleted?
I expect that these questions can be answered by reading the IMAP specification; see RFC 3501.
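To make the UID guarantees concrete, here is a short Python sketch using the standard-library imaplib module; the host and credentials are placeholders. Per RFC 3501, a message's UID does not change and is not reused within a mailbox as long as the mailbox's UIDVALIDITY value stays the same; if UIDVALIDITY changes, any cached UIDs must be discarded.

    import imaplib

    # Placeholder server and credentials -- substitute your own.
    HOST, USER, PASSWORD = "imap.example.com", "user@example.com", "secret"

    conn = imaplib.IMAP4_SSL(HOST)
    conn.login(USER, PASSWORD)

    # SELECT reports UIDVALIDITY for the mailbox; UIDs are only comparable
    # across sessions while this value is unchanged.
    conn.select("INBOX")
    print("UIDVALIDITY:", conn.response("UIDVALIDITY"))

    # Search and fetch by UID rather than by message sequence number:
    # sequence numbers shift when messages are expunged, UIDs do not.
    typ, data = conn.uid("SEARCH", "ALL")
    uids = data[0].split()
    if uids:
        latest = uids[-1].decode()
        typ, msg_data = conn.uid("FETCH", latest, "(FLAGS RFC822.SIZE)")
        print(msg_data)

    conn.logout()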

What are the Advantages of Content Repositories (not talking about CMSs)?

Given that a lot of people use content repositories, there must be a good reason. I'm building out a new web application that will need to store content. Can someone help me understand this?
What are the advantages of using a content repository like Apache Jackrabbit as opposed to writing your own code/API to store images or text pages? Writing your own requires time, etc., but so too does implementing and learning a new framework like the content repository API. A benefit of rolling your own, it seems to me, is that you know your code and have immediate expertise if you need to enhance or fix it. Using another framework you need to learn its foibles, and it is always easier to modify code you know than code you don't; i.e. you don't know that underlying framework code as well as your own.
As I said, a lot of people use them. There must be a reason. I can't see it as being just another "everyone is using them, so we should too." At least I hope it isn't that. :)
A JCR repository allows you to store all your content (from structured database-type data to large multimedia files) in a single place and with a single API, which is extremely convenient and makes your code simpler, avoiding the impedance mismatch between files and data that you usually have in content-based systems.
JCR also provides a lot of infrastructure functionality that you won't have to build or assemble yourself: search (including full-text), observation (callbacks when something changes), versioning, data types including multi-value, ordered nodes, etc.
If you allow a shameless plug, my "JCR - best of both worlds" article at http://java.dzone.com/articles/java-content-repository-best describes this in more detail and also provides a reading list for the JCR spec that should allow you to get a good overview without reading the whole thing.
The article uses Apache Sling for its examples, which combined with a JCR repository provides a very nice (IMO, but as a Sling committer I'm biased ;-) platform for content-based applications.
My most recent projects have involved both choices: a custom-built data store (MySQL and image files) with a multi-level caching mechanism, and a JCR-based commercial repository.
A few thoughts:
In the short run, a DIY solution offers reduced complexity: you only have to build and learn what you need. And there is at least the opportunity to optimize the data store for your particular application's needs -- more than likely speed of retrieval, but possibly storage footprint, security, or reliability concerns are foremost for you.
However, in the long run, you're looking at a significant increment of work to extend the home-grown system to a new content type (video, e.g.) or to provide new functionality (versioning, maybe).
Also, it's difficult to separate the choice of a data store approach from the choice of tools that content providers will use to populate and maintain the data store. You'll have to give your authors something more than an HTML form with a textarea and a submit button.
This is related to the advantages of standardization: compatibility and interchangeability. If everybody writes his own library and API, there is no compatibility and interchangeability, leading to higher cost.

Organizing lots of file uploads

I'm running a website that handles multimedia uploads as one of its primary uses.
I'm wondering what the best practices or industry standards are for organizing a lot of user-uploaded files on a server.
Your question is exceptionally broad, but I'll assume you are talking about storage/organisation/hierarchy of the files (rather than platform/infrastructure).
A typical approach to organisation is to upload files into a 3-level hierarchical structure based on the filename itself.
Eg. Filename = "My_Video_12.mpg"
Which would then be stored in,
/M/Y/_/My_Video_12.mpg
Or another example, "a9usfkj_0001.jpg"
/a/9/u/a9usfkj_0001.jpg
This way, you end up with a manageable structure that makes it easy to determine a file's location simply from its name. It also ensures that directories do not grow to a huge size and become incredibly slow to access.
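If it helps, here is a minimal Python sketch of that layout (the leading characters are used as-is, so adjust the case handling if you want the uppercase variant shown in the first example above):

    import os

    def shard_path(base_dir, filename, depth=3):
        # Pad very short names so there are always enough characters
        # to build every level of the hierarchy.
        padded = filename + "_" * depth
        levels = [padded[i] for i in range(depth)]
        return os.path.join(base_dir, *levels, filename)

    print(shard_path("/var/uploads", "a9usfkj_0001.jpg"))
    # -> /var/uploads/a/9/u/a9usfkj_0001.jpg

A common variant is to shard on the first characters of a hash of the filename instead, which spreads files evenly even when many names share a prefix.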
Just an idea, but it might be worth being more explicit as to what your question is actually about.
I don't think you are going to get any concrete answers unless you give more context and describe what the use-cases are for the files. Like any other technology decision, the 'best practice' is always going to be a compromise between the different functional and non-functional requirements, and as such the question needs a lot more context to yield answers that you can act upon.
Having said that, here are some of the strategies I would consider sound options:
1) Use the conventions dictated by the consumer of the files.
For instance, if the files are going to be used by a CMS/publishing solution, that system probably has some standardized solution for handling files.
2) Use a third-party upload solution. There are a bunch of tools that can help guide you to a solution that solves your specific problem. Tools like Transloadit, Zencoder and Encoding all have different options for handling uploads. Having a look at those options should give you an idea of what could be considered "industry standard".
3) Look at proven solutions, and mimic the parts that fit your use-case. There are open-source solutions that handle the sort of things you are describing here. Have a look at the different plugins for, for example, Paperclip, to learn how they organize files or, more importantly, what abstractions they provide that let you change your mind when the requirements change.
4) Design your own solution. Do a spike; it's one of the most efficient ways of exposing requirements you haven't thought about. Try integrating one of the tools mentioned above, and see how it goes. Software is soft, so no decision is final. Maybe the best solution is to just try something, and change it when it doesn't fit anymore.
This is probably not the concrete answer you were looking for, but like I mentioned in the beginning, design decisions are always a trade-off; "best practice" in one context could be the worst solution in another context :)
Best of luck!
From what I understand, you want a suggestion on how to store the files. If that is what you want, I would suggest having 2 different storage systems for your files.
The first storage would be a place to store the physical file, like a directory on your server (without FTP enabled, accessible to browsers or not, ...), or go for Amazon S3 (aws.amazon.com/en/s3/), Rackspace Cloud Files (www.rackspace.com/cloud/cloud_hosting_products/files/) or any other storage solution (you can even choose Dropbox, if you want). All of these options offer APIs to save/retrieve the files.
The second storage would be a database, to index and control the files. In the DB, which could be MySQL, MSSQL or a non-relational database like Amazon DynamoDB or SimpleDB, you store the link to your file (an HTTP link, the path to the file or anything like this).
Also, in the DB you can control and store any metadata of the file you want, and choose one or more of @ebaxt's solutions to get it. The metadata can be older versions of the file, the words of a text file, the camera model and geo-location of a picture, etc. Of course it depends on your needs and how it will really be used. You have a very large number of options, but without more info on what you intend to do it is hard to suggest a solution.
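As a rough illustration of the two-storage idea, here is a hedged Python sketch using boto3 for S3 and the standard-library sqlite3 module as a stand-in for the index database; the bucket name and table layout are made up for the example.

    import sqlite3
    import uuid

    import boto3

    BUCKET = "my-upload-bucket"  # placeholder; boto3 reads AWS credentials
                                 # from the environment or ~/.aws/credentials

    s3 = boto3.client("s3")
    db = sqlite3.connect("uploads.db")
    db.execute("""CREATE TABLE IF NOT EXISTS uploads (
                      id TEXT PRIMARY KEY,
                      original_name TEXT,
                      s3_key TEXT,
                      content_type TEXT)""")

    def store_upload(local_path, original_name, content_type):
        # Storage 1: put the physical file in S3.
        file_id = uuid.uuid4().hex
        key = f"uploads/{file_id}/{original_name}"
        s3.upload_file(local_path, BUCKET, key,
                       ExtraArgs={"ContentType": content_type})
        # Storage 2: record the link and metadata in the database.
        db.execute("INSERT INTO uploads VALUES (?, ?, ?, ?)",
                   (file_id, original_name, key, content_type))
        db.commit()
        return file_id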
In the Amazon tutorials area (http://aws.amazon.com/articles/Amazon-S3?browse=1) you can find many articles about it, like Netflix's Transition to High-Availability Storage Systems, Using the Java Persistence API with Amazon SimpleDB, and Petboard: An ASP.NET Sample Using Amazon S3 and Amazon SimpleDB.
Regards.

How to read and write extended Windows file attributes with Win32

I would like to embed some metadata in a Windows file.
I came across the concept of extended file attributes, which I believe are used for this very purpose. For example, camera name in JPGs, episode name in AVIs.
Apart from some very obscure, undocumented kernel APIs, I cannot find how to do this in C/C++ using the Win32 API.
Extended Attributes are a property of the filesystem, i.e. NTFS. The tags associated with JPEGs and AVIs are stored within the file itself. The Win32 APIs will only provide you with the EAs from the filesystem, not the ones embedded within the files. You'll have to look into third-party libraries to retrieve the embedded attributes.
In the general case, metadata can be formatted in any way that is easy for your application to access. The RDF specification was created to provide a standard set of metadata capabilities that cover most of the generally useful kinds of information.
However, the problem is always finding a way to store it alongside the real data in a way that doesn't disturb applications that think they know how to handle the format. This can be particularly tricky for well-known formats.
Adobe has done a lot of research on this problem, and is backing a technology they call XMP to achieve a good result. XMP includes metadata in a style closely related to RDF, along with conventions for packing it inside many other file formats, or in side-car files for those cases where there just is no portable way to fit the data inside.
On a Windows system with all files stored on NTFS volumes, it is conceivable that extended attributes and alternate data streams could be used to store metadata. The big issue with this is one of portability. The alternate streams will be lost if the file is copied to media that does not support them, such as any flavor of FAT as well as the file systems used on CDs and DVDs.
This is a serious defect that makes keeping a valid and complete backup of such a file more difficult than is practical for most users.
There are applications that use alternate data streams, but they do so knowing that the value they add can be lost when the file is copied.
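If you do go the alternate-data-stream route, the idea is simply to open a path of the form "filename:streamname". A C/C++ program would pass such a path to CreateFile; the short Python sketch below relies on the same syntax reaching the Win32 layer, and the path and stream name are made up for the example. It only works on Windows with the file on an NTFS volume.

    # Only meaningful on Windows with the file stored on NTFS.
    MAIN_FILE = r"C:\temp\example.txt"        # placeholder path
    STREAM = MAIN_FILE + ":user.metadata"     # hypothetical stream name

    # Create the main file (placeholder content).
    with open(MAIN_FILE, "w", encoding="utf-8") as f:
        f.write("main document data")

    # Write metadata into the alternate stream; the main file's contents
    # are untouched and most tools will not even see the extra stream.
    with open(STREAM, "w", encoding="utf-8") as ads:
        ads.write('{"camera": "ExampleCam", "episode": "Pilot"}')

    # Read it back.
    with open(STREAM, "r", encoding="utf-8") as ads:
        print(ads.read())

As noted above, the stream is silently dropped when the file is copied to FAT, optical media, or most network and cloud storage.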

Resources