How to visualize a large dataset such as ucf101 or hmdb51?

I want to visualize some video datasets, but I don't know which tool to choose. TensorBoard? I don't think it solves my problem, because its visualization is based on traditional machine learning methods such as PCA. Is there another tool for visualizing a large dataset, such as the common video dataset ucf101 (about 6 GB)?

I've been building Sieve recently for exactly this. Upload your data and it splits it up into individual frames, tags them with information about the people, objects, and more - and then lets you visualize + export samples super quickly using the dashboard. You can request access for an API key for free if it's for personal use.

Related

Any examples of using a WandSearcher in Vespa (after a weighted set query)?

Currently I am using the REST interface to query Vespa, which seems to work great, but something tells me I should be using searchers in the application (bundling the jar file in the application package) to make the client (server-side code) a bit lighter and the whole flow a bit smoother. I have managed to write some simple searcher/processor applications, but this is a bit overwhelming.
So, are there any readily available examples?
Basically, I want to:
Send a request to /search?query=someId
Do an ordinary search for the weighted set on this document ID (I guess this one can be handy: https://docs.vespa.ai/documentation/reference/inspecting-structured-data.html)
Take the items in the response, add them to one or more wand items, and run a wand query with a WandSearcher on a given field, similar to the YQL:
"select * from sources * where wand(interest, some weightedsets));","ranking":"combined_score" and return the matches.
I'm also curious: apart from avoiding the string building I currently do for the HTTP request, are there any performance gains from using a searcher (the Java route) compared with REST?
Thanks for any insight or code I can start with.
There is an example of using the WandItem (YQL wand) at https://docs.vespa.ai/documentation/advanced-ranking.html. See also https://docs.vespa.ai/documentation/using-wand-with-vespa.html, as there are two wand implementations available in Vespa. From your description, wand() sounds like the one you want for this use case. For the first call, you probably want a dedicated document summary to reduce the amount of data fetched by your first query, with the option of serving it from memory only (see https://docs.vespa.ai/documentation/document-summaries.html).
Also see https://docs.vespa.ai/documentation/searcher-development.html as a general resource on writing searchers.
For your use case, it makes a lot of sense to write a searcher that performs these two queries: your second query depends on the first, and you avoid the cost of rendering, HTTP, and YQL parsing, which might matter if your client is remote with high network latency.

Unbalanced and small dataset for image classification

I have two datasets of images for classification:
Damaged chocolate packaging: 27 images
Undamaged chocolate packaging: 161 images
I have to write a Python classifier that distinguishes between the two and alerts production.
What is the best way to solve my problem? A CNN with transfer learning, or a k-means solution?
It looks like you have very little data. In my opinion, transfer learning is the best way to tackle this. However, try to find more data or use data augmentation (you can read this).
Also, if you are looking into transfer learning with a CNN, I think this tutorial can help you.
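As a rough illustration of the data augmentation idea, here is a minimal sketch in plain Python. It assumes an image is a list of pixel rows and that flips and 90-degree rotations do not change the label, which may not hold for every kind of packaging damage. The function names (`hflip`, `vflip`, `rot90`, `augment`, `copies_needed`) are made up for this example; a real pipeline would normally use the augmentation utilities of an image or deep learning library.

```python
import math

def hflip(img):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror an image top to bottom."""
    return [row[:] for row in img[::-1]]

def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    """Return the original image plus five simple geometric variants."""
    r90 = rot90(img)
    r180 = rot90(r90)
    r270 = rot90(r180)
    return [img, hflip(img), vflip(img), r90, r180, r270]

def copies_needed(minority_count, majority_count):
    """How many versions of each minority image roughly balance the classes."""
    return math.ceil(majority_count / minority_count)

# Class sizes from the question: 27 damaged vs 161 undamaged images.
factor = copies_needed(27, 161)

# A tiny 2x2 stand-in for a real image.
tiny = [[1, 2],
        [3, 4]]
variants = augment(tiny)
```

With the counts from the question, each damaged image needs about six versions to match the undamaged class, which happens to be the number of variants `augment` returns (27 x 6 = 162).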

Custom Searcher - Blending of hits from different sources

We need "Blending of hits from different sources", and according to your documentation the recommended way is to write a custom searcher in Java. Is there a demo of this somewhere on GitHub? I wouldn't even know where to start :( I understand that I can create search "chains", preferably asynchronous, and then blend the results in Java before returning them... but then how would I handle pagination, limits, etc.? This all seems very complicated for someone who doesn't know Java that well. So I am hoping someone has already written a demo for this. Please? Anyone?
Thank you so much
EDIT to make my question clearer:
We are writing a search engine that fetches data from various websites. Some websites have 10 million indexable items, others only 100,000. When we present the results to the end user, we want to include results from all our sources (when a match applies), say 10 results from each of the websites we crawl, so that they all get an equal amount of attention on the page. If we don't do custom blending, the largest website with the most items wins all our traffic.
I understand that we can send 10 separate queries to Vespa and blend the results in our front end, but that seems very inefficient. Hence the question about a "custom searcher". Thank you so much!
That documentation covers some very advanced use cases which you do not have. Are your sources different Vespa schemas or content clusters? If so Vespa will by default blend the hits returned from each according to their relevance scores so there's nothing you need to do.
The two other most common use-cases are:
Some (or all) of the data sources are external, so you need to write a Searcher component to fetch the external data and turn it into a Result.
You want the data to be blended in some custom way (rather than by relevance score). If so you need to exclude the default blending Searcher (com.yahoo.prelude.searcher.BlendingSearcher) and write your own.
If you provide some more information about your use cases I can give you some code examples.
EDIT: Use grouping to solve the need explained under "EDIT" in the question:
Create a "siteid" field when feeding (e.g. in document processing).
Use the grouping expression all(group(siteid) each(max(10) output(summary())))
See http://docs.vespa.ai/documentation/grouping.html
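To illustrate what the grouping expression above returns, here is a minimal Python sketch that mimics its effect on an in-memory list of hits. It is not Vespa code: the hit dictionaries, the `relevance` key, and the function name `top_hits_per_site` are assumptions made for the example. Only the `siteid` field and the limit of 10 come from the answer.

```python
from collections import defaultdict

def top_hits_per_site(hits, per_site=10):
    """Group hits by 'siteid' and keep the highest-relevance hits of each group."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["siteid"]].append(hit)
    return {
        site: sorted(group, key=lambda h: h["relevance"], reverse=True)[:per_site]
        for site, group in groups.items()
    }

# Hypothetical data: one large site with 50 matches, one small site with 3.
hits = [
    {"siteid": "big-site", "relevance": 1.0 - i / 100, "id": f"big-{i}"}
    for i in range(50)
] + [
    {"siteid": "small-site", "relevance": 0.3 - i / 100, "id": f"small-{i}"}
    for i in range(3)
]

blended = top_hits_per_site(hits)
```

The large site is capped at 10 hits while the small site still shows all 3 of its matches, so no single source takes over the whole result page.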

(bing) maps: +5000 pinpoints

I am building a map application with the Silverlight Bing Maps control.
In the map control, I want to show all of the subscribed customers.
There are somewhere between 5000 and 7000 customers, which means I can't show them all at once. I guess that would result in a crash.
How would you solve this issue?
I've read about events on zoom levels, about tile layers, and about spatial SQL,
but I have no idea what the right solution is in this situation or where to begin.
This seems like a pretty basic problem when working with maps, but there is little to no information on how to handle lots of data when working with Bing Maps.
Can anyone explain or point me to a good tutorial?
You can use a space-filling curve (SFC) or a spatial index to nest those points by the zoom level of your map application and achieve a cluster effect: http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves
There are many implementations of SFCs and Hilbert curves. I've uploaded my own at phpclasses.org (hilbert-curve, BSD licence), with a quadkey function for clustering, and I've successfully implemented it for some customers. The idea is to search for a quadkey from left to right so that you get only a portion of the POIs. www.maptiler.org uses a quadkey with a Z-curve. You will probably get better answers at gis.stackexchange. An SFC usually has a power-of-2 constraint.
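The quadkey idea can be sketched as follows. This is a minimal Python example, not the PHP class mentioned above: it assumes the quadkey scheme of the Bing Maps tile system (Web Mercator tiles, one base-4 digit per zoom level), and the function names and sample coordinates are made up for illustration.

```python
import math
from collections import defaultdict

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection

def lat_lon_to_tile(lat, lon, level):
    """Map a WGS84 coordinate to tile X/Y at the given zoom level."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    n = 1 << level  # tiles per axis: always a power of 2
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    return tile_x, tile_y

def tile_to_quadkey(tile_x, tile_y, level):
    """Interleave the bits of tile X/Y into a base-4 string, one digit per level."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

def cluster_by_quadkey(points, level):
    """Group (lat, lon) points by the quadkey of the tile containing them."""
    clusters = defaultdict(list)
    for lat, lon in points:
        key = tile_to_quadkey(*lat_lon_to_tile(lat, lon, level), level)
        clusters[key].append((lat, lon))
    return dict(clusters)

# Hypothetical customers: two in London, one in New York.
customers = [(51.5074, -0.1278), (51.5100, -0.1300), (40.7128, -74.0060)]
clusters = cluster_by_quadkey(customers, level=5)
```

At a low zoom level the two London customers fall into the same tile and can be drawn as a single pin with a count. Because a quadkey at a deeper level starts with the quadkey of its parent tile, customers stored with a deep quadkey can be selected for any coarser tile with a prefix search, which is the left-to-right search described above.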

Google Maps performance with polylines and markers

We are at a decision point: which technology to use for our highly loaded flight deals map.
There is a simple test at http://buruki.com/gmap, but if I choose London or Moscow (they have ~200-300 flight destinations), most browsers (Firefox 3.5 and IE for sure :-) ) are extremely slow.
For now there are only simple markers and simple polylines; MarkerManager and similar tools are not used.
I would like to ask the gmap experts: is it possible to get an almost immediate response time with ~200-300 polylines and markers on the map? If yes, are there any live examples from existing projects?
PS: we already have a Silverlight implementation (http://buruki.com/map). It has great speed and great disadvantages :-( A plugin is required, and Linux users are left out. Is it possible to achieve the same speed as Silverlight (or close to it) with gmaps?
Answer: yes, it is possible, not only for 200-300 but for more than 500 as well. I worked on an airline site similar to yours and attached more than 200 different markers and polylines within 5 milliseconds with Google Maps V3. I wrote a JavaScript file for that, with an array holding all the data with lat/long values, and then used a for loop to place a marker image at each lat/long from the array. So far no bugs have appeared, and everything works fine with heavy data. Thanks.

Resources