I am a master's student in applied maths, and in one year I will start my master's project (which I would like to continue as a PhD thesis).
My question is about the feasibility of my project and how I can improve or modify it.
My project is about sports videos (for example, freestyle snowboarding).
There are a lot of professional snowboarders who upload their tricks to the internet (which constitutes a huge database). What I want to do is collect all the videos (I guess that won't be a problem), analyse them, and try to find a pattern in the tricks (the figures made by the riders). By 'analyse them', I mean create a kind of artificial intelligence that first recognizes the trick (I will construct a model for each trick) and then tries to give advice on how to improve it (by analysing your position before the jump and the position of your body in the air).
This AI could be useful for judges in contests and for people learning to snowboard.
I tried to imagine how to do it even though I have not finished my master's, which is why I am asking here: is this algorithm totally impossible (because of the time it would require, or for some other reason)? Should I focus on one part of the project? (I guess it will mix different topics, so maybe I should tackle just one step.)
Sorry for the long post. Thank you for reading this unusual question; I hope someone will have an answer to my problem.
In a broader perspective, you might want to use video intelligence. A video is just a sequence of frames (images). These images could be fed to a convolutional neural network. But the network also needs to remember what it saw in earlier frames, so you need a recurrent neural network as well.
A hybrid of the two is a deep convolutional recurrent neural network.
Place some Conv2D layers and feed them some frames of the video.
Add LSTM layers.
Add Dense layers and the output layer.
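The three steps above could be sketched in Keras roughly as follows. This is only a minimal sketch under assumptions: the clip shape (16 frames of 64x64 RGB), the layer sizes, and the number of trick classes (`NUM_TRICKS`) are all made up for illustration, and `build_model` is a hypothetical helper name. `TimeDistributed` applies the same Conv2D layer to every frame before the LSTM reads the resulting sequence.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumed clip shape and class count, chosen only for illustration.
NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 16, 64, 64, 3
NUM_TRICKS = 5  # hypothetical number of trick classes


def build_model():
    """Conv2D on each frame, then an LSTM over the frame sequence, then Dense layers."""
    model = keras.Sequential([
        keras.Input(shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)),
        # Step 1: the same convolutional layers are applied to every frame.
        layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D()),
        layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu")),
        # Collapse each frame's feature map to one feature vector.
        layers.TimeDistributed(layers.GlobalAveragePooling2D()),
        # Step 2: the LSTM carries information from earlier frames forward.
        layers.LSTM(64),
        # Step 3: dense layers and the output layer (one score per trick).
        layers.Dense(32, activation="relu"),
        layers.Dense(NUM_TRICKS, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
```

Training would then be a call to `model.fit` on arrays of shape `(num_clips, 16, 64, 64, 3)` with one integer label per clip. Real clips are longer and larger than this, so you would probably have to subsample frames and resize them first.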
I need to develop a mobile app (primarily for Android, iOS, and Windows Mobile) for face detection. Obviously, OpenCV is the best-known option. However, I'm unsure about its compatibility across the different OSes. Besides OpenCV, are there other options? Two key requirements:
- Open-source or commercial libraries are both fine, but they must run locally/natively on the device without an internet connection, so Player Service API would not work
- Capable of tracking multiple faces in motion
Can anyone share their experience or knowledge in this area? Any pointers are greatly appreciated!
You are really pushing the margin a whole lot.
Face detection generally consists of three different areas.
1) Recognizing a face as a face (there is a mouth, a nose, eyes). This is useful for focusing a snapshot.
2) Recognizing facial features, looking for emotion (mouth in a smile) or eye tracking.
3) Facial recognition. Using the system to perform identification by attaching a name to a face.
You want to use a face recognition tool to perform tracking and count people entering a particular place, using a mobile phone.
First, tracking is pretty difficult. It's one thing to perform simple face identification in a single-frame snapshot. That's pretty easy. The problem is that you may find your frame rates so poor that you can only process one frame every three or even five seconds. That will make it nearly impossible to track and count faces. Counting the faces in a frame is easy; what's hard is determining whether a face on the screen was counted previously or is a new person entering the scene.
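To make the "was this face already counted?" problem concrete, here is a minimal sketch of nearest-centroid matching between consecutive frames. It assumes you already have one (x, y) centre per detected face from some detector; the class name `CentroidCounter` and the `max_distance` threshold are invented for this illustration, and real trackers are considerably more robust than this.

```python
import math


class CentroidCounter:
    """Counts distinct faces by matching each detection to the nearest previous one."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # pixels a face may move between frames
        self.tracks = {}                  # track id -> last known (x, y)
        self.next_id = 0

    def update(self, centroids):
        """Match this frame's face centres to known tracks and return their ids."""
        unmatched = dict(self.tracks)
        current = {}
        ids = []
        for x, y in centroids:
            best_id, best_dist = None, self.max_distance
            for track_id, (tx, ty) in unmatched.items():
                dist = math.hypot(x - tx, y - ty)
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                # No previous face is close enough: treat it as a new person.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            current[best_id] = (x, y)
            ids.append(best_id)
        self.tracks = current  # faces that were not matched have left the frame
        return ids

    @property
    def total_seen(self):
        return self.next_id


counter = CentroidCounter(max_distance=50)
first = counter.update([(100, 100), (300, 120)])
second = counter.update([(110, 105), (305, 118)])   # both faces moved slightly
third = counter.update([(115, 108), (600, 400)])    # one face left, a new one appeared
```

This also shows why frame rate matters: at one frame every few seconds, a person can easily move further than `max_distance` between frames, and the same face gets counted again as a new one.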
OpenCV has a whole lot of tools and examples out there for facial recognition, image tracking, etc. I'd strongly recommend playing with OpenCV and testing its capabilities. I'd recommend the C/C++ versions (unless you are already a Python programmer). Here's a place to start, a blog entry I wrote a few months ago.
I really like the tutorials from Kyle Hounslow. Look him up on YouTube. His videos are well thought out and interesting, and he provides example code for all his work. Go ahead and watch all of those videos and repeat all of those examples. Get a feel for what frame rates are achievable on a laptop.
The next part of your task is porting stuff from OpenCV to Android/iOS. That's no easy task. I'm sure folks have tried, and I'm sure helpful hints are out there.
I don't mean to dissuade you from an awesome investigation, but do note that what you want to do is mighty difficult. You will have to invest some time just to determine where all the difficult areas are. And unfortunately, you won't know effective frame rates and performance until you build some stuff and try it.
Good luck with the journey.
I remember reading that SceneKit has a polygon limit of 200k. However, I haven't been able to find the source where I read it, so I have no idea if it's current information or even correct at all.
It may have been a WWDC session. Either way, what I need to know is:
Is this correct?
Is this limitation on the entire scene or just what is being rendered at any one time?
I don't think SceneKit has a hard limit on the number of polygons it will render. Instead, you'll see a gradual performance falloff as that number (and other measures of CPU and GPU usage) goes up. 200,000 is probably well into that falloff on some devices and perfectly fine on others.
You might be thinking about some of the advice in the WWDC 2014 session "Building a Game with SceneKit". That talk shows several metrics you can use to gauge a SceneKit app's performance and strategies for dealing with different kinds of bottlenecks. I'd recommend watching the video if you haven't already.
What are some of the algorithms involved in detecting user gestures based on skeleton movements? The ones I'm aware of include:
a) Hidden Markov models. You define a number of input features, such as hand position, elbow angle, etc., to feed into your HMM. Then you spend some time training the system and tweaking the parameters until it can recognize your gestures reliably enough. I believe this is how Wii gestures are generally done. Good example with the Kinect here.
b) Connect the dots. If you have a limited vocabulary of gestures, you could set up collision spheres along the path that each hand would normally take. You could have the gesture fail if they do not follow the path quickly enough.
Both methods would probably require a lot of tweaking to get the success/fail rate the way you want. I'm wondering if there are other approaches that I'm not aware of, and also what the advantages of each are.
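For reference, approach (b) could be sketched like this. It is only an illustration under assumptions: hand positions arrive as timestamped (t, x, y, z) samples from the skeleton tracker, the gesture path is a hand-placed list of sphere centres, and the function name `follows_path` and the `radius`/`max_gap` parameters are invented for the example.

```python
import math


def follows_path(samples, waypoints, radius, max_gap):
    """Return True if the hand enters every sphere in order, quickly enough.

    samples   -- list of (t, x, y, z) hand positions, ordered by time t (seconds)
    waypoints -- list of (x, y, z) sphere centres along the expected path
    radius    -- sphere radius, same units as the positions
    max_gap   -- maximum seconds allowed between entering consecutive spheres
    """
    next_wp = 0
    last_hit = None
    for t, x, y, z in samples:
        if next_wp == len(waypoints):
            break  # gesture already completed
        if last_hit is not None and t - last_hit > max_gap:
            return False  # the hand was too slow reaching the next sphere
        if math.dist((x, y, z), waypoints[next_wp]) <= radius:
            next_wp += 1
            last_hit = t
    return next_wp == len(waypoints)


# A hypothetical left-to-right swipe made of three spheres along the x axis.
swipe = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
fast = [(0.0, 0, 0, 0), (0.2, 0.5, 0, 0), (0.4, 1, 0, 0), (0.6, 1.5, 0, 0), (0.8, 2, 0, 0)]
slow = [(2 * t, x, y, z) for t, x, y, z in fast]

fast_ok = follows_path(fast, swipe, radius=0.3, max_gap=0.5)
slow_ok = follows_path(slow, swipe, radius=0.3, max_gap=0.5)
```

The tweaking mentioned above shows up directly here as `radius` and `max_gap`: widen them and you get false positives, tighten them and real gestures start failing.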
Here is Matthew Tang's work on Hand Gesture Recognition. I hope the references in the article also help.
http://www.stanford.edu/class/ee368/Project_11/Reports/Tang_Hand_Gesture_Recognition.pdf
I was wondering if you creative minds out there could think of some situations or applications in the web environment where Neural Networks would be suitable or an interesting spin.
Edit: Some great ideas here. I was thinking more web centric. Maybe bot detectors or AI in games.
To name a few:
Any type of recommendation system (whether it's movies, books, or targeted advertisement)
Systems where you want to adapt behaviour to user preferences (spam detection, for example)
Recognition tasks (intrusion detection)
Computer Vision oriented tasks (image classification for search engines and indexers, specific objects detection)
Natural Language Processing tasks (document/article classification, again search engines and the like)
The game located at 20q.net is one of my favorite web-based neural networks. You could adapt this idea to create a learning system that knows how to play a simple game and slowly learns how to beat humans at it. As it plays human opponents, it records data on game situations, the actions taken, and whether or not the NN won the game. Every time it plays, win or lose, it gets a little better. (Note: don't try this with too simple a game, such as checkers. An overly simple game can have every possible combination of moves pre-computed, which defeats the purpose of using the NN.)
Any sort of classification system based on multiple criteria might be worth looking at. I have heard of some company developing a NN that looks at employee records and determines which ones are the least satisfied or the most likely to quit.
Neural networks are also good for doing certain types of language processing, including OCR or converting text to speech. Try creating a system that can decipher CAPTCHAs, either from the graphical representation or the audio representation.
If you screen-scrape or accept other sites' item sales info for price comparison, a NN can be used to flag possible errors in the item description for a human to then eyeball.
As one example, computer hardware descriptions often misstate capacity, speed, or features. Your NN will learn that a video card description should generally not contain a "Raid 10" string. If there is a trend to add RAID to GPUs, your NN will learn this over time: each time the human reviewer accepts such an advert, it teaches the NN that this is now a new class of hardware.
This hardware example can be extended to other industries.
Web advertising based on consumer choice prediction
Forecasting a user's Web browsing direction at micro scale and in the very short term (the current session). This idea is quite similar to the first one; it is a generalisation of it. A user browsing the Web could be offered suggestions of other potentially interesting websites. The suggestions could be ranked by relevance according to a prediction calculated in real time during the user's activity. For instance, a list of proposed links, categories, or tags could be displayed as a cloud in which font size indicates rank score. Each and every click a user makes is an input to the forecasting system, so the forecast is constantly refined to give the user suggestions that match their interests as accurately as possible.
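The display side of that tag cloud is the easy part and could look something like the sketch below. The scores are assumed to come from whatever predictor you use (a neural network or anything else); the function name `cloud_font_sizes` and the pixel range are made up for the example.

```python
def cloud_font_sizes(scores, min_px=10, max_px=32):
    """Map relevance scores to font sizes by linear scaling.

    scores -- dict of label -> predicted relevance (any numeric scale)
    Returns a list of (label, font size in px), most relevant first.
    """
    if not scores:
        return []
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    sized = []
    for label, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        # If every score is equal, show everything at the largest size.
        frac = (score - lo) / span if span else 1.0
        sized.append((label, round(min_px + frac * (max_px - min_px))))
    return sized


# Hypothetical scores for one user's current session.
cloud = cloud_font_sizes({"snowboard": 9, "ski": 5, "surf": 1})
```

After every click you would recompute the scores and call the function again, so the cloud changes as the session goes on.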
I'm ignoring the "common web problems" angle of the request and taking the "interesting spin" view instead.
One of the many ways that a NN can be viewed or configured is as a giant self-adjusting, multi-input, multi-output kind of case flow control.
So when you want to offer match-ups that are fuzzy (not to be confused directly with fuzzy logic per se, which is another area of maths/computing), a NN may offer a usable alternative.
Say that, to save energy, you run a lift club site for one-off or regular trips. People enter where they are, where they want to go, and at what time. You sort by city and display the results in a browse control.
Using a NN you could, over time, match transport owners to transport seekers by watching which owners and seekers link up. An owner may not live in the same suburb as a seeker. The NN learns over time how large a difference in physical location between owners and seekers appears to be acceptable, so it can then expand its search area when offering a seeker potential owners.
An idea.
Search! Recognize! Classify! Basically everything search engines do nowadays could benefit from a dose of neural networks and fuzzy logic. This applies in particular to multimedia content (e.g. content-indexing images and videos) since that's where current search technologies are lagging behind.
One thing that always amazes me is that we still don't have any pseudo-intelligent firewalling technology. Something that says "hey, this range of URLs is making too many requests when it is not supposed to", blocks it, and sends a report to an administrator. That could be done with a neural network.
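To be clear about what such a system would replace, here is the non-intelligent baseline: a fixed-threshold sliding-window counter. This is not a neural network, just a sketch of the block-and-report behaviour described above; the class name `RateMonitor` and its parameters are invented for the example, and a learned model would take the place of the hard-coded `max_requests` rule.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Blocks any source that exceeds max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # source -> timestamps of recent requests
        self.blocked = set()
        self.reports = []               # messages for the administrator

    def record(self, source, timestamp):
        """Register one request; return True if allowed, False if blocked."""
        if source in self.blocked:
            return False
        recent = self.hits[source]
        recent.append(timestamp)
        # Forget requests that have fallen out of the time window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_requests:
            self.blocked.add(source)
            self.reports.append(
                f"{source}: {len(recent)} requests in {self.window}s, blocked"
            )
            return False
        return True


monitor = RateMonitor(max_requests=3, window_seconds=10)
burst = [monitor.record("10.0.0.0/24", t) for t in (0, 1, 2, 3, 4)]
steady = [monitor.record("192.168.1.0/24", t) for t in (0, 20, 40, 60)]
```

The weakness of a fixed rule like this is that "too many" depends on the source and on the time of day, which is where something that learns normal traffic patterns could do better.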
On the nasty side of things, some virus makers could find lucrative uses for neural networks: adaptive trojans that "recognize" credit card numbers on a hard drive (instead of looking for certain cookies) or that "learn" how to mask themselves from detectors automatically.
I've been having fun trying to implement a bot based on a neural net for the Diplomacy board game, interacting via DAIDE protocols. It turns out to be extremely tricky, so I've turned to XCS to simplify the problem.
Suppose eBay used neural nets to predict how likely a particular item was to sell, predict what the best day to list items of that type would be, suggest a starting price or "buy it now" price, or grade your description based on how likely it was to attract buyers? All of those could be useful features, if they worked well enough.
Neural net applications are great for representing discrete choices and the whole behavior of how an individual acts (or how groups of individuals act) when mucking around on the web.
Take news reading for instance:
Back in the olden days, you picked up usually one newspaper (a choice), picked a section (a choice), scanned a page and chose an article (a choice), and read the basics or the entire article (another choice).
Now you choose which news site to visit and continue as above, but now you can drop one paper, pick up another, click on ads, change sections, and keep going with few limits.
The whole use of the web and the choices people make based on their demographics, interests, experience, politics, time of day, location, etc. is a very rich area for NN application. This is especially relevant to news organizations, web page design, and ad revenue, and it may even be an under-explored area.
Of course, it's very hard to predict what one person will do, but put together 10,000 of them with the same age, income, gender, time of day, etc. and you might be able to predict behavior that will lead to better designs. Imagine a newspaper (or even a game) that could be scaled to people's needs based on demographics. An ad man's dream!
How about connecting users to the closest DNS, and making sure there are as few bounces as possible between the request and the destination?
Friend recommendation in social apps (LinkedIn, Facebook, etc.)