What are some of the algorithms involved in detecting user gestures based on skeleton movements? The ones I'm aware of include:
a) Hidden Markov models. You define a set of features such as hand position, elbow angle, etc. to feed into the HMM, then spend some time training the system and tweaking the parameters until it recognizes your gestures reliably enough (see the sketch after this list). I believe this is how Wii gestures are generally done. Good example with the Kinect here.
b) Connect the dots. If you have a limited vocabulary of gestures, you could set up collision spheres along the path that each hand would normally take, and have the gesture fail if the hands do not follow the path quickly enough.
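For a), here's the kind of thing I have in mind, as a rough sketch using the hmmlearn library in Python; the feature choices (hand position, elbow angle) are just placeholders:

```python
# Rough sketch of approach a): train one HMM per gesture on joint-feature
# sequences, then classify a new sequence by log-likelihood.
# Feature choice (hand position, elbow angle) is just a placeholder.
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def features(frames):
    # frames: list of skeleton frames; each gives hand position and elbow angle
    return np.array([[f["hand_x"], f["hand_y"], f["hand_z"], f["elbow_angle"]]
                     for f in frames])

def train_gesture_model(training_sequences, n_states=5):
    # Concatenate all recordings of one gesture; hmmlearn takes lengths separately.
    X = np.vstack([features(seq) for seq in training_sequences])
    lengths = [len(seq) for seq in training_sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=100)
    model.fit(X, lengths)
    return model

def classify(models, frames):
    # Pick the gesture whose HMM assigns the highest log-likelihood.
    X = features(frames)
    return max(models, key=lambda name: models[name].score(X))
```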
Both methods would probably require a lot of tweaking to get the success/fail rate the way you want. I'm wondering if there's other approaches that I'm not aware of and also what the advantages are of each of these.
Here is Matthew Tang's work on Hand Gesture Recognition. Hope the references in the article also help.
http://www.stanford.edu/class/ee368/Project_11/Reports/Tang_Hand_Gesture_Recognition.pdf
I need to develop a mobile app (primarily for Android, iOS, and Windows Mobile) for face detection. Obviously, OpenCV is the best-known option. However, I'm unsure about compatibility across the different OSes. Besides OpenCV, are there other options? 2 key requirements:
-Open source or commercial libraries, but they must run locally/natively on devices without an internet connection, so the Google Play Services API would not work
-Capable of tracking multiple faces in motion
Can anyone share their experiences/knowledge in this area? Any pointers greatly appreciated!
You are really pushing the envelope a whole lot here.
Face detection generally consists of three different areas.
1) Recognizing a face as a face (there is a mouth, a nose, eyes). This is useful for focusing a snapshot.
2) Recognizing facial features, looking for emotion (mouth in a smile) or eye tracking.
3) Facial recognition. Using the system to perform identification by attaching a name to a face.
You want to use a face recognition tool to perform tracking and count people entering a particular place, using a mobile phone.
First, tracking is pretty difficult. It's one thing to perform simple face detection on a single-frame snapshot; that's pretty easy. The problem is, you may find your frame rates so poor that you can only process one frame every three or even every five seconds. That will make it nearly impossible to track and count faces. Counting faces is easy; what's hard is determining whether a face on screen was counted previously or belongs to a new person entering the scene.
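To illustrate why that's hard: the naive approach is to match each detection to the nearest previously seen face by centroid distance, as in the sketch below (the pixel threshold is made up). With several seconds between frames, people simply move too far for this to hold up.

```python
# Minimal sketch: match detections to previously seen faces by centroid distance.
# With several seconds between frames, people move too far for this to be reliable.
import math

MAX_MATCH_DIST = 80  # pixels; made-up threshold

def centroid(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def update_tracks(tracks, detections, next_id):
    """tracks: {face_id: last_centroid}; returns (number of new faces, next_id)."""
    new_faces = 0
    for box in detections:
        c = centroid(box)
        match = min(tracks.items(),
                    key=lambda item: math.dist(c, item[1]),
                    default=None)
        if match is not None and math.dist(c, match[1]) < MAX_MATCH_DIST:
            tracks[match[0]] = c             # same person, update position
        else:
            tracks[next_id] = c              # treat as a new person
            next_id += 1
            new_faces += 1
    return new_faces, next_id
```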
OpenCV has a whole lot of tools and examples out there for facial recognition, image tracking, etc. I'd strongly recommend playing with OpenCV and testing its capabilities. I'd recommend the C/C++ versions (unless you are already a Python programmer). Here's a place to start: a blog entry I wrote a few months ago.
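For instance, a first face-detection test with OpenCV's Python bindings is only a few lines (the cascade file ships with OpenCV, though the path may differ on your install):

```python
# Minimal OpenCV face-detection test using the bundled Haar cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)            # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```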
I really like the tutorials from Kyle Hounslow; look him up on YouTube. His videos are well thought out and interesting, and he provides example code for all his work. Go ahead and watch all of those videos and repeat all of those examples. Get a feel for what frame rates are achievable on a laptop.
The next part of your task is porting stuff from OpenCV to Android/iOS. That's no easy task. I'm sure folks have tried, and I'm sure helpful hints are out there.
I don't mean to dissuade you from an awesome investigation but do note what you want to do is mighty difficult. You will have to invest some time to even determine where all the difficult areas are. And unfortunately you won't know effective frame rates and performance until you build some stuff and try it.
Good luck with the journey.
I am trying to implement face detection in OpenCV, but using Haar cascades it becomes very slow and cannot run in real time. I heard about SURF.
Can anyone help me implement fast face detection using SURF or another method?
If you are looking for a usage example of SURF, take a look at samples/c/find_obj.cpp. However, I doubt it will work faster than the Haar cascade classifier. The cascade classifier uses quite simple features (just rectangular regions of an image), while SURF is much more complicated.
You can also try other algorithms, starting from the very simple but inefficient Eigenfaces and ending with complicated but full-featured Active Appearance Models (see the list of implementations on the Wiki page). However, that will require a lot of programming and is still unlikely to beat the cascade classifier's results. So I would suggest reconsidering other parts of the system. For example, I believe it is possible to detect faces in a background thread and display the result with a small delay. Also, if you want to use it for head/face tracking, you can run the detector only on a region close to the previous location of the face.
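A rough sketch of that last idea using OpenCV's Python bindings (the padding value is arbitrary): run the detector on a padded region around the previous detection and fall back to the full frame when the face is lost.

```python
# Sketch: speed up per-frame detection by searching only a padded region
# around the previous face location, falling back to the full frame.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_near(gray, prev_box, pad=60):
    if prev_box is not None:
        x, y, w, h = prev_box
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, gray.shape[1]), min(y + h + pad, gray.shape[0])
        roi = gray[y0:y1, x0:x1]
        faces = cascade.detectMultiScale(roi, 1.1, 5)
        if len(faces) > 0:
            fx, fy, fw, fh = faces[0]
            return (fx + x0, fy + y0, fw, fh)   # map back to full-frame coords
    faces = cascade.detectMultiScale(gray, 1.1, 5)  # fallback: search whole frame
    return tuple(faces[0]) if len(faces) > 0 else None
```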
I am writing a bot for an RTS game.
I am using fuzzy logic to evaluate the current position (mine and the enemies') and to issue commands.
I have a couple of fuzzy variables: military_buildings, civilian_building, army_power, enemy_power and distance. I also have a couple of fuzzy linguistic values like VERY_GOOD, GOOD, NORMAL, BAD, VERY_BAD.
My next task is to make the bots learn, so they don't all behave the same way. Any advice or ideas on how to solve this?
One option is to use a GA for tuning the parameters (but I don't know the players' ratings, so I can't tell whether the bot beat a weak player or lost to a strong one).
Does anyone have experience with similar problems? (I can change the implementation and replace the fuzzy logic if there is an easier way for the bots to learn from experience.)
Have a look at reinforcement learning. Here are a quick preview and a book that can help you.
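If you go the reinforcement learning route, tabular Q-learning is the simplest place to start. A minimal sketch, assuming you can discretize your fuzzy evaluations into states and your commands into actions (the constants are typical defaults, not tuned values):

```python
# Minimal tabular Q-learning sketch. States, actions, and rewards would come
# from the game (e.g. discretized fuzzy evaluations and issued commands).
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # learning rate, discount, exploration

Q = defaultdict(float)                    # (state, action) -> estimated value

def choose_action(state, actions):
    if random.random() < EPSILON:
        return random.choice(actions)     # explore
    return max(actions, key=lambda a: Q[(state, a)])

def learn(state, action, reward, next_state, actions):
    # Standard Q-learning update toward reward plus discounted best next value.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```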
Based on your description, this is what I'd use :)
The idea of using GAs to tune the parameters to Fuzzy Linguistic Variables is a good one (I wish I thought of it!); the fuzzy logic gives you a nice continuous response curve while the GA will search through a large solution space. I think it's definitely a strategy worth pursuing; you should write up your results.
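A minimal sketch of that GA idea, assuming a chromosome is just a flat list of membership-function parameters and fitness comes from the bot's results in actual games (the fitness function below is a dummy placeholder):

```python
# Sketch: evolve the parameters of the fuzzy membership functions with a
# simple GA. A chromosome is a flat list of numbers; fitness should come from
# win rate in real games (play_games below is a dummy placeholder).
import random

POP_SIZE, N_PARAMS, GENERATIONS = 20, 10, 50
MUTATION_RATE, MUTATION_STEP = 0.1, 0.05

def play_games(params):
    # Placeholder fitness: replace with win rate from games played using
    # membership functions built from `params`.
    return -sum((g - 0.5) ** 2 for g in params)   # dummy objective so this runs

def mutate(chrom):
    return [g + random.gauss(0, MUTATION_STEP) if random.random() < MUTATION_RATE else g
            for g in chrom]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve():
    pop = [[random.random() for _ in range(N_PARAMS)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        scored = sorted(pop, key=play_games, reverse=True)
        parents = scored[:POP_SIZE // 2]           # simple truncation selection
        pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                         for _ in range(POP_SIZE - len(parents))]
    return max(pop, key=play_games)
```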
If I were you I would look at the annual AIIDE StarCraft Competition; it is sponsored in part by AAAI, so there are some really high-quality approaches to this problem, particularly if you are concerned with higher-level reasoning like resource management (see the StarCraft Competition site). The competitors' source code is all available as open source, so if you want to check out other techniques I recommend it. FYI, most of the top competitors for this type of problem have historically used some variant of a probabilistic state machine (see the paper on probabilistic FSMs), so this may make a good test bed for parameter tuning. This is also the approach that some of the top game AI middleware, like XAIT, uses.
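For a flavor of the probabilistic state machine idea: at its simplest, the bot picks its next high-level state from weighted transitions, and those weights are exactly the kind of parameters a GA could tune. The states and numbers below are made up:

```python
# Tiny probabilistic state machine: the bot picks its next high-level state
# according to weighted transitions. States and weights here are made up;
# the weights are the sort of parameters a GA could tune.
import random

TRANSITIONS = {
    "expand": {"expand": 0.5, "attack": 0.2, "defend": 0.3},
    "attack": {"expand": 0.1, "attack": 0.6, "defend": 0.3},
    "defend": {"expand": 0.3, "attack": 0.3, "defend": 0.4},
}

def next_state(current):
    options = TRANSITIONS[current]
    return random.choices(list(options), weights=list(options.values()))[0]

state = "expand"
for _ in range(10):
    state = next_state(state)
    print(state)   # issue commands appropriate to this high-level state
```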
Related to my previous question: is there a realistic chance of extracting surveillance camera positions from Google Street View pictures by means of computer vision algorithms? I'm no expert in that area, but it seems like it should be easier than face detection and the like.
I think you're wrong about it being an easier problem than face recognition (though I suspect you mean face detection).
Consider that faces have a reasonably regular shape and generally have two eyes, a nose, and a mouth in a specific configuration, whilst surveillance cameras from one manufacturer will look different from those of another, and look different again from different angles.
With faces, if you can't see the person's face you're not interested in it, but in your scenario you're interested in detecting the camera regardless of its position relative to you.
Whilst it's not impossible (humans can do it!), I don't think computer science is quite up to the task just yet.
This sounds like the class of problem for which Amazon's Mechanical Turk was invented. I don't believe that an image processing or image recognition algorithm is within our current understanding and hardware/software capabilities.
I definitely agree with Rob that extracting the camera locations is going to be more difficult than face detection (or even recognition).
How about a different tack on your question: how to find the location of the camera taking surveillance images.
There are standard (if complicated) photogrammetry techniques to map 2D or 3D coordinates of objects using photographs from multiple cameras or a single camera at multiple angles. What you're looking for would be "reverse photogrammetry" which I haven't seen before, but this interesting legal anecdote suggests it's feasible.
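To give a flavor of the forward problem: with known camera poses, triangulating a point seen in two views is standard; a sketch using OpenCV's cv2.triangulatePoints follows, with made-up projection matrices and pixel coordinates. The "reverse" problem would instead solve for an unknown camera's pose from known scene points.

```python
# Illustration of the forward photogrammetry problem: recover a 3D point from
# its image coordinates in two views with known projection matrices.
# The matrices and pixel coordinates below are made up for illustration.
import numpy as np
import cv2

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])      # intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                # camera 1 at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])  # camera 2 shifted 0.5 m

pts1 = np.array([[300.0], [250.0]])   # the same point seen in image 1 (pixels)
pts2 = np.array([[260.0], [250.0]])   # and in image 2

hom = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4x1 homogeneous coordinates
point_3d = (hom[:3] / hom[3]).ravel()
print(point_3d)
```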
When designing a UI for mobile apps in general, which resolution can be considered safe as a general rule of thumb? My interest lies specifically in web-based apps. The iPhone has a pretty high resolution for a handheld, and the Nokia E Series seems to be oriented differently. Is 240×320 still considered safe?
Not enough information...
You say you're targeting a "Mobile App" but the reality is that mobile could mean anything from a cell phone with 128x128 resolution to a MID with 800x600 resolution.
There is no "safe" resolution for such a wide range, and if you're truly targeting all of them you need to design a custom interface for each major resolution. Add some scaling factors in and you might be able to cut it down to 5-8 different interface designs.
Further, the UI means "User Interface" and includes a lot more than just the resolution - you can't count on a touchscreen, full keyboard, or even software keys.
You need to either better define your target, or explain your target here so we can better help you.
Keep in mind that there are millions of phone users who don't have PDA resolutions, and you can really only count on 128x128 or better to cover the majority of technically inclined cell phone users (those who know there's a web browser in their phone, never mind those who use it).
But if you're prepared to accept these losses, go ahead and hit for 320x240 and 240x320. That will give you most current PDA phones and up (older blackberries and palm devices had smaller square orientations). Plan on spending time later supporting lower resolution devices and above all...
Do not tie your app to a particular resolution.
Make sure your app is flexible enough that you can deploy new UI's without changing internal application logic - in other words separate the presentation from the core logic. You will find this very useful later - the mobile world changes daily. Once you gauge how your app is being used you can, for instance, easily deploy an iPhone specific version that is pixel perfect (and prettier than an upscaled 320x240) in order to engage more users. Being able to do this in a few hours (because you don't have to change the internals) is going to put you miles ahead of the competition if someone else makes a swipe at your market.
-Adam
Right now I believe it would make sense for me to target about two resolutions and later learn my customers' needs through feedback?
It's a chicken and egg problem.
Ideally before you develop the product you already know what your customers use/need.
Often not even the customers know what they need until they use something (and more often than not you find out what they don't need rather than what they need).
So in this case, yes, spend a little bit of time developing a prototype app that you can send out there to a few people and get feedback. They will have better feedback because they can try it out, and you will have a springboard to start from. The ability to quickly release UI updates without changing core logic will allow you test several interfaces quickly without a huge time investment.
Further, to customers you will seem really responsive to their needs, which will be a big benefit to people whose jobs depend on reaction time.
-Adam
You mentioned web-based apps. Do you have any particular framework in mind?
In many cases, WALL seems to help to a large extent.
Here's one article, Adapting to User Devices Using Mobile Web Technology, that exploits WALL.