Is culling child nodes a valid WURFL search strategy?

I've implemented a WURFL-based detection routine based on a strategy similar to the two-phase one outlined at http://wurfl.sourceforge.net/newapi/ .
This is working well but I would like to improve the worst case scenario if I can.
In the worst case scenario, at the moment, every device's user agent string is compared against the current user agent string.
What I'm curious about is how valid it would be to search the tree of devices and cull entire branches where device matches don't meet a minimum match threshold.
(Obviously ignoring 'root' devices that don't have user agent strings intended for matching)
Do user agent strings tend to follow a general pattern of ever-closer matches as one descends down the tree... and thus make the aforementioned strategy valid?
... Or are user agent strings a completely random beast in terms of parent-versus-child device matches, so that I really am forced to search the entire tree every single time?
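To make the idea concrete, here is a minimal Python sketch of the culling strategy, assuming a hypothetical Device node with user_agent, children and is_root attributes (the real WURFL APIs differ; this only illustrates the pruning logic in the question):

    # A rough sketch of branch culling during the tree search.
    # Device is a hypothetical node type, not part of any real WURFL API.
    from difflib import SequenceMatcher

    def match_score(target_ua, device_ua):
        """Crude similarity measure between two user agent strings."""
        return SequenceMatcher(None, target_ua, device_ua).ratio()

    def best_match(target_ua, root, threshold=0.5):
        best_score, best_device = 0.0, None
        stack = [root]
        while stack:
            device = stack.pop()
            if device.is_root:                  # 'root' devices: never matched, never used to cull
                stack.extend(device.children)
                continue
            score = match_score(target_ua, device.user_agent)
            if score < threshold:
                continue                        # cull this device and its whole subtree
            if score > best_score:
                best_score, best_device = score, device
            stack.extend(device.children)
        return best_device

Whether this is safe hinges on exactly the question above: if a child can match much better than its poorly matching parent, the cull discards the true best match.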

WURFL-Pro, the company behind the WURFL project, has a dual-licensing strategy. You can ask them about obtaining a GPL-free version of the library.
To understand the WURFL Java API implementation, you can browse the source available on SVN, or ask the authors.

How to secure fingerprint data?

Our apartment association is planning to implement biometric gate passes (a fingerprint turnstile) for all residents. But residents are concerned about the privacy of the fingerprint data stored in databases. This data resides on the association's hard disks, which are intended to be accessed by some contract employees working in our apartment complex.
How can I make sure the data is secure and not misused or sold?
I found the following explanation; can someone explain how this works?
Actually, a fingerprint template is nothing but a set of features of the finger, such as crossings, deltas, parallel lines, and curves. So, using the fingerprint template provided by regular attendance/access-control machines, you cannot generate an image of the real fingerprint. The templates are not unique: every time you register, you will get a different string. Also, only the machines have the matching algorithm, so validation happens only when the user gives a thumb impression to the machine. So, you will not have the security threats. If you see anything as a threat, you can say specifically which threat, and we can provide you a solution.

Looking for direction in database/platform choice before beginning a project

I am making a web application and am not sure what the best tech to use is. This will be for a cash-strapped charity. Ideally, I'd like the software to be as close to free as possible. But there are corners which shouldn't be cut (security).
The site will accept input from users.
The output must be permission based.
Losing control of the information would be beyond disastrous.
I would like a user to be able to request filtered sets of data.
I would like to automate finding duplicates, which are allowed in some columns. However, strings that match may in fact not be actual matches, and must be reviewed (see the rough sketch after this question).
And I would like to facilitate looking for matches manually (as some matches will not be literal matches and would be difficult to predict).
The users will be non-technical, point-and-click types, so that may affect security and usability options.
What is the best platform/database/whatever to secure the data and securely deliver the data (or subsets) to users with appropriate permissions?
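For the duplicate-review requirement, here is a minimal, standard-library-only Python sketch of flagging near-duplicate strings for a human to check. The column values and the 0.85 cutoff are assumptions to tune against real data:

    # Flag pairs of values that look similar enough to need manual review.
    from difflib import SequenceMatcher
    from itertools import combinations

    def candidate_duplicates(values, cutoff=0.85):
        pairs = []
        for a, b in combinations(values, 2):
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= cutoff:
                pairs.append((a, b, ratio))
        return sorted(pairs, key=lambda p: -p[2])

    names = ["Acme Charity Trust", "ACME Charity  Trust", "Beta Fund"]
    for a, b, score in candidate_duplicates(names):
        print(f"review: {a!r} ~ {b!r} ({score:.2f})")

The pairwise comparison is quadratic, which is fine for small tables but would need blocking or indexing for large ones.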
The easiest and cheapest platform for this would be Linux; however, it's entirely up to you to configure it securely.
Containerize what can be containerized, and use a whitelist approach rather than a blacklist approach for your input sanitization.
Use prepared statements in your SQL implementation, and salt and hash all user passwords (a small illustration follows this answer).
Use the latest versions and practices for the language you write the application in.
Your question is actually really broad; I'd recommend you check out OWASP ZAP for securing your web application, and perhaps even hire a pentester.
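To make the prepared-statements and password-hashing advice concrete, here is a minimal sketch using only the Python standard library (SQLite and PBKDF2); in a real application you would likely use an ORM and a vetted password library such as bcrypt or argon2:

    # Parameterized SQL plus salted, iterated password hashing.
    import hashlib
    import hmac
    import os
    import sqlite3

    conn = sqlite3.connect("charity.db")
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, salt BLOB, hash BLOB)")

    def store_user(name, password):
        salt = os.urandom(16)
        pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
        # The ? placeholders keep user input out of the SQL text (no injection).
        conn.execute("INSERT INTO users (name, salt, hash) VALUES (?, ?, ?)", (name, salt, pw_hash))
        conn.commit()

    def check_user(name, password):
        row = conn.execute("SELECT salt, hash FROM users WHERE name = ?", (name,)).fetchone()
        if row is None:
            return False
        salt, expected = row
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
        return hmac.compare_digest(candidate, expected)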

Are there any algorithms for creating playlists that don't require a massive database (or that work with a publicly accessible one)?

Is there any algorithm with which I can automatically create a playlist of songs that go well with each other -- similarly to services like iTunes Genius -- that a single developer can actually implement? It should either a) not require any sort of remote database of listening habits etc. or b) require such a database, but work with one that is freely available.
I did this, and I used the last.fm database as described by tomasz. I didn't use "related artist" directly, but instead constructed my own relationship graph by comparing tags associated with different artists (this is not the approach suggested by lcfseth, btw - I have quite a large range of music and I wanted to explore "natural" connections that might not be common partners in "normal" playlists; also I wasn't sure how uniform the related artists were).
I also used a local database to cache data from last.fm, because calls to the API are rate limited, and I experimented with using other parts of the API to improve / normalize the information I was reading from MP3 tags.
Generating a useful graph of related artists was actually quite hard, largely because some nodes in the graph naturally tend to be more important than others. If you don't "even out" the graph then your playlist will keep returning to the "important" artists.
The final result did work well, in that the selection of music had a good balance between "central theme" and variation. But the implementation is not at all polished, the calculation of the graph can take a long time (many hours), the program takes up a fair amount of memory when running, and it still seems to play Elvis Costello a little more than expected ;o)
If you are interested, the code is at http://code.google.com/p/uykfe/
The best part of all, from my point of view as a user, is that it can update Logitech Media Server (SqueezeServer) playlists in "realtime", adding a new track whenever the list is empty. That works really well in continuing from whatever music you select "by hand". It can also generate one-off playlists, of course, and, finally, by tweaking parameters you can get a kind of "random walk" through your music collection - it will play related tunes but slowly drift from one style to another (in fact, this is really the "default" mode - to get it to stay on a single theme I needed extra logic that biased it towards whatever music it had played earlier).
PS: Also, the dump of the final graph to Gephi was really cool - I had it printed out and it's now pinned to the wall...
PPS: I also experimented with the MusicBrainz database, which in theory sounds like a fantastic resource. But in practice it is over-complex and poorly documented.
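The weighted "random walk" idea above can be sketched in a few lines. The similarity graph here is invented by hand; uykfe builds its graph from last.fm tag data instead, and normalizing each node's outgoing weights is one simple way to "even out" over-important artists:

    # Toy weighted random walk over a hand-made artist similarity graph.
    import random

    similar = {
        "Elvis Costello": {"Nick Lowe": 0.9, "Blur": 0.3},
        "Nick Lowe": {"Elvis Costello": 0.9, "Rockpile": 0.8},
        "Blur": {"Elvis Costello": 0.3, "Pulp": 0.7},
        "Rockpile": {"Nick Lowe": 0.8},
        "Pulp": {"Blur": 0.7},
    }

    def next_artist(current):
        neighbours = similar[current]
        total = sum(neighbours.values())
        weights = [w / total for w in neighbours.values()]  # per-node normalization evens out hubs
        return random.choices(list(neighbours), weights=weights, k=1)[0]

    def playlist(seed, length=10):
        artists, current = [seed], seed
        for _ in range(length - 1):
            current = next_artist(current)
            artists.append(current)
        return artists

    print(playlist("Elvis Costello"))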
I don't know iTunes Genius, but I think the last.fm database and API might be useful for you. For any track it shows you a list of similar tracks, based on other users' preferences. The same information can be obtained using the track.getSimilar API method.
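As a hedged sketch, track.getSimilar can be called over the plain web service with the requests library. You need your own API key, and the exact response field names should be double-checked against the Last.fm API documentation:

    # Query Last.fm for tracks similar to a given track.
    import requests

    API_KEY = "your_lastfm_api_key"

    def similar_tracks(artist, track, limit=10):
        params = {
            "method": "track.getsimilar",
            "artist": artist,
            "track": track,
            "api_key": API_KEY,
            "format": "json",
            "limit": limit,
        }
        resp = requests.get("http://ws.audioscrobbler.com/2.0/", params=params, timeout=10)
        resp.raise_for_status()
        data = resp.json()
        # Field names below follow the published docs; verify against a real response.
        return [(t["artist"]["name"], t["name"], float(t["match"]))
                for t in data["similartracks"]["track"]]

    for artist, name, match in similar_tracks("Cher", "Believe"):
        print(f"{match:.2f}  {artist} - {name}")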
The idea behind most of these databases is to see what other users listen to after they listen to a given song. The accuracy of these statistics depends on the number of users, therefore it is probably hard to build this up locally. The algorithm itself is not that hard to implement.
The alternative would be to sort songs based on genre, artist, etc., which is information that is usually embedded in the songs, but not always. Winamp has this feature, but it won't work for old songs unless you manually set the information or use an online song database.
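For the tag-based alternative, here is a rough sketch that groups MP3 files by their embedded genre tag using the mutagen library; untagged files fall into an "Unknown" bucket, which is exactly the limitation mentioned above:

    # Group MP3 files by ID3 genre tag.
    from collections import defaultdict
    from pathlib import Path

    from mutagen.easyid3 import EasyID3

    def group_by_genre(music_dir):
        groups = defaultdict(list)
        for path in Path(music_dir).rglob("*.mp3"):
            try:
                genre = EasyID3(str(path)).get("genre", ["Unknown"])[0]
            except Exception:
                genre = "Unknown"
            groups[genre].append(path.name)
        return groups

    for genre, tracks in group_by_genre(Path("~/Music").expanduser()).items():
        print(genre, len(tracks))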

Save Information as a Data Net

My aim is to write an intelligent chatbot. It should store known information similarly to the human brain.
That is why I am looking for a file type which stores data as a net of connected keywords. What file type or database system could achieve this?
Further information:
The information input will be Wikipedia, Google search, and facts taught by a human during a conversation.
I could give specific information about my requirements and wishes, but I don't know whether any approach to this even exists. Maybe there are more useful specifications than my thoughts.
Just one example: the connections should have weights. Requesting an information net should increase the weights of the used connections.
What I expect is that the ChatBot could get real associations (or ideas) using the data net.
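As a tiny, in-memory sketch of what such a "data net" could look like (keywords as nodes, weighted connections, weights that grow whenever a connection is used), assuming nothing beyond the Python standard library:

    # Keywords as nodes, weighted links, reinforcement on each access.
    from collections import defaultdict

    class DataNet:
        def __init__(self):
            self.edges = defaultdict(lambda: defaultdict(float))

        def connect(self, a, b, weight=1.0):
            self.edges[a][b] += weight
            self.edges[b][a] += weight

        def associations(self, keyword, top=3):
            """Return the strongest associations and reinforce the connections used."""
            neighbours = sorted(self.edges[keyword].items(), key=lambda kv: -kv[1])[:top]
            for other, _ in neighbours:
                self.edges[keyword][other] += 0.1   # requesting the net strengthens it
                self.edges[other][keyword] += 0.1
            return neighbours

    net = DataNet()
    net.connect("dog", "animal")
    net.connect("dog", "bark")
    net.connect("animal", "cat")
    print(net.associations("dog"))

A real chatbot would of course persist this structure, which is where the graph-database suggestion below comes in.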
As an extension to my above comments:
A graph is definitely the way you want to go in terms of data representation...it maps perfectly to your problem description.
What you seem to be asking is how you can [persistently] store this information on disk (rather than in memory). That completely depends on what constraints you need. There is such a thing as a "graph database", which is more geared to storing graphs than, say, relational or hierarchical databases, and would perform far better than pushing your adjacency matrix or list to a flat file. Here's the Wikipedia entry:
http://en.wikipedia.org/wiki/Graph_database
Now, there is the issue of what happens when you have so many nodes and edges that you can't load them all into memory at once, and unfortunately if you have nodes that are connected to every other node, that can be a problem (because you won't be able to load the complete/valid graph). I can't answer that right now, but I'm sure there are paradigms to address this problem. I will update my answer after some digging.
Edit: You'll probably have to consult someone who knows more about graph databases. It's possible that there are ways to load chunks of the graph from the database without loading the whole thing. If that's your issue, you may want to reframe the question as one about working with large graphs stored in graph databases, tag it with graphs, databases, algorithms, and the like, and post it again in a more specific manner.
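As a hedged illustration of the "load only a chunk" idea, a graph database such as Neo4j lets you pull just one node's neighbourhood with a query instead of materializing the whole graph. The labels, relationship type, and property names here (Concept, RELATED, name, weight) are invented for illustration:

    # Fetch only the immediate neighbours of one concept from Neo4j.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    def neighbourhood(keyword, limit=50):
        query = (
            "MATCH (a:Concept {name: $name})-[r:RELATED]-(b:Concept) "
            "RETURN b.name AS name, r.weight AS weight "
            "ORDER BY r.weight DESC LIMIT $limit"
        )
        with driver.session() as session:
            return [(rec["name"], rec["weight"])
                    for rec in session.run(query, name=keyword, limit=limit)]

    print(neighbourhood("dog"))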

Common web problems where Neural Networks could help

I was wondering if you creative minds out there could think of some situations or applications in the web environment where neural networks would be suitable or provide an interesting spin.
Edit: Some great ideas here. I was thinking more web centric. Maybe bot detectors or AI in games.
To name a few:
Any type of recommendation system (whether it's movies, books, or targeted advertisement)
Systems where you want to adapt behaviour to user preferences (spam detection, for example)
Recognition tasks (intrusion detection)
Computer Vision oriented tasks (image classification for search engines and indexers, specific objects detection)
Natural Language Processing tasks (document/article classification, again search engines and the like)
The game located at 20q.net is one of my favorite web-based neural networks. You could adapt this idea to create a learning system that knows how to play a simple game and slowly learns how to beat humans at it. As it plays human opponents, it records data on game situations, the actions taken, and whether or not the NN won the game. Every time it plays, win or lose, it gets a little better. (Note: don't try this with too simple a game like checkers; an overly simple game can have every possible game/combination of moves pre-computed, which defeats the purpose of using the NN.)
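A loose sketch of that record-and-retrain loop, using a small scikit-learn MLP; the state encoding, the numeric action codes, and the game loop itself are placeholders you would supply for your own game:

    # Record (situation, action) pairs per game, label them by the outcome, retrain.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    history_X, history_y = [], []   # features = state vector + action code, label = won or not

    def choose_action(state_vec, actions):
        """Pick the action the net currently rates best (random until it has been trained)."""
        if not history_y or len(set(history_y)) < 2:
            return actions[np.random.randint(len(actions))]
        scores = [model.predict_proba([np.append(state_vec, a)])[0][1] for a in actions]
        return actions[int(np.argmax(scores))]

    def record_game(situation_action_pairs, won):
        for state_vec, action in situation_action_pairs:
            history_X.append(np.append(state_vec, action))
            history_y.append(1 if won else 0)
        if len(set(history_y)) > 1:          # need both wins and losses before fitting
            model.fit(np.array(history_X), np.array(history_y))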
Any sort of classification system based on multiple criteria might be worth looking at. I have heard of some company developing a NN that looks at employee records and determines which ones are the least satisfied or the most likely to quit.
Neural networks are also good for doing certain types of language processing, including OCR or converting text to speech. Try creating a system that can decipher CAPTCHAs, either from the graphical representation or the audio representation.
If you screen-scrape or accept other sites' item sales info for price comparison, an NN can be used to flag possible errors in the item description for a human to then eyeball.
Often, as one example, computer hardware descriptions are wrong about the capacity, speed, or features portrayed. Your NN will learn that, generally, a video card description should not contain a "RAID 10" string. If there is a trend toward adding RAID to GPUs, then your NN will learn this over time as the eyeball-er accepts such adverts, teaching the NN that this is now a new class of hardware.
This hardware example can be extended to other industries.
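An illustrative sketch of that flagging idea: train a small net to predict the product category from the description text, then flag items whose stated category disagrees with the prediction. The training data below is made up and far too small to be useful; it only shows the shape of the approach:

    # Flag listings whose description doesn't look like their stated category.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    train_text = [
        "nvidia video card 1gb gddr5 pci-e",
        "ati radeon graphics card hdmi",
        "raid 10 sas controller 8 port",
        "hardware raid controller battery backup",
    ]
    train_label = ["gpu", "gpu", "raid", "raid"]

    clf = make_pipeline(CountVectorizer(), MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000))
    clf.fit(train_text, train_label)

    def flag_suspect(description, stated_category):
        predicted = clf.predict([description])[0]
        return predicted != stated_category   # True -> send to a human to eyeball

    print(flag_suspect("geforce video card with raid 10 support", "gpu"))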
Web advertising based on consumer choice prediction
Forecasting of a user's Web browsing direction in micro-scale and the very short term (current session). This idea is quite similar to, and a generalisation of, the first one. A user browsing the Web could be presented with suggestions for other potentially interesting websites. The suggestions could be relevance-ranked according to a prediction calculated in real time during the user's activity. For instance, a list of proposed links or categories or tags could be displayed in the form of a cloud, with font size indicating rank score. Each and every click a user makes is an input to the forecasting system, so the forecast is constantly refined to provide the user with suggestions that match the user's interests as accurately as possible.
Ignoring the "common web problems" angle of the request and taking the "interesting spin" view instead.
One of the many ways that an NN can be viewed/configured is as a giant self-adjusting, multi-input, multi-output kind of case flow control.
So when you want to offer match-ups that are fuzzy (not to be confused with fuzzy logic per se, which is another area of maths/computing), an NN may offer a usable alternative.
Say, to save energy, you offer a lift-club site for one-off or regular trips. People enter where they are, where they want to go, and at what time. Sort by city and display in a browse control.
Using an NN you could, over time, match transport owners to transport seekers by watching which owners and seekers link up, since an owner may not live in the same suburb as a seeker. The NN learns over time what variance in the physical distance between owners and seekers appears to be acceptable, so it can then expand its search area when offering a seeker potential owners.
An idea.
Search! Recognize! Classify! Basically everything search engines do nowadays could benefit from a dose of neural networks and fuzzy logic. This applies in particular to multimedia content (e.g. content-indexing images and videos) since that's where current search technologies are lagging behind.
One thing that always amazes me is that we still don't have any pseudo-intelligent firewalling technology. Something that says "hey, this range of addresses is making too many requests when it's not supposed to", blocks them, and sends a report to an administrator. That could be done with a neural network.
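A very rough sketch of that idea: learn what "normal" per-minute request counts look like for a client from its recent history, then flag big deviations for blocking and reporting. The toy traffic series and the tolerance factor are made up; real features (time of day, URL mix, client reputation, etc.) would replace them:

    # Learn a baseline request rate and flag large deviations.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def make_windows(series, width=5):
        X = [series[i:i + width] for i in range(len(series) - width)]
        y = series[width:]
        return np.array(X), np.array(y)

    normal_traffic = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13, 15, 12, 14, 13, 16]
    X, y = make_windows(normal_traffic)

    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
    model.fit(X, y)

    def is_suspicious(recent_counts, current_count, tolerance=3.0):
        expected = model.predict([recent_counts])[0]
        return current_count > expected * tolerance   # if True: block and notify an admin

    print(is_suspicious([14, 13, 15, 12, 14], 400))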
On the nasty side of things, some virus makers could find lucrative uses for neural networks. Adaptive trojans that "recognize" credit card numbers on a hard drive (instead of looking for certain cookies) or that "learn" how to mask themselves from detectors automatically.
I've been having fun trying to implement a bot based on a neural net for the Diplomacy board game, interacting via DAIDE protocols. It turns out to be extremely tricky, so I've turned to XCS to simplify the problem.
Suppose eBay used neural nets to predict how likely a particular item was to sell; predict what the best day to list items of that type would be; suggest a starting price or "Buy It Now" price; or grade your description based on how likely it was to attract buyers. All of those could be useful features, if they worked well enough.
Neural net applications are great for representing discrete choices and the whole behavior of how an individual acts (or how groups of individuals act) when mucking around on the web.
Take news reading for instance:
Back in the olden days, you usually picked up one newspaper (a choice), picked a section (a choice), scanned a page and chose an article (a choice), and read the basics or the entire article (another choice).
Now you choose which news site to visit and continue as above, but now you can drop one paper, pick up another, click on ads, change sections, and keep going with few limits.
The whole use of the web and the choices people make based on their demographics, interests, experience, politics, time of day, location, etc. is a very rich area for NN application. This is especially relevant to news organizations, web page design, and ad revenue, and may even be an underexplored area.
Of course, it's very hard to predict what one person will do, but put 10,000 of them that are the same age, income, gender, time of day, etc. together and you might be able to predict behavior that will lead to better designs. Imagine a newspaper (or even a game) that could be scaled to people's needs based on demographics. An ad man's dream!
How about connecting users to the closest DNS, and making sure there are as few bounces as possible between the request and the destination?
Friend recommendation in social apps (LinkedIn, Facebook, etc.)
