Data source of English speech phrases - speech

I am doing research on developing a simulated environment in which students who use English as a second language can practice speaking English.
In one part of the development, I need a data source that contains commonly used English speech phrases, each tagged with the real-life situation it is used in. As an example:
“Ways to Apologise
Sorry.
I’m sorry.
I’m so sorry!
Sorry for your loss.”
I found several sites that provide this kind of content, such as http://edition.englishclub.com, but not a data source.
Has anybody used such a data source, one that can be used like ‘WordNet’? If so, please help me take this forward. Otherwise I will have to develop such a data source myself, which feels like reinventing the wheel.

Related

Is it possible to adapt an existing NLP tool in English to Swedish, and what's the best approach?

What's the best approach to using existing English NLP tools with another language, e.g. Spanish?
That's an awfully broad question, and you'd need to provide some more pointers. However, if you're interested in general research on the topic, you can try Hana, Feldman, Brew (2004) "Tagging Russian using Czech morphology" and Resnik's 2004 "Using bilingual text for monolingual annotation" and start from there.
In general, you'd want to have a parallel corpus (say, English/Swedish). Then establish word-level mappings between the two sides using alignment (that's a common topic in machine translation, with many established results).
You can then tag the English side and use the mapping to "translate" these tags onto the Swedish side. Then you can train the same tool that produced the tags on the English side, using the newly annotated Swedish corpus.
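The projection step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the alignment is given as (English index, Swedish index) pairs, which in practice would come from a word aligner, and the function name `project_tags`, the tag set, and the example sentence are all made up for the example.

```python
from collections import Counter


def project_tags(source_tags, alignment, target_length, unknown="UNK"):
    """Copy tags from source tokens onto the target tokens they align to.

    alignment is an iterable of (source_index, target_index) pairs.
    A target token aligned to several source tokens gets the most
    frequent tag; an unaligned target token gets the `unknown` tag.
    """
    votes = [Counter() for _ in range(target_length)]
    for src_i, tgt_i in alignment:
        votes[tgt_i][source_tags[src_i]] += 1
    return [v.most_common(1)[0][0] if v else unknown for v in votes]


# Toy sentence pair (hypothetical data, aligned one-to-one by hand).
english = ["the", "red", "house"]
english_tags = ["DET", "ADJ", "NOUN"]
swedish = ["det", "röda", "huset"]
alignment = [(0, 0), (1, 1), (2, 2)]

swedish_tags = project_tags(english_tags, alignment, len(swedish))
```

The resulting (token, tag) pairs on the Swedish side are what you would feed to the tagger as training data. Real alignments are noisy and often one-to-many, which is one reason quality drops.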
It goes without saying that you'll lose quite a bit of quality and that this technique only works for supervised methods. You should probably try to find properly annotated Swedish corpora and tools. There are a few out there.

New language in voice recognition

I am interested in voice recognition applications and algorithms, but I haven't actually used them in development yet.
I have several questions and would appreciate your advice.
I think I need to know:
What kind of open source software is available now? (Does Google's voice search program use any open source voice recognition software?)
Does any existing voice recognition software, even paid, provide an interface for adding a new language to be recognized?
(It is preferable for me to find/use libraries that could be adapted for a mobile application.)
If you think there is something else I should know, please let me know!
Thank you all very much.
Arsen
To add a new language, you must make your own TTS (text-to-speech) and STT (speech-to-text). Without them, speech recognition is not possible. It's way too complicated to create a text-to-speech engine for a new language. Just Google for your language and you will surely find a TTS. Or just proceed with making one. :P

Cooperation tool group: coding in C

Is there a tool, similar to codepad, for writing C code that lets me share my code with a group, so that the group can view and edit it simultaneously in real time?
I can't stress enough that this is going to make your work more difficult if you're planning on using it for anything other than something like a code review. However, what you're describing is called a real-time collaborative editor. There are a ton of them. I used one on Linux a while back whose name I can't remember, but in the meantime, let Wikipedia start you off...
http://en.wikipedia.org/wiki/Collaborative_real-time_editor
Edit:
The tool I used on Linux that worked well was called Gobby.
There are a bunch of others in this question on SO: "Real time tool for collaborative coding".
Sorry for resurrecting an old question but I thought I should share this.
I usually use Collab.Center (http://collab.center). Some features I like about it better than others are:
Online, real-time collaborative coding
Support for a lot of languages (40+, I think), e.g. C, C++, Java, HTML/CSS/JS, PHP, etc.
Text and Video (Webcam) chat (Requires Sign-In)
Syntax highlighting, auto-closing brackets, matching brackets, etc.
Ability to manage all your documents (Requires Sign-In)
Private documents (Requires Sign-In)
I think it would be great for you and your group, if you haven't already found an alternative.

How to collect data from a website

Preface: I have broad, college-level knowledge of a handful of languages (C++, VB, C#, Java, many web languages), so go with whichever you like.
I want to make an Android app that compares numbers, but in order to do that I need a database. I'm a one-man team, and the numbers get updated biweekly, so I want to grab those numbers off a wiki that gets updated as well.
So my question is: how can I access information from a website using one of the languages above?
What I understand the problem to be: Some entity generates a data set (i.e. numbers) every other week and you have a need to download that data set for treatment (e.g. sorting).
Ideally, the web site maintaining the wiki would provide a Service, like a RESTful interface, to easily gather the data. If that were the case, I'd go with any language that provides easy manipulation of HTTP request & response, and makes your data manipulation easy. As a previous poster said, Java would work well.
If you are stuck with the wiki page, you have a couple of options. You can parse the HTML your browser receives (Perl comes to mind as a decent language for that). Or you can use tools built for that purpose such as the aforementioned Jsoup.
Your question also mentions some implementation details such as needing a database. Evidently, there isn't enough contextual information for me to know whether that's optimal, so I won't address this aspect of the problem.
http://jsoup.org/ is a great Java tool for accessing content on HTML pages.
Consider https://scraperwiki.com/ - it's a site where users can contribute scrapers. It's free as long as you let your scraper be public. The results of your scraper are exposed as csv and JSON.
If you don't know what a "scraper" is, google "screen scraping" - it's a long and frustrating tradition for coders, who have dealt with the same problem you have since the beginning of networked computing.
You could check out: http://web-harvest.sourceforge.net/
For Python, BeautifulSoup is one of the most tolerant HTML parsers out there. The documentation also lists similar libraries in Ruby and Java, so you'll probably find something relevant there.
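To give a rough idea of what parsing the wiki's HTML involves, here is a minimal sketch that uses only Python's standard-library `html.parser` (not BeautifulSoup or Jsoup). It assumes the numbers sit in an ordinary HTML table; `SAMPLE_HTML`, `TableCellParser`, and `extract_rows` are made up for the example, and the real page's markup would need to be inspected first.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Stand-in for the downloaded wiki page (hypothetical content).
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>Alpha</td><td>12</td></tr>
  <tr><td>Beta</td><td>7.5</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
header, data = rows[0], rows[1:]
values = {name: float(value) for name, value in data}
```

In a real run the HTML string would come from the wiki itself, for instance via `urllib.request.urlopen(url).read().decode()`. A tolerant parser such as BeautifulSoup is likely the better choice once the markup gets messy.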

CakePHP website with English and Arabic support for the same database

I'm building a website in CakePHP 1.3. My requirement is a website with Arabic and English support. If a user enters information in Arabic, an English user viewing the same information should see it in English, and vice versa.
Localizing the labels I've done using po files. It's pretty straightforward.
For the database I'm using CakePHP's built-in Translate Behavior. But it doesn't translate anything; it just creates another copy of the data under the locale currently in use.
Please help me decide which direction I should move in.
I want to know the best practices that should be followed for this kind of scenario.
Maybe translating database values is not the best solution, and I should save the values in whatever language they arrive in.
Any help and suggestions would be highly appreciated.
It isn't actually possible to have CakePHP automatically translate data that is entered.
The Translate Behavior allows you to enter the same content in multiple languages and then retrieve the appropriate language from the database, based on the language that you currently have set in your config. It doesn't actually translate anything for you.
Theoretically, you could add a function to the Model::beforeSave() callback that would submit the Arabic text to a service like Google Translate and then save both Arabic and English versions to their appropriate tables, but the results won't necessarily be very good. As #deceze said in his comment to your question, machine translation is a hard problem.
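The idea can be illustrated with a small, language-agnostic sketch, written here in Python rather than PHP. It is not CakePHP code and does not use CakePHP's API: `TranslatedStore`, `stub_translator`, the locale codes, and the fallback to a default locale are all assumptions made for the example. It stores one copy of a field per locale and optionally fills in missing locales through a translator hook at save time.

```python
class TranslatedStore:
    """Keep one copy of each field per locale, like a translation table."""

    def __init__(self, default_locale="eng"):
        self.default_locale = default_locale
        self._rows = {}  # (record_id, field, locale) -> text

    def save(self, record_id, field, text, locale,
             translator=None, other_locales=()):
        """Store text under its locale; optionally machine-translate it.

        The translator hook plays the role of the before-save callback:
        it only fills locales that have no human-entered text yet.
        """
        self._rows[(record_id, field, locale)] = text
        if translator is not None:
            for target in other_locales:
                key = (record_id, field, target)
                if key not in self._rows:
                    self._rows[key] = translator(text, locale, target)

    def read(self, record_id, field, locale):
        """Return the text for a locale, else the default locale's text."""
        key = (record_id, field, locale)
        if key in self._rows:
            return self._rows[key]
        return self._rows.get((record_id, field, self.default_locale))


def stub_translator(text, source, target):
    # Hypothetical stand-in for a call to a machine translation service.
    lookup = {("مرحبا", "ara", "eng"): "Hello"}
    return lookup.get((text, source, target), text)


store = TranslatedStore()
store.save(1, "title", "مرحبا", "ara",
           translator=stub_translator, other_locales=["eng"])
english_title = store.read(1, "title", "eng")
arabic_title = store.read(1, "title", "ara")
```

Without the translator hook, the store behaves like the Translate Behavior described above: each locale only ever contains what a user entered in that locale.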

Resources