Comments in multilingual support

I am building a web application to support factory automation. Its users come from various countries and speak different languages. I have internationalized all the strings in the website so that every user can understand it. However, users also have to write comments about factory operations; they will write these in their own language, so users in other countries may not understand them.
I would like to know the best practices for this scenario.
One option I considered is not to let users write free-text comments at all, and instead to offer a drop-down of predefined comments that I can internationalize. But this is not an elegant solution, since the list of possible comments may not be comprehensive.

There isn't really a solid no-fail solution available for this kind of problem, but here are some possibilities:
Leverage a translation engine and machine-translate the comments. How well this works depends on the engine and the language pair, but it gives the reader the gist of the meaning. This approach becomes much less useful when the comments contain many technical or proprietary terms. Many international webshops use this technique.
Encourage your users to post comments in a common language, or a language that most of your users will know, such as English, Chinese or Spanish, depending on your markets.
Employ translators to regularly translate essential comments.
The solution you mentioned is also pretty decent when the set of possible comments is limited; otherwise it will spin out of control very fast.
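As a rough illustration of the drop-down option, here is a minimal Python sketch. It assumes you store only a comment code and render the text in each viewer's locale. The catalogue, the codes and the function name are all made up for the example, not taken from any framework.

```python
# Hypothetical catalogue: comment codes mapped to per-locale text.
CANNED_COMMENTS = {
    "machine_stopped": {
        "en": "Machine stopped unexpectedly.",
        "es": "La máquina se detuvo inesperadamente.",
    },
    "maintenance_due": {
        "en": "Maintenance is due.",
        "es": "El mantenimiento está pendiente.",
    },
}

DEFAULT_LOCALE = "en"


def render_comment(code: str, locale: str) -> str:
    """Return the canned comment for `code` in the viewer's locale.

    Falls back to the default locale when no translation exists.
    """
    translations = CANNED_COMMENTS[code]
    return translations.get(locale, translations[DEFAULT_LOCALE])


# The author picks an entry from the drop-down; only the code is stored.
stored_code = "machine_stopped"
print(render_comment(stored_code, "es"))  # shown to a Spanish-speaking viewer
print(render_comment(stored_code, "fr"))  # no French text, so English is used
```

Because the stored value is a code rather than text, adding a language later only means extending the catalogue; existing comments need no migration.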


Why is convention and consistency important while working with data fields/names?

The issue is about good practice with databases, form fields, and coding in general.
We run a content platform, much like Buzzfeed and Wired. I am currently implementing the Open Graph meta tags for each post, so that links to our posts are presented nicely on external websites such as Facebook.
A co-worker from the marketing team insisted that we should put something else other than title in the 'title' field for marketing reasons.
I argued that the Open Graph meta tags should truthfully represent the content of the link, to preserve consistency and convention, and that the meta tags should not be treated as one-offs.
However, I couldn't explain any further why that matters. I'm not really good with words myself.
Most of these quarrels involve co-workers wanting to 'hack' perfectly fine APIs or implementations, and I have to convince them why it is important to stay in the safe zone wherever possible.
I know that convention and consistency are among the most important practices in technology, but I think I just got used to the fact and forgot my university lectures on why that is so.
Could I get some thoughts on this issue?
A co-worker from the marketing team insisted that we should put something else other than title in the 'title' field for marketing reasons.
That's a valid decision. Your job is to help save costs or make money for the business. It is not your job to maintain the Facebook ecosystem as a whole. That's not what you are paid to do.
If you don't have any business reason why this should not be done, you have no case. Such a reason could be that Facebook would penalize this, or that it creates some development cost or risk.
If this is not a technical decision at all, and I see no reason in the question to think it is, it's his decision anyway. In that case you need to inform him of the concerns that you see and let him decide.
You clearly try to work in a mindset that lets self-discipline prevail over short-term gains and quick-and-dirty hacks. Managing to do that is always beneficial in the long run, but convincing managers and/or sales people to let go of the short-term gain is never easy (on the contrary, most of the time it is simply impossible).
Just want to let you know that there are many, many IT folks who "feel your pain". Don't give up your laudable mindset too easily.
Convention in naming makes for source code that is more readily understandable by others who follow the same conventions. That in turn makes for less costly maintenance. Consistency in choosing "appropriate" names for things has similar benefits. Saying on the tin what's inside (and not something completely different or something way too vague and ambiguous) is the best possible practice in computing, but it is the worst possible one in marketing.
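To make the Open Graph example from the question concrete, here is a minimal Python sketch of rendering the four basic properties (og:title, og:type, og:url, og:image) from a post record. The function name and the sample post are assumptions for illustration; the point is only that each field is filled from the value its name promises.

```python
from html import escape


def open_graph_tags(title: str, url: str, image: str, og_type: str = "article") -> str:
    """Render the four basic Open Graph properties as <meta> tags."""
    properties = {
        "og:title": title,
        "og:type": og_type,
        "og:url": url,
        "og:image": image,
    }
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in properties.items()
    )


# Hypothetical post record; og:title is taken from the actual post title.
post = {
    "title": "Why convention & consistency matter",
    "url": "https://example.com/posts/1",
    "image": "https://example.com/posts/1/cover.png",
}
print(open_graph_tags(post["title"], post["url"], post["image"]))
```

If marketing wants different wording, a separate, clearly named field on the post (say, a share headline) keeps the template honest about where each value comes from.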

Conversation bot source or API

I would like to make a bot that can carry on a simple conversation. I would like to be able to supply the bot with parameters about the things it knows and how it responds to certain subjects. I am wondering if anyone knows of any freely available source code or an API for a decent conversational bot.
I would like to use this to facilitate gaming by having computer-controlled characters that interact with the real players without having completely pre-scripted, static dialog. I am hoping that I can find something capable of holding a simple, generic conversation unless asked about a specific topic, at which point it can give specific replies to a pre-set list of specific topics.
I am asking more about the conversational-processing aspect and not so much about a front end or hooks to other apps or anything like that. Initially, I will just make this a local command-line based thing, then if satisfied I am looking into libpurple as an API to access various communication networks once I have the dialog processing ready.
So, does anyone know of any source code or API for something like this? Google mostly brings up tools for things like imified. I'm not expecting there to be a lot. Source code for an existing bot that can handle various emotions, topics and such would be awesome, but I'd be happy with something that just holds the simplest of conversations. There should be something somewhere that does this, seeing how many IM bots exist.
In the absence of a good source or API, would anyone happen to know of any good materials about programming an AI that can have a conversation? Again, I'm not talking about PhD papers discussing robots that can pass believably as humans or anything like that; I mean materials that discuss some simple programming techniques that common conversational bots use to hold rudimentary conversations.
Because of the libpurple API, I'll probably be doing this in C++. So C++ resources are preferable but not required.
(edit) I just stumbled onto AIML (Artificial Intelligence Markup Language). I am currently looking into that, and it sounds like it might be promising, especially if there are any pre-made conversational resources available for it, as then I could just add topics to it in the manner I mentioned, if I am understanding it correctly.
AIML is old and obsolete, and building its database is torture. I suggest you read this Gamasutra article about chatbot languages. It describes the ChatScript language, which is a great alternative to AIML.
Another language is RiveScript, which has a cool, clean style, but it seems like a copy of AIML with the same bad concepts.
I'm developing the Aerolito language, which is based on YAML; it's just a hobby project and it's not usable yet. =]
In my opinion, ChatScript is the best option for now.
I understand this question is old, but things have changed since it was posted. Check out the following projects. These bots learn from text files, IRC chat logs or, in the case of triplie, websites (albeit not perfectly).
triplie-ng: https://github.com/spion/triplie-ng
cobe: https://github.com/pteichman/cobe
Giorgio Robino mentioned http://superscriptjs.com/, but it's more than just ChatScript: it's a superset of RiveScript and ChatScript and also includes a built-in triple store to implement WordNet etc.

What is the best approach of creating a talking bot?

When creating an AI talking bot, what design methods should I use? Should it be one function or multiple modules? Should it have classes?
Understanding language is complicated, so the goal you need to determine first is what aspect of language you want to understand.
An AI must be able to understand what the person says to it, then relate it to what it already knows, and then generate a legitimate response.
These three steps can all be thought of as nearly independent, so you need to address each on its own.
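As a toy illustration of keeping those three steps separate, here is a minimal Python sketch. The knowledge base, the function names and the keyword lookup are all assumptions made for the example; a real bot would replace each stage with something far more capable.

```python
from __future__ import annotations

import re

# Hypothetical knowledge base: topic keyword -> what the bot knows about it.
KNOWLEDGE = {
    "weather": "it is raining in the village",
    "blacksmith": "the blacksmith sells swords near the gate",
}


def understand(utterance: str) -> set[str]:
    """Step 1: reduce the input to a set of lowercase words."""
    return set(re.findall(r"[a-z']+", utterance.lower()))


def relate(words: set[str]) -> str | None:
    """Step 2: look up the first known topic mentioned in the input."""
    for topic, fact in KNOWLEDGE.items():
        if topic in words:
            return fact
    return None


def respond(fact: str | None) -> str:
    """Step 3: turn the retrieved fact (or its absence) into a reply."""
    if fact is None:
        return "I don't know anything about that."
    return f"As far as I know, {fact}."


def reply(utterance: str) -> str:
    """Chain the three independent stages."""
    return respond(relate(understand(utterance)))


print(reply("Where is the blacksmith?"))
```

Since each stage only depends on the previous stage's output, any one of them can be swapped out (for example, a parser instead of the word set) without touching the others.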
The brain, the world's best language processor, uses a Neural Network, but that's not likely to work well for you.
A logic-based proof-solving system, in which new facts are derived from known facts, would probably work best, and I know of at least one system that uses it fairly effectively.
I'd start with an existing AI program (like the famous Eliza) and run its output through a speech synthesizer.
Some source for Eliza is available here. One open source speech synthesizer is FreeTTS.
If you're using a language other than Java, there are similar AI bots and text-to-speech libraries out there.
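For a feel of how an Eliza-style bot works, here is a minimal Python sketch of the general technique: ordered pattern rules plus pronoun "reflection". The rules and word list below are invented for the example and are not taken from the original Eliza script.

```python
import re

# Swap first and second person so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

# (pattern, response template) pairs, tried in order; the last one always matches.
RULES = [
    (re.compile(r"i need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r".*\bmother\b.*", re.I), "Tell me more about your family."),
    (re.compile(r".*"), "Please go on."),
]


def reflect(fragment: str) -> str:
    """Rewrite a captured fragment from the speaker's view to the bot's view."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(utterance: str) -> str:
    """Answer with the template of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            groups = [reflect(group) for group in match.groups()]
            return template.format(*groups)
    return "Please go on."


print(respond("I need my coffee."))
```

The reply from `respond` is plain text, so it can be handed to whatever speech synthesizer you choose.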
I've started to do some work in this space using this open source project called Talkify:
https://github.com/manthanhd/talkify
It is a bot framework intended to help orchestrate flow of information between bot providers like Microsoft (Skype), Facebook (Messenger) etc and your backend services. The framework doesn't really provide implementation for the bot providers yet but does provide hooks into its natural language recognition engine.
The built in natural language recognition library can be used to classify sentences to topics which you can then map to skill functions.
Give it a try! I'd really like people's input on how it can be improved.

How to identify ideas and concepts in a given text

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:
Maybe if you tell me a little more about who Mr Jones is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?
It'd be great to be able to detect that the person has asked for a photograph of Mr Jones. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like:
Please, never send me a photo of Mr Jones.
Does anyone know where to start with this? Is it even possible?
I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great.
Thanks!
The best thing out there that might be useful to you is automatic sentiment analysis. This is used, for example, to judge whether, say, a customer review is positive or negative. I cannot give you direct pointers to available tools, but this is what you are looking for.
I must say, though, that this is a current hot topic in natural language processing and I’ve seen a number of papers at conferences. It’s definitely quite a complex matter and if you’re starting from scratch, it might take quite some time before you get the results that you want.
NLTK is not a bad framework for parsing natural language but beware that this is not a simple matter. Doing stuff like this is really research level programming.
A good thing that makes it much easier is if you have a very limited domain - say your application focuses on information about famous writers, then you can avoid some complexities of natural language like certain types of ambiguities.
Where to start? Good question. I don't know of any tutorials on the topic (and I presume you tried the Google option) but I'd imagine that iTunes U would have a course on the topic. If not I can post a link to a course I've done that mentions the subject and wasn't completely horrible: http://www.inf.ed.ac.uk/teaching/courses/inf2a/lecturematerials/index.html#lecture01
The problem that you are tackling is very challenging.
I would start by identifying the entities in the text (a problem known as Named Entity Recognition; google it), and then I would try to identify concepts.
If you want to roughly identify what the text is about, I suggest that you start with WordNet and use the words and their places in its hierarchy to identify the concepts involved.
If you want to produce a system that shows real intelligence, then you should start researching resources such as CYC (OpenCYC), which will allow you to convert the sentences into first-order logic (FOL) sentences.
This is the hardcore AI approach to solving your problem. For a simple chat bot, it would be easier to rely on simple statistical methods.
Good luck.
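For the two sentences in the question, even a crude rule-based check is a step up from the bare keyword search. The Python sketch below flags a photo request only when no negation word appears earlier in the same sentence. The word lists are assumptions for the example, and this heuristic is not a substitute for the NER or WordNet approaches mentioned above.

```python
import re

# Hypothetical word lists; extend them for your own domain.
REQUEST_TERMS = {"photo", "photos", "photograph", "photographs", "picture", "pictures"}
NEGATIONS = {"never", "not", "no", "don't", "without"}


def asks_for_photo(text: str) -> bool:
    """Return True if a sentence mentions a photo with no negation before it."""
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        for index, word in enumerate(words):
            if word in REQUEST_TERMS and not NEGATIONS.intersection(words[:index]):
                return True
    return False


request = (
    "Maybe if you tell me a little more about who Mr Jones is, that would help. "
    "It would also be useful if I could have a description of his appearance, "
    "or even better a photograph?"
)
refusal = "Please, never send me a photo of Mr Jones."

print(asks_for_photo(request))
print(asks_for_photo(refusal))
```

It will still misfire on sentences like "I can't wait to see a photo", which is exactly where the heavier techniques start to pay off.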

What programming language is used to implement Google's algorithm?

It is known that Google has the best searching and indexing algorithm.
They also have good relevancy.
They are also quicker at surfacing the latest results.
All that's fine.
What programming language (C, C++, Java, etc.) and database (Oracle, MySQL, etc.) have they used to achieve this, given that they have to manipulate large volumes of data quickly and effectively?
I'm not looking for their in-depth architecture (in case that violates their company policies), but an overview of such things would be useful.
Could anybody please add their valuable suggestions and insight on this?
Google internally uses C++, Java and Python. See Rhino on Rails:
One of the (hundreds of) cool things about working for Google is that they let teams experiment, as long as it's done within certain broad and well-defined boundaries. One of the fences in this big playground is your choice of programming language. You have to play inside the fence defined by C++, Java, Python, and JavaScript.
Google's search algorithm is essentially MapReduce, which stems from functional programming techniques, implemented in C++.
Google has its own storage mechanism for this called the Google File System.
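To show what the MapReduce pattern looks like, here is a single-process Python sketch that builds a tiny inverted index (word to documents). It only illustrates the map, shuffle and reduce stages; the documents and function names are invented, and it says nothing about how Google's actual C++ implementation is structured.

```python
from collections import defaultdict

# Hypothetical input: document id -> text.
documents = {
    "doc1": "pigeons rank pages",
    "doc2": "pigeons like bread",
}


def map_phase(doc_id, text):
    """Emit one (word, doc_id) pair per word in a document."""
    return [(word, doc_id) for word in text.split()]


def shuffle(pairs):
    """Group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Collapse the values for one key into a sorted list of unique documents."""
    return word, sorted(set(doc_ids))


mapped = [pair for doc_id, text in documents.items() for pair in map_phase(doc_id, text)]
index = dict(reduce_phase(word, ids) for word, ids in shuffle(mapped).items())
print(index["pigeons"])
```

Because `map_phase` looks at one document at a time and `reduce_phase` at one key at a time, both stages can in principle be spread across many machines, which is the appeal of the model.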
Mainly pigeons:
PigeonRank's success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.
Relevance of search results is governed by quality of information retrieval algorithms they use, not the programming language.
But C++ is what most of their backend code is written in (for most services).
They don't use any off-the-shelf RDBMS products for data storage. All of that is written in-house.
Have a look at Bigtable.
