Determining locale with Babel and webapp2 - google-app-engine

I've recently added Babel to my GAE/webapp2 site, and it works, but I'm confused about the proper way to get the locale.
In this answer, he gets the locale with self.request.get('locale', 'es_ES') but this only seems to work if something like ?locale=fr_FR is in the URL. I've never seen anyone set a locale in the URL. Is there any reason to get a locale from the URL?
Getting the locale from the headers seems to make more sense:
self.request.headers.get('Accept-Language').split(",")
One quirk is that browsers provide the locale with a hyphen and lower case ("fr-fr") but Babel expects an underscore and mixed case ("fr_FR"). Where the browser provides an Accept-Language header of only "fr-fr", the following:
locale = self.request.headers.get('Accept-Language').split(",")[0]
i18n.get_i18n().set_locale(locale)
will end up using the default language rather than French, because Babel is looking for "fr_FR". Should I take the locale from the header and convert the hyphen to an underscore and the lower case to mixed case? It seems that Babel should do that for me.
(I know that the header may have multiple locales and that I should check them all, but I'd still need to do the conversion for each one.)
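For the record, the conversion I'm talking about would be something like this (the helper name and the fallback default are just illustrative, not anything from webapp2 or Babel):

def normalize_locale(header_value, default='en_US'):
    # Take the most preferred entry of e.g. "fr-fr,fr;q=0.8,en-us;q=0.5"
    # and drop any ";q=..." weight.
    if not header_value:
        return default
    first = header_value.split(',')[0].split(';')[0].strip()
    parts = first.replace('-', '_').split('_')
    if len(parts) == 2:
        # "fr-fr" -> "fr_FR", the underscore/mixed-case form Babel expects
        return parts[0].lower() + '_' + parts[1].upper()
    return parts[0].lower()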

I'll provide an answer based on the research I've done since I asked the question.
If you are providing links to allow a user to change the locale (e.g., a drop-down of languages), then putting the locale in the URL seems like a good solution. For my situation, I'm only interested in automatic detection of the locale, so I won't implement getting the locale from the URL.
Regarding the different locale formats (browsers vs. Babel), it seems that you just need to do the conversion yourself. This answer has great suggestions.
For my use case, it is unlikely that I will implement multiple locales for the same language in different countries any time soon (e.g., I will implement fr_FR but not any other fr_*). I'm just going to take the first two letters of the locale and drop the rest, and that makes the options much simpler.
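In code, that boils down to something like this inside a webapp2 handler (just a sketch; the 'en' fallback is arbitrary):

accept = self.request.headers.get('Accept-Language', '')
# first (most preferred) entry, minus any ";q=..." weight, language part only
lang = accept.split(',')[0].split(';')[0].strip()[:2].lower() or 'en'
i18n.get_i18n().set_locale(lang)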

I set the locale in my code this way:
# Assumes, at module level: import os, logging, webapp2
# and: from webapp2_extras import i18n, sessions
def dispatch(self):
    # Get a session store for this request.
    self.session_store = sessions.get_store(request=self.request)
    if self.request.host.find('.br') > 0:
        i18n.get_i18n().set_locale('pt-br')
    elif self.request.host.find('klok') > 0:
        i18n.get_i18n().set_locale('sv')
    elif self.request.host.find('business') > 0:
        i18n.get_i18n().set_locale('en')
    else:
        lang_code_get = self.request.get('hl', None)
        if lang_code_get is None:
            lang_code = self.session.get('HTTP_ACCEPT_LANGUAGE', None)
            lang_code_browser = os.environ.get('HTTP_ACCEPT_LANGUAGE')
            if lang_code:
                i18n.get_i18n().set_locale(lang_code)
            if lang_code_browser and lang_code is None:
                self.session['HTTP_ACCEPT_LANGUAGE'] = lang_code_browser
                i18n.get_i18n().set_locale(lang_code_browser)
        else:
            i18n.get_i18n().set_locale(lang_code_get)
    try:
        # Dispatch the request.
        #logging.info('trying to dispatch')
        webapp2.RequestHandler.dispatch(self)
    except Exception, ex:
        logging.error(ex)
        self.error(404)
    finally:
        # Save all sessions.
        self.session_store.save_sessions(self.response)

Related

How to make Text::slug() convert german umlauts properly?

I am using CakePHP 3.6, and when I am using words with German umlauts like:
Text::slug('Grundstücke')
I will get:
Grundstucke (where ü = u)
but that's not correct, I should get:
Grundstuecke (where ü = ue)
Is there an option to set so that umlauts are being converted the way I want them to?
Change your transliterator
Text::slug() internally uses transliterator_transliterate (see the PHP docs).
So you need to change the default transliterator that is being used.
After some research I found one that will work for you.
At the end of your bootstrap.php file add:
\Cake\Utility\Text::setTransliteratorId( 'de-ASCII; Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove ');
Then your text will be converted as you expect.
Notes
Resources I've used to find this answer:
CakePHP Text::transliterate()
Transliteration Identifiers Documentation
transliterator_list_ids - to get a list of valid identifiers - this is how I found the one that finally worked: de-ASCII
Text Utility API - to set a new default transliterator id.

GAE: Writing the API: a simple PATCH method (Python)

I have a google-cloud-endpoints API; in the docs, I didn't find how to write a PATCH method.
My request:
curl -XPATCH localhost:8080/_ah/api/hellogreeting/1 -d '{"message": "Hi"}'
My method handler looks like this:
import endpoints

from models import Greeting
from messages import GreetingMessage

@endpoints.method(ID_RESOURCE, Greeting,
                  path='hellogreeting/{id}', http_method='PATCH',
                  name='greetings.patch')
def greetings_patch(self, request):
    greeting = Greeting.get_by_id(request.id)
    greeting.message = request.message    # fine, message is present in the request
    greeting.username = request.username  # but request.username is None here
    greeting.put()
    return GreetingMessage(message=greeting.message, username=greeting.username)
So now the Greeting.username field will be None, and that's wrong.
Writing an IF condition for each field (checking whether it's empty) doesn't seem elegant.
So, what is the best way to partially update a model?
I do not think there is a built-in way in Cloud Endpoints, but you can code your own easily, like the example below.
You will need to decide how you want your patch to behave, in particular when it comes to attributes that are objects: should you also apply the patch to the object attribute (in which case use recursion), or should you just replace the original object attribute with the new one, as in my example.
def apply_patch(origin, patch):
    for name in dir(patch):
        if not name.startswith('__'):
            setattr(origin, name, getattr(patch, name))
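To address the None problem from the question, one variation (my own tweak, not part of the snippet above) is to skip values the client did not send, and to skip callables so bound methods on the request message are not copied:

def apply_patch(origin, patch):
    for name in dir(patch):
        if name.startswith('__'):
            continue
        value = getattr(patch, name)
        # None means the field was not sent, so leave the original value alone
        if value is not None and not callable(value):
            setattr(origin, name, value)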

How to query atom field with unicode value in Google App Engine production search?

I wrote some text search using Google App Engine search.
In the SDK I tested this query on an atom field:
u'tag:"wartości"'
In production I run the same query, but it does not work on the same data.
How can I do unicode query on atom field?
Is it possible to use unicode in Google App Engine search?
We are aware of this issue and plan to fix ASAP. The fix that we're currently planning will require that the atom field value include exactly the same accent characters in order to match. Matches will continue to be case-insensitive. We expect that at least initially, values that use combining diacritical marks will be treated as different values than those using precomposed characters. We may revisit that decision depending on feedback, but it's the most straightforward fix on our end.
For more on the precomposed characters vs. combining diacritical marks, see this Wikipedia article:
http://en.wikipedia.org/wiki/Precomposed_character
Chris
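To illustrate the precomposed vs. combining-marks distinction in plain Python (this is only an illustration of the two Unicode forms, not the fix described above):

import unicodedata
precomposed = u'warto\u015bci'   # "wartości" with 'ś' as one code point (U+015B)
decomposed = u'wartos\u0301ci'   # the same word as 's' + combining acute (U+0301)
print precomposed == decomposed                                # False
print unicodedata.normalize('NFC', decomposed) == precomposed  # True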
It looks like I need to translate AtomField values into a new string, and I need to translate the queries too. This workaround will only allow Polish unicode search. I do not know the tokenization rules, so I use 'q' and 'x' to expand the alphabet, since they are not used in Polish.
# coding=utf-8
translate = {
    u'ą': u'aq',
    u'Ą': u'Aq',
    u'ć': u'cq',
    u'Ć': u'Cq',
    u'ę': u'eq',
    u'Ę': u'Eq',
    u'ł': u'lq',
    u'Ł': u'Lq',
    u'ń': u'nq',
    u'Ń': u'Nq',
    u'ó': u'oq',
    u'Ó': u'Oq',
    u'ś': u'sq',
    u'Ś': u'Sq',
    u'ż': u'zx',
    u'Ż': u'Zx',
    u'ź': u'zq',
    u'Ź': u'Zq',
}
import re
reTranslate = re.compile(u'(%s)' % u'|'.join(translate))
print reTranslate.pattern
test = u"""\
Właściwie prowadzona komunikacja wewnętrzna w firmie,\
zwłaszcza dużej czy posiadającej rozproszoną sieć oddziałów,\
może przynieść oszczędność czasu, a co za tym idzie, również pieniędzy."""
print reTranslate.sub(lambda match: translate[match.group(0)], test)
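The query presumably needs the same substitution before it is sent to the Search API; a sketch of that idea (the index name 'products' is just a placeholder):

from google.appengine.api import search

def find_by_tag(tag):
    # apply the same character substitution to the query term
    translated = reTranslate.sub(lambda match: translate[match.group(0)], tag)
    index = search.Index(name='products')
    return index.search(u'tag:"%s"' % translated)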

Plotting a word-cloud by date for a twitter search result? (using R)

I wish to search Twitter for a word (let's say #google), and then be able to generate a tag cloud of the words used in tweets, but according to dates (for example, having a moving window of an hour that moves by 10 minutes each time and shows me how different words became more frequently used throughout the day).
I would appreciate any help on how to go about doing this regarding: resources for the information, code for the programming (R is the only language I am comfortable using), and ideas for visualization. Questions:
How do I get the information?
In R, I found that the twitteR package has the searchTwitter command, but I don't know how big an "n" I can get from it. Also, it doesn't return the dates on which the tweets originated.
I see here that I could get up to 1500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes, I would need tens of thousands of tweets. Is it even possible to get them retrospectively? (for example, requesting older posts each time through the API URL?) If not, there is the more general question of how to create a personal archive of tweets on your home computer (a question which might be better left to another SO thread - although any insights from people here would be very interesting for me to read).
How to parse the information (in R)? I know that R has functions from the RCurl and twitteR packages that could help, but I don't know which ones or how to use them. Any suggestions would be of help.
How to analyse? How do I remove all the "not interesting" words? I found that the "tm" package in R has this example:
reuters <- tm_map(reuters, removeWords, stopwords("english"))
Would this do the trick? Should I do something else/more?
Also, I imagine I would like to do that after cutting my dataset according to time (which will require some POSIX-like date functions, though I am not exactly sure which would be needed here or how to use them).
And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here; any other suggestions/recommendations?
I believe I am asking a huge question here but I tried to break it to as many straightforward questions as possible. Any help will be welcomed!
Best,
Tal
Word/Tag cloud in R using "snippets" package
www.wordle.net
Using the openNLP package you could POS-tag the tweets (POS = part of speech) and then extract just the nouns, verbs, or adjectives for visualization in a word cloud.
Maybe you can query Twitter and use the current system time as a timestamp, write to a local database, and query again in increments of x secs/mins, etc.
There is historical data available at http://www.readwriteweb.com/archives/twitter_data_dump_infochimp_puts_1b_connections_up.php and http://www.wired.com/epicenter/2010/04/loc-google-twitter/
As for the plotting piece: I did a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package, my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.
I note that this is an old question, and there are several solutions available via web search, but here's one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):
require(twitteR)
searchTerm='#dev8d'
#Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)
#Use a handy helper function to put the tweets into a dataframe
tw.df=twListToDF(rdmTweets)
##Note: there are some handy, basic Twitter related functions here:
##https://github.com/matteoredaelli/twitter-r-utils
#For example:
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
#Then for example, remove @d names
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))
##Wordcloud - scripts available from various sources; I used:
#http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
#Call with eg: tw.c=generateCorpus(tw.df$text)
generateCorpus = function(df, my.stopwords=c()){
  #Load the text mining library
  require(tm)
  #The following is cribbed and seems to do what it says on the can
  tw.corpus = Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  # normalise case
  tw.corpus = tm_map(tw.corpus, tolower)
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
  tw.corpus
}
wordcloud.generate = function(corpus, min.freq=3){
  require(wordcloud)
  doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing=TRUE)
  d = data.frame(word=names(v), freq=v)
  #Generate the wordcloud
  wc = wordcloud(d$word, d$freq, min.freq=min.freq)
  wc
}
print(wordcloud.generate(generateCorpus(tweets,'dev8d'),7))
##Generate an image file of the wordcloud
png('test.png', width=600,height=600)
wordcloud.generate(generateCorpus(tweets,'dev8d'),7)
dev.off()
#We could make it even easier if we hide away the tweet grabbing code. eg:
tweets.grabber = function(searchTerm, num=500){
  require(twitteR)
  rdmTweets = searchTwitter(searchTerm, n=num)
  tw.df = twListToDF(rdmTweets)
  as.vector(sapply(tw.df$text, RemoveAtPeople))
}
#Then we could do something like:
tweets=tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets),3)
I would like to answer your question about making a big word cloud.
What I did was:
Use s0.tweet <- searchTwitter(KEYWORD, n=1500) each day for 7 days or more.
Combine them with this command:
rdmTweets = c(s0.tweet,s1.tweet,s2.tweet,s3.tweet,s4.tweet,s5.tweet,s6.tweet,s7.tweet)
The result:
This Square Cloud consists of about 9000 tweets.
Source: People voice about Lynas Malaysia through Twitter Analysis with R CloudStat
Hope it helps!

The best way to make a CodeIgniter website multi-language: calling from lang arrays depending on a lang session?

I've been researching for hours and hours, but I could not find any clear, efficient way to do it :/
I have a CodeIgniter-based website in English and I have to add Polish now. What is the best way to make my site work in 2 languages depending on the visitor's selection?
Is there any way to create array files for each language and call them in view files depending on a session value from the language selection? I don't want to use a database.
I'd appreciate any help! I'm running up against a deadline :/ Thanks!!
Have you seen CodeIgniter's Language library?
The Language Class provides functions to retrieve language files and lines of text for purposes of internationalization.
In your CodeIgniter system folder you'll find one called language containing sets of language files. You can create your own language files as needed in order to display error and other messages in other languages.
Language files are typically stored in your system/language directory. Alternately you can create a folder called language inside your application folder and store them there. CodeIgniter will look first in your application/language directory. If the directory does not exist or the specified language is not located there CI will instead look in your global system/language folder.
In your case...
you need to create a polish_lang.php inside application/language/polish (and an english_lang.php inside application/language/english)
then create your keys inside that file (e.g. $lang['hello'] = "Witaj";)
then load it in your controller like $this->lang->load('polish_lang', 'polish');
then fetch the line like $this->lang->line('hello'); Just store the return value of this function in a variable so you can use it in your view.
Repeat the steps for the english language and all other languages you need.
Also, to add the language to the session, I would define some constants for each language, then make sure you have the session library autoloaded in config/autoload.php, or load it whenever you need it. Add the user's desired language to the session:
$this->session->set_userdata('language', ENGLISH);
Then you can grab it anytime like this:
$language = $this->session->userdata('language');
In the controller, add the following lines where you define the constructor, i.e., after
parent::Controller();
add the lines below:
$this->load->helper('lang_translate');
$this->lang->load('nl_site', 'nl'); // ('filename', 'directory')
Create a helper file lang_translate_helper.php with the following function and put it in the directory system\application\helpers:
function label($label, $obj)
{
    $return = $obj->lang->line($label);
    if($return)
        echo $return;
    else
        echo $label;
}
For each language, create a directory with the language abbreviation, like en, nl, fr, etc., under
system\application\languages
Create a language file in the above (respective) directory which will contain a $lang array holding label => language_value pairs as given below.
nl_site_lang.php
$lang['welcome'] = 'Welkom';
$lang['hello word'] = 'worde Witaj';
en_site_lang.php
$lang['welcome'] = 'Welcome';
$lang['hello word'] = 'Hello Word';
You can store multiple files for the same language under different names as per your requirements.
E.g., if you want a separate language file for managing the backend (administrator section), you can use it in the controller as $this->lang->load('nl_admin', 'nl');
nl_admin_lang.php
$lang['welcome'] = 'Welkom';
$lang['hello word'] = 'worde Witaj';
And finally, to print a label in the desired language, access labels as below in the view:
label('welcome', $this);
OR
label('hello word', $this);
Note the space in hello & word; you can use it this way as well :)
When there is no label defined in the language file, it will simply print whatever you passed to the function label.
I second Randell's answer.
However, one could always integrate a GeoIP service such as http://www.maxmind.com/app/php or http://www.ipinfodb.com/. Then you can save the results with the CodeIgniter session class.
If you want to use the ipinfodb.com API, you can add the ip2locationlite.class.php file to your CodeIgniter application library folder and then create a model function to do whatever GeoIP logic you need for your application, such as:
function geolocate()
{
    $ipinfodb = new ipinfodb;
    $ipinfodb->setKey('API KEY');

    //Get errors and locations
    $locations = $ipinfodb->getGeoLocation($this->input->ip_address());
    $errors = $ipinfodb->getError();

    //Set geolocation cookie
    if(empty($errors))
    {
        foreach ($locations as $field => $val):
            if($field === 'CountryCode')
            {
                $place = $val;
            }
        endforeach;
    }

    return $place;
}
For easier use, CI has updated this so you can just use
$this->load->helper('language');
and to translate text
lang('language line');
and if you want to wrap it inside a label then use the optional parameter
lang('language line', 'element id');
This will output:
<label for="form_item_id">language_key</label>
For good reading
http://ellislab.com/codeigniter/user-guide/helpers/language_helper.html
I've used Wiredesignz's MY_Language class with great success.
I've just published it on github, as I can't seem to find a trace of it anywhere.
https://github.com/meigwilym/CI_Language
My only changes are to rename the class to CI_Lang, in accordance with the new v2 changes.
When managing the actual files, things can get out of sync pretty easily unless you're really vigilant. So we've launched a (beta) free service called String which allows you to keep track of your language files easily, and collaborate with translators.
You can either import existing language files (in PHP array, PHP Define, ini, po or .strings formats) or create your own sections from scratch and add content directly through the system.
String is totally free so please check it out and tell us what you think.
It's actually built on Codeigniter too! Check out the beta at http://mygengo.com/string
Follow this: https://github.com/EllisLab/CodeIgniter/wiki/CodeIgniter-2.1-internationalization-i18n
It's simple and clear; also check out the documentation at http://ellislab.com/codeigniter/user-guide/libraries/language.html
It's way simpler.
I am using code like this in config.php:
$lang = 'ru'; // this language will be used if there is no lang information from the user agent (for example, from the command line, wget, etc.)
if (!empty($_SERVER['HTTP_ACCEPT_LANGUAGE'])) $lang = substr($_SERVER['HTTP_ACCEPT_LANGUAGE'],0,2);
$tmp_value = $_COOKIE['language'];
if (!empty($tmp_value)) $lang = $tmp_value;
switch ($lang)
{
    case 'ru':
        $config['language'] = 'russian';
        setlocale(LC_ALL, 'ru_RU.UTF-8');
        break;
    case 'uk':
        $config['language'] = 'ukrainian';
        setlocale(LC_ALL, 'uk_UA.UTF-8');
        break;
    case 'foo':
        $config['language'] = 'foo';
        setlocale(LC_ALL, 'foo_FOO.UTF-8');
        break;
    default:
        $config['language'] = 'english';
        setlocale(LC_ALL, 'en_US.UTF-8');
        break;
}
...and then I'm using the usual internal mechanism of CI.
Oh, I almost forgot! In the views I use buttons which set the 'language' cookie to the language preferred by the user.
So, first this code tries to detect the "preferred language" set in the user's user agent (browser). Then the code tries to read the 'language' cookie. And finally, the switch sets the language for the CI application.
You can make a function like this:
function translateTo($language, $word) {
    // assumes $lang is a lookup array of translations available in this scope
    define('defaultLang', 'english');
    if (isset($lang[$language][$word]))
        return $lang[$language][$word];
    else
        return $lang[defaultLang][$word];
}
Friend, don't worry. If you have an application built in CodeIgniter installed and you want to add a language pack, just follow these steps:
1. Add the language files in the folder application/language/arabic (I added an Arabic language pack to sma2, built in CI).
2. Go to the file named setting.php in application/modules/settings/views/setting.php. Here you will find the array:
<?php
$lang = array (
    'english' => 'English',
    'arabic' => 'Arabic', // I added this here
    'spanish' => 'Español'
);
Now save and run the application. It works fine.
