Sorting and i18n in Database - database

I have the following data in my db:
ID   Name    Region
100  Sam     North
101  Dam     South
102  Wesson  East
...
...
...
Now, the region will be displayed differently in different languages. I want the sorting to be correct for the display value rather than the internal value.
Any ideas? (And yeah, sorting in memory using Java is not an option!)

Unfortunately, if you want to do that in the database, you will have to store the internationalized versions as well in the database (or at least their order). Otherwise, how could the database engine possibly know how to sort?
You will have to create a table with three columns: the English version, the language code and the translated version (with the English version and the language code together being the primary key). Then join to this table in your query using the English word and a language code, and sort on the internationalized version.
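A minimal sketch of that join, using Python's built-in sqlite3 so it can be run as-is. The table and column names (person, region_i18n, english, lang, translated) and the German sample translations are my own assumptions, not something from the question. Note that the ORDER BY here uses SQLite's default byte-wise comparison; proper locale-aware ordering also needs a suitable collation in your actual database.

```python
import sqlite3

# In-memory database with hypothetical table and column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE region_i18n (
    english    TEXT NOT NULL,
    lang       TEXT NOT NULL,
    translated TEXT NOT NULL,
    PRIMARY KEY (english, lang)
);
INSERT INTO person VALUES
    (100, 'Sam', 'North'), (101, 'Dam', 'South'), (102, 'Wesson', 'East');
INSERT INTO region_i18n VALUES
    ('North', 'de', 'Nord'), ('South', 'de', 'Sued'), ('East', 'de', 'Ost');
""")


def people_sorted_by_region(conn, lang):
    """Join on the English value plus language code, sort on the translation."""
    return conn.execute(
        """
        SELECT p.id, p.name, t.translated
        FROM person AS p
        JOIN region_i18n AS t
          ON t.english = p.region AND t.lang = ?
        ORDER BY t.translated
        """,
        (lang,),
    ).fetchall()


rows = people_sorted_by_region(conn, "de")
for row in rows:
    print(row)
```

With the sample data, sorting on the internal English value would put East first, while sorting on the German display value puts Nord first, which is the difference the question is about.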

The best approach to internationalization is to take the translated strings out of your application and keep them in separate i18n databases. In your application you keep only a key that is used to look things up in those separate databases, which are normally XML or YAML files.
As a rule of thumb I would suggest:
keep all of that data in one format, in one place
extract the internationalization strings from your application
let your application pull its i18n strings from your internationalization database.
You can check the Rails approach to i18n. It's simple, clean and easy to use.

What do you mean by "display value"? I take it you are somehow converting that into a different language in Java itself, and don't store that value (the localised one, I assume) in the DB? If so, you're kind of screwed. You need to get that data into the DB to be able to sort with it there.

Related

AngularJS: Store localized user input data in translation json files or database

I have an architecture issue related to localization. My concern is what is the best approach to store and manage localized user data. Let me explain:
I have an AngularJS webapp with a MySQL database. For text translations we are using angular-translate with files. For labels, static text, etc. this is working great.
On the other hand, the user can create items (e.g. houses for rent) and fill in a title and description for each one. The user can also edit that information. It is gathered by a form and stored in the DB at the moment.
We would like to provide translations for this user input, and with this scenario in mind I see two approaches:
User stores data in his language in DB. We store the translations in DB (translation tables...) and serve translations from there.
User stores data in his language in DB. We store the translations in locale.json files and create a key in the database to get those translations (angular-translate).
In both scenarios we need to translate whenever the user creates or updates a title or description. But it looks like if you store it in the database, at least you already have one default translation. If you store it in a JSON file, you are keeping the default translation data in two places.
From the maintenance point of view, using the translation files looks a little more complex at first sight. Also, take into account that a deployment needs to be done each time a user input text is added or updated.
However, from the performance point of view, the translation files are probably the better approach. You are probably saving at least one query to the DB when the user changes the language.
From the architectural point of view, I would say the user data should be stored in database.
What do you think?
Always store the user input.
Store the translation in the DB only if you ALWAYS need it.
If you rarely need it, offer a Translate button to the user.
Do what's cheaper. If only one in a thousand inputs is in another language and it's rarely visited, there's no sense in wasting precious DB space; let the translation be done on the fly, on demand.
Also, how do you know an input needs to be translated? Some people are bilingual, and there are cases where a tourist abroad is (struggling to) use a device set to another language.
Note:
You do know that automatic translations are crap, don't you? So how are you translating?
TL;DR: option 1. You may cache access to the translation tables or create materialised views (if your DBMS supports them) to denormalise your Property entity and have one readily-translated row per language.
Personally, I do not see the need for caching - how many times is the user going to change language, in production?

Django startswith unicode

Is it possible to get Author.objects.filter(surname__istartswith='Z') return results that also start with 'Ž', 'Ź' etc.?
The only solution that comes to mind is to flatten surname with Unicode transliteration and save it as surname_flat in the DB. Then Author.objects.filter(surname_flat__istartswith='Z') would work, but it requires a database migration.
I'm using PostgreSQL.
The django-unaccent library has been written to provide the functionality you require.
By doing this however you are making your solution database-dependent, which may be an issue if you decide to move database engines in future.
Your solution to add an additional calculated column is the one that I would use, as this keeps your code db-independent. You can also index your column more effectively. Because the django-unaccent library uses a database-function-based search, it will do a column scan of your data every time you use it.
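For the calculated column, here is a rough sketch of one way to compute the flattened value in plain Python, using only the standard library. The function name flatten is my own; you would call something like it when saving the model to fill surname_flat. It works by decomposing each character (NFKD) and dropping the combining marks, so it only handles letters that Unicode can decompose: 'Ž' and 'Ź' become 'Z', but letters such as 'ł' or 'ø' have no decomposition and are left unchanged, so a real transliteration library may be needed for those.

```python
import unicodedata


def flatten(text):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for surname in ["Žižek", "Źle", "Zola"]:
        print(surname, "->", flatten(surname))
```

Because surname_flat is an ordinary column, it can be indexed like any other, which is the advantage mentioned above.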

Need advice on multilingual data storage

This is more of a question for experienced people who've worked a lot with multilingual websites and e-shops. This is NOT a database structure question or anything like that. This is a question on how to store a multilingual website, NOT how to store translations. A multilingual website can not only be translated into multiple languages, but can also have language-specific content. For instance, an English version of the website can have a completely different structure than the same website in Russian or any other language. I've come up with two storage schemas for such cases:
// NUMBER ONE
table contents // to store some HYPOTHETICAL content
id // content id
table contents_loc // to translate the content
content, // ID of content to translate
lang, // language to translate to
value, // translated content
online // availability flag, VERY IMPORTANT
ADVANTAGES:
- Content can be stored in multiple languages. This schema is pretty common, except maybe for the "online" flag in the "_loc" tables. About that below.
- Every content can not only be translated into multiple languages, but also you could mark online=false for a single language and disable the content from appearing in that language. Alternatively, that record could be removed from "_loc" table to achieve the same functionality as online=false, but this time it would be permanent and couldn't be easily undone. For instance we could create some sort of a menu, but we don't want one or more items to appear in english - so we use online=false on those "translations".
DISADVANTAGES:
- Quickly gets pretty ugly with more complex table relations.
- More difficult queries.
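To make the "online" flag concrete, here is a small runnable sketch of schema one using Python's built-in sqlite3. The sample rows, the composite primary key and the helper name visible_contents are assumptions made for illustration only; the table and column names follow the outline above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contents (id INTEGER PRIMARY KEY);
CREATE TABLE contents_loc (
    content INTEGER NOT NULL REFERENCES contents(id),
    lang    TEXT    NOT NULL,
    value   TEXT    NOT NULL,
    online  INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (content, lang)
);
INSERT INTO contents VALUES (1), (2);
INSERT INTO contents_loc VALUES
    (1, 'en', 'Home', 1),
    (1, 'de', 'Startseite', 1),
    (2, 'en', 'Careers', 0),   -- translated, but hidden in English only
    (2, 'de', 'Karriere', 1);
""")


def visible_contents(conn, lang):
    """Return (id, value) for content that is online in the given language."""
    return conn.execute(
        """
        SELECT c.id, l.value
        FROM contents AS c
        JOIN contents_loc AS l ON l.content = c.id
        WHERE l.lang = ? AND l.online = 1
        ORDER BY c.id
        """,
        (lang,),
    ).fetchall()


print(visible_contents(conn, "en"))
print(visible_contents(conn, "de"))
```

Every read needs this join plus the two filters, which is the "more difficult queries" cost listed above.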
// NUMBER 2
table contents // to store some HYPOTHETICAL content
id, // content id
online, // content availability (not the same as in first example)
lang, // language of the content
value // translated content
ADVANTAGES:
1. Less painful to implement
2. Shorter queries
DISADVANTAGES:
2. Every multilingual record would now have 3 different IDs. It would be bad for e.g. products in an e-shop, since the first version would allow us to store different languages under the same ID, while this one would require 3 separate records to represent the same product.
First storage option would seem like a great solution, since you could easily use it instead of the second one as well, but you couldn't easily do it the other way around.
The only problem is that the first structure seems a bit like overkill (except in cases like product storage).
So my question to you is:
Is it logical to implement the first storage option? In your experience, would anyone ever need such a solution?
The question we ask ourselves is always:
Is the content the same for multiple languages and do they need a relation?
Translatable models
If the answer is yes, you need a translatable model: a model with multiple versions of the same record, with a language flag on each version.
PROS: It gives you a structure in which you can see for example which content has not yet been translated.
Separate records per language
But many times we see a different solution as the better one: just separate the languages totally. We mostly see this in CMS solutions. The story is not only translated but also different. For example, country 1 has a different menu structure, other news items, other products and other pages.
PROS: Total flexibility and no unexpected records from other languages.
Example
We see it like writing a magazine: you can write one, then translate it to another language. Yes, that's possible, but in the real world we see more and more that the content is structurally different. People don't like to be surprised, so you need lots of steps to make sure content is not visible in the wrong languages, pages don't get created in duplicate, etc.
Sharing logic
So what we do most of the time is: share the views, make the buttons, inputs etc. translatable, but keep the content separated, so that every admin can just work in their own area. If we need to confirm that some records are available in all languages we can always handle that by creating a link (nicely relational) between them, but it is not the standard we use most of the time.
Really translatable records like products
Because we are flexible in creating models etc., we can just decide how to work with them based on the requirements. I would not try to look for a general solution which works for everything, because there is none. You need a solution based on your data.
Assuming that you need a translatable model, as it is described by Luc, I would suggest coming up with some sort of special-character-delimited key-value pair format for the value column of the content table. Example:
#en=English Term#de=German Term
You may use UDFs (User Defined Functions in T-SQL) to set/get the appropriate term based on the specified language.
For selecting:
select id, dbo.GetContentInLang(value, @lang)
from content
For updating:
update content
set value = dbo.SetContentInLang(value, @lang, @new_content)
where id = @id
The UDFs:
a. do have a performance hit, but this is also the case for the join that you would have to do between the content and content_loc tables
and
b. are somewhat difficult to implement, but are reusable practically throughout your database.
You can also do the above on the application/UI layer.
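If you go the application-layer route, the get/set logic could look roughly like the following Python sketch. The function names mirror the UDF names used above but are otherwise my own, and the sketch assumes that neither a language code nor a term ever contains the '#' delimiter; a real implementation would need an escaping rule for that.

```python
def parse_terms(value):
    """Parse '#en=English Term#de=German Term' into {'en': ..., 'de': ...}."""
    terms = {}
    for part in value.split("#"):
        if not part:
            continue  # skip the empty piece before the leading '#'
        lang, _, text = part.partition("=")
        terms[lang] = text
    return terms


def get_content_in_lang(value, lang, default=None):
    """Return the term stored for lang, or default if there is none."""
    return parse_terms(value).get(lang, default)


def set_content_in_lang(value, lang, new_content):
    """Return a new packed string with the term for lang added or replaced."""
    terms = parse_terms(value)
    terms[lang] = new_content
    return "".join(f"#{code}={text}" for code, text in terms.items())


if __name__ == "__main__":
    packed = "#en=English Term#de=German Term"
    print(get_content_in_lang(packed, "de"))
    print(set_content_in_lang(packed, "fr", "Terme"))
```

Keep in mind that with this format the database cannot sort or filter on a single language without unpacking the column first.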

The best/recommended way to translate Django database values

I'm trying to figure out the best way to translate actual database values (textual strings, not date formats or anything complicated) when internationalizing my Django application. The most logical ways I could come up with were to either:
hold a database column for every language (e.g. description_en, description_de, description_fr, ...) or
have a different database for every language (e.g. schema_en, schema_fr, schema_de, ...).
Are these the best options, or is there something else I'm missing? Thanks.
I was reading up on Django extensions and found the django-modeltranslation plugin. It seems to do exactly what you want.
I also found this small project, whose purpose is to synchronize localized strings into standard message files for the fields of registered models.
Example:
import vinaigrette
vinaigrette.register(YourModel, ['name', 'description'])
The standard command
$ manage.py makemessages
would then maintain messages for each distinct value found in the registered fields.
I have not had the occasion to try it yet.
But this seems to me to be the simplest way to translate data from the DB.

Internationalization on database level

Could anyone point me to some patterns addressing internationalization on the database level tasks?
The simplest way would be to add, for every text column, one extra text column per language, but that is somewhat smelly. What I really want is the ability to add supported languages dynamically.
The solution I'm coming to is one main language that is saved in the model, plus a dictionary entity that is queried for translations and that translations are saved to.
All I want is to hear from other people who have done this.
You could create a table that has three columns: target language code, original string, translated string. The index on the table would be on the first two columns, but I wouldn't bind this table to other tables with foreign keys. You'd need to add a join (possibly a left join to account for missing translations) for each of the terms you need to translate in each query you run. However, this will make all your queries very hairy and possibly kill performance as well.
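As a rough illustration of the left join, here is a runnable sketch with Python's built-in sqlite3. The product table, the sample rows and the helper name titles are assumptions made up for the example. COALESCE falls back to the original string when no translation exists for the requested language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE translation (
    lang       TEXT NOT NULL,
    original   TEXT NOT NULL,
    translated TEXT NOT NULL
);
-- Index on the first two columns; deliberately no foreign keys.
CREATE UNIQUE INDEX translation_lookup ON translation (lang, original);
INSERT INTO product VALUES (1, 'Chair'), (2, 'Table');
INSERT INTO translation VALUES ('de', 'Chair', 'Stuhl');
""")


def titles(conn, lang):
    """Return (id, title) with the title translated where possible."""
    return conn.execute(
        """
        SELECT p.id, COALESCE(t.translated, p.title)
        FROM product AS p
        LEFT JOIN translation AS t
          ON t.original = p.title AND t.lang = ?
        ORDER BY p.id
        """,
        (lang,),
    ).fetchall()


print(titles(conn, "de"))
```

Each additional translatable column in a query needs its own left join of this kind, which is where the queries start to get hairy.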
Another thing you need to be aware of is the work of actually translating the terms and keeping the translation table up to date. Translation is often done by non-technical people, and it is very inconvenient to do directly against the database.
Usually when localizing an application you'd use something like gettext. The idea behind this suite of tools is to parse source code to extract strings for translation and then create translation files from them. Since this suite has been around for a long time, there are a lot of different utilities based on it that help with the translation task, one of which is Poedit, a nice GUI editor for translating strings into different languages. It might be simpler to generate the unique list of terms as they appear in the database in a format gettext can parse, and do the translation in the application code. This way you'd be able to translate the hard-coded strings in the application and the database values using the same technique.
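Generating that list can be as simple as the sketch below, which turns the distinct terms into msgid/msgstr entries in the PO text format. The function names are my own, and the sketch leaves out the header entry that a complete template file normally starts with (it declares the character set, among other things), so treat it as a starting point rather than a finished exporter.

```python
def escape_po(text):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def terms_to_po_entries(terms):
    """Build one untranslated PO entry per distinct term, in sorted order."""
    lines = []
    for term in sorted(set(terms)):
        lines.append(f'msgid "{escape_po(term)}"')
        lines.append('msgstr ""')
        lines.append("")  # blank line separates entries
    return "\n".join(lines)


if __name__ == "__main__":
    # In practice the terms would come from a SELECT DISTINCT on the column.
    print(terms_to_po_entries(["North", "East", "North", "South"]))
```

The translators can then fill in the msgstr values with a tool such as Poedit, and the application looks the database values up at display time the same way it does for hard-coded strings.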
