What language code should I use to support multiple languages? - database

I want to go with translation tables as described here in the third example.
This is not hard to implement, but what I wonder is how I should encode those languages. I looked up the ISO 639-3 files that contain the latest languages and their codes, but is it a good idea to include them all?
The Language table is supposed to provide all kinds of languages for Stores. Those stores are allowed to decide themselves which languages they want to support. However, my database is going to tell them how many languages there are.
So, is there a "commonly used" list of languages? I don't think that Facebook and/or Google really support 7866 languages, which is the number of languages listed in ISO 639-3.
Or should I use the Language Culture Name codes like en-GB, en-US, de-AT, etc.?

If I understand your question, you're asking if you should make a reference table with 7866 rows, one for each language in ISO 639-3. I don't really see the downside as it's not going to incur much storage or performance cost, compared to just storing a subset.
The real question is which languages you want to actually translate and whether you want to support dialects. If you don't, you can just use languages like en, fr, etc, and you could just store those in the reference table if you wanted to (though the savings are minimal).
If you want dialects, then you would need en-us, en-gb, etc. Since these should fall back to the non-dialect version (i.e. just en), it would be a good idea for your columns to distinguish between the language and the dialect. So you can store "en | us", "en | gb" (possibly "en | null" too), and it will be easy to provide translations as overrides, with most of them falling back to a default dialect.
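To illustrate the fallback idea, here is a minimal Python sketch. The `TRANSLATIONS` dictionary, its keys, and the `translate` function are made up for this example; in a real system the data would come from the translation tables, with one column for the language and one nullable column for the dialect.

```python
from typing import Optional

# Hypothetical translation data keyed by (language, dialect).
# A dialect of None stands for the default, dialect-free entry ("en | null").
TRANSLATIONS = {
    ("en", None): {"greeting": "Hello", "trunk": "trunk"},
    ("en", "gb"): {"trunk": "boot"},  # only the overrides are stored
    ("de", None): {"greeting": "Hallo"},
}


def translate(key: str, language: str, dialect: Optional[str] = None) -> Optional[str]:
    """Look up the dialect-specific text first, then fall back to the language default."""
    for candidate in ((language, dialect), (language, None)):
        text = TRANSLATIONS.get(candidate, {}).get(key)
        if text is not None:
            return text
    return None  # no translation available at all
```

With this layout, `translate("trunk", "en", "gb")` finds the override, while `translate("greeting", "en", "gb")` falls back to the plain "en" entry.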

Related

How to search for questions and content about C programming

I've recently revived my C interest, which means I have a lot of interest in certain articles and questions covering topics within the language.
Over the years I've grown accustomed to using search engines for this, by entering a query like "how to use [library] in [programming language]". This works very well, but frequently it doesn't for C.
Most likely this is due to it being a single letter, and some websites and search engines probably treat it as an insignificant part of the query (like "a" or "to").
When searching on specific websites such as SO, I can use tags, but overall I still experience a lack of content compared to other programming languages.
Is there any "standard" way to include C in queries or inputs like this? With C++ for example, a lot of content can be found using "cpp", so maybe there's a comparable format-friendly term for C.
In Google, if you want the search engine to treat a string as important, put it in double quotation marks, as follows.
"important"
"c"

How to store "meta" source code in a database

I would like to store a computer program in a database instead of a number of text files. It should contain the structure and all objects, methods, dependencies etc. of the program. I do not want to store a specific language in the database but some kind of "meta" programming language. In a second step I would like to transform/export this structure in the database into either source code of a classic language (C#, Java, etc.) or compile it directly for CLR/JVM.
I think I am not the first person with this idea. I searched the internet and I think what I am looking for is called "source code in a database (SCID)" - unfortunately I could not find an implementation of this idea.
So my question is:
Is there any program that stores "meta" program code inside of a database and lets you generate traditional text source code from it that can be compiled/executed?
Short remarks:
- It can also be a noSQL database
- I currently don't care how the program is imported/entered into the database
It sounds like you're looking for some kind of common markup language that adequately describes the common semantics of each target language - e.g. objects, functions, inputs, return values, etc.
This is less about storing in a database, and more about having a single (I imagine, XML-like) structure that can subsequently be parsed and eval'd by the target language to produce native source/bytecode. If there was such a thing, storing it in a database would be trivial -- that's not the hard part. Even a key/value database could handle that.
The hard part will be finding something that can abstract away the nuances of multiple languages and attempt to describe them in a common format.
Similar questions have already been asked, without satisfying solutions.
It may be that you don't need the full source, but instead just a description of the runtime data-- formats like XML and JSON are intended exactly for this purpose and provide a simplified description of Objects that can be parsed and mapped to native equivalents, with your source code built around the dynamic parsing of that data.
It may be possible to go further in certain languages. For example, if your language of choice converts to bytecode first, you might technically be able to store the binary bytecode in a BLOB and then run it directly. Languages that offer reflection and dynamic evaluation can probably handle this -- then your DB is simply a wrapper for storing that data on compilation, and retrieving it prior to running it. That'd depend on your target language and how compilation is handled.
Of course, if you're only working with interpreted languages, you can simply store the full source and eval it (in whatever manner is preferred by the target language).
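As a rough sketch of that last option, assuming Python as the interpreted language and SQLite as the database: the `program` table, its columns, and `load_function` are invented for this example, and `exec` should only ever be used on source you trust.

```python
import sqlite3

# In-memory database with a hypothetical table holding one source snippet per name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE program (name TEXT PRIMARY KEY, source TEXT NOT NULL)")
conn.execute(
    "INSERT INTO program VALUES (?, ?)",
    ("add", "def add(a, b):\n    return a + b\n"),
)


def load_function(conn, name):
    """Fetch the stored source for `name`, evaluate it, and return the resulting object."""
    row = conn.execute(
        "SELECT source FROM program WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    namespace = {}
    exec(row[0], namespace)  # runs arbitrary code: trusted source only
    return namespace[name]


add = load_function(conn, "add")
```

This only covers the "store the full source and eval it" case; it does nothing to abstract over several target languages.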
If you give more info on your intended use case, I'm sure you'll get some decent suggestions on how to handle it without having to invent a sourcecode Babelfish.

What markup language to store in a DB?

Related: How to store lightweight formatting (Textile, Markdown) in database?
I want to store comment formatting in some markup language in our DB. However, we want to allow multiple formatting languages (markdown, textile, restructuredText). It seems we should store a superset of their features, so that we can convert between them.
Will this work?
Is there such a superset?
Are there libraries to switch between them?
Is there a more structured format we should keep comments in, in the DB?
(Python/Google App Engine if it matters)
Have you considered something simpler: storing the comments in their original form, together with an extra column saying which format it is stored in (markdown, textile, etc...)?
I would think that any superset is either going to result in some loss of information by storing only one of the many possible different ways the syntax can be written in a specific markup or else it will be too complicated as it tries to allow for all the possible encodings of a specific syntax in all the allowable markups.
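A minimal sketch of the "original text plus a format column" approach, using SQLite instead of the App Engine datastore for brevity. The `comment` table, the `add_comment`/`render_comment` helpers, and the placeholder renderer are assumptions for illustration; in practice each entry in `RENDERERS` would call the real converter for that markup.

```python
import html
import sqlite3

ALLOWED_MARKUPS = ("markdown", "textile", "restructuredtext")

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE comment (
           id     INTEGER PRIMARY KEY,
           body   TEXT NOT NULL,   -- the comment exactly as the user typed it
           markup TEXT NOT NULL
               CHECK (markup IN ('markdown', 'textile', 'restructuredtext'))
       )"""
)


def render_plain(body):
    """Placeholder renderer: escapes the text instead of converting the markup."""
    return "<pre>" + html.escape(body) + "</pre>"


# One renderer per stored format; all placeholders in this sketch.
RENDERERS = {name: render_plain for name in ALLOWED_MARKUPS}


def add_comment(body, markup):
    cur = conn.execute(
        "INSERT INTO comment (body, markup) VALUES (?, ?)", (body, markup)
    )
    return cur.lastrowid


def render_comment(comment_id):
    body, markup = conn.execute(
        "SELECT body, markup FROM comment WHERE id = ?", (comment_id,)
    ).fetchone()
    return RENDERERS[markup](body)
```

Nothing is lost on the way in, and conversion to HTML is deferred to display time, per format.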

Tagging content system - with i18n

The idea is to have a tagging system between Users and Content (images, videos, posts), kind of like the tagging system here on SO with questions.
I like the achievements system on SO, meaning that after a certain number of points a user can start making his/her own tags. Same idea for my system.
My current table design looks like
Tag           UserTag    User
---           -------    ----
tag_id        user_id    user_id
tag_name      tag_id     username
usage_count              ....
It brings me to this question.
Q: How can you have a tagging system for content in different languages, yet at the same time be able to search for the same content with tags in different languages, and have auto-complete in different languages for the same tag?
When I use autocomplete, I search for tag names that match the characters the user is typing.
E.g. I have a tag named "nightclub" in English, yet in French the translation would be "discothèque".
Or is there no way of doing this, and should I just let people make tags in different languages?
Yes you can. But be aware that some words in one language may have several translations in others.
You may have a languages table, a tags table with only a tag_id, and a many to many table with language_id, tag_id, tag_name.
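A possible shape for those three tables, sketched with Python's `sqlite3` module. The column names beyond `language_id`, `tag_id`, and `tag_name`, the sample rows, and the `autocomplete` helper are assumptions for illustration. The primary key on `(language_id, tag_id)` allows only one name per tag per language, which you would relax if a tag can have several translations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE language (
        language_id INTEGER PRIMARY KEY,
        code        TEXT UNIQUE NOT NULL
    );
    CREATE TABLE tag (
        tag_id INTEGER PRIMARY KEY
    );
    CREATE TABLE tag_name (
        language_id INTEGER NOT NULL REFERENCES language(language_id),
        tag_id      INTEGER NOT NULL REFERENCES tag(tag_id),
        tag_name    TEXT NOT NULL,
        PRIMARY KEY (language_id, tag_id)
    );
    INSERT INTO language VALUES (1, 'en'), (2, 'fr');
    INSERT INTO tag VALUES (1);
    INSERT INTO tag_name VALUES (1, 1, 'nightclub'), (2, 1, 'discothèque');
    """
)


def autocomplete(prefix, language_code):
    """Return (tag_id, tag_name) pairs in one language whose name starts with prefix.

    Note: '%' or '_' typed by the user would act as LIKE wildcards here.
    """
    return conn.execute(
        """SELECT tn.tag_id, tn.tag_name
             FROM tag_name AS tn
             JOIN language AS l ON l.language_id = tn.language_id
            WHERE l.code = ? AND tn.tag_name LIKE ? || '%'
            ORDER BY tn.tag_name""",
        (language_code, prefix),
    ).fetchall()
```

Because content is linked to `tag_id` rather than to a name, a French user typing "disc" and an English user typing "night" end up on the same tag.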
Like I said previously, you might run into problems when people want to make refinements that their own language allows but other languages can't express. To stay with the French example, talking about bread, you may have 'baguette', 'flûte', 'recuit', 'demi-recuit', etc. tags, whereas English would merely have a 'bread' tag. The mapping between the tags is then significantly complicated, but that's a general translation problem, not only one in the programming realm.
Regarding your comment: a compromise would be to add a "tag_related_to_tag" table that allows couplings between tags. Users could tell which tag is related to which other tag in a different language. This would allow the maximum flexibility with the minimum of complexity, but would need some administration (otherwise you might have evil users making very unexpected relationships between tags, breaking the usefulness of the system).
That's something I actually was thinking to implement for a website which has a very narrow field (stoic philosophy) and target public. If the field is too broad, it might be very ineffective.
Interesting question! Just some thoughts (not intended as a complete solution, it's more a set of questions):
A straightforward approach is having an internal tag ID and for each language a localized name.
If no localized name was created yet, you may need to fall back to the tag name in a 'primary' language - usually English - or the language the tag was created in.
Translation needs to be done by a user who knows both languages; automatic translations are IMO too imprecise. So probably a user right (bound to rewards?) to rename tags.
Are all languages equal, or are tags only created in a "primary language" understood by most users, and translations added separately? (The latter looks less fair, but would probably make some things easier)
You need an ability to merge tags - e.g. when users independently created "discothèque" and "nightclub".
Do I see only the tags that are available in my language, or can I see tags available in other languages that don't have translations to my language? Can I search for tags in other languages?
Is the tag name included in a query string? Will my German query link work when I send it to a friend in the US?
How to resolve disputes regarding the tag meaning? Example: The closest German translation is "Nachtclub" for night club, and "Diskothek" for "discothèque". But in German, a "Nachtclub" is quite different from a "Diskothek" (though there is some overlap).
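Two of the points above (fallback for missing localized names, and merging independently created tags) can be sketched in a few lines of Python. The in-memory dictionaries, the tag IDs, the content keys, and the choice of English as the primary language are all assumptions for illustration, not a recommendation for how to store the data.

```python
PRIMARY_LANGUAGE = "en"  # assumed primary language for fallback

# tag_id -> {language code: localized name}; the first entry added is treated
# as the language the tag was created in (dicts keep insertion order).
names = {
    1: {"en": "nightclub"},
    2: {"fr": "discothèque"},
}

# content key -> set of tag IDs attached to it (hypothetical content keys)
content_tags = {"photo-17": {1}, "video-3": {2}}


def display_name(tag_id, language):
    """Localized name if present, else the primary language, else the creation language."""
    localized = names[tag_id]
    if language in localized:
        return localized[language]
    if PRIMARY_LANGUAGE in localized:
        return localized[PRIMARY_LANGUAGE]
    return next(iter(localized.values()))


def merge_tags(keep, drop):
    """Fold tag `drop` into tag `keep`, keeping existing names on `keep` where both exist."""
    for language, name in names.pop(drop).items():
        names[keep].setdefault(language, name)
    for tags in content_tags.values():
        if drop in tags:
            tags.discard(drop)
            tags.add(keep)
```

After `merge_tags(1, 2)`, tag 1 carries both the English and the French name, and content previously tagged with 2 points at 1.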

EMR (Electronic Medical Record) standard record format?

A few associates and I are starting an EMR project (Electronic Medical Records). I have heard talk in the past - and more so lately - about a standard record format - to facilitate the transferring of records when appropriate (HIPAA) from one facility to another. Has anyone seen any information on this?
You can look to HL7 for interoperability between systems (http://www.hl7.org/). Patient demographic information and textual notes can be passed. I've been out of the EMR space too long to know if any standards groups have done anything interesting of late. A standard format that maintains semantic meaning is a really, really difficult problem. See SnoMed (http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html) for one long-running ontology effort -- barely the start of a rich interchange format.
A word of warning from someone who spent several years with an upstart EMR vendor...This is a very hard business to be in. Sales cycles for large health systems literally can take years, and the amount of hand-holding required for smaller medical practices can quickly erode margins. Integration with existing practice management systems is non-standard, even if those vendors claim otherwise. More and more issues abound. I'm not sure that it's a wise space for an unfunded start-up to enter.
I think it's an error to consider HL7 to be a standard in the sense you seem to mean. It is heavily customized and can be quite different from one customer to the next. It's one of those standards with too much flexibility.
I recommend you read the standard (which should take you a while), then try to find a community of developers working with the standard. Ask them for horror stories, and be prepared for what you'll hear.
A month late, but...
The standard to shoot for is definitely HL7. It is used in many fields, so it is highly customizable, but there is a well-defined standard for healthcare. Each message (ACK, DSR, MCF), segment (PID, PV1, OBR, MSH, etc.), sequence and event type (A08, A12, A36) has a specific meaning regardless of your system of choice.
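To make the message/segment/event vocabulary a bit more concrete, here is a small Python sketch that splits an HL7 v2 message into segments and reads the message type and trigger event from MSH-9. The sample message is invented, and the code assumes the default delimiters (`|` for fields, `^` for components, carriage return between segments); a real interface should read the delimiters from the MSH segment and use a proper HL7 library.

```python
# Invented sample message; segments are separated by carriage returns.
SAMPLE = (
    "MSH|^~\\&|SENDER|FAC|RECEIVER|FAC|200901011200||ADT^A08|MSG0001|P|2.3\r"
    "PID|1||12345||DOE^JOHN\r"
    "PV1|1|I"
)


def parse_hl7(message):
    """Group the fields of each segment under its three-letter segment ID."""
    segments = {}
    for line in message.strip().split("\r"):
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def message_type(segments):
    """Return (message type, trigger event) from MSH-9.

    MSH-1 is the field separator itself, so after splitting on '|'
    the MSH-9 value sits at index 8 rather than 9.
    """
    components = segments["MSH"][0][8].split("^")
    event = components[1] if len(components) > 1 else ""
    return components[0], event


def patient_id(segments):
    """Return PID-3; for non-MSH segments the list index equals the field number."""
    return segments["PID"][0][3]
```

For the sample above, the message type is ADT with event A08, and PID-3 carries the patient identifier.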
We haven't had a problem interfacing MiSYS, Statlan, Oacis, Epic, MUSE, GE Centricity/Lastword and others sending DICOM, ADT, PACS information between the systems we have in use. Most of these systems will be set up with an interface engine to tweak messages where needed, so adding a way to filter HL7 messages as they come through to your system, and as they go out to the downstreams, would be a must.
Even if there would be a new "presidential standard" for interoperability, and I would hazard a guess that it will be HL7 anyway, I would build the system with HL7 messaging as this is currently the industry accepted standard.
While solving interoperability, you shouldn't care only about the interchange format, the local storage formats should be standardized also, to simplify the transformation to the interchange format and vice versa.
openEHR is a great format for storage, it is more expressive than HL7 v2, v3 and CDA, so it can be transformed easily to any of those. The specs are open and here: http://openehr.org/programs/specification/releases/1.0.2
For the interchange format, any of HL7 v2, v3 and CDA are good. Also consider CCR and CCD.
http://www.aafp.org/practice-management/health-it/astm.html
If you want to go outside HL7 thinking and are looking for a comprehensive EMR or EHR with a specified record format rather than a record extract message interchange format, then have a look at openEHR, http://www.openehr.org/. The ISO 13606 extract standard is (almost) a subset of openEHR. You will also find open source reference libraries and openEHR implementations of different maturity available in Java, .NET, Ruby, Python, Groovy etc.
Some organisations are also producing HL7 artifacts like CDA as output from openEHR based EHR/EMR systems.
Have a look at the Continuity of Care Record--IIRC, that's what Google Health uses for input. It's not an HL7-family standard (there's a competing HL7-family standard--don't recall what it's called off-top).
There likely will not be a standard medical record format until the government dictates the format of one and requires its use by force of law.
That almost assuredly will not happen without socialized national health care. So in reality zero chance.
That's a correct answer, but I think something should be added about the meaningful use of EMRs: Officials Announce ‘Meaningful Use,’ EHR Certification Criteria
Last week, CMS released proposed regulations defining the “meaningful use” of electronic health records, Reuters reports (Wutkowski/Heavey, Reuters, 12/31/09).
In addition, the Office of the National Coordinator for Health IT released an interim final rule describing the required certification standards for EHR technology (Simmons, HealthLeaders Media, 12/31/09).
Under the 2009 federal economic stimulus package, health care providers who demonstrate meaningful use of certified EHRs will qualify for incentive payments through Medicaid and Medicare.
Officials will offer a 60-day public comment period after both regulations are published in the Federal Register on Jan. 13. The interim final rule on EHR certification is scheduled to take effect 30 days after publication (Goedert, Health Data Management, 12/30/09). http://www.myemrstimulus.com/
This is a very hard problem because data collection starts with an MD and the only coding they know (ICD and CPT) is all about billing, not anything likely to be of use between providers (esp. in a form where the MD can be held legally liable). And they hate even that much paperwork.
Add to that the fact that HIPAA dictates that the patient not the provider owns the data. Not that they could understand it or do anything useful with it if they had it.
Good luck. Whatever happens will result from coercion by the govt and be a long long time coming IMHO.
Interestingly the one source of solid medical info turns out to be the VA (because they don't have the adversarial issues of payment and legal liability.) Go figure. That might be a good place to start for a standard with any existing data and some momentum, though. Here's another question with some info.
