Alternative language to TSQL for data handling - database

If I wanted to create my own relational database with a modern language to replace TSQL, what language would that be? Or, if I end up creating my own language, what features would I have to include to make it better than TSQL?

Chris Date and (to a somewhat lesser extent) Hugh Darwen have spent >20 years trying to expose all the flaws, fallacies and mistakes of the SQL language.
All flaws and fallacies of the SQL language are also flaws and fallacies of any language that has the character combination "SQL" in its name, so they apply to TSQL too.
Hugh Darwen has also spent significant effort trying to expose the flaws, fallacies and mistakes of the TSQL2 language (that is, the 1990s proposal for a new SQL standard that attempted to incorporate temporal features; the proposal never made it into the standard, yet, despite all the well-founded criticism, it is still taken as the implementation basis for every implementation that calls itself "TSQL").
Read (no, I'll make that "study very, very carefully") their writings and you'll have more "drawbacks" than you ever dreamed possible.
Study their most recent TTM book ("Databases, Types and the Relational Model"), plus its forthcoming sequel (not yet published, alas), and you'll know everything that is foundational and prerequisite for the "true" next-generation database programming language.
You'll also have the answer to the following question, asked in a comment here: "Assume you can invent a new database from scratch, without worrying about standards. What language would you use?". Answer: D. Or, more precisely: a language that conforms to all the prescriptions and proscriptions for qualifying as a D.

Related

What language code should I use to support multiple languages?

I want to go with translation tables as described here in the third example.
This is not hard to implement, but what I wonder is basically how I should encode those languages. I looked up the ISO 639-3 files that contain the latest languages and their codes, but is it a good idea to include them all?
The Language table is supposed to provide all kinds of languages for Stores. Those stores are allowed to decide themselves which languages they want to support. However, my database is going to tell them how many languages there are.
So, is there a "commonly used" list of languages? I don't think that Facebook and/or Google really support 7,866 languages, which is the number of languages listed in ISO 639-3.
Or would I use the Language Culture Names codes like en-GB, en-US, de-AT, etc.?
If I understand your question, you're asking if you should make a reference table with 7866 rows, one for each language in ISO 639-3. I don't really see the downside as it's not going to incur much storage or performance cost, compared to just storing a subset.
The real question is which languages you actually want to translate and whether you want to support dialects. If you don't, you can just use languages like en, fr, etc., and you could just store those in the reference table if you wanted to (though the savings are minimal).
If you want dialects, then you need en-us, en-gb, etc. Since these should fall back to the non-dialect version (i.e. just en), it would be a good idea for your columns to distinguish between the language and the dialect. So you can store "en | us", "en | gb" (possibly "en | null" too), and it will be easy to provide translations as overrides, with most of them falling back to a default dialect, as in the sketch below.
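To make the fallback concrete, here is a minimal sketch, assuming a PostgreSQL-style schema (the table and column names are illustrative, not from the question); an empty-string dialect marks the default translation:

    -- Language and dialect stored separately so a lookup for en-gb can
    -- fall back to the plain en translation when no override exists.
    CREATE TABLE translation (
        entity_id INT          NOT NULL,
        language  CHAR(2)      NOT NULL,              -- ISO 639-1 code, e.g. 'en'
        dialect   CHAR(2)      NOT NULL DEFAULT '',   -- region, e.g. 'gb'; '' = default
        label     VARCHAR(255) NOT NULL,
        PRIMARY KEY (entity_id, language, dialect)
    );

    -- Look up en-gb, falling back to plain en.
    SELECT label
    FROM   translation
    WHERE  entity_id = 42
      AND  language  = 'en'
      AND  dialect   IN ('gb', '')
    ORDER BY dialect DESC   -- 'gb' sorts after '', so the override wins
    LIMIT 1;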

Convenience of a PostgreSQL custom C function vs plpgsql

Let me state up front that my own answer to the question in the title is yes, in my case it is convenient, but I'm asking the experts here.
I have developed a lot of plpgsql functions and just one in C, but I have already understood that the learning curve is definitely steeper.
In my case I need a real development language, which plpgsql sometimes is not, but I also need performance; otherwise I would have looked at Python.
But here is the question.
Mainly I need to retrieve data with some selects and joins, perform elaborations on them, sometimes complex, and return a table of data.
From an execution-time point of view, is a C function quicker for this kind of use?
I appreciate any comments.
luca
But here is the question. Mainly I need to retrieve data with some selects and joins, perform elaborations on them, sometimes complex, and return a table of data.
I would go with pl/pgsql for this, as that's what it is designed for. In general, pl/pgsql performs very well within its problem domain, and I doubt you are likely to get significantly better performance by going with C. To the extent you can push your elaborations into the main query, all the better performance-wise.
This is assuming that your elaborations can be done with existing functions and do not involve a huge amount of complex data manipulation (in particular, say, converting between datatypes, like arrays and sets). If that is not the case, I would still put the main query and light manipulation in the pl/pgsql, and put the specific operations that need to be tuned in C. There are two reasons for doing this:
It means less C code, which means the C code is easier to read, follow, and prove correct.
It separates concerns so that you can use similar manipulations elsewhere.
There's a lot of performance tuning that has gone into pl/pgsql for its problem domain and reinventing all of that would be a lot of work both in development and testing. To the extent you can leverage tools that are already there you can get the performance you need with a lot less effort and a lot more in the way of guarantees.
EDIT
If you want to write PL/PGSQL code that performs well, you want it to be one large main query with modest support logic. The more you can push into your query, the better, and the more of your elaborations you can do in SQL (with possible C functions as mentioned above), the better. Not only does this mean better performance, it means better maintainability. As ArtemGr mentioned, certain operations are very expensive in PL/PGSQL, and in these cases you want to supplement with C code in order to get the performance you need.
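As a minimal sketch of that shape (the tables and the order_totals name are hypothetical, not from the question), the function below is a thin pl/pgsql wrapper around one large query, so the planner does the heavy lifting:

    CREATE OR REPLACE FUNCTION order_totals(p_customer int)
    RETURNS TABLE (order_id int, total numeric)
    LANGUAGE plpgsql STABLE AS $$
    BEGIN
        -- One big main query: the joins and aggregation happen in SQL,
        -- and the pl/pgsql layer stays modest support logic.
        RETURN QUERY
        SELECT o.id, SUM(li.qty * li.unit_price)
        FROM   orders o
        JOIN   line_items li ON li.order_id = o.id
        WHERE  o.customer_id = p_customer
        GROUP  BY o.id;
    END;
    $$;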
I know C/C++ well, and for me it's easier to write a PostgreSQL function in C++ than to learn the intricacies of pgSQL syntax and work around its limitations. I'd say go with the language you (and the rest of your team) are more familiar with. C should be faster than pgSQL (and Tcl, Perl, Python) for complex data manipulation, usually 5-10 times faster. Javascript (http://code.google.com/p/plv8js/) might be nearly as fast as C if it has a chance to spin its JIT. Python code can actually use a Cython extension under the hood, which might be nearly as fast as C.
You should probably measure how much time is spent in the data manipulation in question relative to the time spent in I/O before making a decision. In some domains C isn't faster; for example, Tcl and Javascript have very good regular expression engines.
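One way to get those numbers in PostgreSQL is to time the raw query separately from the full function call (order_totals is the hypothetical function from the sketch above):

    -- In psql, \timing on reports wall-clock time per statement.
    EXPLAIN (ANALYZE, BUFFERS)       -- actual timings, row counts, buffer I/O
    SELECT o.id, SUM(li.qty * li.unit_price)
    FROM   orders o
    JOIN   line_items li ON li.order_id = o.id
    WHERE  o.customer_id = 42
    GROUP  BY o.id;

    SELECT * FROM order_totals(42);  -- full call; the difference approximates
                                     -- time spent in manipulation outside the query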

Relational Algebra instead of SQL

I am studying relational algebra these days and I was wondering...
Don't you think it would be better if there existed a compiler that could compile relational algebra rather than SQL?
In which case would a database programmer be more productive?
Is there any research on relational algebra compilers?
Thanks in advance
See Tutorial D by C J Date, he also has a good rant somewhere on the evils of SQL.
Also see datalog, although not exactly relational algebra, is similar.
At my school, one student implemented a relational algebra parser as a Bachelor's thesis. You can test it here:
http://mufin.fi.muni.cz/projects/PA152/relalg/index.cgi
It's in Czech, but I think you can get the point.
I tried to write some relational algebra queries, and it was much better than the equivalent queries in SQL! They were much shorter, simpler to write, more straightforward, more understandable. I really enjoyed writing them.
So I don't understand why we all are using SQL when there is Relational Algebra.
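For illustration (the schema here is invented), the same query side by side; the algebra reads as a single expression, while the SQL adds keyword scaffolding:

    -- Relational algebra:  π name ( σ city='Paris' (supplier ⋈ supplies) )
    SELECT DISTINCT s.name
    FROM   supplier s
    JOIN   supplies sp ON sp.supplier_id = s.id
    WHERE  s.city = 'Paris';
    -- DISTINCT because relational algebra works on sets, not bags.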
There is indeed research on compiling relational algebra.
A good place to start:
Thomas Neumann: Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB 4(9): 539-550 (2011)

How do I build a domain-specific query language?

I have a biology database that I would like to query. There is also a given terminology bank I have access to that has formalizable predicates. I would like to build a query language for this DB using the predicates mentioned. How would you go about it? My solution is the following:
1. Formalize the predicates.
2. Translate them into a query language (SQL, SPARQL, it depends).
3. Build a specific language with ANTLR or other such tools.
4. Translate from 3 to 2.
Is this a valid approach? Are there better ones? Any pointers would be much appreciated.
Take a look at Booleano.
Use BNF to get a head start on the language semantics. GoldParser will help you by letting you play around with the semantics and syntax (link here: http://www.devincook.com/). Once you have the BNF semantics sorted out, you can build up actions based on the inputs. For example, a BNF grammar section dealing with extracting the composition of a limb's genetic makeup classification (I do not know if that exists; it's an abstract example, but you get the gist) for a particular query, 'fetch stats on limb where limb is leg', would behind the scenes issue a SQL SELECT on a column alias or name from a predefined table, as sketched below. I could be wrong on the approach... Hope it helps?
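A sketch of what that translation might look like, with an invented grammar fragment and an invented table, just to show the shape of the approach:

    -- BNF fragment (illustrative only):
    --   <query> ::= "fetch" <column> "on" <table> "where" <field> "is" <value>
    --
    -- Input:   fetch stats on limb where limb is leg
    -- Output:  the parser's action emits a SQL SELECT such as
    SELECT stats
    FROM   limb
    WHERE  limb_type = 'leg';   -- column/table names are hypothetical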
I suggest you take a look at the i2b2 framework; it's a graphical query language and query engine platform for patient databases.
It's probably hard to grasp it all at first, but do take a look at the CRC cell or web service in there; you'll see how they approached SQL generation from a clinical graphical query language in an interesting way (albeit not so performance-friendly :)).
Consider using Irony.NET.

EMR (Electronic Medical Record) standard record format?

A few associates and myself are starting an EMR project (Electronic Medical Records). I have heard talk in the past - and more so lately - about a standard record format - to facilitate the transferring of records when appropriate (HIPAA) from one facility to another. Has anyone seen any information on this?
You can look to HL7 for interoperability between systems (http://www.hl7.org/). Patient demographic information and textual notes can be passed. I've been out of the EMR space too long to know if any standards groups have done anything interesting of late. A standard format that maintains semantic meaning is a really, really difficult problem. See SnoMed (http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html) for one long-running ontology effort -- barely the start of a rich interchange format.
A word of warning from someone who spent several years with an upstart EMR vendor... This is a very hard business to be in. Sales cycles for large health systems can literally take years, and the amount of hand-holding required for smaller medical practices can quickly erode margins. Integration with existing practice management systems is non-standard, even if those vendors claim otherwise. Many more issues abound. I'm not sure that it's a wise space for an unfunded start-up to enter.
I think it's an error to consider HL7 to be a standard in the sense you seem to mean. It is heavily customized and can be quite different from one customer to the next. It's one of those standards with too much flexibility.
I recommend you read the standard (which should take you a while), then try to find a community of developers working with the standard. Ask them for horror stories, and be prepared for what you'll hear.
A month late, but...
The standard to shoot for is definitely HL7. It is used in many fields, so it is highly customizable, but there is a well-defined standard for healthcare. Each message (ACK, DSR, MCF), segment (PID, PV1, OBR, MSH, etc.), sequence and event type (A08, A12, A36) has a specific meaning regardless of your system of choice.
We haven't had a problem interfacing MiSYS, Statlan, Oacis, Epic, MUSE, GE Centricity/Lastword and others sending DICOM, ADT, PACS information between the systems we have in use. Most of these systems will be set up with an interface engine to tweak messages where needed, so adding a way to filter HL7 messages as they come through to your system, and as they go out to the downstreams, would be a must.
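For a feel of what those segments look like on the wire, here is a minimal, entirely made-up HL7 v2 ADT example (placeholder values, not from any real system): MSH carries the message type, PID the patient, PV1 the visit.

    MSH|^~\&|SND_APP|SND_FAC|RCV_APP|RCV_FAC|202001011200||ADT^A08|MSG00001|P|2.3
    EVN|A08|202001011200
    PID|1||123456^^^HOSP^MR||DOE^JOHN||19700101|M
    PV1|1|I|WARD^101^1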
Even if a new "presidential standard" for interoperability were to appear, and I would hazard a guess that it would be HL7 anyway, I would build the system with HL7 messaging, as this is currently the industry-accepted standard.
While solving interoperability, you shouldn't care only about the interchange format; the local storage formats should be standardized as well, to simplify the transformation to the interchange format and vice versa.
openEHR is a great format for storage, it is more expressive than HL7 v2, v3 and CDA, so it can be transformed easily to any of those. The specs are open and here: http://openehr.org/programs/specification/releases/1.0.2
For the interchange format, any of HL7 v2, v3 and CDA are good. Also consider CCR and CCD.
http://www.aafp.org/practice-management/health-it/astm.html
If you want to go outside HL7 thinking and are looking for a comprehensive EMR or EHR with a specified record format, rather than a record extract message interchange format, then have a look at openEHR, http://www.openehr.org/. The ISO 13606 extract standard is (almost) a subset of openEHR. You will also find open source reference libraries and openEHR implementations of different maturity available in Java, .NET, Ruby, Python, Groovy etc.
Some organisations are also producing HL7 artifacts like CDA as output from openEHR based EHR/EMR systems.
Have a look at the Continuity of Care Record; IIRC, that's what Google Health uses for input. It's not an HL7-family standard (there's a competing HL7-family standard; I don't recall what it's called off the top of my head).
There likely will not be a standard medical record format until the government dictates the format of one and requires its use by force of law.
That almost assuredly will not happen without socialized national health care. So in reality zero chance.
That's a correct answer, but I think something should be added about the meaningful use of EMRs: Officials Announce 'Meaningful Use,' EHR Certification Criteria
Last week, CMS released proposed regulations defining the “meaningful use” of electronic health records, Reuters reports (Wutkowski/Heavey, Reuters, 12/31/09).
In addition, the Office of the National Coordinator for Health IT released an interim final rule describing the required certification standards for EHR technology (Simmons, HealthLeaders Media, 12/31/09).
Under the 2009 federal economic stimulus package, health care providers who demonstrate meaningful use of certified EHRs will qualify for incentive payments through Medicaid and Medicare.
Officials will offer a 60-day public comment period after both regulations are published in the Federal Register on Jan. 13. The interim final rule on EHR certification is scheduled to take effect 30 days after publication (Goedert, Health Data Management, 12/30/09). http://www.myemrstimulus.com/
This is a very hard problem, because data collection starts with an MD, and the only coding they know (ICD and CPT) is all about billing, not anything likely to be of use between providers (especially in a form where the MD can be held legally liable). And they hate even that much paperwork.
Add to that the fact that HIPAA dictates that the patient, not the provider, owns the data. Not that patients could understand it or do anything useful with it if they had it.
Good luck. Whatever happens will result from coercion by the government and be a long, long time coming, IMHO.
Interestingly the one source of solid medical info turns out to be the VA (because they don't have the adversarial issues of payment and legal liability.) Go figure. That might be a good place to start for a standard with any existing data and some momentum, though. Here's another question with some info.
