I want to create a new data type and new operators in PostgreSQL.
I saw that in the documentation, it is possible to incorporate new source files in C (for example) and create a new data type and operators. PostgreSQL is extensible in that direction. More information at: documentation
But PostgreSQL is also open source, so I could alter the source code, add a new data type there, and compile my own version.
Given that, I want to know the differences, advantages, and disadvantages of each method of adding a new data type to PostgreSQL. I'm particularly concerned about query-processing performance.
Thank you.
If you modify PostgreSQL itself you have to maintain the whole code base, and you have to reapply your patches every time you upgrade, even between minor versions. If you write an extension you only have your little extension to maintain. It's also much easier to distribute a small extension if you ever want to do that.
I fully agree with Jachim's answer.
Another thing is:
Developing your own C-language extension for PostgreSQL is (rather) well documented - simple functions can be built from a single C source file plus the corresponding SQL declarations. Adding a custom data type is a bit more involved, but still doable. The extension I developed was even written in C++, with just a bit of wrapper glue to PostgreSQL's plain C API - that made development much more flexible.
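To give a sense of the scale involved, here is a minimal sketch of a complete C-language function in the style of the documentation; the function itself is a toy example, and the SQL registration is shown as a comment:

    /* a minimal sketch of a PostgreSQL C-language function (V1 calling convention);
       build it into a shared library and register it from SQL with CREATE FUNCTION */
    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(add_one);

    Datum
    add_one(PG_FUNCTION_ARGS)
    {
        int32 arg = PG_GETARG_INT32(0);

        PG_RETURN_INT32(arg + 1);
    }

    /* registered from SQL, roughly:
       CREATE FUNCTION add_one(integer) RETURNS integer
           AS 'MODULE_PATHNAME', 'add_one'
           LANGUAGE C STRICT;
    */

A custom data type adds input and output functions on top of this, plus a CREATE TYPE declaration, but the pattern is the same.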
Altering the PostgreSQL core, however, is more complicated in terms of where to start and what to do - and in the end you achieve the same thing.
To sum it up: C-language functions give you all the advantages:
High performance by building on PostgreSQL's internal data types
Simple programming interface
Just a small bit of code with a documented and presumably very stable function interface
I cannot see any advantage in altering PostgreSQL's core, but many disadvantages:
Long compilation times
Maintaining your own code branch and regularly reapplying your patch to the current release
Higher risk of bugs.
If you need examples of the many different ways to use the C-language interface, have a look at the PostGIS source code - they use nearly all of the function types and have a lot of fancy tricks in their code.
There is no difference between built-in operators and data types and custom ones, so they have the same performance. External and internal implementations follow the same rules and patterns, so there is no reason to hack Postgres for this.
We have patched PostgreSQL at GoodData, and we have our own extensions too. Wherever it is possible and practical we use custom extensions; where it is not possible we use our own hacks - backports from 9.2 and 9.3, some enhancements to pg_dump and psql, statistics - but we have active PostgreSQL hackers in the company, which is not usual. For users without PostgreSQL hacking experience, writing extensions is the safe, well-performing solution.
Related
I've been working with yocto for a while. The job involves adding a layer on top of meta-yogurt, and thus adding new recipes or modifying existing ones in that layer. It has been working out so far.
But after struggling with the recipe/rule syntax and the relationships between recipes, my gut tells me that both the internal implementation of the build system and the way it interacts with an end developer like me would be much simpler if a relational database were involved, defining things, their attributes, and their relationships.
But I am no yocto expert (and cannot spend time on becoming one, given my project schedule), so I can't evaluate a redesign of yocto around a relational DB. I just wonder whether this makes sense to anybody else?
Speaking as one of the developers who's worked extensively on bitbake: whilst I can see why you might think this, it doesn't really make sense for many reasons, not least performance. We did once try using sqlite as a data store backend within bitbake, and it was orders of magnitude too slow to be useful compared to our Python data store.
Much of the flexibility in the metadata also comes from things like the ability to inject Python code, and with a database engine that wouldn't be easily possible.
So in summary, I doubt bitbake or the Yocto Project will ever use relational databases in this way.
I tried to post a comment on Richard Purdie's answer from Oct 7 '17, but the comment box didn't allow line breaks or long text...
I was not aware that there was already a Python data store besides the yocto objects and rules (the mixture of variables, string literals, regular expressions, file types such as layer, bb, bbappend, conf, inc, etc., and the "half-explicit", "half-documented" syntax and relationships among them :().
Of course, in a Python environment it makes most sense to use the combination of its data store and language bindings rather than a separate relational DB. But I imagine that with a relational DB everything - the data store and the rules - could be implemented in one place, querying/modifying would be an integrated part, and bindings to different languages/scripting should be possible. Was it actually the same thing, just done the "Python way" that I am not familiar with?
As for performance, we already need a powerful machine (Xeon E5, 16 GB RAM, 1 TB HDD) to clean-build one distro in about an hour (without fetching sources from remote servers), and I believe other people also use their best available PC for such a serious job as building a distro. So, without comparing the performance of different DB engines and programming languages - which shouldn't differ by orders of magnitude - I would just set the performance factor aside.
Injecting code into a database engine might be difficult. But I guess that with a database engine a different programming pattern would do the job: the logic is implemented outside the DB engine, and that process contacts the DB engine when needed.
I actually haven't been working with yocto for a while, because our build setup has been working great and producing new distros for us now and then from the previously prepared layer. Just some thoughts...
I'm taking a database course and I have to write a command-line application. The prof wants us to write an ESQL (embedded SQL) application.
I have a feeling that this kind of technology is deprecated.
We have to use the Oracle precompiler to translate the ESQL code into C++. This kind of application looks terrible to maintain.
A PHP application would also work well, but they probably want a command-line application so they can grade faster (unit tests with input feeds). What do you think - is embedded SQL used in the industry? Is it worth asking the prof to allow a Java application instead? Is there another technology that would be more appropriate?
Embedded SQL was one of the most popular ways to do SQL in C during the "old days" (C++ had not yet been invented).
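For those who have never seen it, here is a rough sketch of what Oracle Pro*C-style embedded SQL looks like; it is not plain C and only builds after the vendor precompiler rewrites the EXEC SQL blocks, and the credentials, table, and columns are invented for illustration:

    /* example.pc - processed by the Pro*C precompiler, not a plain C compiler */
    #include <stdio.h>
    #include <string.h>

    EXEC SQL INCLUDE SQLCA;

    EXEC SQL BEGIN DECLARE SECTION;
        char connstr[32];
        int  emp_id;
        char emp_name[31];
    EXEC SQL END DECLARE SECTION;

    int main(void)
    {
        strcpy(connstr, "scott/tiger");   /* made-up credentials */
        EXEC SQL CONNECT :connstr;

        emp_id = 42;
        /* host variables are prefixed with ':' inside the SQL */
        EXEC SQL SELECT name INTO :emp_name
                 FROM employees
                 WHERE id = :emp_id;

        printf("employee %d is %s\n", emp_id, emp_name);

        EXEC SQL COMMIT WORK RELEASE;
        return 0;
    }

The precompiler turns each EXEC SQL block into vendor library calls before the C compiler ever sees the file, which is exactly where the maintenance pain comes from.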
These days we would mostly use an ORM library. Embedded SQL is not recommended any more because, as you put it well, it depends on a proprietary pre-processor and makes code difficult to debug, manage, and maintain. It also ties you to a single database vendor, and your code will be extremely difficult to move to another database backend. Generally, we don't do it in "real life".
But as it is only a class, your prof is probably interested in teaching you SQL and database concepts. Embedded SQL is only a tool; you're supposed to learn SQL and databases, not embedded SQL in C++.
However, I believe you're missing the point by asking about PHP and Java. PHP is a scripting language, and Java is just another language for which one could (potentially) write an embedded SQL preprocessor.
So your question about embedded SQL really has nothing to do with language choice. It has to do with the trade-offs between (1) a proprietary embedded system with a preprocessor, (2) an ORM library, or (3) a data-access library (e.g. ODBC).
Off-Topic:
I first started using embedded SQL when I was in college (about 30 years ago!). I actually got programming jobs out of college and still used it, but it was obviously on the way out. I haven't seen it used since about 1990.
Yes, but no. I have not met a single line of embedded SQL in my 10 years in the field. I would say (and hope) this technology only survives in (some) legacy systems.
Nowadays, database related development in the industry would involve:
Direct database access using JDBC, ADO.NET, OLE DB, ODBC or native libraries (OCCI in your case) - see the rough ODBC sketch after this list.
Some sort of ORM (Hibernate, Entity Framework or a home made solution).
Some sort of data access layer based on frameworks and/or patterns (think Ruby on Rails, Active Record or a home made solution).
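As an illustration of the "direct access" option, a minimal sketch using the ODBC C API follows; the DSN, credentials, and table are made up, and error checking is omitted:

    #include <stdio.h>
    #include <sql.h>
    #include <sqlext.h>

    int main(void)
    {
        SQLHENV  env;
        SQLHDBC  dbc;
        SQLHSTMT stmt;
        SQLCHAR  name[64];
        SQLLEN   name_len;

        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
        SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
        SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

        /* "schooldb" is a hypothetical data source name */
        SQLConnect(dbc, (SQLCHAR *)"schooldb", SQL_NTS,
                   (SQLCHAR *)"user", SQL_NTS, (SQLCHAR *)"password", SQL_NTS);

        SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
        SQLExecDirect(stmt, (SQLCHAR *)"SELECT name FROM students", SQL_NTS);

        while (SQLFetch(stmt) == SQL_SUCCESS) {
            SQLGetData(stmt, 1, SQL_C_CHAR, name, sizeof(name), &name_len);
            printf("%s\n", name);
        }

        SQLFreeHandle(SQL_HANDLE_STMT, stmt);
        SQLDisconnect(dbc);
        SQLFreeHandle(SQL_HANDLE_DBC, dbc);
        SQLFreeHandle(SQL_HANDLE_ENV, env);
        return 0;
    }

The same shape recurs in JDBC and ADO.NET - connect, execute, fetch, clean up - just without a precompiler in the build.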
IMHO, home-made solutions should be eradicated, but they are more common than you would think. Part of this certainly has to do with students having only experimented with outdated and ill-suited tools at school...
ORM (and data access layer) related problems can be very complex and, I would say, very interesting to look at, especially if you are a student. I would recommend delving into Martin Fowler's P of EAA.
In C++, I would have a look at SOCI.
We have to maintain an old system here (20 years and older).
ESQL is used massively here. Most of the problems we had while moving the software to a new OS (a 15-year-old HP-UX) were with the ESQL code.
The new software we write all uses the C++ library instead. That gives us more readable code, and our IDE doesn't say 'invalid syntax' all the time, etc.
The C++ library is, in general terms, very similar to how I connect to a database in .NET or Java.
Using the C++ library we get an improvement in speed (if used wisely) and far fewer errors.
ESQL is deprecated from my point of view. But since we have entered a time where a lot of software work is updating, upgrading, or maintaining existing systems, it is very handy to have basic knowledge of old techniques!
I haven't seen embedded SQL in an application in 10 years. The last time I saw it was in a legacy mainframe app written in COBOL - yes, still in use at an electrical utility company.
The little C++ programming I do these days doesn't involve SQL. Most of the relational DB programming I encounter now is one of these:
ORM (object-relational mapping - Hibernate or JPA)
JDBC
stored procedures (Oracle or MySQL)
While this is probably outdated (I also did ESQL ~15-20 years ago), it may still serve as a good example of another way to approach things - even if only so that you enjoy ORMs more afterwards.
Also, from my understanding, LINQ in .NET is somewhat similar in spirit to embedding SQL in the host language - and LINQ seems to be quite popular.
Stepping back to broader CS, embedded DSLs seem to be a current research topic, so ESQL as an early example may not be too far-fetched from today's world.
ESQL is also the primary language heavily promoted for IBM middleware products. It's not an object-oriented language but a procedural one, and it's used extensively in some places to do mapping between XMLs (much like XSLTs).
I am using ESQL to this day against an Informix 9.x database, in the legacy C++ application code that I work on as part of my job.
While I agree with everyone that it is an old technique and there are better options out there, I would still say it is a very neat technique. The good part is that the SQL is embedded as part of the C/C++ code flow, both syntax-wise and logic-wise. The small syntax additions that ESQL brings are easy to learn, so I'd say it's fun to use.
Like Heiko mentioned, LINQ is close in idea to ESQL.
I was thinking of starting a project that very clearly needs a persistent store. I was about to reluctantly settle on an RDBMS when I came across an article that briefly mentioned CouchDB. It seems some advances in DB technology have happened since I last looked, so I thought I would ask here about databases before getting into it.
Here are my criteria. ( I list the criteria again at the end, so if you want to skip the explanations just scroll down. )
The project is open source and I will not be charging anything for it, so preferably the database is open source and free. Furthermore, the software has to run on both Linux and Windows.
Parts of the project have to be in C++. The project is not large enough, code-wise, to justify a second language, so basically the whole thing will be C++.
This project will not have anything to do with the web, so preferably the database will not require the detritus of a web library.
The objects I want to store fall into one of two categories: basic objects and container objects. The difference is that container objects contain yet more objects, i.e. a parts-of-parts problem. I need a database that can handle such cases cleanly and efficiently.
I also expect the schema to evolve rapidly, at least initially, and I suspect some of the old data simply will not fit into the new schemas. So I would like to keep different versions of the schema around and, when possible, be able to transform data from one schema into another.
For the application to work the way I intend, people would have to exchange large chunks of the database with each other. So I want simple ways of importing and exporting data, which I could automate to some degree.
Finally, it would be nice if the database could in some way be simulated in unit tests.
Those are my requirements. I have repeated them below to make it easier for people answering.
Thank you
Non-technical requirements
1. Open source, preferably free.
2. Runs on Windows and Linux.
Technical requirements
1. Has a C++ interface.
2. Is able to handle a non-web application, preferably without REST.
3. Can handle a "parts of parts" problem fairly well.
4. Can handle multiple indexes.
5. Has some concept of schema versions, can handle multiple schema versions, and can migrate tables from one schema to another.
6. Has a simple mechanism for moving data from one instance of the database to another.
7. Preferably has some mechanism for testing.
HDF5 is a binary format which behaves like a hierarchical database. It has bindings and libraries for C++ and Python (I only use the latter), and it is used to store large amounts of data, like those produced in certain physics and astronomy experiments.
http://www.hdfgroup.org/HDF5/
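As a rough sketch (the file name, dataset name, and data are made up), writing a small numeric dataset with the HDF5 C API looks like this:

    #include "hdf5.h"

    int main(void)
    {
        double  data[4] = {1.0, 2.0, 3.0, 4.0};   /* toy data */
        hsize_t dims[1] = {4};

        /* create a file, a 1-D dataspace, and a dataset of native doubles */
        hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset  = H5Dcreate2(file, "/measurements", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

Groups inside the file give you the hierarchy, so a "parts of parts" layout maps onto nested groups fairly naturally.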
I looked at a few NoSQL databases some time ago (I had a different requirement than you, though - I needed a standalone server). The ones I remember as particularly interesting are Redis and Kyoto Cabinet. Have a look.
BTW, you don't mention any performance requirements. If they aren't demanding, have you considered SQLite? Simple, embedded, stable, and with the flexibility of SQL after all. With prepared statements, the performance penalty of SQL should not be very high.
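As a rough sketch of what that looks like with the SQLite3 C API (the table and column names are invented, error checks omitted):

    #include <stdio.h>
    #include <sqlite3.h>

    int main(void)
    {
        sqlite3      *db;
        sqlite3_stmt *stmt;

        sqlite3_open("parts.db", &db);

        /* prepare once, bind a parameter, then step through the result rows */
        sqlite3_prepare_v2(db,
            "SELECT name FROM parts WHERE container_id = ?", -1, &stmt, NULL);
        sqlite3_bind_int(stmt, 1, 42);

        while (sqlite3_step(stmt) == SQLITE_ROW)
            printf("%s\n", (const char *)sqlite3_column_text(stmt, 0));

        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }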
EDIT: oops, I just noticed that you asked this more than a year ago... Well, perhaps you can tell us what you chose :)
I have an experiment streaming 1 Mb/s of numeric data which needs to be stored for later processing.
It seems as easy to write directly into a database as into a CSV file, and I would then have the ability to easily retrieve subsets or ranges.
I have experience with sqlite2 (back when it only had text fields) and it seemed pretty much as fast as raw disk access.
Any opinions on the best current in-process DBMS for this application?
Sorry - I should have added that this is C++, initially on Windows, but cross-platform is nice. Ideally the DB binary file format should be cross-platform too.
If you only need to read and write the data, without any checking or manipulation done in the database, then either should do fine. Firebird's database file can be copied as long as the systems have the same endianness (i.e. you cannot copy the file between Intel and PPC systems, but Intel-to-Intel is fine).
However, if you ever need to do anything with the data beyond simple reads and writes, go with Firebird, as it is a full SQL server with all the 'enterprise' features like triggers, views, stored procedures, temporary tables, etc.
BTW, if you decide to give Firebird a try, I highly recommend the IBPP library for accessing it. It is a very thin C++ wrapper around Firebird's C API. It has about 10 classes that encapsulate everything, and it's dead easy to use.
If all you want to do is store the numbers and be able to run range queries easily, you can just take any standard tree data structure available in the STL and serialize it to disk. This may bite you in a cross-platform environment, though, especially if you are trying to go cross-architecture.
As for more flexible/people-friendly solutions, sqlite3 is widely used, solid, stable, and very nice all around.
BerkeleyDB has a number of good features for which one would use it, but none of them apply in this scenario, imho.
I'd say go with sqlite3 if you can accept the license agreement.
-D
It depends what language you are using. If it's C/C++, Tcl, or PHP, SQLite is still among the best in the single-writer scenario. If you don't need SQL access, a Berkeley DB-style library might be slightly faster, like Sleepycat or gdbm. With multiple writers you could consider a separate client/server solution, but it doesn't sound like you need it. If you're using Java, hsqldb or Derby (shipped with Sun's JVM under the "JavaDB" branding) seem to be the solutions of choice.
You may also want to consider a numeric data file format that is specifically geared towards storing these types of large data sets. For example:
HDF -- the most common and best supported in many languages, with free libraries. I highly recommend this.
CDF -- a similar format used by NASA (but usable by anyone).
NetCDF -- another similar format (the latest version is actually a stripped-down HDF5).
This link has some info about the differences between the above data set types:
http://nssdc.gsfc.nasa.gov/cdf/html/FAQ.html
I suspect that neither database will let you write data at such a high speed. You can check this yourself to be sure. In my experience, SQLite failed to INSERT more than 1000 rows per second for a very simple table with a single integer primary key.
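If you do check it yourself, note that SQLite insert throughput depends heavily on whether the inserts share a single transaction and reuse a prepared statement; here is a rough sketch of the kind of loop to benchmark (the table is invented, error checks omitted):

    #include <sqlite3.h>

    /* insert a batch of samples inside one transaction, reusing one prepared statement */
    static void store_samples(sqlite3 *db, const double *samples, int n)
    {
        sqlite3_stmt *stmt;
        int i;

        sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        sqlite3_prepare_v2(db, "INSERT INTO samples(value) VALUES (?)", -1, &stmt, NULL);

        for (i = 0; i < n; i++) {
            sqlite3_bind_double(stmt, 1, samples[i]);
            sqlite3_step(stmt);
            sqlite3_reset(stmt);
        }

        sqlite3_finalize(stmt);
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    }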
In case of a performance problem, I would write the files in CSV format and later load their data into the database (SQLite or Firebird) for further processing.
I'm investigating some of the newer technologies available in SQL Server 2005/2008. Most of my applications are written in C# and generally have a database component. Most of what I find on Google is basic, 'this is how you set up a CLR UDT' material. I have a few general questions about their real-world application and use.
Are CLR-hosted UDTs commonly used in applications, large or small scale?
Are there performance concerns with using them?
Do DBAs generally prefer only using the built-in types?
They seem like a way to just cram an object into a table. Am I correct in assuming that the range of problems whose solutions are actually simplified by their use is minimal?
UDTs only have a benefit when they truly represent some fundamental item of your application. Almost always they can be broken down into built-in types, but the benefit is that you don't have to cast the returned object. So you probably shouldn't use UDTs for complex objects such as Employee, but something fundamental like Size or Location could be a good choice, because it's light but very rigid in its definition.
I investigated using them but became uneasy about versioning. What happens when you upload a newer version of the assembly to the server? Because of dependencies, you'd have to drop the UDT and recreate it - but what then happens to the data? We stick with built-in types.