I have an application that uses PostgreSQL but also interacts with a third-party-controlled database on MSSQL. The data are sometimes tied together closely enough that it becomes desirable to do things like:
select thing_from_pg, thing_from_ms_crossover_function(thing_from_pg) -- etc
Currently I implement thing_from_ms_crossover_function in plperl. Is there a way to do this in plpgsql or something, so that I don't need to start a plperl interpreter for such cases?
Another option is obviously to access both databases from my client app, but that becomes far less convenient than the view syntax above.
You have three basic options.
The first is to use DBI-Link and then access it via your pl/pgsql or pl/perl function. The nice thing about DBI-Link is that it is relatively old and mature. If it works for you, I would start there.
The second option is to use foreign data wrappers.
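For the FDW route, here is a rough sketch of what a setup could look like using the third-party tds_fdw wrapper for MSSQL. Every name and credential below is a placeholder, and the option names are tds_fdw's, so check its documentation:

    CREATE EXTENSION tds_fdw;

    CREATE SERVER mssql_srv
      FOREIGN DATA WRAPPER tds_fdw
      OPTIONS (servername 'mssql.example.com', port '1433', database 'ThirdPartyDb');

    CREATE USER MAPPING FOR CURRENT_USER
      SERVER mssql_srv
      OPTIONS (username 'app_user', password 'secret');

    CREATE FOREIGN TABLE ms_things (
      id   integer,
      name text
    ) SERVER mssql_srv OPTIONS (table_name 'dbo.Things');

    -- the crossover then becomes a plain join, with no plperl involved:
    SELECT t.thing_from_pg, m.name
    FROM   my_pg_table AS t
    JOIN   ms_things AS m ON m.id = t.ms_id;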
The third option is to write a more general framework in something like pl/perl that you can call from pl/pgsql. However, at that point you are basically reinventing DBI-Link, so I think you are better off starting with DBI-Link and modifying it as needed.
Imagine a program which operates on large hierarchical datasets. The program stores each new dataset in a dedicated table, created according to the data types the dataset contains. Well, nothing very unusual; this is a trivial situation. But how do I make this kind of arrangement in Play 2.0, where the evolutions paradigm rules? I just cannot begin to think how.
UPDATE
It turns out there is no simple way. OK, the roundabout way then.
Is it possible to:
1) Make the program write the evolution files itself and apply them automatically? Would that clash with Play's philosophy?
2) Use another DB system in a separate thread and not use Play's built-in database functionality? Would that hurt much?
UPDATE 2
I am reading through the MongoDB Casbah documentation and I like it a lot. I am planning to use it with my Play application. Is there any evidence against using MongoDB via Casbah with Play?
That's a good question. And there's no brilliant answer, unfortunately.
Generally, evolutions are good and desirable when you work in a group. In that case you should switch to manual evolutions (not the ones generated by Ebean; in their current state they are dangerous to your data) and make your initial DDL as comprehensive as possible, with all the create statements.
In subsequent evolutions you can create new tables or alter existing ones, but for god's sake do not try to create a table that already exists :)
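For example, a hand-written first evolution in conf/evolutions/default/1.sql could look like this (the table here is invented for illustration):

    # --- !Ups
    CREATE TABLE dataset_meta (
      id   BIGINT PRIMARY KEY,
      name VARCHAR(255) NOT NULL
    );

    # --- !Downs
    DROP TABLE dataset_meta;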
Another approach I was (and still am) thinking about is using Ebean's auto-generated DDL (which always assumes your DB is empty) to generate differential schemas with some SQL schema migration tool (e.g. MyBatis), but that unfortunately requires additional effort.
The last thing I sometimes use, when I'm not sure about correct evolution syntax, is a small sandbox app where you can add similar models and watch how Ebean's plugin treats them. Unfortunately even this won't create proper alters, but it's better than testing on the main app.
Well, after some more experiments, I have decided to use MongoDB (actually, I had to choose from a wide variety of document-oriented DBMSs, and decided to start with MongoDB). I have set up a MongoDB server, incorporated its Java driver, Casbah (the driver's Scala wrapper), and all the necessary dependencies into my project, and everything works fine. No need for SQL or the evolutions paradigm whatsoever.
And I am not using any of the parts of Play that work with the database (the config file, Anorm, and whatever else there is); I'm just ignoring all that and doing everything in Mongo.
All works JUST FINE!
I'm trying to figure out how to implement Android's database Cursor interface as a wrapper around an "ORMed" database layer.
For ORM in MonoDroid we can use the sqlite-net project (a very lightweight ORM) or ServiceStack.OrmLite.
My thought is to implement the ICursor interface and "wrap" the ORM.
For now I just can't picture how it should work, or whether it can work at all.
Should it load a "framed" (windowed) set of results, or fetch them one by one?
Which is better for performance? And how should it get column values: reflection, or something else?
So the actual question is: is this possible at all?
Any thoughts will be appreciated.
Thanks.
I'm not sure what "problem" you're trying to solve with an ICursor implementation; perhaps you should be a little more specific about the task you're trying to accomplish. The entire point of an ORM (and you missed this one, which also supports SQLite on Android) is to abstract away the whole RDBMS paradigm from the code and give you an object-oriented paradigm instead.
An ICursor gives you back an updatable resultset from a SQL query - which means you have to know about rows, resultsets, queries and all of that. An ORM gives back an object, or a collection of objects. If you want to update one, you update the object and send it back to the ORM.
Now I fully admit that there are times when an ORM might not provide the cleanest mechanism for something that a SQL query does well. For example, say you logically wanted to "delete all parts built yesterday during second shift". A lightweight ORM might give you all parts, after which you have to use LINQ or similar to filter down to the right day and shift and then iterate the resulting collection to delete each one, whereas with a SQL query you just pass in DELETE FROM Parts WHERE BornOnDate BETWEEN @start AND @end. But that's one of the trade-offs you face.
In some cases the ORM might provide a facility to do what you want. For example, in the OpenNETCF ORM linked above, you can cast your DataStore (if it isn't already) to a SQLDataStore, and then you have access to the ExecuteNonQuery method, allowing you to pass in a direct SQL statement. It still doesn't have a means to pass you back a record set because, as I said, returning database rows is really the antithesis of an ORM.
There's also some inherent risk in using something like ExecuteNonQuery. If you want to change your backing store, from say an RDBMS like SQLite to something totally different like an object database, an XML file or whatever, then your code that builds and uses a SQL query breaks. Admittedly this might not be common, but if code portability and extensibility are on your radar, then it's at least something to keep in mind.
For the first time in years I've been doing some T-SQL programming in SQL Server 2008 and had forgotten just how bad the language really is:
Flow control (all the begin/end stuff) feels clunky
Exception handling is poor. Exceptions don't bubble up the way they do in every other language. There's no re-throwing unless you code it yourself, and the raiserror function isn't even spelt correctly (this caused me some headaches!)
String handling is poor
The only sequence type is a table. I had to write a function to split a string on a delimiter, storing the parts in a table along with a value indicating each part's position in the sequence (see the sketch after this list).
If you need to do a lookup in the stored proc, then manipulating the results is painful. You either have to use cursors or hack together a while loop with a nested lookup if the results contain some sort of ordering column.
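For reference, the split function I mean is roughly the following table-valued function. This is a simplified sketch, not my exact production code:

    CREATE FUNCTION dbo.SplitString (@input nvarchar(max), @delim nchar(1))
    RETURNS @parts TABLE (pos int IDENTITY(1,1), part nvarchar(max))
    AS
    BEGIN
        -- walk the string, emitting one row per delimited piece
        DECLARE @start int = 1, @next int = CHARINDEX(@delim, @input);
        WHILE @next > 0
        BEGIN
            INSERT INTO @parts (part) VALUES (SUBSTRING(@input, @start, @next - @start));
            SET @start = @next + 1;
            SET @next = CHARINDEX(@delim, @input, @start);
        END;
        -- trailing piece after the last delimiter
        INSERT INTO @parts (part) VALUES (SUBSTRING(@input, @start, LEN(@input) - @start + 1));
        RETURN;
    END;

    -- usage: SELECT pos, part FROM dbo.SplitString('a,b,c', ',');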
I realize I could code up my stored procedures using C#, but this will require permissioning the server to allow CLR functions, which isn't an option at my workplace.
Does anyone know if there are any alternatives to T-SQL within SQL Server, or if there are any plans to introduce one? Surely there's got to be a more modern alternative...
PS: This isn't intended to start a flame-war, I'm genuinely interested in what the options are.
There is nothing wrong with T-SQL; it does the job it was intended for (except perhaps for the addition of control flow structures, but I digress!).
Perhaps take a look at LINQ? You can write CLR stored procedures, but I don't recommend this unless it's for some feature that's missing (or for heavy string handling).
All other database stored-procedure languages (PL/SQL, SQL/PSM) have about the same issues. Personally, I think these languages are exactly right for what they are intended to be used for: data-driven logic, especially if you want to reuse it across multiple applications.
So I guess my counter-question to you is: why do you want your program to run as part of the database server process? Isn't what you're trying to do better solved at the application or middleware level? There you can take any language or data-processing tool of your choosing.
From my point of view, the only alternative to T-SQL within SQL Server is to not use SQL Server.
Regarding your point about handling strings with delimiters: where do these strings come from?
You could try Integration Services and SSIS packages for converting data from one form to another.
There is also a nice way to access non-SQL data over linked servers.
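For example, once a linked server is defined (the name OTHERSRV below is made up), you can query it with four-part names or pass the query through verbatim:

    SELECT * FROM OTHERSRV.RemoteDb.dbo.RemoteTable;

    SELECT * FROM OPENQUERY(OTHERSRV, 'SELECT SomeCol FROM dbo.RemoteTable');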
We have a database with ~100K business objects in it. Each object has about 40 properties which are stored amongst 15 tables. I have to get these objects, perform some transforms on them and then write them to a different database (with the same schema.)
This is ADO.Net 3.5, SQL Server 2005.
We have a library method to write a single property. It figures out which of the 15 tables the property goes into, creates and opens a connection, determines whether the property already exists and does an insert or update accordingly, and closes the connection.
My first pass at the program was to read an object from the source DB, perform the transform, and call the library routine on each of its 40 properties to write the object to the destination DB. Repeat 100,000 times. Obviously this is egregiously inefficient.
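For concreteness, each of those roughly four million library calls opens a connection and runs something like the following (table and column names here are made up; the real ones differ):

    IF EXISTS (SELECT 1 FROM dbo.PropertyTable3 WHERE ObjectId = @id)
        UPDATE dbo.PropertyTable3 SET PropValue = @value WHERE ObjectId = @id;
    ELSE
        INSERT INTO dbo.PropertyTable3 (ObjectId, PropValue) VALUES (@id, @value);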
What are some good designs for handling this type of problem?
Thanks
This is exactly the sort of thing that SQL Server Integration Services (SSIS) is good for. It's documented in Books Online, same as SQL Server is.
Unfortunately, I would say that you need to forget your client-side library, and do it all in SQL.
How many times do you need to do this? If only once, and it can run unattended, I see no reason why you shouldn't reuse your existing client code. Automating the work of human beings is what computers are for. If it's inefficient, I know that sucks, but if you're going to spend a week of work setting up an SSIS package, that's inefficient too. Plus, your client-side solution could contain business logic or validation code that you'd have to remember to carry over to SQL.
You might want to research CREATE ASSEMBLY, moving your client code across the network to reside on your SQL box. This will avoid network latency, but could destabilize your SQL Server.
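A sketch of what that looks like (the path and assembly name are placeholders):

    CREATE ASSEMBLY TransformLib
    FROM 'C:\deploy\TransformLib.dll'
    WITH PERMISSION_SET = SAFE;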
Bad news: you have many options
use flatfile transformations: extract all the data into flat files, manipulate them using grep, awk, sed, C, or Perl into the required insert/update statements, and execute those against the target database.
PRO: fast. CON: extremely ugly, a nightmare for maintenance; don't do this if you'll need it for longer than a week or for more than a couple dozen executions.
use pure SQL: I don't know much about SQL Server, but I assume it has a way to access one database from within another, so one of the fastest approaches is to write it as a collection of insert/update/merge statements fed by select statements (see the sketch after this list).
PRO: fast, and one technology only. CON: requires a direct connection between the databases, and you might hit the limits of SQL, or of the available SQL knowledge, pretty fast, depending on the kind of transformation.
use T-SQL, or whatever iterative language the database provides; everything else is similar to the pure SQL approach.
PRO: pretty fast, since you don't leave the database. CON: I don't know T-SQL, but if it is anything like PL/SQL, it is not the nicest language for complex transformations.
use a high-level language (Java, C#, VB ...): you would load your data into proper business objects, manipulate those, and store them in the database. Pretty much what you seem to be doing right now, although it sounds like there are better ORMs available, e.g. NHibernate.
PRO: even complex manipulations can be implemented in a clean, maintainable way, and you can use all the fancy tools (good IDEs, testing frameworks, CI systems) to support you while developing the transformation. CON: it adds a lot of overhead (retrieving the data out of the database, instantiating the objects, and marshalling the objects back into the target database). I'd go this way if it is a process that is going to be around for a long time.
use an ETL tool: there are special tools for extracting, transforming, and loading data. They often support various databases and have many strategies readily available for deciding whether an update or an insert is in order.
PRO: sorry, you'll have to ask somebody else for that; so far I have nothing but bad experience with these tools. CON: a highly specialized tool that you need to master. In my personal experience they are slower, in both implementation and execution of the transformation, than handwritten SQL, and a nightmare for maintainability, since everything is hidden away in proprietary repositories; for IDE, version control, CI, and testing you are stuck with whatever the tool provider gives you, if anything.
Building on the high-level-language option, you could further glorify the architecture by using messaging and web services, which could be relevant if you have more than one source or more than one target database. Or you could manually implement a multithreaded transformer in order to gain throughput. But I guess I am leaving the scope of your question.
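To make the pure-SQL option concrete, here is a rough, untested sketch for SQL Server, assuming both databases sit on the same instance; all names are invented, and since SQL Server 2005 lacks MERGE it's an update followed by an insert:

    UPDATE d
    SET    d.SomeProp = s.SomeProp
    FROM   TargetDb.dbo.Things AS d
    JOIN   SourceDb.dbo.Things AS s ON s.Id = d.Id;

    INSERT INTO TargetDb.dbo.Things (Id, SomeProp)
    SELECT s.Id, s.SomeProp
    FROM   SourceDb.dbo.Things AS s
    WHERE  NOT EXISTS (SELECT 1 FROM TargetDb.dbo.Things AS d WHERE d.Id = s.Id);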
I'm with John: SSIS is the way to go for any repeatable process that imports large amounts of data. It should be much faster than the 30 hours you are currently getting. You could also write pure T-SQL code to do this if the two databases are on the same server or are linked servers. If you go the T-SQL route, you may need a hybrid of set-based and looping code that works in batches (of, say, 2000 records at a time) rather than locking up the table for the whole time a large insert would take; see the sketch below.
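A rough sketch of that batching idea, reusing the made-up names from the answer above:

    DECLARE @rows int;
    SET @rows = 1;
    WHILE @rows > 0
    BEGIN
        -- copy the next slice of not-yet-migrated rows, then check progress
        INSERT INTO TargetDb.dbo.Things (Id, SomeProp)
        SELECT TOP (2000) s.Id, s.SomeProp
        FROM   SourceDb.dbo.Things AS s
        WHERE  NOT EXISTS (SELECT 1 FROM TargetDb.dbo.Things AS d WHERE d.Id = s.Id);
        SET @rows = @@ROWCOUNT;
    END;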
I am working on a few PHP projects that use MVC frameworks, and while they all have different ways of retrieving objects from the database, it always seems that nothing beats writing your SQL queries by hand, both for speed and for cutting down on the number of queries.
For example, one of my web projects (written by a junior developer) executes over 100 queries just to load the home page. The reason is that in one place, a method will load an object, but later on deeper in the code, it will load some other object(s) that are related to the first object.
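To illustrate with made-up tables: the page effectively runs one query per related record, when a single join could fetch everything at once:

    -- what the code effectively does, once per object:
    --   SELECT * FROM users WHERE id = ?;
    --   SELECT * FROM comments WHERE post_id = ?;
    -- versus one query for the whole page:
    SELECT p.id, p.title, u.name AS author_name, c.body AS comment_body
    FROM   posts p
    JOIN   users u ON u.id = p.author_id
    LEFT JOIN comments c ON c.post_id = p.id
    WHERE  p.published = 1;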
This leads to the other part of the question: what do people do in situations where one part of the code needs only a few columns from a table, and another part needs something else? Right now (in the same project), there is one get() method for each object, and it does a SELECT * (or lists all the columns in the table explicitly), so that any time you need the object for any reason, you get the whole thing.
So, in other words, you hear all the talk about how SELECT * is bad, but if you use the ORM class that comes with the framework, that's usually exactly what it wants to do. Are we stuck choosing between an ORM with SELECT * and writing specific SQL queries by hand? It seems to me we're caught between convenience and efficiency, and if I hand-write the queries, then whenever I add a column I'll most likely have to add it in several places in the code.
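A concrete example of the column issue, again with made-up tables: a listing page needs only a slim projection, while the edit form needs the whole row:

    SELECT id, title FROM posts WHERE published = 1;  -- listing page: two columns
    SELECT * FROM posts WHERE id = 42;                -- edit form: everything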
Sorry for the long question, but I'm explaining the background to get other developers' perspectives rather than one specific solution. I know we can always use something like Memcached, but I would rather optimize what we can before resorting to that.
Thanks for any ideas.
First, assuming you are proficient at SQL and schema design, there are very few instances where any abstraction layer that removes you from the SQL statements will exceed the efficiency of writing the SQL by hand. More often than not, you will end up with suboptimal data access.
There's no excuse for 100 queries just to generate one web page.
Second, if you are using the Object Oriented features of PHP, you will have good abstractions for collections of objects, and the kinds of extended properties that map to SQL joins. But the important thing to keep in mind is to write the best abstracted objects you can, without regard to SQL strategies.
When I write PHP code this way, I always find that I'm able to map the data requirements for each web page to very few, very efficient SQL queries if my schema is proper and my classes are proper. And not only that, but my experience is that this is the simplest and fastest way to implement. Putting framework stuff in the middle between PHP classes and a good solid thin DAL (note: NOT embedded SQL or dbms calls) is the best example I can think of to illustrate the concept of "leaky abstractions".
I got a little lost with your question, but if you are looking for a way to do database access, there are a couple of options. Your MVC app can use the Zend Framework, which comes with database access abstractions.
Also keep in mind that you should design your system well to avoid contention in the database: if your queries are scattered across the PHP pages, they may lock tables, causing the overall web application to deteriorate in performance and become slower over time.
That is why it is sometimes preferable to use stored procedures: the SQL lives in one place and can be tuned when needed, though others may argue that it is easier to debug when the query statements are in the front-end.
No ORM framework will ever get close to hand-written SQL in terms of speed. And although 100 queries seems unrealistic (maybe you are exaggerating a bit), even with the creator of the ORM framework writing the code, it will always be far from the speed of good old SQL.
My advice is to look at the whole picture, not only speed:
Does the framework improve code readability?
Is your team comfortable with writing SQL and mixing it with code?
Do you really understand how to optimize the framework queries? (I think a get() for each object is not the optimal way of retrieving them)
Do the framework's queries (after optimization) present a bottleneck?
I've never developed anything with PHP, but I think you could mix both approaches (ORM and plain SQL): after a thorough profiling of the app you can determine the real bottlenecks, and only then replace that ORM code with hand-written SQL. (In Ruby you would usually use ActiveRecord, then profile the application with something like New Relic, and finally, if you have a complicated AR query, replace it with raw SQL.)
Regards
Trust your experience.
To avoid repeating yourself so much in the code, you could write some simple model functions with your own SQL. This is what I do all the time, and I am happy with it.
Much of the "convenience" stuff was written for people who need magic because they cannot do it by hand or just don't have the experience.
And after all it's a question of style.
Don't hesitate to add your own layer, or to exchange or extend a given layer with your own stuff. Keep it clean, and make a good design with some documentation so you feel at home when you come back later.