Migrate stored procedure on SQL Server to HPL/SQL (Hadoop ecosystem) - sql-server

I have a project which requires migrating all the stored procedures from SQL Server to the Hadoop ecosystem.
My main concern is whether HPL/SQL has been discontinued or is no longer kept up to date: http://www.hplsql.org/new lists the latest release as HPL/SQL 0.3.31, September 2017.
Has anyone been using this open-source tool, and is this kind of migration feasible based on your experience? Your sharing would be very much appreciated.

I am facing the same issue of migrating a huge number of stored procedures from a traditional RDBMS to Hadoop.
Firstly, this project is still active. It has been included in Apache Hive since version 2.0, which is why individual releases stopped in September 2017. You can download the latest version of HPL/SQL from the Hive repo.
You may have a look at the git history for new features and bug fixes.
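As a feasibility check before committing to the migration, it can help to scan the existing procedure bodies for T-SQL constructs that have no direct HPL/SQL equivalent and will need manual rework. A rough sketch of such an audit, where the pattern list is illustrative rather than exhaustive and the procedure text is made up:

```python
import re

# Illustrative, not exhaustive: T-SQL constructs that typically need
# manual attention when porting a stored procedure to HPL/SQL on Hive.
TSQL_PATTERNS = {
    "TOP clause": re.compile(r"\bSELECT\s+TOP\b", re.IGNORECASE),
    "table variable": re.compile(r"\bDECLARE\s+@\w+\s+TABLE\b", re.IGNORECASE),
    "IDENTITY column": re.compile(r"\bIDENTITY\s*\(", re.IGNORECASE),
    "MERGE statement": re.compile(r"\bMERGE\b", re.IGNORECASE),
}

def audit_procedure(body: str) -> list:
    """List the flagged constructs found in one stored-procedure body."""
    return [label for label, pat in TSQL_PATTERNS.items() if pat.search(body)]

proc = "CREATE PROCEDURE dbo.TopCustomers AS SELECT TOP 10 * FROM dbo.Customers"
print(audit_procedure(proc))  # ['TOP clause']
```

Running something like this over the full procedure set gives a rough size of the manual-porting effort before you commit to the tool.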

Related

Is it possible to upgrade directly from Datastage 7.5.3 to 11.7.1?

We are migrating from Datastage 7.5.3 to 11.7.1. I was wondering whether we need to upgrade to an intermediate version of Datastage? Is there any conversion tool available? Any inputs from people who have experience in a similar upgrade are appreciated. Thanks
There is no option for in-place upgrade from DataStage v7 directly to Information Server v11.
You will need to install Information Server 11.7.1 (either on the same machine in a side-by-side configuration, if the machine has enough resources for both environments, or on a new server). You can then export all of your existing DataStage jobs in the v7 environment to a dsx file that you can import into the new environment.
More information on migration steps can be found here:
https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.productization.iisinfsv.migrate.doc/topics/top_of_map.html
Though this document does not list specific steps for DataStage v7.5, the steps for DataStage v8 are equivalent as long as you export jobs as dsx files, since istool did not exist in DataStage v7.
There have been many changes to DataStage between versions 7.5 and 11.7 which you need to be aware of when moving jobs from the old release to the new one. We have documented these changes for the DataStage 8.5, 8.7, 9.1 and 11.3 releases. Since you are jumping past all of these releases, all of the documents are relevant; I will link them below and HIGHLY recommend reviewing them, as the changes can affect job behavior and also result in errors. In some cases these technotes document environment variables that can be set to switch back to the old behavior.
Additionally, in the last few releases a number of the older enterprise database stages for various database vendors have been deprecated in favor of newer "Connector" stages that did not exist in v7.5. For example, DB2 Enterprise stages should be upgraded to the DB2 Connector, Oracle stages to the Oracle Connector, and so on.
We have a client tool, the Connector Migration Tool, which can be used to create a new version of a job with the older stages automatically converted to Connector stages (you will still need to test the jobs).
Also, when exporting jobs from v7.5, export the design only: all jobs need to be recompiled at the new release level, so exporting executables is a waste of space in this case.
If you also need to move hash files and datasets to the new systems, there are technotes on IBM.com that discuss how to do that, though I cannot guarantee that the format of datasets has not changed between 7.5 and 11.7.
You will find that in more recent releases we have tightened error checking, so things which only received warnings in the past may now be flagged as errors, and conditions not reported at all may now produce warnings. Examples include changes to null handling, such as when a field in the source stage is nullable but the target stage/database has the field as not nullable. There are also new warnings and errors for truncation and type mismatches (some of those warnings can be turned off by properties in the new Connector stages).
Here are the recommended technotes to review:
Null Handling in a transformer for Information Server DataStage Version 8.5 or higher
https://www.ibm.com/support/pages/node/433863
Information Server Version 8.7 Compatibility
https://www.ibm.com/support/pages/node/435721
InfoSphere DataStage and QualityStage, Version 9.1 Job Compatibility
https://www.ibm.com/support/pages/node/221733
InfoSphere Information Server, Version 11.3 job compatibility
https://www.ibm.com/support/pages/node/514671
DataStage Parallel framework changes may require DataStage job modifications
https://www.ibm.com/support/pages/node/414877
Product manual documentation on deprecated database stages and link to Connector Migration Tool:
https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.migtool.doc/topics/removal_stages_palette.html
Thanks.

Database Migration with the flyway or dbup(.net library/dbup extension) with PostgreSQL

First of all, I am sorry because this might be a stupid question, but after a day of research I am confused and have very little time to decide.
We are using TFS as our CI tool and SCM, and PostgreSQL for the database.
We are planning to automate database deployments with PostgreSQL and TFS.
Please suggest a tool for this that lets me run my SQL files against a specific database as needed.
Can anyone please tell me whether the DbUp Migration Extension for TFS supports PostgreSQL? This link shows it only works with Microsoft SQL Server or Microsoft SQL Azure, while another document says DbUp supports a number of different databases, including PostgreSQL.
Also, does Flyway have support for C# and TFS?
The most popular tools for what you want are Liquibase and Flyway.
As far as I know there is only one significant difference: Flyway is plain-SQL based, while Liquibase provides an abstraction level based on XML, JSON, or YAML as well as plain SQL. You can use the abstractions provided by Liquibase to increase the portability of your scripts.
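To illustrate the plain-SQL model: Flyway discovers scripts named like `V1_2__add_index.sql` and applies them once, in version order. The ordering logic can be sketched as follows (the file names are hypothetical; Flyway itself does this internally):

```python
import re

def flyway_sort_key(filename: str) -> tuple:
    """Parse 'V1_2__add_users.sql' into a numeric version tuple (1, 2)."""
    m = re.match(r"V(\d+(?:_\d+)*)__.+\.sql$", filename)
    if not m:
        raise ValueError(f"not a versioned migration: {filename}")
    return tuple(int(part) for part in m.group(1).split("_"))

migrations = ["V2__add_orders.sql", "V1_1__add_index.sql",
              "V1__init.sql", "V1_10__fix.sql"]
# Numeric tuples sort correctly where plain string sorting would not
# (e.g. 1_10 comes after 1_1, not between 1_1 and 1_2).
print(sorted(migrations, key=flyway_sort_key))
```

The numeric-tuple comparison is why `V1_10` sorts after `V1_1`, which a naive lexicographic sort of the file names would get wrong.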

Lightweight ETL or database Sync - Sybase to SQL Server

I have been doing some investigation into lightweight database syncing tools to trial. The initial task we want to perform is a simple data sync from a few tables on a Sybase ASE database (15) to a SQL Server database (2008 R2). Timing-wise, I'd like to keep my options open, but ultimately I would like the ability to sync every minute or less.
I have been looking at SymmetricDS, which at face value seems to do exactly what I want. The drama is that I have hit a couple of roadblocks on the Sybase side of things, which is proving very frustrating (JumpMind support are assisting). It appears that Java has a problem with the default collation on our server, HP-roman8. Unfortunately, changing this charset is a far bigger job than this project itself.
I have also started investigating Talend, but have hit a few roadblocks relating to requiring older versions of the Sybase drivers and downgrading the installed version of Java.
Short of going to Replication Server, does anyone have any suggestions for a relatively lightweight ETL or database syncing tool that will do what I want? The biggest gotcha thus far is Sybase support - I really need something that will work seamlessly without too much hacking.
Cheers
You should try uniVocity. It is a Java-based ETL framework that can certainly help you do what you need. You can use any JDBC driver, define your mappings with a few lines of code, and have this working faster than with a traditional ETL tool.
Have a read through its tutorial and also check out a few sample projects here.
Disclosure: I am the author of this library. It's open source and free (Apache 2.0 license).
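Whatever tool you choose, a minute-level sync of a few tables usually reduces to polling on a last-modified column and upserting the changed rows into the target. A minimal sketch of that pattern, with sqlite3 standing in for both the Sybase source and the SQL Server target (the table and column names are made up):

```python
import sqlite3

def sync_changes(source, target, since):
    """Copy rows modified after `since` from source to target (upsert)."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ?",
        (since,),
    ).fetchall()
    for row in rows:
        # SQLite upsert; the SQL Server equivalent would be a MERGE.
        target.execute(
            "INSERT INTO customers (id, name, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
            "updated_at = excluded.updated_at",
            row,
        )
    target.commit()
    return len(rows)

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE customers "
               "(id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")
src.execute("INSERT INTO customers VALUES (1, 'Acme', 100), (2, 'Globex', 200)")
copied = sync_changes(src, dst, since=150)  # copies only the row updated after 150
```

A real job would persist the high-water mark (`since`) between runs and schedule the poll every minute; the point is only that the per-cycle logic is small enough that Sybase driver support, not sync logic, is the hard part.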

How can Flyway respect version control and the SDLC?

We are thinking of integrating Flyway into our application, but are concerned about the way it maintains its own versions and how that works with the software development life cycle (SDLC).
In essence our problem with the approach is that you are maintaining a set of SQL scripts separated by version in the file name instead of maintaining a trunk in version control and releasing/tagging that trunk as a specific version. With Flyway a developer could go back and change an old migration script that relates to a released version of your application and break a version you've already integrated/tested/staged and shipped to a production environment.
What we are considering is maintaining the SQL migrations in a project under version control (i.e. my-app-db/trunk/migration.sql) and releasing/tagging from there when a SQL developer states it is ready for release (V1.0.0__blah.sql). The trunk/migration.sql is then wiped so that the next 1.0.1 or 1.1.0 script can be developed and tagged. A wrapper script will then export the SQL files from the tags, call Flyway with that directory to perform the migration, and clean up the export.
Does this seem like a valid point/approach? Will Flyway ever support something like version control?
Flyway 3.0 will open APIs that will make it possible for end users to extend it in this direction. Out of the box support for SCM integration is currently not on the agenda.
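It is also worth noting that Flyway already records a checksum of each applied migration in its metadata table and fails validation when a released script is edited afterwards, which guards against part of the concern above. The idea can be sketched as follows (the version numbers and script contents are hypothetical):

```python
import hashlib

def checksum(sql: str) -> str:
    """Checksum of one migration script's text."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()

def validate(applied: dict, on_disk: dict) -> list:
    """Return versions whose on-disk script no longer matches the applied checksum."""
    return [v for v, chk in applied.items()
            if v in on_disk and checksum(on_disk[v]) != chk]

scripts = {"1.0.0": "CREATE TABLE t (id INT);"}
applied = {"1.0.0": checksum(scripts["1.0.0"])}   # recorded when 1.0.0 shipped
scripts["1.0.0"] = "CREATE TABLE t (id INT, name VARCHAR(50));"  # edited after release
print(validate(applied, scripts))  # ['1.0.0']
```

So a developer who retroactively edits a shipped migration is caught the next time the deployment runs against any database that has already applied it, rather than silently diverging.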

How do you manage your sqlserver database projects for new builds and migrations?

How do you manage your sql server database build/deploy/migrate for visual studio projects?
We have a product that includes a sizeable database component (~100 tables, ~500 procs/functions/views), so we need to be able to deploy new databases at the current version as well as upgrade older databases to the current version. Currently we maintain separate scripts for the creation of new databases and for migration between versions. Clearly not ideal, but how is everyone else dealing with this?
This is complicated for us by having many customers who each have their own db instance, rather than say just having dev/test/live instances on our own web servers, but the processes around managing dev/test/live for others must be similar.
UPDATE: I'd prefer not to use any proprietary products like RedGate's (although I have always heard they're really good and will look into that as a solution).
We use Redgate SQL Compare and SQL Data Compare to handle this. The idea is simple: both compare products let you maintain a complete image of the schema, or of the data from selected tables (e.g. configuration tables), as scripts. You can then compare any database to the scripts and get a change script. We keep the scripts in our Mercurial source control and tag (label) each release. Support can then fetch the script for any version and use the Redgate tools to either create from scratch or upgrade.
Redgate also has an API product that allows you to do the compare function from your code. For example, this would allow you to have an automatic upgrade function in your installer or in the product itself. We often use this for our hosted web apps as it allows us to more fully automate the rollout process. In our case, we have an MSBuild task that support can execute to do an automatic rollout and upgrade. If you distribute to third-parties, you have to pay a small additional license fee for each distribution that includes the API.
Redgate also has a tool that automatically packages a database install or upgrade. We don't use that one as we have found that the compare against scripts for a version gives us more flexibility.
The Redgate tools also help us in development because they make it trivial to source-control the schema and configuration data in a very granular way (each database object can be placed in its own file).
The question was asked before SSDT projects appeared, but that's definitely the way I'd go nowadays, along with hand-crafting migration scripts for structural db changes where there is data that would be affected.
There's also the MS VSTS method (2008 description here) - has anyone got a good article on doing this with 2010, and on the pros/cons of using these tools?
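As a non-proprietary alternative to the compare-based approach, the "new database vs. upgrade" split can be removed entirely by keeping one ordered list of migration scripts plus a version table, so that a fresh install is simply an upgrade from version 0. A minimal sketch with sqlite3 standing in for SQL Server (the scripts themselves are illustrative):

```python
import sqlite3

# One ordered list of scripts; a fresh install is an upgrade from version 0.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]

def upgrade(db):
    """Apply every migration newer than the version recorded in the database."""
    db.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = db.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, sql in MIGRATIONS:
        if version > current:
            db.execute(sql)
            db.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    db.commit()
    return current  # version the database was at before this run

db = sqlite3.connect(":memory:")
upgrade(db)  # fresh database: applies both scripts
upgrade(db)  # already current: applies nothing (idempotent)
```

This is essentially what Flyway and DbUp automate; rolling your own is only worth it when, as in the question, proprietary tooling is off the table and the script count is manageable.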
