SQL server to postgres migration : timestamp conversion issue - sql-server

I have a data dump of a table from a SQL server, and I would like to import it into Postgres database. The generated file contains dates like this:
CAST(0x00009A78007FD81B AS DateTime)
This is limiting me from doing manual modifications to the SQL! Apparently that is the only column that is holding this simple migration process!
Other answers on the web don't show a work around! Here are the links I came across
SqlServer to Postgres DDL migration/conversion
http://www.postgresonline.com/journal/archives/46-Setting-up-PostgreSQL-as-a-Linked-Server-in-Microsoft-SQL-Server-64-bit.html
https://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_PostgreSQL_Migration_by_Ian_Harding

Related

SSIS, query Oracle table using ID's from SQL Server?

Here's the basic idea of what I want to do in SSIS:
I have a large query against a production Oracle database, and I need the following where clause that brings in a long list of ids from SQL Server. From there, the results are sent elsewhere.
select ...
from Oracle_table(s) --multi-join
where id in ([select distinct id from SQL_SERVER_table])
Alternatively, I could write the query this way:
select ...
from Oracle_table(s) --multi-join
...
join SQL_SERVER_table sst on sst.ID = Oracle_table.ID
Here are my limitations:
The Oracle query is large and cannot be run without the where id in (... clause
This means I cannot run the Oracle query, then join it against the ids in another step. I tried this, and the DBA's killed the temp table after it became 3 TB in size.
I have 160k id's
This means it is not practical to iterate through the id's one by one. In the past, I have run against ~1000 IDs, using a comma-separated list. It runs relatively fast - a few minutes.
The main query is in Oracle, but the ids are in SQL Server
I do not have the ability to write to Oracle
I've found many questions like this.
None of the answers I have found have a solution to my limitations.
Similar question:
Query a database based on result of query from another database
To prevent loading all rows from the Oracle table. The only way is to apply the filter in the Oracle database engine. I don't think this can be achieved using SSIS since you have more than 160000 ids in the SQL Server table, which cannot be efficiently loaded and passed to the Oracle SQL command:
Using Lookups and Merge Join will require loading all data from the Oracle database
Retrieving data from SQL Server, building a comma-separated string, and passing it to the Oracle SQL command cannot be done with too many IDs (160K).
The same issue using a Script Task.
Creating a Linked Server in SQL Server and Joining both tables will load all data from the Oracle database.
To solve your problem, you should search for a way to create a link to the SQL Server database from the Oracle engine.
Oracle Heterogenous Services
I don't have much experience in Oracle databases. Still, after a small research, I found something in Oracle equivalent to "Linked Servers" in SQL Server called "heterogeneous connectivity".
The query syntax should look like this:
select *
from Oracle_table
where id in (select distinct id from SQL_SERVER_table#sqlserverdsn)
You can refer to the following step-by-step guides to read more on how to connect to SQL Server tables from Oracle:
What is Oracle equivalent for Linked Server and can you join with SQL Server?
Making a Connection from Oracle to SQL Server - 1
Making a Connection from Oracle to SQL Server - 2
Heterogeneous Database connections - Oracle to SQL Server
Importing Data from SQL Server to a staging table in Oracle
Another approach is to use a Data Flow Task that imports IDs from SQL Server to a staging table in Oracle. Then use the staging table in your Oracle query. It would be better to create an index on the staging table. (If you do not have permission to write to the Oracle database, try to get permission to a separate staging database.)
Example of exporting data from SQL Server to Oracle:
Export SQL Server Data to Oracle using SSIS
Minimizing the data load from the Oracle table
If none of the solutions above solves your issue. You can try minimizing the data loaded from the Oracle database as much as possible.
As an example, you can try to get the Minimum and Maximum IDs from the SQL Server table, store both values within two variables. Then, you can use both variables in the SQL Command that loads the data from the Oracle table, like the following:
SELECT * FROM Oracle_Table WHERE ID > #MinID and ID < #MaxID
This will remove a bunch of useless data in your operation. In case your ID column is a string, you can use other measures to filter data, such as the string length, the first character.

Oracle to SQLServer export

I have to move data from existing database oracle to which I don't have direct access. The data is about 11 tables, 5GB each. The database admin can export the tables to some .csv or xml. The problem with csv is that some data is textual with lots of special characters. The problem with xml is that the markup is an overhead which will increase significantly the size of the files. The DBA admin is not competent enough to provide a working and neat solution. He uses toad as the database tool. Can you provide some ideas how to perform such a migration in the best possible way?
Please refer the below steps to migrate the data from Oracle to SQL server.
Recommended Migration Process
To successfully migrate objects and data from Oracle databases to SQL Server, Azure SQL DB, or Azure SQL Data Warehouse, use the following process:
1.Create a new SSMA project.
2.After you create the project, you can set project conversion, migration, and type mapping options. For information about project settings, see Setting Project Options (OracleToSQL). For information about how to customize data type mappings, see Mapping Oracle and SQL Server Data Types (OracleToSQL).
3.Connect to the Oracle database server.
4.Connect to an instance of SQL Server.
5.Map Oracle database schemas to SQL Server database schemas.
6.Optionally, Create assessment reports to assess database objects for conversion and estimate the conversion time.
7.Convert Oracle database schemas into SQL Server schemas.
8.Load the converted database objects into SQL Server.
You can do this in one of the following ways:
* Save a script and run it in SQL Server.
* Synchronize the database objects.
9. Migrate data to SQL Server.
10.If necessary, update database applications.
For more details :
[https://learn.microsoft.com/en-us/sql/ssma/oracle/migrating-oracle-databases-to-sql-server-oracletosql?view=sql-server-2017]
After the admin export data into CSV, try to convert it into a character set which will recognize all special characters.
Then, try to follow the steps from this link: link, it might work.
If after the import, there are still special characters, thy to manually convert them.
Get the DBA to export the tables using the ASCII delimiters which were designed for this purpose:
Row delimiter: Decimal 30 / 0x1E
Column delimiter: Decimal 31 / 0x1F
Then you can use BCP (or any other similar product) to upload the data to SQL Server.

Migrate PostgreSQL database into MS SQL Server [duplicate]

I have a PostgreSQL database that I want to move to SQL Server -- both schema and data. I am poor so I don't want to pay any money. I am also lazy, so I don't want to do very much work. Currently I'm doing this table by table, and there are about 100 tables to do. This is extremely tedious.
Is there some sort of trick that does what I want?
You should be able to find some useful information in the accepted answer in this Serverfault page: https://serverfault.com/questions/65407/best-tool-to-migrate-a-postgresql-database-to-ms-sql-2005.
If you can get the schema converted without the data, you may be able to shorten the steps for the data by using this command:
pg_dump --data-only --column-inserts your_db_name > data_load_script.sql
This load will be quite slow, but the --column-inserts option generates the most generic INSERT statements possible for each row of data and should be compatible.
EDIT: Suggestions on converting the schema follows:
I would start by dumping the schema, but removing anything that has to do with ownership or permissions. This should be enough:
pg_dump --schema-only --no-owner --no-privileges your_db_name > schema_create_script.sql
Edit this file to add the line BEGIN TRANSACTION; to the beginning and ROLLBACK TRANSACTION; to the end. Now you can load it and run it in a query window in SQL Server. If you get any errors, make sure you go to the bottom of the file, highlight the ROLLBACK statement and run it (by hitting F5 while the statement is highlighted).
Basically, you have to resolve each error until the script runs through cleanly. Then you can change the ROLLBACK TRANSACTION to COMMIT TRANSACTION and run one final time.
Unfortunately, I cannot help with which errors you may see as I have never gone from PostgreSQL to SQL Server, only the other way around. Some things that I would expect to be an issue, however (obviously, NOT an exhaustive list):
PostgreSQL does auto-increment fields by linking a NOT NULL INTEGER field to a SEQUENCE using a DEFAULT. In SQL Server, this is an IDENTITY column, but they're not exactly the same thing. I'm not sure if they are equivalent, but if your original schema is full of "id" fields, you may be in for some trouble. I don't know if SQL Server has CREATE SEQUENCE, so you may have to remove those.
Database functions / Stored Procedures do not translate between RDBMS platforms. You'll need to remove any CREATE FUNCTION statements and translate the algorithms manually.
Be careful about encoding of the data file. I'm a Linux person, so I have no idea how to verify encoding in Windows, but you need to make sure that what SQL Server expects is the same as the file you are importing from PostgreSQL. pg_dump has an option --encoding= that will let you set a specific encoding. I seem to recall that Windows tends to use two-byte, UTF-16 encoding for Unicode where PostgreSQL uses UTF-8. I had some issue going from SQL Server to PostgreSQL due to UTF-16 output so it would be worth researching.
The PostgreSQL datatype TEXT is simply a VARCHAR without a max length. In SQL Server, TEXT is... complicated (and deprecated). Each field in your original schema that are declared as TEXT will need to be reviewed for an appropriate SQL Server data type.
SQL Server has extra data types for UNICODE data. I'm not familiar enough with it to make suggestions. I'm just pointing out that it may be an issue.
I have found a faster and easier way to accomplish this.
First copy your table (or query) to a tab delimited file like so:
COPY (SELECT siteid, searchdist, listtype, list, sitename, county, street,
city, state, zip, georesult, elevation, lat, lng, wkt, unlocated_bool,
id, status, standard_status, date_opened_or_reported, date_closed,
notes, list_type_description FROM mlocal) TO 'c:\SQLAzureImportFiles\data_script_mlocal.tsv' NULL E''
Next you need to create your table in SQL, this will not handle any schema for you. The schema must match your exported tsv file in field order and data types.
Finally you run SQL's bcp utility to bring in the tsv file like so:
bcp MyDb.dbo.mlocal in "\\NEWDBSERVER\SQLAzureImportFiles\data_script_mlocal.tsv" -S tcp:YourDBServer.database.windows.net -U YourUserName -P YourPassword -c
A couple of things of note that I encountered. Postgres and SQL Server handle boolean fields differently. Your SQL Server schema need to have your boolean fields set to varchar(1) and the resulting data will be 'f', 't' or null. You will then have to convert this field to a bit. doing something like:
ALTER TABLE mlocal ADD unlocated bit;
UPDATE mlocal SET unlocated=1 WHERE unlocated_bool='t';
UPDATE mlocal SET unlocated=0 WHERE unlocated_bool='f';
ALTER TABLE mlocal DROP COLUMN unlocated_bool;
Another thing is the geography/geometry fields are very different between the two platforms. Export the geometry fields as WKT using ST_AsText(geo) and convert appropriately on the SQL Server end.
There may be more incompatibilities needing tweaks like this.
EDIT. So whereas this technique does technically work, I am trying to transfer several million records from 100+ tables to SQL Azure and bcp to SQL Azure is pretty flaky it turns out. I keep getting intermittent Unable to open BCP host data-file errors, the server is intermittently timing out and for some reason some records are not getting transferred with no indications of errors or problems. So this technique is not stable for transferring large amounts of data to Azure SQL.
You can use Navicate a powerful GUI tool for working with various databases including Postgres and SQL Server.
You can transfer both schema and data easily as follows:
Create two connection for source and target database
Go to Tools -> Data Transfer
Select source database and target database with its IP, database name and schema
as you can see in the option, if target table is not exist, it would create
Tada, it takes 10 mins to transfer whole my 63 tables and its data from Postgres to SQL Server.
Enjoy it!

Run PostgreSQL stored procedure from SQL Server

I have a database in SQL Server which contains collected data during one day, and a database in PostgreSQL with OSM data. I need to modify collected data in order to create reports for my users.
Now, I imagined that somehow call PostgreSQL procedure from SQL Server, pass collected data to PostgreSQL, do something with that data, and return another result set to SQL Server for creating reports.
What is the 'most efficient' way for achieving this? OR, better question atm, what is the way for achieving this functionality?
My idea is to connect SQL Server and PostgreSQL with PostgreSQL ODBC driver, then copy data from SQL Server to PostgreSQL table, run that stored procedure on PostgreSQL, and return data to SQL Server result table. But, it is not scheduled task. Data to be transferred to PostgreSQL contains latitude, longitude and bearing for about 2-3 million of rows and function which analyses them requires one per one record, not all at once.
My way is using C# and npgsql to connect to postgres.
I create an app:
check SQL Server every min.
check Postgres for the last id inserted.
create a dataset from MSQL with the news ids
insert the new records into Postgres.
run postgres store procedure to generata new data
create separated webservice to consume the report generated on postgres.

Compare millions of records from Oracle to SQL server

I have an Oracle database and a SQL Server database. There is one table say Inventory which contains millions of rows in both database tables and it keeps growing.
I want to compare the Oracle table data with the SQL Server data to find out which records are missing in the SQL Server table on daily basis.
Which is best approach for this?
Create SSIS package.
Create Windows service.
I want to consume less resource to achieve this functionality which takes less time and less resource.
Eg : 18 millions records in oracle and 16/17 millions in SQL Server
This situation of two different database arise because two different application online and offline
EDIT : How about connecting SQL server from oracle through Oracle Gateway to SQL server to
1) Direct query to SQL server from Oracle to update missing record in SQL server for 1st time.
2) Create a trigger on Oracle which gets executed when record is deleted from Oracle and it insert deleted record in new oracle table.
3) Create SSIS package to map newly created oracle table with SQL server to update SQL server record.This way only few records have to process daily through SSIS.
What do you think of this approach ?
I would create an SSIS package and load the data from the Oracle table use a Data Flow / OLE DB Data Source. If you have SQL Enterprise, the Attunity Connectors are a bit faster.
Then I would load key from the SQL Server table into a Lookup transformation, where I would match the 2 sources on the key, and direct unmatched rows into a separate output.
Finally I would direct the unmatched rows output to a OLE DB Command, to update the SQL Server table.
This SSIS package will require a lot of memory, but as the matching is done in memory with minimal IO, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server Table.
SSIS also has the advantage that it has lots of other transformation functions available if you need them later.
What you basically want to do is replication from Oracle to SQL Server.
You could do this in SSIS, A windows Service or indeed a multitude of platforms.
The real trick is using the correct design pattern.
There are two general design patterns
Snapshot Replication
You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or compare on Oracle but not yet a suggestion to compare on SQL Server, although this is valid)
You are comparing 18 million records here so this is a lot of work
Differential replication
You record the changes in the publisher (i.e. Oracle) since the last replication then you apply those changes to the subscriber (i.e. SQL Server)
You can do this manually by implementing triggers and log tables on the Oracle side, then use a regular ETL process (SSIS, command line tools, text files, whatever), probably scheduled in SQL Agent to apply these to the SQL Server.
Or you could do this by using the out of the box replication capability to set up Oracle as a publisher and SQL as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx
You're going to have to try a few of these and see what works for you.
Given this objective:
I want to consume less resource to achieve this functionality which takes less time and less resource
transactional replication is far more efficient but complicated. For maintenance purposes, which platforms (.Net, SSIS, Python etc.) are you most comfortable with?
Other alternatives:
If you can use Oracle gateway for SQL Server then you do not need to transfer data and can make the query directly.
If you can't use Oracle gateway, you can use Pentaho data integration or another ETL tool to compare tables and get results. Is easy to use.
I think the best approach is using oracle gateway.Just follow the steps. I have similar type of experience.
Install and Configure Oracle Database Gateway for SQL Server.
https://docs.oracle.com/cd/B28359_01/gateways.111/b31042/installsql.htm
Now you can create a dblink from oracle to sql server.
Create a procedure which compare the missing records in oracle database and insert into sql server database.
For example, you can use this statement inside your procedure.
INSERT INTO "dbo"."sql_server_table"#dblink_name("column1","column2"...."column5")
VALUES
(
select column1,column2....column5 from oracle_table
minus
select "column1","column2"...."column5" from "dbo"."sql_server_table"#dblink_name
)
Create a scheduler which execute the procedure daily.
When both databases are online, missing records will be inserted to sql server. Otherwise the scheduler fail or you can execute the procedure manually.
It takes minimum resource.
I will suggest having a homemade ETL solution.
Schedule an oracle job to export source table data (on a daily
manner based on the application logic ) to plain CSV format.
Schedule a SQL-Server job (with acceptable delay from first oracle job) to read this CSV file and import it
to a medium table inside sql-servter using BULK INSERT.
Last part of the SQL-Server job will be reading medium table data
and do the logic(insert, update target table). I suggest having another table to store reports of this daily job result.

Resources