Data migration from MS SQL to PostgreSQL using SQLAlchemy - sql-server

TL;DR
I want to migrate data from a MS SQL Server + ArcSDE to a PostgreSQL + PostGIS, ideally using SQLAlchemy.
I am using SQLAlchemy 1.0.11 to migrate an existing database from MS SQL 2012 to PostgreSQL 9.2 (upgrade to 9.5 planned).
I've been reading about this and found a couple of different sources (Tyler Lesmann, Inada Naoki, Stefan Urbanek, and Mathias Fussenegger) with a similar approach for this task:
Connect to both databases
Reflect the tables of the source database
Iterate over the tables and for each table
Create an equal table in the target database
Fetch rows in the source and insert them in the target database
Code
Here is a short example using the code from the last reference.
from sqlalchemy import create_engine, MetaData

src = create_engine('mssql://user:pass@host/database?driver=ODBC+Driver+13+for+SQL+Server')
dst = create_engine('postgresql://user:pass@host/database')

# Reflect every table of the source database
meta = MetaData()
meta.reflect(bind=src)
tables = meta.tables

# Copy the rows of each table into the same-named table in the target
for tbl in tables:
    data = src.execute(tables[tbl].select()).fetchall()
    if data:
        dst.execute(tables[tbl].insert(), data)
I am aware that fetching all the rows at once is a bad idea; it can be done with an iterator or with fetchmany, but that is not my issue right now.
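For reference, the chunked variant could look roughly like the sketch below. It uses the standard-library sqlite3 module only so that it runs without any server; copy_in_chunks, the table t, and the chunk size are made-up names for illustration. SQLAlchemy result objects offer a fetchmany() call of the same shape.

```python
import sqlite3


def copy_in_chunks(src_conn, dst_conn, table, chunk_size=1000):
    """Copy all rows of `table` from src_conn to dst_conn, chunk by chunk.

    Hypothetical helper: assumes the table already exists on both sides
    with the same column order, and that `table` is a trusted name.
    """
    cursor = src_conn.execute("SELECT * FROM {}".format(table))
    n_cols = len(cursor.description)
    insert_sql = "INSERT INTO {} VALUES ({})".format(
        table, ", ".join("?" for _ in range(n_cols)))
    copied = 0
    while True:
        rows = cursor.fetchmany(chunk_size)  # at most chunk_size rows in memory
        if not rows:
            break
        dst_conn.executemany(insert_sql, rows)
        copied += len(rows)
    dst_conn.commit()
    return copied


# Tiny in-memory demo with made-up data
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, "row%d" % i) for i in range(2500)])
copied = copy_in_chunks(src, dst, "t")
```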
Problem 1
All the four examples fail with my databases. One of the errors I get is related to a column of type NVARCHAR:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) type "nvarchar" does not exist
LINE 5: "desigOperador" NVARCHAR(100) COLLATE "SQL_Latin1_General_C...
^
[SQL: '\nCREATE TABLE "Operators" (\n\t"idOperador" INTEGER NOT NULL, \n\t"idGrupo" INTEGER, \n\t"desigOperador" NVARCHAR(100) COLLATE "SQL_Latin1_General_CP1_CI_AS", \n\t"Rua" NVARCHAR(200) COLLATE "SQL_Latin1_General_CP1_CI_AS", \n\t"Localidade" NVARCHAR(200) COLLATE "SQL_Latin1_General_CP1_CI_AS", \n\t"codPostal" NVARCHAR(10) COLLATE "SQL_Latin1_General_CP1_CI_AS", \n\tdataini DATETIME, \n\tdataact DATETIME, \n\temail NVARCHAR(50) COLLATE "SQL_Latin1_General_CP1_CI_AS", \n\turl NVARCHAR(50) COLLATE "SQL_Latin1_General_CP1_CI_AS", \n\tPRIMARY KEY ("idOperador")\n)\n\n']
My understanding from this error is that PostgreSQL doesn't have NVARCHAR but VARCHAR, which should be equivalent. I thought that SQLAlchemy would automatically map both of them to String in its layer of abstraction, but perhaps it doesn't work that way in this case.
Question: Should I define all the classes/tables beforehand, for instance, in models.py, in order to avoid errors like this? If so, how would that integrate with the given (or other) workflow?
In fact, this error was obtained running the code from Urbanek, where I can specify which tables I want to copy. Running the sample above, leads me to...
Problem 2
The MS SQL installation is a geodatabase that is using ArcSDE (Spatial Database Engine). For that reason, some of the columns are of a non-default Geometry type. On the PostgreSQL side, I am using PostGIS 2.
When trying to copy tables with those types, I get warnings like these:
/usr/local/lib/python2.7/dist-packages/sqlalchemy/dialects/mssql/base.py:1791: SAWarning: Did not recognize type 'geometry' of column 'geom'
(type, name))
/usr/local/lib/python2.7/dist-packages/sqlalchemy/dialects/mssql/base.py:1791: SAWarning: Did not recognize type 'geometry' of column 'shape'
Those are later followed by another error (this one was actually thrown when executing the provided code above):
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation "SDE_spatial_references" does not exist
LINE 1: INSERT INTO "SDE_spatial_references" (srid, description, aut...
^
I think that it failed to create the columns referred in the warnings, but the error was thrown at a later step when those columns were needed.
Question: The question is an extension of the previous one: how to do the migration with custom (or defined somewhere else) types?
I know about GeoAlchemy2 that can be used with PostGIS. GeoAlchemy supports MS SQL Server 2008, but in that case I guess I'm stuck with SQLAlchemy 0.8.4 (perhaps with less nice features). Also, I found here that it is possible to do the reflection using types defined by GeoAlchemy. However, my questions remain.
Possibly related
https://stackoverflow.com/questions/34475241/how-to-migrate-from-mysql-to-postgressql-using-pymysql
SqlAlchemy: export table to new database
https://stackoverflow.com/questions/34956523/sqlalchemy-custom-column-type-use-bindparam-as-multiple-function-parameters
SQLAlchemy Reflection Using Metaclass with Column Override
Edit
When I saw the error referring SDE_spatial_references I thought that it could be something related to ArcSDE, because the same machine also has ArcGIS for Server installed. Then I've learned that MS SQL Server also has some Spatial Data Types, and then I confirmed this is the case. I was wrong with this edit: the database is indeed using ArcSDE.
Edit 2
Here are some more details that I forgot to include.
The migration doesn't have to be done with SQLAlchemy. I'd thought that would be a good idea because:
I prefer to work with Python
The solution has to be with FOSS
Ideally, it would be in a way easily reproducible, and possible to launch and wait
After the migration I'd like to use Alembic to conduct further schema migrations
Other things that I have tried and failed (can't remember now the exact reasons, but I'd go through them again if any answer refers them):
Kettle
Geokettle
ogr2ogr (still trying this approach)
Database details:
Small database, ± 3 GB
± 40 tables
There are tables with both spatial and non-spatial data
Both databases (SQL Server and PostgreSQL) in the same server, which is running Windows Server 2008
No big problem with downtime (up to 8 hours would be fine)

Here is my solution using SQLAlchemy. This is a long-blog-like post, I hope that it is something acceptable here, and useful to someone.
Possibly, this also works with other combinations of source and target databases (besides MS SQL Server and PostgreSQL, respectively), although they were not tested.
Workflow (sort of TL;DR)
Inspect the source automatically and deduce the existing table models (this is called reflection).
Import previously defined table models which will be used to create the new tables in the target.
Iterate over the table models (the ones existing in both source and target).
For each table, fetch chunks of rows from source and insert them into target.
Requirements
SQLAlchemy
GeoAlchemy2
sqlacodegen
Detailed steps
1. Connect to the databases
SQLAlchemy calls the object that handles the connection between the application and the actual database an engine. So, to connect to the databases, an engine must be created with the corresponding connection string. The typical form of a database URL is:
dialect+driver://username:password@host:port/database
You can see some examples of connection URLs in the SQLAlchemy documentation.
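A URL of that shape can be assembled by hand, but special characters in the user name or password must be URL-escaped. Below is a minimal sketch using only the standard library; build_url and all the example values are hypothetical, not part of SQLAlchemy.

```python
from urllib.parse import quote_plus


def build_url(dialect, user, password, host, database,
              driver=None, port=None, query=None):
    """Assemble dialect+driver://username:password@host:port/database."""
    scheme = dialect if driver is None else "{}+{}".format(dialect, driver)
    # Escape characters such as '@' or '/' that would break URL parsing
    netloc = "{}:{}@{}".format(quote_plus(user), quote_plus(password), host)
    if port is not None:
        netloc += ":{}".format(port)
    url = "{}://{}/{}".format(scheme, netloc, database)
    if query:
        url += "?" + "&".join(
            "{}={}".format(key, quote_plus(value))
            for key, value in query.items())
    return url


pg_url = build_url("postgresql", "user", "p@ss", "localhost", "gis", port=5432)
ms_url = build_url("mssql", "user", "pass", "host", "database", driver="pyodbc",
                   query={"driver": "ODBC Driver 13 for SQL Server"})
```

The resulting strings can be passed to create_engine() as they are.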
Once created, the engine will not establish a connection until it is explicitly told to do so, either through the .connect() method or when an operation which is dependent on this method is invoked (e.g., .execute()).
con = ms_sql.connect()
2. Define and create tables
2.1 Source database
Tables in the source side are already defined, so we can use table reflection:
from sqlalchemy import MetaData
metadata = MetaData(source_engine)
metadata.reflect(bind=source_engine)
You may see some warnings if you try this. For example,
SAWarning: Did not recognize type 'geometry' of column 'Shape'
That is because SQLAlchemy does not recognize custom types automatically. In my specific case, this was because of an ArcSDE type. However, this is not problematic when you only need to read data. Just ignore those warnings.
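If the warnings are too noisy, they can be silenced for the duration of the reflection only. This is a minimal sketch with the standard warnings module; reflect_quietly and fake_reflect are hypothetical names, and fake_reflect merely stands in for a call such as metadata.reflect(bind=source_engine).

```python
import warnings


def reflect_quietly(reflect):
    """Run reflect() while ignoring 'Did not recognize type' warnings."""
    with warnings.catch_warnings():
        # The message argument is a regex matched at the start of the text
        warnings.filterwarnings("ignore", message="Did not recognize type")
        return reflect()


def fake_reflect():
    """Stand-in for the real reflection call (assumption for the demo)."""
    warnings.warn("Did not recognize type 'geometry' of column 'Shape'")
    return ["Troco"]
```

Any other warning still gets through, because the filter only matches that one message prefix.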
After the table reflection, you can access the existing tables through that metadata object.
# see all the tables names
print list(metadata.tables)
# handle the table named 'Troco'
src_table = metadata.tables['Troco']
# see that table columns
print src_table.c
2.2 Target database
For the target, because we are starting a new database, it is not possible to use tables reflection. However, it is not complicated to create the table models through SQLAlchemy; in fact, it might be even simpler than writing pure SQL.
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from geoalchemy2 import Geometry  # spatial column type for PostGIS

Base = declarative_base()

class SomeClass(Base):
    __tablename__ = 'some_table'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    # Geometry type and SRID must match the data being migrated
    Shape = Column(Geometry('MULTIPOLYGON', srid=102165))
In this example there is a column with spatial data (defined here thanks to GeoAlchemy2).
Now, if you have tens of tables, defining all of them by hand may be baffling, tedious, or error prone. Luckily, there is sqlacodegen, a tool that reads the structure of an existing database and generates the corresponding SQLAlchemy model code. Example:
pip install sqlacodegen
sqlacodegen mssql:///some_local_db --outfile models.py
Because the purpose here is just to migrate the data, and not the schema, you can create the models from the source database, and just adapt/correct the generated code to the target database.
Note: It will generate mixed class models and Table models. Read here about this behavior.
Again, you will see similar warnings about unrecognized custom data types. That is one of the reasons why we now have to edit the models.py file and adjust the models. Here are some hints on things to adjust:
The columns with custom data types are defined with NullType. Replace them with the proper type, for instance, GeoAlchemy2's Geometry.
When defining Geometry's, pass the correct geometry type (linestring, multilinestring, polygon, etc.) and the SRID.
PostgreSQL character types are variable length capable, and SQLAlchemy will map String columns to them by default, so we can replace all Unicode and String(...) by String. Note that it is not required, nor advisable (don't quote me on this), to specify the number of characters in String, just omit them.
You will have to double check, but, probably, all BIT columns are in fact Boolean.
Most numeric types (e.g., Float(...), Numeric(...)), likewise for character types, can be simplified to Numeric. Be careful with exceptions and/or some specific case.
I have noticed some issues with columns defined as indexes (index=True). In my case, because the schema will be migrated, these should not be required now and could be safely removed.
Make sure the table and column names are the same in both databases (reflected tables and defined models), this is a requirement for a later step.
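To keep the type adjustments consistent across many tables, the hints above can be written down as a lookup table first. The sketch below is only an illustration: TYPE_MAP and simplify_type are made-up names, the mapping reflects my own choices rather than any SQLAlchemy rule, and NullType columns are deliberately left alone because they need a geometry type and SRID chosen by hand.

```python
import re

# Assumed mapping from generated type names to simplified target types
TYPE_MAP = {
    "NVARCHAR": "String",
    "VARCHAR": "String",
    "UNICODE": "String",
    "STRING": "String",
    "BIT": "Boolean",
    "FLOAT": "Numeric",
    "NUMERIC": "Numeric",
}


def simplify_type(source_type):
    """Return the simplified type for e.g. 'NVARCHAR(100)' or 'Float(53)'.

    Unknown types are returned unchanged so they can be reviewed manually.
    """
    match = re.match(r"\s*([A-Za-z_]+)", source_type)
    if match is None:
        return source_type
    return TYPE_MAP.get(match.group(1).upper(), source_type)
```

For example, simplify_type("NVARCHAR(100)") gives "String", while "DATETIME" is returned as is.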
Now we can connect the models and the database together, and create all the tables in the target side.
Base.metadata.bind = postgres
Base.metadata.create_all()
Notice that, by default, .create_all() will not touch existing tables. In case you want to recreate or insert data into an existing table, it is required to DROP it beforehand.
Base.metadata.drop_all()
3. Get data
Now you are ready to copy data from one side and, later, paste it into the other. Basically, you just need to issue a SELECT query for each table. This is easy to do over the layer of abstraction provided by SQLAlchemy's SQL expression language.
data = ms_sql.execute(metadata.tables['TableName'].select()).fetchall()
However, this is not enough; you will need a little more control. The reason for that is related to ArcSDE. Because it uses a proprietary format, you can retrieve the data but you cannot parse it correctly. You would get something like this:
(1, Decimal('0'), u' ', bytearray(b'\x01\x02\x00\x00\x00\x02\x00\x00\x00#\xb1\xbf\xec/\xf8\xf4\xc0\x80\nF%\x99(\xf9\xc0#\xe3\xa5\x9b\x94\xf6\xf4\xc0\x806\xab>\xc5%\xf9\xc0'))
The workaround here was to convert the geometric column to the Well Known Text (WKT) format. This conversion has to take place in the database side. ArcSDE is there, so it knows how to convert it. So, for example, in the TableName there is a column with spatial data called shape. The required SQL statement should look like this:
SELECT [TableName].[shape].STAsText() FROM [TableName]
This uses .STAsText(), a geometry data type method of the SQL Server.
If you are not working with ArcSDE, the following steps are not required:
iterate over the tables (only those that are defined in both the source and in the target),
for each table, look for a geometry column (list them beforehand)
build a SQL statement like the one above
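Building such a statement is plain string work. Here is a rough sketch, assuming the column names are known (for instance from the reflected table) and that geometry columns can be recognised by name; build_select and its default of "shape" are hypothetical.

```python
def build_select(table, columns, geometry_columns=("shape",)):
    """Build a T-SQL SELECT that returns geometry columns as WKT."""
    geometry = {name.lower() for name in geometry_columns}
    parts = []
    for column in columns:
        ref = "[{}].[{}]".format(table, column)
        if column.lower() in geometry:
            # Let SQL Server render the geometry as WKT, keeping the column name
            ref = "{}.STAsText() AS [{}]".format(ref, column)
        parts.append(ref)
    return "SELECT {} FROM [{}]".format(", ".join(parts), table)


statement = build_select("TableName", ["OBJECTID", "shape"])
```

The names are interpolated directly into the SQL, so this is only suitable for identifiers that come from the reflected schema, never from user input.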
Once a statement is built, SQLAlchemy can execute it.
result = ms_sql.execute(statement)
In fact, this does not actually get the data (compare with the previous example -- notice the missing .fetchall() call). To explain, here is a quote from the SQLAlchemy docs:
The returned result is an instance of ResultProxy, which references a
DBAPI cursor and provides a largely compatible interface with that of
the DBAPI cursor. The DBAPI cursor will be closed by the ResultProxy
when all of its result rows (if any) are exhausted.
The data will only be retrieved just before it is inserted.
4. Insert data
Connections are established, tables are created, and the data has been prepared; now let's insert it. As with getting the data, SQLAlchemy also lets you INSERT data into a given table through the same table constructs:
postgres_engine.execute(Base.metadata.tables['TableName'].insert(), data)
Again, this is easy, but because of non-standard formats and erroneous data, further manipulation will probably be required.
4.1 Matching columns
First, there were some issues with matching the source columns with the target columns (of the same table) -- perhaps this was related to the Geometry column. A possible solution is to create a Python dictionary, which maps the values from the source column to the key (name) of the target column.
This is performed row by row -- although, it is not so slow as one would guess, because the actual insertion will be by several rows at the same time. So, there will be one dictionary per row, and, instead of inserting the data object (which is a list of tuples; one tuple corresponds to one row), you will be inserting a list of dictionaries.
Here is an example for one single row. The fetched data is a list with one tuple, and values is the built dictionary.
# data
[(1, 6, None, None, 204, 1, True, False, 204, 1.0, 1.0, 1.0, False, None)]
# values
[{'DateDeleted': None, 'sentidocirculacao': False, 'TempoPercursoMed': 1.0,
'ExtensaoTroco': 204, 'OBJECTID': 229119, 'NumViasSentido': 1,
'Deleted': False, 'TempoPercursoMin': 1.0, 'IdCentroOp': 6,
'IDParagemInicio': None, 'IDParagemFim': None, 'TipoPavimento': True,
'TempoPercursoMax': 1.0, 'IDTroco': 1, 'CorredorBusext': 204}]
Note that Python dictionaries are not ordered, that is why the numbers in both lists are not in the same position. The geometric column was removed from this example for simplification.
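The row-to-dictionary step itself is a one-liner with zip(). A minimal sketch, where rows_to_dicts is a hypothetical name, keys holds the target column names in the order of the SELECT, and the second row is made up for the demo:

```python
def rows_to_dicts(keys, rows):
    """Pair each value of every row tuple with its target column name."""
    return [dict(zip(keys, row)) for row in rows]


keys = ["IDTroco", "IdCentroOp", "ExtensaoTroco"]
rows = [(1, 6, 204), (2, 6, 310)]
values = rows_to_dicts(keys, rows)
```

A list like values can be passed as the second argument of an insert execution, so that each dictionary becomes one row.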
4.2 Fixing geometries
Probably, the previous workaround would not be required if this issue had not occurred: sometimes geometries are stored/retrieved with the wrong type.
In MSSQL/ArcSDE, the geometry data type does not specify which type of geometry is being stored (i.e., line, polygon, etc.). It only cares that it is a geometry. This information is stored in another (system) table, called SDE_geometry_columns (see the bottom of that page). However, Postgres (PostGIS, actually) requires the geometry type when defining a geometric column.
This leads to spatial data being stored with the wrong geometry type. By wrong I mean that it is different than what it should be. For instance, looking at SDE_geometry_columns table (excerpt):
f_table_name geometry_type
TableName 9
geometry_type = 9 corresponds to ST_MULTILINESTRING. However, there are rows in TableName table which are stored (or received) as ST_LINESTRING. This mismatch raises an error in Postgres side.
As a workaround, you can edit the WKT while creating the aforementioned dictionaries. For example, 'LINESTRING (10 12, 20 22)' is transformed to 'MULTILINESTRING ((10 12, 20 22))'.
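That WKT edit can be done with a small regular expression. This is a sketch under the assumption that the WKT is two-dimensional and well formed; promote_to_multi is a hypothetical name, and it should only be applied to columns whose target type really is a MULTI geometry.

```python
import re

# Matches a single-part geometry and captures its name and coordinate body
_SINGLE = re.compile(r"^\s*(POINT|LINESTRING|POLYGON)\s*(\(.*\))\s*$",
                     re.IGNORECASE | re.DOTALL)


def promote_to_multi(wkt):
    """Wrap a single-part WKT geometry into its MULTI counterpart.

    Anything else (already MULTI, EMPTY, unknown types) is returned unchanged.
    """
    match = _SINGLE.match(wkt)
    if match is None:
        return wkt
    name, body = match.groups()
    return "MULTI{} ({})".format(name.upper(), body)
```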
4.3 Missing SRID
Finally, if you are willing to keep the SRID's, you also need to define them when creating geometric columns.
If there is a SRID defined in the table model, it has to be satisfied when inserting data in Postgres. The problem is that when fetching geometry data as WKT with the .STAsText() method, you lose the SRID information.
Luckily, PostGIS supports an Extended-WKT (E-WKT) format that includes the SRID.
The solution here is to include the SRID when fixing the geometries. With the same example, 'LINESTRING (10 12, 20 22)' is transformed to 'SRID=102165;MULTILINESTRING ((10 12, 20 22))'.
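Prepending the SRID is just string formatting. A minimal sketch, with to_ewkt as a hypothetical helper name; NULL geometries are passed through untouched:

```python
def to_ewkt(wkt, srid):
    """Prefix a WKT string with its SRID, giving PostGIS Extended WKT."""
    if wkt is None:
        return None  # keep NULL geometries as NULL
    return "SRID={};{}".format(int(srid), wkt.strip())


ewkt = to_ewkt("MULTILINESTRING ((10 12, 20 22))", 102165)
```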
4.4 Fetch and insert
Once everything is fixed, you are ready to insert. As mentioned before, only now will the data actually be retrieved from the source. You can do this in chunks (of a user-defined size), for instance, 1000 rows at a time.
while True:
    rows = data.fetchmany(1000)
    if not rows:
        break
    values = [{key: (val if key.lower() != "shape" else fix(val, 102165))
               for key, val in zip(keys, row)} for row in rows]
    postgres_engine.execute(target_table.insert(), values)
Here fix() is the function that will correct the geometries and prepend the given SRID to geometric columns (which are identified, in this example, by the column name of "shape") -- like described above --, and values is the aforementioned list of dictionaries.
Result
The result is a copy of the schema and data, existing on a MS SQL Server + ArcSDE database, into a PostgreSQL + PostGIS database.
Here are some stats, from my use case, for performance analysis. Both databases are in the same machine; the code was executed from a different machine, but in the same local network.
Tables  | Geometry Column | Rows    | Fixed Geometries | Insert Time
--------|-----------------|---------|------------------|------------
Table 1 | MULTILINESTRING | 1114797 | 702              | 17min12s
Table 2 | None            | 460874  | ---              | 4min55s
Table 3 | MULTILINESTRING | 389485  | 389485           | 4min20s
Table 4 | MULTIPOLYGON    | 4050    | 3993             | 34s
Total   |                 | 3777964 | 871243           | 48min27s

I faced the same problems trying to migrate from Oracle 9i to MySQL.
I built etlalchemy to solve this problem, and it has currently been tested migrating to and from MySQL, PostgreSQL, SQL Server, Oracle and SQLite. It leverages SQLAlchemy, and BULK CSV Import features of the aforementioned RDBMS's (and can be quite fast!).
Install (non El-capitan): pip install etlalchemy
Install (El-capitan): pip install --ignore-installed etlalchemy
Run:
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
# Migrate from SQL Server onto PostgreSQL
src = ETLAlchemySource("mssql+pyodbc://user:passwd@DSN_NAME")
tgt = ETLAlchemyTarget("postgresql://user:passwd@hostname/dbname",
                       drop_database=True)
tgt.addSource(src)
tgt.migrate()

I'd recommend this flow with two big steps to migrate:
Migrate schema
Dump the source DB schema, preferably to some format that is unified across data tools, like UML (this and the next steps will need, and be easier with, a tool like Toad Data Modeler or IBM Rational Rose).
Change tables definitions from source types to target types when needed with TDM or RR. E. g. get rid of varchar(n) and stick to text in postgres, unless you specifically need application to crash on data layer with strings longer than n. Omit (for now) complex types like geometry, if there is no way to convert them in data modeling tools.
Generate a DDL-file for target DB (mentioned tools are handy here, again).
Create (and add to tables) complex types as they should be handled by target RDBMS. Try to insert a couple of entries to be sure datatypes are consistent. Add these types to your DDL-file.
You may also want to disable checks like foreign key constraints here.
Migrate data
Dump simple tables (i. e. with scalar fields) to a CSV.
Import simple tables data.
Write a simple piece of code to select complex data from source and to insert this into target (it is easier than it sounds, just select -> map attributes -> insert). Do not write migration for all complex types in one code routine, keep it simple, divide and conquer.
If you have not disabled checks while you were migrating schema it is possible that you need to repeat steps 2 and 3 for different tables (that's why, well, disable checks :)).
Enable checks.
This way you will split your migration process into simple atomic steps, and a failure in step 3 of the data migration will not force you back to the schema migration, etc. You can just truncate a couple of tables and rerun the data import if something fails.

Related

Specify columns while appending Snowpark Python Dataframe to table

So right now, I have a Dataframe created using the session.createDataFrame() in Python. The intention is to append this Dataframe to an existing table object in Snowflake.
However the schema of the source dataframe doesn't match exactly with the schema of the target table. In Snowpark Scala, the DataFrameWriter object has the method option() Saving/Appending Dataframe to a table that allows the specification of column order, and hence allows for skipping columns from the dataframe as the columns could be matched by their names.
However, Snowpark Python lacks the option() for DataframeWriter at the moment. This forces Snowflake to look for the schemas and the count of columns (between source and target ) to match, else an error is thrown.
Not sure when Snowpark for Python would receive this feature, but in the interim, is there any alternative to this (apart from hardcoding columns names in the INSERT query) ?
You are right that Snowpark does not make inserting novel records easy. But it is possible. I did it with the Snowpark Java SDK that lacked any source/docs just banging my head on the desk until it worked.
I first did a select against the target table (see first line), then got the schema, then created a new Row object with the correct order and types. Use column "order" mode not the column "name" mode. It's also really finicky about types - doesn't like java.util.Dates but wants Timestamps, doesn't like Integers but need Longs, etc.
Then do an "append"->"saveAsTable". By some miracle it worked. Agreed it would be fantastic if they accepted a Map<String, Object> to insert a row or let you map columns using names. But they probably want to discourage this given the nature of warehouse performance for row based operations.
In Java...
DataFrame dfSchema = session.sql("select * from TARGET_TABLE limit 1");
StructType schema = dfSchema.schema();
System.out.println(schema);
Row[] rows = new Row[]{Row.fromArray(new Object[]{endpoint.getDatabaseTable(), statusesArr, numRecords, Integer.valueOf(filenames.size()).longValue(), filenamesArr, urlsArr, startDate, endDate})};
DataFrame df = session.createDataFrame(rows, schema);
System.out.println(df.showString(0, 120));
df.write().mode("Append").saveAsTable("TARGET_TABLE");
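The reordering step described above (put each value into the position the target table expects) can be sketched in plain Python, independent of any Snowpark API. align_row and the column names are hypothetical; the target column list would come from the schema of the target table.

```python
def align_row(record, target_columns, default=None):
    """Order a {column: value} record to match target_columns.

    Column names are compared case-insensitively; columns missing from
    the record are filled with `default`.
    """
    lookup = {name.upper(): value for name, value in record.items()}
    return tuple(lookup.get(column.upper(), default)
                 for column in target_columns)


row = align_row({"name": "a", "id": 1}, ["ID", "NAME", "CREATED_AT"])
```

Tuples built this way line up with the target schema by position, which is what the append in column "order" mode relies on.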
In the save_as_table method, use the parameter column_order="name". See Snowflake save_as_table docs. This should match the columns by name and allow you to omit missing columns without the column number mismatch error.
It's also good practice to include a schema when creating your session. See Snowflake create_dataframe docs on using the StructType class.

SQL Blob to Base 64 in Table for FileMaker

I have looked and found some instances where something similar is being done for websites etc.
I have a SQL table that I am accessing in FileMaker Pro (Through ESS) via an ODBC connection to the SQL database and I have everything I need except there is one field(LNL_BLOB) in one table (duo.MMOBJS) which is an image "(image, null)" which cannot be accessed via the ODBC connection.
What I am hoping to accomplish is to find a way so that when an image is placed in the field, it is ALSO converted to Base64 in another field in the same table. Also, the database creator has a "View" (a foreign concept to us FileMaker developers) with this same data called "dbo.VW_BLOB_IMAGES", if that is helpful.
If there is a field with Base64 text, within FileMaker I can decode it to get the image.
What thoughts do you all have? Is there an even better way?
NOTE: I am using many tables and lots of the data in the app that I have made, this image is not the only reason I have created the ODBC connection.
Well, one way to get base64 out of SQL would be to trick the XML engine in SQL to convert your column to base64, then strip out the XML:
SELECT SUBSTRING(Q.Base64Data, 7, LEN(Q.Base64Data)-9)
FROM (SELECT
(
SELECT LNL_BLOB AS B
FROM duo.MMOBJS
FOR XML raw('r'), BINARY BASE64
) AS [Base64Data]) AS [Q]
You'd probably want to add that to your select statement or a view, rather than add it to the table; but, you could write a trigger that would maintain the field using that definition.
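The SUBSTRING offsets in that query can be checked outside the database. The sketch below assumes the XML engine emits one row of the form <r B="..."/>, so the wrapper is 6 characters in front and 3 behind (hence start 7 and length LEN - 9); strip_xml_wrapper and the sample bytes are made up for the illustration.

```python
import base64


def strip_xml_wrapper(xml_row):
    """Mimic SUBSTRING(x, 7, LEN(x) - 9) for a row like <r B="..."/>."""
    prefix, suffix = '<r B="', '"/>'
    if not (xml_row.startswith(prefix) and xml_row.endswith(suffix)):
        raise ValueError("unexpected XML wrapper")
    return xml_row[len(prefix):len(xml_row) - len(suffix)]


payload = b"\x89PNG\r\n"  # stand-in for the image bytes
xml_row = '<r B="{}"/>'.format(base64.b64encode(payload).decode("ascii"))
encoded = strip_xml_wrapper(xml_row)
decoded = base64.b64decode(encoded)
```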

Bulk Update with SQL Server based on a list of primary keys

I am processing data from a database with millions of rows. I am pulling a batch of 1000 items from the database and processing them without a problem. I am not loading the whole entity and I am just pulling down a few columns of data for the batch.
What I want to do is mark the 1000 rows as processed with a single SQL command.
Something like:
UPDATE dbo.Clients
SET HasProcessed = 1
WHERE ClientID IN (...)
The ... is just a list of integers.
Context:
Azure SQL Server 2012
Entity Framework
Database.ExecuteSqlCommand
Ideas:
I know I could build the command as a pure string, but this would mean not using any SqlParameters and not benefiting from the query plan optimization.
Also, I found some information about table-valued parameters, but this requires creating a table type and some overhead which I would like to avoid. This is just a list of integers after all.
Question:
Is there an easy (performant) way to do this that I am overlooking either with Entity Framework or ExecuteSqlCommand?
If not and using table-valued parameters is the best way, could you provide a complete example of how to convert an integer list into the simplest type and running that with the above query?
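A middle ground between a pure string and a table-valued parameter is to generate one placeholder per ID and still pass the values as parameters. The sketch below shows the idea in Python with the standard sqlite3 module, only so that it runs anywhere; mark_processed is a hypothetical name, and with SQL Server the same pattern would use one named parameter per value, staying below the limit of 2100 parameters per command.

```python
import sqlite3


def mark_processed(conn, client_ids):
    """Flag the given clients as processed with a single UPDATE."""
    ids = list(client_ids)
    if not ids:
        return 0
    # One placeholder per value; the values themselves stay parameters
    placeholders = ", ".join("?" for _ in ids)
    sql = ("UPDATE Clients SET HasProcessed = 1 "
           "WHERE ClientID IN ({})".format(placeholders))
    cursor = conn.execute(sql, ids)
    conn.commit()
    return cursor.rowcount


# In-memory demo with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients ("
             "ClientID INTEGER PRIMARY KEY, HasProcessed INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO Clients (ClientID) VALUES (?)",
                 [(i,) for i in range(1, 6)])
updated = mark_processed(conn, [1, 3, 5])
```

Because every batch has the same size (1000 here), the generated statement text is identical from one batch to the next.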

Merging multiple Access databases into SQL Server

We have a program in which each user is given their own Access database. We'd like to merge these all together into a single SQL Server database.
The problem is that, using the SQL Server import/export wizard, the primary/foreign keys do not get updated. So for instance if one user has this table:
1 Apple
2 Banana
and another user has this:
1 Coconut
2 Cheeseburger
the resulting table looks like this:
1 Apple
2 Banana
1 Coconut
2 Cheeseburger
Similarly, anything that referenced Banana by its primary key (2) is now referencing both Banana and Cheeseburger, which will not make the vegans very happy.
Is there any way to automatically update the primary/foreign key references when importing, other than writing an extremely long and complex import-script?
If you need to keep them fully compartmentalized, you have to assign some kind of partitioning column to each table. Is there a reason you need your SQL Server to have the same referential integrity as Access? Are you just importing to SQL Server for read-only reporting? In that case, I would not bother with RI. The queries will all require a partitionid/siteid/customerid. You could enforce that for single-entity access by wrapping tables with a table-valued UDF which required the partitionid. For cross-site that doesn't work.
If you are just loading to SQL Server for reporting, I would also consider altering the data model to support reporting (i.e. a dimensional model is sometimes better than a normalized model) instead of worrying about transaction processing.
I think we need to know more about the underlying goals.
Need more information of requirements.
My basic question is 'Do you need to preserve the original record key?' e.g. 1:apple in table T of user-database A; 1:coconut in table T of user-database B. Table T is assumed to have the same structure in all database instances. Reasons I can suppose that you may want to preserve the original data: (a) you may have a requirement to reference the original data (maybe a visual for previous reporting), and/or (b) there may be a data dependency in the application itself.
If the answer is 'no,' then you are probably interested only in preserving all of the distinct data values. Allow the SQL table to build using a new key and constrain the SQL table field such that it contains unique data. This approach seems to preserve the original table structure (but not the original key value or its 'location') and may suffice to meet your requirement.
If the answer is 'yes,' I do not see a way around creating an index that preserves a pointer to the original database and the key that was created in its table T. This approach would seem to require an application modification.
The best approach in this case is probably to split the incoming data into two tables: one to identify the database and original key, another to identify the distinct data values. For example: (database) table D has records such as 'A:1:a,' 'A:2:b,' 'B:1:c,' 'B:2:d,' 'B:15:a,' 'C:8:a'; (data) table T1 has records such as 'a:apple,' 'b:banana,' 'c:coconut,' 'd:cheeseburger' where 'A' describes the original database 'location,' 1 is the original value in location 'A,' and 'a' is a value that equates records in table D and table T1. (Otherwise you have a lot of redundant data in the one table; e.g. A:1:apple, B:15:apple, C:8:apple.) Also, T1 has a structure similar to the original T and seems to be more directly useful in the application.
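The simpler variant of this idea, a pointer from (database, original key) to a newly assigned key, can be sketched in a few lines of Python. merge_tables is a hypothetical name, and the sketch does not deduplicate equal values across databases as the two-table layout above would.

```python
def merge_tables(sources):
    """Merge per-database rows under new keys.

    sources maps a database name to a list of (old_key, value) rows.
    Returns the merged rows and a map (database, old_key) -> new_key,
    which can then be used to rewrite foreign keys in child tables.
    """
    merged, key_map = [], {}
    next_key = 1
    for db_name in sorted(sources):
        for old_key, value in sources[db_name]:
            key_map[(db_name, old_key)] = next_key
            merged.append((next_key, value))
            next_key += 1
    return merged, key_map


merged, key_map = merge_tables({
    "A": [(1, "Apple"), (2, "Banana")],
    "B": [(1, "Coconut"), (2, "Cheeseburger")],
})
```

A child row from database B that referenced key 2 would be rewritten to key_map[("B", 2)].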
Ended up creating an SSIS project for this. SSIS is a visual programming tool made by Microsoft (and part of their "Business Intelligence Development Studio", which comes with SQL Server) designed for solving exactly these sorts of problems.
Why not let Access use its replication manager to merge the databases? This will allow you to identify the conflicts and resolve them before importing to SQL Server. I'm fairly confident it will retain the foreign key relationships. If I understand your situation correctly, and the databases are the same structure with different data, you could load the combined database to the application and verify the data before moving to SQL Server.
What version of Access are you using? Here's a link for Access 2000. Use the language to adjust search parameters to fit your version.
http://technet.microsoft.com/en-us/library/cc751054.aspx

Export tables from SQL Server to be imported to Oracle 10g

I'm trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle.
I have about 10 tables, varying from 4 columns up to 25. I'm not using any constraints/keys so this should be reasonably straight forward.
Firstly I generated scripts to get the table structure, then modified them to conform to Oracle syntax standards (ie changed the nvarchar to varchar2)
Next I exported the data using SQL Server's export wizard, which created a CSV flat file. However, my main issue is that I can't find a way to force SQL Server to double quote column names. One of my columns contains commas, so unless I can find a method for SQL Server to quote column names, I will have trouble when it comes to importing this.
Also, am I going the difficult route, or is there an easier way to do this?
Thanks
EDIT: By quoting I'm referring to quoting the column values in the CSV. For example, I have a column which contains addresses like
101 High Street, Sometown, Some
county, PO5TC053
Without changing it to the following, it would cause issues when loading the CSV
"101 High Street, Sometown, Some
county, PO5TC053"
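If the export can be scripted instead of going through the wizard, the quoting shown above is what Python's csv module produces by default for text values. A minimal sketch; to_csv and the sample row are made up, and csv.QUOTE_NONNUMERIC is one possible choice that quotes every non-numeric field.

```python
import csv
import io


def to_csv(rows, header):
    """Serialise rows to CSV text, quoting every non-numeric value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


address = "101 High Street, Sometown, Some\ncounty, PO5TC053"
text = to_csv([(1, address)], ["id", "address"])
```

Commas and line breaks inside the address stay within one quoted field, so a CSV-aware loader on the other side reads it back as a single value.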
After looking at some options with SQLDeveloper, and trying a manual export/import, I found a utility in SQL Server Management Studio that gets the desired results and is easy to use. Do the following:
Goto the source schema on SQL Server
Right click > Export data
Select source as current schema
Select destination as "Oracle OLE provider"
Select properties, then add the service name into the first box, then username and password, be sure to click "remember password"
Enter query to get desired results to be migrated
Enter table name, then click the "Edit" button
Alter mappings, change nvarchars to varchar2, and INTEGER to NUMBER
Run
Repeat process for remaining tables, save as jobs if you need to do this again in the future
Use the SQLDeveloper migration tools
I think quoting column names in oracle is something you should not use. It causes all sort of problems.
As Robert has said, I'd strongly advise against quoting column names. The result is that you'd have to quote them not only when importing the data, but also whenever you want to reference that column in a SQL statement - and yes, that probably means in your program code as well. Building SQL statements becomes a total hassle!
From what you're writing, I'm not sure if you are referring to the column names or the data in these columns. (Can SQLServer really have a comma in the column name? I'd be really surprised if there was a good reason for that!) Quoting the column content should be done for any string-like columns (although I found that other characters usually work better as the need to "escape" quotes becomes another issue). If you're exporting in CSV that should be an option .. but then I'm not familiar with the export wizard.
Another idea for moving the data (depending on the scale of your project) would be to use an ETL/EAI tool. I've been playing around a bit with the Pentaho suite and their Kettle component. It offered a good range of options to move data from one place to another. It may be a bit oversized for a simple transfer, but if it's a big "migration" with the corresponding volume, it may be a good option.
