SQL Server: importing from Excel, only want the new entries - sql-server

The task is to have SQL Server read an Excel spreadsheet, and import only the new entries into a table. The entity is called Provider.
Consider an Excel spreadsheet like this:
Its target table is like this:
The task is to:
using 2008 Express toolset
import into an existing table in SQL Server 2000
existing data in the table! An identity (auto-increment) column is the PK, and it is referenced as a FK in another table, with references already made.
import only the new rows from the spreadsheet!
ignore rows that don't exist in the spreadsheet
Question:
How can I use the SQL 2008 toolset (Import and Export wizard likely) to achieve this goal? I suspect I'll need to "Write a query to specify the data to transfer".
The problem is that I cannot see the query the tool would be generating, so I can't make fine adjustments to it.

What I'd probably do is bulk load the Excel data into a separate staging table within the database and then run an INSERT on the main table to copy over the records that don't exist.
e.g.
INSERT MyRealTable (ID, FirstName, LastName,.....)
SELECT ID, FirstName, LastName,.....
FROM StagingTable s
LEFT JOIN MyRealTable r ON s.ID = r.ID
WHERE r.ID IS NULL
Then drop the staging table when you're done.
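As for the initial Excel-to-staging load itself, something along these lines can work; this is only a minimal sketch that assumes the ACE OLE DB provider is installed, Ad Hoc Distributed Queries is enabled, and the workbook path and sheet name (both made up here) match your file:
-- Load the worksheet into a fresh staging table (the path, sheet name and
-- provider version are assumptions; adjust them to your environment)
SELECT *
INTO StagingTable
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                'Excel 12.0;Database=C:\Data\Providers.xlsx;HDR=YES',
                'SELECT * FROM [Sheet1$]');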

You can also run some updates on that staging table before you load it into the real table, to clean the data up if you need to. For example:
UPDATE StagingTable
SET Name = RTRIM(LTRIM(Name))

Related

Updating one column in Oracle

I have an Oracle table which contains columns called dt_code, first_name, last_name, and user_id. I need to update dt_code with a list of codes that was given to me in an Excel file. What would be the best way to update the column and maintain the relationships?
as simple as
update your_table
set dt_code = new_code
where id = specific_id;
this won't break any relationships.
Note that Oracle allows you to import XLS data, but since I have no idea of your setup it is hard to tell you how to do it.
If there are a lot of updates to do, you should import all the data into a temporary table, then do the update based on that table.
If you choose this option and you are not used to this kind of update statement, have a look at this thread: Update statement with inner join on Oracle.
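As a rough sketch of that staging-table variant (the staging table name code_import, its columns, and user_id being the matching key are all assumptions):
-- Assumed staging table loaded from the Excel file: code_import(user_id, new_code)
MERGE INTO your_table t
USING code_import c
ON (t.user_id = c.user_id)
WHEN MATCHED THEN
  UPDATE SET t.dt_code = c.new_code;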

Selectively dumping and inserting into new database with a new primary key

I have a database that has been running on a server. I also have a database that has been running for about a month on a new server, with data based on the old server. Since both have been running this past month, the data in them is no longer equal.
We want to move selective data from two tables in the old database to the new one. This is the select I want to move, one month of data:
select * from table1 left join table2 on table1.keyID = table2.keyID
where table2.updated between '2013-08-01' and '2013-08-31';
From my understanding I would probably need to dump each table on its own. However, when inserting this data into the new database, I would need to give these entries new keyIDs (these are autogenerated). How can I do this while keeping the connection between these two tables?
Take a dump of table1 and table2 from the old server and restore them on the new server under the database name olddb.
I assume your database name on the new server is newdb.
INSERT INTO newdb.table1 (Field1, Field2, Field3)
SELECT Field1, Field2, Field3
FROM olddb.table1
LEFT JOIN olddb.table2
  ON (olddb.table1.keyID = olddb.table2.keyID)
WHERE olddb.table2.updated BETWEEN '2013-08-01' AND '2013-08-31';
Please note that you have to specify all the fields in the select statement except your keyID field; the keyID value will be autogenerated by the database.
I am assuming that the keyID field is an auto-increment field, otherwise this solution will not work.

Archiving Production DB Insert/Update with SQL Server 2008

I have a production database and an archive database in a second SQL Server instance.
When I insert or update (NOT DELETE) data in the production database, I need to insert or update the same data in the archive database.
What is a good way to do that?
Thanks
If they are in the same db instance, a trigger would be trivial assuming it's not a lot of tables.
If the size of this grows, you'll probably want to look into SQL Server replication. Microsoft has spent a lot of time and money to do it right.
If you are considering using triggers for this, then you may want to take into account the load on your production database. If it is a very heavily loaded database, consider using a high availability solution such as replication, mirroring or log shipping. Depending on your needs, any of those solutions could serve you well.
Also, at the same time, you should consider your "cold" recovery solutions, which would need to be changed in accordance with what you implement.
Replication will replicate your deletions as well. However, not deleting the deletions from your archive database may cause problems down the line with unique indexes, where a value is valid in the production database but not valid in the archive database because the value already exists there. If your design means that this is not an issue, then a simple trigger on the production table will do this for you:
CREATE TRIGGER TR_MyTable_ToArchive ON MyTable FOR INSERT, UPDATE AS
BEGIN
    SET NOCOUNT ON

    -- First inserts
    SET IDENTITY_INSERT ArchiveDB..MyTable ON  -- Only if an identity column is used
    INSERT INTO ArchiveDB..MyTable (MyTableKey, Col1, Col2, Col3, ...)
    SELECT MyTableKey, Col1, Col2, Col3, ...
    FROM inserted i LEFT JOIN deleted d ON i.MyTableKey = d.MyTableKey
    WHERE d.MyTableKey IS NULL
    SET IDENTITY_INSERT ArchiveDB..MyTable OFF -- Only if an identity column is used

    -- Then updates
    UPDATE t SET Col1 = i.Col1, Col2 = i.Col2, Col3 = i.Col3, ...
    FROM ArchiveDB..MyTable t INNER JOIN inserted i ON t.MyTableKey = i.MyTableKey
    INNER JOIN deleted d ON i.MyTableKey = d.MyTableKey
END
This assumes that your archive database resides on the same server as your production database. If this is not the case, you'll need to create a linked server entry, and then replace ArchiveDB..MyTable with ArchiveServer.ArchiveDB..MyTable, where ArchiveServer is the name of the linked server.
If there is a lot of load on your production database already, however, bear in mind that this will double it. To circumvent this, you can add an update-flag field to each of your tables and run a scheduled task at a time when the database load is at a minimum, such as 1am. Your trigger would then set the field to I for an insert or U for an update in the production database, and the scheduled task would then perform the update or insert in the archive database, depending on the value of this field, and reset the field to NULL once it has finished.
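A minimal sketch of what that scheduled task might run, assuming a CHAR(1) column named ArchiveFlag has been added to the production table (the column name and the exact column list are assumptions):
-- Copy rows the trigger flagged as inserts
SET IDENTITY_INSERT ArchiveDB..MyTable ON   -- Only if an identity column is used
INSERT INTO ArchiveDB..MyTable (MyTableKey, Col1, Col2, Col3)
SELECT MyTableKey, Col1, Col2, Col3
FROM MyTable
WHERE ArchiveFlag = 'I'
SET IDENTITY_INSERT ArchiveDB..MyTable OFF

-- Apply rows the trigger flagged as updates
UPDATE t SET Col1 = s.Col1, Col2 = s.Col2, Col3 = s.Col3
FROM ArchiveDB..MyTable t INNER JOIN MyTable s ON t.MyTableKey = s.MyTableKey
WHERE s.ArchiveFlag = 'U'

-- Clear the flags so the next run only picks up new changes
UPDATE MyTable SET ArchiveFlag = NULL WHERE ArchiveFlag IS NOT NULL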

Populating a table with fields from two other tables

I have two tables in Filemaker:
tableA (which includes fields idA (e.g. a123), date, price) and
tableB (which includes fields idB (e.g. b123), date, price).
How can I create a new table, tableC, with field id, populated with both idA and idB, (with the other fields being used for calculations on the combined data of both tables)?
The only way is to script it (for repeating uses) or do it 'manually', if this is an ad-hoc process. Details depend on the situation, so please clarify.
Update: Sorry, I actually forgot about this question. I assume the ID fields do not overlap even across tables and you do not need to add the same record more than once, but update it instead. In that case, the simplest script would be something like this:
Set Variable[ $self, Get( FileName ) ]
Import Records[ $self, Table A -> Table C, sync on ID, update and add new ]
Import Records[ $self, Table B -> Table C, sync on ID, update and add new ]
The Import Records step is managed via a rather elaborate dialog, but the idea is that you import from the same file (you can just type file:<YourFileName> there), the format is FileMaker Pro, and then set the field mapping. Make sure to choose the Update matching records and Add remaining records options and select the ID fields as the key fields to sync by.
It would be a FileMaker script. It could be run as a script trigger, but then it's not going to be seamless to the user. Your best bet is to create the tables, then just run the script as needed (manually) to build Table C. If you have FileMaker Server, you could schedule this script to be run periodically to keep Table C up-to-date.
Maybe you can use the select into statement.
I'm unsure whether you wish to use calculated fields from both tableA and tableB, or if your intention was to only calculate fields from the same table.
If the tableA.idA values also exist in tableB.idB, you could join the two tables and select into.
Otherwise, run the statement once for each table.
Select into statement:
Select tableA.IdA, tableA.field1A, tableA.field2A, tableA.field1A * tableB.field2A
into New_Table from tableA
join tableB on tableA.IdA = tableB.idB
Edit: missed the part where you mentioned FileMaker.
But maybe you could script this on the db and just drop the table.

order hint for openquery?

I need to execute the following SQL (SQL Server 2008) in a scheduled job periodically. The query plan shows that 53% of the cost is a sort after the data is pulled from the Oracle server. However, I've already ordered the data in the openquery. How can I force the query not to sort when merge joining?
merge target as t
using (select * from openquery(oracle, '
select * from t1 where UpdateTime > ''....'' order by k1, k2')
) as s on s.k1=t.k1 and s.k2=t.K2 -- the clustered PK of "target" is K1,k2
when matched then ......
when not matched then ......
Is there something like bulk insert's "with (order( { column [ ASC | DESC ] } [ ,...n ] ))"? Would it help improve the query plan of the merge statement if it existed?
If the Oracle table already has a PK on K1,K2, would it be better to just use oracle.db.owner.tablename in the merge instead of openquery? (Will SQL Server figure out the index from the Oracle metadata?)
Or is the best I can do to store the Oracle data in a local temp table and create a clustered primary key on K1,K2? I am trying to avoid creating a temp table because sometimes the returned openquery data set can be large.
I think a table is the best way to go because then you can create whatever indexes you need, but there's no reason why it should be temporary; why not create a permanent staging table? A local join using local indexes will probably be much more efficient than a join on the results of a remote query, although the only way to know for sure is to test it and see.
If you're worried about the large number of rows, you can look into only copying over new or changed rows. If the Oracle table already has columns for row creation and update times, that would be quite easy.
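As a rough sketch of that staging approach (the table name t1_staging, the column types, and the exact column list are assumptions; the ''....'' watermark placeholder is kept from the original query):
-- One-time setup: a permanent staging table with the same clustered key as "target"
CREATE TABLE dbo.t1_staging
(
    k1 int NOT NULL,
    k2 int NOT NULL,
    UpdateTime datetime NOT NULL,
    CONSTRAINT PK_t1_staging PRIMARY KEY CLUSTERED (k1, k2)
);

-- Each run: pull only new/changed rows from Oracle into the staging table...
TRUNCATE TABLE dbo.t1_staging;
INSERT INTO dbo.t1_staging (k1, k2, UpdateTime)
SELECT k1, k2, UpdateTime
FROM OPENQUERY(oracle, 'select k1, k2, UpdateTime from t1 where UpdateTime > ''....''');

-- ...then point the existing MERGE at dbo.t1_staging instead of the openquery,
-- so the join can use the local clustered index.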
Alternatively, you could consider using SSIS instead of a scheduled job. I understand that if you're not already using SSIS you may not want to invest time in learning it, but it's a very powerful tool and it's designed for moving large amounts of data into MSSQL. You would create a package with the following workflow:
Delete existing rows from the staging table (only if you can't populate it incrementally)
Copy the data from Oracle
Execute the MERGE statement
