The process is one where I would get 28 fixed width files and combine it into one table. In the past, this was done via FoxPro. As I have learned today, there were duplicates for which FoxPro did not reject or have any issues with. I have discovered that I need to write a merge statement in order to import the 28 and not get tripped up by duplicate primary key errors when I try to import each one separately using the Import Wizard.
I use Management Studio with a SQL Server Express front end and therefore can't create SSIS packages.
I am going to break this up into two questions so as to not make this too convoluted. First, I have since converted the fixed width files into tab-delimited text files by using Excel.
First question: Can one construct a merge statement that brings the files (tab-delimited) into SQL Server from the C drive? I could import each using the import wizard but that is cumbersome. I know how to write a merge statement but it demands that the data already exist in SQL Server. Below is an example. The question is how would I bring it in from outside.
Merge Industry as TARGET
Using Table1 as SOURCE
On (TARGET.Primary keys 1-9 = SOURCE.Primary keys 1-9)
No, you can't import data during or as part of a MERGE statement. The MERGE operation is purely for the 'upsert' situation; constructing logic on combining two result sets with criteria for matches and mismatches.
To get data into SQL Server you can either work via the UI (which is pretty boring and error prone when you have 28 files), or you can use some of the built in commands such as BULK INSERT.
Perhaps you could BULK INSERT the files one by one, and merge after each import.
If you wanted to continue using Foxpro but eliminate the duplicate records the first piece of advice would be to quit using the Import Wizard.
Wizards may be convenient to use, but they come with their own set of 'baggage' which can be problematic.
Aside from saying that they are in fixed field length format, you don't indicate which format(s) the 28 import files are in (CSV, SDF, TXT, ect.). Regardless you can farily easily write Foxpro code to handle all of the importing without the use of a 'Wizard'.
Then once all of the records have been imported you can readily eliminate the duplicates with something like the following:
SELECT ImportDBF && Assuming it is used EXCLUSIVELY
DELETE ALL
INDEX ON <primary key> UNIQUE TAG Uniq && Create an Index on only UNIQUE instances of your Primary key field
RECALL ALL && Recall only those UNIQUE records
DELETE TAG Uniq && Eliminate the temporary Index
PACK && PACK out the duplicate records
Now your Foxpro data table should be ready to go.
Good Luck
Related
I'm doing an exercise in which I have to clean data from a Flat File Source and write it on my Database. I have already managed to clean all of the fields by using some data quality rules for each field and also generate error codes which I write to a different table when a rule is broken.
My problem is that for the final step of the exercise I have to generate some Power BI graphics in which it shows how many fields were fixed from the source and which fields where cleansed. The only thing that I have thought compares the DB table to the flat file source or maybe do something with script components but I don't really think that those are really good solutions.
Has anybody encountered this problem? if somebody could point me out for info for something like this, it would be great. Thanks!
If I am facing a similar issue, I will do this in three steps:
Importing data without any transformation to a staging table
Cleaning data and loading it into the destination table
Comparing staging and destination table to get how many values were fixed.
From design standpoint - establishing a key is central before starting to clean.
Use could use SSIS derived column transformation to create a business key that is a concatenation of available fields to create a unique key, using FindString function and string functions.
Similar to the above step add a column in your staging table or use a derived column (depending on if you are using sql cleanup or ssis tasks to cleanup) to indicate if it was cleaned or not.
I have an SSIS package in which I'm reading the records from a Flat File and storing them in a recordset. Is it possible to compare the values in the recordset with the values in a database table and update the table?
I'm Using SQL Server 2008 R2 and Same version of SSIS.
Leran2002's answer in general is right, the most straight forward way is to have a lookup component set up to Redirect rows to no match output and use a destination and a OLE DB Command afterwards.
However depending on the size of the result sets, this might be slow, since the lookup component will check each row one-by-one and if your destination table has lots of records, this will take some time. Furthermore, depending on your cache settings in the lookup component, it can use lots of memory.
There are two more ways to achieve this:
Merge Join
Using your file source and your destination table as a source, you can use a Merge Join. The logic in the DFT is a bit more complex, but this more a set-based approach and with large result sets it is working better.
You'll have to implement the logic which record has to be updated, inserted, deleted or discarded from the file using a conditional split component.
I highly recommend this question (not exactly your problem, but a good comparison in my opinion): What are the differences between Merge Join and Lookup transformations in SSIS?
Staging table
Another way is to use a staging table to temporarily store the records from a file. In this case, your DFT just loads the records from a file into the staging table, then with one or more Execute SQL Task you can do the merging of the two data sets. (UPDATE, INSERT, DELETE, MERGE, you can use what fits your needs).
Usualy I use Lookup-component with option Redirect rows to no match output.
And after that you can use two rowsets which named Lookup No Match Output and Lookup Match Output.
PS. I have three articles about SSIS, but they in Russian (but there is a lot of SQL-scripts and pictures).
If it's interesting you, you can look the following link - https://habrahabr.ru/post/330618/
I have one database with an image table that contains just over 37,000 records. Each record contains an image in the form of binary data. I need to get all of those 37,000 records into another database containing the same table and schema that has about 12,500 records. I need to insert these images into the database with an IF NOT EXISTS approach to make sure that there are no duplicates when I am done.
I tried exporting the data into excel and format it into a script. (I have doe this before with other tables.) The thing is, excel does not support binary data.
I also tried the "generate scripts" wizard in SSMS which did not work because the .sql file was well over 18GB and my PC could not handle it.
Is there some other SQL tool to be able to do this? I have Googled for hours but to no avail. Thanks for your help!
I have used SQL Workbench/J for this.
You can either use WbExport and WbImport through text files (the binary data will be written as separate files and the text file contains the filename).
Or you can use WbCopy to copy the data directly without intermediate files.
To achieve your "if not exists" approache you could use the update/insert mode, although that would change existing row.
I don't think there is a "insert only if it does not exist mode", but you should be able to achieve this by defining a unique index and ignore errors (although that wouldn't be really fast, but should be OK for that small number of rows).
If the "exists" check is more complicated, you could copy the data into a staging table in the target database, and then use SQL to merge that into the real table.
Why don't you try the 'Export data' feature? This should work.
Right click on the source database, select 'Tasks' and then 'Export data'. Then follow the instructions. You can also save the settings and execute the task on a regular basis.
Also, the bcp.exe utility could work to read data from one database and insert into another.
However, I would recommend using the first method.
Update: In order to avoid duplicates you have to be able to compare images. Unfortunately, you cannot compare images directly. But you could cast them to varbinary(max) for comparison.
So here's my advice:
1. Copy the table to the new database under the name tmp_images
2. use the merge command to insert new images only.
INSERT INTO DB1.dbo.table_name
SELECT * FROM DB2.dbo.table_name
WHERE column_name NOT IN
(
SELECT column_name FROM DB1.dbo.table_name
)
I'm working on a data conversion utility which can push data from one master database out to a number of different databases. The utility its self will have no knowledge of how data is kept in the destination (table structure), but I would like to provide writing a SQL statement to return data from the destination using a complex SQL query with multiple join statements. As long as the data is in a standardized format that the utility can recognize (field names) in an ADO query.
What I would like to do is then modify the live data in this ADO Query. However, since there are multiple join statements, I'm not sure if it's possible to do this. I know at least with BDE (I've never used BDE), it was very strict and you had to return all fields (*) and such. ADO I know is more flexible, but I don't know quite how flexible in this case.
Is it supposed to be possible to modify data in a TADOQuery in this manner, when the results include fields from different tables? And even if so, suppose I want to append a new record to the end (TADOQuery.Append). Would it append to two different tables?
The actual primary table I'm selecting from has a complimentary table which is joined by the same primary key field, one is a "Small" table (brief info) and the other is a "Detail" table (more info for each record in Small table). So, a typical statement would include something like this:
select ts.record_uid, ts.SomeField, td.SomeOtherField from table_small ts
join table_detail td on td.record_uid = ts.record_uid
There are also a number of other joins to records in other tables, but I'm not worried about appending to those ones. I'm only worried about appending to the "Small" and "Detail" tables - at the same time.
Is such a thing possible in an ADO Query? I'm willing to tweak and modify the SQL statement in any way necessary to make this possible. I have a bad feeling though that it's not possible.
Compatibility:
SQL Server 2000 through 2008 R2
Delphi XE2
Editing these Fields which have no influence on the joins is usually no problem.
Appending is ... you can limit the Append to one of the Tables by
procedure TForm.ADSBeforePost(DataSet: TDataSet);
begin
inherited;
TCustomADODataSet(DataSet).Properties['Unique Table'].Value := 'table_small';
end;
but without an Requery you won't get much further.
The better way will be setting Values by Procedure e.g. in BeforePost, Requery and Abort.
If your View would be persistent you would be able to use INSTEAD OF Triggers
Jerry,
I encountered the same problem on FireBird, and from experience I can tell you that it can be made(up to a small complexity) by using CachedUpdates . A very good resource is this one - http://podgoretsky.com/ftp/Docs/Delphi/D5/dg/11_cache.html. This article has the answers to all your questions.
I have abandoned the original idea of live ADO query updates, as it has become more complex than I can wrap my head around. The scope of the data push project has changed, and therefore this is no longer an issue for me, however still an interesting subject to know.
The new structure of the application consists of attaching multiple "Field Links" on various fields from the original set of data. Each of these links references the original field name and a SQL Statement which is to be executed when that field is being imported. Multiple field links can be on one single field, therefore can execute multiple statements, placing the value in various tables, etc. The end goal was an app which I can easily and repeatedly export a common dataset from an original source to any outside source with different data structures, without having to recompile the app.
However the concept of cached updates was not appealing to me, simply for the fact pointed out in the link in RBA's answer that data can be changed in the database in the mean-time. So I will instead integrate my own method of customizable data pushes.
I'm trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle.
I have about 10 tables, varying from 4 columns up to 25. I'm not using any constraints/keys so this should be reasonably straight forward.
Firstly I generated scripts to get the table structure, then modified them to conform to Oracle syntax standards (ie changed the nvarchar to varchar2)
Next I exported the data using SQL Servers export wizard which created a csv flat file. However my main issue is that I can't find a way to force SQL Server to double quote column names. One of my columns contains commas, so unless I can find a method for SQL server to quote column names then I will have trouble when it comes to importing this.
Also, am I going the difficult route, or is there an easier way to do this?
Thanks
EDIT: By quoting I'm refering to quoting the column values in the csv. For example I have a column which contains addresses like
101 High Street, Sometown, Some
county, PO5TC053
Without changing it to the following, it would cause issues when loading the CSV
"101 High Street, Sometown, Some
county, PO5TC053"
After looking at some options with SQLDeveloper, or to manually try to export/import, I found a utility on SQL Server management studio that gets the desired results, and is easy to use, do the following
Goto the source schema on SQL Server
Right click > Export data
Select source as current schema
Select destination as "Oracle OLE provider"
Select properties, then add the service name into the first box, then username and password, be sure to click "remember password"
Enter query to get desired results to be migrated
Enter table name, then click the "Edit" button
Alter mappings, change nvarchars to varchar2, and INTEGER to NUMBER
Run
Repeat process for remaining tables, save as jobs if you need to do this again in the future
Use the SQLDeveloper migration tools
I think quoting column names in oracle is something you should not use. It causes all sort of problems.
As Robert has said, I'd strongly advise agains quoting column names. The result is that you'd have to quote them not only when importing the data, but also whenever you want to reference that column in a SQL statement - and yes, that probably means in your program code as well. Building SQL statements becomes a total hassle!
From what you're writing, I'm not sure if you are referring to the column names or the data in these columns. (Can SQLServer really have a comma in the column name? I'd be really surprised if there was a good reason for that!) Quoting the column content should be done for any string-like columns (although I found that other characters usually work better as the need to "escape" quotes becomes another issue). If you're exporting in CSV that should be an option .. but then I'm not familiar with the export wizard.
Another idea for moving the data (depending on the scale of your project) would be to use an ETL/EAI tool. I've been playing around a bit with the Pentaho suite and their Kettle component. It offered a good range of options to move data from one place to another. It may be a bit oversized for a simple transfer, but if it's a big "migration" with the corresponding volume, it may be a good option.