Simple SSIS from Oracle to SQL Server drops rows - sql-server

I have a very simple SQL query in my SSIS (VS 2017) Data Flow. It connects to Oracle via Native OLE DB\Oracle Provider for OLE DB and uses SQL Command to query the Oracle view. The destination table is a SQL Server 2017 table. If I query only the first 20 columns or so (I am querying 57 columns), I get all 1,060,000ish records. As I start to add more columns, the rowcount drops. I have already removed any date fields from both tables, and have done quite a few data conversions (source table has several varchar2(4000) fields that need to be SUBSTR to reasonable lengths in the SQL destination table. All fields in the destination table are nullable. When I pull the SQL out of SSIS and run it in SQL Developer, I get the right row count. When I run it in SSIS, it drops from 1.06 M rows to around 28k. I already tried the SQLChick hack (https://www.sqlchick.com/entries/2012/9/2/resolving-missing-records-in-ssis-from-oracle-source.html) doesn't work and causes connection errors (I had to use VS Code to add that property to my Oracle connection, then when I went back to VS, the connection was broken. When opening it back up to re-enter connection credentials, the extra property gets dropped.) I have reduced and increased the Rows per Batch and Maximum insert commit size values to zero avail. I have also set the RetainSameConnection property to True for all the Connection Managers. I'm at a loss! (As you can see from the pics, both jobs finish "successfully".)
This code returns all records:
SELECT
PIDM,
STUDENT_ID,
LAST_NAME,
FIRST_NAME,
MIDDLE_NAME,
LFM_NAME,
FML_NAME,
SORT_NAME,
GENDER,
ETHNIC_CODE,
ETHNIC_CODE_DESC,
LEGACY_CODE,
LEGACY_CODE_DESC,
ADDR_STR_LINE1,
ADDR_STR_LINE2,
ADDR_STR_LINE3,
ADDR_CITY,
ADDR_COUNTY,
ADDR_STATE,
ADDR_NATION,
ADDR_ZIPCODE,
ADDR_AREA_CODE,
ADDR_PHONE
FROM <TABLE_NAME>
This code returns only 28k:
SELECT
PIDM,
STUDENT_ID,
LAST_NAME,
FIRST_NAME,
MIDDLE_NAME,
LFM_NAME,
FML_NAME,
SORT_NAME,
GENDER,
ETHNIC_CODE,
ETHNIC_CODE_DESC,
LEGACY_CODE,
LEGACY_CODE_DESC,
ADDR_STR_LINE1,
ADDR_STR_LINE2,
ADDR_STR_LINE3,
ADDR_CITY,
ADDR_COUNTY,
ADDR_STATE,
ADDR_NATION,
ADDR_ZIPCODE,
ADDR_AREA_CODE,
ADDR_PHONE,
ORIGIN_STR_LINE1,
ORIGIN_STR_LINE2,
ORIGIN_STR_LINE3,
ORIGIN_CITY,
ORIGIN_COUNTY,
ORIGIN_NATION,
ORIGIN_STATE,
ORIGIN_ZIPCODE,
EMAIL,
HIGH_SCHOOL_CODE,
HIGH_SCHOOL_CODE_DESC,
HIGH_SCHOOL_CITY,
HIGH_SCHOOL_STATE,
HIGH_SCHOOL_GPA,
HIGH_SCHOOL_RANK,
PRIOR_COLLEGE_CODE,
PRIOR_COLLEGE_CODE_DESC,
PRIOR_COLLEGE_DEGREE_CODE,
PRIOR_COLLEGE_DEGREE_CODE_DESC,
PRIOR_COLLEGE_CITY,
PRIOR_COLLEGE_STATE,
ADMIT_FLAG,
GENERAL_STUDENT_FLAG,
CURRENT_ENROLLMENT_FLAG,
LETTER_CODES,
CONTACT_CODES,
COMMENT_CODES,
DIRECTORY_EMAIL,
ADDR_DIVISION_CODE,
HIGH_SCHOOL_CLASS_SIZE,
ETHNICITY,
RACE_CODE,
REGULATORY_RACE,
INT_LANG
FROM <TABLE_NAME>
Troubleshooting steps from the comments
If you run the all column version of the query in sql developer (whatever the Oracle query tool is) using the same credentials as the SSIS package, do you get 28k rows or 1M?
1M records are returned in SQL Developer when I use the same credentials SSIS is using. –
As painful as it may be, I would add 1 column, run, observe results. The first time you see a drop in row count, interrogate the heck of the source data (data type, collation, whether some permission thing is at play). If nothing seems out of place, edit the question to include the full table definition and identify what the first source column is that is throwing the results off.
I've done that. Column by column. I've even added a column that already existed (ADDR_STR_LINE1) as ORIGIN_STR_LINE1 and just aliased it, knowing that ADDRR_STR_LINE1 had already worked and both fields shared the exact datatypes/lengths etc. I just ran it with this code:SELECT PIDM, ORIGIN_STR_LINE1, ORIGIN_STR_LINE2, ORIGIN_STR_LINE3, ORIGIN_CITY, ORIGIN_COUNTY, ORIGIN_NATION, ORIGIN_STATE, ORIGIN_ZIPCODE FROM ODSMGR.RECRUIT_PERSON_OSU and it returned 1m records.
While little, consolation, you hitting all the troubleshooting steps I'd employ. I suppose the next item I would try to rule out is some bizarre row width issue/bug. Add a new data flow. As your source query, take one of your varchar2(4000) fields and duplicate it 60 times i.e. SELECT ADDR_STR_LINE1 AS Col0, ADDR_STR_LINE1 AS Col1, ..., ADDR_STR_LINE1 As Col59 FROM Owner.Table and connect that to a Derived Column task (it doesn't need to do anything, just serve as an anchor point) and run it. Do you get 1M or 28k?
Adding more of my troubleshooting steps. 1) Created a view off the original table, casting all of the fields that would need to be truncated as VARCHAR(proper length based on dest table). 2) Added/substracted fields piecemeal, until I thought I had a stable query, knowing that if I added <this fields>, <this many rows> would be dropped. But, for instance, I added PRIOR_COLLEGE_CITY and the first time, my counts dropped from 1063202 to 952755, but then later, I ran it again, and the counts dropped from 1063202 to 953989, so even if it was a data issue (it's not) it's not a consistent one.
Once I got my 953989 rows into the destination table, I compared which PRIOR_COLLEGE_CITY records were missing. In the Source Data Flow, I explicitly queried for those records, and they loaded fine, so again, not a data issue.

According to the picture you provided, when Source component output the records, some records have lost, so we could determine that this problem occurs in Source component.
In this case, please try to check the following thing in your case.
1.Run the query(not the views but the query inside the view) in your Oracle environment when execute the query in Source component, then check whether the number of records(returned from Oracle environment)is equal to the number of records(returned from SSIS Source component). Do this on a separate data task.
2. Check if there are some changes on the source table.
3. If the returned results is correct when running the query in Oracle environment, please try to compare the correct results with the SSIS Source returned results, and analyze the missing data.

I had a similar problem, mostly with odbc driver for oracle!
The problem not only lies on the volume of or rows that returns but in my case, for some reason it grouped the values of the first column also.
The only solution I ve found is to use another driver besides odbc and oledb.
Using the native Oracle Destination and Oracle Source in VS2017 it worked perfect and also the performance was better than odbc and ole db.
enter image description here

I was having a similar issue: 1,470,491 rows in the Oracle view that I was querying, all would come across when I run the package in Visual Studio, but only 377,257 rows would be read when I ran the package from SQL Agent. I tried the SQLChick "UseSessionFormat" hack that you mentioned. While editing the connection string used by the job (it comes in from configuration) I noticed that the connection string in the package had a "USERNAME" parameter as well as a "user id" paramter, but the configuration used by SQL Agent only had "USERNAME". I added "user id" parameter to the configuration used by SQL Agent and after that, the job retrieves all 1,470,491 rows.

Related

Extract from Oracle to SQL Server causing failures and incorrect row counts

I have a SQL Server extract that is pulling data from an Oracle Database. The issue I'm having seems a bit off.
There is a single column in the query based on a left join that once I added it in, either
A) The extract just fails without giving any type of an error message in the job history
B) The wrong number of records is imported (i.e. running the query directly though oracle shows a return count of 2000 rows and I get 1895 on import).
Removing the join and the column resolves the issue, but unfortunately it is information I have to have. I've researched this some online and about the only information I found (something about adding to the connection string) did not solve the issue. I have also tried to account for any problems that could arise due to having null values in the column by adding a CASE statement in the extract query like this (is it a GUID value that is causing the problem, please note the pseudo code):
,CASE
WHEN oracle.guid IS NULL
THEN '00000000-0000-0000-0000-000000000000'
ELSE REPLACE(REPLACE(oracle.guid, '{', ''), '}', '')
END AS sql.guid
As you can see here I am performing a replace in the query as the Source value is bracketed and the destination values needs to have these removed.
This is coming from Oracle 12C to SQL Server 2016. The connection is being made with the OraOLEDB12.dll provided from the ODAC installer package.

DB tables technical info in SAP Data Dictionary

I was given a task to develop a report that would present the following details (as separate columns in ALV):
1) Name of the DB table (like MSEG, EKPO etc.)
2) Size of the DB table in megabytes
3) Number of records
4) Number of read requests performed on the table
5) Number of write requests performed on the table
There are DB* tables that contain such kind of info. Specifically I am referring to DB6PMHST and DB6HISTBS. When I try to view them via SE11 or SE16, system reports that these tables do not hold any records. I tried all three development, testing and production landscapes.
Please may you provide a guidance on what I am doing wrong? Maybe there are some other system tables that would contain the necessary info?
P.S. I tried to debug ST04 transaction in order to see the tables from which the report selects data, but wasn't able to find those lines of the source code.
I would deeply appreciate your kind assistance.
P.S.S. Checked the table MSSDBSTATT - it is empty as well (our enterprise uses MS SQL Database)
Go to SE38 and run this report RSTABLESIZE, enter table ID and see the magic.
The number of reads and writes on a table is a subject specific to the type of database (MSSQL) -> please tag your question accordingly.
If you get an answer by an MSSQL expert, which says that the data is stored in some MSSQL tables, then you cannot query those tables using "Open SQL" but you may query them using "native SQL" (i.e. EXEC SQL or ADBC for instance).

How can I minimize validation intervals when changing the SQL in ADO NET Source Tasks

Part of an SSIS package is the data import from an external database via a SQL command embedded into an ADO.NET Source Data Flow Source. Whenever I make even the slightest adjustment to the query (such as changing a column name) it takes ages (in that case 1-2 hours) until the program has finished validation. The query itself returns around 30,000 rows with 20 columns each.
Is there any way to cut these long intervals or is this something I have to live with?
I usually store the source queries in a table and the first part of my package would execute a select and store the query returned from the table in a package variable, which would then be used by the ADO.NET Source Data Flow. So In my package for the default value of the variable I usually have the query that is stored in the database along with a "where 1=2" at the end. Hence during design time it does execute the query but just returns the column metadata. Let me know if you have any questions.

Export tables from SQL Server to be imported to Oracle 10g

I'm trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle.
I have about 10 tables, varying from 4 columns up to 25. I'm not using any constraints/keys so this should be reasonably straight forward.
Firstly I generated scripts to get the table structure, then modified them to conform to Oracle syntax standards (ie changed the nvarchar to varchar2)
Next I exported the data using SQL Servers export wizard which created a csv flat file. However my main issue is that I can't find a way to force SQL Server to double quote column names. One of my columns contains commas, so unless I can find a method for SQL server to quote column names then I will have trouble when it comes to importing this.
Also, am I going the difficult route, or is there an easier way to do this?
Thanks
EDIT: By quoting I'm refering to quoting the column values in the csv. For example I have a column which contains addresses like
101 High Street, Sometown, Some
county, PO5TC053
Without changing it to the following, it would cause issues when loading the CSV
"101 High Street, Sometown, Some
county, PO5TC053"
After looking at some options with SQLDeveloper, or to manually try to export/import, I found a utility on SQL Server management studio that gets the desired results, and is easy to use, do the following
Goto the source schema on SQL Server
Right click > Export data
Select source as current schema
Select destination as "Oracle OLE provider"
Select properties, then add the service name into the first box, then username and password, be sure to click "remember password"
Enter query to get desired results to be migrated
Enter table name, then click the "Edit" button
Alter mappings, change nvarchars to varchar2, and INTEGER to NUMBER
Run
Repeat process for remaining tables, save as jobs if you need to do this again in the future
Use the SQLDeveloper migration tools
I think quoting column names in oracle is something you should not use. It causes all sort of problems.
As Robert has said, I'd strongly advise agains quoting column names. The result is that you'd have to quote them not only when importing the data, but also whenever you want to reference that column in a SQL statement - and yes, that probably means in your program code as well. Building SQL statements becomes a total hassle!
From what you're writing, I'm not sure if you are referring to the column names or the data in these columns. (Can SQLServer really have a comma in the column name? I'd be really surprised if there was a good reason for that!) Quoting the column content should be done for any string-like columns (although I found that other characters usually work better as the need to "escape" quotes becomes another issue). If you're exporting in CSV that should be an option .. but then I'm not familiar with the export wizard.
Another idea for moving the data (depending on the scale of your project) would be to use an ETL/EAI tool. I've been playing around a bit with the Pentaho suite and their Kettle component. It offered a good range of options to move data from one place to another. It may be a bit oversized for a simple transfer, but if it's a big "migration" with the corresponding volume, it may be a good option.

Truncation error SQL server 2005 Delete/Insert

So I am trying to do a bulk insert with SSIS and continually get:
"Microsoft SQL Native Client" Hresult: 0x80004005 Description: "String or binary data would be truncated."
Even though I already have a data conversion for every column into the exact same type as the table that the rows are getting inserted into. I used a view and the data looks like it supposed to just before the DB insert step. Still get the error.
Next I went into sql server management studio and setup an insert query into that damned table and still get the same truncation error. I then did a set ANSI_WARNINGS OFF and the insert works data looks good in the table. Now when I try to delete this row I get the Truncation error.
My question besides any basic input to the situation is how can I turn off the ANSI_WARNINGS within SSIS so that the bulk load can go though?
It sounds like you have a column that is too narrow to accept the data you are submitting.
Can you verify if this is or isn't the case?
I had a very similar issue arise frequently while we were nailing down a schema with a third party.
Can you select a LEN of all of the columns in the view? That could help find the issue.
Other than that, the only way I have found is to print out a report of the actual lengths of the source data columns.
Sounds like you've got one row (possibly more, but it only takes one!) where your data value exceeds the length of the table columns. Doing a data conversion to the shorter type will MOVE the error to whatever transform does the conversion from the Destination. What I'd recommend is creating a Flat File Destination, and tying the error output of your transforms to it. Change the error result to 'Redirect Row'. This will allow all the valid rows to go through, and provide you with a copy of the row(s) that are getting truncated for you to manually handle.
Are there triggers on the table you're inserting into? Then the error may come from an action that the trigger takes.
Turns out that in SSIS you can setup the OLE DB Destination with "Data Access Mode > Table or view: Fast Mode". When I chose this setting the bulk insert went through without any warnings or errors and the data looks perfect in the database. Not sure what this change did exactly but it worked and after 16hours on one SSIS insert I'm happy with results.
Thanks for the suggestions.

Resources