I have a very simple SQL query in my SSIS (VS 2017) Data Flow. It connects to Oracle via Native OLE DB\Oracle Provider for OLE DB and uses SQL Command to query the Oracle view. The destination table is a SQL Server 2017 table. If I query only the first 20 columns or so (I am querying 57 columns), I get all 1,060,000ish records. As I start to add more columns, the rowcount drops. I have already removed any date fields from both tables, and have done quite a few data conversions (source table has several varchar2(4000) fields that need to be SUBSTR to reasonable lengths in the SQL destination table. All fields in the destination table are nullable. When I pull the SQL out of SSIS and run it in SQL Developer, I get the right row count. When I run it in SSIS, it drops from 1.06 M rows to around 28k. I already tried the SQLChick hack (https://www.sqlchick.com/entries/2012/9/2/resolving-missing-records-in-ssis-from-oracle-source.html) doesn't work and causes connection errors (I had to use VS Code to add that property to my Oracle connection, then when I went back to VS, the connection was broken. When opening it back up to re-enter connection credentials, the extra property gets dropped.) I have reduced and increased the Rows per Batch and Maximum insert commit size values to zero avail. I have also set the RetainSameConnection property to True for all the Connection Managers. I'm at a loss! (As you can see from the pics, both jobs finish "successfully".)
This code returns all records:
SELECT
PIDM,
STUDENT_ID,
LAST_NAME,
FIRST_NAME,
MIDDLE_NAME,
LFM_NAME,
FML_NAME,
SORT_NAME,
GENDER,
ETHNIC_CODE,
ETHNIC_CODE_DESC,
LEGACY_CODE,
LEGACY_CODE_DESC,
ADDR_STR_LINE1,
ADDR_STR_LINE2,
ADDR_STR_LINE3,
ADDR_CITY,
ADDR_COUNTY,
ADDR_STATE,
ADDR_NATION,
ADDR_ZIPCODE,
ADDR_AREA_CODE,
ADDR_PHONE
FROM <TABLE_NAME>
This code returns only 28k:
SELECT
PIDM,
STUDENT_ID,
LAST_NAME,
FIRST_NAME,
MIDDLE_NAME,
LFM_NAME,
FML_NAME,
SORT_NAME,
GENDER,
ETHNIC_CODE,
ETHNIC_CODE_DESC,
LEGACY_CODE,
LEGACY_CODE_DESC,
ADDR_STR_LINE1,
ADDR_STR_LINE2,
ADDR_STR_LINE3,
ADDR_CITY,
ADDR_COUNTY,
ADDR_STATE,
ADDR_NATION,
ADDR_ZIPCODE,
ADDR_AREA_CODE,
ADDR_PHONE,
ORIGIN_STR_LINE1,
ORIGIN_STR_LINE2,
ORIGIN_STR_LINE3,
ORIGIN_CITY,
ORIGIN_COUNTY,
ORIGIN_NATION,
ORIGIN_STATE,
ORIGIN_ZIPCODE,
EMAIL,
HIGH_SCHOOL_CODE,
HIGH_SCHOOL_CODE_DESC,
HIGH_SCHOOL_CITY,
HIGH_SCHOOL_STATE,
HIGH_SCHOOL_GPA,
HIGH_SCHOOL_RANK,
PRIOR_COLLEGE_CODE,
PRIOR_COLLEGE_CODE_DESC,
PRIOR_COLLEGE_DEGREE_CODE,
PRIOR_COLLEGE_DEGREE_CODE_DESC,
PRIOR_COLLEGE_CITY,
PRIOR_COLLEGE_STATE,
ADMIT_FLAG,
GENERAL_STUDENT_FLAG,
CURRENT_ENROLLMENT_FLAG,
LETTER_CODES,
CONTACT_CODES,
COMMENT_CODES,
DIRECTORY_EMAIL,
ADDR_DIVISION_CODE,
HIGH_SCHOOL_CLASS_SIZE,
ETHNICITY,
RACE_CODE,
REGULATORY_RACE,
INT_LANG
FROM <TABLE_NAME>
Troubleshooting steps from the comments
If you run the all column version of the query in sql developer (whatever the Oracle query tool is) using the same credentials as the SSIS package, do you get 28k rows or 1M?
1M records are returned in SQL Developer when I use the same credentials SSIS is using. –
As painful as it may be, I would add 1 column, run, observe results. The first time you see a drop in row count, interrogate the heck of the source data (data type, collation, whether some permission thing is at play). If nothing seems out of place, edit the question to include the full table definition and identify what the first source column is that is throwing the results off.
I've done that. Column by column. I've even added a column that already existed (ADDR_STR_LINE1) as ORIGIN_STR_LINE1 and just aliased it, knowing that ADDRR_STR_LINE1 had already worked and both fields shared the exact datatypes/lengths etc. I just ran it with this code:SELECT PIDM, ORIGIN_STR_LINE1, ORIGIN_STR_LINE2, ORIGIN_STR_LINE3, ORIGIN_CITY, ORIGIN_COUNTY, ORIGIN_NATION, ORIGIN_STATE, ORIGIN_ZIPCODE FROM ODSMGR.RECRUIT_PERSON_OSU and it returned 1m records.
While little, consolation, you hitting all the troubleshooting steps I'd employ. I suppose the next item I would try to rule out is some bizarre row width issue/bug. Add a new data flow. As your source query, take one of your varchar2(4000) fields and duplicate it 60 times i.e. SELECT ADDR_STR_LINE1 AS Col0, ADDR_STR_LINE1 AS Col1, ..., ADDR_STR_LINE1 As Col59 FROM Owner.Table and connect that to a Derived Column task (it doesn't need to do anything, just serve as an anchor point) and run it. Do you get 1M or 28k?
Adding more of my troubleshooting steps. 1) Created a view off the original table, casting all of the fields that would need to be truncated as VARCHAR(proper length based on dest table). 2) Added/substracted fields piecemeal, until I thought I had a stable query, knowing that if I added <this fields>, <this many rows> would be dropped. But, for instance, I added PRIOR_COLLEGE_CITY and the first time, my counts dropped from 1063202 to 952755, but then later, I ran it again, and the counts dropped from 1063202 to 953989, so even if it was a data issue (it's not) it's not a consistent one.
Once I got my 953989 rows into the destination table, I compared which PRIOR_COLLEGE_CITY records were missing. In the Source Data Flow, I explicitly queried for those records, and they loaded fine, so again, not a data issue.
According to the picture you provided, when Source component output the records, some records have lost, so we could determine that this problem occurs in Source component.
In this case, please try to check the following thing in your case.
1.Run the query(not the views but the query inside the view) in your Oracle environment when execute the query in Source component, then check whether the number of records(returned from Oracle environment)is equal to the number of records(returned from SSIS Source component). Do this on a separate data task.
2. Check if there are some changes on the source table.
3. If the returned results is correct when running the query in Oracle environment, please try to compare the correct results with the SSIS Source returned results, and analyze the missing data.
I had a similar problem, mostly with odbc driver for oracle!
The problem not only lies on the volume of or rows that returns but in my case, for some reason it grouped the values of the first column also.
The only solution I ve found is to use another driver besides odbc and oledb.
Using the native Oracle Destination and Oracle Source in VS2017 it worked perfect and also the performance was better than odbc and ole db.
enter image description here
I was having a similar issue: 1,470,491 rows in the Oracle view that I was querying, all would come across when I run the package in Visual Studio, but only 377,257 rows would be read when I ran the package from SQL Agent. I tried the SQLChick "UseSessionFormat" hack that you mentioned. While editing the connection string used by the job (it comes in from configuration) I noticed that the connection string in the package had a "USERNAME" parameter as well as a "user id" paramter, but the configuration used by SQL Agent only had "USERNAME". I added "user id" parameter to the configuration used by SQL Agent and after that, the job retrieves all 1,470,491 rows.
I am trying to migrate the entire backend of an Access application onto SQL Server. The first part of my project involves moving all of the tables whilst making minimum changes after the migration (no SQL Views, Pass-through queries etc. yet).
I have two queries in particular that I am using here:
ProductionSystemUnAllocatedPurchases - Which executes and returns a resultset successfully.
This is the full formula (sorry its extremely complex) for QtyAvailableOnPurchase:
QtyAvailableOnPurchase: I believe this field could be the problem here?
IIf((IIf([Outstanding Qty]>([P-ORDER-T with Qty Balance]![QTY]-[SumOfQty]),
([P-ORDER-T with Qty Balance]![QTY]-[SumOfQty]),[Outstanding Qty]))>0,
(IIf([Outstanding Qty]>([P-ORDER-T with Qty Balance]![QTY]-[SumOfQty]),([P-
ORDER-T with Qty Balance]![QTY]-[SumOfQty]),[Outstanding Qty])),0)
ProductionSystemUnAllocatedPurchasesTotal - Which gives an 'Invalid Operation' error message
Now the strange thing for me is that the first query works perfectly fine, but the second which uses the first as a source table, gives me this error message when executing. This query works perfectly fine with an access backend, but fails with SQL Server tables.
Any Ideas?
Can QtyAvailableOnPurchase be NULL? That would explain why Sum fails. Use Nz(QtyAvailableOnPurchase,0) instead.
My approach is to decompose queries. Create two queries :
First query selects needed data
Second query applies group operations (e.g. Sum)
You'll get easy way to check every step.
I have managed to find a solution to this error.
It seems that the problem is not so much with the query but rather the data type on SQL Server. SQL Server Migration Assistant (SSMA) automatically maps any Number (Double) fields to float on SQL Server. This mapping needed manually changing to Decimal.
Now according to this SO post, Decimal is the preferred for its precision up to 38 points (which is more than enough for my application), While float allows more than this, the data is stored in approximates.
Source: Difference between numeric, float and decimal in SQL Server
I'm converting a client's SSIS packages from DTS to SSIS. In one of their packages they have an execute SQL task that has a query similar to this:
SELECT * FROM [SOME_TABLE] AS ReturnValues
ORDER BY IDNumber
FOR XML AUTO, ELEMENTS
This query seems to return in a decent amount of time on the old system but on the new box it takes up to 18 minutes to run the query in SSMS. Sometimes if I run it it will generate an XML link and if I click on it to view the XML its throwing a 'System.OutOfMemoryException' and suggests increasing the number of characters retrieved from the server for XML data. I increased the option to unlimited and still getting error.
The table itself contains 220,500 rows but the query rows returned is showing 129,810 before query stops. Is this simply a matter of not having enough memory available to the system? This box has 48 GB (Win 2008 R2 EE x64), instance capped to 18GB because its shared dev environment. Any help/insight would be greatly appreciated as I don't really know XML!
When you are using SSMS to do XML queries FOR XML, it will generate all the XML and then put it into the grid and allow you to click on it. There are limits to how much data it brings back and with 220,000 rows, depending on how wide the table is, is huge and produces a lot of text.
The out of memory is the fact that SQL Server is trying to parse all of it and that is a lot of memory consumption for SSMS.
You can try to execute to a file and see what you get for size. But the major reason for running out of memory, is because that is a lot of XML and returning it to the grid, you will not get all the results all the time with this type of result set (size wise).
DBADuck (Ben)
The out of memory exception you're hitting is due to the amount of text a .net grid control can handle. 220k lines is huge! the setting in SSMS to show unlimited data is only as good as the .net control memory cap.
You coul look at removing the ELEMENTS option and look at the data in attribute format. That will decreate the amount XML "string space" returned. Personally, I prefer attributes over elements for that reason alone. Context is king, so it depends on what you're trying to accomplish (look at the data or use the data). Could youp pipe the data into an XML variable? When all is said & done, DBADuck is 100% correct in his statement.
SqlNightOwl
I'm trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle.
I have about 10 tables, varying from 4 columns up to 25. I'm not using any constraints/keys so this should be reasonably straight forward.
Firstly I generated scripts to get the table structure, then modified them to conform to Oracle syntax standards (ie changed the nvarchar to varchar2)
Next I exported the data using SQL Servers export wizard which created a csv flat file. However my main issue is that I can't find a way to force SQL Server to double quote column names. One of my columns contains commas, so unless I can find a method for SQL server to quote column names then I will have trouble when it comes to importing this.
Also, am I going the difficult route, or is there an easier way to do this?
Thanks
EDIT: By quoting I'm refering to quoting the column values in the csv. For example I have a column which contains addresses like
101 High Street, Sometown, Some
county, PO5TC053
Without changing it to the following, it would cause issues when loading the CSV
"101 High Street, Sometown, Some
county, PO5TC053"
After looking at some options with SQLDeveloper, or to manually try to export/import, I found a utility on SQL Server management studio that gets the desired results, and is easy to use, do the following
Goto the source schema on SQL Server
Right click > Export data
Select source as current schema
Select destination as "Oracle OLE provider"
Select properties, then add the service name into the first box, then username and password, be sure to click "remember password"
Enter query to get desired results to be migrated
Enter table name, then click the "Edit" button
Alter mappings, change nvarchars to varchar2, and INTEGER to NUMBER
Run
Repeat process for remaining tables, save as jobs if you need to do this again in the future
Use the SQLDeveloper migration tools
I think quoting column names in oracle is something you should not use. It causes all sort of problems.
As Robert has said, I'd strongly advise agains quoting column names. The result is that you'd have to quote them not only when importing the data, but also whenever you want to reference that column in a SQL statement - and yes, that probably means in your program code as well. Building SQL statements becomes a total hassle!
From what you're writing, I'm not sure if you are referring to the column names or the data in these columns. (Can SQLServer really have a comma in the column name? I'd be really surprised if there was a good reason for that!) Quoting the column content should be done for any string-like columns (although I found that other characters usually work better as the need to "escape" quotes becomes another issue). If you're exporting in CSV that should be an option .. but then I'm not familiar with the export wizard.
Another idea for moving the data (depending on the scale of your project) would be to use an ETL/EAI tool. I've been playing around a bit with the Pentaho suite and their Kettle component. It offered a good range of options to move data from one place to another. It may be a bit oversized for a simple transfer, but if it's a big "migration" with the corresponding volume, it may be a good option.